Repository: MicrosoftLearning/mslearn-ai-language Branch: main Commit: 71f8bfd1f6a2 Files: 103 Total size: 382.2 KB Directory structure: gitextract_wp6mqwji/ ├── .github/ │ └── workflows/ │ └── voice-live-web-files.yml ├── .gitignore ├── Instructions/ │ ├── Exercises/ │ │ ├── 01-analyze-text.md │ │ ├── 02-language-agent.md │ │ ├── 03-gen-ai-speech.md │ │ ├── 04-azure-speech.md │ │ ├── 05-azure-speech-mcp.md │ │ ├── 06-voice-live-agent.md │ │ └── 07-translation.md │ └── Labs/ │ ├── 01-analyze-text.md │ ├── 02-qna.md │ ├── 03-language-understanding.md │ ├── 04-text-classification.md │ ├── 05-extract-custom-entities.md │ ├── 06-translate-text.md │ ├── 07-speech.md │ ├── 08-translate-speech.md │ ├── 09-audio-chat.md │ ├── 10-voice-live-api.md │ └── 11-voice-live-agent-web.md ├── LICENSE ├── Labfiles/ │ ├── 01-analyze-text/ │ │ └── Python/ │ │ ├── readme.txt │ │ └── text-analysis/ │ │ ├── requirements.txt │ │ ├── reviews/ │ │ │ ├── review1.txt │ │ │ ├── review2.txt │ │ │ ├── review3.txt │ │ │ ├── review4.txt │ │ │ └── review5.txt │ │ └── text-analysis.py │ ├── 02-language-agent/ │ │ └── Python/ │ │ └── text-agent/ │ │ ├── requirements.txt │ │ └── text-agent.py │ ├── 02-qna/ │ │ ├── Python/ │ │ │ ├── qna-app/ │ │ │ │ ├── qna-app.py │ │ │ │ └── requirements.txt │ │ │ └── readme.txt │ │ └── ask-question.sh │ ├── 03-gen-ai-speech/ │ │ └── Python/ │ │ ├── generate-speech/ │ │ │ ├── generate-speech.py │ │ │ └── requirements.txt │ │ └── transcribe-speech/ │ │ ├── requirements.txt │ │ └── transcribe-speech.py │ ├── 03-language/ │ │ ├── Clock.json │ │ ├── Python/ │ │ │ ├── clock-client/ │ │ │ │ ├── clock-client.py │ │ │ │ └── requirements.txt │ │ │ └── readme.txt │ │ └── send-call.sh │ ├── 04-azure-speech/ │ │ └── Python/ │ │ └── voice-mail/ │ │ ├── requirements.txt │ │ └── voice-mail.py │ ├── 04-text-classification/ │ │ ├── Python/ │ │ │ ├── classify-text/ │ │ │ │ ├── articles/ │ │ │ │ │ ├── test1.txt │ │ │ │ │ └── test2.txt │ │ │ │ ├── classify-text.py │ │ │ │ └── requirements.txt │ │ │ └── readme.txt │ │ ├── classify-text.ps1 │ │ ├── test1.txt │ │ └── test2.txt │ ├── 05-custom-entity-recognition/ │ │ ├── Python/ │ │ │ ├── custom-entities/ │ │ │ │ ├── ads/ │ │ │ │ │ ├── test1.txt │ │ │ │ │ └── test2.txt │ │ │ │ ├── custom-entities.py │ │ │ │ └── requirements.txt │ │ │ └── readme.txt │ │ ├── extract-entities.ps1 │ │ ├── test1.txt │ │ └── test2.txt │ ├── 05-speech-tool/ │ │ └── Python/ │ │ └── speech-client/ │ │ ├── requirements.txt │ │ └── speech-client.py │ ├── 06-translator-sdk/ │ │ └── Python/ │ │ ├── readme.txt │ │ └── translate-text/ │ │ ├── requirements.txt │ │ └── translate.py │ ├── 06-voice-live/ │ │ └── Python/ │ │ └── chat-client/ │ │ ├── chat-client.py │ │ └── requirements.txt │ ├── 07-speech/ │ │ └── Python/ │ │ ├── readme.txt │ │ └── speaking-clock/ │ │ ├── requirements.txt │ │ └── speaking-clock.py │ ├── 07-translation/ │ │ └── Python/ │ │ ├── readme.txt │ │ └── translators/ │ │ ├── requirements.txt │ │ ├── translate-speech.py │ │ └── translate-text.py │ ├── 08-speech-translation/ │ │ └── Python/ │ │ ├── readme.txt │ │ └── translator/ │ │ ├── requirements.txt │ │ └── translator.py │ ├── 09-audio-chat/ │ │ └── Python/ │ │ ├── audio-chat.py │ │ └── requirements.txt │ └── 11-voice-live-agent/ │ └── python/ │ ├── .dockerignore │ ├── .gitignore │ ├── .python-version │ ├── Dockerfile │ ├── README.md │ ├── azdeploy.sh │ ├── azure.yaml │ ├── infra/ │ │ ├── ai-foundry.bicep │ │ ├── main.bicep │ │ └── main.parameters.json │ ├── pyproject.toml │ ├── requirements.txt │ └── src/ │ ├── __init__.py │ 
├── flask_app.py │ ├── static/ │ │ ├── app.js │ │ └── style.css │ └── templates/ │ └── index.html ├── README.md ├── _build.yml ├── _config.yml ├── downloads/ │ └── python/ │ └── readme.md └── index.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/voice-live-web-files.yml
================================================
name: Zip Voice Live Web Files

on:
  workflow_dispatch:
  push:
    branches:
      - 'main'
    paths:
      - 'Labfiles/11-voice-live-agent/python/**'

permissions:
  contents: write

defaults:
  run:
    shell: bash

jobs:
  create_zip:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Create Voice Live zip
        run: |
          rm -f ./downloads/python/voice-live-web.zip
          cd ./Labfiles/11-voice-live-agent/python/
          zip -r -q ../../../downloads/python/voice-live-web.zip .

      - name: Commit and push
        uses: Endbug/add-and-commit@v9 # Updated to latest version
        with:
          add: 'downloads/python/voice-live-web.zip'
          message: 'Updating Zip for python source files'
          push: true

================================================
FILE: .gitignore
================================================
bin
obj
*.sln

================================================
FILE: Instructions/Exercises/01-analyze-text.md
================================================
---
lab:
    title: 'Analyze text'
    description: "Use Azure Language in Foundry Tools to analyze text."
    level: 300
    duration: 30
    islab: true
---

# Analyze Text

**Azure Language in Foundry Tools** supports analysis of text, including language detection, entity recognition, and PII redaction. For example, suppose a travel agency wants to process hotel reviews that have been submitted to the company's web site. By using Azure Language, they can determine the language each review is written in, identify named entities (such as places, landmarks, or people mentioned in the reviews), and redact any personally identifiable information before publishing the reviews on the company's website.

In this exercise, you'll use the Azure Language Python SDK for text analytics to implement a simple hotel review application.

While this exercise is based on Python, you can develop text analytics applications using multiple language-specific SDKs. The code used in this exercise is based on the Microsoft Foundry Tools SDK for Python. You can develop similar solutions using the SDKs for Microsoft .NET, JavaScript, and Java. Refer to [Microsoft Foundry SDK client libraries](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/sdk-overview) for details.

This exercise takes approximately **30** minutes.

> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.

## Prerequisites

Before starting this exercise, ensure you have:

- An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account)
- [Visual Studio Code](https://code.visualstudio.com/) installed
- [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\*
- [Git](https://git-scm.com/install/) installed and configured
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed

> \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12.
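If you want to double-check that the interpreter you plan to use is a 3.13.x release before creating a virtual environment, one quick optional check is to run a couple of lines of Python in your terminal. This snippet isn't part of the lab files; it's just a convenience:

```python
# Optional sanity check: confirm the active interpreter is a Python 3.13.x release
import sys

print(sys.version)
if sys.version_info[:2] != (3, 13):
    print("Warning: this lab was tested with Python 3.13.x")
```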
## Create a Microsoft Foundry project Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution. 1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page. 1. If it is not already enabled, in the tool bar the top of the page, enable the **New Foundry** option. Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project: - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Select any available region 1. Select **Create**. Wait for your project to be created. 1. On the home page for your project, note that the API key, project endpoint, and OpenAI endpoint are displayed here. > **TIP**: You're going to need the project endpoint later! ## Get the application files from GitHub The initial application files you'll need to develop the review analysis application are provided in a GitHub repo. 1. Open Visual Studio Code. 1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors. 1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/01-analyze-text/Python/text-analysis**. The application files include: - **reviews** (a subfolder containing the review documents) - **.env** (the application configuration file) - **requirements.txt** (the Python package dependencies that need to be installed) - **text-analysis.py** (the code file for the application) ## Configure your application 1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension. 1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.13.x installation. > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/01-analyze-text/Python/text-analysis* folder; but it's OK if you don't - we'll install them later! > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`. 1. In the **Explorer** pane, right-click the **text-analysis** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/01-analyze-text/Python/text-analysis* folder.) > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system. 1. Ensure that the terminal is open in the **text-analysis** folder with the prefix **(.venv)** to indicate that the Python environment you created is active. 1. 
Install the Azure Language Text Analytics SDK and other required packages by running the following command: ``` pip install -r requirements.txt ``` 1. In the **Explorer** pane, in the **text-analysis** folder, select the **.env** file to open it. Then update the configuration values to include the **endpoint** (up to the *.com* domain) for your Foundry project (copy these from the Foundry portal). > **Important**: Modify the pasted endpoint to remove the "/api/projects/{project_name}" suffix - the endpoint should be *https://{your-foundry-resource-name}.services.ai.azure.com*. Save the modified configuration file. ## Add code to connect to your Azure AI Language resource 1. In the **Explorer** pane, in the **text-analysis** folder, open the **text-analysis.py** file. 1. Review the existing code. You will add code to work with the Azure Language Text Analytics SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Text Analytics SDK: ```python # import namespaces from azure.identity import DefaultAzureCredential from azure.ai.textanalytics import TextAnalyticsClient ``` 1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Create client using endpoint**, and add the following code to create a client for the Text Analysis API: ```Python # Create client using endpoint credential = DefaultAzureCredential() ai_client = TextAnalyticsClient(endpoint=foundry_endpoint, credential=credential) ``` 1. Save the changes to the code file. Then, in the terminal pane, use the following command to sign into Azure. ```powershell az login ``` > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details. 1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ``` python text-analysis.py ``` 1. Observe the output as the code should run without error, displaying the contents of each review text file in the **reviews** folder. The application successfully creates a client for the Text Analytics API but doesn't make use of it. We'll fix that in the next section. ## Add code to detect language Now that you have created a client for the API, let's use it to detect the language in which each review is written. 1. In the code editor, find the comment **Get language**. Then add the code necessary to detect the language in each review document: ```python # Get language detectedLanguage = ai_client.detect_language(documents=[text])[0] print('\nLanguage: {}'.format(detectedLanguage.primary_language.name)) ``` > **Note**: *In this example, each review is analyzed individually, resulting in a separate call to the service for each file. An alternative approach is to create a collection of documents and pass them to the service in a single call. 
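To illustrate the batched approach described in the note above, here's a minimal sketch (not part of the lab code) that assumes the same `ai_client` created earlier and a hypothetical `reviews` list containing the text of each review:

```python
# Hypothetical sketch of the batched approach: analyze several documents in one call.
# Assumes ai_client is the TextAnalyticsClient created earlier and reviews is a list of strings.
results = ai_client.detect_language(documents=reviews)
for review_text, result in zip(reviews, results):
    if not result.is_error:
        print('Language: {}'.format(result.primary_language.name))
```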
> *In both approaches, the response from the service consists of a collection of documents, which is why in the Python code above, the index of the first (and only) document in the response ([0]) is specified.*

1. Save your changes. Then re-run the program.

1. Observe the output, noting that this time the language for each review is identified.

## Add code to extract entities

Often, documents or other bodies of text mention people, places, time periods, or other entities. The Text Analytics API can detect multiple categories (and subcategories) of entity in your text.

1. In the code editor, find the comment **Get entities**. Then, add the code necessary to identify entities that are mentioned in each review:

    ```python
    # Get entities
    entities = ai_client.recognize_entities(documents=[text])[0].entities
    if len(entities) > 0:
        print("\nEntities")
        for entity in entities:
            print('\t{} ({})'.format(entity.text, entity.category))
    ```

1. Save your changes and re-run the program.

1. Observe the output, noting the entities that have been detected in the text.

## Add code to redact PII

Often, privacy policies and legislation can require that personally identifiable information (PII), such as names, addresses, phone numbers, and other private details, be redacted from documents.

1. In the code editor, find the comment **Get PII**. Then, add the code necessary to identify PII entities that are mentioned in each review:

    ```python
    # Get PII
    pii_result = ai_client.recognize_pii_entities(documents=[text])[0]
    pii_entities = pii_result.entities
    if len(pii_entities) > 0:
        print("\nPII Entities")
        for pii_entity in pii_entities:
            print('\t{} ({})'.format(pii_entity.text, pii_entity.category))
        print("Redacted Text:\n {}".format(pii_result.redacted_text))
    ```

1. Save your changes and re-run the program.

1. Observe the output, noting the PII entities that are identified, and reviewing the redacted version of each document that is produced.

## Clean up

If you've finished exploring Azure Language in Foundry Tools, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs.

1. Open the [Azure portal](https://portal.azure.com) and view the contents of the resource group where you deployed the resources used in this exercise.

1. On the toolbar, select **Delete resource group**.

1. Enter the resource group name and confirm that you want to delete it.

================================================
FILE: Instructions/Exercises/02-language-agent.md
================================================
---
lab:
    title: 'Develop a text analysis agent'
    description: 'Use Azure Language in Foundry Tools to add text analysis capabilities to an AI agent.'
    duration: 30
    level: 300
    islab: true
---

# Develop a text analysis agent

**Azure Language in Foundry Tools** supports analysis of text, including language detection, entity recognition, and PII redaction. You can use the service directly in an application through its REST API and several language-specific SDKs. You can also use the **Azure Language in Foundry Tools MCP server** to integrate its capabilities into an AI agent, which is what you'll do in this exercise.

The code used in this exercise is based on the Microsoft Foundry Tools SDK for Python. You can develop similar solutions using the SDKs for Microsoft .NET, JavaScript, and Java. Refer to [Microsoft Foundry SDK client libraries](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/sdk-overview) for details.

This exercise takes approximately **30** minutes.
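As noted above, you can also call the service directly from code, as you did in the previous exercise. For reference only, a direct SDK call looks something like this minimal sketch (it assumes an existing `TextAnalyticsClient` named `ai_client`); in this exercise you'll reach the same capabilities through the MCP server and an agent instead:

```python
# For comparison only: calling Azure Language directly with the Text Analytics SDK,
# as covered in the previous exercise. Assumes ai_client is an existing TextAnalyticsClient.
result = ai_client.recognize_entities(documents=["Pierre and I went to Paris on July 14th."])[0]
for entity in result.entities:
    print(f"{entity.text} ({entity.category})")
```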
> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.

## Prerequisites

Before starting this exercise, ensure you have:

- An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account)
- [Visual Studio Code](https://code.visualstudio.com/) installed
- [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\*
- [Git](https://git-scm.com/install/) installed and configured
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed

> \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12.

## Create a Microsoft Foundry project

Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution.

1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page.

1. If it is not already enabled, in the toolbar at the top of the page, enable the **New Foundry** option. Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project:
    - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)*
    - **Subscription**: *Your Azure subscription*
    - **Resource group**: *Create or select a resource group*
    - **Region**: Select any available region

    > **TIP**: Remember (or make a note of) the Foundry resource name - you're going to need it later!

1. Select **Create**. Wait for your project to be created.

1. On the home page for your project, note that the API key, project endpoint, and OpenAI endpoint are displayed here.

    > **TIP**: Copy the project key to the clipboard - you're going to need it later!

## Create an agent

Now that you have a Foundry project, you can create an agent.

1. Now you're ready to **Start building**. Select **Create agents** (or on the **Build** page, select the **Agents** tab); and create a new agent named `Text-Analysis-Agent`. When ready, your agent opens in the agent playground.

1. In the model drop-down list, ensure that a **gpt-4.1** model has been deployed and selected for your agent.

1. Assign your agent the following **Instructions**:

    ```
    You are an AI agent that assists users by helping them analyze text.
    ```

1. Use the **Save** button to save the changes.

1. Test the agent by entering the following prompt in the **Chat** pane:

    ```
    What can you help me with?
    ```

    The agent should respond with an appropriate answer based on its instructions.

## Create an Azure Language in Foundry Tools connection

Foundry includes an MCP server for Azure Language in Foundry Tools, which you can connect to your project and use in your agent.

1. In the navigation pane on the left, select the **Tools** page.

1. On the **Tools** tab, connect a tool; selecting **Azure Language in Foundry Tools** in the **Catalog** and connecting it to an endpoint,
specifying the following configuration - **Name**: A unique name for your tool/ - **Remote MCP Server endpoint**: `https://{foundry-resource-name}.cognitiveservices.azure.com/language/mcp?api-version=2025-11-15-preview` - **Parameters**: foundry-resource-name: *Your foundry resource name* - **Authentication**: Key-based: - **Credential**: - `Ocp-Apim-Subscription-Key`: *API Key for your Foundry project* > **Note**: If key-based authentication is disabled by a policy in your Azure subscription, you can use Entra ID authentication to connect the agent to the Azure Language service. 1. Wait for the MCP tool connection to be created, and then view its details page. 1. On the details page for the Azure Language in Foundry Tools connection, select **Use in an agent**, and then select the **Text-Analysis-Agent** agent you created previously. The agent should open in the playground, with the Azure Language in Foundry Tools tool connected. ## Test the Azure Language tool in the playground Now let's test the agent's ability to use the tool you connected. 1. In the agent playground for the **Text-Analysis-Agent** agent, modify the instructions as follows: ``` You are an AI agent that assists users by helping them analyze text. Use the Azure Language tool to perform text analysis tasks. ``` 1. Use the **Save** button to save the changes. 1. Test the agent by entering the following prompt in the **Chat** pane: ``` Identify the PII entities in this article, and generate a redacted version: Microsoft was founded on April 4, 1975, by childhood friends Bill Gates (then 19) and Paul Allen (22) after they were inspired by the Altair 8800, one of the first personal computers, featured on the cover of Popular Electronics. They contacted the Altair’s maker, MITS, and successfully developed a version of the BASIC programming language, despite initially not owning the machine themselves. The pair formed a partnership called “Micro‑Soft” in Albuquerque, New Mexico, close to MITS’s headquarters, with the goal of writing software for emerging microcomputers. In the late 1970s, Microsoft grew by supplying programming languages to multiple hardware vendors, then relocated to the Seattle area in 1979. A pivotal moment came in 1980 when Microsoft partnered with IBM to provide an operating system for the IBM PC, leading to MS‑DOS and establishing the company’s dominance in personal computing. Gates guided the company’s long-term strategy as CEO, while Allen contributed key technical vision in its early years, setting Microsoft on a path that would reshape the software industry. ``` 1. When prompted, approve use of the Azure Language tool by selecting **Always approve all Azure Language in Foundry Tools tools** (you may need to do this twice because the prompt asked for two distinct text analysis tasks). 1. Review the response, which should identify any personally identifiable information in the article about the founding of Microsoft, and create a version of the article with this information redacted. 1. Review the **Logs** for the chat and verify that the Azure Language tool was used by the agent to process the prompt. ## Configure tool approval As you've seen in the playground, to use the tool, the agent needs approval. 1. In the playground, in the list of **Tools** under the **Instructions**, in the menu for the Azure language tool you added, select **Configure**. 1. 
Ensure that the **Approval setting for tools in this MCP server for this agent** setting is **Always auto-approve all tools** (if not, change it and save the setting).

1. Save any changes to the agent.

## Create a client application

Now that you have a working agent, you can create a client application that uses it.

### Get the application files from GitHub

1. Open Visual Studio Code.

1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors.

1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/02-language-agent/Python/text-agent**.

    The application files include:
    - **.env** (the application configuration file)
    - **requirements.txt** (the Python package dependencies that need to be installed)
    - **text-agent.py** (the code file for the application)

### Configure the application

1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension.

1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.13.x installation.

    > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/02-language-agent/Python/text-agent* folder; but it's OK if you don't - we'll install them later!

    > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`.

1. In the **Explorer** pane, right-click the **text-agent** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/02-language-agent/Python/text-agent* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **text-agent** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.

1. Install the Foundry SDK package, the Azure Identity package, and other required packages by running the following command:

    ```
    pip install -r requirements.txt
    ```

1. In the **Explorer** pane, in the **text-agent** folder, select the **.env** file to open it. Then update the configuration values to include your project **endpoint** (from the project home page in Foundry Portal) and the name of your agent (which should be **Text-Analysis-Agent** - note that this name is case-sensitive).

1. Save the modified configuration file.

### Implement application code

1. In the **Explorer** pane, in the **text-agent** folder, open the **text-agent.py** file.

1. Review the existing code. You will add code to submit prompts to your agent.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need:

    ```python
    # import namespaces
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    ```

1.
In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Get project client**, and add the following code to create a client for your Foundry project: ```python # Get project client project_client = AIProjectClient( endpoint=foundry_endpoint, credential=DefaultAzureCredential(), ) ``` 1. Find the comment **Get an OpenAI client**, and add the following code to get an OpenAI client with which to call your agent. ```python # Get an OpenAI client openai_client = project_client.get_openai_client() ``` 1. Find the comment **Use the agent to get a response**, and add the following code to submit a user prompt to your agent, and display the response. ```python # Use the agent to get a response prompt = input("User prompt: ") response = openai_client.responses.create( input=[{"role": "user", "content": prompt}], extra_body={"agent_reference": {"name": agent_name, "type": "agent_reference"}}, ) print(f"{agent_name}: {response.output_text}") ``` 1. Save the changes you made to the code file. ## Test the client application Now let's test the application by running it in a Python environment and authenticating the connection to your project. 1. In the Visual Studio Code terminal, enter the following command to sign into Azure ```powershell az login ``` > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details. 1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ```powershell python text-agent.py ``` 1. When prompted, enter the following prompt: ``` Extract named entities from the following text: "Pierre and I went to Paris on July 14th." ``` 1. Review the response, which should identify named people, places, and dates. ## View tool details The Azure Language in Foundry Tools tool provides a wide range of functionality, and the agent must select the appropriate function to call. We can see the options available in the agent's response. 1. In the **text-agent.py** code file, add the following line immediately after the *print(f"{agent_name}: {response.output_text}")* line you added previously (before the *except Exception as ex:* line): ```python print(f"\nResponse Details: {response.model_dump_json(indent=2)}") ``` 1. Save the changes to the code file. 1. In the terminal, re-enter the command to run the application (`python text-agent.py`). 1. When prompted, enter the following command: ``` Tell me what entities and dates are mentioned in this review, and whether it is positive or negative: "I booked my flight to Paris in July with Margie's Travel, and it was fantastic!" ``` 1. Review the response (you may need to scroll quite far up to see it), which should identify entities and dates, and determine the sentiment of the text. 1. Review the JSON response details, which indicate each of the tools available to the agent. In this case, it should have used the **extract_named_entities_from_text** and **detect_sentiment_from_text** tools within Azure Language in Foundry Tools. 
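If you'd rather not scroll through the full JSON dump, a more compact (purely illustrative) option is to print just the type of each item in the response output. The exact item types emitted when MCP tools are invoked may differ from what this sketch assumes:

```python
# Hypothetical sketch: list the type of each output item returned by the agent,
# which gives a quick view of any tool activity without printing the full JSON.
for item in response.output:
    print(f"- output item type: {getattr(item, 'type', 'unknown')}")
```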
## Clean up resources

If you're finished exploring the Azure Language service, you can delete the resources you created in this exercise. Here's how:

1. In the Azure portal, browse to the Foundry resource you created in this lab.

1. On the resource page, select **Delete** and follow the instructions to delete the resource.

================================================
FILE: Instructions/Exercises/03-gen-ai-speech.md
================================================
---
lab:
    title: 'Use speech-capable generative AI models'
    description: Implement speech functionality using generative AI.
    duration: 30
    level: 300
    islab: true
---

# Use speech-capable generative AI models

Increasingly, generative AI model capabilities are evolving beyond text-based language completion to support content in other formats - including audible speech. In this exercise, you'll use generative AI models to support two common scenarios:

- Speech synthesis (text-to-speech) - generating speech output.
- Speech recognition (speech-to-text) - transcribing speech input.

While this exercise is based on Python, you can develop generative AI speech applications using multiple language-specific SDKs, including:

- [OpenAI SDK for Python](https://pypi.org/project/openai/)
- OpenAI SDK for .NET
- [OpenAI SDK for JavaScript](https://www.npmjs.com/package/openai)

This exercise takes approximately **30** minutes.

> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.

## Prerequisites

Before starting this exercise, ensure you have:

- An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account)
- [Visual Studio Code](https://code.visualstudio.com/) installed
- [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\*
- [Git](https://git-scm.com/install/) installed and configured
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed

> \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12.

## Create a Microsoft Foundry project

Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution.

1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page.

1. If it is not already enabled, in the toolbar at the top of the page, enable the **New Foundry** option. Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project:
    - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)*
    - **Subscription**: *Your Azure subscription*
    - **Resource group**: *Create or select a resource group*
    - **Region**: Select *East US 2* (For this exercise, some models are only available in this location.)

1. Select **Create**. Wait for your project to be created. Then view its home page.

## Deploy models

To develop speech-enabled apps, we're going to need speech-enabled models. Specifically, we need a model that can perform speech generation, and a model that can process speech input.
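The two deployments map onto two calls you'll make later in this exercise with the OpenAI SDK - one against the *gpt-4o-mini-tts* deployment to produce audio, and one against the *gpt-4o-mini-transcribe* deployment to produce text. In outline (a sketch only, assuming an `AzureOpenAI` client like the one you'll create shortly and deployment names that match the model names):

```python
# Preview of how the two deployments are used later in this exercise (sketch only).
# Speech generation: text in, audio file out.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts", voice="alloy", input="Hello!"
) as response:
    response.stream_to_file("hello.mp3")

# Speech recognition: audio file in, text out.
with open("hello.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe", file=audio_file, response_format="text"
    )
print(transcript)
```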
### Deploy a speech-generation model

1. Now you're ready to **Start building**. Select **Find models** (or on the **Discover** page, select the **Models** tab) to view the Microsoft Foundry model catalog.

1. In the model catalog, search for `gpt-4o-mini-tts`.

1. Review the model card, and then deploy it using the default settings.

1. When the model has been deployed, view its details, noting that the **Target URI** and **Key** required to use it are available here (you'll need the Target URI later).

### Deploy a speech-recognition model

1. In the Foundry portal menu bar, select **Build**; and then view the **Models** page. Note that the *gpt-4o-mini-tts* model you deployed is listed.

1. Select **Deploy a base model**, and search the catalog for `gpt-4o-mini-transcribe`.

1. Deploy a *gpt-4o-mini-transcribe* model using the default settings.

1. Return to the **Models** page and verify that both of the models you deployed are listed.

1. Select either of the models to view the Target URI you need to use in your code.

## Get the application files from GitHub

The initial application files you'll need to develop speech applications are provided in a GitHub repo.

1. Open Visual Studio Code.

1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors.

1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension.

1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.13.x installation.

    > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/03-gen-ai-speech/Python/generate-speech* folder; but it's OK if you don't - we'll install them later!

    > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`.

## Create a speech-generation app

1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/03-gen-ai-speech/Python/generate-speech**.

    The application files include:
    - **.env** (the application configuration file)
    - **requirements.txt** (the Python package dependencies that need to be installed)
    - **generate-speech.py** (the code file for the application)

### Configure your application

1. In the **Explorer** pane, right-click the **generate-speech** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/03-gen-ai-speech/Python/generate-speech* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **generate-speech** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.

1. Install the OpenAI SDK package and other required packages by running the following command:

    ```
    pip install -r requirements.txt
    ```

1. In the **Explorer** pane, in the **generate-speech** folder, select the **.env** file to open it.
Then update the configuration values to include the **Target URI** (endpoint) for your **gpt-4o-mini-tts** model. > **Tip**: Copy the Target URI from the model details page in the Foundry portal. Save the modified configuration file. ### Write code to use the model for speech-generation 1. In the **Explorer** pane, in the **generate-speech** folder, select the **generate-speech.py** file to open it. 1. Review the existing code. You will add code to use the OpenAI SDK to access your model. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespace you will need to use the OpenAI SDK: ```python # import namespaces from openai import AzureOpenAI from azure.identity import DefaultAzureCredential, get_bearer_token_provider ``` 1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Create the Azure OpenAI client**, and add the following code to create a client for the OpenAI API: ```Python # Create the Azure OpenAI client token_provider = get_bearer_token_provider( DefaultAzureCredential(), "https://ai.azure.com/.default" ) client = AzureOpenAI( azure_endpoint=endpoint, azure_ad_token_provider = token_provider, api_version="2025-03-01-preview" ) ``` 1. Find the comment **Generate speech and save to file**, and add the following code to submit a prompt to the speech-generation model save the response as a file. ```Python # Generate speech and save to file with client.audio.speech.with_streaming_response.create( model=model_deployment, voice="alloy", input="My voice is my passport!", instructions="Speak in a serious tone.", ) as response: response.stream_to_file(speech_file_path) ``` 1. Save the changes to the code file. ### Run the application 1. In the terminal pane, use the following command to sign into Azure. ```powershell az login ``` > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details. 1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ``` python generate-speech.py ``` 1. Observe the output as the code generates the requested speech and saves it in a file. The code should also play the generated audio file. ## Create a speech-transcription app 1. In the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/03-gen-ai-speech/Python/transcribe-speech**. The application files include: - **.env** (the application configuration file) - **requirements.txt** (the Python package dependencies that need to be installed) - **transcribe-speech.py** (the code file for the application) ### Configure your application 1. 
In the **Explorer** pane, right-click the **transcribe-speech** folder containing the application files, and select **Open in integrated terminal** (or in the existing terminal, navigate to the */Labfiles/03-gen-ai-speech/Python/transcribe-speech* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **transcribe-speech** folder with the prefix **(.venv)** to indicate that the Python environment you created previously is active.

1. Install the OpenAI SDK package and other required packages by running the following command:

    ```
    pip install -r requirements.txt
    ```

    > **Note**: This step isn't actually necessary if you completed the previous part of this exercise, as both apps use the same environment and have the same dependencies - but it won't do any harm!

1. In the **Explorer** pane, in the **transcribe-speech** folder, select the **.env** file to open it. Then update the configuration values to include the **Target URI** (endpoint) for your **gpt-4o-mini-transcribe** model.

    > **Tip**: Copy the Target URI from the model details page in the Foundry portal.

    Save the modified configuration file.

### Write code to use the model for speech-transcription

1. In the **Explorer** pane, in the **transcribe-speech** folder, select the **transcribe-speech.py** file to open it.

1. Review the existing code. You will add code to use the OpenAI SDK to access your model.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the OpenAI SDK:

    ```python
    # import namespaces
    from openai import AzureOpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    ```

1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Create the Azure OpenAI client**, and add the following code to create a client for the OpenAI API:

    ```python
    # Create the Azure OpenAI client
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://ai.azure.com/.default"
    )
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version="2025-03-01-preview"
    )
    ```

1. Find the comment **Call model to transcribe audio file**, and add the following code to submit an audio file to the speech-transcription model and generate a transcript:

    ```python
    # Call model to transcribe audio file
    audio_file = open(file_path, "rb")
    transcription = client.audio.transcriptions.create(
        model=model_deployment,
        file=audio_file,
        response_format="text"
    )
    print(transcription)
    ```

1. Save the changes to the code file.

### Run the application

1. In the terminal pane, use the following command to sign into Azure.

    ```powershell
    az login
    ```

    > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details.

1. When prompted, follow the instructions to sign into Azure.
Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ``` python transcribe-speech.py ``` 1. Observe the output as the code submits the audio file to the model for transcription and displays the results. The code should also play the audio file. ## Clean up If you've finished exploring speech-enabled models in Foundry Tools, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs. 1. Open the [Azure portal](https://portal.azure.com) and view the contents of the resource group where you deployed the resources used in this exercise. 1. On the toolbar, select **Delete resource group**. 1. Enter the resource group name and confirm that you want to delete it. ================================================ FILE: Instructions/Exercises/04-azure-speech.md ================================================ --- lab: title: 'Recognize and synthesize speech' description: Implement speech functionality using Azure Speech in Foundry Tools. duration: 30 level: 300 islab: true --- # Recognize and synthesize speech **Azure Speech in Foundry Tools** is a service that provides speech-related functionality, including: - A *speech-to-text* API that enables you to implement speech recognition (converting audible spoken words into text). - A *text-to-speech* API that enables you to implement speech synthesis (converting text into audible speech). In this exercise, you'll use both of these APIs to implement a voice message assistant. While this exercise is based on Python, you can develop speech applications using multiple language-specific SDKs; including: - [Azure Speech SDK for Python](https://pypi.org/project/azure-cognitiveservices-speech/) - [Azure Speech SDK for .NET](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) - [Azure Speech SDK for JavaScript](https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk) This exercise takes approximately **30** minutes. > **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors. ## Prerequisites Before starting this exercise, ensure you have: - An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account) - [Visual Studio Code](https://code.visualstudio.com/) installed - [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\* - [Git](https://git-scm.com/install/) installed and configured - [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed > \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12. ## Create a Microsoft Foundry project Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution. 1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page. 1. If it is not already enabled, in the tool bar the top of the page, enable the **New Foundry** option. 
Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project: - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)*\* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Select any available region > **TIP**: \* Remember the Foundry resource name - you'll need it later! 1. Wait for your project to be created. Then view the home page for your project. ## Get the application files from GitHub The initial application files you'll need to develop the voice application are provided in a GitHub repo. 1. Open Visual Studio Code. 1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors. 1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/04-azure-speech/Python/voice-mail**. The application files include: - **messages** (a subfolder containing audio recordings of messages) - **.env** (the application configuration file) - **requirements.txt** (the Python package dependencies that need to be installed) - **voice-mail.py** (the code file for the application) ## Configure your application 1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension. 1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.1x installation. > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/04-azure-speech/Python/voice-mail* folder; but it's OK if you don't - we'll install them later! > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`. 1. In the **Explorer** pane, right-click the **voice-mail** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/04-azure-speech/Python/voice-mail* folder.) > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system. 1. Ensure that the terminal is open in the **voice-mail** folder with the prefix **(.venv)** to indicate that the Python environment you created is active. 1. Install the Azure AI Speech SDK package and other required packages by running the following command: ``` pip install -r requirements.txt ``` 1. In the **Explorer** pane, in the **voice-mail** folder, select the **.env** file to open it. Then update the configuration values to reflect the Cognitive Services **endpoint** for your Foundry resource. > **Important**: The endpoint should be *https://{YOUR_FOUNDRY_RESOURCE}.cognitiveservices.azure.com/*. The Foundry Resource name usually takes the form *{project_name}-resource*. Save the modified configuration file. ## Add code to synthesize speech 1. In the **Explorer** pane, in the **voice-mail** folder, open the **voice-mail.py** file. 1. Review the existing code. 
You will add code to work with the Azure Speech SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Speech SDK: ```python # import namespaces from azure.identity import DefaultAzureCredential import azure.cognitiveservices.speech as speech_sdk ``` 1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Create speech_config using Entra ID authentication**, and add the following code to create a Speech Configuration object: ```Python # Create speech_config using Entra ID authentication credential = DefaultAzureCredential() speech_config = speech_sdk.SpeechConfig( token_credential=credential, endpoint=foundry_endpoint) ``` 1. Review the rest of the **main** function, and note that a loop has been implemented that enables the user to choose one of three options: 1. Record a voice greeting 1. Transcribe messages 1. Exit the application 1. Find the **record_greeting** function, which you will implement to record a voice greeting as an audio file. 1. In the **record_greeting** function, find the comment **Synthesize the greeting message to an audio file**, and add the following code to synthesize speech from the text entered by the user and save it as an audio file. ```python output_file = "greeting.wav" audio_config = speech_sdk.audio.AudioOutputConfig(filename=output_file) speech_config.speech_synthesis_voice_name = "en-US-Serena:DragonHDLatestNeural" speech_synthesizer = speech_sdk.SpeechSynthesizer( speech_config=speech_config, audio_config=audio_config ) result = speech_synthesizer.speak_text_async(greeting_message).get() if result.reason == speech_sdk.ResultReason.SynthesizingAudioCompleted: print(f"Greeting recorded and saved to {output_file}") speech_synthesizer = None # Release the synthesizer resources else: print("Error recording greeting: {}".format(result.reason)) ``` 1. Save the changes to the code file. Then, in the terminal pane, use the following command to sign into Azure. ```powershell az login ``` > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details. 1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ```powershell python voice-mail.py ``` 1. When prompted, enter **1** to record a greeting. 1. Enter a greeting, like `Hi. The person you called is not available right now. Leave a message.` 1. Wait while the speech is synthesized and saved as an audio file. You can select the *greeting.wav* file that is generated in the voice-mail folder to play it in Visual Studio Code. ## Add code to recognize speech 1. In the **voice-mail.py** code file, find the **transcribe_messages** function; which you will implement to transcribe each of the voice messages in the **messages** subfolder. 
The function already contains code to loop through the files in the **messages** folder.

1. In the **transcribe_messages** function, find the comment **Transcribe the audio file**, and add the following code to transcribe the audio.

    ```python
    # Transcribe the audio file
    audio_config = speech_sdk.audio.AudioConfig(filename=file_path)
    speech_recognizer = speech_sdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config
    )
    result = speech_recognizer.recognize_once_async().get()
    if result.reason == speech_sdk.ResultReason.RecognizedSpeech:
        print(f"Transcription: {result.text}")
    else:
        print("Error transcribing message: {}".format(result.reason))
    ```

1. Save the changes to the code file. Then, in the terminal, enter the following command to run the application:

    ```powershell
    python voice-mail.py
    ```

1. When prompted, enter **2** to transcribe messages.

1. View the transcription for each message. Each file is played back automatically, so you can hear the message.

## Clean up

If you've finished exploring Azure Speech in Foundry Tools, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs.

1. Open the [Azure portal](https://portal.azure.com) and view the contents of the resource group where you deployed the resources used in this exercise.

1. On the toolbar, select **Delete resource group**.

1. Enter the resource group name and confirm that you want to delete it.

## More information

For more information about using the **Speech-to-text** and **Text-to-speech** APIs, see the [Speech-to-text documentation](https://learn.microsoft.com/azure/ai-services/speech-service/index-speech-to-text) and [Text-to-speech documentation](https://learn.microsoft.com/azure/ai-services/speech-service/index-text-to-speech).

================================================
FILE: Instructions/Exercises/05-azure-speech-mcp.md
================================================
---
lab:
    title: 'Use Azure Speech in an agent'
    description: Use the Azure Speech in Foundry Tools MCP server to add speech capabilities to an agent.
    duration: 30
    level: 300
    islab: true
---

# Use Azure Speech in an agent

> WARNING: You may experience a blocking error in this lab. The issue is under investigation. We apologize for the inconvenience.

**Azure Speech in Foundry Tools** provides an MCP server that you can use to enable an agent to call its speech recognition and synthesis capabilities. In this exercise, you'll configure the Azure Speech in Foundry Tools MCP server, and connect it to an agent.

The code used in this exercise is based on the Foundry Tools SDK for Python. You can develop similar solutions using the SDKs for Microsoft .NET, JavaScript, and Java. Refer to [Microsoft Foundry SDK client libraries](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/sdk-overview) for details.

This exercise takes approximately **30** minutes.

> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.
## Prerequisites Before starting this exercise, ensure you have: - An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account) - [Visual Studio Code](https://code.visualstudio.com/) installed - [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\* - [Git](https://git-scm.com/install/) installed and configured - [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed > \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12. ## Create an Azure storage account The Azure Speech MCP server uses an Azure storage account to save generated audio files. 1. Open the [Azure portal](https://portal.azure.com) at `https://portal.azure.com`, and sign in using your Azure credentials. 1. Create a new **Azure storage account** resource with the following settings: - **Subscription**: *Your subscription* - **Resource group**: *Create or select a resource group* - **Storage account name**: *A unique name for your storage account* - **Region**: *Any available region* - **Preferred storage type**: Azure blob storage or Azure Data Lake Storage Gen2 - **Primary workload**: Cloud native - **Performance**: Standard - **Redundancy**: Locally-redundant storage (LRS) 1. When the Azure storage account resource has been created, go to it in the portal. 1. In the left navigation pane for the storage account, expand **Data storage**, and select **Containers**. 1. Add a new container named **files**. This is where your agent will save the audio files it generates. 1. In the context menu (**...**) for the **files** container, select **Generate SAS**, and create a SAS token with the following details: - **Signing method** : Account key - **Signing key**: Key1 - **Stored access policy**: None - **Permissions**: - Read - Add - Create - Write - List - **Start and expiry date/time**: - **Start**: The current date and time - **Expiry**: 11:59pm tomorrow - **Allowed IP addresses**: *Leave blank* - **Allowed protocols**: HTTPS only > **IMPORTANT**: Copy the generated SAS token and URL, and store them in a text file for now - you'll need them later! ## Create a Microsoft Foundry project Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution. 1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page. 1. If it is not already enabled, in the tool bar the top of the page, enable the **New Foundry** option. Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project: - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Select any available region > **TIP**: Remember (or make a note of) the Foundry resource name - you're going to need it later! 1. Wait for your project to be created. 1. On the home page for your project, note that the API key, project endpoint, and OpenAI endpoint are displayed here. 
> **TIP**: Copy the project key to the clipboard - you're going to need it later!

## Create an agent

Now that you have a Foundry project, you can create an agent.

1. Now you're ready to **Start building**. Select **Create agents** (or on the **Build** page, select the **Agents** tab); and create a new agent named `speech-agent`. When ready, your agent opens in the agent playground.
1. In the model drop-down list, ensure that a **gpt-4.1** model has been deployed and selected for your agent.
1. Assign your agent the following **Instructions**:

    ```
    You are an AI agent that uses the Azure AI Speech tool to transcribe and generate speech.
    ```

1. Use the **Save** button to save the changes.
1. Test the agent by entering the following prompt in the **Chat** pane:

    ```
    What can you help me with?
    ```

    The agent should respond with an appropriate answer based on its instructions.

## Create an Azure Speech in Foundry Tools connection

Foundry includes an MCP server for Azure Speech in Foundry Tools, which you can connect to your project and use in your agent.

1. In the navigation pane on the left, select the **Tools** page.
1. On the **Tools** tab, connect a tool; selecting **Azure Speech MCP Server** in the **Catalog** and connecting it to an endpoint, specifying the following configuration:
    - **Name**: *A unique name for your tool.*
    - **Remote MCP Server endpoint**: `https://{foundry-resource-name}.cognitiveservices.azure.com/speech/mcp?api-version=2025-11-15-preview`
    - **Parameters**: foundry-resource-name: *Your foundry resource name*
    - **Authentication**: Key-based:
        - **Credential**:
            - `Ocp-Apim-Subscription-Key`: *API Key for your Foundry project*
        - **Add key value pair**:
            - `X-Blob-Container-Url`: *The SAS URL for your storage container*

    > **Note**: If key-based authentication is disabled by a policy in your Azure subscription, you can use Entra ID authentication to connect the agent to the Azure Speech service.

1. Wait for the MCP tool connection to be created, and then view its details page.
1. On the details page for the Azure Speech in Foundry Tools connection, select **Use in an agent**, and then select the **Speech-Agent** agent you created previously. The agent should open in the playground, with the Azure Speech in Foundry Tools tool connected.

## Test the Azure Speech tool in the playground

Now let's test the agent's ability to use the tool you connected.

1. In the agent playground for the **speech-agent** agent, enter the following prompt:

    ```
    Generate "To be or not to be, that is the question." as speech
    ```

1. When prompted, approve use of the Azure Speech tool by selecting **Always approve all Azure Speech MCP Server tools**.

    > NOTE: You may encounter the error ***HTTP 404 (not found)***. This issue is currently under investigation. If this occurs, the rest of the lab exercise will not work. We apologize for the inconvenience.

1. Review the response, which should include a link to the generated audio file. Then click the link to hear the synthesized speech.
1. Enter the following prompt:

    ```
    Transcribe the file at https://microsoftlearning.github.io/mslearn-ai-language/Labfiles/05-speech-tool/speech_1.wav
    ```

1. If prompted, approve use of the Azure Speech tool by selecting **Always approve all Azure Speech MCP Server tools**.
1. Review the output, which should be a transcription of the audio file.

## Create a client application

Now that you have a working agent, you can create a client application that uses it.

### Get the application files from GitHub

1. Open Visual Studio Code.
1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors.
1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/05-speech-tool/Python/speech-client**.

    The application files include:
    - **.env** (the application configuration file)
    - **requirements.txt** (the Python package dependencies that need to be installed)
    - **speech-client.py** (the code file for the application)

### Configure the application

1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension.
1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.13.x installation.

    > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/05-speech-tool/Python/speech-client* folder; but it's OK if you don't - we'll install them later!

    > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`.

1. In the **Explorer** pane, right-click the **speech-client** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/05-speech-tool/Python/speech-client* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **speech-client** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.
1. Install the Foundry SDK package, the Azure Identity package, and other required packages by running the following command:

    ```
    pip install -r requirements.txt
    ```

1. In the **Explorer** pane, in the **speech-client** folder, select the **.env** file to open it. Then update the configuration values to include your project **endpoint** (from the project home page in Foundry Portal) and the name of your agent (which should be **Speech-Agent** - note that this name is case-sensitive).
1. Save the modified configuration file.

### Implement application code

1. In the **Explorer** pane, in the **speech-client** folder, open the **speech-client.py** file.
1. Review the existing code. You will add code to submit prompts to your agent.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need:

    ```python
    # import namespaces
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    ```

1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided. Then find the comment **Get project client**, and add the following code to create a client for your Foundry project:

    ```python
    # Get project client
    project_client = AIProjectClient(
        endpoint=foundry_endpoint,
        credential=DefaultAzureCredential(),
    )
    ```
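    The client authenticates with your Azure sign-in via **DefaultAzureCredential**. If you'd like to confirm that a token can be acquired (for example, from the *az login* session you'll create shortly) before running the full application, you can use an optional check like the following sketch. This snippet isn't part of the lab code, and the token scope shown is an assumption based on the scope commonly used for Foundry project endpoints.

    ```python
    # Optional check (not part of the lab code): verify that a token can be acquired.
    # The scope below is an assumption - the scope typically used for
    # Microsoft Foundry (Azure AI) project endpoints.
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    token = credential.get_token("https://ai.azure.com/.default")
    print("Token acquired; expires at (POSIX time):", token.expires_on)
    ```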
1. Find the comment **Get an OpenAI client**, and add the following code to get an OpenAI client with which to call your agent.

    ```python
    # Get an OpenAI client
    openai_client = project_client.get_openai_client()
    ```

1. Find the comment **Use the agent to get a response**, and add the following code to submit a user prompt to your agent, and display the response.

    ```python
    # Use the agent to get a response
    response = openai_client.responses.create(
        input=[{"role": "user", "content": prompt}],
        extra_body={"agent_reference": {"name": agent_name, "type": "agent_reference"}},
    )
    print(f"{agent_name}: {response.output_text}")
    ```

1. Save the changes you made to the code file.

## Test the client application

Now let's test the application by running it in a Python environment and authenticating the connection to your project.

1. In the terminal pane, use the following command to sign into Azure.

    ```powershell
    az login
    ```

    > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details.

1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource.
1. After you have signed in, enter the following command to run the application:

    ```powershell
    python speech-client.py
    ```

1. When prompted, enter the following prompt:

    ```
    Synthesize "Better a witty fool, than a foolish wit!" as speech using the voice "en-GB-SoniaNeural".
    ```

1. Review the response, which should include a clickable link to a generated audio file.
1. After checking out the generated audio file, enter the following prompt:

    ```
    Transcribe https://microsoftlearning.github.io/mslearn-ai-language/Labfiles/05-speech-tool/speech_2.wav
    ```

1. Review the response.
1. To exit the program, enter "quit" (or just press return).

## Clean up resources

If you're finished exploring Azure Speech in Foundry Tools, you can delete the resources you created in this exercise. Here's how:

1. In the Azure portal, browse to the Foundry resource you created in this lab.
1. On the resource page, select **Delete** and follow the instructions to delete the resource.


================================================
FILE: Instructions/Exercises/06-voice-live-agent.md
================================================

---
lab:
    title: 'Develop a Voice Live agent'
    description: 'Use Azure Speech Voice Live in Microsoft Foundry Tools to create a conversational agent.'
    level: 300
    duration: 30
    islab: true
---

# Develop a Voice Live agent

Speech-capable AI agents enable users to interact conversationally - using spoken commands and questions that generate vocal responses. In this exercise, you'll use the Voice Live capability of Azure Speech in Microsoft Foundry Tools to create a real-time voice-based agent.

This exercise takes approximately **30** minutes.

> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.
## Prerequisites Before starting this exercise, ensure you have: - An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account) - [Visual Studio Code](https://code.visualstudio.com/) installed - [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\* - [Git](https://git-scm.com/install/) installed and configured - [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed > \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12. ## Create a Microsoft Foundry project Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution. 1. In a web browser, open [Microsoft Foundry](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page. 1. If it is not already enabled, in the tool bar the top of the page, enable the **New Foundry** option. Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project: - **Foundry resource**: *Enter a valid name for your AI Foundry resource.* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Select any available region 1. Select **Create**. Wait for your project to be created. Then view its home page. ## Create an agent Now let's create an agent. 1. Now you're ready to **Start building**. Select **Create agents** (or on the **Build** page, select the **Agents** tab); and create a new agent named `chat-agent`. When ready, your agent opens in the agent playground. 1. In the model drop-down list, ensure that a **gpt-4.1** model has been deployed and selected for your agent. 1. Assign your agent the following **Instructions**: ``` You are an AI assistant that helps people find information about AI and related topics. You answer questions concisely and precisely. ``` 1. Use the **Save** button to save the changes. 1. Test the agent by entering the following prompt in the **Chat** pane: ``` What can you help me with? ``` The agent should respond with an appropriate answer based on its instructions. ## Configure Azure Speech Voice Live Enabling speech mode for a Foundry agent integrates Azure Speech Voice Live - adding speech capabilities to the agent. 1. In the pane on the left, under the model selection list, enable **Voice mode**. If the **Configuration** pane does not open automatically, use the "cog" icon above the chat interface to open it. 1. In the **Configuration** pane, under **Voice Live**, review the default speech input and output configuration. You can try different voices, previewing them until you decide which one to use. 1. Close the **Configuration** pane and use the **Save** button to save the agent. ## Use speech to interact with the agent Now you're ready to chat with the agent. 1. In the Chat pane, use the **Start session** button to start a conversation with the agent. If prompted, allow access to the system microphone. The agent will start a speech session, and listen for your prompt. 1. When the app status is **Listening…**, say something like "*How does speech recognition work?*" and wait for a response. 1. 
Verify that the app status changes to **Processing…**. The app will process the spoken input.

    > **Tip**: The processing speed may be so fast that you do not actually see the status before it changes back to *Speaking*.

1. When the status changes to **Speaking…**, the app uses text-to-speech to vocalize the response from the model. To see the original prompt and the response as text, select the **cc** button on the bottom of the chat screen.

    > **Tip**: The follow-on prompt is submitted just by speaking. You can even interrupt the agent to keep the interaction focused on what you need done. You can also use the **Stop generation** button in the chat pane to stop long-running responses; note that this button ends the conversation, so you will need to start a new conversation to continue using the agent.

1. To continue the conversation, just ask another question, such as "*How does speech synthesis work?*", and review the response.
1. When you have finished chatting with the agent, use the **X** icon to end the session. A transcript of the conversation will be displayed.

## Create a client application

To use your agent in a custom application, you need to write code that uses the Azure Speech Voice Live SDK to initiate and manage a conversation session.

### Get the application files from GitHub

1. Open Visual Studio Code.
1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors.
1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/06-voice-live/Python/chat-client**.

    The application files include:
    - **.env** (the application configuration file)
    - **requirements.txt** (the Python package dependencies that need to be installed)
    - **chat-client.py** (the code file for the application)

### Configure the application

1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension.
1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.13.xx installation.

    > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/06-voice-live/Python/chat-client* folder; but it's OK if you don't - we'll install them later!

    > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`.

1. In the **Explorer** pane, right-click the **chat-client** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/06-voice-live/Python/chat-client* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **chat-client** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.
1. Install the Foundry SDK package, the Azure Identity package, and other required packages by running the following command:

    ```
    pip install -r requirements.txt azure-identity azure-ai-voicelive==1.2.0b4 --pre azure-ai-projects==2.0.0b4
    ```

1. In the **Explorer** pane, in the **chat-client** folder, select the **.env** file to open it. Then update the configuration values to include your Foundry resource **endpoint** (get the project endpoint from the project home page in Foundry Portal, but use only the base URL up to the *.com* domain), your project name, and the name of your agent (which should be **Chat-Agent** - note that this name is case-sensitive).

    > **Important**: Modify the pasted endpoint to remove the "/api/projects/{project_name}" suffix - the endpoint should be *https://{your-foundry-resource-name}.services.ai.azure.com*.

1. Save the modified configuration file.

### Implement application code

1. In the **Explorer** pane, in the **chat-client** folder, open the **chat-client.py** file.
1. Review the existing code. Most of the application scaffolding has been provided - you must implement the key steps required to use the Voice Live SDK to manage a conversation with your agent.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need:

    ```python
    # import namespaces
    from azure.identity.aio import AzureCliCredential
    from azure.ai.voicelive.aio import connect
    from azure.ai.voicelive.models import (
        InputAudioFormat, Modality, OutputAudioFormat, RequestSession,
        ServerEventType, AudioNoiseReduction, AudioEchoCancellation,
        AzureSemanticVadMultilingual
    )
    ```

1. In the **main** function, note that code to load the endpoint from the configuration file has already been provided, as has code to get an authentication credential and to create and run a **VoiceAssistant** object. The **VoiceAssistant** class encapsulates the logic to manage the Voice Live conversation.
1. Under the **main** function, find the **VoiceAssistant** class definition. The `__init__` function to initialize an object based on the class has already been implemented. You must implement the **start** function, which is the core function to establish the conversation session.
1. Find the comment **STEP 1: Connect Azure VoiceLive to the agent**, and add the following code (being careful to indent it one level in under the **try:** statement):

    ```python
    # STEP 1: Connect Azure VoiceLive to the agent
    async with connect(
        endpoint=self.endpoint,
        credential=self.credential,
        api_version="2026-01-01-preview",
        agent_config=self.agent_config
    ) as connection:
        self.connection = connection
    ```

    This step creates a connection to your agent so the Voice Live SDK can establish a conversation with it.

1. Find the comment **STEP 2: Initialize audio processor**, and add the following code (being careful to indent it *another level in* under the step 1 code you just added):

    ```python
    # STEP 2: Initialize audio processor
    self.audio_processor = AudioProcessor(connection)
    ```

    This code attaches an AudioProcessor object based on the class definition further down in the code file. The AudioProcessor is a utility class to manage audio hardware I/O.
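    You don't need to write the AudioProcessor class yourself - it's already provided further down in *chat-client.py*. Purely as an illustration of the interface that the **start** function relies on, a hypothetical outline might look like the sketch below; the constructor argument and method names are taken from the code used in this exercise, and the method bodies are omitted.

    ```python
    # Illustrative outline only - the real AudioProcessor is provided in chat-client.py.
    class AudioProcessorOutline:
        def __init__(self, connection):
            # The active Voice Live connection used to stream audio to and from the agent
            self.connection = connection

        def start_playback(self):
            # Start playing audio responses received from the agent (implementation omitted)
            ...

        def shutdown(self):
            # Release the microphone and speaker when the conversation ends (implementation omitted)
            ...
    ```

1. 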
Find the comment **STEP 3: Configure the session**, and add the following code (being careful to maintain the same indentation as the step 2 code above): ```python # STEP 3: Configure the session await self.setup_session() ``` This code configures the session with the appropriate audio formats, conversational turn-detection semantics, and options to handle echos and background noise. 1. Find the comment **STEP 4: Start audio systems**, and add the following code (being careful to maintain the same indentation as the step 3 code above): ```python # STEP 4: Start audio systems self.audio_processor.start_playback() print("\n✅ Ready! Start speaking...") print("Press Ctrl+C to exit\n") ``` This code starts the audio processor so that it monitors the microphone for audio input and plays back audio output. 1. Find the comment **STEP 5: Process events**, and add the following code (being careful to maintain the same indentation as the step 4 code above): ```python # STEP 5: Process events await self.process_events() ``` This code runs the main loop to process events such as speech input, response output, and interruptions. 1. Save the changes to the code file. The completed function should look like this: ```python async def start(self): """Start the voice assistant.""" print("\n" + "=" *60) print(f"🎙️ {self.agent_config['agent_name']}") print("="* 60) # Add your code in this try block! try: # STEP 1: Connect Azure VoiceLive to the agent async with connect( endpoint=self.endpoint, credential=self.credential, api_version="2026-01-01-preview", agent_config=self.agent_config ) as connection: self.connection = connection # STEP 2: Initialize audio processor self.audio_processor = AudioProcessor(connection) # STEP 3: Configure the session await self.setup_session() # STEP 4: Start audio systems self.audio_processor.start_playback() print("\n✅ Ready! Start speaking...") print("Press Ctrl+C to exit\n") # STEP 5: Process events await self.process_events() finally: if hasattr(self, 'audio_processor'): self.audio_processor.shutdown() ``` ## Run the application Now you're ready to run your application, and have a conversation with your agent. > **TIP**: The application works best when using a headset. When using speakers, there's a risk that the agent can "hear" its own responses and process them as new user input. 1. In the Visual Studio Code terminal, enter the following command to sign into Azure ```powershell az login ``` When prompted, sign into Azure using your credentials. > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See *[Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)* for details. 1. In the Visual Studio Code terminal, confirm the details of your Azure subscription; and then enter the following command to run the client application: ```powershell python chat-client.py ``` 1. When prompted, begin a conversation with the agent by asking a question such as "*How is computer speech used in AI?*". 1. Listen to the response and then continue the conversation - note that you can interrupt the agent to ask new questions. 1. When you're finished, press **CTRL+C** to end the conversation and stop the program. ## Clean up If you have finished exploring Microsoft Foundry, delete any resources that you no longer need. This avoids accruing any unnecessary costs. 1. 
Open the **Azure portal** at [https://portal.azure.com](https://portal.azure.com) and select the resource group that contains the resources you created.
1. Select **Delete resource group** and then **enter the resource group name** to confirm. The resource group is then deleted.


================================================
FILE: Instructions/Exercises/07-translation.md
================================================

---
lab:
    title: 'Translate text and speech'
    description: Implement translation with Azure Translator and Azure Speech in Foundry Tools.
    duration: 30
    level: 300
    islab: true
---

# Translate text and speech

**Azure Translator in Foundry Tools** is a service that enables you to translate text between languages. Similarly, **Azure Speech in Foundry Tools** provides translation services for speech. In this exercise, you'll use them to create translation apps that translate input in any supported language to the target language of your choice.

While this exercise is based on Python, you can develop text translation applications using multiple language-specific SDKs, including:

- [Azure Translator client library for Python](https://pypi.org/project/azure-ai-translation-text/)
- [Azure Translator client library for .NET](https://www.nuget.org/packages/Azure.AI.Translation.Text)
- [Azure Translator client library for JavaScript](https://www.npmjs.com/package/@azure-rest/ai-translation-text)
- [Azure AI Speech SDK for Python](https://pypi.org/project/azure-cognitiveservices-speech/)
- [Azure AI Speech SDK for .NET](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech)
- [Azure AI Speech SDK for JavaScript](https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk)

This exercise takes approximately **30** minutes.

> **Note**: Some of the technologies used in this exercise are in preview or in active development. You may experience some unexpected behavior, warnings, or errors.

## Prerequisites

Before starting this exercise, ensure you have:

- An active [Azure subscription](https://azure.microsoft.com/pricing/purchase-options/azure-account)
- [Visual Studio Code](https://code.visualstudio.com/) installed
- [Python version **3.13.xx**](https://www.python.org/downloads/release/python-31312/) installed\*
- [Git](https://git-scm.com/install/) installed and configured
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest) installed

> \* Python 3.14 is available, but some dependencies are not yet compiled for that release. The lab has been successfully tested with Python 3.13.12.

## Create a Microsoft Foundry project

Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution.

1. In a web browser, open the [Microsoft Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page.
1. If it is not already enabled, on the toolbar at the top of the page, enable the **New Foundry** option.
Then, if prompted, create a new project with a unique name; expanding the **Advanced options** area to specify the following settings for your project: - **Foundry resource**: *Use the default name for your resource (usually {project_name}-resource)*\* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Select any available region > **TIP**: \* Remember the Foundry resource name - you'll need it later! 1. Wait for your project to be created. Then view the home page for your project. ## Explore Azure Translator in Foundry Tools in the portal You can use the Azure Translator playground in the Foundry portal to experiment with the service. 1. Now you're ready to **Start building**. Select **Explore playgrounds** (or on the **Build** page, select the **Models** tab) to view the models in your project. 1. In the **Models** page, select the **AI services** tab to view the list of Azure services in Foundry Tools. 1. In the list of tools, select **Azure Translator - Text translation**. 1. In the Text translator playground, in the **Source text** area, enter the text `Hello world!`. Then, in the **Translation** area, select any language and use the **Translate** button to generate the translation. 1. Try a few more languages. 1. Select the **Code** tab to view sample code for using Azure Translator; and note the **ENDPOINT** variable used in the code for the REST API, which should be similar to `https://{foundry-resource-name}.cognitiveservices.azure.com/`. This endpoint uses an older format for Azure AI Services, but is still used to connect to the Azure Translator resource in a Foundry resource. You can also use it to connect to Azure Speech tools. > **TIP**: You're going to need the endpoint later! ## Get application files from GitHub The initial application files you'll need to develop the translation application are provided in a GitHub repo. 1. Open Visual Studio Code. 1. Open the command palette (*Ctrl+Shift+P*) and use the `Git:clone` command to clone the `https://github.com/microsoftlearning/mslearn-ai-language` repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm you trust the authors. 1. In Visual Studio Code, view the **Extensions** pane; and if it is not already installed, install the **Python** extension. 1. In the **Command Palette**, use the command `python:select interpreter`. Then select an existing environment if you have one, or create a new **Venv** environment based on your Python 3.1x installation. > **Tip**: If you are prompted to install dependencies, you can install the ones in the *requirements.txt* file in the */Labfiles/07-translation/Python/translators* folder; but it's OK if you don't - we'll install them later! > **Tip**: If you prefer to use the terminal, you can create your **Venv** environment with `python -m venv labenv`, then activate it with `\labenv\Scripts\activate`. ## Create a text translation application Now you're ready to use Azure Translator to implement text translation. 1. After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at **/Labfiles/07-translation/Python/translators**. 
The application files include:
    - **.env** (the application configuration file)
    - **requirements.txt** (the Python package dependencies that need to be installed)
    - **translate-text.py** (the code file for the text translation application)
    - **translate-speech.py** (the code file for the speech translation application)

### Configure your text translation application

1. In the **Explorer** pane, in the **translators** folder, select the **.env** file to open it. Then update the configuration values to reflect the Cognitive Services **endpoint** for your Foundry resource.

    > **Important**: The endpoint should be *https://{YOUR_FOUNDRY_RESOURCE}.cognitiveservices.azure.com/*. The Foundry Resource name usually takes the form *{project_name}-resource*.

    Save the modified configuration file.

1. In the **Explorer** pane, right-click the **translators** folder containing the application files, and select **Open in integrated terminal** (or open a terminal in the **Terminal** menu and navigate to the */Labfiles/07-translation/Python/translators* folder.)

    > **Note**: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

1. Ensure that the terminal is open in the **translators** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.
1. Install the Azure Translator SDK, Speech SDK, and other required packages by running the following command:

    ```
    pip install -r requirements.txt
    ```

### Add code to translate text

1. In the **Explorer** pane, in the **translators** folder, open the **translate-text.py** file.
1. Review the existing code. You will add code to work with Azure Translator.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Translator SDK:

    ```python
    # import namespaces
    from azure.identity import DefaultAzureCredential
    from azure.ai.translation.text import *
    from azure.ai.translation.text.models import InputTextItem
    ```

1. In the **main** function, note that the existing code reads the configuration settings.
1. Find the comment **Create client using endpoint and credential** and add the following code:

    ```python
    # Create client using endpoint and credential
    credential = DefaultAzureCredential()
    client = TextTranslationClient(credential=credential, endpoint=foundry_endpoint)
    ```

1. Find the comment **Choose target language** and add the following code, which uses the Text Translator service to return a list of supported languages for translation, and prompts the user to select a language code for the target language:

    ```python
    # Choose target language
    languagesResponse = client.get_supported_languages(scope="translation")
    print("{} languages supported.".format(len(languagesResponse.translation)))
    print("(See https://learn.microsoft.com/azure/ai-services/translator/language-support#translation)")
    print("Enter a target language code for translation (for example, 'en'):")
    targetLanguage = "xx"
    supportedLanguage = False
    while supportedLanguage == False:
        targetLanguage = input()
        if targetLanguage in languagesResponse.translation.keys():
            supportedLanguage = True
        else:
            print("{} is not a supported language.".format(targetLanguage))
    ```
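    If you're curious about what the languages response contains, you can optionally add a few lines like the following sketch after the code above (it isn't required for the lab). It reuses the `languagesResponse` object created above, and assumes that each entry in `languagesResponse.translation` exposes `name` and `native_name` properties, which is how the Text Translation SDK models the supported-languages response.

    ```python
    # Optional exploration (not part of the lab steps): show a few of the
    # supported languages with their display names. The "name" and "native_name"
    # attributes are assumptions based on the Text Translation language model.
    for code, language in list(languagesResponse.translation.items())[:10]:
        print(f"{code}: {language.name} ({language.native_name})")
    ```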
1. Find the comment **Translate text** and add the following code, which repeatedly prompts the user for text to be translated, uses the Azure AI Translator service to translate it to the target language (detecting the source language automatically), and displays the results until the user enters *quit*:

    ```python
    # Translate text
    inputText = ""
    while inputText.lower() != "quit":
        inputText = input("Enter text to translate ('quit' to exit):")
        if inputText != "quit":
            input_text_elements = [InputTextItem(text=inputText)]
            translationResponse = client.translate(body=input_text_elements, to_language=[targetLanguage])
            translation = translationResponse[0] if translationResponse else None
            if translation:
                sourceLanguage = translation.detected_language
                for translated_text in translation.translations:
                    print(f"'{inputText}' was translated from {sourceLanguage.language} to {translated_text.to} as '{translated_text.text}'.")
    ```

1. Save the changes to the code file. Then, in the terminal pane, use the following command to sign into Azure.

    ```powershell
    az login
    ```

    > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details.

1. When prompted, follow the instructions to sign into Azure. Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource.
1. After you have signed in, enter the following command to run the application:

    ```
    python translate-text.py
    ```

1. When prompted, enter a valid target language from the list in the link displayed.
1. Enter a phrase to be translated (for example `This is a test` or `C'est un test`) and view the results, which should detect the source language and translate the text to the target language.
1. When you're done, enter `quit`. You can run the application again and choose a different target language.

## Create a speech translation application

Now you're ready to use Azure Speech to implement speech translation.

### Configure your speech translation application

1. In the **translators** folder, verify that the .env file contains the **endpoint** for your Foundry resource (Azure Speech can use the same information as Azure Translator to connect to your Foundry resource).
1. Ensure that the terminal is open in the **translators** folder with the prefix **(.venv)** to indicate that the Python environment you created is active.
1. If you did not previously install the required packages, enter the following command to do so now:

    ```
    pip install -r requirements.txt
    ```

### Add code to translate speech

1. In the **Explorer** pane, in the **translators** folder, open the **translate-speech.py** file.
1. Review the existing code. You will add code to work with Azure Speech.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Speech SDK:

    ```python
    # Import namespaces
    from azure.identity import DefaultAzureCredential
    import azure.cognitiveservices.speech as speech_sdk
    ```
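    This exercise uses Microsoft Entra ID authentication with **DefaultAzureCredential**. Purely for reference, the Speech SDK also supports key-based authentication; a hypothetical alternative (not used in this lab) would create the translation configuration from a Speech resource key and region, as in the sketch below. The key and region values shown are placeholders.

    ```python
    # Hypothetical alternative (not used in this lab): key-based authentication
    # for the Speech SDK, using placeholder values.
    import azure.cognitiveservices.speech as speech_sdk

    translation_cfg = speech_sdk.translation.SpeechTranslationConfig(
        subscription="<YOUR_SPEECH_RESOURCE_KEY>",   # placeholder
        region="<YOUR_SPEECH_RESOURCE_REGION>"       # placeholder, for example "eastus"
    )
    translation_cfg.speech_recognition_language = "en-US"
    translation_cfg.add_target_language("fr")
    ```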
1. In the **main** function, under the comment **Get configuration settings**, note that the code loads the endpoint you defined in the configuration file.
1. Find the comment **Configure translation**, and add the following code to configure your connection to the Foundry endpoint for Azure Speech, and prepare to translate speech in US English to French, Spanish, and Hindi:

    ```python
    # Configure translation
    credential = DefaultAzureCredential()
    translation_cfg = speech_sdk.translation.SpeechTranslationConfig(
        token_credential=credential,
        endpoint=foundry_endpoint
    )
    translation_cfg.speech_recognition_language = 'en-US'
    translation_cfg.add_target_language('fr')
    translation_cfg.add_target_language('es')
    translation_cfg.add_target_language('hi')
    audio_in_cfg = speech_sdk.AudioConfig(use_default_microphone=True)
    translator = speech_sdk.translation.TranslationRecognizer(
        translation_config=translation_cfg,
        audio_config=audio_in_cfg
    )
    print('Ready to translate from',translation_cfg.speech_recognition_language)
    ```

1. You will use the **SpeechTranslationConfig** to translate speech into text, but you will also use a **SpeechConfig** to synthesize translations into speech. Add the following code under the comment **Configure speech for synthesis of translations**:

    ```python
    # Configure speech for synthesis of translations
    speech_cfg = speech_sdk.SpeechConfig(
        token_credential=credential,
        endpoint=foundry_endpoint)
    audio_out_cfg = speech_sdk.audio.AudioOutputConfig(use_default_speaker=True)
    voices = {
        "fr": "fr-FR-HenriNeural",
        "es": "es-ES-ElviraNeural",
        "hi": "hi-IN-MadhurNeural"
    }
    print('Ready to use speech service.')
    ```

1. Now it's time to add the code to translate the user's speech from the system microphone. Find the comment **Translate user speech**, and add the following code:

    ```python
    # Translate user speech
    print("Speak now...")
    translation_results = translator.recognize_once_async().get()
    print(f"Translating '{translation_results.text}'")
    ```

1. When the results are returned, the application will iterate through the translations, printing the text and playing the synthesized speech through the default system speaker. Find the comment **Print and speak the translation results** and add the following code:

    ```python
    # Print and speak the translation results
    translations = translation_results.translations
    for translation_language in translations:
        print(f"{translation_language}: '{translations[translation_language]}'")
        speech_cfg.speech_synthesis_voice_name = voices.get(translation_language)
        audio_out_cfg = speech_sdk.audio.AudioOutputConfig(use_default_speaker=True)
        speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_cfg, audio_out_cfg)
        speak = speech_synthesizer.speak_text_async(translations[translation_language]).get()
        if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
            print(speak.reason)
    ```

1. Save the changes to the code file. Then, in the terminal pane, if you are not already signed into Azure (or your session has expired), use the following command to sign into Azure.

    ```powershell
    az login
    ```

    > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details.

1. When prompted, follow the instructions to sign into Azure.
Then complete the sign in process in the command line, viewing (and confirming if necessary) the details of the subscription containing your Foundry resource. 1. After you have signed in, enter the following command to run the application: ``` python translate-speech.py ``` 1. When prompted, say something aloud (for example, "*Hello!"*). The program should translate it to the languages specified in the code (French, Spanish, and Hindi), and print and speak the translations. > **NOTE**: The translation to Hindi may not always be displayed correctly in the terminal due to character encoding issues. ## Clean up resources If you have finished exploring Microsoft Foundry, delete any resources that you no longer need. This avoids accruing any unnecessary costs. 1. Open the **Azure portal** at [https://portal.azure.com](https://portal.azure.com) and select the resource group that contains the resources you created. 1. Select **Delete resource group** and then **enter the resource group name** to confirm. The resource group is then deleted. ================================================ FILE: Instructions/Labs/01-analyze-text.md ================================================ --- lab: title: 'Analyze text (deprecated)' description: "Use Azure AI Language to analyze text, including language detection, sentiment analysis, key phrase extraction, and entity recognition." islab: false --- # Analyze Text (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . **Azure AI Language** supports analysis of text, including language detection, sentiment analysis, key phrase extraction, and entity recognition. For example, suppose a travel agency wants to process hotel reviews that have been submitted to the company's web site. By using the Azure AI Language, they can determine the language each review is written in, the sentiment (positive, neutral, or negative) of the reviews, key phrases that might indicate the main topics discussed in the review, and named entities, such as places, landmarks, or people mentioned in the reviews. In this exercise, you'll use the Azure AI Language Python SDK for text analytics to implement a simple hotel review application based on this example. While this exercise is based on Python, you can develop text analytics applications using multiple language-specific SDKs; including: - [Azure AI Text Analytics client library for Python](https://pypi.org/project/azure-ai-textanalytics/) - [Azure AI Text Analytics client library for .NET](https://www.nuget.org/packages/Azure.AI.TextAnalytics) - [Azure AI Text Analytics client library for JavaScript](https://www.npmjs.com/package/@azure/ai-text-analytics) This exercise takes approximately **30** minutes. ## Provision an *Azure AI Language* resource If you don't already have one in your subscription, you'll need to provision an **Azure AI Language service** resource in your Azure subscription. 1. Open the Azure portal at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription. 1. Select **Create a resource**. 1. In the search field, search for **Language service**. Then, in the results, select **Create** under **Language Service**. 1. Select **Continue to create your resource**. 1. Provision the resource using the following settings: - **Subscription**: *Your Azure subscription*. - **Resource group**: *Choose or create a resource group*. - **Region**:*Choose any available region* - **Name**: *Enter a unique name*. 
- **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available. - **Responsible AI Notice**: Agree. 1. Select **Review + create**, then select **Create** to provision the resource. 1. Wait for deployment to complete, and then go to the deployed resource. 1. View the **Keys and Endpoint** page in the **Resource Management** section. You will need the information on this page later in the exercise. ## Clone the repository for this course You'll develop your code using Cloud Shell from the Azure Portal. The code files for your app have been provided in a GitHub repo. 1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise: ``` rm -r mslearn-ai-language -f git clone https://github.com/microsoftlearning/mslearn-ai-language ``` > **Tip**: As you enter commands into the cloudshell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. After the repo has been cloned, navigate to the folder containing the application code files: ``` cd mslearn-ai-language/Labfiles/01-analyze-text/Python/text-analysis ``` ## Configure your application 1. In the command line pane, run the following command to view the code files in the **text-analysis** folder: ``` ls -a -l ``` The files include a configuration file (**.env**) and a code file (**text-analysis.py**). The text your application will analyze is in the **reviews** subfolder. 1. Create a Python virtual environment and install the Azure AI Language Text Analytics SDK package and other required packages by running the following command: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-textanalytics==5.3.0 ``` 1. Enter the following command to edit the application configuration file: ``` code .env ``` The file is opened in a code editor. 1. Update the configuration values to include the **endpoint** and a **key** from the Azure Language resource you created (available on the **Keys and Endpoint** page for your Azure AI Language resource in the Azure portal) 1. After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open. ## Add code to connect to your Azure AI Language resource 1. Enter the following command to edit the application code file: ``` code text-analysis.py ``` 1. Review the existing code. You will add code to work with the AI Language Text Analytics SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. 
At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Text Analytics SDK: ```python # import namespaces from azure.core.credentials import AzureKeyCredential from azure.ai.textanalytics import TextAnalyticsClient ``` 1. In the **main** function, note that code to load the Azure AI Language service endpoint and key from the configuration file has already been provided. Then find the comment **Create client using endpoint and key**, and add the following code to create a client for the Text Analysis API: ```Python # Create client using endpoint and key credential = AzureKeyCredential(ai_key) ai_client = TextAnalyticsClient(endpoint=ai_endpoint, credential=credential) ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python text-analysis.py ``` 1. Observe the output as the code should run without error, displaying the contents of each review text file in the **reviews** folder. The application successfully creates a client for the Text Analytics API but doesn't make use of it. We'll fix that in the next section. ## Add code to detect language Now that you have created a client for the API, let's use it to detect the language in which each review is written. 1. In the code editor, find the comment **Get language**. Then add the code necessary to detect the language in each review document: ```python # Get language detectedLanguage = ai_client.detect_language(documents=[text])[0] print('\nLanguage: {}'.format(detectedLanguage.primary_language.name)) ``` > **Note**: *In this example, each review is analyzed individually, resulting in a separate call to the service for each file. An alternative approach is to create a collection of documents and pass them to the service in a single call. In both approaches, the response from the service consists of a collection of documents; which is why in the Python code above, the index of the first (and only) document in the response ([0]) is specified.* 1. Save your changes. Then re-run the program. 1. Observe the output, noting that this time the language for each review is identified. ## Add code to evaluate sentiment *Sentiment analysis* is a commonly used technique to classify text as *positive* or *negative* (or possible *neutral* or *mixed*). It's commonly used to analyze social media posts, product reviews, and other items where the sentiment of the text may provide useful insights. 1. In the code editor, find the comment **Get sentiment**. Then add the code necessary to detect the sentiment of each review document: ```python # Get sentiment sentimentAnalysis = ai_client.analyze_sentiment(documents=[text])[0] print("\nSentiment: {}".format(sentimentAnalysis.sentiment)) ``` 1. Save your changes. Then close the code editor and re-run the program. 1. Observe the output, noting that the sentiment of the reviews is detected. ## Add code to identify key phrases It can be useful to identify key phrases in a body of text to help determine the main topics that it discusses. 1. In the code editor, find the comment **Get key phrases**. 
Then add the code necessary to detect the key phrases in each review document: ```python # Get key phrases phrases = ai_client.extract_key_phrases(documents=[text])[0].key_phrases if len(phrases) > 0: print("\nKey Phrases:") for phrase in phrases: print('\t{}'.format(phrase)) ``` 1. Save your changes and re-run the program. 1. Observe the output, noting that each document contains key phrases that give some insights into what the review is about. ## Add code to extract entities Often, documents or other bodies of text mention people, places, time periods, or other entities. The text Analytics API can detect multiple categories (and subcategories) of entity in your text. 1. In the code editor, find the comment **Get entities**. Then, add the code necessary to identify entities that are mentioned in each review: ```python # Get entities entities = ai_client.recognize_entities(documents=[text])[0].entities if len(entities) > 0: print("\nEntities") for entity in entities: print('\t{} ({})'.format(entity.text, entity.category)) ``` 1. Save your changes and re-run the program. 1. Observe the output, noting the entities that have been detected in the text. ## Add code to extract linked entities In addition to categorized entities, the Text Analytics API can detect entities for which there are known links to data sources, such as Wikipedia. 1. In the code editor, find the comment **Get linked entities**. Then, add the code necessary to identify linked entities that are mentioned in each review: ```python # Get linked entities entities = ai_client.recognize_linked_entities(documents=[text])[0].entities if len(entities) > 0: print("\nLinks") for linked_entity in entities: print('\t{} ({})'.format(linked_entity.name, linked_entity.url)) ``` 1. Save your changes and re-run the program. 1. Observe the output, noting the linked entities that are identified. ## Clean up resources If you're finished exploring the Azure AI Language service, you can delete the resources you created in this exercise. Here's how: 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Language resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## More information For more information about using **Azure AI Language**, see the [documentation](https://learn.microsoft.com/azure/ai-services/language-service/). ================================================ FILE: Instructions/Labs/02-qna.md ================================================ --- lab: title: 'Create a Question Answering solution (deprecated)' description: "Use Azure AI Language to create a custom question answering solution." islab: false --- # Create a Question Answering Solution (deprecated) > **Note**: This exercise is deprecated. Consider reviewing the QuickStart tutorial at One of the most common conversational scenarios is providing support through a knowledge base of frequently asked questions (FAQs). Many organizations publish FAQs as documents or web pages, which works well for a small set of question and answer pairs, but large documents can be difficult and time-consuming to search. **Azure AI Language** includes a *question answering* capability that enables you to create a knowledge base of question and answer pairs that can be queried using natural language input, and is most commonly used as a resource that a bot can use to look up answers to questions submitted by users. 
In this exercise, you'll use the Azure AI Language Python SDK for question answering to implement a simple question answering application.

While this exercise is based on Python, you can develop question answering applications using multiple language-specific SDKs; including:

- [Azure AI Language Service Question Answering client library for Python](https://pypi.org/project/azure-ai-language-questionanswering/)
- [Azure AI Language Service Question Answering client library for .NET](https://www.nuget.org/packages/Azure.AI.Language.QuestionAnswering)

This exercise takes approximately **20** minutes.

## Provision an *Azure AI Language* resource

If you don't already have one in your subscription, you'll need to provision an **Azure AI Language service** resource. Additionally, to create and host a knowledge base for question answering, you need to enable the **Question Answering** feature.

1. Open the Azure portal at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription.
1. Select **Create a resource**.
1. In the search field, search for **Language service**. Then, in the results, select **Create** under **Language Service**.
1. Select the **Custom question answering** block. Then select **Continue to create your resource**. You will need to enter the following settings:
    - **Subscription**: *Your Azure subscription*
    - **Resource group**: *Choose or create a resource group*.
    - **Region**: *Choose any available location*
    - **Name**: *Enter a unique name*
    - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available.
    - **Azure Search region**: *Choose a location in the same global region as your Language resource*
    - **Azure Search pricing tier**: Free (F) (*If this tier is not available, select Basic (B)*)
    - **Responsible AI Notice**: *Agree*
1. Select **Review + create**, then select **Create**.

    > **NOTE**
    > Custom Question Answering uses Azure Search to index and query the knowledge base of questions and answers.

1. Wait for deployment to complete, and then go to the deployed resource.
1. View the **Keys and Endpoint** page in the **Resource Management** section. You will need the information on this page later in the exercise.

## Create a question answering project

To create a knowledge base for question answering in your Azure AI Language resource, you can use the Language Studio portal to create a question answering project. In this case, you'll create a knowledge base containing questions and answers about [Microsoft Learn](https://learn.microsoft.com/training/).

1. In a new browser tab, go to the Language Studio portal at [https://language.cognitive.azure.com/](https://language.cognitive.azure.com/) and sign in using the Microsoft account associated with your Azure subscription.
1. If you're prompted to choose a Language resource, select the following settings:
    - **Azure Directory**: The Azure directory containing your subscription.
    - **Azure subscription**: Your Azure subscription.
    - **Resource type**: Language
    - **Resource name**: The Azure AI Language resource you created previously.

    If you are not prompted to choose a language resource, it may be because you have multiple Language resources in your subscription; in which case:

    1. On the bar at the top of the page, select the **Settings (⚙)** button.
    2. On the **Settings** page, view the **Resources** tab.
    3. Select the language resource you just created, and click **Switch resource**.
    4.
At the top of the page, click **Language Studio** to return to the Language Studio home page.

1. At the top of the portal, in the **Create new** menu, select **Custom question answering**.
1. In the **Create a project** wizard, on the **Choose language setting** page, select the option to **Select the language for all projects**, and select **English** as the language. Then select **Next**.
1. On the **Enter basic information** page, enter the following details:
    - **Name**: `LearnFAQ`
    - **Description**: `FAQ for Microsoft Learn`
    - **Default answer when no answer is returned**: `Sorry, I don't understand the question`
1. Select **Next**.
1. On the **Review and finish** page, select **Create project**.

## Add sources to the knowledge base

You can create a knowledge base from scratch, but it's common to start by importing questions and answers from an existing FAQ page or document. In this case, you'll import data from an existing FAQ web page for Microsoft Learn, and you'll also import some pre-defined "chit chat" questions and answers to support common conversational exchanges.

1. On the **Manage sources** page for your question answering project, in the **╋ Add source** list, select **URLs**. Then in the **Add URLs** dialog box, select **╋ Add url** and set the following name and URL before you select **Add all** to add it to the knowledge base:
    - **Name**: `Learn FAQ Page`
    - **URL**: `https://learn.microsoft.com/en-us/training/support/faq?pivots=general`
1. On the **Manage sources** page for your question answering project, in the **╋ Add source** list, select **Chitchat**. Then in the **Add chit chat** dialog box, select **Friendly** and select **Add chit chat**.

    > **NOTE**
    > If you encounter the error **BadArgument Invalid input**, follow these steps as a workaround:
    >
    > - Open the FAQ page in a new browser tab:
    >   `https://learn.microsoft.com/en-us/training/support/faq?pivots=general`
    > - At the bottom left panel, look for the **Download PDF** button.
    > - You’ll be taken to a PDF view of the webpage. Select the print option (or press `Ctrl+P` / `Cmd+P`).
    > - In the print dialog, choose **Save as PDF** as the printer and select **Pages 1–4** (these pages cover the FAQ content needed).
    > - Save the file locally.
    > - Go back to the **Manage sources** page, select **+ Add source**, and choose **Files**.
    > - Select **+ Add File**, enter `Learn FAQ Page` as the name, upload the saved PDF, and select **Add all**.

## Edit the knowledge base

Your knowledge base has been populated with question and answer pairs from the Microsoft Learn FAQ, supplemented with a set of conversational *chit-chat* question and answer pairs. You can extend the knowledge base by adding additional question and answer pairs.

1. In your **LearnFAQ** project in Language Studio, select the **Edit knowledge base** page to see the existing question and answer pairs (if some tips are displayed, read them and choose **Got it** to dismiss them, or select **Skip all**)
1. In the knowledge base, on the **Question answer pairs** tab, select **+**, and create a new question answer pair with the following settings:
    - **Source**: `https://learn.microsoft.com/en-us/training/support/faq?pivots=general`
    - **Question**: `What are the different types of modules on Microsoft Learn?`
    - **Answer**: `Microsoft Learn offers various types of training modules, including role-based learning paths, product-specific modules, and hands-on labs. Each module contains units with lessons and knowledge checks to help you learn at your own pace.`
1.
Select **Done**. 1. In the page for the **What are the different types of modules on Microsoft Learn?** question that is created, expand **Alternate questions**. Then add the alternate question `How are training modules organized?`. In some cases, it makes sense to enable the user to follow up on an answer by creating a *multi-turn* conversation that enables the user to iteratively refine the question to get to the answer they need. 1. Under the answer you entered for the module types question, expand **Follow-up prompts** and add the following follow-up prompt: - **Text displayed in the prompt to the user**: `Learn more about training`. - Select the **Create link to new pair** tab, and enter this text: `You can explore modules and learning paths on the [Microsoft Learn training page](https://learn.microsoft.com/training/).` - Select **Show in contextual flow only**. This option ensures that the answer is only ever returned in the context of a follow-up question from the original module types question. 1. Select **Add prompt**. ## Train and test the knowledge base Now that you have a knowledge base, you can test it in Language Studio. 1. Save the changes to your knowledge base by selecting the **Save** button under the **Question answer pairs** tab on the left. 1. After the changes have been saved, select the **Test** button to open the test pane. 1. In the test pane, at the top, deselect **Include short answer response** (if not already unselected). Then at the bottom enter the message `Hello`. A suitable response should be returned. 1. In the test pane, at the bottom enter the message `What is Microsoft Learn?`. An appropriate response from the FAQ should be returned. 1. Enter the message `Thanks!` An appropriate chit-chat response should be returned. 1. Enter the message `What are the different types of modules on Microsoft Learn?`. The answer you created should be returned along with a follow-up prompt link. 1. Select the **Learn more about training** follow-up link. The follow-up answer with a link to the training page should be returned. 1. When you're done testing the knowledge base, close the test pane. ## Deploy the knowledge base The knowledge base provides a back-end service that client applications can use to answer questions. Now you are ready to publish your knowledge base and access its REST interface from a client. 1. In the **LearnFAQ** project in Language Studio, select the **Deploy knowledge base** page from the navigation menu on the left. 1. At the top of the page, select **Deploy**. Then select **Deploy** to confirm you want to deploy the knowledge base. 1. When deployment is complete, select **Get prediction URL** to view the REST endpoint for your knowledge base and note that the sample request includes parameters for: - **projectName**: The name of your project (which should be *LearnFAQ*) - **deploymentName**: The name of your deployment (which should be *production*) 1. Close the prediction URL dialog box. ## Prepare to develop an app in Cloud Shell You'll develop your question answering app using Cloud Shell in the Azure portal. The code files for your app have been provided in a GitHub repo. 1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. 
> **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***.

1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.**

1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise:

    ```
    rm -r mslearn-ai-language -f
    git clone https://github.com/microsoftlearning/mslearn-ai-language
    ```

    > **Tip**: As you enter commands into the cloud shell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task.

1. After the repo has been cloned, navigate to the folder containing the application code files:

    ```
    cd mslearn-ai-language/Labfiles/02-qna/Python/qna-app
    ```

## Configure your application

1. In the command line pane, run the following command to view the code files in the **qna-app** folder:

    ```
    ls -a -l
    ```

    The files include a configuration file (**.env**) and a code file (**qna-app.py**).

1. Create a Python virtual environment and install the Azure AI Language Question Answering SDK package and other required packages by running the following command:

    ```
    python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-language-questionanswering
    ```

1. Enter the following command to edit the configuration file:

    ```
    code .env
    ```

    The file is opened in a code editor.

1. In the code file, update the configuration values it contains to reflect the **endpoint** and an authentication **key** for the Azure Language resource you created (available on the **Keys and Endpoint** page for your Azure AI Language resource in the Azure portal). The project name and deployment name for your deployed knowledge base should also be in this file.

1. After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open.

## Add code to use your knowledge base

1. Enter the following command to edit the application code file:

    ```
    code qna-app.py
    ```

1. Review the existing code. You will add code to work with your knowledge base.

    > **Tip**: As you add code to the code file, be sure to maintain the correct indentation.

1. In the code file, find the comment **Import namespaces**. Then, under this comment, add the following language-specific code to import the namespaces you will need to use the Question Answering SDK:

    ```python
    # import namespaces
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.language.questionanswering import QuestionAnsweringClient
    ```

1. In the **main** function, note that code to load the Azure AI Language service endpoint and key from the configuration file has already been provided. Then find the comment **Create client using endpoint and key**, and add the following code to create a question answering client:

    ```Python
    # Create client using endpoint and key
    credential = AzureKeyCredential(ai_key)
    ai_client = QuestionAnsweringClient(endpoint=ai_endpoint, credential=credential)
    ```

1.
In the code file, find the comment **Submit a question and display the answer**, and add the following code to repeatedly read questions from the command line, submit them to the service, and display details of the answers: ```Python # Submit a question and display the answer user_question = '' while True: user_question = input('\nQuestion:\n') if user_question.lower() == "quit": break response = ai_client.get_answers(question=user_question, project_name=ai_project_name, deployment_name=ai_deployment_name) for candidate in response.answers: print(candidate.answer) print("Confidence: {}".format(candidate.confidence)) print("Source: {}".format(candidate.source)) ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python qna-app.py ``` 1. When prompted, enter a question to be submitted to your question answering project; for example `What is a learning path?`. 1. Review the answer that is returned. 1. Ask more questions. When you're done, enter `quit`. ## Clean up resources If you're finished exploring the Azure AI Language service, you can delete the resources you created in this exercise. Here's how: 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Language resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## More information To learn more about question answering in Azure AI Language, see the [Azure AI Language documentation](https://learn.microsoft.com/azure/ai-services/language-service/question-answering/overview). ================================================ FILE: Instructions/Labs/03-language-understanding.md ================================================ --- lab: title: 'Create a language understanding model with the Azure AI Language service (deprecated)' description: "Create a custom language understanding model to interpret input, predict intent, and identify entities." islab: false --- # Create a language understanding model with the Language service (deprecated) > **Note**: This exercise is deprecated. Consider reviewing the QuickStart tutorial at . The Azure AI Language service enables you to define a *conversational language understanding* model that applications can use to interpret natural language *utterances* from users (text or spoken input), predict the users *intent* (what they want to achieve), and identify any *entities* to which the intent should be applied. For example, a conversational language model for a clock application might be expected to process input such as: *What is the time in London?* This kind of input is an example of an *utterance* (something a user might say or type), for which the desired *intent* is to get the time in a specific location (an *entity*); in this case, London. > **NOTE** > The task of a conversational language model is to predict the user's intent and identify any entities to which the intent applies. It is not the job of a conversational language model to actually perform the actions required to satisfy the intent. For example, a clock application can use a conversational language model to discern that the user wants to know the time in London; but the client application itself must then implement the logic to determine the correct time and present it to the user. 
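As the note above explains, the model only returns a prediction; it's up to the client application to act on it. The following sketch (illustrative only, not part of the lab files) shows the typical pattern: inspect the predicted intent, pull out any relevant entities, and run your own logic. The `offsets` lookup table and the output format are toy placeholders.

```python
from datetime import datetime, timezone, timedelta

# Illustrative only: acting on an intent and entities returned by the model.
def handle_prediction(top_intent, entities):
    if top_intent == "GetTime":
        # Use a Location entity if the model detected one; otherwise default to local time.
        location = next((e["text"] for e in entities if e["category"] == "Location"), "local")
        offsets = {"London": 0, "New York": -5}  # toy lookup table, not a real time zone database
        if location in offsets:
            now = datetime.now(timezone(timedelta(hours=offsets[location])))
        else:
            now = datetime.now()
        return "The time in {} is {}".format(location, now.strftime("%H:%M"))
    # Some other intent (for example, "None") was predicted
    return "Try asking me for the time."

# Example: a prediction with a GetTime intent and a detected Location entity
print(handle_prediction("GetTime", [{"category": "Location", "text": "London"}]))
```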
In this exercise, you'll use the Azure AI Language service to create a conversational language understanding model, and use the Python SDK to implement a client app that uses it.

While this exercise is based on Python, you can develop conversational understanding applications using multiple language-specific SDKs; including:

- [Azure AI Conversations client library for Python](https://pypi.org/project/azure-ai-language-conversations/)
- [Azure AI Conversations client library for .NET](https://www.nuget.org/packages/Azure.AI.Language.Conversations)
- [Azure AI Conversations client library for JavaScript](https://www.npmjs.com/package/@azure/ai-language-conversations)

This exercise takes approximately **35** minutes.

## Provision an *Azure AI Language* resource

If you don't already have one in your subscription, you'll need to provision an **Azure AI Language service** resource in your Azure subscription.

1. Open the Azure portal at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription.
1. Select **Create a resource**.
1. In the search field, search for **Language service**. Then, in the results, select **Create** under **Language Service**.
1. Provision the resource using the following settings:
    - **Subscription**: *Your Azure subscription*.
    - **Resource group**: *Choose or create a resource group*.
    - **Region**: *Choose from one of the following regions*\*
        - Australia East
        - Central India
        - China East 2
        - East US
        - East US 2
        - North Europe
        - South Central US
        - Switzerland North
        - UK South
        - West Europe
        - West US 2
        - West US 3
    - **Name**: *Enter a unique name*.
    - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available.
    - **Responsible AI Notice**: Agree.
1. Select **Review + create**, then select **Create** to provision the resource.
1. Wait for deployment to complete, and then go to the deployed resource.
1. View the **Keys and Endpoint** page. You will need the information on this page later in the exercise.

## Create a conversational language understanding project

Now that you have created an authoring resource, you can use it to create a conversational language understanding project.

1. In a new browser tab, open the Azure AI Language Studio portal at `https://language.cognitive.azure.com/` and sign in using the Microsoft account associated with your Azure subscription.
1. If prompted to choose a Language resource, select the following settings:
    - **Azure Directory**: The Azure directory containing your subscription.
    - **Azure subscription**: Your Azure subscription.
    - **Resource type**: Language.
    - **Language resource**: The Azure AI Language resource you created previously.

    If you are not prompted to choose a language resource, it may be because you have multiple Language resources in your subscription; in which case:

    1. On the bar at the top of the page, select the **Settings (⚙)** button.
    2. On the **Settings** page, view the **Resources** tab.
    3. Select the language resource you just created, and click **Switch resource**.
    4. At the top of the page, click **Language Studio** to return to the Language Studio home page

1. At the top of the portal, in the **Create new** menu, select **Conversational language understanding**.
1.
In the **Create a project** dialog box, on the **Enter basic information** page, enter the following details and then select **Next**: - **Name**: `Clock` - **Utterances primary language**: English - **Enable multiple languages in project?**: *Unselected* - **Description**: `Natural language clock` 1. On the **Review and finish** page, select **Create**. ### Create intents The first thing we'll do in the new project is to define some intents. The model will ultimately predict which of these intents a user is requesting when submitting a natural language utterance. > **Tip**: When working on your project, if some tips are displayed, read them and select **Got it** to dismiss them, or select **Skip all**. 1. On the **Schema definition** page, on the **Intents** tab, select **+ Add** to add a new intent named `GetTime`. 1. Verify that the **GetTime** intent is listed (along with the default **None** intent). Then add the following additional intents: - `GetDay` - `GetDate` ### Label each intent with sample utterances To help the model predict which intent a user is requesting, you must label each intent with some sample utterances. 1. In the pane on the left, select the **Data Labeling** page. > **Tip**: You can expand the pane with the **>>** icon to see the page names, and hide it again with the **<<** icon. 1. Select the new **GetTime** intent and enter the utterance `what is the time?`. This adds the utterance as sample input for the intent. 1. Add the following additional utterances for the **GetTime** intent: - `what's the time?` - `what time is it?` - `tell me the time` > **NOTE** > To add a new utterance, write the utterance in the textbox next to the intent and then press ENTER. 1. Select the **GetDay** intent and add the following utterances as example input for that intent: - `what day is it?` - `what's the day?` - `what is the day today?` - `what day of the week is it?` 1. Select the **GetDate** intent and add the following utterances for it: - `what date is it?` - `what's the date?` - `what is the date today?` - `what's today's date?` 1. After you've added utterances for each of your intents, select **Save changes**. ### Train and test the model Now that you've added some intents, let's train the language model and see if it can correctly predict them from user input. 1. In the pane on the left, select **Training jobs**. Then select **+ Start a training job**. 1. On the **Start a training job** dialog, select the option to train a new model, name it `Clock`. Select **Standard training** mode and the default **Data splitting** options. 1. To begin the process of training your model, select **Train**. 1. When training is complete (which may take several minutes) the job **Status** will change to **Training succeeded**. 1. Select the **Model performance** page, and then select the **Clock** model. Review the overall and per-intent evaluation metrics (*precision*, *recall*, and *F1 score*) and the *confusion matrix* generated by the evaluation that was performed when training (note that due to the small number of sample utterances, not all intents may be included in the results). > **NOTE** > To learn more about the evaluation metrics, refer to the [documentation](https://learn.microsoft.com/azure/ai-services/language-service/conversational-language-understanding/concepts/evaluation-metrics) 1. Go to the **Deploying a model** page, then select **Add deployment**. 1. On the **Add deployment** dialog, select **Create a new deployment name**, and then enter `production`. 1. 
Select the **Clock** model in the **Model** field then select **Deploy**. The deployment may take some time. 1. When the model has been deployed, select the **Testing deployments** page, then select the **production** deployment in the **Deployment name** field. 1. Enter the following text in the empty textbox, and then select **Run the test**: `what's the time now?` Review the result that is returned, noting that it includes the predicted intent (which should be **GetTime**) and a confidence score that indicates the probability the model calculated for the predicted intent. The JSON tab shows the comparative confidence for each potential intent (the one with the highest confidence score is the predicted intent) 1. Clear the text box, and then run another test with the following text: `tell me the time` Again, review the predicted intent and confidence score. 1. Try the following text: `what's the day today?` Hopefully the model predicts the **GetDay** intent. ## Add entities So far you've defined some simple utterances that map to intents. Most real applications include more complex utterances from which specific data entities must be extracted to get more context for the intent. ### Add a learned entity The most common kind of entity is a *learned* entity, in which the model learns to identify entity values based on examples. 1. In Language Studio, return to the **Schema definition** page and then on the **Entities** tab, select **+ Add** to add a new entity. 1. In the **Add an entity** dialog box, enter the entity name `Location` and ensure that the **Learned** tab is selected. Then select **Add entity**. 1. After the **Location** entity has been created, return to the **Data labeling** page. 1. Select the **GetTime** intent and enter the following new example utterance: `what time is it in London?` 1. When the utterance has been added, select the word **London**, and in the drop-down list that appears, select **Location** to indicate that "London" is an example of a location. 1. Add another example utterance for the **GetTime** intent: `Tell me the time in Paris?` 1. When the utterance has been added, select the word **Paris**, and map it to the **Location** entity. 1. Add another example utterance for the **GetTime** intent: `what's the time in New York?` 1. When the utterance has been added, select the words **New York**, and map them to the **Location** entity. 1. Select **Save changes** to save the new utterances. ### Add a *list* entity In some cases, valid values for an entity can be restricted to a list of specific terms and synonyms; which can help the app identify instances of the entity in utterances. 1. In Language Studio, return to the **Schema definition** page and then on the **Entities** tab, select **+ Add** to add a new entity. 1. In the **Add an entity** dialog box, enter the entity name `Weekday` and select the **List** entity tab. Then select **Add entity**. 1. On the page for the **Weekday** entity, in the **Learned** section, ensure **Not required** is selected. Then, in the **List** section, select **+ Add new list**. Then enter the following value and synonym and select **Save**: | List key | synonyms| |-------------------|---------| | `Sunday` | `Sun` | > **NOTE** > To enter the fields of the new list, insert the value `Sunday` in the text field, then click on the field where 'Type in value and press enter...' is displayed, enter the synonyms, and press ENTER. 1. 
Repeat the previous step to add the following list components: | Value | synonyms| |-------------------|---------| | `Monday` | `Mon` | | `Tuesday` | `Tue, Tues` | | `Wednesday` | `Wed, Weds` | | `Thursday` | `Thur, Thurs` | | `Friday` | `Fri` | | `Saturday` | `Sat` | 1. After adding and saving the list values, return to the **Data labeling** page. 1. Select the **GetDate** intent and enter the following new example utterance: `what date was it on Saturday?` 1. When the utterance has been added, select the word ***Saturday***, and in the drop-down list that appears, select **Weekday**. 1. Add another example utterance for the **GetDate** intent: `what date will it be on Friday?` 1. When the utterance has been added, map **Friday** to the **Weekday** entity. 1. Add another example utterance for the **GetDate** intent: `what will the date be on Thurs?` 1. When the utterance has been added, map **Thurs** to the **Weekday** entity. 1. select **Save changes** to save the new utterances. ### Add a *prebuilt* entity The Azure AI Language service provides a set of *prebuilt* entities that are commonly used in conversational applications. 1. In Language Studio, return to the **Schema definition** page and then on the **Entities** tab, select **+ Add** to add a new entity. 1. In the **Add an entity** dialog box, enter the entity name `Date` and select the **Prebuilt** entity tab. Then select **Add entity**. 1. On the page for the **Date** entity, in the **Learned** section, ensure **Not required** is selected. Then, in the **Prebuilt** section, select **+ Add new prebuilt**. 1. In the **Select prebuilt** list, select **DateTime** and then select **Save**. 1. After adding the prebuilt entity, return to the **Data labeling** page 1. Select the **GetDay** intent and enter the following new example utterance: `what day was 01/01/1901?` 1. When the utterance has been added, select ***01/01/1901***, and in the drop-down list that appears, select **Date**. 1. Add another example utterance for the **GetDay** intent: `what day will it be on Dec 31st 2099?` 1. When the utterance has been added, map **Dec 31st 2099** to the **Date** entity. 1. Select **Save changes** to save the new utterances. ### Retrain the model Now that you've modified the schema, you need to retrain and retest the model. 1. On the **Training jobs** page, select **Start a training job**. 1. On the **Start a training job** dialog, select **overwrite an existing model** and specify the **Clock** model. Select **Train** to train the model. If prompted, confirm you want to overwrite the existing model. 1. When training is complete the job **Status** will update to **Training succeeded**. 1. Select the **Model performance** page and then select the **Clock** model. Review the evaluation metrics (*precision*, *recall*, and *F1 score*) and the *confusion matrix* generated by the evaluation that was performed when training (note that due to the small number of sample utterances, not all intents may be included in the results). 1. On the **Deploying a model** page, select **Add deployment**. 1. On the **Add deployment** dialog, select **Override an existing deployment name**, and then select **production**. 1. Select the **Clock** model in the **Model** field and then select **Deploy** to deploy it. This may take some time. 1. When the model is deployed, on the **Testing deployments** page, select the **production** deployment under the **Deployment name** field, and then test it with the following text: `what's the time in Edinburgh?` 1. 
Review the result that is returned, which should hopefully predict the **GetTime** intent and a **Location** entity with the text value "Edinburgh".

1. Try testing the following utterances:

    `what time is it in Tokyo?`

    `what date is it on Friday?`

    `what's the date on Weds?`

    `what day was 01/01/2020?`

    `what day will Mar 7th 2030 be?`

## Use the model from a client app

In a real project, you'd iteratively refine intents and entities, retrain, and retest until you're satisfied with the model's predictive performance. Then, when you're satisfied, you can use it in a client app by calling its REST interface or a runtime-specific SDK.

### Prepare to develop an app in Cloud Shell

You'll develop your language understanding app using Cloud Shell in the Azure portal. The code files for your app have been provided in a GitHub repo.

1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal.

    > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***.

1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.**

1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise:

    ```
    rm -r mslearn-ai-language -f
    git clone https://github.com/microsoftlearning/mslearn-ai-language
    ```

    > **Tip**: As you paste commands into the cloud shell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task.

1. After the repo has been cloned, navigate to the folder containing the application code files:

    ```
    cd mslearn-ai-language/Labfiles/03-language/Python/clock-client
    ```

### Configure your application

1. In the command line pane, run the following command to view the code files in the **clock-client** folder:

    ```
    ls -a -l
    ```

    The files include a configuration file (**.env**) and a code file (**clock-client.py**).

1. Create a Python virtual environment and install the Azure AI Language Conversations SDK package and other required packages by running the following command:

    ```
    python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-language-conversations==1.1.0
    ```

1. Enter the following command to edit the configuration file:

    ```
    code .env
    ```

    The file is opened in a code editor.

1. Update the configuration values to include the **endpoint** and a **key** from the Azure Language resource you created (available on the **Keys and Endpoint** page for your Azure AI Language resource in the Azure portal).

1. After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open.

### Add code to the application

1. Enter the following command to edit the application code file:

    ```
    code clock-client.py
    ```

1. Review the existing code. You will add code to work with the AI Language Conversations SDK.
> **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the AI Language Conversations SDK: ```python # Import namespaces from azure.core.credentials import AzureKeyCredential from azure.ai.language.conversations import ConversationAnalysisClient ``` 1. In the **main** function, note that code to load the prediction endpoint and key from the configuration file has already been provided. Then find the comment **Create a client for the Language service model** and add the following code to create a conversation analysis client for your AI Language service: ```python # Create a client for the Language service model client = ConversationAnalysisClient( ls_prediction_endpoint, AzureKeyCredential(ls_prediction_key)) ``` 1. Note that the code in the **main** function prompts for user input until the user enters "quit". Within this loop, find the comment **Call the Language service model to get intent and entities** and add the following code: ```python # Call the Language service model to get intent and entities cls_project = 'Clock' deployment_slot = 'production' with client: query = userText result = client.analyze_conversation( task={ "kind": "Conversation", "analysisInput": { "conversationItem": { "participantId": "1", "id": "1", "modality": "text", "language": "en", "text": query }, "isLoggingEnabled": False }, "parameters": { "projectName": cls_project, "deploymentName": deployment_slot, "verbose": True } } ) top_intent = result["result"]["prediction"]["topIntent"] entities = result["result"]["prediction"]["entities"] print("view top intent:") print("\ttop intent: {}".format(result["result"]["prediction"]["topIntent"])) print("\tcategory: {}".format(result["result"]["prediction"]["intents"][0]["category"])) print("\tconfidence score: {}\n".format(result["result"]["prediction"]["intents"][0]["confidenceScore"])) print("view entities:") for entity in entities: print("\tcategory: {}".format(entity["category"])) print("\ttext: {}".format(entity["text"])) print("\tconfidence score: {}".format(entity["confidenceScore"])) print("query: {}".format(result["result"]["query"])) ``` The call to the conversational understanding model returns a prediction/result, which includes the top (most likely) intent as well as any entities that were detected in the input utterance. Your client application must now use that prediction to determine and perform the appropriate action. 1. Find the comment **Apply the appropriate action**, and add the following code, which checks for intents supported by the application (**GetTime**, **GetDate**, and **GetDay**) and determines if any relevant entities have been detected, before calling an existing function to produce an appropriate response. 
```python # Apply the appropriate action if top_intent == 'GetTime': location = 'local' # Check for entities if len(entities) > 0: # Check for a location entity for entity in entities: if 'Location' == entity["category"]: # ML entities are strings, get the first one location = entity["text"] # Get the time for the specified location print(GetTime(location)) elif top_intent == 'GetDay': date_string = date.today().strftime("%m/%d/%Y") # Check for entities if len(entities) > 0: # Check for a Date entity for entity in entities: if 'Date' == entity["category"]: # Regex entities are strings, get the first one date_string = entity["text"] # Get the day for the specified date print(GetDay(date_string)) elif top_intent == 'GetDate': day = 'today' # Check for entities if len(entities) > 0: # Check for a Weekday entity for entity in entities: if 'Weekday' == entity["category"]: # List entities are lists day = entity["text"] # Get the date for the specified day print(GetDate(day)) else: # Some other intent (for example, "None") was predicted print('Try asking me for the time, the day, or the date.') ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python clock-client.py ``` 1. When prompted, enter utterances to test the application. For example, try: *Hello* *What time is it?* *What's the time in London?* *What's the date?* *What date is Sunday?* *What day is it?* *What day is 01/01/2025?* > **Note**: The logic in the application is deliberately simple, and has a number of limitations. For example, when getting the time, only a restricted set of cities is supported and daylight savings time is ignored. The goal is to see an example of a typical pattern for using Language Service in which your application must: > 1. Connect to a prediction endpoint. > 2. Submit an utterance to get a prediction. > 3. Implement logic to respond appropriately to the predicted intent and entities. 1. When you have finished testing, enter *quit*. ## Clean up resources If you're finished exploring the Azure AI Language service, you can delete the resources you created in this exercise. Here's how: 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Language resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## More information To learn more about conversational language understanding in Azure AI Language, see the [Azure AI Language documentation](https://learn.microsoft.com/azure/ai-services/language-service/conversational-language-understanding/overview). ================================================ FILE: Instructions/Labs/04-text-classification.md ================================================ --- lab: title: 'Custom text classification (deprecated)' description: "Apply custom classifications to text input using Azure AI Language." islab: false --- # Custom text classification (deprecated) > **Note**: This exercise is deprecated. Consider reviewing the QuickStart tutorial at . Azure AI Language provides several NLP capabilities, including the key phrase identification, text summarization, and sentiment analysis. The Language service also provides custom features like custom question answering and custom text classification. 
To test the custom text classification of the Azure AI Language service, you'll configure the model using Language Studio, then use a Python application to test it.

While this exercise is based on Python, you can develop text classification applications using multiple language-specific SDKs; including:

- [Azure AI Text Analytics client library for Python](https://pypi.org/project/azure-ai-textanalytics/)
- [Azure AI Text Analytics client library for .NET](https://www.nuget.org/packages/Azure.AI.TextAnalytics)
- [Azure AI Text Analytics client library for JavaScript](https://www.npmjs.com/package/@azure/ai-text-analytics)

This exercise takes approximately **35** minutes.

## Provision an *Azure AI Language* resource

If you don't already have one in your subscription, you'll need to provision an **Azure AI Language service** resource. Additionally, to use custom text classification, you need to enable the **Custom text classification & extraction** feature.

1. Open the Azure portal at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription.
1. Select **Create a resource**.
1. In the search field, search for **Language service**. Then, in the results, select **Create** under **Language Service**.
1. Select the box that includes **Custom text classification**. Then select **Continue to create your resource**.
1. Create a resource with the following settings:
    - **Subscription**: *Your Azure subscription*.
    - **Resource group**: *Select or create a resource group*.
    - **Region**: *Choose from one of the following regions*\*
        - Australia East
        - Central India
        - East US
        - East US 2
        - North Europe
        - South Central US
        - Switzerland North
        - UK South
        - West Europe
        - West US 2
        - West US 3
    - **Name**: *Enter a unique name*.
    - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available.
    - **Storage account**: New storage account
    - **Storage account name**: *Enter a unique name*.
    - **Storage account type**: Standard LRS
    - **Responsible AI notice**: Selected.
1. Select **Review + create**, then select **Create** to provision the resource.
1. Wait for deployment to complete, and then go to the resource group.
1. Find the storage account you created, select it, and verify the *Account kind* is **StorageV2**. If it's v1, upgrade your storage account kind on that resource page.

## Configure role-based access for your user

> **NOTE**: If you skip this step, you'll get a 403 error when trying to connect to your custom project. It's important that your current user has this role to access storage account blob data, even if you're the owner of the storage account.

1. Go to your storage account page in the Azure portal.
2. Select **Access Control (IAM)** in the left navigation menu.
3. Select **Add** to Add Role Assignments, and choose the **Storage Blob Data Owner** role on the storage account.
4. Within **Assign access to**, select **User, group, or service principal**.
5. Select **Select members**.
6. Select your User. You can search for user names in the **Select** field.

## Upload sample articles

Once you've created the Azure AI Language service and storage account, you'll need to upload example articles to train your model later.

1. In a new browser tab, download sample articles from `https://aka.ms/classification-articles` and extract the files to a folder of your choice.
1. In the Azure portal, navigate to the storage account you created, and select it.
1. In your storage account select **Configuration**, located below **Settings**.
In the Configuration screen enable the option to **Allow Blob anonymous access** then select **Save**. 1. Select **Containers** in the left menu, located below **Data storage**. On the screen that appears, select **+ Container**. Give the container the name `articles`, and set **Anonymous access level** to **Container (anonymous read access for containers and blobs)**. > **NOTE**: When you configure a storage account for a real solution, be careful to assign the appropriate access level. To learn more about each access level, see the [Azure Storage documentation](https://learn.microsoft.com/azure/storage/blobs/anonymous-read-access-configure). > **ADDITIONAL NOTE**: If the Anonymous access level option appears unavailable or cannot be changed, refresh the page and check again. Sometimes the portal needs to reload after recent security or configuration updates before the option becomes available. 1. After you've created the container, select it then select the **Upload** button. Select **Browse for files** to browse for the sample articles you downloaded. Then select **Upload**. ## Create a custom text classification project After configuration is complete, create a custom text classification project. This project provides a working place to build, train, and deploy your model. > **NOTE**: This lab utilizes **Language Studio**, but you can also create, build, train, and deploy your model through the REST API. 1. In a new browser tab, open the Azure AI Language Studio portal at `https://language.cognitive.azure.com/` and sign in using the Microsoft account associated with your Azure subscription. 1. If prompted to choose a Language resource, select the following settings: - **Azure Directory**: The Azure directory containing your subscription. - **Azure subscription**: Your Azure subscription. - **Resource type**: Language. - **Language resource**: The Azure AI Language resource you created previously. If you are not prompted to choose a language resource, it may be because you have multiple Language resources in your subscription; in which case: 1. On the bar at the top if the page, select the **Settings (⚙)** button. 2. On the **Settings** page, view the **Resources** tab. 3. Select the language resource you just created, and click **Switch resource**. 4. At the top of the page, click **Language Studio** to return to the Language Studio home page 1. At the top of the portal, in the **Create new** menu, select **Custom text classification**. 1. The **Connect storage** page appears. All values will already have been filled. So select **Next**. 1. On the **Select project type** page, select **Single label classification**. Then select **Next**. 1. On the **Enter basic information** pane, set the following: - **Name**: `ClassifyLab` - **Text primary language**: English (US) - **Description**: `Custom text lab` 1. Select **Next**. 1. On the **Choose container** page, set the **Blob store container** dropdown to your *articles* container. 1. Select the **No, I need to label my files as part of this project** option. Then select **Next**. 1. Select **Create project**. > **Tip**: If you get an error about not being authorized to perform this operation, you'll need to add a role assignment. To fix this, we add the role "Storage Blob Data Contributor" on the storage account for the user running the lab. 
More details can be found [on the documentation page](https://learn.microsoft.com/azure/ai-services/language-service/custom-named-entity-recognition/how-to/create-project?tabs=portal%2Clanguage-studio#enable-identity-management-for-your-resource) ## Label your data Now that your project is created, you need to label, or tag, your data to train your model how to classify text. 1. On the left, select **Data labeling**, if not already selected. You'll see a list of the files you uploaded to your storage account. 1. On the right side, in the **Activity** pane, select **+ Add class**. The articles in this lab fall into four classes you'll need to create: `Classifieds`, `Sports`, `News`, and `Entertainment`. ![Screenshot showing the tag data page and the add class button.](../media/tag-data-add-class-new.png#lightbox) 1. After you've created your four classes, select **Article 1** to start. Here you can read the article, define which class this file is, and which dataset (training or testing) to assign it to. 1. Assign each article the appropriate class and dataset (training or testing) using the **Activity** pane on the right. You can select a label from the list of labels on the right, and set each article to **training** or **testing** using the options at the bottom of the Activity pane. You select **Next document** to move to the next document. For the purposes of this lab, we'll define which are to be used for training the model and testing the model: | Article | Class | Dataset | |---------|---------|---------| | Article 1 | Sports | Training | | Article 10 | News | Training | | Article 11 | Entertainment | Testing | | Article 12 | News | Testing | | Article 13 | Sports | Testing | | Article 2 | Sports | Training | | Article 3 | Classifieds | Training | | Article 4 | Classifieds | Training | | Article 5 | Entertainment | Training | | Article 6 | Entertainment | Training | | Article 7 | News | Training | | Article 8 | News | Training | | Article 9 | Entertainment | Training | > **NOTE** > Files in Language Studio are listed alphabetically, which is why the above list is not in sequential order. Make sure you visit both pages of documents when labeling your articles. 1. Select **Save labels** to save your labels. ## Train your model After you've labeled your data, you need to train your model. 1. Select **Training jobs** on the left side menu. 1. Select **Start a training job**. 1. Train a new model named `ClassifyArticles`. 1. Select **Use a manual split of training and testing data**. > **TIP** > In your own classification projects, the Azure AI Language service will automatically split the testing set by percentage which is useful with a large dataset. With smaller datasets, it's important to train with the right class distribution. 1. Select **Train** > **IMPORTANT** > Training your model can sometimes take several minutes. You'll get a notification when it's complete. ## Evaluate your model In real world applications of text classification, it's important to evaluate and improve your model to verify it's performing as you expect. 1. Select **Model performance**, and select your **ClassifyArticles** model. There you can see the scoring of your model, performance metrics, and when it was trained. If the scoring of your model isn't 100%, it means that one of the documents used for testing didn't evaluate to what it was labeled. These failures can help you understand where to improve. 1. Select **Test set details** tab. 
If there are any errors, this tab allows you to see the articles you indicated for testing, what the model predicted them as, and whether that conflicts with their test label. The tab defaults to show incorrect predictions only. You can toggle the **Show mismatches only** option to see all the articles you indicated for testing and what the model predicted for each of them.

## Deploy your model

When you're satisfied with the training of your model, it's time to deploy it, which allows you to start classifying text through the API.

1. On the left panel, select **Deploying model**.
1. Select **Add deployment**, then enter `articles` in the **Create a new deployment name** field, and select **ClassifyArticles** in the **Model** field.
1. Select **Deploy** to deploy your model.
1. Once your model is deployed, leave that page open. You'll need your project and deployment name in the next step.

## Prepare to develop an app in Cloud Shell

To test the custom text classification capabilities of the Azure AI Language service, you'll develop a simple console application in the Azure Cloud Shell.

1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal.

    > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***.

1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.**

1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise:

    ```
    rm -r mslearn-ai-language -f
    git clone https://github.com/microsoftlearning/mslearn-ai-language
    ```

    > **Tip**: As you paste commands into the cloud shell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task.

1. After the repo has been cloned, navigate to the folder containing the application code files:

    ```
    cd mslearn-ai-language/Labfiles/04-text-classification/Python/classify-text
    ```

## Configure your application

1. In the command line pane, run the following command to view the code files in the **classify-text** folder:

    ```
    ls -a -l
    ```

    The files include a configuration file (**.env**) and a code file (**classify-text.py**). The text your application will analyze is in the **articles** subfolder.

1. Create a Python virtual environment and install the Azure AI Language Text Analytics SDK package and other required packages by running the following command:

    ```
    python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-textanalytics==5.3.0
    ```

1. Enter the following command to edit the application configuration file:

    ```
    code .env
    ```

    The file is opened in a code editor.

1. Update the configuration values to include the **endpoint** and a **key** from the Azure Language resource you created (available on the **Keys and Endpoint** page for your Azure AI Language resource in the Azure portal). The file should already contain the project and deployment names for your text classification model.

1.
After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open. ## Add code to classify documents 1. Enter the following command to edit the application code file: ``` code classify-text.py ``` 1. Review the existing code. You will add code to work with the AI Language Text Analytics SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Text Analytics SDK: ```python # import namespaces from azure.core.credentials import AzureKeyCredential from azure.ai.textanalytics import TextAnalyticsClient ``` 1. In the **main** function, note that code to load the Azure AI Language service endpoint and key and the project and deployment names from the configuration file has already been provided. Then find the comment **Create client using endpoint and key**, and add the following code to create a text analysis client: ```Python # Create client using endpoint and key credential = AzureKeyCredential(ai_key) ai_client = TextAnalyticsClient(endpoint=ai_endpoint, credential=credential) ``` 1. Note that the existing code reads all of the files in the **articles** folder and creates a list containing their contents. Then find the comment **Get Classifications** and add the following code: ```Python # Get Classifications operation = ai_client.begin_single_label_classify( batchedDocuments, project_name=project_name, deployment_name=deployment_name ) document_results = operation.result() for doc, classification_result in zip(files, document_results): if classification_result.kind == "CustomDocumentClassification": classification = classification_result.classifications[0] print("{} was classified as '{}' with confidence score {}.".format( doc, classification.category, classification.confidence_score) ) elif classification_result.is_error is True: print("{} has an error with code '{}' and message '{}'".format( doc, classification_result.error.code, classification_result.error.message) ) ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python classify-text.py ``` 1. Observe the output. The application should list a classification and confidence score for each text file. ## Clean up When you don't need your project any more, you can delete if from your **Projects** page in Language Studio. You can also remove the Azure AI Language service and associated storage account in the [Azure portal](https://portal.azure.com). ================================================ FILE: Instructions/Labs/05-extract-custom-entities.md ================================================ --- lab: title: 'Extract custom entities (deprecated)' description: "Train a model to extract customized entities from text input using Azure AI Language." islab: false --- # Extract custom entities (deprecated) > **Note**: This exercise is deprecated. Consider reviewing the QuickStart tutorial at . In addition to other natural language processing capabilities, Azure AI Language Service enables you to define custom entities, and extract instances of them from text. 
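Later in this exercise you'll write client code to call the model you deploy; as a preview, here's a minimal sketch (not the lab's code) of how the azure-ai-textanalytics SDK submits text to a deployed custom entity extraction project. The endpoint, key, and deployment name are placeholder values, not values defined by this exercise.

```python
# Minimal sketch: calling a deployed custom entity extraction model.
# The endpoint, key, and deployment name below are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"))

documents = ["Bluetooth speaker for sale. Hardly used, $20 or best offer."]

poller = client.begin_recognize_custom_entities(
    documents,
    project_name="CustomEntityLab",      # the project created in this exercise
    deployment_name="<your-deployment>"  # assumed placeholder
)

for result in poller.result():
    if not result.is_error:
        for entity in result.entities:
            print("{} ({}) confidence: {}".format(
                entity.text, entity.category, entity.confidence_score))
```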
To test the custom entity extraction, we'll create a model and train it through Azure AI Language Studio, then use a Python application to test it. While this exercise is based on Python, you can develop custom entity extraction applications using multiple language-specific SDKs; including: - [Azure AI Text Analytics client library for Python](https://pypi.org/project/azure-ai-textanalytics/) - [Azure AI Text Analytics client library for .NET](https://www.nuget.org/packages/Azure.AI.TextAnalytics) - [Azure AI Text Analytics client library for JavaScript](https://www.npmjs.com/package/@azure/ai-text-analytics) This exercise takes approximately **35** minutes. ## Provision an *Azure AI Language* resource If you don't already have one in your subscription, you'll need to provision an **Azure AI Language service** resource. Additionally, to use custom entity extraction, you need to enable the **Custom text classification & extraction** feature. 1. In a browser, open the Azure portal at `https://portal.azure.com`, and sign in with your Microsoft account. 1. Select the **Create a resource** button, search for *Language*, and create a **Language Service** resource. When on the page for *Select additional features*, select the custom feature containing **Custom named entity recognition extraction**. Create the resource with the following settings: - **Subscription**: *Your Azure subscription* - **Resource group**: *Select or create a resource group* - **Region**: *Choose from one of the following regions*\* - Australia East - Central India - East US - East US 2 - North Europe - South Central US - Switzerland North - UK South - West Europe - West US 2 - West US 3 - **Name**: *Enter a unique name* - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available. - **Storage account**: New storage account: - **Storage account name**: *Enter a unique name*. - **Storage account type**: Standard LRS - **Responsible AI notice**: Selected. 1. Select **Review + create**, then select **Create** to provision the resource. 1. Wait for deployment to complete, and then go to the deployed resource. 1. View the **Keys and Endpoint** page. You will need the information on this page later in the exercise. ## Configure role-based access for your user > **NOTE**: If you skip this step, you'll get a 403 error when trying to connect to your custom project. It's important that your current user has this role to access storage account blob data, even if you're the owner of the storage account. 1. Go to your storage account page in the Azure portal. 2. Select **Access Control (IAM)** in the left navigation menu. 3. Select **Add** to add a role assignment, and choose the **Storage Blob Data Contributor** role on the storage account. 4. Within **Assign access to**, select **User, group, or service principal**. 5. Select **Select members**. 6. Select your User. You can search for user names in the **Select** field. ## Upload sample ads After you've created the Azure AI Language Service and storage account, you'll need to upload example ads to train your model later. 1. In a new browser tab, download sample classified ads from `https://aka.ms/entity-extraction-ads` and extract the files to a folder of your choice. 2. In the Azure portal, navigate to the storage account you created, and select it. 3. In your storage account, select **Configuration**, located below **Settings**, and enable the option to **Allow Blob anonymous access**, then select **Save**. 4.
Select **Containers** from the left menu, located below **Data storage**. On the screen that appears, select **+ Container**. Give the container the name `classifieds`, and set **Anonymous access level** to **Container (anonymous read access for containers and blobs)**. > **NOTE**: When you configure a storage account for a real solution, be careful to assign the appropriate access level. To learn more about each access level, see the [Azure Storage documentation](https://learn.microsoft.com/azure/storage/blobs/anonymous-read-access-configure). 5. After creating the container, select it and click the **Upload** button and upload the sample ads you downloaded. ## Create a custom named entity recognition project Now you're ready to create a custom named entity recognition project. This project provides a working place to build, train, and deploy your model. > **NOTE**: You can also create, build, train, and deploy your model through the REST API. 1. In a new browser tab, open the Azure AI Language Studio portal at `https://language.cognitive.azure.com/` and sign in using the Microsoft account associated with your Azure subscription. 1. If prompted to choose a Language resource, select the following settings: - **Azure Directory**: The Azure directory containing your subscription. - **Azure subscription**: Your Azure subscription. - **Resource type**: Language. - **Language resource**: The Azure AI Language resource you created previously. If you are not prompted to choose a language resource, it may be because you have multiple Language resources in your subscription; in which case: 1. On the bar at the top of the page, select the **Settings (⚙)** button. 2. On the **Settings** page, view the **Resources** tab. 3. Select the language resource you just created, and click **Switch resource**. 4. At the top of the page, click **Language Studio** to return to the Language Studio home page. 1. At the top of the portal, in the **Create new** menu, select **Custom named entity recognition**. 1. Create a new project with the following settings: - **Connect storage**: *This value is likely already filled. Change it to your storage account if it isn't already* - **Basic information**: - **Name**: `CustomEntityLab` - **Text primary language**: English (US) - **Does your dataset include documents that are not in the same language?** : *No* - **Description**: `Custom entities in classified ads` - **Container**: - **Blob store container**: classifieds - **Are your files labeled with classes?**: No, I need to label my files as part of this project > **Tip**: If you get an error about not being authorized to perform this operation, you'll need to add a role assignment. To fix this, we add the role "Storage Blob Data Contributor" on the storage account for the user running the lab. More details can be found [on the documentation page](https://learn.microsoft.com/azure/ai-services/language-service/custom-named-entity-recognition/how-to/create-project?tabs=portal%2Clanguage-studio#enable-identity-management-for-your-resource) ## Label your data Now that your project is created, you need to label your data to train your model how to identity entities. 1. If the **Data labeling** page is not already open, in the pane on the left, select **Data labeling**. You'll see a list of the files you uploaded to your storage account. 1. On the right side, in the **Activity** pane, select **Add entity** and add a new entity named `ItemForSale`. 1. 
Repeat the previous step to create the following entities: - `Price` - `Location` 1. After you've created your three entities, select **Ad 1.txt** so you can read it. 1. In *Ad 1.txt*: 1. Highlight the text *face cord of firewood* and select the **ItemForSale** entity. 1. Highlight the text *Denver, CO* and select the **Location** entity. 1. Highlight the text *$90* and select the **Price** entity. 1. In the **Activity** pane, note that this document will be added to the dataset for training the model. 1. Use the **Next document** button to move to the next document, and continue assigning text to appropriate entities for the entire set of documents, adding them all to the training dataset. 1. When you have labeled the last document (*Ad 9.txt*), save the labels. ## Train your model After you've labeled your data, you need to train your model. 1. Select **Training jobs** in the pane on the left. 2. Select **Start a training job** 3. Train a new model named `ExtractAds` 4. Choose **Automatically split the testing set from training data** > **TIP**: In your own extraction projects, use the testing split that best suits your data. For more consistent data and larger datasets, the Azure AI Language Service will automatically split the testing set by percentage. With smaller datasets, it's important to train with the right variety of possible input documents. 5. Click **Train** > **IMPORTANT**: Training your model can sometimes take several minutes. You'll get a notification when it's complete. ## Evaluate your model In real world applications, it's important to evaluate and improve your model to verify it's performing as you expect. Two pages on the left show you the details of your trained model, and any testing that failed. Select **Model performance** on the left side menu, and select your `ExtractAds` model. There you can see the scoring of your model, performance metrics, and when it was trained. You'll be able to see if any testing documents failed, and these failures help you understand where to improve. ## Deploy your model When you're satisfied with the training of your model, it's time to deploy it, which allows you to start extracting entities through the API. 1. In the left pane, select **Deploying a model**. 2. Select **Add deployment**, then enter the name `AdEntities` and select the **ExtractAds** model. 3. Click **Deploy** to deploy your model. ## Prepare to develop an app in Cloud Shell To test the custom entity extraction capabilities of the Azure AI Language service, you'll develop a simple console application in the Azure Cloud Shell. 1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise: ``` rm -r mslearn-ai-language -f git clone https://github.com/microsoftlearning/mslearn-ai-language ``` > **Tip**: As you paste commands into the cloudshell, the ouput may take up a large amount of the screen buffer. 
You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. After the repo has been cloned, navigate to the folder containing the application code files: ``` cd mslearn-ai-language/Labfiles/05-custom-entity-recognition/Python/custom-entities ``` ## Configure your application 1. In the command line pane, run the following command to view the code files in the **custom-entities** folder: ``` ls -a -l ``` The files include a configuration file (**.env**) and a code file (**custom-entities.py**). The text your application will analyze is in the **ads** subfolder. 1. Create a Python virtual environment and install the Azure AI Language Text Analytics SDK package and other required packages by running the following command: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-textanalytics==5.3.0 ``` 1. Enter the following command to edit the application configuration file: ``` code .env ``` The file is opened in a code editor. 1. Update the configuration values to include the **endpoint** and a **key** from the Azure Language resource you created (available on the **Keys and Endpoint** page for your Azure AI Language resource in the Azure portal). The file should already contain the project and deployment names for your custom entity extraction model. 1. After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open. ## Add code to extract entities 1. Enter the following command to edit the application code file: ``` code custom-entities.py ``` 1. Review the existing code. You will add code to work with the AI Language Text Analytics SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Text Analytics SDK: ```python # import namespaces from azure.core.credentials import AzureKeyCredential from azure.ai.textanalytics import TextAnalyticsClient ``` 1. In the **main** function, note that code to load the Azure AI Language service endpoint and key and the project and deployment names from the configuration file has already been provided. Then find the comment **Create client using endpoint and key**, and add the following code to create a text analytics client: ```Python # Create client using endpoint and key credential = AzureKeyCredential(ai_key) ai_client = TextAnalyticsClient(endpoint=ai_endpoint, credential=credential) ``` 1. Note that the existing code reads all of the files in the **ads** folder and creates a list containing their contents.
Then find the comment **Extract entities** and add the following code: ```Python # Extract entities operation = ai_client.begin_recognize_custom_entities( batchedDocuments, project_name=project_name, deployment_name=deployment_name ) document_results = operation.result() for doc, custom_entities_result in zip(files, document_results): print(doc) if custom_entities_result.kind == "CustomEntityRecognition": for entity in custom_entities_result.entities: print( "\tEntity '{}' has category '{}' with confidence score of '{}'".format( entity.text, entity.category, entity.confidence_score ) ) elif custom_entities_result.is_error is True: print("\tError with code '{}' and message '{}'".format( custom_entities_result.error.code, custom_entities_result.error.message ) ) ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you can maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python custom-entities.py ``` 1. Observe the output. The application should list details of the entities found in each text file. ## Clean up When you don't need your project anymore, you can delete it from your **Projects** page in Language Studio. You can also remove the Azure AI Language service and associated storage account in the [Azure portal](https://portal.azure.com). ================================================ FILE: Instructions/Labs/06-translate-text.md ================================================ --- lab: title: 'Translate Text (deprecated)' description: "Translate provided text between any supported languages with Azure AI Translator." islab: false --- # Translate Text (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . **Azure AI Translator** is a service that enables you to translate text between languages. In this exercise, you'll use it to create a simple app that translates input in any supported language to the target language of your choice. While this exercise is based on Python, you can develop text translation applications using multiple language-specific SDKs; including: - [Azure AI Translation client library for Python](https://pypi.org/project/azure-ai-translation-text/) - [Azure AI Translation client library for .NET](https://www.nuget.org/packages/Azure.AI.Translation.Text) - [Azure AI Translation client library for JavaScript](https://www.npmjs.com/package/@azure-rest/ai-translation-text) This exercise takes approximately **30** minutes. ## Provision an *Azure AI Translator* resource If you don't already have one in your subscription, you'll need to provision an **Azure AI Translator** resource. 1. Open the Azure portal at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription. 1. In the search field at the top, search for **Translators**, then select **Translators** in the results. 1. Create a resource with the following settings: - **Subscription**: *Your Azure subscription* - **Resource group**: *Choose or create a resource group* - **Region**: *Choose any available region* - **Name**: *Enter a unique name* - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available. 1. Select **Review + create**, then select **Create** to provision the resource. 1. Wait for deployment to complete, and then go to the deployed resource. 1. View the **Keys and Endpoint** page. You will need the information on this page later in the exercise.
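Optionally, you can confirm that the resource works before you build the app by calling the Translator REST API directly from any Python environment. The following snippet is only an illustrative sketch (it isn't part of the exercise steps); the key and region values are placeholders that you would replace with the values from your **Keys and Endpoint** page.

```python
# Optional check (not part of the exercise steps): call the Translator REST API
# directly to confirm that your resource key and region are valid.
# The key and region values below are placeholders - replace them with your own.
import uuid
import requests

key = "<your-translator-key>"
region = "<your-resource-region>"   # for example, "eastus"

response = requests.post(
    "https://api.cognitive.microsofttranslator.com/translate",
    params={"api-version": "3.0", "to": "fr"},
    headers={
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4())
    },
    json=[{"text": "Hello, world!"}]
)
print(response.json())
```

A successful call returns a JSON array containing the French translation; an authentication error usually means the key or region value is wrong.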
## Prepare to develop an app in Cloud Shell To test the text translation capabilities of Azure AI Translator, you'll develop a simple console application in the Azure Cloud Shell. 1. In the Azure Portal, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise: ``` rm -r mslearn-ai-language -f git clone https://github.com/microsoftlearning/mslearn-ai-language ``` > **Tip**: As you enter commands into the cloudshell, the ouput may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. After the repo has been cloned, navigate to the folder containing the application code files: ``` cd mslearn-ai-language/Labfiles/06-translator-sdk/Python/translate-text ``` ## Configure your application 1. In the command line pane, run the following command to view the code files in the **translate-text** folder: ``` ls -a -l ``` The files include a configuration file (**.env**) and a code file (**translate.py**). 1. Create a Python virtual environment and install the Azure AI Translation SDK package and other required packages by running the following command: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-ai-translation-text==1.0.1 ``` 1. Enter the following command to edit the application configuration file: ``` code .env ``` The file is opened in a code editor. 1. Update the configuration values to include the **region** and a **key** from the Azure AI Translator resource you created (available on the **Keys and Endpoint** page for your Azure AI Translator resource in the Azure portal). > **NOTE**: Be sure to add the *region* for your resource, not the endpoint! 1. After you've replaced the placeholders, within the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open. ## Add code to translate text 1. Enter the following command to edit the application code file: ``` code translate.py ``` 1. Review the existing code. You will add code to work with the Azure AI Translation SDK. > **Tip**: As you add code to the code file, be sure to maintain the correct indentation. 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces** and add the following code to import the namespaces you will need to use the Translation SDK: ```python # import namespaces from azure.core.credentials import AzureKeyCredential from azure.ai.translation.text import * from azure.ai.translation.text.models import InputTextItem ``` 1. In the **main** function, note that the existing code reads the configuration settings. 1. 
Find the comment **Create client using endpoint and key** and add the following code: ```python # Create client using endpoint and key credential = AzureKeyCredential(translatorKey) client = TextTranslationClient(credential=credential, region=translatorRegion) ``` 1. Find the comment **Choose target language** and add the following code, which uses the Text Translator service to return list of supported languages for translation, and prompts the user to select a language code for the target language: ```python # Choose target language languagesResponse = client.get_supported_languages(scope="translation") print("{} languages supported.".format(len(languagesResponse.translation))) print("(See https://learn.microsoft.com/azure/ai-services/translator/language-support#translation)") print("Enter a target language code for translation (for example, 'en'):") targetLanguage = "xx" supportedLanguage = False while supportedLanguage == False: targetLanguage = input() if targetLanguage in languagesResponse.translation.keys(): supportedLanguage = True else: print("{} is not a supported language.".format(targetLanguage)) ``` 1. Find the comment **Translate text** and add the following code, which repeatedly prompts the user for text to be translated, uses the Azure AI Translator service to translate it to the target language (detecting the source language automatically), and displays the results until the user enters *quit*: ```python # Translate text inputText = "" while inputText.lower() != "quit": inputText = input("Enter text to translate ('quit' to exit):") if inputText != "quit": input_text_elements = [InputTextItem(text=inputText)] translationResponse = client.translate(body=input_text_elements, to_language=[targetLanguage]) translation = translationResponse[0] if translationResponse else None if translation: sourceLanguage = translation.detected_language for translated_text in translation.translations: print(f"'{inputText}' was translated from {sourceLanguage.language} to {translated_text.to} as '{translated_text.text}'.") ``` 1. Save your changes (CTRL+S), then enter the following command to run the program (you maximize the cloud shell pane and resize the panels to see more text in the command line pane): ``` python translate.py ``` 1. When prompted, enter a valid target language from the list displayed. 1. Enter a phrase to be translated (for example `This is a test` or `C'est un test`) and view the results, which should detect the source language and translate the text to the target language. 1. When you're done, enter `quit`. You can run the application again and choose a different target language. ## Clean up resources If you're finished exploring the Azure AI Translator service, you can delete the resources you created in this exercise. Here's how: 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Translator resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## More information For more information about using **Azure AI Translator**, see the [Azure AI Translator documentation](https://learn.microsoft.com/azure/ai-services/translator/). ================================================ FILE: Instructions/Labs/07-speech.md ================================================ --- lab: title: 'Recognize and Synthesize Speech (deprecated)' description: "Implement a speaking clock that converts speech to text, and text to speech." 
islab: false --- # Recognize and synthesize speech (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . **Azure AI Speech** is a service that provides speech-related functionality, including: - A *speech-to-text* API that enables you to implement speech recognition (converting audible spoken words into text). - A *text-to-speech* API that enables you to implement speech synthesis (converting text into audible speech). In this exercise, you'll use both of these APIs to implement a speaking clock application. While this exercise is based on Python, you can develop speech applications using multiple language-specific SDKs; including: - [Azure AI Speech SDK for Python](https://pypi.org/project/azure-cognitiveservices-speech/) - [Azure AI Speech SDK for .NET](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) - [Azure AI Speech SDK for JavaScript](https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk) This exercise takes approximately **30** minutes. > **NOTE** > This exercise is designed to be completed in the Azure cloud shell, where direct access to your computer's sound hardware is not supported. The lab will therefore use audio files for speech input and output streams. The code to achieve the same results using a mic and speaker is provided for your reference. ## Create an Azure AI Speech resource Let's start by creating an Azure AI Speech resource. 1. Open the [Azure portal](https://portal.azure.com) at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription. 1. In the top search field, search for **Speech service**. Select it from the list, then select **Create**. 1. Provision the resource using the following settings: - **Subscription**: *Your Azure subscription*. - **Resource group**: *Choose or create a resource group*. - **Region**:*Choose any available region* - **Name**: *Enter a unique name*. - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available. 1. Select **Review + create**, then select **Create** to provision the resource. 1. Wait for deployment to complete, and then go to the deployed resource. 1. View the **Keys and Endpoint** page in the **Resource Management** section. You will need the information on this page later in the exercise. ## Prepare and configure the speaking clock app 1. Leaving the **Keys and Endpoint** page open, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise: ``` rm -r mslearn-ai-language -f git clone https://github.com/microsoftlearning/mslearn-ai-language ``` > **Tip**: As you enter commands into the cloudshell, the ouput may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. 
After the repo has been cloned, navigate to the folder containing the speaking clock application code files: ``` cd mslearn-ai-language/Labfiles/07-speech/Python/speaking-clock ``` 1. In the command line pane, run the following command to view the code files in the **speaking-clock** folder: ``` ls -a -l ``` The files include a configuration file (**.env**) and a code file (**speaking-clock.py**). The audio files your application will use are in the **audio** subfolder. 1. Create a Python virtual environment and install the Azure AI Speech SDK package and other required packages by running the following command: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-cognitiveservices-speech==1.42.0 ``` 1. Enter the following command to edit the configuration file: ``` code .env ``` The file is opened in a code editor. 1. Update the configuration values to include the **region** and a **key** from the Azure AI Speech resource you created (available on the **Keys and Endpoint** page for your Azure AI Speech resource in the Azure portal). 1. After you've replaced the placeholders, use the **CTRL+S** command to save your changes and then use the **CTRL+Q** command to close the code editor while keeping the cloud shell command line open. ## Add code to use the Azure AI Speech SDK > **Tip**: As you add code, be sure to maintain the correct indentation. 1. Enter the following command to edit the code file that has been provided: ``` code speaking-clock.py ``` 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces**. Then, under this comment, add the following language-specific code to import the namespaces you will need to use the Azure AI Speech SDK: ```python # Import namespaces from azure.core.credentials import AzureKeyCredential import azure.cognitiveservices.speech as speech_sdk ``` 1. In the **main** function, under the comment **Get config settings**, note that the code loads the key and region you defined in the configuration file. 1. Find the comment **Configure speech service**, and add the following code to use the AI Services key and your region to configure your connection to the Azure AI Services Speech endpoint: ```python # Configure speech service speech_config = speech_sdk.SpeechConfig(speech_key, speech_region) print('Ready to use speech service in:', speech_config.region) ``` 1. Save your changes (*CTRL+S*), but leave the code editor open. ## Run the app So far, the app doesn't do anything other than connect to your Azure AI Speech service, but it's useful to run it and check that it works before adding speech functionality. 1. In the command line, enter the following command to run the speaking clock app: ``` python speaking-clock.py ``` The code should display the region of the speech service resource the application will use. A successful run indicates that the app has connected to your Azure AI Speech resource. ## Add code to recognize speech Now that you have a **SpeechConfig** for your Azure AI Speech resource, you can use the **Speech-to-text** API to recognize speech and transcribe it to text. In this procedure, the speech input is captured from an audio file, which you can play here: 1. In the code file, note that the code uses the **TranscribeCommand** function to accept spoken input.
Then in the **TranscribeCommand** function, find the comment **Configure speech recognition** and add the appropriate code below to create a **SpeechRecognizer** client that can be used to recognize and transcribe speech from an audio file: ```python # Configure speech recognition current_dir = os.getcwd() audioFile = current_dir + '/time.wav' audio_config = speech_sdk.AudioConfig(filename=audioFile) speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config) ``` 1. In the **TranscribeCommand** function, under the comment **Process speech input**, add the following code to listen for spoken input, being careful not to replace the code at the end of the function that returns the command: ```python # Process speech input print("Listening...") speech = speech_recognizer.recognize_once_async().get() if speech.reason == speech_sdk.ResultReason.RecognizedSpeech: command = speech.text print(command) else: print(speech.reason) if speech.reason == speech_sdk.ResultReason.Canceled: cancellation = speech.cancellation_details print(cancellation.reason) print(cancellation.error_details) ``` 1. Save your changes (*CTRL+S*), and then in the command line below the code editor, re-run the program: 1. Review the output, which should successfully "hear" the speech in the audio file and return an appropriate response (note that your Azure cloud shell may be running on a server that is in a different time-zone to yours!) > **Tip**: If the SpeechRecognizer encounters an error, it produces a result of "Cancelled". The code in the application will then display the error message. The most likely cause is an incorrect region value in the configuration file. ## Synthesize speech Your speaking clock application accepts spoken input, but it doesn't actually speak! Let's fix that by adding code to synthesize speech. Once again, due to the hardware limitations of the cloud shell we'll direct the synthesized speech output to a file. 1. In the code file, note that the code uses the **TellTime** function to tell the user the current time. 1. In the **TellTime** function, under the comment **Configure speech synthesis**, add the following code to create a **SpeechSynthesizer** client that can be used to generate spoken output: ```python # Configure speech synthesis output_file = "output.wav" speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural" audio_config = speech_sdk.audio.AudioConfig(filename=output_file) speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config, audio_config,) ``` 1. In the **TellTime** function, under the comment **Synthesize spoken output**, add the following code to generate spoken output, being careful not to replace the code at the end of the function that prints the response: ```python # Synthesize spoken output speak = speech_synthesizer.speak_text_async(response_text).get() if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted: print(speak.reason) else: print("Spoken output saved in " + output_file) ``` 1. Save your changes (*CTRL+S*) and re-run the program, which should indicate that the spoken output was saved in a file. 1. If you have a media player capable of playing .wav audio files, download the file that was generated by entering the following command: ``` download ./output.wav ``` The download command creates a popup link at the bottom right of your browser, which you can select to download and open the file. 
The file should sound similar to this: ## Use Speech Synthesis Markup Language Speech Synthesis Markup Language (SSML) enables you to customize the way your speech is synthesized using an XML-based format. 1. In the **TellTime** function, replace all of the current code under the comment **Synthesize spoken output** with the following code (leave the code under the comment **Print the response**): ```python # Synthesize spoken output responseSsml = " \ <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'> \ <voice name='en-GB-LibbyNeural'> \ {} \ <break strength='weak'/> \ Time to end this lab! \ </voice> \ </speak>".format(response_text) speak = speech_synthesizer.speak_ssml_async(responseSsml).get() if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted: print(speak.reason) else: print("Spoken output saved in " + output_file) ``` 1. Save your changes and re-run the program, which should once again indicate that the spoken output was saved in a file. 1. Download and play the generated file, which should sound similar to this: ## Clean up If you've finished exploring Azure AI Speech, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs. 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Speech resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## What if you have a mic and speaker? In this exercise, the Azure Cloud Shell environment we used doesn't support audio hardware, so you used audio files for the speech input and output. Let's see how the code can be modified to use audio hardware if you have it available. ### Using speech recognition with a microphone If you have a mic, you can use the following code to capture spoken input for speech recognition: ```python # Configure speech recognition audio_config = speech_sdk.AudioConfig(use_default_microphone=True) speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config) print('Speak now...') # Process speech input speech = speech_recognizer.recognize_once_async().get() if speech.reason == speech_sdk.ResultReason.RecognizedSpeech: command = speech.text print(command) else: print(speech.reason) if speech.reason == speech_sdk.ResultReason.Canceled: cancellation = speech.cancellation_details print(cancellation.reason) print(cancellation.error_details) ``` > **Note**: The system default microphone is the default audio input, so you could also just omit the AudioConfig altogether! ### Using speech synthesis with a speaker If you have a speaker, you can use the following code to synthesize speech. ```python response_text = 'The time is {}:{:02d}'.format(now.hour,now.minute) # Configure speech synthesis speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural" audio_config = speech_sdk.audio.AudioOutputConfig(use_default_speaker=True) speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config, audio_config) # Synthesize spoken output speak = speech_synthesizer.speak_text_async(response_text).get() if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted: print(speak.reason) ``` > **Note**: The system default speaker is the default audio output, so you could also just omit the AudioConfig altogether! ## More information For more information about using the **Speech-to-text** and **Text-to-speech** APIs, see the [Speech-to-text documentation](https://learn.microsoft.com/azure/ai-services/speech-service/index-speech-to-text) and [Text-to-speech documentation](https://learn.microsoft.com/azure/ai-services/speech-service/index-text-to-speech).
================================================ FILE: Instructions/Labs/08-translate-speech.md ================================================ --- lab: title: 'Translate Speech (deprecated)' description: "Translate language speech to speech and implement in your own app." islab: false --- # Translate Speech (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . Azure AI Speech includes a speech translation API that you can use to translate spoken language. For example, suppose you want to develop a translator application that people can use when traveling in places where they don't speak the local language. They would be able to say phrases such as "Where is the station?" or "I need to find a pharmacy" in their own language, and have it translate them to the local language. In this exercise, you'll use the Azure AI Speech SDK for Python to create a simple application based on this example. While this exercise is based on Python, you can develop speech translation applications using multiple language-specific SDKs; including: - [Azure AI Speech SDK for Python](https://pypi.org/project/azure-cognitiveservices-speech/) - [Azure AI Speech SDK for .NET](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) - [Azure AI Speech SDK for JavaScript](https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk) This exercise takes approximately **30** minutes. > **NOTE** > This exercise is designed to be completed in the Azure cloud shell, where direct access to your computer's sound hardware is not supported. The lab will therefore use audio files for speech input and output streams. The code to achieve the same results using a mic and speaker is provided for your reference. ## Create an Azure AI Speech resource Let's start by creating an Azure AI Speech resource. 1. Open the [Azure portal](https://portal.azure.com) at `https://portal.azure.com`, and sign in using the Microsoft account associated with your Azure subscription. 1. In the top search field, search for **Speech service**. Select it from the list, then select **Create**. 1. Provision the resource using the following settings: - **Subscription**: *Your Azure subscription*. - **Resource group**: *Choose or create a resource group*. - **Region**:*Choose any available region* - **Name**: *Enter a unique name*. - **Pricing tier**: Select **F0** (*free*), or **S** (*standard*) if F is not available. 1. Select **Review + create**, then select **Create** to provision the resource. 1. Wait for deployment to complete, and then go to the deployed resource. 1. View the **Keys and Endpoint** page in the **Resource Management** section. You will need the information on this page later in the exercise. ## Prepare to develop an app in Cloud Shell 1. Leaving the **Keys and Endpoint** page open, use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. 
In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise: ``` rm -r mslearn-ai-language -f git clone https://github.com/microsoftlearning/mslearn-ai-language ``` > **Tip**: As you enter commands into the cloud shell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. After the repo has been cloned, navigate to the folder containing the code files: ``` cd mslearn-ai-language/Labfiles/08-speech-translation/Python/translator ``` 1. In the command line pane, run the following command to view the code files in the **translator** folder: ``` ls -a -l ``` The files include a configuration file (**.env**) and a code file (**translator.py**). 1. Create a Python virtual environment and install the Azure AI Speech SDK package and other required packages by running the following command: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-cognitiveservices-speech==1.42.0 ``` 1. Enter the following command to edit the configuration file that has been provided: ``` code .env ``` The file is opened in a code editor. 1. Update the configuration values to include the **region** and a **key** from the Azure AI Speech resource you created (available on the **Keys and Endpoint** page for your Azure AI Speech resource in the Azure portal). 1. After you've replaced the placeholders, use the **CTRL+S** command to save your changes and then use the **CTRL+Q** command to close the code editor while keeping the cloud shell command line open. ## Add code to use the Azure AI Speech SDK > **Tip**: As you add code, be sure to maintain the correct indentation. 1. Enter the following command to edit the code file that has been provided: ``` code translator.py ``` 1. At the top of the code file, under the existing namespace references, find the comment **Import namespaces**. Then, under this comment, add the following language-specific code to import the namespaces you will need to use the Azure AI Speech SDK: ```python # Import namespaces from azure.core.credentials import AzureKeyCredential import azure.cognitiveservices.speech as speech_sdk ``` 1. In the **main** function, under the comment **Get config settings**, note that the code loads the key and region you defined in the configuration file. 1. Find the comment **Configure translation**, and add the following code to configure your connection to the Azure AI Services Speech endpoint: ```python # Configure translation translation_config = speech_sdk.translation.SpeechTranslationConfig(speech_key, speech_region) translation_config.speech_recognition_language = 'en-US' translation_config.add_target_language('fr') translation_config.add_target_language('es') translation_config.add_target_language('hi') print('Ready to translate from',translation_config.speech_recognition_language) ``` 1. You will use the **SpeechTranslationConfig** to translate speech into text, but you will also use a **SpeechConfig** to synthesize translations into speech. Add the following code under the comment **Configure speech**: ```python # Configure speech speech_config = speech_sdk.SpeechConfig(speech_key, speech_region) print('Ready to use speech service in:', speech_config.region) ``` 1. Save your changes (*CTRL+S*), but leave the code editor open.
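Optionally, before you run the app, you can add a quick sanity check of the translation configuration. The following snippet is just an illustrative sketch (not a required lab step); it assumes the `translation_config` object created above and uses the Speech SDK's `target_languages` property to list the languages you added.

```python
# Optional sanity check (illustrative only, not a required lab step):
# print the recognition language and the target languages configured above.
print('Recognition language:', translation_config.speech_recognition_language)
for language in translation_config.target_languages:
    print('Target language:', language)
```

If the three language codes (*fr*, *es*, and *hi*) are listed, the configuration code was added correctly.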
## Run the app So far, the app doesn't do anything other than connect to your Azure AI Speech resource, but it's useful to run it and check that it works before adding speech functionality. 1. In the command line, enter the following command to run the translator app: ``` python translator.py ``` The code should display the region of the speech service resource the application will use, a message that it is ready to translate from en-US and prompt you for a target language. A successful run indicates that the app has connected to your Azure AI Speech service. Press ENTER to end the program. ## Implement speech translation Now that you have a **SpeechTranslationConfig** for the Azure AI Speech service, you can use the Azure AI Speech translation API to recognize and translate speech. 1. In the code file, note that the code uses the **Translate** function to translate spoken input. Then in the **Translate** function, under the comment **Translate speech**, add the following code to create a **TranslationRecognizer** client that can be used to recognize and translate speech from a file. ```python # Translate speech current_dir = os.getcwd() audioFile = current_dir + '/station.wav' audio_config_in = speech_sdk.AudioConfig(filename=audioFile) translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config = audio_config_in) print("Getting speech from file...") result = translator.recognize_once_async().get() print('Translating "{}"'.format(result.text)) translation = result.translations[targetLanguage] print(translation) ``` 1. Save your changes (*CTRL+S*), and re-run the program: ``` python translator.py ``` 1. When prompted, enter a valid language code (*fr*, *es*, or *hi*). The program should transcribe your input file and translate it to the language you specified (French, Spanish, or Hindi). Repeat this process, trying each language supported by the application. > **NOTE**: The translation to Hindi may not always be displayed correctly in the Console window due to character encoding issues. 1. When you're finished, press ENTER to end the program. > **NOTE**: The code in your application translates the input to all three languages in a single call. Only the translation for the specific language is displayed, but you could retrieve any of the translations by specifying the target language code in the **translations** collection of the result. ## Synthesize the translation to speech So far, your application translates spoken input to text; which might be sufficient if you need to ask someone for help while traveling. However, it would be better to have the translation spoken aloud in a suitable voice. > **Note**: Due to the hardware limitations of the cloud shell, we'll direct the synthesized speech output to a file. 1. 
In the **Translate** function, find the comment **Synthesize translation**, and add the following code to use a **SpeechSynthesizer** client to synthesize the translation as speech and save it as a .wav file: ```python # Synthesize translation output_file = "output.wav" voices = { "fr": "fr-FR-HenriNeural", "es": "es-ES-ElviraNeural", "hi": "hi-IN-MadhurNeural" } speech_config.speech_synthesis_voice_name = voices.get(targetLanguage) audio_config_out = speech_sdk.audio.AudioConfig(filename=output_file) speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config, audio_config_out) speak = speech_synthesizer.speak_text_async(translation).get() if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted: print(speak.reason) else: print("Spoken output saved in " + output_file) ``` 1. Save your changes (*CTRL+S*), and re-run the program: ``` python translator.py ``` 1. Review the output from the application, which should indicate that the spoken output translation was saved in a file. When you're finished, press **ENTER** to end the program. 1. If you have a media player capable of playing .wav audio files, download the file that was generated by entering the following command: ``` download ./output.wav ``` The download command creates a popup link at the bottom right of your browser, which you can select to download and open the file. > **NOTE** > *In this example, you've used a **SpeechTranslationConfig** to translate speech to text, and then used a **SpeechConfig** to synthesize the translation as speech. You can in fact use the **SpeechTranslationConfig** to synthesize the translation directly, but this only works when translating to a single language, and results in an audio stream that is typically saved as a file.* ## Clean up resources If you're finished exploring the Azure AI Speech service, you can delete the resources you created in this exercise. Here's how: 1. Close the Azure cloud shell pane 1. In the Azure portal, browse to the Azure AI Speech resource you created in this lab. 1. On the resource page, select **Delete** and follow the instructions to delete the resource. ## What if you have a mic and speaker? In this exercise, the Azure Cloud Shell environment we used doesn't support audio hardware, so you used audio files for the speech input and output. Let's see how the code can be modified to use audio hardware if you have it available. ### Using speech translation with a microphone 1. If you have a mic, you can use the following code to capture spoken input for speech translation: ```python # Translate speech audio_config_in = speech_sdk.AudioConfig(use_default_microphone=True) translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config = audio_config_in) print("Speak now...") result = translator.recognize_once_async().get() print('Translating "{}"'.format(result.text)) translation = result.translations[targetLanguage] print(translation) ``` > **Note**: The system default microphone is the default audio input, so you could also just omit the AudioConfig altogether! ### Using speech synthesis with a speaker 1. If you have a speaker, you can use the following code to synthesize speech. 
```python # Synthesize translation voices = { "fr": "fr-FR-HenriNeural", "es": "es-ES-ElviraNeural", "hi": "hi-IN-MadhurNeural" } speech_config.speech_synthesis_voice_name = voices.get(targetLanguage) audio_config_out = speech_sdk.audio.AudioOutputConfig(use_default_speaker=True) speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config, audio_config_out) speak = speech_synthesizer.speak_text_async(translation).get() if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted: print(speak.reason) ``` > **Note**: The system default speaker is the default audio output, so you could also just omit the AudioConfig altogether! ## More information For more information about using the Azure AI Speech translation API, see the [Speech translation documentation](https://learn.microsoft.com/azure/ai-services/speech-service/speech-translation). ================================================ FILE: Instructions/Labs/09-audio-chat.md ================================================ --- lab: title: 'Develop an audio-enabled chat app (deprecated)' description: 'Learn how to use Azure AI Foundry to build a generative AI app that supports audio input.' islab: false --- # Develop an audio-enabled chat app (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . In this exercise, you use the *Phi-4-multimodal-instruct* generative AI model to generate responses to prompts that include audio files. You'll develop an app that provides AI assistance for a produce supplier company by using Azure AI Foundry and the Python OpenAI SDK to summarize voice messages left by customers. While this exercise is based on Python, you can develop similar applications using multiple language-specific SDKs; including: - [Azure AI Projects for Python](https://pypi.org/project/azure-ai-projects) - [OpenAI library for Python](https://pypi.org/project/openai/) - [Azure AI Projects for Microsoft .NET](https://www.nuget.org/packages/Azure.AI.Projects) - [Azure OpenAI client library for Microsoft .NET](https://www.nuget.org/packages/Azure.AI.OpenAI) - [Azure AI Projects for JavaScript](https://www.npmjs.com/package/@azure/ai-projects) - [Azure OpenAI library for TypeScript](https://www.npmjs.com/package/@azure/openai) This exercise takes approximately **30** minutes. ## Create an Azure AI Foundry project Let's start by deploying a model in an Azure AI Foundry project. 1. In a web browser, open the [Azure AI Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the **Azure AI Foundry** logo at the top left to navigate to the home page, which looks similar to the following image: ![Screenshot of Azure AI Foundry portal.](../media/ai-foundry-home.png) 1. In the home page, in the **Explore models and capabilities** section, search for the `Phi-4-multimodal-instruct` model; which we'll use in our project. 1. In the search results, select the **Phi-4-multimodal-instruct** model to see its details, and then at the top of the page for the model, select **Use this model**. 1. When prompted to create a project, enter a valid name for your project and expand **Advanced options**. 1. 
Select **Customize** and specify the following settings for your hub: - **Azure AI Foundry resource**: *A valid name for your Azure AI Foundry resource* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: *Select any **AI Foundry recommended***\* > \* Some Azure AI resources are constrained by regional model quotas. In the event of a quota limit being exceeded later in the exercise, there's a possibility you may need to create another resource in a different region. You can check the latest regional availability for specific models in the [Azure AI Foundry documentation](https://learn.microsoft.com/azure/ai-foundry/how-to/deploy-models-serverless-availability#region-availability) 1. Select **Create** and wait for your project to be created. It may take a few moments for the operation to complete. 1. Select **Agree and Proceed** to agree to the model terms, then select **Deploy** to complete the Phi model deployment. 1. When your project is created, the model details will be opened automatically. Note the name of your model deployment; which should be **Phi-4-multimodal-instruct** 1. In the navigation pane on the left, select **Overview** to see the main page for your project; which looks like this: > **Note**: If an *Insufficient permissions** error is displayed, use the **Fix me** button to resolve it. ![Screenshot of a Azure AI Foundry project overview page.](../media/ai-foundry-project.png) ## Create a client application Now that you deployed a model, you can use the Azure AI Foundry and Azure AI Model Inference SDKs to develop an application that chats with it. > **Tip**: You can choose to develop your solution using Python or Microsoft C#. Follow the instructions in the appropriate section for your chosen language. ### Prepare the application configuration 1. In the Azure AI Foundry portal, view the **Overview** page for your project. 1. In the **Project details** area, note the **Azure AI Foundry project endpoint**. You'll use this endpoint to connect to your project in a client application. 1. Open a new browser tab (keeping the Azure AI Foundry portal open in the existing tab). Then in the new tab, browse to the [Azure portal](https://portal.azure.com) at `https://portal.azure.com`; signing in with your Azure credentials if prompted. Close any welcome notifications to see the Azure portal home page. 1. Use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a ***PowerShell*** environment with no storage in your subscription. The cloud shell provides a command-line interface in a pane at the bottom of the Azure portal. You can resize or maximize this pane to make it easier to work in. > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, switch it to ***PowerShell***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). **Ensure you've switched to the classic version of the cloud shell before continuing.** 1. 
In the cloud shell pane, enter the following commands to clone the GitHub repo containing the code files for this exercise (type the command, or copy it to the clipboard and then right-click in the command line and paste as plain text): ``` rm -r mslearn-ai-language -f git clone https://github.com/MicrosoftLearning/mslearn-ai-language ``` > **Tip**: As you paste commands into the cloud shell, the output may take up a large amount of the screen buffer. You can clear the screen by entering the `cls` command to make it easier to focus on each task. 1. After the repo has been cloned, navigate to the folder containing the application code files: ``` cd mslearn-ai-language/Labfiles/09-audio-chat/Python ``` 1. In the cloud shell command line pane, enter the following command to install the libraries you'll use: ``` python -m venv labenv; ./labenv/bin/Activate.ps1; pip install -r requirements.txt azure-identity azure-ai-projects openai ``` 1. Enter the following command to edit the configuration file that has been provided: ``` code .env ``` The file should open in a code editor. 1. In the code file, replace the **your_project_endpoint** placeholder with the endpoint for your project (copied from the project **Overview** page in the Azure AI Foundry portal), and the **your_model_deployment** placeholder with the name you assigned to your Phi-4-multimodal-instruct model deployment. 1. After you replace the placeholders, in the code editor, use the **CTRL+S** command or **Right-click > Save** to save your changes and then use the **CTRL+Q** command or **Right-click > Quit** to close the code editor while keeping the cloud shell command line open. ### Write code to connect to your project and get a chat client for your model > **Tip**: As you add code, be sure to maintain the correct indentation. 1. Enter the following command to edit the code file: ``` code audio-chat.py ``` 1. In the code file, note the existing statements that have been added at the top of the file to import the necessary SDK namespaces. Then, find the comment **Add references** and add the following code to reference the namespaces in the libraries you installed previously: ```python # Add references from azure.identity import DefaultAzureCredential from azure.ai.projects import AIProjectClient ``` 1. In the **main** function, under the comment **Get configuration settings**, note that the code loads the project endpoint and model deployment name values you defined in the configuration file. 1. Find the comment **Initialize the project client** and add the following code to connect to your Azure AI Foundry project: > **Tip**: Be careful to maintain the correct indentation level for your code. ```python # Initialize the project client project_client = AIProjectClient( credential=DefaultAzureCredential( exclude_environment_credential=True, exclude_managed_identity_credential=True ), endpoint=project_endpoint, ) ``` 1. Find the comment **Get a chat client** and add the following code to create a client object for chatting with your model: ```python # Get a chat client openai_client = project_client.get_openai_client(api_version="2024-10-21") ``` ### Write code to submit an audio-based prompt Before submitting the prompt, we need to encode the audio file for the request. Then we can attach the audio data to the user's message with a prompt for the LLM. Note that the code includes a loop to allow the user to input a prompt until they enter "quit". 1.
Under the comment **Encode the audio file**, enter the following code to prepare the following audio file: ```python # Encode the audio file file_path = "https://github.com/MicrosoftLearning/mslearn-ai-language/raw/refs/heads/main/Labfiles/09-audio-chat/data/avocados.mp3" response = requests.get(file_path) response.raise_for_status() audio_data = base64.b64encode(response.content).decode('utf-8') ``` 1. Under the comment **Get a response to audio input**, add the following code to submit a prompt: ```python # Get a response to audio input response = openai_client.chat.completions.create( model=model_deployment, messages=[ {"role": "system", "content": system_message}, { "role": "user", "content": [ { "type": "text", "text": prompt }, { "type": "input_audio", "input_audio": { "data": audio_data, "format": "mp3" } } ] } ] ) print(response.choices[0].message.content) ``` 1. Use the **CTRL+S** command to save your changes to the code file. You can also close the code editor (**CTRL+Q**) if you like. ### Sign into Azure and run the app 1. In the cloud shell command-line pane, enter the following command to sign into Azure. ``` az login ``` **You must sign into Azure - even though the cloud shell session is already authenticated.** > **Note**: In most scenarios, just using *az login* will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the *--tenant* parameter. See [Sign into Azure interactively using the Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively) for details. 1. When prompted, follow the instructions to open the sign-in page in a new tab and enter the authentication code provided and your Azure credentials. Then complete the sign in process in the command line, selecting the subscription containing your Azure AI Foundry hub if prompted. 1. In the cloud shell command-line pane, enter the following command to run the app: ``` python audio-chat.py ``` 1. When prompted, enter the prompt ``` Can you summarize this customer's voice message? ``` 1. Review the response. ### Use a different audio file 1. In the code editor for your app code, find the code you added previously under the comment **Encode the audio file**. Then modify the file path url as follows to use a different audio file for the request (leaving the existing code after the file path): ```python # Encode the audio file file_path = "https://github.com/MicrosoftLearning/mslearn-ai-language/raw/refs/heads/main/Labfiles/09-audio-chat/data/fresas.mp3" ``` The new file sounds like this: 1. Use the **CTRL+S** command to save your changes to the code file. You can also close the code editor (**CTRL+Q**) if you like. 1. In the cloud shell command line pane beneath the code editor, enter the following command to run the app: ``` python audio-chat.py ``` 1. When prompted, enter the following prompt: ``` Can you summarize this customer's voice message? Is it time-sensitive? ``` 1. Review the response. Then enter `quit` to exit the program. > **Note**: In this simple app, we haven't implemented logic to retain conversation history; so the model will treat each prompt as a new request with no context of the previous prompt. 1. You can continue to run the app, choosing different prompt types and trying different prompts. When you're finished, enter `quit` to exit the program. If you have time, you can modify the code to use a different system prompt and your own internet-accessible audio files. 
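If you want to go further, the sketch below (not part of the exercise files) shows one way you could retain conversation history so that follow-up prompts keep their context, as mentioned in the note above. It assumes the `openai_client`, `model_deployment`, and `system_message` variables already defined in *audio-chat.py*, and for simplicity it sends text-only prompts; you could append the same `input_audio` content items shown earlier to any user message.

```python
# Minimal sketch (not part of the exercise): keep a running message list so the
# model sees previous turns. Assumes openai_client, model_deployment, and
# system_message are already defined as in audio-chat.py.
messages = [{"role": "system", "content": system_message}]

while True:
    prompt = input("Enter a prompt (or 'quit'): ")
    if prompt.lower() == "quit":
        break

    # Add the user's text prompt to the running conversation
    messages.append({"role": "user", "content": [{"type": "text", "text": prompt}]})

    # Send the whole conversation so far with each request
    response = openai_client.chat.completions.create(
        model=model_deployment,
        messages=messages,
    )
    reply = response.choices[0].message.content
    print(reply)

    # Store the assistant's reply so later prompts have context
    messages.append({"role": "assistant", "content": reply})
```

Because the full message list is sent on every call, the model can refer back to earlier turns; in a real app you'd also want to trim or summarize long histories to stay within the model's context window.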
## Summary In this exercise, you used the Azure AI Foundry and OpenAI SDKs to create a client application that uses a multimodal model to generate responses to audio. ## Clean up If you've finished exploring Azure AI Foundry, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs. 1. Return to the browser tab containing the Azure portal (or re-open the [Azure portal](https://portal.azure.com) at `https://portal.azure.com` in a new browser tab) and view the contents of the resource group where you deployed the resources used in this exercise. 1. On the toolbar, select **Delete resource group**. 1. Enter the resource group name and confirm that you want to delete it. ================================================ FILE: Instructions/Labs/10-voice-live-api.md ================================================ --- lab: title: 'Explore the Voice Live API (deprecated)' description: 'Learn how to use, and customize, the Voice Live API available in the Azure AI Foundry Playground.' islab: false --- # Explore the Voice Live API (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . In this exercise you create an agent in Azure AI Foundry and explore the Voice Live API in the Speech Playground. This exercise takes approximately **30** minutes to complete. > **Note**: Some of the technologies used in this exercise are currently in preview or in active development. You may experience some unexpected behavior, warnings, or errors. > **Note**: This exercise is designed to be completed in a browser environment with direct access to your computer's microphone. While the concepts can be explored in Azure Cloud Shell, the interactive voice features require local audio hardware access. ## Create an Azure AI Foundry project Let's start by creating an Azure AI Foundry project. 1. In a web browser, open the [Azure AI Foundry portal](https://ai.azure.com) at `https://ai.azure.com` and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the **Azure AI Foundry** logo at the top left to navigate to the home page, which looks similar to the following image (close the **Help** pane if it's open): ![Screenshot of Azure AI Foundry home page with create an agent selected.](../media/ai-foundry-new-home-page.png) 1. In the home page, select **Create an agent**. 1. In the **Create an agent** wizard, enter a valid name for your project. 1. Select **Advanced options** and specify the following settings: - **Azure AI Foundry resource**: *Keep the default name* - **Subscription**: *Your Azure subscription* - **Resource group**: *Create or select a resource group* - **Region**: Randomly select a region from the following options:\* - East US 2 - Sweden Central > \* At the time of writing, the Voice Live API is only supported in the previously listed regions. Selecting a location randomly helps ensure a single region isn't overwhelmed with traffic, and helps you have a smoother experience. In the event of service limits being reached, there's a possibility you may need to create another project in a different region. 1. Select **Create** and review your configuration. Wait for the setup process to complete.
>**Note**: If you receive a permissions error, select the **Fix it** button to add the appropriate permissions to continue. 1. When your project is created, you will be brought by default to the Agents playground in Azure AI Foundry portal, which should look similar to the following image: ![Screenshot of a Azure AI project details in Azure AI Foundry portal.](../media/ai-foundry-project-2.png) ## Start a Voice Live sample In this section of the exercise you interact with one of the agents. 1. Select **Playgrounds** in the navigation pane. 1. Locate the **Speech playground** group, and select the **Try the Speech playground** button. 1. The Speech Playground offers many pre-built options. Use the horizontal scroll bar to navigate to the end of the list and select the **Voice Live** tile. ![Screenshot of the Voice Live tile.](../media/voice-live-tile.png) 1. Select the **Casual chat** agent sample in **Try with samples** panel. 1. Ensure your microphone and speakers are working and select the **Start** button at the bottom of the page. As you interact with the agent, notice you can interrupt the agent and it will pause to listen. Try speaking with different lengths of pauses between words and sentences. Notice how quickly the agent recognizes the pauses and fills in the conversation. When you're finished select the **End** button. 1. Start the other sample agents to explore how they behave. As you explore the different agents note the changes in the **Response instruction** section in the **Configuration** panel. ## Configure the agent In this section you change the voice of the agent, and add an avatar to the **Casual chat** agent. The **Configuration** panel is divided into three sections: **GenAI**, **Speech**, and **Avatar**. >**Note:** If you change, or interact with, any of the configuration options you need to select the **Apply** button at the bottom of the **Configuration** panel to enable the agent. Select the **Casual chat** agent. Next, change the voice of the agent, and add an avatar, with the following instructions: 1. Select **> Speech** to expand the section and access the options. 1. Select the drop-down menu in the **Voice** option and choose a different voice. 1. Select **Apply** to save your changes, and then **Start** to launch the agent and hear your change. Repeat the previous steps to try a few different voices. Proceed to the next step when you're finished with the voice selection. 1. Select **> Avatar** to expand the section and access the options. 1. Select the toggle button to enable the feature and select one of the avatars. 1. Select **Apply** to save your changes, and then **Start** to launch the agent. Notice the avatar's animation and synchronization to the audio. 1. Expand the **> GenAI** section and set the **Proactive engagement** toggle to the off position. Next, select **Apply** to save your changes, and then **Start** to launch the agent. With the **Proactive engagement** turned off, the agent doesn't initiate the conversation. Ask the agent "Can you tell me what you do?" to start the conversation. >**Tip:** You can select **Reset to default** and then **Apply** to return the agent to its default behavior. When you're finished, proceed to the next section. ## Create a voice agent In this section you create your own voice agent from scratch. 1. Select **Start from blank** in the **Try with your own** section of the panel. 1. Expand the **> GenAI** section of the **Configuration** panel. 1. 
Select the **Generative AI model** drop-down menu and choose the **GPT-4o Mini Realtime** model. 1. Add the following text to the **Response instruction** section. ``` You are a voice agent named "Ava" who acts as a friendly car rental agent. ``` 1. Set the **Response temperature** slider to a value of **0.8**. 1. Set the **Proactive engagement** toggle to the on position. 1. Select **Apply** to save your changes, and then **Start** to launch the agent. The agent will introduce itself and ask how it can help you today. Ask the agent "Do you have any sedans available for rent on Thursday?" Notice how long it takes the agent to respond. Ask the agent other questions to see how it responds. When you're finished, proceed to the next step. 1. Expand the **> Speech** section of the **Configuration** panel. 1. Set the **End of utterance (EOU)** toggle button to the **on** position. 1. Set the **Audio enhancement** toggle button to the **on** position. 1. Select **Apply** to save your changes, and then **Start** to launch the agent. After the agent introduces itself, ask it "Do you have any planes for rent?" Notice that the agent responds more quickly after you finish your question than it did earlier. The **End of utterance (EOU)** setting configures the agent to detect pauses and your end of speech based on context and semantics. This enables it to have a more natural conversation. When you're finished, proceed to the next section. ## Clean up resources Now that you've finished the exercise, delete the project you created to avoid unnecessary resource usage. 1. Select **Management center** in the AI Foundry navigation menu. 1. Select **Delete project** in the right information pane, and then confirm the deletion. ================================================ FILE: Instructions/Labs/11-voice-live-agent-web.md ================================================ --- lab: title: 'Develop an Azure AI Voice Live voice agent (deprecated)' description: 'Learn how to create a web app to enable real-time voice interactions with an Azure AI Voice Live agent.' islab: false --- # Develop an Azure AI Voice Live voice agent (deprecated) > **Note**: This exercise is deprecated. Consider completing the replacement exercise at . In this exercise, you complete a Flask-based Python web app that enables real-time voice interactions with an agent. You add the code to initialize the session and handle session events. You use a deployment script that: deploys the AI model; creates an image of the app in Azure Container Registry (ACR) using ACR tasks; and then creates an Azure App Service instance that pulls the image. To test the app you will need an audio device with microphone and speaker capabilities. While this exercise is based on Python, you can develop similar applications using other language-specific SDKs; including: - [Azure VoiceLive client library for .NET](https://www.nuget.org/packages/Azure.AI.VoiceLive/) Tasks performed in this exercise: - Download the base files for the app - Add code to complete the web app - Review the overall code base - Update and run the deployment script - View and test the application This exercise takes approximately **30** minutes to complete. ## Launch the Azure Cloud Shell and download the files In this section of the exercise you download a zipped file containing the base files for the app. 1. In your browser, navigate to the Azure portal [https://portal.azure.com](https://portal.azure.com); signing in with your Azure credentials if prompted. 1.
Use the **[\>_]** button to the right of the search bar at the top of the page to create a new cloud shell in the Azure portal, selecting a ***Bash*** environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal. > **Note**: If you have previously created a cloud shell that uses a *PowerShell* environment, switch it to ***Bash***. 1. In the cloud shell toolbar, in the **Settings** menu, select **Go to Classic version** (this is required to use the code editor). 1. Run the following commands in the **Bash** shell to create a project folder, and download and unzip the exercise files. ```bash mkdir voice-live-web && cd voice-live-web ``` ```bash wget https://github.com/MicrosoftLearning/mslearn-ai-language/raw/refs/heads/main/downloads/python/voice-live-web.zip ``` ``` unzip voice-live-web.zip ``` ## Add code to complete the web app Now that the exercise files are downloaded, the next step is to add code to complete the application. The following steps are performed in the cloud shell. >**Tip:** Resize the cloud shell to display more information, and code, by dragging the top border. You can also use the minimize and maximize buttons to switch between the cloud shell and the main portal interface. Run the following command to change into the *src* directory before you continue with the exercise. ```bash cd src ``` ### Add code to implement the voice live assistant In this section you add code to implement the voice live assistant. The **\_\_init\_\_** method initializes the voice assistant by storing the Azure VoiceLive connection parameters (endpoint, credentials, model, voice, and system instructions) and setting up runtime state variables to manage the connection lifecycle and handle user interruptions during conversations. The **start** method imports the necessary Azure VoiceLive SDK components that will be used to establish the WebSocket connection and configure the real-time voice session. 1. Run the following command to open the *flask_app.py* file for editing. ```bash code flask_app.py ``` 1. Search for the **# BEGIN VOICE LIVE ASSISTANT IMPLEMENTATION - ALIGN CODE WITH COMMENT** comment in the code. Copy the code below and enter it just below the comment. Be sure to check the indentation. ```python def __init__( self, endpoint: str, credential, model: str, voice: str, instructions: str, state_callback=None, ): # Store Azure Voice Live connection and configuration parameters self.endpoint = endpoint self.credential = credential self.model = model self.voice = voice self.instructions = instructions # Initialize runtime state - connection established in start() self.connection = None self._response_cancelled = False # Used to handle user interruptions self._stopping = False # Signals graceful shutdown self.state_callback = state_callback or (lambda *_: None) async def start(self): # Import Voice Live SDK components needed for establishing connection and configuring session from azure.ai.voicelive.aio import connect # type: ignore from azure.ai.voicelive.models import ( RequestSession, ServerVad, AzureStandardVoice, Modality, InputAudioFormat, OutputAudioFormat, ) # type: ignore ``` 1. Enter **ctrl+s** to save your changes and keep the editor open for the next section. ### Add code to configure the voice live session In this section you add code to configure the voice live session.
This specifies the modalities (audio-only is not supported by the API), the system instructions that define the assistant's behavior, the Azure TTS voice for responses, the audio format for both input and output streams, and Server-side Voice Activity Detection (VAD) which specifies how the model detects when users start and stop speaking. 1. Search for the **# BEGIN CONFIGURE VOICE LIVE SESSION - ALIGN CODE WITH COMMENT** comment in the code. Copy the code below and enter it just below the comment. Be sure to check the indentation. ```python # Configure VoiceLive session with audio/text modalities and voice activity detection session_config = RequestSession( modalities=[Modality.TEXT, Modality.AUDIO], instructions=self.instructions, voice=voice_cfg, input_audio_format=InputAudioFormat.PCM16, output_audio_format=OutputAudioFormat.PCM16, turn_detection=ServerVad(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500), ) await conn.session.update(session=session_config) ``` 1. Enter **ctrl+s** to save your changes and keep the editor open for the next section. ### Add code to handle session events In this section you add code to add event handlers for the voice live session. The event handlers respond to key VoiceLive session events during the conversation lifecycle: **_handle_session_updated** signals when the session is ready for user input, **_handle_speech_started** detects when the user begins speaking and implements interruption logic by stopping any ongoing assistant audio playback and canceling in-progress responses to allow natural conversation flow, and **_handle_speech_stopped** handles when the user has finished speaking and the assistant begins processing the input. 1. Search for the **# BEGIN HANDLE SESSION EVENTS - ALIGN CODE WITH COMMENT** comment in the code. Copy the code below and enter it just below the comment, be sure to check the indentation. ```python async def _handle_event(self, event, conn, verbose=False): """Handle Voice Live events with clear separation by event type.""" # Import event types for processing different Voice Live server events from azure.ai.voicelive.models import ServerEventType event_type = event.type if verbose: _broadcast({"type": "log", "level": "debug", "event_type": str(event_type)}) # Route Voice Live server events to appropriate handlers if event_type == ServerEventType.SESSION_UPDATED: await self._handle_session_updated() elif event_type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED: await self._handle_speech_started(conn) elif event_type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED: await self._handle_speech_stopped() elif event_type == ServerEventType.RESPONSE_AUDIO_DELTA: await self._handle_audio_delta(event) elif event_type == ServerEventType.RESPONSE_AUDIO_DONE: await self._handle_audio_done() elif event_type == ServerEventType.RESPONSE_DONE: # Reset cancellation flag but don't change state - _handle_audio_done already did self._response_cancelled = False elif event_type == ServerEventType.ERROR: await self._handle_error(event) async def _handle_session_updated(self): """Session is ready for conversation.""" self.state_callback("ready", "Session ready. 
You can start speaking now.") async def _handle_speech_started(self, conn): """User started speaking - handle interruption if needed.""" self.state_callback("listening", "Listening… speak now") try: # Stop any ongoing audio playback on the client side _broadcast({"type": "control", "action": "stop_playback"}) # If assistant is currently speaking or processing, cancel the response to allow interruption current_state = assistant_state.get("state") if current_state in {"assistant_speaking", "processing"}: self._response_cancelled = True await conn.response.cancel() _broadcast({"type": "log", "level": "debug", "msg": f"Interrupted assistant during {current_state}"}) else: _broadcast({"type": "log", "level": "debug", "msg": f"User speaking during {current_state} - no cancellation needed"}) except Exception as e: _broadcast({"type": "log", "level": "debug", "msg": f"Exception in speech handler: {e}"}) async def _handle_speech_stopped(self): """User stopped speaking - processing input.""" self.state_callback("processing", "Processing your input…") async def _handle_audio_delta(self, event): """Stream assistant audio to clients.""" if self._response_cancelled: return # Skip cancelled responses # Update state when assistant starts speaking if assistant_state.get("state") != "assistant_speaking": self.state_callback("assistant_speaking", "Assistant speaking…") # Extract and broadcast Voice Live audio delta as base64 to WebSocket clients audio_data = getattr(event, "delta", None) if audio_data: audio_b64 = base64.b64encode(audio_data).decode("utf-8") _broadcast({"type": "audio", "audio": audio_b64}) async def _handle_audio_done(self): """Assistant finished speaking.""" self._response_cancelled = False self.state_callback("ready", "Assistant finished. You can speak again.") async def _handle_error(self, event): """Handle Voice Live errors.""" error = getattr(event, "error", None) message = getattr(error, "message", "Unknown error") if error else "Unknown error" self.state_callback("error", f"Error: {message}") def request_stop(self): self._stopping = True ``` 1. Enter **ctrl+s** to save your changes and keep the editor open for the next section. ### Review the code in the app So far, you've added code to the app to implement the agent and handle agent events. Take a few minutes to review the full code and comments to get a better understanding of how the app is handling client state and operations. 1. When you're finished enter **ctrl+q** to exit out of the editor. ## Update and run the deployment script In this section you make a small change to the **azdeploy.sh** deployment script and then run the deployment. ### Update the deployment script There are only two values you should change at the top of the **azdeploy.sh** deployment script. - The **rg** value specifies the resource group to contain the deployment. You can accept the default value, or enter your own value if you need to deploy to a specific resource group. - The **location** value sets the region for the deployment. The *gpt-4o* model used in the exercise can be deployed to other regions, but there can be limits in any particular region. If the deployment fails in your chosen region, try **eastus2** or **swedencentral**. ``` rg="rg-voicelive" # Replace with your resource group location="eastus2" # Or a location near you ``` 1. Run the following commands in the Cloud Shell to begin editing the deployment script. ```bash cd ~/voice-live-web ``` ```bash code azdeploy.sh ``` 1. 
Update the values for **rg** and **location** to meet your needs and then enter **ctrl+s** to save your changes and **ctrl+q** to exit the editor. ### Run the deployment script The deployment script deploys the AI model and creates the necessary resources in Azure to run a containerized app in App Service. 1. Run the following command in the Cloud Shell to begin deploying the Azure resources and the application. ```bash bash azdeploy.sh ``` 1. Select **option 1** for the initial deployment. The deployment should complete in 5-10 minutes. During the deployment you might be prompted for the following information/actions: - If you are prompted to authenticate to Azure, follow the directions presented to you. - If you are prompted to select a subscription, use the arrow keys to highlight your subscription and press **Enter**. - You will likely see some warnings during deployment and these can be ignored. - If the deployment fails during the AI model deployment, change the region in the deployment script and try again. - Regions in Azure can get busy at times and interrupt the timing of the deployments. If the deployment fails after the model deployment, re-run the deployment script. ## View and test the app When the deployment completes, a "Deployment complete!" message appears in the shell along with a link to the web app. You can select that link, or navigate to the App Service resource and launch the app from there. It can take a few minutes for the application to load. 1. Select the **Start session** button to connect to the model. 1. You will be prompted to give the application access to your audio devices. 1. Begin talking to the model when the app prompts you to start speaking. Troubleshooting: - If the app reports missing environment variables, restart the application in App Service. - If you see excessive *audio chunk* messages in the log shown in the application, select **Stop session** and then start the session again. - If the app fails to function at all, double-check that you added all of the code with the proper indentation. If you need to make any changes, re-run the deployment and select **option 2** to only update the image. ## Clean up resources Run the following command in the Cloud Shell to remove all of the resources deployed for this exercise. You will be prompted to confirm the resource deletion. ``` azd down --purge ``` ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2023 Microsoft Learning Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================ FILE: Labfiles/01-analyze-text/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/requirements.txt ================================================ python-dotenv azure-identity azure-ai-textanalytics==5.3.0 ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/reviews/review1.txt ================================================ Good Hotel and staff The Royal Hotel, London, UK 3/2/2018 Clean rooms, good service, great location near Buckingham Palace and Westminster Abbey, and so on. We thoroughly enjoyed our stay. The courtyard is very peaceful and we went to a restaurant which is part of the same group and is Indian ( West coast so plenty of fish) with a Michelin Star. We had the taster menu which was fabulous. The rooms were very well appointed with a kitchen, lounge, bedroom and enormous bathroom. Contact me at alex@contoso.com for more details. ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/reviews/review2.txt ================================================ Tired hotel with poor service The Royal Hotel, London, United Kingdom 5/6/2018 This is a old hotel (has been around since 1950's) and the room furnishings are average - becoming a bit old now and require changing. The internet didn't work and had to come to one of their office rooms to check in for my flight home. My colleague John Smith says it's the worst hotel he's ever stayed in, and I'm inclined to agree! ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/reviews/review3.txt ================================================ Good location and helpful staff, but on a busy road. The Lombard Hotel, San Francisco, USA 8/16/2018 We stayed here in August after reading reviews. We were very pleased with location, just behind Chestnut Street, a cosmopolitan and trendy area with plenty of restaurants to choose from. The Marina district was lovely to wander through, very interesting houses. Make sure to walk to the San Francisco Museum of Fine Arts and the Marina to get a good view of Golden Gate bridge and the city. On a bus route and easy to get into centre. Rooms were clean with plenty of room and staff were friendly and helpful. The only down side was the noise from Lombard Street so ask to have a room furthest away from traffic noise. ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/reviews/review4.txt ================================================ Very noisy and rooms are tiny The Lombard Hotel, San Francisco, USA 9/5/2018 Hotel is located on Lombard street which is a very busy SIX lane street directly off the Golden Gate Bridge. Traffic from early morning until late at night especially on weekends. Noise would not be so bad if rooms were better insulated but they are not. Had to put cotton balls in my ears to be able to sleep--was too tired to enjoy the city the next day. Rooms are TINY. I picked the room because it had two queen size beds--but the room barely had space to fit them. With family of four in the room it was tight. With all that said, rooms are clean and they've made an effort to update them. The hotel is in Marina district with lots of good places to eat, within walking distance to Presidio. 
May be good hotel for young stay-up-late adults on a budget ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/reviews/review5.txt ================================================ Un hôtel agréable L'Hotel Buckingham, Londres, UK J’adore cet hôtel. Le personnel est très amical et les chambres sont confortables. ================================================ FILE: Labfiles/01-analyze-text/Python/text-analysis/text-analysis.py ================================================ from dotenv import load_dotenv import os # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') # Create client using endpoint # Analyze each text file in the reviews folder reviews_folder = 'reviews' for file_name in os.listdir(reviews_folder): # Read the file contents print('\n-------------\n' + file_name) text = open(os.path.join(reviews_folder, file_name), encoding='utf8').read() print('\n' + text) # Get language # Get entities # Get PII except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/02-language-agent/Python/text-agent/requirements.txt ================================================ python-dotenv azure-identity azure-ai-projects==2.0.0b4 ================================================ FILE: Labfiles/02-language-agent/Python/text-agent/text-agent.py ================================================ from dotenv import load_dotenv import os # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') agent_name = os.getenv('AGENT_NAME') # Get project client # Get an OpenAI client # Use the agent to get a response except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/02-qna/Python/qna-app/qna-app.py ================================================ from dotenv import load_dotenv import os # Import namespaces def main(): try: # Get Configuration Settings load_dotenv() ai_endpoint = os.getenv('AI_SERVICE_ENDPOINT') ai_key = os.getenv('AI_SERVICE_KEY') ai_project_name = os.getenv('QA_PROJECT_NAME') ai_deployment_name = os.getenv('QA_DEPLOYMENT_NAME') # Create client using endpoint and key # Submit a question and display the answer except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/02-qna/Python/qna-app/requirements.txt ================================================ python-dotenv ================================================ FILE: Labfiles/02-qna/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/02-qna/ask-question.sh ================================================ prediction_url="YOUR_ENDPOINT_URL" key="YOUR_KEY" curl -X POST $prediction_url -H "Ocp-Apim-Subscription-Key: $key" -H "Content-Type: application/json" -d "{'question': 'What is a learning Path?' 
}" ================================================ FILE: Labfiles/03-gen-ai-speech/Python/generate-speech/generate-speech.py ================================================ import os from pathlib import Path from playsound3 import playsound from dotenv import load_dotenv # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() endpoint=os.getenv("MODEL_ENDPOINT") model_deployment=os.getenv("MODEL_NAME") speech_file_path = Path(__file__).parent / "speech.mp3" # Create the Azure OpenAI client # Generate speech and save to file # Play the generated speech file playsound(speech_file_path) except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/03-gen-ai-speech/Python/generate-speech/requirements.txt ================================================ python-dotenv playsound3 azure-identity openai ================================================ FILE: Labfiles/03-gen-ai-speech/Python/transcribe-speech/requirements.txt ================================================ python-dotenv playsound3 azure-identity openai ================================================ FILE: Labfiles/03-gen-ai-speech/Python/transcribe-speech/transcribe-speech.py ================================================ import os from pathlib import Path from playsound3 import playsound from dotenv import load_dotenv # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() endpoint = os.getenv("MODEL_ENDPOINT") model_deployment = os.getenv("MODEL_NAME") file_path = Path(__file__).parent / "speech.wav" # Play the speech file playsound(file_path) # Create the Azure OpenAI client # Call model to transcribe audio file except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/03-language/Clock.json ================================================ { "api-version": "2021-11-01-preview", "metadata": { "name": "Clock", "description": "Natural language clock", "type": "Conversation", "multilingual": false, "language": "en-us", "settings": { "confidenceThreshold": 0 } }, "assets": { "intents": [ { "name": "None" }, { "name": "GetTime" }, { "name": "GetDay" }, { "name": "GetDate" } ], "entities": [ { "name": "Location", "compositionSetting": "ReturnLongestOverlap", "list": null, "prebuiltEntities": null }, { "name": "Weekday", "compositionSetting": "ReturnLongestOverlap", "list": { "sublists": [ { "listKey": "Saturday", "synonyms": [ { "language": "en-us", "values": [ "Sat" ] } ] }, { "listKey": "Friday", "synonyms": [ { "language": "en-us", "values": [ "Fri" ] } ] }, { "listKey": "Thursday", "synonyms": [ { "language": "en-us", "values": [ "Thu", "Thur", "Thurs" ] } ] }, { "listKey": "Wednesday", "synonyms": [ { "language": "en-us", "values": [ "Wed", "Weds" ] } ] }, { "listKey": "Tuesday", "synonyms": [ { "language": "en-us", "values": [ "Tue", "Tues" ] } ] }, { "listKey": "Monday", "synonyms": [ { "language": "en-us", "values": [ "Mon" ] } ] }, { "listKey": "Sunday", "synonyms": [ { "language": "en-us", "values": [ "Sun" ] } ] } ] }, "prebuiltEntities": null }, { "name": "Date", "compositionSetting": "ReturnLongestOverlap", "list": null, "prebuiltEntities": [ { "displayName": "DateTime", "semanticType": "DateTime", "semanticSubtype": null } ] } ], "examples": [ { "text": "what day will it be on Dec 31st 2099?", 
"language": "en-us", "intent": "GetDay", "entities": [ { "entityName": "Date", "offset": 23, "length": 13 } ], "dataset": "Train" }, { "text": "what day was 01/01/1901?", "language": "en-us", "intent": "GetDay", "entities": [ { "entityName": "Date", "offset": 13, "length": 10 } ], "dataset": "Train" }, { "text": "what will the date be on Thurs?", "language": "en-us", "intent": "GetDate", "entities": [ { "entityName": "Weekday", "offset": 25, "length": 5 } ], "dataset": "Train" }, { "text": "what date will it be on Friday?", "language": "en-us", "intent": "GetDate", "entities": [ { "entityName": "Weekday", "offset": 24, "length": 6 } ], "dataset": "Train" }, { "text": "what date was it on Saturday?", "language": "en-us", "intent": "GetDate", "entities": [ { "entityName": "Weekday", "offset": 20, "length": 8 } ], "dataset": "Train" }, { "text": "what's the time in New York?", "language": "en-us", "intent": "GetTime", "entities": [ { "entityName": "Location", "offset": 19, "length": 8 } ], "dataset": "Train" }, { "text": "tell me the time in Paris?", "language": "en-us", "intent": "GetTime", "entities": [ { "entityName": "Location", "offset": 20, "length": 5 } ], "dataset": "Train" }, { "text": "what time is it in London?", "language": "en-us", "intent": "GetTime", "entities": [ { "entityName": "Location", "offset": 19, "length": 6 } ], "dataset": "Train" }, { "text": "what's today's date?", "language": "en-us", "intent": "GetDate", "entities": [], "dataset": "Train" }, { "text": "what is the date today?", "language": "en-us", "intent": "GetDate", "entities": [], "dataset": "Train" }, { "text": "what's the date?", "language": "en-us", "intent": "GetDate", "entities": [], "dataset": "Train" }, { "text": "what date is it?", "language": "en-us", "intent": "GetDate", "entities": [], "dataset": "Train" }, { "text": "what day of the week is it?", "language": "en-us", "intent": "GetDay", "entities": [], "dataset": "Train" }, { "text": "what is the day today?", "language": "en-us", "intent": "GetDay", "entities": [], "dataset": "Train" }, { "text": "what's the day?", "language": "en-us", "intent": "GetDay", "entities": [], "dataset": "Train" }, { "text": "what day is it?", "language": "en-us", "intent": "GetDay", "entities": [], "dataset": "Train" }, { "text": "tell me the time", "language": "en-us", "intent": "GetTime", "entities": [], "dataset": "Train" }, { "text": "what time is it?", "language": "en-us", "intent": "GetTime", "entities": [], "dataset": "Train" }, { "text": "what's the time?", "language": "en-us", "intent": "GetTime", "entities": [], "dataset": "Train" }, { "text": "what is the time?", "language": "en-us", "intent": "GetTime", "entities": [], "dataset": "Train" } ] } } ================================================ FILE: Labfiles/03-language/Python/clock-client/clock-client.py ================================================ from dotenv import load_dotenv import os import json from datetime import datetime, timedelta, date, timezone from dateutil.parser import parse as is_date # Import namespaces def main(): try: # Get Configuration Settings load_dotenv() ls_prediction_endpoint = os.getenv('LS_CONVERSATIONS_ENDPOINT') ls_prediction_key = os.getenv('LS_CONVERSATIONS_KEY') # Get user input (until they enter "quit") userText = '' while userText.lower() != 'quit': userText = input('\nEnter some text ("quit" to stop)\n') if userText.lower() != 'quit': # Create a client for the Language service model # Call the Language service model to get intent and entities # Apply the appropriate 
action except Exception as ex: print(ex) def GetTime(location): time_string = '' # Note: To keep things simple, we'll ignore daylight savings time and support only a few cities. # In a real app, you'd likely use a web service API (or write more complex code!) # Hopefully this simplified example is enough to get the the idea that you # use LU to determine the intent and entities, then implement the appropriate logic if location.lower() == 'local': now = datetime.now() time_string = '{}:{:02d}'.format(now.hour,now.minute) elif location.lower() == 'london': utc = datetime.now(timezone.utc) time_string = '{}:{:02d}'.format(utc.hour,utc.minute) elif location.lower() == 'sydney': time = datetime.now(timezone.utc) + timedelta(hours=11) time_string = '{}:{:02d}'.format(time.hour,time.minute) elif location.lower() == 'new york': time = datetime.now(timezone.utc) + timedelta(hours=-5) time_string = '{}:{:02d}'.format(time.hour,time.minute) elif location.lower() == 'nairobi': time = datetime.now(timezone.utc) + timedelta(hours=3) time_string = '{}:{:02d}'.format(time.hour,time.minute) elif location.lower() == 'tokyo': time = datetime.now(timezone.utc) + timedelta(hours=9) time_string = '{}:{:02d}'.format(time.hour,time.minute) elif location.lower() == 'delhi': time = datetime.now(timezone.utc) + timedelta(hours=5.5) time_string = '{}:{:02d}'.format(time.hour,time.minute) else: time_string = "I don't know what time it is in {}".format(location) return time_string def GetDate(day): date_string = 'I can only determine dates for today or named days of the week.' weekdays = { "monday":0, "tuesday":1, "wednesday":2, "thursday":3, "friday":4, "saturday":5, "sunday":6 } today = date.today() # To keep things simple, assume the named day is in the current week (Sunday to Saturday) day = day.lower() if day == 'today': date_string = today.strftime("%m/%d/%Y") elif day in weekdays: todayNum = today.weekday() weekDayNum = weekdays[day] offset = weekDayNum - todayNum date_string = (today + timedelta(days=offset)).strftime("%m/%d/%Y") return date_string def GetDay(date_string): # Note: To keep things simple, dates must be entered in US format (MM/DD/YYYY) try: date_object = datetime.strptime(date_string, "%m/%d/%Y") day_string = date_object.strftime("%A") except: day_string = 'Enter a date in MM/DD/YYYY format.' 
return day_string if __name__ == "__main__": main() ================================================ FILE: Labfiles/03-language/Python/clock-client/requirements.txt ================================================ python-dotenv python-dateutil ================================================ FILE: Labfiles/03-language/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/03-language/send-call.sh ================================================ curl -X POST "" \ -H "Ocp-Apim-Subscription-Key: " \ -H "Apim-Request-Id: " \ -H "Content-Type: application/json" \ -d "{\"kind\":\"Conversation\",\"analysisInput\":{\"conversationItem\":{\"id\":\"1\",\"text\":\"What's the time in Sydney\",\"modality\":\"text\",\"language\":\"EN\",\"participantId\":\"1\"}},\"parameters\":{\"projectName\":\"Clock\",\"verbose\":true,\"deploymentName\":\"production\",\"stringIndexType\":\"TextElement_V8\"}}" ================================================ FILE: Labfiles/04-azure-speech/Python/voice-mail/requirements.txt ================================================ python-dotenv playsound3 azure-identity azure-cognitiveservices-speech==1.48.2 ================================================ FILE: Labfiles/04-azure-speech/Python/voice-mail/voice-mail.py ================================================ from dotenv import load_dotenv import os from playsound3 import playsound # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') foundry_key = os.getenv('FOUNDRY_KEY') # Create speech_config using Entra ID authentication # Loop until user quits inputText = "" while inputText.lower() != "3": inputText = input("Choose an option:\n1: Record a greeting\n2: Transcribe messages\n3: Exit\n") if inputText != "3": if inputText == "1": record_greeting(speech_config) elif inputText == "2": transcribe_messages(speech_config) elif inputText == "3": print("Exiting...") return else: print("Invalid option, please try again.") except Exception as ex: print(ex) # record_greeting function def record_greeting(speech_config): print("Recording greeting...") # Get greeting message from user greeting_message = input("Enter your greeting message: ") # Synthesize the greeting message to an audio file # transcribe_messages function def transcribe_messages(speech_config): print("Transcribing messages...") messages_folder = 'messages' for file_name in os.listdir(messages_folder): if file_name.endswith('.wav'): print(f"\nTranscribing {file_name}...") file_path = os.path.join(messages_folder, file_name) playsound(file_path) # Transcribe the audio file if __name__ == "__main__": main() ================================================ FILE: Labfiles/04-text-classification/Python/classify-text/articles/test1.txt ================================================ Celebrities come out for the big awards ceremony The stars of television and cinema were out in force on Thursday night for the first awards event of the season. The Contoso Awards celebrate artistic achievements in TV and file, and highlight the emerging stars we love to watch! 
================================================ FILE: Labfiles/04-text-classification/Python/classify-text/articles/test2.txt ================================================ League best, worst XIs: Man United stars Pogba, Maguire had season to forget; Kane, Son shone for Spurs After a final day of maximum drama, the glittering prizes in the League are decided: Real Contoso champions for the fourth time in five years, London foiled in a photo finish, Fabrikam in the League and Adatum United marching on into another season in the top flight. But how about the individual accolades? And how about those who would probably prefer to forget the season? Everyone will have their own ideas about the real movers and shakers, so without further ado, here are this observer's best and worst teams of 2021-22. ================================================ FILE: Labfiles/04-text-classification/Python/classify-text/classify-text.py ================================================ from dotenv import load_dotenv import os # Import namespaces def main(): try: # Get Configuration Settings load_dotenv() ai_endpoint = os.getenv('AI_SERVICE_ENDPOINT') ai_key = os.getenv('AI_SERVICE_KEY') project_name = os.getenv('PROJECT') deployment_name = os.getenv('DEPLOYMENT') # Create client using endpoint and key # Read each text file in the articles folder batchedDocuments = [] articles_folder = 'articles' files = os.listdir(articles_folder) for file_name in files: # Read the file contents text = open(os.path.join(articles_folder, file_name), encoding='utf8').read() batchedDocuments.append(text) # Get Classifications except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/04-text-classification/Python/classify-text/requirements.txt ================================================ python-dotenv ================================================ FILE: Labfiles/04-text-classification/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/04-text-classification/classify-text.ps1 ================================================ # Update these with your service and model values $key="" $endpoint="" $projectName = "ClassifyLab" $deploymentName = "articles" $verbose = $false # Set up headers for API call $headers = @{} $headers.add("Ocp-Apim-Subscription-Key", $key) $headers.add("Content-Type","application/json") # Get text to classify $text_file = "test1.txt" if ($args.count -gt 0) { $text_file = $args[0] } try { $contents = get-content .\$text_file -raw -ErrorAction Stop } catch { Write-Host "`nError reading provided file, please verify file exists`n" Exit } # Build body of for API call $data = @{ "tasks" = @( @{ "kind" = "CustomSingleLabelClassification"; "taskName" = "Single Classification Label"; "parameters" = @{ "projectName" = $projectName; "deploymentName" = $deploymentName; } } ) "analysisInput" = @{ "documents" = @( @{ "id" = "doc1"; "language" = "en-us"; "text" = $contents; } ) } } | ConvertTo-Json -Depth 3 # Post text for classification Write-Host("`n***Submitting text classification task***") $response = Invoke-WebRequest -Method Post ` -Uri "$($endpoint)language/analyze-text/jobs?api-version=2023-04-01" ` -Headers $headers ` -Body $data # Output response if desired if ($verbose) { Write-Host("`nResponse header:$($response.Headers['Operation-Location'])`n") } # Extract the URL from the response # to call the API to getting the 
analysis results $resultUrl = $($response.Headers['Operation-Location']) # Create the header for the REST GET with only the subscription key $resultHeaders = @{} $resultHeaders.Add( "Ocp-Apim-Subscription-Key", $key ) # Get the results # Continue to request results until the analysis is "succeeded" Write-Host "Getting results..." Do { $result = Invoke-RestMethod -Method Get ` -Uri $resultUrl ` -Headers $resultHeaders | ConvertTo-Json -Depth 10 $analysis = ($result | ConvertFrom-Json) } while ($analysis.status -ne "succeeded") Write-Host "...Done`n" # Access the relevant fields from the analysis $classification = $result | ConvertFrom-Json $docs = $classification.tasks.items[0].results.documents # Output response if desired if ($verbose) { Write-Host("GET JSON Response:`n$result`n") } for (($idx = 0); $idx -lt $docs.Length; $idx++) { $item = $docs[$idx] Write-Host ("Document #", ($idx+1)) Write-Host (" - Predicted Category: ", $($item.class[0].category)) Write-Host (" - Confidence: ",$($item.class[0].confidenceScore)) } ================================================ FILE: Labfiles/04-text-classification/test1.txt ================================================ Investigating the potential for life around the stars When the world’s most powerful telescope launches into space this year, scientists will learn whether Earth-sized planets in our 'solar neighborhood' have a key prerequisite for life — an atmosphere. These planets orbit an M-dwarf, the smallest and most common type of star in the galaxy. Scientists do not currently know how common it is for Earth-like planets around this type of star to have characteristics that would make them habitable. "As a starting place, it is important to know whether small, rocky planets orbiting M-dwarfs have atmospheres," said Sydney Mattos, a doctoral student in Bellows College’s Department of Earth and Planetary Sciences. "If so, it opens up our search for life outside our solar system." ================================================ FILE: Labfiles/04-text-classification/test2.txt ================================================ League best, worst XIs: Man United stars Pogba, Maguire had season to forget; Kane, Son shone for Spurs After a final day of maximum drama, the glittering prizes in the League are decided: Real Contoso champions for the fourth time in five years, London foiled in a photo finish, Fabrikam in the League and Adatum United marching on into another season in the top flight. But how about the individual accolades? And how about those who would probably prefer to forget the season? Everyone will have their own ideas about the real movers and shakers, so without further ado, here are this observer's best and worst teams of 2021-22. ================================================ FILE: Labfiles/05-custom-entity-recognition/Python/custom-entities/ads/test1.txt ================================================ Bluetooth earbuds, $100. These work okay, but sometimes disconnect from the phone. I'm sure someone more technical that me could figure it out. Located in Sacramento, CA ================================================ FILE: Labfiles/05-custom-entity-recognition/Python/custom-entities/ads/test2.txt ================================================ Dog harness for sale, $20. Good condition, puppy just outgrew it. Local meet up in Tucson, AZ in a public location. 
================================================ FILE: Labfiles/05-custom-entity-recognition/Python/custom-entities/custom-entities.py ================================================ from dotenv import load_dotenv import os # import namespaces def main(): try: # Get Configuration Settings load_dotenv() ai_endpoint = os.getenv('AI_SERVICE_ENDPOINT') ai_key = os.getenv('AI_SERVICE_KEY') project_name = os.getenv('PROJECT') deployment_name = os.getenv('DEPLOYMENT') # Create client using endpoint and key # Read each text file in the ads folder batchedDocuments = [] ads_folder = 'ads' files = os.listdir(ads_folder) for file_name in files: # Read the file contents text = open(os.path.join(ads_folder, file_name), encoding='utf8').read() batchedDocuments.append(text) # Extract entities except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/05-custom-entity-recognition/Python/custom-entities/requirements.txt ================================================ python-dotenv ================================================ FILE: Labfiles/05-custom-entity-recognition/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/05-custom-entity-recognition/extract-entities.ps1 ================================================ # Update these with your service and model values $key="" $endpoint="" $projectName = "customNERLab" $modelName = "customExtractAds" $verbose = $false # Set up headers for API call $headers = @{} $headers.add("Ocp-Apim-Subscription-Key", $key) $headers.add("Content-Type","application/json") # Get text to extract entities from $text_file = "test1.txt" if ($args.count -gt 0) { $text_file = $args[0] } try { $contents = get-content $text_file -raw -ErrorAction Stop } catch { Write-Host "`nError reading provided file, please verify file exists`n" Exit } # Build body of for API call $data = @{ "tasks" = @{ "customEntityRecognitionTasks" = @( @{ "parameters"= @{ "project-name" = $projectName "deployment-name" = $modelName } } ) } "analysisInput" = @{ "documents" = @( @{ "id" = "document_extractEntities" "text" = $contents } ) } } | ConvertTo-Json -Depth 6 # Post text for entity recognition Write-Host("`nSubmitting entity recognition task`n") $response = Invoke-WebRequest -Method Post ` -Uri "$($endpoint)text/analytics/v3.2-preview.2/analyze" ` -Headers $headers ` -Body $data # Output response if desired if ($verbose) { Write-Host("`nResponse header:$($response.Headers['Operation-Location'])`n") } # Extract the URL from the response # to call the API to getting the analysis results $resultUrl = $($response.Headers['Operation-Location']) # Create the header for the REST GET with only the subscription key $resultHeaders = @{} $resultHeaders.Add( "Ocp-Apim-Subscription-Key", $key ) # Get the results # Continue to request results until the analysis is "succeeded" Write-Host "Getting results..." 
Do { $result = Invoke-RestMethod -Method Get ` -Uri $resultUrl ` -Headers $resultHeaders | ConvertTo-Json -Depth 10 $analysis = ($result | ConvertFrom-Json) } while ($analysis.status -ne "succeeded") Write-Host "...Done`n" # Access the relevant fields from the analysis $extraction = $result | ConvertFrom-Json $docs = $extraction.tasks.customEntityRecognitionTasks[0].results.documents # Output response if desired if ($verbose) { Write-Host("GET JSON Response:`n$result`n") } for (($idx = 0); $idx -lt $docs.Length; $idx++) { $item = $docs[$idx] Write-Host ("Document #", ($idx+1)) $entities = $item.entities foreach ($entity in $entities) { Write-Host (" - Entity Category: $($entity.category)") Write-Host (" - Entity Text: $($entity.text)") Write-Host (" - Confidence: $($entity.confidenceScore)`n") } } ================================================ FILE: Labfiles/05-custom-entity-recognition/test1.txt ================================================ Bluetooth earbuds, $100. These work okay, but sometimes disconnect from the phone. I'm sure someone more technical that me could figure it out. Located in Sacramento, CA ================================================ FILE: Labfiles/05-custom-entity-recognition/test2.txt ================================================ Dog harness for sale, $20. Good condition, puppy just outgrew it. Local meet up in Tucson, AZ in a public location. ================================================ FILE: Labfiles/05-speech-tool/Python/speech-client/requirements.txt ================================================ python-dotenv azure-identity azure-ai-projects==2.0.0b4 ================================================ FILE: Labfiles/05-speech-tool/Python/speech-client/speech-client.py ================================================ from dotenv import load_dotenv import os # import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') agent_name = os.getenv('AGENT_NAME') # Get project client # Get an OpenAI client # Main loop while True: # Get user input prompt = input("User prompt (or 'quit'): ") if prompt == "quit" or len(prompt) == 0: break else: # Use the agent to get a response except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/06-translator-sdk/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/06-translator-sdk/Python/translate-text/requirements.txt ================================================ python-dotenv ================================================ FILE: Labfiles/06-translator-sdk/Python/translate-text/translate.py ================================================ from dotenv import load_dotenv import os # import namespaces def main(): try: # Get Configuration Settings load_dotenv() translatorRegion = os.getenv('TRANSLATOR_REGION') translatorKey = os.getenv('TRANSLATOR_KEY') # Create client using endpoint and key ## Choose target language # Translate text except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/06-voice-live/Python/chat-client/chat-client.py ================================================ import os import asyncio import base64 import queue from dotenv import load_dotenv import pyaudio # import namespaces def main(): """Main entry 
point.""" try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get required configuration from environment variables load_dotenv() endpoint = os.environ.get("AZURE_VOICELIVE_ENDPOINT") agent_name = os.environ.get("AZURE_VOICELIVE_AGENT_ID") project_name = os.environ.get("AZURE_VOICELIVE_PROJECT_NAME") # Create credential for authentication credential = AzureCliCredential() # Create and start the voice assistant assistant = VoiceAssistant( endpoint=endpoint, credential=credential, agent_name=agent_name, project_name=project_name ) # Run the assistant try: asyncio.run(assistant.start()) except KeyboardInterrupt: # Exit if the user enters CTRL+C print("\n👋 Goodbye!") except Exception as e: print(f"❌ An error occurred: {e}") # VoiceAssistant class - main coordinator for the voice agent class VoiceAssistant: """ Main voice assistant that coordinates the conversation flow. This class demonstrates the essential pattern for Azure VoiceLive: 1. Connect to the service 2. Configure the session 3. Start audio capture/playback 4. Process events from the service """ def __init__(self, endpoint, credential, agent_name, project_name): self.endpoint = endpoint self.credential = credential # Agent configuration self.agent_config = { "agent_name": agent_name, "project_name": project_name } async def start(self): """Start the voice assistant.""" print("\n" + "=" * 60) print("🎙️ AZURE VOICELIVE VOICE AGENT") print("=" * 60) # Add your code in this try block! try: # STEP 1: Connect Azure VoiceLive to the agent # STEP 2: Initialize audio processor # STEP 3: Configure the session # STEP 4: Start audio systems # STEP 5: Process events finally: if hasattr(self, 'audio_processor'): self.audio_processor.shutdown() async def setup_session(self): """Configure the session with audio settings.""" session_config = RequestSession( # Enable both text and audio modalities=[Modality.TEXT, Modality.AUDIO], # Audio format (16-bit PCM at 24kHz) input_audio_format=InputAudioFormat.PCM16, output_audio_format=OutputAudioFormat.PCM16, # Voice activity detection (when to detect speech) turn_detection=AzureSemanticVadMultilingual(), # Prevent echo from speaker feedback input_audio_echo_cancellation=AudioEchoCancellation(), # Reduce background noise input_audio_noise_reduction=AudioNoiseReduction(type="azure_deep_noise_suppression") ) await self.connection.session.update(session=session_config) print("⚙️ Session configured") async def process_events(self): """Process events from the VoiceLive service.""" # Listen for events from the service async for event in self.connection: await self.handle_event(event) async def handle_event(self, event): """Handle different event types from the service.""" # Session is ready - start capturing audio if event.type == ServerEventType.SESSION_UPDATED: print(f"📡 Connected to agent: {event.session.agent.name}") self.audio_processor.start_capture() # User speech was transcribed elif event.type == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED: print(f'👤 You: {event.get("transcript", "")}') # Agent is responding with audio transcript elif event.type == ServerEventType.RESPONSE_AUDIO_TRANSCRIPT_DONE: print(f'🤖 Agent: {event.get("transcript", "")}') # User started speaking (interrupt any playing audio) elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED: self.audio_processor.clear_playback_queue() print("🎤 Listening...") # User stopped speaking elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED: print("🤔 Thinking...") # 
Receiving audio response chunks elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA: self.audio_processor.queue_audio(event.delta) # Audio response complete elif event.type == ServerEventType.RESPONSE_AUDIO_DONE: print("✓ Response complete\n") # Handle errors elif event.type == ServerEventType.ERROR: print(f"❌ Error: {event.error.message}") # AudioProcessor class - handles microphone input and speaker output using PyAudio class AudioProcessor: """ Handles audio input (microphone) and output (speakers). Key responsibilities: - Capture audio from microphone and send to VoiceLive - Receive audio from VoiceLive and play through speakers """ def __init__(self, connection): self.connection = connection self.audio = pyaudio.PyAudio() # Audio settings: 24kHz, 16-bit PCM, mono self.format = pyaudio.paInt16 self.channels = 1 self.rate = 24000 self.chunk_size = 1200 # 50ms chunks # Streams for input and output self.input_stream = None self.output_stream = None self.playback_queue = queue.Queue() def start_capture(self): """Start capturing audio from the microphone.""" def capture_callback(in_data, frame_count, time_info, status): # Convert audio to base64 and send to VoiceLive audio_base64 = base64.b64encode(in_data).decode("utf-8") asyncio.run_coroutine_threadsafe( self.connection.input_audio_buffer.append(audio=audio_base64), self.loop ) return (None, pyaudio.paContinue) # Store event loop for use in callback thread self.loop = asyncio.get_event_loop() self.input_stream = self.audio.open( format=self.format, channels=self.channels, rate=self.rate, input=True, frames_per_buffer=self.chunk_size, stream_callback=capture_callback ) print("🎤 Microphone started") def start_playback(self): """Start audio playback system.""" remaining = bytes() def playback_callback(in_data, frame_count, time_info, status): nonlocal remaining # Calculate bytes needed bytes_needed = frame_count * pyaudio.get_sample_size(pyaudio.paInt16) output = remaining[:bytes_needed] remaining = remaining[bytes_needed:] # Get more audio from queue if needed while len(output) < bytes_needed: try: audio_data = self.playback_queue.get_nowait() if audio_data is None: # End signal break output += audio_data except queue.Empty: # Pad with silence if no audio available output += bytes(bytes_needed - len(output)) break # Keep any extra for next callback if len(output) > bytes_needed: remaining = output[bytes_needed:] output = output[:bytes_needed] return (output, pyaudio.paContinue) self.output_stream = self.audio.open( format=self.format, channels=self.channels, rate=self.rate, output=True, frames_per_buffer=self.chunk_size, stream_callback=playback_callback ) print("🔊 Speakers ready") def queue_audio(self, audio_data): """Add audio data to the playback queue.""" self.playback_queue.put(audio_data) def clear_playback_queue(self): """Clear any pending audio (used when user interrupts).""" while not self.playback_queue.empty(): try: self.playback_queue.get_nowait() except queue.Empty: break def shutdown(self): """Clean up audio resources.""" if self.input_stream: self.input_stream.stop_stream() self.input_stream.close() if self.output_stream: self.playback_queue.put(None) # Signal end self.output_stream.stop_stream() self.output_stream.close() self.audio.terminate() print("🔇 Audio stopped") if __name__ == "__main__": main() ================================================ FILE: Labfiles/06-voice-live/Python/chat-client/requirements.txt ================================================ dotenv aiohttp pyaudio 
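The chat-client.py starter above leaves the `# import namespaces` comment and the five STEP placeholders in `VoiceAssistant.start()` empty, and its requirements.txt (immediately above) lists only dotenv, aiohttp, and pyaudio. A minimal sketch of the import block, assuming the azure-identity and azure-ai-voicelive packages (the same ones pinned in the 11-voice-live-agent project later in this extract) are installed:

```python
# Hypothetical completion of the "# import namespaces" placeholder in chat-client.py.
# Assumes: pip install azure-identity azure-ai-voicelive (not yet in requirements.txt).
from azure.identity import AzureCliCredential          # used by main() to authenticate
from azure.ai.voicelive.aio import connect             # async connection helper, presumably used in STEP 1
from azure.ai.voicelive.models import (
    RequestSession,
    Modality,
    InputAudioFormat,
    OutputAudioFormat,
    AudioEchoCancellation,
    AudioNoiseReduction,
    AzureSemanticVadMultilingual,
    ServerEventType,
)
```

Every name here is one the skeleton already references; the exact `connect(...)` arguments for STEP 1 (binding the session to the agent named in AZURE_VOICELIVE_AGENT_ID and its project) depend on the SDK version, so they are left as an exercise rather than guessed.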
================================================ FILE: Labfiles/07-speech/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/07-speech/Python/speaking-clock/requirements.txt ================================================ python-dotenv azure.core ================================================ FILE: Labfiles/07-speech/Python/speaking-clock/speaking-clock.py ================================================ from dotenv import load_dotenv from datetime import datetime import os # Import namespaces def main(): # Clear the console os.system('cls' if os.name=='nt' else 'clear') try: global speech_config # Get config settings load_dotenv() speech_key = os.getenv('KEY') speech_region = os.getenv('REGION') # Configure speech service # Get spoken input command = TranscribeCommand() if command.lower() == 'what time is it?': TellTime() except Exception as ex: print(ex) def TranscribeCommand(): command = '' # Configure speech recognition # Process speech input # Return the command return command def TellTime(): now = datetime.now() response_text = 'The time is {}:{:02d}'.format(now.hour,now.minute) # Configure speech synthesis # Synthesize spoken output # Print the response print(response_text) if __name__ == "__main__": main() ================================================ FILE: Labfiles/07-translation/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/07-translation/Python/translators/requirements.txt ================================================ python-dotenv azure-identity azure-ai-translation-text==1.0.1 azure-cognitiveservices-speech==1.48.2 ================================================ FILE: Labfiles/07-translation/Python/translators/translate-speech.py ================================================ import os from dotenv import load_dotenv # Import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') # Configure translation # Configure speech for synthesis of translations # Translate user speech # Print and speak the translation results except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/07-translation/Python/translators/translate-text.py ================================================ from dotenv import load_dotenv import os # import namespaces def main(): try: # Clear the console os.system('cls' if os.name == 'nt' else 'clear') # Get Configuration Settings load_dotenv() foundry_endpoint = os.getenv('FOUNDRY_ENDPOINT') # Create client using endpoint and credential ## Choose target language # Translate text except Exception as ex: print(ex) if __name__ == "__main__": main() ================================================ FILE: Labfiles/08-speech-translation/Python/readme.txt ================================================ This folder contains Python code ================================================ FILE: Labfiles/08-speech-translation/Python/translator/requirements.txt ================================================ python-dotenv azure.core ================================================ FILE: Labfiles/08-speech-translation/Python/translator/translator.py ================================================ from dotenv import load_dotenv from 
datetime import datetime import os # Import namespaces def main(): try: global speech_config global translation_config # Get Configuration Settings load_dotenv() speech_key = os.getenv('KEY') speech_region = os.getenv('REGION') # Configure translation # Configure speech # Get user input targetLanguage = '' while targetLanguage != 'quit': targetLanguage = input('\nEnter a target language\n fr = French\n es = Spanish\n hi = Hindi\n Enter anything else to stop\n').lower() if targetLanguage in translation_config.target_languages: Translate(targetLanguage) else: targetLanguage = 'quit' except Exception as ex: print(ex) def Translate(targetLanguage): translation = '' # Translate speech # Synthesize translation if __name__ == "__main__": main() ================================================ FILE: Labfiles/09-audio-chat/Python/audio-chat.py ================================================ import os import requests import base64 from dotenv import load_dotenv # Add references def main(): # Clear the console os.system('cls' if os.name=='nt' else 'clear') try: # Get configuration settings load_dotenv() project_endpoint = os.getenv("PROJECT_ENDPOINT") model_deployment = os.getenv("MODEL_DEPLOYMENT") # Initialize the project client # Get a chat client # Initialize prompts system_message = "You are an AI assistant for a produce supplier company." prompt = "" # Loop until the user types 'quit' while True: prompt = input("\nAsk a question about the audio\n(or type 'quit' to exit)\n") if prompt.lower() == "quit": break elif len(prompt) == 0: print("Please enter a question.\n") else: print("Getting a response ...\n") # Encode the audio file # Get a response to audio input except Exception as ex: print(ex) if __name__ == '__main__': main() ================================================ FILE: Labfiles/09-audio-chat/Python/requirements.txt ================================================ python-dotenv ================================================ FILE: Labfiles/11-voice-live-agent/python/.dockerignore ================================================ # Python __pycache__/ *.py[cod] *$py.class *.so .Python # Virtual environments .venv/ venv/ ENV/ env/ # UV cache .uv/ # IDE .vscode/ .idea/ *.swp *.swo *~ # Git .git/ .gitignore .gitattributes # Testing .pytest_cache/ .coverage htmlcov/ *.cover # Environment files (should be set via Azure config) .env .env.* # Build artifacts build/ dist/ *.egg-info/ # Documentation *.md docs/ # CI/CD .github/ .azure/ # Lock files (already copied via requirements.txt) uv.lock # Infrastructure as Code (not needed in runtime container) infra/ azure.yaml azdeploy.sh # Docker Dockerfile .dockerignore ================================================ FILE: Labfiles/11-voice-live-agent/python/.gitignore ================================================ # Python-generated files __pycache__/ *.py[oc] build/ dist/ wheels/ *.egg-info .env # Virtual environments .venv #azdeploy.sh drun.sh dbuild.sh .azure ================================================ FILE: Labfiles/11-voice-live-agent/python/.python-version ================================================ 3.10 ================================================ FILE: Labfiles/11-voice-live-agent/python/Dockerfile ================================================ FROM python:3.11-slim # Keep Python output unbuffered and avoid writing .pyc files ENV PYTHONDONTWRITEBYTECODE=1 ENV PYTHONUNBUFFERED=1 WORKDIR /app # Install system deps needed for some audio and crypto packages RUN apt-get update \ && apt-get install -y --no-install-recommends \ 
build-essential \
    gcc \
    libsndfile1 \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt ./
RUN pip install --root-user-action=ignore --upgrade pip setuptools wheel \
    && pip install --root-user-action=ignore -r requirements.txt \
    && pip install --root-user-action=ignore gunicorn

# Copy app sources
COPY . /app

# Ensure src/ is on Python path for src-layout project without pip install .
ENV PYTHONPATH="/app/src:${PYTHONPATH}"

ENV PORT=5000
EXPOSE 5000

# Use gunicorn to serve the Flask app. The module exposes `app` at
# src.flask_app:app so gunicorn can import it directly.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "src.flask_app:app", "--workers", "1", "--threads", "4"]

================================================
FILE: Labfiles/11-voice-live-agent/python/README.md
================================================
# Requirements

## Run in Cloud Shell

* Azure subscription with OpenAI access
* If running in the Azure Cloud Shell, choose the Bash shell. The Azure CLI and Azure Developer CLI are included in the Cloud Shell.

## Run locally

* You can run the web app locally after running the deployment script:
  * [Azure Developer CLI (azd)](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd)
  * [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
  * Azure subscription with OpenAI access

## Environment Variables

The `.env` file is created by the *azdeploy.sh* script. The AI model endpoint, API key, and model name are added during the deployment of the resources.

## Azure resource deployment

The provided `azdeploy.sh` creates the required resources in Azure:

* Change the two variables at the top of the script to match your needs; don't change anything else.
* The script:
  * Deploys the *gpt-4o* model using AZD.
  * Creates the Azure Container Registry service
  * Uses ACR tasks to build and deploy the Dockerfile image to ACR
  * Creates the App Service Plan
  * Creates the App Service Web App
  * Configures the web app to use the container image in ACR
  * Configures the web app environment variables
* The script will provide the App Service endpoint

The script provides two deployment options:

1. Full deployment; and
2. Redeploy the image only.

Option 2 is only for post-deployment, when you want to experiment with changes in the application.

> Note: You can run the script in PowerShell or Bash using the `bash azdeploy.sh` command. This command also lets you run the script in Bash without having to make it executable.

## Local development

### Provision AI model to Azure

You can run the project locally and provision only the AI model by following these steps:

1. **Initialize environment** (choose a descriptive name):

   ```bash
   azd env new gpt-realtime-lab --confirm
   # or: azd env new your-name-gpt-experiment --confirm
   ```

   **Important**: This name becomes part of your Azure resource names! The `--confirm` flag sets this as your default environment without prompting.

1. **Set your resource group**:

   ```bash
   azd env set AZURE_RESOURCE_GROUP "rg-your-name-gpt"
   ```

1. **Login and provision AI resources**:

   ```bash
   az login
   azd provision
   ```

> **Important**: Do NOT run `azd deploy` - the app is not configured in the AZD templates.
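Once `azd provision` finishes, the endpoint, API key, and model name it produced can also be read back from the azd environment (the output names are defined in *infra/main.bicep*), for example:

```bash
azd env get-values | grep -E 'AZURE_OPENAI_(ENDPOINT|API_KEY|REALTIME_MODEL_NAME)'
```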
If you only provisioned the model using the `azd provision` method, you MUST create a `.env` file in the root of the directory with the following entries:

```
AZURE_VOICE_LIVE_ENDPOINT=""
AZURE_VOICE_LIVE_API_KEY=""
VOICE_LIVE_MODEL=""
VOICE_LIVE_VOICE="en-US-JennyNeural"
VOICE_LIVE_INSTRUCTIONS="You are a helpful AI assistant with a focus on world history. Respond naturally and conversationally. Keep your responses concise but engaging."
VOICE_LIVE_VERBOSE="" #Suppresses excessive logging to the terminal if running locally
```

Notes:

1. The endpoint is the endpoint for the model and should only include `https://.cognitiveservices.azure.com`.
1. The API key is the key for the model.
1. The model is the model name used during deployment.
1. You can retrieve these values from the AI Foundry portal.

### Running the project locally

The project was created and managed using **uv**, but uv is not required to run it.

If you have **uv** installed:

* Run `uv venv` to create the environment
* Run `uv sync` to add packages
* Alias created for the web app: `uv run web` starts the `flask_app.py` script.
* The requirements.txt file was created with `uv pip compile pyproject.toml -o requirements.txt`

If you don't have **uv** installed:

* Create environment: `python -m venv .venv`
* Activate environment: `.\.venv\Scripts\Activate.ps1`
* Install dependencies: `pip install -r requirements.txt`
* Run application (from project root): `python .\src\flask_app.py`

================================================
FILE: Labfiles/11-voice-live-agent/python/azdeploy.sh
================================================
#!/usr/bin/env bash

# Script to deploy the Flask app to Azure App Service using a container from ACR
# and provision AI Foundry with the gpt-realtime model using AZD.

# Only change the rg (resource group) and location variables below if needed.
rg="rg-voicelive"      # Replace with your resource group
location="eastus2"     # Or a location near you

# ============================================================================
# DON'T CHANGE ANYTHING BELOW THIS LINE.
# ============================================================================

# ============================================================================
# Deployment Mode Selection
# ============================================================================
clear
echo "Select deployment mode:"
echo "  1) Full deployment (AI Foundry + Container + App Service) - ~15 minutes"
echo "  2) Container update only (requires full deployment first) - ~5 minutes"
echo ""
read -p "Enter choice (1 or 2): " deploy_mode

if [ "$deploy_mode" != "1" ] && [ "$deploy_mode" != "2" ]; then
    echo "ERROR: Invalid choice. Please enter 1 or 2."
    exit 1
fi

# ============================================================================
# Service Name Generation (shared by both modes)
# ============================================================================
# Generate consistent unique hash from Azure user object ID (guaranteed unique per user)
user_object_id=$(az ad signed-in-user show --query "id" -o tsv 2>/dev/null)
if [ -z "$user_object_id" ]; then
    echo "ERROR: Not authenticated with Azure.
Please run: az login" exit 1 fi user_hash=$(echo -n "$user_object_id" | sha1sum | cut -c1-8) # Build ACR name: 'acr' prefix + 8-char hash (no hyphens, all lowercase, starts with letter) acr_name="acr${user_hash}" # App Service plan and webapp (hyphens allowed) appsvc_plan="appplan-${user_hash}" webapp_name="webapp-${user_hash}" image="rt-voice" tag="v1" azd_env_name="gpt-realtime" # Forced as unique at each run # ============================================================================ # Mode 2: Container Update Only # ============================================================================ if [ "$deploy_mode" = "2" ]; then clear echo "Starting container update (rebuild + redeploy)..." echo "" # Verify that the resources exist echo " - Verifying existing resources..." if ! az acr show -n $acr_name -g $rg >/dev/null 2>&1; then echo "ERROR: ACR '$acr_name' not found in resource group '$rg'" echo "You must run a full deployment (option 1) first." exit 1 fi if ! az webapp show -n $webapp_name -g $rg >/dev/null 2>&1; then echo "ERROR: Web App '$webapp_name' not found in resource group '$rg'" echo "You must run a full deployment (option 1) first." exit 1 fi echo " - Resources verified: ACR and Web App exist" # Build image echo " - Building updated image in ACR...(takes 3-5 minutes)" max_retries=3 retry_count=0 while [ "${retry_count}" -lt "${max_retries}" ]; do echo " - Attempt $((retry_count + 1)) of $max_retries: building image..." az acr build -r $acr_name --image ${acr_name}.azurecr.io/${image}:${tag} --file Dockerfile . >/dev/null 2>&1 if az acr repository show --name $acr_name --repository $image >/dev/null 2>&1; then echo " - Image successfully built and verified in ACR" break else echo " - Image not found in ACR, retrying build..." retry_count=$((retry_count + 1)) if [ $retry_count -lt $max_retries ]; then echo " - Waiting 5 seconds before retry..." sleep 5 fi fi done if [ $retry_count -eq $max_retries ]; then echo "ERROR: Failed to build image after $max_retries attempts" exit 1 fi # Restart web app to pull new image echo " - Restarting Web App to pull updated container..." az webapp restart --name "$webapp_name" --resource-group "$rg" >/dev/null echo "" echo "Container update complete!" echo " - Your app is available at: https://${webapp_name}.azurewebsites.net" echo " - App may take 1-2 minutes to restart with the new image." echo "" exit 0 fi # ============================================================================ # Mode 1: Full Deployment (original script flow) # ============================================================================ # Create the .env file cat > .env << 'EOF' # Do not change any settings in this file. Endpoint, API key, and model name are set automatically during deployment AZURE_VOICE_LIVE_ENDPOINT="" AZURE_VOICE_LIVE_API_KEY="" VOICE_LIVE_MODEL="" VOICE_LIVE_VOICE="en-US-JennyNeural" VOICE_LIVE_INSTRUCTIONS="You are a helpful AI assistant with a focus on world history. Respond naturally and conversationally. Keep your responses concise but engaging." VOICE_LIVE_VERBOSE="" #Suppresses excessive logging to the terminal if running locally EOF clear echo "Starting FULL deployment with AZD provisioning + App Service, takes about 15 minutes..." # Step 1: Provision AI Foundry with GPT Realtime model using AZD echo echo "Step 1: Provisioning AI Foundry with GPT Realtime model..." echo " - Setting up AZD environment..." 
# Clear local azd state only (safe for students - doesn't delete Azure resources) rm -rf ~/.azd 2>/dev/null || true # Also clear any project-level azd state rm -rf .azure 2>/dev/null || true # Create fresh environment with unique name timeout 5 azd env new $azd_env_name --confirm >/dev/null 2>&1 || azd env new $azd_env_name >/dev/null 2>&1 azd env set AZURE_LOCATION $location >/dev/null azd env set AZURE_RESOURCE_GROUP $rg >/dev/null subscription_id=$(az account show --query id -o tsv) azd env set AZURE_SUBSCRIPTION_ID "$subscription_id" >/dev/null echo " - AZD environment '$azd_env_name' created (fresh state)" echo " - Provisioning AI resources (forcing new deployment)..." echo " - Authenticating azd with Azure..." # Try ambient auth (Cloud Shell) with a 10s timeout to avoid hanging. # If it doesn't complete quickly, fall back to interactive device code. if ! timeout 5 azd auth login >/dev/null 2>&1; then azd auth login --use-device-code fi # Force a completely fresh deployment by combining multiple techniques azd config set alpha.infrastructure.deployment.name "azd-gpt-realtime-$(date +%s)" # Clear any cached deployment state and force deployment azd env refresh --no-prompt 2>/dev/null || true # Provision with retry logic to handle non-terminal resource state conflicts provision_retries=3 provision_attempt=0 while [ $provision_attempt -lt $provision_retries ]; do provision_attempt=$((provision_attempt + 1)) echo " - Running azd provision (attempt $provision_attempt of $provision_retries)..." if azd provision; then break fi if [ $provision_attempt -lt $provision_retries ]; then echo " - Provision failed (possible non-terminal resource conflict). Waiting 60 seconds before retry..." sleep 60 else echo "ERROR: azd provision failed after $provision_retries attempts." exit 1 fi done echo " - Retrieving AI Foundry endpoint, API key, and model name..." endpoint=$(azd env get-values --output json | jq -r '.AZURE_OPENAI_ENDPOINT') api_key=$(azd env get-values --output json | jq -r '.AZURE_OPENAI_API_KEY') model_name=$(azd env get-values --output json | jq -r '.AZURE_OPENAI_REALTIME_MODEL_NAME') if [ "$endpoint" = "null" ] || [ "$endpoint" = "" ] || [ "$api_key" = "null" ] || [ "$api_key" = "" ] || [ "$model_name" = "null" ] || [ "$model_name" = "" ]; then echo "ERROR: Failed to retrieve AI Foundry endpoint, API key, or model name from azd" echo "Please check the azd provision output and try again" exit 1 fi echo " - Updating .env file with AI Foundry credentials..." # Update .env file with the retrieved values if [ -f .env ]; then # Use sed to update existing values or add them if they don't exist sed -i "s|^AZURE_VOICE_LIVE_ENDPOINT=.*|AZURE_VOICE_LIVE_ENDPOINT=\"$endpoint\"|" .env sed -i "s|^AZURE_VOICE_LIVE_API_KEY=.*|AZURE_VOICE_LIVE_API_KEY=\"$api_key\"|" .env sed -i "s|^VOICE_LIVE_MODEL=.*|VOICE_LIVE_MODEL=\"$model_name\"|" .env echo " - .env file updated with AI Foundry credentials and model name" else echo "ERROR: .env file not found" exit 1 fi echo " - AI Foundry provisioning complete!" # Step 2: Continue with App Service deployment echo echo "Step 2: Create ACR and App Service resources..." # Create ACR and build image from Dockerfile echo " - Creating Azure Container Registry resource..." az acr create -n $acr_name -g $rg --sku Basic --admin-enabled true >/dev/null echo " - Resource created" echo " - Starting image build process in 10 seconds to reduce build failures." 
sleep 10 # To give time for the ACR service to be ready for build operations echo " - Building image in ACR...(takes 3-5 minutes per attempt)" # Build image with retry logic max_retries=3 retry_count=0 while [ $retry_count -lt $max_retries ]; do echo " - Attempt $((retry_count + 1)) of $max_retries: building image..." # Run the build command az acr build -r $acr_name --image ${acr_name}.azurecr.io/${image}:${tag} --file Dockerfile . >/dev/null 2>&1 # Check if the image exists in the registry if az acr repository show --name $acr_name --repository $image >/dev/null 2>&1; then echo " - Image successfully built and verified in ACR..." break else echo " - Image not found in ACR, retrying build..." retry_count=$((retry_count + 1)) if [ "${retry_count}" -lt "${max_retries}" ]; then echo " - Waiting 5 seconds before retry..." sleep 5 fi fi done if [ "${retry_count}" -eq "${max_retries}" ]; then echo "ERROR: Failed to build image after $max_retries attempts" echo "Please check your Dockerfile and try again manually with:" echo "az acr build -r $acr_name --image ${acr_name}.azurecr.io/${image}:${tag} --file Dockerfile ." exit 1 fi echo " - Container image build complete!" echo echo "Step 3: Configuring Azure App Service with updated credentials..." echo " - Gathering environment variables from .env file for App Service deployment.." # Parse the .env file exists in the repo root, and bring values into the script environment if [ -f .env ]; then while IFS='=' read -r key val; do # Trim whitespace key=$(echo "$key" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//') # Skip comments and empty lines case "$key" in ""|\#*) continue;; esac # Join remainder of line in case value contains '=' if echo "$val" | grep -q "="; then # Re-read the whole line and extract first = split only val=$(echo "${key}=${val}" | sed 's/^[^=]*=//') fi val=$(echo "$val" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//') # Remove surrounding quotes if present val="${val%\"}" val="${val#\"}" val="${val%\'}" val="${val#\'}" # Export into shell variable eval "${key}='${val}'" done < .env fi # Build env_vars using values from .env - one file to update env_vars=( AZURE_VOICE_LIVE_ENDPOINT="${AZURE_VOICE_LIVE_ENDPOINT}" AZURE_VOICE_LIVE_API_KEY="${AZURE_VOICE_LIVE_API_KEY}" VOICE_LIVE_MODEL="${VOICE_LIVE_MODEL}" VOICE_LIVE_VOICE="${VOICE_LIVE_VOICE}" VOICE_LIVE_INSTRUCTIONS="${VOICE_LIVE_INSTRUCTIONS}" ) echo " - Retrieving ACR credentials so App Service can access the container image..." # Use the retrieved ACR credentials to allow AppSvc to pull the image. acr_user=$(az acr credential show -n $acr_name --query username -o tsv | tr -d '\r') acr_pass=$(az acr credential show -n $acr_name --query passwords[0].value -o tsv | tr -d '\r') acr_login_server=$(az acr show --name $acr_name --query "loginServer" --output tsv | tr -d '\r') acr_image=${acr_login_server}/${image}:${tag} echo " - Creating App Service plan: $appsvc_plan Linux B1..." az appservice plan create --name "$appsvc_plan" \ --resource-group $rg \ --is-linux \ --sku B1 >/dev/null echo " - Creating Web App: ${webapp_name}..." # Create the webapp with Docker runtime for container deployment az webapp create --resource-group $rg \ --plan $appsvc_plan \ --name $webapp_name \ --runtime "PYTHON:3.10" >/dev/null echo " - Applying environment variables to web app..." az webapp config appsettings set --resource-group "$rg" \ --name "$webapp_name" \ --settings "${env_vars[@]}" >/dev/null echo " - Configuring Web App container settings to pull from ACR..." 
az webapp config container set \ --name "$webapp_name" \ --resource-group "$rg" \ --container-image-name "$acr_image" \ --container-registry-url "https://$acr_login_server" \ --container-registry-user "$acr_user" \ --container-registry-password "$acr_pass" >/dev/null echo " - Configuring app settings..." az webapp config set --resource-group "$rg" \ --name "$webapp_name" \ --startup-file "" \ --always-on true >/dev/null # Start / Restart to ensure container is pulled sleep 5 echo " - Restarting Web App to ensure new container image is pulled..." az webapp restart --name "$webapp_name" --resource-group "$rg" >/dev/null sleep 10 #Time for the service to restart and pull image # Show final URL and cleanup info echo echo "Deployment complete!" echo echo " - AI Foundry with GPT Realtime model: PROVISIONED" echo " - Flask app deployed to App Service: READY" echo " - Your app is available at: https://${webapp_name}.azurewebsites.net" echo echo "Note: App may take a few minutes to start after loading the web page." echo ================================================ FILE: Labfiles/11-voice-live-agent/python/azure.yaml ================================================ # Student template: GPT Realtime model resources for AI Foundry # This template ONLY provisions AI resources - it does NOT deploy the application # Students will use the OpenAI endpoint directly in their local development # yaml-language-server: $schema=https://raw.githubusercontent.com/Azure/azure-dev/main/schemas/v1.0/azure.yaml.json name: gpt-realtime-model metadata: template: azd-init@1.19.0 # Note: When running 'azd down', you will be prompted to purge the AI Foundry resource # To skip the prompt and auto-purge, use: azd down --purge # NO SERVICES - This template only provisions infrastructure # Students will run the Flask app locally and connect to the deployed OpenAI service # After provisioning, you'll get: # - Azure AI Foundry Project for experimentation # - OpenAI service with GPT realtime model deployed # - Endpoint and API key for your local Flask app # # DO NOT run 'azd deploy' - the application is deployed outside of this process. 
# This template only creates the AI resources students need ================================================ FILE: Labfiles/11-voice-live-agent/python/infra/ai-foundry.bicep ================================================ @description('Primary location for all resources') param location string @description('Name of the environment used to derive resource names') param environmentName string @description('Unique token for resource naming') param resourceToken string @description('Tags to apply to resources') param tags object = {} @description('Principal ID for role assignments') param principalId string // Create AI Foundry resource (modern approach - no separate project needed for model deployment) resource aiFoundry 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' = { name: 'ai-foundry-${resourceToken}' location: location tags: union(tags, { 'azd-service-name': 'gpt-realtime-model' }) sku: { name: 'S0' } kind: 'AIServices' identity: { type: 'SystemAssigned' } properties: { allowProjectManagement: true customSubDomainName: 'ai-foundry-${resourceToken}' publicNetworkAccess: 'Enabled' disableLocalAuth: false } } // Deploy GPT Realtime model directly to AI Foundry resource gptRealtimeDeployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = { parent: aiFoundry name: 'gpt-4o' sku: { name: 'GlobalStandard' capacity: 1 } properties: { model: { format: 'OpenAI' name: 'gpt-4o' version: '2024-11-20' } raiPolicyName: 'Microsoft.Default' } } // Role assignment for the user to access AI Foundry resource cognitiveServicesOpenAIUser 'Microsoft.Authorization/roleDefinitions@2022-04-01' existing = { scope: subscription() name: '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd' // Cognitive Services OpenAI User } resource aiFoundryRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = { scope: aiFoundry name: guid(aiFoundry.id, principalId, cognitiveServicesOpenAIUser.id) properties: { roleDefinitionId: cognitiveServicesOpenAIUser.id principalId: principalId principalType: 'User' } } // Outputs output endpoint string = aiFoundry.properties.endpoint output apiKey string = aiFoundry.listKeys().key1 output realtimeModelName string = gptRealtimeDeployment.name output foundryName string = aiFoundry.name ================================================ FILE: Labfiles/11-voice-live-agent/python/infra/main.bicep ================================================ targetScope = 'subscription' @minLength(1) @maxLength(64) @description('Name of the environment used to derive resource names and tags.') param environmentName string @minLength(1) @description('Primary location for all resources') param location string @description('Id of the user or app to assign application roles') param principalId string @description('Name of the resource group for the AI project resources') param aiResourceGroupName string = '' // Tags to apply to all resources var tags = { 'azd-env-name': environmentName } // Generate unique suffix for resource names var resourceToken = toLower(uniqueString(subscription().id, environmentName, location)) // Create resource group for AI resources resource aiResourceGroup 'Microsoft.Resources/resourceGroups@2021-04-01' = { name: !empty(aiResourceGroupName) ? 
aiResourceGroupName : 'rg-${environmentName}' location: location tags: tags } // Deploy AI Foundry with GPT Realtime model - single resource approach module aiFoundry 'ai-foundry.bicep' = { name: 'ai-foundry' scope: aiResourceGroup params: { location: location environmentName: environmentName resourceToken: resourceToken tags: tags principalId: principalId } } // Outputs for azd environment variables output AZURE_LOCATION string = location output AZURE_TENANT_ID string = tenant().tenantId output AZURE_RESOURCE_GROUP string = aiResourceGroup.name // AI Foundry outputs output AZURE_OPENAI_ENDPOINT string = aiFoundry.outputs.endpoint output AZURE_OPENAI_API_KEY string = aiFoundry.outputs.apiKey output AZURE_OPENAI_REALTIME_MODEL_NAME string = aiFoundry.outputs.realtimeModelName output AZUREAI_FOUNDRY_NAME string = aiFoundry.outputs.foundryName ================================================ FILE: Labfiles/11-voice-live-agent/python/infra/main.parameters.json ================================================ { "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", "contentVersion": "1.0.0.0", "parameters": { "environmentName": { "value": "${AZURE_ENV_NAME}" }, "location": { "value": "${AZURE_LOCATION}" }, "principalId": { "value": "${AZURE_PRINCIPAL_ID}" }, "principalType": { "value": "${AZURE_PRINCIPAL_TYPE=User}" }, "aiResourceGroupName": { "value": "${AZURE_RESOURCE_GROUP=}" } } } ================================================ FILE: Labfiles/11-voice-live-agent/python/pyproject.toml ================================================ [project] name = "real-time-voice" version = "0.1.0" description = "Add your description here" readme = "README.md" authors = [ { name = "JeffKoMS", email = "jeffko@microsoft.com" } ] requires-python = ">=3.10" dependencies = [ "aiohttp==3.11.18", "azure-core>=1.35.0", "azure-identity==1.22.0", "certifi==2025.4.26", "cffi==1.17.1", "cryptography==44.0.3", "pycparser==2.22", "python-dotenv==1.1.0", "requests==2.32.3", "typing_extensions==4.13.2", "urllib3==2.4.0", "azure-ai-voicelive==1.0.0b5", "Flask>=3.0.0,<4.0.0", ] [project.scripts] # Short names for uv run web = "src.flask_app:main" [build-system] requires = ["hatchling"] build-backend = "hatchling.build" [tool.hatch.build.targets.wheel] only-include = ["src"] sources = ["."] ================================================ FILE: Labfiles/11-voice-live-agent/python/requirements.txt ================================================ # This file was autogenerated by uv via the following command: # uv pip compile pyproject.toml -o requirements.txt aiohappyeyeballs==2.6.1 # via aiohttp aiohttp==3.11.18 # via real-time-voice (pyproject.toml) aiosignal==1.4.0 # via aiohttp async-timeout==5.0.1 # via aiohttp attrs==25.3.0 # via aiohttp azure-ai-voicelive==1.0.0b5 # via real-time-voice (pyproject.toml) azure-core==1.35.1 # via # real-time-voice (pyproject.toml) # azure-ai-voicelive # azure-identity azure-identity==1.22.0 # via real-time-voice (pyproject.toml) blinker==1.9.0 # via flask certifi==2025.4.26 # via # real-time-voice (pyproject.toml) # requests cffi==1.17.1 # via # real-time-voice (pyproject.toml) # cryptography charset-normalizer==3.4.3 # via requests click==8.1.8 # via flask colorama==0.4.6 # via click cryptography==44.0.3 # via # real-time-voice (pyproject.toml) # azure-identity # msal # pyjwt flask==3.1.2 # via real-time-voice (pyproject.toml) frozenlist==1.7.0 # via # aiohttp # aiosignal idna==3.10 # via # requests # yarl isodate==0.7.2 # via azure-ai-voicelive 
itsdangerous==2.2.0 # via flask jinja2==3.1.6 # via flask markupsafe==3.0.2 # via # flask # jinja2 # werkzeug msal==1.33.0 # via # azure-identity # msal-extensions msal-extensions==1.3.1 # via azure-identity multidict==6.6.4 # via # aiohttp # yarl propcache==0.3.2 # via # aiohttp # yarl pycparser==2.22 # via # real-time-voice (pyproject.toml) # cffi pyjwt==2.10.1 # via msal python-dotenv==1.1.0 # via real-time-voice (pyproject.toml) requests==2.32.3 # via # real-time-voice (pyproject.toml) # azure-core # msal six==1.17.0 # via azure-core typing-extensions==4.13.2 # via # real-time-voice (pyproject.toml) # aiosignal # azure-ai-voicelive # azure-core # azure-identity # multidict urllib3==2.4.0 # via # real-time-voice (pyproject.toml) # requests werkzeug==3.1.3 # via flask yarl==1.20.1 # via aiohttp ================================================ FILE: Labfiles/11-voice-live-agent/python/src/__init__.py ================================================ """Real-Time Voice package root. Reintroduced to allow hatchling to detect the package under src/ for building wheels/editable installs. """ from importlib.metadata import version, PackageNotFoundError try: # pragma: no cover - simple metadata fetch __version__ = version("real-time-voice") except PackageNotFoundError: # pragma: no cover __version__ = "0.0.0+dev" __all__ = ["__version__"] ================================================ FILE: Labfiles/11-voice-live-agent/python/src/flask_app.py ================================================ from __future__ import annotations from pathlib import Path import threading import asyncio import time import logging import traceback from typing import Optional, Tuple, Union, cast, List, Dict, Any import queue import json import base64 from aiohttp import web from flask import Flask, render_template, jsonify, Response, request app = Flask(__name__, template_folder=str(Path(__file__).parent / "templates"), static_folder=str(Path(__file__).parent / "static")) # ============================================================================== # GLOBAL STATE & CONFIGURATION # ============================================================================== # WebSocket server configuration WS_SERVER_HOST = '0.0.0.0' WS_SERVER_PORT = 8765 # Assistant state tracking state_lock = threading.Lock() assistant_state = { "state": "idle", "message": "Select 'Start Session' to begin a voice session.", "last_error": None, "connected": False, } # Threading components assistant_thread: Optional[threading.Thread] = None assistant_instance = None assistant_loop: Optional[asyncio.AbstractEventLoop] = None ws_server_thread: Optional[threading.Thread] = None # Server-Sent Events client management _sse_clients: List["queue.Queue[str]"] = [] _sse_clients_lock = threading.Lock() # ============================================================================== # UTILITY FUNCTIONS # ============================================================================== def _broadcast(event: Dict[str, Any]): """Broadcast SSE event to all connected clients.""" data = f"data: {json.dumps(event)}\n\n" with _sse_clients_lock: # Remove dead clients while broadcasting dead_clients = [] for client_queue in _sse_clients: try: client_queue.put_nowait(data) except Exception: dead_clients.append(client_queue) # Clean up disconnected clients for dead_client in dead_clients: _sse_clients.remove(dead_client) # ============================================================================== # WEBSOCKET AUDIO SERVER # 
============================================================================== def _start_ws_server(host: str = WS_SERVER_HOST, port: int = WS_SERVER_PORT): """Start WebSocket server for low-latency binary audio streaming.""" async def handle_audio_websocket(request): """Handle incoming WebSocket connections for binary audio data.""" ws = web.WebSocketResponse(max_msg_size=10 * 1024 * 1024) await ws.prepare(request) try: async for msg in ws: if msg.type == web.WSMsgType.BINARY and assistant_instance and assistant_loop: # Convert binary PCM16 to base64 and send to assistant audio_b64 = base64.b64encode(msg.data).decode('utf-8') asyncio.run_coroutine_threadsafe( assistant_instance.append_audio(audio_b64), assistant_loop ) elif msg.type == web.WSMsgType.ERROR: break except Exception: pass # Handle connection errors gracefully finally: await ws.close() return ws def run_websocket_server(): """Run WebSocket server in dedicated thread.""" loop = asyncio.new_event_loop() asyncio.set_event_loop(loop) async def start_server(): app = web.Application() app.router.add_get('/ws-audio', handle_audio_websocket) runner = web.AppRunner(app) await runner.setup() site = web.TCPSite(runner, host, port) await site.start() # Keep server running while True: await asyncio.sleep(3600) try: loop.run_until_complete(start_server()) except Exception: pass # Start server in daemon thread thread = threading.Thread(target=run_websocket_server, daemon=True) thread.start() return thread def set_state(state: str, message: str, *, error: str | None = None): """Update assistant state and broadcast to clients.""" with state_lock: assistant_state["state"] = state assistant_state["message"] = message if error: assistant_state["last_error"] = error # Update connection status based on state if state in {"ready", "listening", "processing", "assistant_speaking"}: assistant_state["connected"] = True elif state in {"stopped", "idle"}: assistant_state["connected"] = False # Broadcast state change to all clients _broadcast({ "type": "status", "state": state, "message": message, "last_error": assistant_state.get("last_error"), "connected": assistant_state.get("connected"), }) # Basic logging (can be overridden by parent app) logger = logging.getLogger("real_time_voice.flask") if not logger.handlers: logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(levelname)s in %(name)s: %(message)s") # --------------------------------------------------------------------------- # Suppress noisy 200 OK HTTP access logs (Werkzeug dev server) while keeping # non-200 responses and internal status/log broadcasts. # --------------------------------------------------------------------------- class _SuppressHTTP200(logging.Filter): def filter(self, record: logging.LogRecord) -> bool: # noqa: D401 - simple filter msg = record.getMessage() # Typical pattern: '127.0.0.1 - - [timestamp] "POST /audio-chunk HTTP/1.1" 200 -' # Suppress any line that clearly denotes an HTTP 200 access log. 
if '" 200 ' in msg: return False return True werkzeug_logger = logging.getLogger("werkzeug") # Avoid stacking multiple identical filters if code reloaded (Flask debug reload) already = any(isinstance(f, _SuppressHTTP200) for f in getattr(werkzeug_logger, 'filters', [])) if not already: werkzeug_logger.addFilter(_SuppressHTTP200()) def _validate_env() -> Tuple[bool, str]: """Validate required environment variables.""" import os required_vars = [ "VOICE_LIVE_MODEL", "VOICE_LIVE_VOICE", "AZURE_VOICE_LIVE_API_KEY", "AZURE_VOICE_LIVE_ENDPOINT" ] missing = [var for var in required_vars if not os.environ.get(var)] if missing: return False, f"Missing required environment variables: {', '.join(missing)}" return True, "Configuration valid" class BasicVoiceAssistant: """Minimal assistant implementation for VoiceLive API. Handles real-time voice conversation using Azure's VoiceLive service. Manages connection, session configuration, and event processing. """ # BEGIN VOICELIVE ASSISTANT IMPLEMENTATION # END VOICELIVE ASSISTANT IMPLEMENTATION verbose_val = __import__('os').environ.get('VOICE_LIVE_VERBOSE', '0').strip() verbose = bool(int(verbose_val)) if verbose_val.isdigit() else False try: _broadcast({"type": "log", "level": "info", "msg": f"Connecting to VoiceLive endpoint={self.endpoint} model={self.model} voice={self.voice}"}) # Establish async connection to Azure VoiceLive service with optimized settings async with connect( endpoint=self.endpoint, credential=self.credential, model=self.model, connection_options={"max_msg_size": 10 * 1024 * 1024, "heartbeat": 20, "timeout": 20}, ) as conn: self.connection = conn # Reset cancellation flag at the start of a new connection/session self._response_cancelled = False # Configure voice: use AzureStandardVoice for locale-specific voices, plain string for others if self.voice.startswith("en-") or "-" in self.voice: voice_cfg: Union[str, AzureStandardVoice] = AzureStandardVoice(name=self.voice) else: voice_cfg = self.voice # BEGIN CONFIGURE VOICELIVE SESSION # END CONFIGURE VOICELIVE SESSION # Main event processing loop - handle all VoiceLive server events async for event in conn: if self._stopping: break await self._handle_event(event, conn, verbose) except Exception as e: tb = traceback.format_exc(limit=6) _broadcast({"type": "log", "level": "error", "msg": f"Connection failed: {e}", "trace": tb}) self.state_callback("error", f"Connection failed: {e}") return # Cleanup (no local audio resources now) self.connection = None async def append_audio(self, audio_b64: str): """Send base64-encoded audio data to VoiceLive input buffer.""" if not self.connection: return try: await self.connection.input_audio_buffer.append(audio=audio_b64) except Exception as e: # pragma: no cover logger.error("Failed to append audio: %s", e) async def _handle_event(self, event, conn, verbose=False): """Handle VoiceLive events with clear separation by event type.""" # Import event types for processing different VoiceLive server events from azure.ai.voicelive.models import ServerEventType event_type = event.type if verbose: _broadcast({"type": "log", "level": "debug", "event_type": str(event_type)}) # Route VoiceLive server events to appropriate handlers if event_type == ServerEventType.SESSION_UPDATED: await self._handle_session_updated() elif event_type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED: await self._handle_speech_started(conn) elif event_type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED: await self._handle_speech_stopped() elif event_type == 
ServerEventType.RESPONSE_AUDIO_DELTA: await self._handle_audio_delta(event) elif event_type == ServerEventType.RESPONSE_AUDIO_DONE: await self._handle_audio_done() elif event_type == ServerEventType.RESPONSE_DONE: # Reset cancellation flag but don't change state - _handle_audio_done already did self._response_cancelled = False elif event_type == ServerEventType.ERROR: await self._handle_error(event) # BEGIN HANDLE SESSION EVENTS # END HANDLE SESSION EVENTS async def _handle_audio_delta(self, event): """Stream assistant audio to clients.""" if self._response_cancelled: return # Skip cancelled responses # Update state when assistant starts speaking if assistant_state.get("state") != "assistant_speaking": self.state_callback("assistant_speaking", "Assistant speaking…") # Extract and broadcast VoiceLive audio delta as base64 to WebSocket clients audio_data = getattr(event, "delta", None) if audio_data: audio_b64 = base64.b64encode(audio_data).decode("utf-8") _broadcast({"type": "audio", "audio": audio_b64}) async def _handle_audio_done(self): """Assistant finished speaking.""" self._response_cancelled = False self.state_callback("ready", "Assistant finished. You can speak again.") async def _handle_error(self, event): """Handle VoiceLive errors.""" error = getattr(event, "error", None) message = getattr(error, "message", "Unknown error") if error else "Unknown error" self.state_callback("error", f"Error: {message}") def request_stop(self): self._stopping = True def _run_assistant_bg(): """Background thread target to run the async assistant until completion.""" global assistant_instance, shutdown_requested, assistant_loop try: import os from azure.core.credentials import AzureKeyCredential, TokenCredential endpoint = os.environ.get("AZURE_VOICE_LIVE_ENDPOINT") model = os.environ.get("VOICE_LIVE_MODEL") voice = os.environ.get("VOICE_LIVE_VOICE") instructions = os.environ.get("VOICE_LIVE_INSTRUCTIONS") or "You are a helpful voice assistant." 
        # Validate required environment variables using helper
        ok, msg = _validate_env()
        if not ok:
            set_state("error", msg)
            return

        # At this point _validate_env() ensured these are present; cast for type-checkers
        endpoint = cast(str, endpoint)
        model = cast(str, model)
        voice = cast(str, voice)

        # Use API key authentication for the web app (AZURE_VOICE_LIVE_API_KEY)
        api_key = os.environ.get("AZURE_VOICE_LIVE_API_KEY")
        if not api_key:
            set_state("error", "Missing AZURE_VOICE_LIVE_API_KEY environment variable")
            return
        credential = AzureKeyCredential(api_key)
        logger.info("Using API key authentication for VoiceLive (AZURE_VOICE_LIVE_API_KEY)")

        def cb(state, message):
            set_state(state, message)

        assistant_instance = BasicVoiceAssistant(
            endpoint=endpoint,
            credential=credential,
            model=model,
            voice=voice,
            instructions=instructions,
            state_callback=cb,
        )

        assistant_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(assistant_loop)
        assistant_loop.run_until_complete(assistant_instance.start())
        set_state("stopped", "Session ended.")
    except Exception as e:  # pragma: no cover - runtime safety
        tb = traceback.format_exc(limit=6)
        logger.error("Assistant crashed: %s\n%s", e, tb)
        set_state("error", f"Assistant crashed: {e}", error=tb)
    finally:
        try:
            if assistant_loop and assistant_loop.is_running():
                assistant_loop.stop()
        except Exception:
            pass


@app.post("/start-session")
def start_session():
    global assistant_thread
    with state_lock:
        if assistant_state["state"] in {"starting", "ready", "listening", "processing", "assistant_speaking"}:
            return jsonify({"started": False, "status": assistant_state})

    ok, msg = _validate_env()
    if not ok:
        set_state("error", msg, error=msg)
        return jsonify({"started": False, "status": assistant_state}), 400

    with state_lock:
        assistant_state["state"] = "starting"
        assistant_state["message"] = "Starting voice session…"
        assistant_state["last_error"] = None
        assistant_state["connected"] = False

    assistant_thread = threading.Thread(target=_run_assistant_bg, daemon=True)
    assistant_thread.start()

    # Ensure websocket server for low-latency audio streaming is running
    global ws_server_thread
    if not ws_server_thread:
        try:
            ws_server_thread = _start_ws_server()
        except Exception:
            pass

    # Give the thread a brief moment to progress
    time.sleep(0.1)
    return jsonify({"started": True, "status": assistant_state})


@app.post("/stop-session")
def stop_session():
    global assistant_instance
    if not assistant_instance:
        return jsonify({"stopped": False, "reason": "No active session"}), 400
    assistant_instance.request_stop()
    set_state("stopped", "Stopping session…")
    return jsonify({"stopped": True})


@app.post("/interrupt")
def interrupt():
    """Request an interruption of the current assistant response.

    Attempt to cancel the active response on the assistant connection.
    If the SDK doesn't expose a cancel method, fall back to requesting a stop
    on the assistant instance.
    """
    global assistant_instance, assistant_loop
    if not assistant_instance or not assistant_loop:
        return jsonify({"interrupted": False, "reason": "No active session"}), 400
    try:
        # Mark response cancelled on the assistant instance immediately so the
        # event loop will suppress broadcasting further RESPONSE_AUDIO_DELTA events
        try:
            if assistant_instance:
                assistant_instance._response_cancelled = True
        except Exception:
            pass

        # Immediately instruct connected clients to stop any pending playback
        _broadcast({"type": "log", "level": "debug", "msg": f"Interrupt requested: broadcasting stop_playback at {time.time()}"})
        _broadcast({"type": "control", "action": "stop_playback"})

        # Also, stop assistant playback on the server-side audio processor (if present)
        try:
            ap = getattr(assistant_instance, 'connection', None)
            # The assistant_instance in this design doesn't own the audio processor when
            # running in the flask app variant; instead we attempt to stop playback via
            # any audio processor attached to assistant_instance (if available).
            ap_obj = getattr(assistant_instance, 'audio_processor', None)
            if ap_obj and hasattr(ap_obj, 'stop_playback') and assistant_loop:
                try:
                    # Schedule stop_playback to run promptly on the assistant loop
                    asyncio.run_coroutine_threadsafe(ap_obj.stop_playback(), assistant_loop)
                except Exception:
                    pass
        except Exception:
            pass

        # Schedule the SDK-level cancel immediately on the assistant loop for low latency
        try:
            conn = getattr(assistant_instance, "connection", None)
            resp = getattr(conn, "response", None) if conn else None
            if resp and hasattr(resp, "cancel") and assistant_loop:
                try:
                    asyncio.run_coroutine_threadsafe(resp.cancel(), assistant_loop)
                    _broadcast({"type": "log", "level": "info", "msg": "Interrupt scheduled (cancel)"})
                except Exception as e:
                    _broadcast({"type": "log", "level": "error", "msg": f"Failed to schedule cancel(): {e}"})
            else:
                _broadcast({"type": "log", "level": "warn", "msg": "No response.cancel() available; cannot perform graceful interrupt via SDK."})
        except Exception as e:
            _broadcast({"type": "log", "level": "error", "msg": f"Interrupt handler exception: {e}"})

        return jsonify({"interrupted": True})
    except Exception as e:
        return jsonify({"interrupted": False, "reason": str(e)}), 500


@app.post("/audio-chunk")
def audio_chunk():
    """Receive base64 PCM16 (24kHz mono) audio from browser."""
    global assistant_instance, assistant_loop
    if not assistant_instance or not assistant_loop:
        return jsonify({"accepted": False, "reason": "No active session"}), 400
    try:
        payload = request.get_json(silent=True) or {}
        audio_b64 = payload.get("audio")
        if not audio_b64:
            return jsonify({"accepted": False, "reason": "Missing audio field"}), 400

        # Schedule append inside assistant loop
        inst = assistant_instance
        if not inst:
            return jsonify({"accepted": False, "reason": "Assistant not ready"}), 503

        def _task():
            return asyncio.create_task(inst.append_audio(audio_b64))

        assistant_loop.call_soon_threadsafe(_task)
        return jsonify({"accepted": True})
    except Exception as e:  # pragma: no cover
        return jsonify({"accepted": False, "reason": str(e)}), 500


@app.get("/events")
def sse_events():
    """Server-Sent Events stream for status + audio."""
    q: "queue.Queue[str]" = queue.Queue()
    with _sse_clients_lock:
        _sse_clients.append(q)

    # Send current state immediately
    q.put_nowait(
        "data: "
        + json.dumps(
            {
                "type": "status",
                "state": assistant_state["state"],
                "message": assistant_state["message"],
                "last_error": assistant_state.get("last_error"),
                "connected": assistant_state.get("connected"),
            }
        )
        + "\n\n"
    )

    def gen():
        try:
            while True:
                msg = q.get()
                yield msg
        except GeneratorExit:  # client disconnected
            with _sse_clients_lock:
                if q in _sse_clients:
                    _sse_clients.remove(q)

    return Response(gen(), mimetype="text/event-stream")


@app.get("/status")
def status():
    with state_lock:
        return jsonify(assistant_state)


@app.get("/health")
def health():
    with state_lock:
        return jsonify({
            "ok": assistant_state.get("state") not in {"error"},
            "state": assistant_state.get("state"),
            "connected": assistant_state.get("connected"),
            "has_connection_obj": bool(assistant_instance and getattr(assistant_instance, 'connection', None)),
        }), 200


@app.get("/")
def index():
    """Render the main UI and expose selected environment variables for display.

    We intentionally show the *values* of a small set of environment variables so the
    developer or tester can confirm configuration in the browser. Values are displayed
    as the variable value or "(not set)" when missing.
    """
    import os
    env = {
        "VOICE_LIVE_MODEL": os.environ.get("VOICE_LIVE_MODEL") or "(not set)",
        "VOICE_LIVE_VOICE": os.environ.get("VOICE_LIVE_VOICE") or "(not set)",
        "AZURE_VOICE_LIVE_ENDPOINT": os.environ.get("AZURE_VOICE_LIVE_ENDPOINT") or "(not set)",
        "VOICE_LIVE_INSTRUCTIONS": os.environ.get("VOICE_LIVE_INSTRUCTIONS") or "(not set)",
    }
    return render_template("index.html", env=env)


# The root route is implemented above with environment values passed in.


def main() -> None:
    # Basic dev server; in production consider a WSGI/ASGI server like gunicorn or uvicorn.
    import os
    host = os.environ.get("HOST", "0.0.0.0")
    port = int(os.environ.get("PORT", os.environ.get("FLASK_RUN_PORT", "5000")))
    debug_env = os.environ.get("FLASK_DEBUG", os.environ.get("DEBUG", "0"))
    debug = bool(int(debug_env)) if str(debug_env).isdigit() else debug_env.lower() in ("1", "true", "yes")
    app.run(host=host, port=port, debug=debug)


if __name__ == "__main__":  # pragma: no cover
    main()

================================================ FILE: Labfiles/11-voice-live-agent/python/src/static/app.js ================================================

// =============================
// UI ELEMENTS
// =============================
const startBtn = document.getElementById('startBtn');
const stopBtn = document.getElementById('stopBtn');
const statusBox = document.getElementById('statusBox');
const statusText = document.getElementById('statusText');
const statusMsg = document.getElementById('statusMsg');
const logEl = document.getElementById('log');

// =============================
// CONFIGURATION & STATE
// =============================
// Audio configuration
const TARGET_RATE = 24000;
const CHUNK_DURATION_MS = 150;
const MAX_LOG_LINES = 250;

// Connection state
let eventSource = null;
let wsAudio = null;
let stopped = false;

// Audio capture state
let micStream = null;
let audioContext = null;
let processorNode = null;
let capturing = false;
let pendingFloat = [];
let inputSampleRate = 48000;

// Audio playback state
let nextPlayTime = 0;
let suspendPlayback = false;
let assistantSources = [];

// UI state
let readySince = null;

function log(msg, level='info', obj){
  const line = document.createElement('div');
  line.className = level === 'error' ? 'err' : level === 'debug' ? 'dbg' : level === 'warn' ? 'warn' : '';
  const ts = new Date().toISOString().split('T')[1].replace('Z','');
  line.textContent = `[${ts} ${level}] ${msg}`;
  if(obj) { line.title = typeof obj === 'string' ?
obj : JSON.stringify(obj).slice(0,300); } logEl.appendChild(line); while(logEl.children.length > MAX_LOG_LINES) logEl.removeChild(logEl.firstChild); logEl.scrollTop = logEl.scrollHeight; } function updateStatusUI(data){ if(!data) return; statusText.textContent = data.state; statusMsg.textContent = data.message || ''; const s = data.state; let colorVar = 'var(--c-status-idle)'; if(['starting','processing'].includes(s)) colorVar = 'var(--c-status-processing)'; else if(s === 'assistant_speaking') colorVar = 'var(--c-status-speaking)'; else if(s === 'listening') colorVar = 'var(--c-status-listening)'; else if(s === 'error') colorVar = 'var(--c-status-error)'; else if(s === 'ready') colorVar = 'var(--c-status-ready)'; else if(s === 'stopped') colorVar = 'var(--c-status-stopped)'; statusBox.style.background = colorVar; if(s === 'stopped' || s === 'idle' || s === 'error') { stopMicCapture(); stopBtn.disabled = true; startBtn.disabled = false; startBtn.textContent = 'Start Session'; } if(s === 'ready') { if(!readySince) readySince = performance.now(); } else { readySince = null; } if(s === 'ready' && !capturing && !stopped) { startMicCapture().catch(e=>log('Mic capture failed: '+e,'error')); } } // Gentle nudge: if still Ready after 3s and user hasn't spoken (no transition to listening), remind them setInterval(()=>{ if(readySince && (performance.now() - readySince) > 3000 && statusText.textContent === 'ready') { if(!statusMsg.textContent.includes('Speak')) { statusMsg.textContent = 'Ready – start speaking now.'; } } }, 1000); // ============================= // SSE HANDLING // ============================= function openEventSource(){ if(eventSource){ eventSource.close(); } eventSource = new EventSource('/events'); eventSource.onmessage = handleSSEMessage; eventSource.onerror = () => log('SSE connection error (will retry if closed).','warn'); log('SSE connection opened'); } function handleSSEMessage(ev) { if(!ev.data) return; try { const data = JSON.parse(ev.data); switch(data.type) { case 'status': handleStatusUpdate(data); break; case 'audio': handleAudioData(data); break; case 'log': log(data.msg || data.event_type || JSON.stringify(data), data.level || 'info'); break; case 'control': handleControlEvent(data); break; } } catch(e){ log('Bad SSE message: '+ e,'error'); } } function handleStatusUpdate(data) { updateStatusUI(data); // Resume playback when assistant starts new response or reaches ready if(data.state === 'ready' || data.state === 'assistant_speaking') { if(suspendPlayback) { log('Resuming assistant playback (' + data.state + ')','debug'); suspendPlayback = false; } } } function handleAudioData(data) { if(suspendPlayback) { return; // Drop audio while playback suspended } playAssistantPcm16(data.audio); } function handleControlEvent(data) { if(data.action === 'stop_playback'){ try { stopAllAssistantPlayback(); } catch(_){ } suspendPlayback = true; log('Received stop_playback control from server; suspending playback until ready','debug'); } } // ============================= // AUDIO CAPTURE & ENCODE // ============================= function openAudioWebSocket(){ try{ const wsUrl = (location.protocol === 'https:' ? 
'wss://' : 'ws://') + location.hostname + ':8765/ws-audio'; wsAudio = new WebSocket(wsUrl); wsAudio.binaryType = 'arraybuffer'; wsAudio.onopen = () => log('Audio websocket opened','debug'); wsAudio.onerror = (e) => { log('Audio websocket error','warn'); wsAudio = null; }; wsAudio.onclose = () => { log('Audio websocket closed','debug'); wsAudio = null; }; }catch(e){ wsAudio = null; } } function ensureAudioContext(){ if(!audioContext){ audioContext = new (window.AudioContext || window.webkitAudioContext)({sampleRate: 48000}); inputSampleRate = audioContext.sampleRate; nextPlayTime = audioContext.currentTime; } } async function startMicCapture(){ if(capturing) return; ensureAudioContext(); log('Requesting microphone…'); micStream = await navigator.mediaDevices.getUserMedia({audio: { echoCancellation:true, noiseSuppression:true, channelCount:1 }, video:false}); const source = audioContext.createMediaStreamSource(micStream); const BUFFER_SIZE = 4096; // 4096 / 48000 ~= 85ms processorNode = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1); let lastSend = performance.now(); processorNode.onaudioprocess = (ev) => { if(!capturing) return; const input = ev.inputBuffer.getChannelData(0); pendingFloat.push(new Float32Array(input)); const now = performance.now(); // Send every CHUNK_DURATION_MS or if backlog large if(now - lastSend >= CHUNK_DURATION_MS || pendingFloat.length > 12){ flushPendingAudio(); lastSend = now; } }; source.connect(processorNode); processorNode.connect(audioContext.destination); // required for some browsers capturing = true; log('Microphone capture started'); } function stopMicCapture(){ if(!capturing) return; capturing = false; if(processorNode){ try { processorNode.disconnect(); } catch(_){} } if(micStream){ micStream.getTracks().forEach(t=>t.stop()); micStream = null; } pendingFloat = []; log('Microphone capture stopped'); } function mergePendingFloat(){ if(!pendingFloat.length) return null; let total = 0; for(const arr of pendingFloat) total += arr.length; const merged = new Float32Array(total); let offset=0; for(const arr of pendingFloat){ merged.set(arr, offset); offset += arr.length; } pendingFloat = []; return merged; } function downsampleToInt16(float32, inRate, outRate){ if(!float32) return null; if(inRate === outRate){ const int16 = new Int16Array(float32.length); for(let i=0;i{ if(capturing) flushPendingAudio(); }, 500); // ============================= // ASSISTANT AUDIO PLAYBACK // ============================= function playAssistantPcm16(b64){ try { ensureAudioContext(); const binary = atob(b64); const bytes = new Uint8Array(binary.length); for(let i=0;i { const i = assistantSources.indexOf(src); if(i !== -1) assistantSources.splice(i,1); }); const now = audioContext.currentTime; if(nextPlayTime < now) nextPlayTime = now + 0.01; // small lead src.start(nextPlayTime); nextPlayTime += audioBuf.duration; } catch(e) { log('Playback error: '+ e,'error'); } } function stopAllAssistantPlayback(){ try{ for(const s of assistantSources.slice()){ try{ if(typeof s.stop === 'function') s.stop(0); } catch(_){} try{ s.disconnect(); } catch(_){} } }catch(e){ /* ignore */ } assistantSources = []; // reset scheduling so future audio plays immediately when resumed try{ nextPlayTime = audioContext.currentTime; } catch(_){} try{ log('stopAllAssistantPlayback executed at '+ new Date().toISOString(), 'debug'); } catch(_){} } // ============================= // SESSION MANAGEMENT // ============================= async function startSession(){ stopped = false; 
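  // Session start flow: POST /start-session asks the server to spin up the assistant
  // thread, openEventSource() subscribes to the /events SSE stream (status, log, control
  // and audio messages), and openAudioWebSocket() opens the low-latency microphone channel
  // on port 8765. Microphone capture itself only begins once a 'ready' status arrives
  // (see updateStatusUI above).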
setSessionButtonState('starting'); try { const response = await fetch('/start-session', {method:'POST'}); const result = await response.json(); if(!response.ok){ handleStartSessionError(result, response.status); return; } // Session started successfully updateStatusUI(result.status || result); openEventSource(); openAudioWebSocket(); log('Session started successfully'); } catch(error){ handleStartSessionError(null, error.message); } } async function stopSession(){ stopped = true; log('Stopping session…'); // Stop server session try { await fetch('/stop-session', {method:'POST'}); } catch(_){ } // Clean up connections closeConnections(); stopMicCapture(); updateStatusUI({state:'stopped', message:'Session stopped.'}); } function setSessionButtonState(state) { if(state === 'starting') { startBtn.disabled = true; stopBtn.disabled = false; startBtn.textContent = 'Starting…'; } else if(state === 'stopped') { startBtn.disabled = false; stopBtn.disabled = true; startBtn.textContent = 'Start Session'; } } function handleStartSessionError(result, errorInfo) { log('Failed to start: '+ (result?.status?.last_error || errorInfo), 'error'); if(result) updateStatusUI(result.status || result); setSessionButtonState('stopped'); } function closeConnections() { if(eventSource){ eventSource.close(); eventSource = null; } if(wsAudio){ wsAudio.close(); wsAudio = null; } } // ============================= // EVENT LISTENERS & INITIALIZATION // ============================= startBtn.addEventListener('click', startSession); stopBtn.addEventListener('click', stopSession); window.addEventListener('beforeunload', closeConnections); // Passive init (establish SSE early for existing session) openEventSource(); ================================================ FILE: Labfiles/11-voice-live-agent/python/src/static/style.css ================================================ :root { color-scheme: light dark; } /* Color tokens for adaptive theming */ :root { --c-bg: #ffffff; --c-fg: #1a1a1a; --c-muted: #666; --c-border: #c5c5c5; --c-accent: #1e4bb8; --c-status-idle: #f4f5f7; --c-status-ready: #e3edff; --c-status-processing: #fff3d6; --c-status-listening: #d9f7ec; --c-status-speaking: #fbe7ff; --c-status-error: #ffe0e0; --c-status-stopped: #ececec; --c-log-bg: #111; --c-log-fg: #e0e0e0; --c-scrollbar: #555; } @media (prefers-color-scheme: dark) { :root { --c-bg: #0f1115; --c-fg: #e5e7eb; --c-muted: #9ba3af; --c-border: #2d3846; --c-accent: #6ea8ff; --c-status-idle: #1d2530; --c-status-ready: #102a44; --c-status-processing: #4a3700; --c-status-listening: #073b2a; --c-status-speaking: #43214b; --c-status-error: #4a1111; --c-status-stopped: #2a323c; --c-log-bg: #0c0c0f; --c-log-fg: #d1d5db; --c-scrollbar: #444; } } body { font-family: system-ui, Arial, sans-serif; margin: 2rem; line-height:1.4; background: var(--c-bg); color: var(--c-fg); } h1 { color: var(--c-accent); margin-top:0; } footer { margin-top: 3rem; font-size: 0.7rem; color: var(--c-muted); } .container { max-width: 880px; } button { cursor: pointer; background: var(--c-accent); color:#fff; border: none; border-radius:6px; } button:disabled { opacity:.55; cursor: not-allowed; } #statusBox { transition: background .25s, color .25s; color: var(--c-fg); border:1px solid var(--c-border); } #log { background:var(--c-log-bg); color:var(--c-log-fg); padding:.75rem 1rem; border-radius:6px; max-height:260px; overflow:auto; font-size:.75rem; font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, "Liberation Mono", monospace; border:1px solid var(--c-border); } #log .err { 
color:#ff8080; } #log .dbg { color:#8ab4f8; } #log .warn { color:#ffd479; } details[open] summary { font-weight:600; } small.hint { color: var(--c-muted); } ::-webkit-scrollbar { width:8px; } ::-webkit-scrollbar-track { background: transparent; } ::-webkit-scrollbar-thumb { background: var(--c-scrollbar); border-radius:4px; } ================================================ FILE: Labfiles/11-voice-live-agent/python/src/templates/index.html ================================================ Real Time Voice

Real-Time Voice Demo

This page streams your microphone audio to the server and plays the assistant's synthesized audio. Grant microphone permission when prompted. You can begin speaking when Status: ready.

Status: Idle
Click Start to begin a voice session.
Environment variables required (server)
  • VOICE_LIVE_MODEL: {{ env.VOICE_LIVE_MODEL }}
  • VOICE_LIVE_VOICE: {{ env.VOICE_LIVE_VOICE }}
  • AZURE_VOICE_LIVE_ENDPOINT: {{ env.AZURE_VOICE_LIVE_ENDPOINT }}
  • VOICE_LIVE_INSTRUCTIONS: {{ env.VOICE_LIVE_INSTRUCTIONS }}

Logs

Only the most recent ~250 log lines are kept in memory.

Voice live example front-end (browser PCM16 streaming & SSE playback).
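The page above talks to the Flask endpoints defined in src/flask_app.py. A minimal command-line smoke test of that HTTP/SSE contract might look like the sketch below; it assumes the dev server from main() is running on http://localhost:5000 and that the requests package is available (neither the script name nor that dependency is part of the lab files).

# check_session.py - smoke-test the voice-live web app endpoints without a browser (sketch).
import json
import requests

BASE = "http://localhost:5000"

# Start the background voice session (the same call the Start button makes).
print(requests.post(f"{BASE}/start-session", timeout=10).json())

# Read Server-Sent Events from /events; each event arrives as a "data: {...}" line.
with requests.get(f"{BASE}/events", stream=True, timeout=30) as resp:
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data: "):
            continue
        event = json.loads(raw[len("data: "):])
        print(event.get("type"), event.get("state"), event.get("message"))
        # Stop reading once the assistant reports a terminal or ready status.
        if event.get("type") == "status" and event.get("state") in {"ready", "error", "stopped"}:
            break

# Check health, then stop the session.
print(requests.get(f"{BASE}/health", timeout=10).json())
print(requests.post(f"{BASE}/stop-session", timeout=10).json())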

================================================ FILE: README.md ================================================ # Microsoft Learning Azure AI Language Lab files for Azure AI Language modules ================================================ FILE: _build.yml ================================================ name: '$(Date:yyyyMMdd)$(Rev:.rr)' jobs: - job: build_markdown_content displayName: 'Build Markdown Content' workspace: clean: all pool: vmImage: 'Ubuntu 16.04' container: image: 'microsoftlearning/markdown-build:latest' steps: - task: Bash@3 displayName: 'Build Content' inputs: targetType: inline script: | cp /{attribution.md,template.docx,package.json,package.js} . npm install node package.js --version $(Build.BuildNumber) - task: GitHubRelease@0 displayName: 'Create GitHub Release' inputs: gitHubConnection: 'github-microsoftlearning-organization' repositoryName: '$(Build.Repository.Name)' tagSource: manual tag: 'v$(Build.BuildNumber)' title: 'Version $(Build.BuildNumber)' releaseNotesSource: input releaseNotes: '# Version $(Build.BuildNumber) Release' assets: '$(Build.SourcesDirectory)/out/*.zip' assetUploadMode: replace - task: PublishBuildArtifacts@1 displayName: 'Publish Output Files' inputs: pathtoPublish: '$(Build.SourcesDirectory)/out/' artifactName: 'Lab Files' ================================================ FILE: _config.yml ================================================ remote_theme: MicrosoftLearning/Jekyll-Theme exclude: - readme.md - .github/ header_pages: - index.html author: Microsoft Learning twitter_username: mslearning github_username: MicrosoftLearning plugins: - jekyll-sitemap - jekyll-mentions - jemoji title: Develop AI Language and Speech solutions on Azure markdown: GFM #markdown: kramdown #kramdown: # syntax_highlighter_opts: # disable : true ================================================ FILE: downloads/python/readme.md ================================================ ================================================ FILE: index.md ================================================ --- title: Develop AI Language and Speech solutions on Azure permalink: index.html layout: home --- This page lists exercises associated with Microsoft skilling content on [Microsoft Learn](https://learn.microsoft.com/training/paths/develop-language-solutions-azure-ai/) > **Note**: To complete the exercises, you'll need an Azure subscription. If you don't already have one, you can sign up for an [Azure account](https://azure.microsoft.com/free). There's a free trial option for new users that includes credits for the first 30 days. ## Exercises
{% assign labs = site.pages | where_exp:"page", "page.url contains '/Instructions/Exercises'" %} {% for activity in labs %} {% if activity.lab.title %} ### [{{ activity.lab.title }}]({{ site.github.url }}{{ activity.url }}) {% if activity.lab.level %}**Level**: {{activity.lab.level}} \| {% endif %}{% if activity.lab.duration %}**Duration**: {{activity.lab.duration}} minutes{% endif %} {% if activity.lab.description %} *{{activity.lab.description}}* {% endif %}
{% endif %} {% endfor %}
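For reference, the browser client encodes microphone audio as base64 PCM16 (24 kHz, mono) before posting it to /audio-chunk (see downsampleToInt16 in app.js and the /audio-chunk route in flask_app.py). A rough Python equivalent of that encoding, which can exercise the endpoint without a microphone, is sketched below; the file name, the 440 Hz test tone, and the use of the requests package are illustrative assumptions, and the server returns 400 unless a session is already active.

# send_audio_chunk.py - Python sketch of the browser-side audio encoding:
# float samples -> 16-bit PCM -> base64 -> POST /audio-chunk.
import base64
import math
import struct
import requests

BASE = "http://localhost:5000"   # default dev-server address from flask_app.main()
TARGET_RATE = 24000              # the endpoint expects 24 kHz, 16-bit, mono PCM

def float_to_pcm16_b64(samples):
    """Clamp float samples in [-1, 1] to int16 (little-endian) and base64-encode them."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    raw = struct.pack("<%dh" % len(ints), *ints)
    return base64.b64encode(raw).decode("ascii")

# 150 ms of a 440 Hz test tone (roughly one browser chunk at CHUNK_DURATION_MS).
n = int(TARGET_RATE * 0.15)
tone = [0.2 * math.sin(2 * math.pi * 440 * i / TARGET_RATE) for i in range(n)]

resp = requests.post(f"{BASE}/audio-chunk", json={"audio": float_to_pcm16_b64(tone)}, timeout=10)
print(resp.status_code, resp.json())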