Advent of 2024, Day 8 – Microsoft Azure AI – Speech Studio in Azure with AI Services
This article is originally published at https://tomaztsql.wordpress.com
In this Microsoft Azure AI series:
- Dec 01: Microsoft Azure AI – What is Foundry?
- Dec 02: Microsoft Azure AI – Working with Azure AI Foundry
- Dec 03: Microsoft Azure AI – Creating project in Azure AI Foundry
- Dec 04: Microsoft Azure AI – Deployment in Azure AI Foundry
- Dec 05: Microsoft Azure AI – Deployment parameters in Azure AI Foundry
- Dec 06: Microsoft Azure AI – AI Services in Azure AI Foundry
- Dec 07: Microsoft Azure AI – Speech service in AI Services
Speech Studio (available at https://speech.microsoft.com/portal) is a set of UI-based tools for building and integrating features of the Azure AI Speech service (available in the Azure portal) into your applications using a no-code approach. You can also create projects that reference the same assets and services from code, using the Speech SDK, the Speech CLI, or the REST APIs.
The Speech SDK is available for the following programming languages and platforms:
Language | Platforms |
C# | Windows, Linux, macOS, Mono, Xamarin.iOS, Xamarin.Mac, Xamarin.Android, UWP, Unity |
C++ | Windows, Linux, macOS |
Go | Linux |
Java | Android, Windows, Linux, macOS |
JavaScript | Browser, Node.js |
Objective-C | iOS, macOS |
Python | Windows, Linux, macOS |
Swift | iOS, macOS |
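If your language or platform is not in the list above, the REST APIs mentioned earlier are an option, since they only need an HTTP client. The following is a minimal sketch, not an official sample: it builds (but does not send) a request for the short-audio speech-to-text REST endpoint using only the Python standard library. The function name `build_stt_request` and the placeholder key and region are my own assumptions, and the audio is assumed to be 16 kHz PCM WAV.

```python
import urllib.parse
import urllib.request


def build_stt_request(audio_bytes, key, region, language="en-US"):
    """Build (without sending) a POST request for short-audio speech-to-text.

    `key` and `region` are placeholders for your own Speech resource values.
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The resource key is passed in this header.
        "Ocp-Apim-Subscription-Key": key,
        # Assumption: the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


# Usage (not executed here, needs a real key, region and WAV file):
# with open("sample.wav", "rb") as f:
#     req = build_stt_request(f.read(), "YourSubscriptionKey", "westeurope", "sl-SI")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

With the simple output format, the JSON response should carry the recognized text in a `DisplayText` field. This endpoint is intended for short audio clips only; longer recordings are a better fit for the SDK or batch transcription.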
In Speech Studio, you will find the following Speech service features that you can create or add to your project:
- Real-time Speech-to-Text: Effortlessly test speech-to-text functionality by dragging audio files into the tool—no coding required. Speech Studio offers a demo to explore how this feature processes your audio samples.
- Batch Speech-to-Text: Easily test batch transcription capabilities to process large volumes of stored audio and receive results asynchronously.
- Custom Speech: Develop speech recognition models customized to specific vocabularies and speaking styles. Unlike the standard recognition model, custom speech models provide a competitive edge as they remain exclusive to your use.
- Pronunciation Assessment: Analyze speech pronunciation and provide feedback on accuracy and fluency. Speech Studio features a sandbox for quick, no-code testing of this capability.
- Speech Translation: Seamlessly translate speech into your chosen languages with minimal latency, enabling swift multilingual communication.
- Voice Gallery: Design apps and services with natural-sounding voices. Choose from a diverse selection of languages, voices, and styles to create expressive, human-like neural voice experiences.
- Custom Voice: Create unique, personalized voices for text-to-speech. By providing audio files and matching transcriptions in Speech Studio, you can integrate custom voices into your applications.
- Audio Content Creation: Generate text-to-speech audio without coding. Use the audio as-is or as a foundation for further customization, crafting natural-sounding content for audiobooks, news, video narrations, chatbots, and more.
- Custom Keyword: Define custom keywords or phrases to voice-activate products. Create a keyword in Speech Studio and generate a binary file compatible with the Speech SDK for your application.
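The voices and styles you pick in the Voice Gallery can also be requested from code through SSML (Speech Synthesis Markup Language). The sketch below only assembles an SSML string; the helper name `build_ssml` is hypothetical, and the voice `en-US-JennyNeural` with the `cheerful` style is just one example of a voice and style combination, so check the gallery for what your region offers.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US", style=None):
    """Return an SSML document that speaks `text` with the given neural voice.

    When `style` is set, the text is wrapped in an mstts:express-as element.
    """
    body = escape(text)  # escape &, < and > so the SSML stays well-formed
    if style:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


ssml = build_ssml("The air was filled with the smell of freshly baked bread.", style="cheerful")
print(ssml)
```

The resulting string can then be handed to the Speech SDK, for example with `SpeechSynthesizer.speak_ssml_async(ssml).get()`, assuming a synthesizer has been created from your own key and region.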
Let’s put the translation to the test. For reference, this is the original text, read aloud in Slovenian:
“Ob cesti je stal star mlin, ki je že leta sameval, a še vedno ohranjal svoj čar. Ljudje so se zbrali na trgu, kjer so prodajali domače pridelke in se pogovarjali o vremenu. Otroci so tekali naokrog in se smejali, medtem ko so njihovi starši klepetali ob skodelici kave. Zrak je bil poln vonja po sveže pečenem kruhu.”
And this is the translation into English: “By the side of the road stood an old mill that has been lonely for years, but still retains its charm. People gathered at the market to sell local produce and talk about the weather. The kids ran around laughing while their parents chatted over a cup of coffee. The air was filled with the smell of freshly baked bread.”
Not to mention that the speech translation was immediate and flawless. Of course, I used a directional microphone and there was no background noise.
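The same Slovenian-to-English translation can be approximated from code with the Speech SDK's translation classes. This is a minimal sketch under a few assumptions: the function names `translate_once` and `pick_translation` are my own, the key and region are placeholders, and the default microphone is used as input. The SDK is imported inside the function so that the small helper stays usable without it.

```python
def pick_translation(translations, target="en"):
    """Return the translated text for `target`, or None if it is missing."""
    return translations.get(target)


def translate_once(speech_key, service_region, source="sl-SI", target="en"):
    """Listen for a single utterance on the default microphone and translate it.

    Returns a (recognized_text, translated_text) tuple, or (None, None)
    when nothing was translated.
    """
    # Lazy import: requires `pip install azure-cognitiveservices-speech`.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region
    )
    config.speech_recognition_language = source  # language being spoken
    config.add_target_language(target)           # language to translate into

    recognizer = speechsdk.translation.TranslationRecognizer(translation_config=config)
    result = recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, pick_translation(result.translations, target)
    return None, None


# Usage (not executed here, needs a real key, region and microphone):
# original, english = translate_once("YourSubscriptionKey", "YourServiceRegion")
# print(original, "->", english)
```

Like `recognize_once()` on the plain recognizer, this handles one utterance at a time, so a longer passage such as the paragraph above would need continuous recognition instead.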
[Audio: original recording in Slovenian with immediate live translation to English]
[Audio: translated output in English]
Speech translation builds on speech recognition. With the Python Speech SDK, a recognizer can be set up with sample code like this:
import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")

# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of about 30
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
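The comments in the sample point to `start_continuous_recognition()` for longer, multi-utterance audio. A rough sketch of that variant follows; the `TranscriptCollector` class, the `transcribe_continuously` function and the `max_seconds` timeout are my own illustrative names, not part of the SDK, and the key and region are again placeholders.

```python
import time


class TranscriptCollector:
    """Gathers recognized utterances and remembers when the session ended."""

    def __init__(self):
        self.lines = []
        self.done = False

    def on_recognized(self, evt):
        # Fired once per finished utterance; skip empty results.
        if evt.result.text:
            self.lines.append(evt.result.text)

    def on_stopped(self, evt):
        # Fired when the session stops or the recognition is canceled.
        self.done = True


def transcribe_continuously(speech_key, service_region, max_seconds=60):
    """Recognize speech from the default microphone until stopped or timed out."""
    # Lazy import: requires `pip install azure-cognitiveservices-speech`.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    collector = TranscriptCollector()
    recognizer.recognized.connect(collector.on_recognized)
    recognizer.session_stopped.connect(collector.on_stopped)
    recognizer.canceled.connect(collector.on_stopped)

    recognizer.start_continuous_recognition()
    deadline = time.monotonic() + max_seconds
    while not collector.done and time.monotonic() < deadline:
        time.sleep(0.5)
    recognizer.stop_continuous_recognition()
    return collector.lines


# Usage (not executed here, needs a real key, region and microphone):
# for line in transcribe_continuously("YourSubscriptionKey", "YourServiceRegion"):
#     print(line)
```

Keeping the event handlers in a small class makes the collected transcript easy to inspect afterwards, and the timeout is only a safety net so the loop cannot run forever.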
Tomorrow we will look at the Speech SDK with Python in more detail.
All of the code samples will be available on my GitHub.