Advent of 2024, Day 9 – Microsoft Azure AI – Speech SDK with Python
This article is originally published at https://tomaztsql.wordpress.com
In this Microsoft Azure AI series:
- Dec 01: Microsoft Azure AI – What is Foundry?
- Dec 02: Microsoft Azure AI – Working with Azure AI Foundry
- Dec 03: Microsoft Azure AI – Creating project in Azure AI Foundry
- Dec 04: Microsoft Azure AI – Deployment in Azure AI Foundry
- Dec 05: Microsoft Azure AI – Deployment parameters in Azure AI Foundry
- Dec 06: Microsoft Azure AI – AI Services in Azure AI Foundry
- Dec 07: Microsoft Azure AI – Speech service in AI Services
- Dec 08: Microsoft Azure AI – Speech Studio in Azure with AI Services
Besides Python, the Speech SDK supports several other programming languages. The Python SDK exposes many of the Speech service capabilities for developing speech-enabled applications. It suits both (near) real-time and non-real-time scenarios, and it can be combined with other Azure services such as storage, streams and analytics.
You will need Python installed, along with these additional packages:
pip install azure-cognitiveservices-speech
pip install scipy
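Before going further, it can help to confirm that both packages are importable from the interpreter you plan to use. The following is only a minimal sketch; the helper name missing_packages is made up for this illustration and is not part of the Speech SDK.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names when a parent package (e.g. "azure") is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names that correspond to the two pip packages above.
    required = ["azure.cognitiveservices.speech", "scipy"]
    print("Missing packages:", missing_packages(required))
```

An empty list means both packages were found; anything listed still needs a pip install.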
After that, tailor the sample to your configuration: use search and replace across the whole sample directory to update the following strings:
- YourSubscriptionKey: replace with your subscription key.
- YourServiceRegion: replace with the region your subscription is associated with, for example westus or northeurope.
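As an alternative to pasting the key into the source file, the two values could be read from environment variables. This is a sketch under assumptions: the variable names SPEECH_KEY and SPEECH_REGION and the helper load_speech_settings are chosen here for illustration and are not required by the SDK.

```python
import os


def load_speech_settings(env=None):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are assumed names picked for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running.")
    return key, region
```

With this in place, the assignment in the sample could become speech_key, service_region = load_speech_settings(), which keeps the key out of the code you share.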
You will also need a *.wav file to transcribe. I will be using the file (bedd11f0-b58d-11ef-944e-db4086e231ae.wav) from this post:
Once the Python packages are installed and your key and region are set, you can run the Python code:
#!/usr/bin/env python
import time
from scipy.io import wavfile
import azure.cognitiveservices.speech as speechsdk
import sys
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
conversationfilename = "myfile.wav"
def conversation_transcription():
    """Transcribe a conversation from a WAV file pushed into an audio stream."""
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # The stream format has to match the WAV file: mono, 16-bit, 16 kHz PCM.
    channels = 1
    bits_per_sample = 16
    samples_per_second = 16000
    wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second, bits_per_sample, channels)
    stream = speechsdk.audio.PushAudioInputStream(stream_format=wave_format)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    transcriber = speechsdk.transcription.ConversationTranscriber(speech_config, audio_config)

    done = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """Tell the waiting loop to exit once the session stops or is canceled."""
        print('CLOSING {}'.format(evt))
        nonlocal done
        done = True

    # Print every event as it arrives.
    transcriber.transcribed.connect(lambda evt: print('TRANSCRIBED: {}'.format(evt)))
    transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # Stop waiting when the session ends or is canceled.
    transcriber.session_stopped.connect(stop_cb)
    transcriber.canceled.connect(stop_cb)

    transcriber.start_transcribing_async()

    # Read the whole file, push its samples into the stream, then close the stream.
    _, wav_data = wavfile.read(conversationfilename)
    stream.write(wav_data.tobytes())
    stream.close()

    # Wait until stop_cb sets done.
    while not done:
        time.sleep(.5)

    transcriber.stop_transcribing_async()
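def conversation_transcription_from_microphone():
    """Transcribe a conversation from the default microphone until "stop" is typed."""
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # No audio config is passed, so the default microphone is used.
    transcriber = speechsdk.transcription.ConversationTranscriber(speech_config)

    done = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """Tell the waiting loop to exit once the session stops or is canceled."""
        print('CLOSING {}'.format(evt))
        nonlocal done
        done = True

    # Print every event as it arrives.
    transcriber.transcribed.connect(lambda evt: print('TRANSCRIBED: {}'.format(evt)))
    transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # Stop waiting when the session ends or is canceled.
    transcriber.session_stopped.connect(stop_cb)
    transcriber.canceled.connect(stop_cb)

    transcriber.start_transcribing_async()

    # Keep the function alive while transcription runs; the user ends it by typing "stop".
    while not done:
        print('type "stop" then enter when done')
        stop = input()
        if (stop.lower() == "stop"):
            print('Stopping async recognition.')
            transcriber.stop_transcribing_async()
            break


if __name__ == "__main__":
    # Run the file-based transcription; call conversation_transcription_from_microphone() to use the microphone instead.
    conversation_transcription()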
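The file-based function above hardcodes one channel, 16 bits per sample and 16,000 samples per second, so the *.wav file should match those values. A rough way to check a file before sending it, using only the standard library, could look like this. The helper names describe_wav and matches_expected_format are invented for this sketch, and the wave module only reads uncompressed PCM files.

```python
import wave


def describe_wav(path):
    """Return channel count, bits per sample and sample rate of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "samples_per_second": wav.getframerate(),
        }


def matches_expected_format(info, channels=1, bits_per_sample=16, samples_per_second=16000):
    """Return True when info matches the format hardcoded in the transcription code."""
    return (
        info["channels"] == channels
        and info["bits_per_sample"] == bits_per_sample
        and info["samples_per_second"] == samples_per_second
    )
```

Calling matches_expected_format(describe_wav("myfile.wav")) before transcribing tells you whether the file needs converting first, or whether the three values in the code should be changed to fit the file.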
With the code provided, I am getting the text transcription back.
Tomorrow we will look into the “Language + Translation” service.
All of the code samples will be available on my GitHub.
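The sample prints each event object as a whole, which is noisy to read. If you only want the speaker and the recognized text, a small formatter can be connected to the transcribed event instead. This is a sketch that assumes the event's result carries text and speaker_id attributes; the helper name format_transcribed is made up here, and a stand-in event is used so the example runs without the SDK or a subscription.

```python
from types import SimpleNamespace


def format_transcribed(evt):
    """Build a 'speaker: text' line from a transcribed event.

    Assumes evt.result has text and speaker_id attributes.
    """
    result = evt.result
    speaker = getattr(result, "speaker_id", None) or "Unknown"
    text = getattr(result, "text", "") or ""
    return "{}: {}".format(speaker, text)


# Stand-in event, not a real SDK object, used only to show the output shape.
fake_evt = SimpleNamespace(result=SimpleNamespace(speaker_id="Guest-1", text="Hello there."))
print(format_transcribed(fake_evt))  # prints "Guest-1: Hello there."
```

In the sample it would be wired in with transcriber.transcribed.connect(lambda evt: print(format_transcribed(evt))).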