Azure Speech Services let developers add speech features, including Speech-to-Text, to their applications.
Recently, we got a requirement to transcribe audio attachments whenever an email with attachments arrives in a shared mailbox, and to send back the transcribed text.
In this blog post, we walk through how to convert speech to text using Azure Speech Service in a Power Automate flow. The following steps achieve this requirement.
Step 1: Create an ‘Automated’ flow
Create an ‘Automated’ flow using the trigger “When a new email arrives in a shared mailbox (V2)”.
Step 2: Initialize the String variable
Add the ‘Initialize variable’ action to store the transcribed text of the audio attachment for further use.
Step 3: Initialize the Object variable
Add another ‘Initialize variable’ action, this time of type Object, to store the content of the audio attachment for further use. This is the ‘Body Content’ variable set in Step 5.
Step 4: Get Attachments
Next, retrieve the attachments of the received email. Add an ‘Apply to each’ action to iterate through the email's attachments, and inside it add a ‘Get Attachment (V2)’ action to fetch the current attachment.
Expressions from the above image:
triggerOutputs()?['body/attachments']
triggerOutputs()?['body/id']
items('Apply_to_each_Attachment')?['id']
Step 5: Validate the content type of the audio attachment
Add a ‘Condition’ action to check that the content type of the current attachment is WAV. Inside the condition's true branch, set the ‘Body Content’ variable using the content type and content of the current audio attachment.
Expressions from the above image:
outputs('Get_Attachment_(V2)')?['body/contentType']
outputs('Get_Attachment_(V2)')?['body/contentBytes']
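The check in this step can be sketched outside Power Automate as follows. This is only an illustration in Python, not part of the flow: the function name `build_body_content`, the accepted content types, and the sample attachment are assumptions, and the `$content-type` / `$content` keys reflect how Power Automate usually represents binary content, which may differ from how your ‘Body Content’ variable is set.

```python
import base64
from typing import Optional

# Content types accepted as WAV. The exact set is an assumption for this sketch.
WAV_CONTENT_TYPES = {"audio/wav", "audio/x-wav"}


def build_body_content(attachment: dict) -> Optional[dict]:
    """Return a 'Body Content'-style object for WAV attachments, else None."""
    content_type = attachment.get("contentType", "").lower()
    if content_type not in WAV_CONTENT_TYPES:
        return None  # corresponds to the false branch of the Condition action
    return {
        "$content-type": content_type,
        # contentBytes is assumed to be a base64 string, as returned by the connector
        "$content": attachment["contentBytes"],
    }


# Hypothetical attachment, shaped like the output of 'Get Attachment (V2)'.
sample = {
    "name": "voicemail.wav",
    "contentType": "audio/wav",
    "contentBytes": base64.b64encode(b"RIFF....WAVE").decode("ascii"),
}
print(build_body_content(sample))
```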
Step 6: Convert Speech-To-Text
Add an ‘HTTP’ action and configure it as follows:
Method: POST
URL: https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US
Note: Make sure to replace the region code (eastus) in the URL with the region of your own Speech resource.
Headers:
Accept: application/json;text/xml
Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000
Ocp-Apim-Subscription-Key: your Azure Speech resource key
Host: eastus.stt.speech.microsoft.com
Transfer-Encoding: chunked
Expect: 100-continue
Body: ‘Body Content’ variable
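For readers who want to try the same request outside the flow, here is a minimal Python sketch that builds the equivalent HTTP request with the standard library. The function name `build_stt_request` and the placeholder key are assumptions for illustration. The sketch only constructs the request and does not send it. It also leaves out the Host, Transfer-Encoding, and Expect headers, because `urllib` derives Host from the URL and would send the body with a Content-Length instead of chunked encoding.

```python
import urllib.request


def build_stt_request(audio: bytes, region: str, key: str,
                      language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a speech-to-text request like the HTTP action above."""
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?language={language}"
    )
    headers = {
        "Accept": "application/json;text/xml",
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


# Placeholder audio bytes and key; replace with real WAV data and your own key.
req = build_stt_request(b"RIFF", region="eastus", key="<your-key>")
print(req.get_method(), req.full_url)

# To actually send it (requires a valid key and real audio):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```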
Step 7: Get the text of the audio attachment
Add a ‘Parse JSON’ action to get the text of the audio attachment.
Schema:
{
    "type": "object",
    "properties": {
        "RecognitionStatus": {
            "type": "string"
        },
        "Offset": {
            "type": ["integer", "string"]
        },
        "Duration": {
            "type": ["integer", "string"]
        },
        "DisplayText": {
            "type": "string"
        }
    }
}
Expressions from the above image:
body('HTTP_Azure_SpeechToText')
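What the ‘Parse JSON’ action does here can be sketched in Python as shown below. The function name `extract_display_text` and the sample response values are illustrative assumptions, not real service output. Only the field names RecognitionStatus, Offset, Duration, and DisplayText come from the schema above.

```python
import json


def extract_display_text(response_body: str) -> str:
    """Return the recognized text, or an empty string if recognition did not succeed."""
    result = json.loads(response_body)
    if result.get("RecognitionStatus") != "Success":
        return ""
    return result.get("DisplayText", "")


# Hypothetical response body with made-up values.
sample_response = (
    '{"RecognitionStatus": "Success", "Offset": 1800000, '
    '"Duration": 32100000, "DisplayText": "Hello, this is a test."}'
)
print(extract_display_text(sample_response))
```

Checking RecognitionStatus before reading DisplayText is a design choice of this sketch. The flow described here reads the text directly from the parsed output.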
Step 8: Append audio text in the variable
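Add the ‘Append to string variable’ action and append the transcribed text of the audio file along with its file name.
The string variable now holds the transcription of every audio attachment and can be used in later steps, for example to send the transcribed text back by email.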
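The accumulation across attachments can be pictured with the short Python sketch below. The "file name: text" line format, the function name `append_transcript`, and the sample file names are assumptions. The flow itself only specifies that the text is appended along with the file name.

```python
def append_transcript(transcripts: str, file_name: str, text: str) -> str:
    """Append one 'file name: text' line, like 'Append to string variable'."""
    return transcripts + f"{file_name}: {text}\n"


# Hypothetical results for two attachments processed inside the loop.
transcripts = ""
for file_name, text in [("greeting.wav", "Hello there."), ("note.wav", "Call me back.")]:
    transcripts = append_transcript(transcripts, file_name, text)

print(transcripts)
```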
Conclusion:
By following these steps, you can effectively convert speech to text using Azure Speech Service in a Power Automate flow.