Leveraging AI for Speech-to-Document Conversion
Explore AI and Microsoft Power Platform to convert speech to documents using Whisper API and Azure OpenAI. Transform voice data efficiently.
Speech to Text to Document AI in Power Platform Whisper AI GPT with Azure OpenAI
Added on 01/29/2025
Speakers

Speaker 1: Hello everyone. In this video, I will show you how we can leverage the power of AI and the Microsoft Power Platform to convert speech to a document. We will leverage OpenAI's Whisper API to convert speech to text, and then take advantage of the new Azure OpenAI GPT action to create content for the document based upon the speech-to-text conversion. So let's go ahead and check this out in action. OpenAI announced the Whisper model, which performs speech-to-text transcriptions and translations. The speech-to-text API provides two specific endpoints that we can take advantage of. To leverage the speech-to-text capabilities in Power Apps, we will create a custom connector. I'll create a new custom connector from blank. I'll give my connector a name, and I can upload an icon for this connector. The host will be api.openai.com. For security, I'll pick API key, with the parameter name Authorization. In the definition, I will create a new action. The operation ID I'll call SpeechToText. For the request, I will import from sample. It's a POST request to the URL /v1/audio/transcriptions. In the headers, the content type is multipart/form-data. I'll click import. If I head over to the Swagger editor, I need to now insert the parameters related to form data, and I will also define that it consumes multipart/form-data. That completes my updates in the Swagger definition, so I'll turn this off. For the response, I'll click on default, then click import from sample. The response comes back in JSON format, which includes an object with the property text. I'll click import, and I will go ahead and create the connector. Once my custom connector is created, I can start leveraging it in Power Apps. I'll head over to create and create a new blank canvas app. I'll give my app a name and click create. In this app, I will head over to data, go to add data, and search for my custom connector. I'll select my custom connector. I will need to insert my API key.
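The Swagger edits described above can be sketched roughly as follows. This is a minimal illustrative fragment, not the exact definition used in the video; the operation ID and parameter names are assumptions based on the narration:

```yaml
# Minimal sketch of the relevant Swagger 2.0 fragment for the custom connector.
# Operation ID and parameter names are illustrative assumptions.
paths:
  /v1/audio/transcriptions:
    post:
      operationId: SpeechToText
      consumes:
        - multipart/form-data
      parameters:
        - name: file
          in: formData
          type: file
          required: true
        - name: model
          in: formData
          type: string
          required: true
      responses:
        default:
          description: default
          schema:
            type: object
            properties:
              text:
                type: string
```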
The key would be in the format "Bearer", a space, and then the API key, and I'll click connect. Once I have my connection established, I will insert the microphone control. This allows me to record audio. Then I will add a button. On select of this button, I will call the API through my Whisper custom connector: Whisper.SpeechToText. Here it needs the file reference, which is the Audio property from the microphone control that I added, so I will copy the name of the control and point to that control's Audio property. Next comes the name of the file; I'll call it audio.webm, since the audio generated by the microphone control is in WEBM format. Then the model, which is whisper-1. And finally, I need to provide the content type, which is multipart/form-data. I'll close the curly brace and close the function. This will go ahead and call the Whisper API, and it will provide me a response that I will store in a variable. To showcase the data from the response, I will add a label control. For the text property of this label control, I will leverage the variable's text property. Let's test this out. Hello, my name is Reza Dorrani.
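Put together, the formula dictated above looks roughly like the following Power Fx sketch. The control name Microphone1, the connector operation Whisper.SpeechToText, and the variable name varResponse are assumptions; the exact record field names depend on how the custom connector was defined:

```powerfx
// Sketch of the button's OnSelect formula (names are assumptions).
Set(
    varResponse,
    Whisper.SpeechToText({
        file: Microphone1.Audio,          // audio recorded by the microphone control
        filename: "audio.webm",           // microphone audio is in WEBM format
        model: "whisper-1",
        'Content-Type': "multipart/form-data"
    })
);
```

The label's Text property would then be varResponse.text.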

Speaker 2: The whisper API converts speech to text.

Speaker 1: I will take advantage of a new model in AI Builder called Azure OpenAI Service. This can create text, answer questions, summarize documents, and more with GPT. It also comes with a standard set of templates. In my scenario, I will try and create a document, so I'll use the create blog post template. I will create a new flow. I'll create the flow from blank. I'll delete the trigger action for the flow, pick Power Apps, and pick the new Power Apps V2 trigger action. I will provide two pieces of input to this trigger: the first of type text, and a second input of type email. I'll add a new step, pick AI Builder, and here is the new create text with GPT on Azure OpenAI Service action. I'll select this and leverage create instructions. In this scenario, I'll pick create a blog post and click use instructions in flow. Now, this has a set of instructions that come pre-baked as part of the template, and this is something that we can change. I'll update the instructions as follows: Try to create a blog post on the topic below. The blog post should be less than one page, and it must be in HTML format with an HTML table and inline styling if applicable. This is where I would like to provide the dynamic content input about the blog post that the Azure OpenAI Service GPT action should generate, so from the dynamic content of my trigger action, I will pick the input. Next, I will create a file in OneDrive. I'll create this at the root. I can give my file a name; I'll call this GPT blog post.html. The file content would be the dynamic property text from Azure OpenAI Service. I would then like to convert this file. The file reference would be, from dynamic content, the property Id from the create file action. I would like to convert this to a PDF that I can send as an email attachment to the user who is calling the flow, and I have the email input property for that.
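The flow described above can be summarized step by step as follows; the step names paraphrase the connectors mentioned in the narration, and exact action names in the designer may differ:

```text
1. Trigger: Power Apps (V2)
   - Inputs: Text (blog post topic), Email (recipient)
2. AI Builder: Create text with GPT on Azure OpenAI Service
   - Instructions: "create a blog post" template, topic = trigger Text input
3. OneDrive: Create file
   - Path: root; Name: GPT blog post.html; Content: Text output of the GPT action
4. OneDrive: Convert file (to PDF)
   - File: Id from the Create file action
5. Send an email
   - To: trigger Email input, with the converted PDF attached
```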
For the subject, I'll call this "Blog post from GPT". The body: "Please find attached PDF document for topic", and here I will put the dynamic content input from my trigger action. To attach the document, I'll go to show advanced options. The attachment name would be the file name, and the attachment content would be the file content. I'll give this flow a name and click save, then close the flow. This will add the flow association to my Power App. On my button control, I've changed the text of the button to "Speech to document". Currently, we are calling the Whisper API to convert speech to text. As the next step, I will call the flow, which is GPT generate document, and call the Run method of the flow. The first property is the text; that's the instruction input for the blog post that I would like to generate. I'll leverage varResponse.text, the output of the speech-to-text API. The email would be the user's email. And right after this, I will leverage the Notify function to notify the user that the document generation request has been sent, with notification type success, and I will show this for three seconds. Let's try this out. I'll click preview. "Best practices in building canvas Power Apps." I'll click speech to document.
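The updated button formula dictated above looks roughly like this Power Fx sketch. The flow name GPTGenerateDocument, the control name Microphone1, and the variable name varResponse are assumptions:

```powerfx
// Sketch of the button's OnSelect after adding the flow call (names are assumptions).
Set(
    varResponse,
    Whisper.SpeechToText({
        file: Microphone1.Audio,
        filename: "audio.webm",
        model: "whisper-1",
        'Content-Type': "multipart/form-data"
    })
);
GPTGenerateDocument.Run(varResponse.text, User().Email);
Notify(
    "Document generation request has been sent",
    NotificationType.Success,
    3000  // show the notification for three seconds
);
```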

Speaker 2: Here is the email that I have received, "Blog post from GPT", based on the instructions provided,

Speaker 1: and here is the attached document. Your instructions can be dynamic, and the type of content it generates can be dynamic: a list of countries and capitals of the world, and here is the response. I can also attach an audio file. In this case, I have a short recording of a conversation that takes place in a meeting, and I have modified the instructions to generate talking points based on the response that we will get from the speech-to-text API, which will leverage the audio file. I click generate talking points. Here's the speech-to-text response; it's a conversation between two users about a project. And here's the result, which is a document that has the action points from the audio file. Here, I'm leveraging speech to text and the DALL·E model to generate an image: a cow driving a car. Speech gets converted to text; text gets converted to an image. Pigs are flying. If you enjoyed this video, then do like, comment, and subscribe to my YouTube channel, and thank you so much for watching.
