Speaker 1: Hey everybody, thank you so much for checking out the channel and for watching this video. In this video I'm going to focus on how to use one of the Azure Cognitive Services, the Speech service, to transcribe a video. I'll be showing you the IaaS version of this, which means we're going to use a virtual machine to do a lot of the heavy lifting, and the end result is going to be a transcription file that shows you the text from the video. There's also another video I'm doing on the serverless approach, where we don't use a virtual machine, so I hope you enjoy this one and check out that one as well, and if you haven't done so, please subscribe to the channel. Let's get to it.

Alright, I'm a cloud solutions architect, so the architecture is very important: the map of what we're going to do and how we're going to get there. Let's look at the end result first, and then we'll get into the portal and the virtual machine. What we have is an MP4 video. We want to run that video through the Azure Cognitive Services Speech service, and we're going to end up with a transcript.txt file. Pretty simple, right?

And the question is, why would you want to do this? There are many reasons. If you watched my video about OpenAI on Azure, where we take a text file, run it through the OpenAI service, and ask questions about that file, then you can combine the two: take a video, transcribe it, and now you have a text file you can run through the OpenAI service and ask questions about the video. That would be one use case.
There are many different scenarios for using this approach, but I'll leave that up to you. Let's get to the actual meat of this. I'm going to make the scripts available on GitHub for you to take with you, change as you need, and implement your own way. There will be a link in the description section down below where you can find all those scripts; feel free to use them, modify them, and do whatever you need with them.

Okay, a little bit about the Speech service in Azure. Speech is a managed service that gives you the ability to do speech to text, text to speech, speech translation, and speech recognition as well. So there are a lot of different functions, but the one we're going to use is speech to text. And if you notice, it says speech to text, which means we cannot take the video and send it straight to the Speech service. We need something in between that takes the video and extracts the audio. Once we have the audio, we can send the audio to the Speech service, and the service gives us back the text file. What we're going to use to extract the audio is a Python library called MoviePy, and I'm going to show you how to set that up once we go into the portal.

What I have here is two virtual machines, one called video-transcription, which is the one I'm going to use for this test. Assuming my security groups are set up correctly, I grab the public IP address and SSH into it. It's a brand new VM, so I'm going to run an update on it just to make sure it's all set up correctly. Okay, once that's done, we can look at the files. There are no files right now that I've created, so the first thing I'm going to do is set up the transcription.
I'm going to set up the audio extraction, so I'm going to create a file called 2audio.py and copy this code in. I'll make this code available on my GitHub so you can just copy and paste it as well. It does "from moviepy.editor import *", then you specify the file name of the video, in this case a file called freddydubon.mp4, and it outputs an audio file called freddydubon.wav.

Now, this is very important: the parameters. The codec is PCM, the fps (frames per second, here really samples per second) is 16000, which means 16,000 Hz, and the other important thing is the bitrate, which is also 16k. If you don't set these, the audio may not be compatible with the Speech service, so you have to set it up as 16,000. That is very important.

So once we have this file (hopefully the formatting isn't too bad, since I just copy-pasted it from another place), let's try to run it: python3 2audio.py. It's telling me that there is no video file, so let me copy the video file over and I'll be right back. Okay, the video file has been transferred, so now I can move on. I run python3 2audio.py again, and of course there are other things that need to be installed: MoviePy needs to be installed, so I need to do a pip install, but I don't have pip, so I need to install that first with sudo apt-get install python3-pip.
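The 2audio.py script described above might look like the following sketch. It assumes MoviePy 1.x (where the import path is moviepy.editor); the file names come from the video, and the function wrapper and lazy import are my additions so the script's important parameters stay visible at the top. This is a sketch, not the exact script from the video's GitHub repo.

```python
# 2audio.py -- sketch of the audio-extraction step described above.
# Assumes MoviePy is installed (pip install moviepy); adjust file names as needed.

VIDEO_FILE = "freddydubon.mp4"   # input video from the demo; yours will differ
AUDIO_FILE = "freddydubon.wav"   # output audio to send to the Speech service

# These parameters matter: the Speech service expects 16 kHz, 16-bit PCM audio.
AUDIO_CODEC = "pcm_s16le"  # 16-bit signed little-endian PCM
AUDIO_FPS = 16000          # 16,000 samples per second (16 kHz)
AUDIO_BITRATE = "16k"      # keep the bitrate consistent with the sample rate

def extract_audio(video_path: str = VIDEO_FILE, audio_path: str = AUDIO_FILE) -> str:
    """Extract the audio track from a video into a Speech-compatible WAV file."""
    # Imported lazily so the module loads even where MoviePy is not installed.
    from moviepy.editor import VideoFileClip
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(
        audio_path,
        codec=AUDIO_CODEC,
        fps=AUDIO_FPS,
        bitrate=AUDIO_BITRATE,
    )
    clip.close()
    return audio_path

if __name__ == "__main__":
    extract_audio()
```

If the codec, fps, or bitrate are left at MoviePy's defaults, the resulting WAV may be rejected or mis-recognized by the Speech service, which is why they are pinned explicitly here.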
Okay, then pip install moviepy. Once that's done we can run the command again. Let's do an ls just to see which files we have; okay, only two files. So: python3 2audio.py. What it does is take that video file and create a WAV file, and that WAV file is the one we can take and throw into the Speech service. Now we can move to the next step, which is sending it to the Speech service.

Okay, so now that the audio has been taken out of the video and we have this WAV file, the next thing is to transcribe it. Here's the transcribe script we're going to use, the one you see here on the right-hand side, and this code will also be available on GitHub so you can go in, copy and paste, and change it. Where it says "your key", that is going to be your key from the Speech service, and I'm going to show you where you can get that.

If I try to run this file right now with python3 transcribe.py, because this is a brand new VM it's going to tell me that the module is not found. So what I need to do is pip install azure-cognitiveservices-speech, which installs the Azure Cognitive Services Speech package. Now if I run the file again with python3 transcribe.py, it's probably going to fail, because the key I'm using is an old key and I don't think it will work anymore; it will just sit here and not do anything. So let's go to the portal and go to the Azure Speech services. As you can see I don't have anything here, so what I'm going to do is create one.
It's going to ask me for a resource group, so I'll just use any resource group I have here. For the region I'm going to select Central US, and for the name, Freddy's Speech. For the pricing tier there are only two tiers, free and standard. If you look at the pricing details, the free tier gives you five free audio hours per month, and speech to text, which is the one we're going to use, is covered there; or you can do pay as you go, where standard speech to text is one dollar per audio hour. You can see the pricing depends on what you're going to do; we're going to use the free tier, because it will work fine for what we have. The next thing is the network; for now I'm going to leave it open to all networks, including the internet. Review and create, and create. Deployment succeeded, and I have an endpoint. I can go to Keys and Endpoint, and here is the key we're going to use. Copy this, and here in the script is where you're going to put your key. Run it again.

This video we're transcribing is about ten minutes, maybe a little longer, so it will take a little while to transcribe; just be patient. Eventually it comes back with transcription.txt, which is the transcript of the video; it's just text. In this case it's one of my videos where I explain how ChatGPT saved my dad's life. The point is, as you can see, the transcription is here. One thing you have to understand is that this transcription does not contain any time codes or timestamps; it's just one big text file.
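The transcribe script walked through above might look like the following sketch, assuming the azure-cognitiveservices-speech package is installed and that you paste in your own key and region from the portal's Keys and Endpoint page. The constant names, the function wrapper, and the lazy import are my additions, not necessarily the exact script from the video's repo.

```python
# transcribe.py -- sketch of the transcription step described above.

import threading

SPEECH_KEY = "your-key-here"      # replace with your Speech resource key
SERVICE_REGION = "centralus"      # region only; the SDK derives the standard endpoint
AUDIO_FILE = "freddydubon.wav"    # the 16 kHz PCM WAV produced by 2audio.py
OUTPUT_FILE = "transcription.txt"

def transcribe(audio_path: str = AUDIO_FILE) -> str:
    """Run continuous recognition over a WAV file and return the full text."""
    # Imported lazily so the module loads even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    pieces = []
    done = threading.Event()

    # recognize_once() would only capture the first utterance, so for a
    # roughly ten-minute video we use continuous recognition and collect
    # each recognized phrase as it arrives.
    recognizer.recognized.connect(lambda evt: pieces.append(evt.result.text))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return " ".join(p for p in pieces if p)

if __name__ == "__main__":
    with open(OUTPUT_FILE, "w") as f:
        f.write(transcribe())
```

Note that only the key and the region are configured; as explained next, the SDK resolves the standard endpoint for that region on its own, which is why no endpoint URL appears in the script.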
Now, one of the things I want to point out in the code is that I did not specify an endpoint. I did specify my key, which is here (of course, you're going to have to specify your own key), but I didn't specify the endpoint. The reason is that I did specify the service region, and because you installed the Azure Cognitive Services Speech package, it knows the standard endpoint for each region. So if you don't specify an endpoint, it takes the region, Central US, gets the standard endpoint for it, and uses the key to authenticate. In other words, this code works not because I specified the endpoint, but because the SDK knows what the standard endpoint is.

So what can we do with this? I have another video where I talk about ChatGPT, where it takes a text file and you can ask questions about that file. So can you take this text file and use that other code to send it to ChatGPT and ask questions about the video? Of course; that's the whole idea. You can take this transcription.txt file, put it into the code I gave you in the other video, and now you have a workflow: the video turns into text, you send it to ChatGPT, and you can ask questions about that video using ChatGPT.

I hope you enjoyed this video. This is the IaaS approach, which means we're using a virtual machine to do a lot of the work. There's also another video, the serverless approach, where we use Azure Logic Apps to do the orchestration of the function; if you haven't watched that one, please do so. And if you haven't done so, subscribe to the channel. Until next time, take care.