Convert Speech to Text with IBM Watson: A Step-by-Step Guide
Learn to convert speech to text using IBM Watson API, code in Python, and utilize Visual Studio Code for implementation in this detailed tutorial.
File
Speech To Text with IBM Watson Python - codeayan
Added on 01/29/2025

Speaker 1: Welcome everyone. In this video, we are going to learn how to convert speech to text using the IBM Watson Speech to Text API. To get started, first I need to open my IBM Cloud account and log in. If you don't have an IBM Cloud account, you can create one. It's just like creating an account on Facebook or Instagram. After logging in, we get this page, which is the dashboard. Here we need to add one service, which is the Speech to Text API. To do that, I will copy this link from here and paste it into my browser. And we can see that this is the API we need. I will choose the free plan, so I don't need to pay any money for this. I just need to press Create, and they will create a service for us for free, ok. Here we can see that this is the service name, Speech to Text Enqueue, and it's active now. I will go to Manage, and here we have the credentials, which are our API key and the endpoint URL, and our plan is the Lite plan, where we don't need to pay any money. We will write the main code in Python. I am using Visual Studio Code, ok. So inside this PyCode folder, I already have a file, which is testaudio.wav. Let's play this first. "In this video, we are learning that how to convert speech to text using IBM Watson." Ok, so we will convert this speech into text. I need to install the Jupyter extension. I will also mention a link where you can check how to install Visual Studio Code on Windows or whichever OS you are using. So I need to install this extension in my VS Code. It will take some time, ok. It's installed now. Now here in my PyCode folder, I will create one file, which is hello.ipynb, ok. Here we will write the code in Python. So first, we need to install the IBM Watson package. To do that, I need to write pip install ibm-watson and press Shift+Enter. We can see here that it is successfully installed. Now I need to import some modules from IBM Watson, ok.
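As a rough illustration of the install step, the sketch below checks whether the SDK can be imported before any service code runs. It assumes the package is installed as `ibm-watson` and imported as `ibm_watson`, with `ibm_cloud_sdk_core` installed alongside it as a dependency. The helper name `missing_modules` is hypothetical and does not come from the video.

```python
import importlib.util

# Assumed install step (run once in a notebook cell or a terminal):
#   pip install ibm-watson
# The distribution name uses a hyphen; the import names use underscores.
REQUIRED_MODULES = ["ibm_watson", "ibm_cloud_sdk_core"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("The IBM Watson SDK modules are available.")
```

If the first message appears, running the install command again in the same environment as the notebook kernel is the usual remedy.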
We will import these two modules. Shift+Enter, and the import is done. The SpeechToTextV1 class provides the functions for converting speech to text, and the IAMAuthenticator is used for authentication when using an IBM service. Next we need to create two variables to store the API URL and the API key. So API URL equal to... now I need to copy the URL from here and paste it here, done. Another variable, myKey, equal to myApiKey. I will just copy it from here and paste it inside the string, done, ok. Both the URL and the API key are now saved in my variables. Now I need to create a variable, and I am naming it auth. I will write IAMAuthenticator here, and it needs one parameter, which is our API key. It will authenticate using my key to check whether a valid user is using the service. I need another variable, speech to text. Here I will write SpeechToTextV1 and its parameter, which is authenticator equal to my auth variable, done. Now I need to set the service URL. To do that, I will call set_service_url on the speech to text variable and pass the URL that is saved in API URL, ok. I made one mistake here, which is why it is showing a yellow line. Done. Now I press Shift+Enter, and the code runs successfully. Now I need to open the audio file, which is testaudio.wav. To do that, I write with open and the name of my file, testaudio.wav. This file is at the same level in the folder as my hello.ipynb, so if I just write the name of the file it will work, and I do not need to write its path. Next I need to specify the mode, which is rb. It reads the file in binary mode, ok. Now here I will create a variable, response, which will hold the result of calling recognize on the speech to text variable. Inside the brackets I need to give some parameters: the audio file, and the content type, which is audio/wav, since the audio file's extension is .wav, done.
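The steps just described (authenticator, service object, service URL, then opening the file and calling recognize) might look roughly like the sketch below. The function and parameter names (`build_service`, `transcribe_file`, `api_key`, `api_url`) are hypothetical, and the exact code typed in the video is not shown in the transcript. The calls used are `IAMAuthenticator`, `SpeechToTextV1`, `set_service_url`, `recognize`, and `get_result` from the `ibm-watson` SDK.

```python
def build_service(api_key, api_url):
    """Create a Speech to Text client authenticated with an IAM API key."""
    # Imported inside the function so the helper below stays usable
    # (for example with a stand-in object) when the SDK is not installed.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    authenticator = IAMAuthenticator(api_key)
    service = SpeechToTextV1(authenticator=authenticator)
    service.set_service_url(api_url)
    return service


def transcribe_file(service, path, content_type="audio/wav"):
    """Send one audio file to the service and return the parsed JSON result."""
    with open(path, "rb") as audio_file:  # "rb" reads the file as raw bytes
        response = service.recognize(audio=audio_file, content_type=content_type)
    return response.get_result()


# Example usage, with placeholders for the values on the service's Manage page:
#   service = build_service("myApiKey", "<your service URL>")
#   result = transcribe_file(service, "testaudio.wav")
```

Keeping the key and URL as function arguments, rather than pasting them into the call itself, makes it easier to load them from environment variables later so they stay out of a shared notebook.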
And I need to create another variable, which is recognized text. Inside this variable, our main end product, the text version of the speech, will be stored. This line extracts the recognized text from the response received from the IBM Watson Speech to Text service and assigns it to the variable recognized text. Now I will press Shift+Enter to run this code. I have a problem here in line 4: it should be results, not result. That's the problem. I will run it again. It ran successfully now. Let's print the recognized text to see our text version of the speech, and we can see that we have the text "in this video we are learning that how to convert speech to text using IBM Watson", which is what I said in this audio. "In this video, we are learning that how to convert speech to text using IBM Watson." Okay, so everything is going well. In the video description, I will add a detailed explanation of each line of the code that you can read, and I will also add the GitHub repository link for this code. And that's all for the video. Thank you for watching.
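To illustrate the extraction step and the results versus result fix, here is a minimal sketch that pulls the transcript out of a response dictionary. The `sample` dictionary is hand-written in the general shape of a recognize result and is not real service output. The helper `extract_transcript` is hypothetical, and it joins every segment, which may differ from the single line used in the video.

```python
def extract_transcript(result):
    """Join the top alternative of every result segment into one string."""
    pieces = []
    # The key is "results" (plural); looking up "result" would find nothing.
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


# Hand-written sample for illustration only.
sample = {
    "result_index": 0,
    "results": [
        {"final": True,
         "alternatives": [{"transcript": "in this video we are learning ",
                           "confidence": 0.9}]},
        {"final": True,
         "alternatives": [{"transcript": "how to convert speech to text ",
                           "confidence": 0.9}]},
    ],
}

recognized_text = extract_transcript(sample)
print(recognized_text)
```

Using `.get` with a default means an empty or unexpected response yields an empty string instead of raising a `KeyError`.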
