Exploring Whisper: Cloud-based Transcriptions by OpenAI (Full Transcript)

Discover how OpenAI's Whisper cloud service transcribes audio with ease. Learn to use it for podcasts effortlessly and cost-effectively.

Speaker 1: Hi, I'm Kuba Mrugalski. Today I'm going to tell you about a service called Whisper, or more specifically, about its cloud version. First of all, what is Whisper? It's a new tool from OpenAI that converts human speech into text. To put it simply, it does transcription, from any of 99 languages. What's interesting is that Polish is among them, and it does really well with Polish. Until now, you could run Whisper on a local computer, because it's an open source project: you could download it, install it and use it offline. So why am I talking about it now? What's new? OpenAI has released Whisper as one of its cloud tools. Thanks to this, you can upload your MP3 files to the cloud and get a transcription of what was said, for example, in a podcast.

I'll show you now how it's used in practice. I have a file here that I'm going to work on. This is guide.bkp.mp3, a backup copy, and it's exactly 18 MB. Why is that important? Because the cloud version of Whisper has a limit of 25 MB per upload. What does that mean? If you send more data, it won't process it. So what do you do if you have a really long recording and you want to process it? All you have to do is cut it into parts of up to 25 MB each.

Okay, now what do I have to do? Send this MP3 file to the server. I can do it using the curl tool. I'm going to send it using the POST method, so that's -X, and I type POST. Where do I send it? To this URL: api.openai.com/v1/audio/transcriptions. Everything I type will be in the description of this video. I'm starting a new line to make it easier to read. Now I need to authorize this API call somehow. This is done using a Bearer token sent in a header. A header is -H, with a capital letter H. I type something like this: Authorization: Bearer.
And now I put my key here. Why did I write it this way instead of pasting the key itself? So that you, as a viewer, can't see my private OpenAI keys. In my case, the key is simply stored in the variable key, and that's what gets inserted here. I'm starting a new line. Next I specify that what I'm sending is a form. I define this in a header. The header is called Content-Type, and its value is multipart/form-data. So I'm sending a form with various types of data. One of the fields will be the model I'm going to work with, and the other will be the file I'm going to upload.

New line again, and I define the first field: -F, a capital F. F is short not for "field" but for "form", because it's a form field. Now I define what this field is called. It's called file. And what does it contain? Well, it contains this file. But there's an important point here that people sometimes get wrong. If I defined it this way, I would send a text field called file with the value guide.bkp.mp3. That's bad. I want to send the content of the file. So I put an at sign (@) here. The at sign means: read the content of this file and upload it to the server. Okay, new line. Now I send the second field, a text field called model, and I set it to whisper-1, because there are no other models. And that should be enough.

With everything defined this way, I press Enter and I just have to wait. If I'm not mistaken, I should have the answer within a minute. Of course, the speed depends on how fast your Internet connection is, and mine is terrible. And as you can see, we got a transcript of everything I said. This is a test recording. It's not finished yet, so there will probably be a few slip-ups and the like. But let's see how it starts. We can see here that it starts with: "Hi, I'm Kuba Mrugalski, welcome to the next episode."
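Put together, the command dictated so far might look like the sketch below. It is a reconstruction from the spoken description, not the exact text from the video description. The wrapper function transcribe is a hypothetical name added for illustration, the API key is assumed to live in a shell variable named key as the speaker describes, and guide.bkp.mp3 is his example file.

```shell
#!/usr/bin/env bash
# Sketch of the request described in the video.
# Assumption: the OpenAI API key is stored in the shell variable $key.

# transcribe FILE: upload FILE and print the transcription response.
# (The function name is hypothetical; the video types the curl command directly.)
transcribe() {
  local file="$1"
  if [ -z "${key:-}" ]; then
    echo "set the variable key to your OpenAI API key first" >&2
    return 1
  fi
  # -X POST          : use the POST method
  # -H ...           : Bearer token and form content type, sent as headers
  # -F file=@...     : the @ makes curl send the file's content, not its name
  # -F model=...     : plain text field naming the model
  curl -X POST "https://api.openai.com/v1/audio/transcriptions" \
    -H "Authorization: Bearer ${key}" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@${file}" \
    -F "model=whisper-1"
}

# Example call, left commented out so nothing is uploaded by accident:
# transcribe guide.bkp.mp3
```

Wrapping the call in a function means that sourcing the snippet uploads nothing until transcribe is called explicitly.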
And here I slipped up. And again: "Hi, I'm Kuba Mrugalski, welcome to the next episode of my podcast, Guide to the Unknown." If you don't know this podcast, it's also linked in the description. So this way we got one big block of text. It could be returned in a much better format. To do this, I edit the previous command and add just one thing. New line.

Before I tell you what, one more small announcement. Soon I'm launching a very interesting project related to AI. It will be an online course, but run in a very unusual way: it will be a so-called cohort course. What is that? Google it. But if you're interested in an AI course, in how programmers can enrich their projects with artificial intelligence, how they can work with it, and how they can put no-code tools to work, then sign up for the waiting list, and I'll let you know when the course is ready.

Meanwhile, we'll add one more field here. A field, that is, -F, and this field is called response_format. Here I enter srt. SRT, does that ring a bell? Yes, it's the well-known subtitle format for movies. Now we press Enter and wait a moment. We can see that what we got is easier to read and edit. It's every sentence I said, along with a timestamp: at which minute and second I started saying it and when I finished. This way I can edit it however I want, and I can import it, for example, into my YouTube video.

And now, the most important thing. How much does it all cost? Well, if it's in the cloud, it's probably expensive. Well, no. It's cheaper than ordering a transcript from, I don't know, a student or a company that does transcription. Why? One minute of transcription with Whisper costs three cents. So it's extremely cheap. And as you can see, using it is also very simple. What's more, there are plenty of graphical front ends with drag and drop.
The only thing you have to add is your API key. Try playing with Whisper yourself. All the commands I used are in the description of this video. All you have to do is copy them, run them and see how it works, for example, with your podcast or with, let's say, recordings from your studies that you made on a voice recorder. Maybe it will come in handy. That's all for today. See you in the next video. Bye-bye.
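The two refinements mentioned in the video, the 25 MB upload limit and the SRT output format, could be combined roughly as follows. This is a sketch under stated assumptions rather than the speaker's own script: the function names fits_limit and transcribe_srt are hypothetical, "25 MB" is read conservatively as 25,000,000 bytes, and the API key is again assumed to be in the shell variable key.

```shell
#!/usr/bin/env bash
# Sketch only: check the file size before uploading, then request SRT output.

# Assumption: the "25 MB" limit is read conservatively as 25,000,000 bytes.
MAX_BYTES=25000000

# fits_limit FILE: succeeds when FILE exists and is within the size limit.
fits_limit() {
  [ -f "$1" ] || return 1
  local size
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips wc's padding
  [ "$size" -le "$MAX_BYTES" ]
}

# transcribe_srt FILE: same request as in the video, plus response_format=srt.
# (Both function names are hypothetical, chosen for this illustration.)
transcribe_srt() {
  local file="$1"
  if [ -z "${key:-}" ]; then
    echo "set the variable key to your OpenAI API key first" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  if ! fits_limit "$file"; then
    echo "$file is over the upload limit; cut it into smaller parts first" >&2
    return 1
  fi
  curl -X POST "https://api.openai.com/v1/audio/transcriptions" \
    -H "Authorization: Bearer ${key}" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@${file}" \
    -F "model=whisper-1" \
    -F "response_format=srt"
}

# Example call, left commented out so nothing is uploaded by accident:
# transcribe_srt guide.bkp.mp3 > guide.srt
```

Checking the size locally avoids waiting for a slow upload only to have the request rejected. How a long recording gets cut into parts is left open here, as it is in the video.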
