Speaker 1: When it comes to speech-to-text models, OpenAI's Whisper is the de facto standard now. It's an open-source, cloud-agnostic, and extensible model for serving state-of-the-art speech-to-text requirements. It was developed by OpenAI and is quite established by now. But in this video, I am going to share with you a brand-new project called Faster Whisper Server. It is an OpenAI API-compatible transcription server that uses Faster Whisper as its backend. It supports both GPU and CPU, it is easily deployable with Docker (we will be deploying it shortly on our local system), and it is configurable through environment variables: you can specify your OpenAI API key and all that stuff. You can also send audio files for transcription via HTTP methods like POST, or you can simply do it through the Python interpreter, and that is what we are going to do.

So, without wasting any further time, let me take you to my local system. But before I do that, let me share something really cool. The VM and the GPU which I am going to use for this video are very generously sponsored by our friends at MastCompute. If you are looking to rent GPUs at a discounted price, I will drop the link to their website in the video's description; very, very affordable rates. I will also give you a discount coupon code in the video's description. So, go to their website, use that code, and you should get a 50% discount if you rent out the GPUs, which are already very discounted. Huge shout-out to them for being so generous with this VM and this GPU.

Now let me take you to my local system, where I am running Ubuntu 22.04 and, as I mentioned, I have one NVIDIA RTX A6000 GPU card, which has 48 GB of VRAM. Let me clear the screen. I will be installing everything in a Conda virtual environment; the Conda version is 24.1.2. If you don't know what Conda is, please search the channel; you should find how to install it in any video, very straightforward.

Let's clear the screen, and now let me create a virtual environment with Conda; I'm just going to call it faster-whisper. It should be done any second; just press Y here and it is going to install everything for you. That is done. Let's activate it and step into this faster-whisper environment, and you will see that its name now appears in parentheses on the left-hand side. Let's clear the screen, and now let's git clone the repo of this Faster Whisper Server; I will also drop the link to it in the video description, so don't worry about it. It should be quite quick. That's already done. Let's cd into it, clear the screen, and then do ls -ltr. You will see that we have a sample audio WAV file, which we will be using for our demo, a Docker Compose file, and two Dockerfiles: one is for CUDA, for GPU; if you are using CPU, replace CUDA with CPU. As I have a GPU, I'm going to use the CUDA one.

Let's clear the screen. Now, in order to run it, all you need to do is run this command, which I'm going to paste now. What it does is offload everything to the GPU, use local port 8000, create a local Docker volume and map it in, and then run the CUDA image. Let me run it. You see it was unable to find the image locally, so it is downloading all the image layers. This makes it quite easy, as everything remains in one place in these layered containers. So, let's wait for it to finish.
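[For reference, a rough consolidation of the setup commands narrated above. The repository URL, image name, and exact docker flags are assumptions reconstructed from the narration; the actual link and invocation are in the video's description and the project's README.]

```bash
# Create and activate the Conda environment from the video.
conda create -n faster-whisper -y
conda activate faster-whisper

# Clone the project (URL is an assumption; the real link is in the
# video description) and inspect its contents.
git clone https://github.com/fedirz/faster-whisper-server.git
cd faster-whisper-server
ls -ltr

# Run the CUDA image: offload to the GPU, publish port 8000, and keep
# downloaded models in a local volume so they survive container restarts.
# On a CPU-only host, use the *-cpu image tag instead.
docker run --gpus=all --publish 8000:8000 \
  --volume hf-hub-cache:/root/.cache/huggingface \
  fedirz/faster-whisper-server:latest-cuda
```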
As you can see, this, I guess, is the model which is downloading at the moment. So, let's wait for it to finish downloading everything, and then we will proceed further. It has pulled all the layers, extracted them, and now it is waiting for the application to start. And the application has started at port 8000 on our local system. So, I'm going to let it run; we will open another terminal window and try to access it from there.

While this one is running here, I'm just going to go into another terminal window on the same server. You need to set two environment variables first. First, you need to set your OpenAI API key. For that, you would need to go to platform.openai.com; unfortunately, it's a paid option, so you would have to put in around $10 in order to get started, and then you will get an API key which you need to export here. The other variable you need to set is the base URL; we will set it to localhost at port 8000, where our server is running. So, let me set both of these environment variables, clear the screen, and then we will proceed further.

I have set the environment variables. Next, let's install the OpenAI package; I'm just going to say pip install openai --upgrade, in case it is not already installed. That is done. Let's clear the screen, and now let's fire up our Python interpreter. That is done, and now let's first import OpenAI. I'm just going to go with this. That is done, and now let's initialize the client. Here you can either pass your OpenAI API key and base URL to the client directly, or you can let it pick them up from the environment variables. For the sake of simplicity, I'm just going to put my OpenAI API key here and the base URL here, and then I will clear the screen.

I have cleared the screen after setting that up, and now let's grab our audio file. If you remember, I showed you that in the same folder we had this audio.wav file. Let me open it and save it in this variable, audio_file. That is also done, and now, with the help of client.audio.transcriptions.create, we'll be creating the transcript from that audio file by using the distil-large-v3 Whisper model on our own locally running Faster Whisper Server. That is done; it was fairly quick. Now let's print out transcript.text. When I printed transcript.text, it gave me this text, which says that when you call someone who is thousands of miles away, you are using a satellite. Let me also show you the audio. This is the audio; let me play it. You see, the voice quality is not that good in that audio, but the model was still able to do it, which is quite good.

If I show you the terminal log, you see it shows that the Faster Whisper Server has done the transcription: it handled the 5.7 seconds of audio in about 0.83 seconds. How good is that? And if I go here, this is what it did. Amazing stuff. Quite fast, I must say. I have done transcription with OpenAI's API directly, and this one is quite fast. Really good stuff. Let me know what you think about it. I will drop the link to its GitHub repo in the video's description. If you like the content, please consider subscribing to the channel, and if you are already subscribed, then please share it among your network, as it helps a lot. Thanks for watching.
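[For reference, a minimal sketch of the Python steps narrated above, assuming the server from the previous segment is listening on localhost:8000. The "/v1" suffix on the base URL and the exact model identifier are assumptions; check the repo for the identifiers the server actually accepts.]

```python
from openai import OpenAI

# The server is OpenAI API-compatible, so the stock openai client works
# once base_url points at the local instance instead of api.openai.com.
client = OpenAI(
    api_key="sk-...",                     # key from platform.openai.com, as in the video
    base_url="http://localhost:8000/v1",  # the locally running Faster Whisper Server
)

# audio.wav is the sample file that ships in the cloned repo.
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="distil-large-v3",  # assumed identifier for the distil-large-v3 model
        file=audio_file,
    )

print(transcript.text)
```

[Since the key and base URL were exported as environment variables earlier, the two constructor arguments could also be omitted and picked up from the environment, as the video notes.]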