Speaker 1: The most popular tool to transcribe videos is OpenAI Whisper. So should you use it? No. Faster Whisper is way faster, but WhisperX is even faster. Wait, there is also Insanely Fast Whisper, and WhisperS2T, which claims to be fast too. Yeah, there are a lot of implementations of OpenAI Whisper that are faster, and I've tested them all, because I wanted to generate captions for a stream archive website that I made. To save you time: the best implementation is WhisperS2T with the CTranslate2 backend. There are faster setups, such as WhisperS2T with the TensorRT backend; it's twice as fast, but I've noticed the results are twice as bad: repeated words over and over, wrong punctuation, and many typos. There is one project I want to highlight: WhisperS2T Transcriber. It has just six stars on GitHub, and it gives you a GUI with WhisperS2T pre-configured to use the CTranslate2 backend. I've used this tool to generate all the captions on my site, and it worked flawlessly. Installation is pretty straightforward. You need Python, Git, Git Large File Storage, and the CUDA toolkit, although you can also use a CPU instead of a GPU to perform transcription. Once you have all of these tools, create a new Python virtual environment and then execute the scripts. After it's done, you will see the GUI. Here you can add a folder if you want to batch-transcribe several files, and set up a couple of parameters. For CPU transcription you should use int8 quantization, and for GPU, float32 or float16; I used float16. Now we need to choose the model size and the batch size. Both of these are super important, and you will need to run benchmarks on your own machine to find the best combo. A larger model means better transcription quality, especially for non-English audio. A larger batch size means the file will be split into more chunks that are processed in parallel, speeding up transcription, because GPUs are utilized much better with parallel work than with long sequential work.
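The chunk-and-batch relationship described above can be sketched with back-of-the-envelope arithmetic. The 30-second window is Whisper's fixed input size; the helper function below is illustrative, not part of WhisperS2T's API:

```python
import math

CHUNK_SECONDS = 30  # Whisper models consume fixed 30-second audio windows

def batch_passes(duration_s: float, batch_size: int) -> int:
    """Sequential GPU passes needed to cover a file of the given length."""
    chunks = math.ceil(duration_s / CHUNK_SECONDS)
    return math.ceil(chunks / batch_size)

# A 1 h 44 min stream (6240 s) yields 208 chunks; at batch size 20 the GPU
# runs only 11 passes instead of 208 sequential decodes.
```

Each pass keeps the GPU's compute units saturated with `batch_size` chunks at once, which is where the speedup over sequential decoding comes from.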
So what prevents you from going to the max? A bigger model massively increases how compute-intensive transcription is and how much memory it consumes. If you are transcribing on a graphics card with just 4 GB of VRAM, you cannot use large-v2, because it won't fit. With batch size, bigger is always better, but it also raises VRAM usage, because more chunks are processed at once. I have an RTX 4070 Super, and with this GPU the best combo is large-v2 with a batch size of 20. With this config I see around 80% VRAM usage, so I'm sure I won't hit out-of-memory crashes during long transcription sessions. With these settings, a 1 hour 44 minute stream in Polish was transcribed in 1 minute 21 seconds with very high quality. Yeah, on a single consumer midrange GPU you can transcribe 1 hour of non-English video with very high quality output in less than a minute. I told you I would save you a lot of time. I was very happy with the output in both Polish and English. There will always be some errors in the output, but WhisperS2T with the CTranslate2 backend gave me the fewest errors while transcribing very fast. If you need a solution for a server rather than a GUI, WhisperS2T is available as a Docker container, so you can easily test it, and there is also example code for Python. So, in conclusion, I hope I saved you time. Alternatives such as WhisperX, Faster Whisper, and Insanely Fast Whisper are slower on consumer GPUs and do not improve transcription quality. I hope the maintainer of WhisperS2T won't abandon this project, because it is the best open-source implementation of Whisper, but he hasn't been active on GitHub for a long time, so I don't really know. Anyways, that's all for today. Have a nice day.
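The VRAM and speed claims above can be sanity-checked with rough numbers. The ~1.55 billion parameter count for large-v2 is OpenAI's published figure; everything else here is simple arithmetic on the timings quoted in the transcript, not measured data:

```python
LARGE_V2_PARAMS = 1_550_000_000   # approx. parameter count of Whisper large-v2
BYTES_PER_PARAM_FP16 = 2          # float16 stores 2 bytes per weight

weights_gb = LARGE_V2_PARAMS * BYTES_PER_PARAM_FP16 / 1024**3
# ~2.9 GB for the weights alone; activations and the decoder's KV cache push
# the total past 4 GB, which is why large-v2 won't fit on a 4 GB card.

audio_s = 104 * 60                # the 1 h 44 min stream
wall_s = 1 * 60 + 21              # transcribed in 1 min 21 s
speedup = audio_s / wall_s        # roughly 77x faster than real time
```

That ~77x real-time factor is what makes "an hour of video in under a minute" plausible on a midrange card.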