Speaker 1: This captioning workflow consists of four steps. Step 1: generating a raw transcript using YouTube's automatic speech recognition. Step 2: editing the raw transcript in oTranscribe, a free text editor that runs in your web browser. Step 3: segmenting the edited transcript and aligning it with the audio, using a script built on Perl and Aeneas, a Python-based forced-alignment tool. Step 4: checking the accuracy of the timestamps in the resulting captions file.

Step 1: Generating a raw transcript. I decided to use YouTube's automatic speech recognition to create the raw transcript for a few reasons. First, YouTube is free. Second, its word error rate (WER) has been good in my testing, coming in at 4%, compared to 7% for Speechmatics, 7% for Popup Archive, 8% for Trint, 9% for the Google Speech API, 13% for Google Voice Typing, 14% for a trained Dragon profile, 23% for an untrained Dragon profile, and 26% for IBM Watson. Third, many people are familiar with YouTube. After uploading the audio or video as an unlisted video to YouTube, I wait for the automatic closed-caption file to be generated. I then use a script, built on youtube-dl and mainly sed commands, to download the auto-captions file and clean up the text, leaving a plain-text file without any of the formatting of the VTT file from YouTube.

Step 2: Editing the transcript. I then open the raw transcript in oTranscribe, a web-based text editor. The tool is free and lets me quickly play back the part of the audio I want to hear while editing. The text is not time-coded to the audio (which would be great), but the shortcuts are handy and my work is saved every second. While editing the transcript, I make sure to include important information such as speaker identification and significant non-speech sounds. I place brackets around this information, which matters for the alignment process in the next step.
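The speaker's cleanup script uses youtube-dl and sed; as a rough stand-in for that sed pipeline, the same idea can be sketched in Python. This is an assumption-laden sketch, not the speaker's actual script: it strips the WebVTT header, cue timing lines, and inline tags, and drops the consecutive duplicate lines that YouTube auto-captions typically contain.

```python
import re

def vtt_to_plain_text(vtt: str) -> str:
    """Reduce a WebVTT auto-captions file to plain transcript text
    (a hypothetical stand-in for the youtube-dl + sed pipeline)."""
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        # Skip the file header, metadata, blank lines, and cue timing lines.
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if re.match(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ", line):
            continue
        # Drop inline word-timing tags like <00:00:01.500><c>word</c>.
        line = re.sub(r"<[^>]+>", "", line).strip()
        # Auto-captions repeat each line in overlapping cues; keep one copy.
        if line and (not lines or line != lines[-1]):
            lines.append(line)
    return " ".join(lines)
```

The result is a single run of plain text, ready to paste into oTranscribe for editing.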
Speaker 2: The text on the right says, save the planet, kill yourself. Kill yourself.
Speaker 3: Kill yourself. These things are an expression of what we might call liberal environmentalism. The basic idea behind liberal environmentalism is that mankind is bad for the planet.
Speaker 1: Once the editing is done and the transcript is, as best as I can tell, perfect, I export it as a text file from oTranscribe.

Step 3: Segmenting and aligning the transcript. The next step is to segment the transcript so that the sentences are broken into caption-ready chunks. In keeping with best practices in captioning, we want caption blocks that do not exceed 35 characters per line. We also want caption blocks of two consecutive lines where possible, which reduces how frequently the captions change and makes them easier to read. Finally, we want the end of a sentence always to fall at the end of a caption block. For this step it is also important that full stops used in abbreviations or honorifics are not treated as sentence endings. The segmentation is performed in the first part of a script I created for this step of the workflow. After segmentation, the new chunks are aligned with the audio file in the second part of the script. The alignment is computed by a wonderful Python library called Aeneas, developed by Alberto Pettarin. In the Aeneas portion of my script I can adjust some useful parameters so the aligner works just the way I want for my video, such as how long the head and tail of the video are, and whether to remove the non-speech segments of the audio from the alignment process. Executing the script gives me a captions file and an HTML file that opens a simple editor for checking the accuracy of the timestamps.

Step 4: Checking the accuracy of the captions file. With the captions file in hand, I am now at the final step in my workflow: fine-tuning the timestamps in the captions file, if necessary. To do this I use an HTML editor called finetuneas, which lets me quickly check the timestamps and download the corrected captions file.
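The segmentation rules described above (at most 35 characters per line, two-line blocks where possible, sentence endings at block ends, and abbreviation periods not treated as sentence ends) can be sketched in Python. This is a minimal illustration of the rules, not the speaker's actual script; the abbreviation list and function names are hypothetical.

```python
# Periods after these tokens are not sentence endings (hypothetical list).
ABBREVIATIONS = {"Mr.", "Mrs.", "Ms.", "Dr.", "St.", "e.g.", "i.e."}
MAX_CHARS = 35  # captioning best practice: at most 35 characters per line

def split_sentences(text: str) -> list:
    """Split on ., ?, ! at word ends, skipping known abbreviations."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".?!" and word not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def wrap_lines(sentence: str, width: int = MAX_CHARS) -> list:
    """Greedy word wrap to at most `width` characters per line."""
    lines, current = [], ""
    for word in sentence.split():
        candidate = (current + " " + word).strip()
        if len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def segment(text: str) -> list:
    """Group wrapped lines into caption blocks of up to two lines.
    Pairing lines within each sentence guarantees that a sentence's
    last line always closes its block."""
    blocks = []
    for sentence in split_sentences(text):
        lines = wrap_lines(sentence)
        for i in range(0, len(lines), 2):
            blocks.append(lines[i:i + 2])
    return blocks
```

A real segmenter would also merge short neighboring sentences into shared blocks; this sketch only shows how the three constraints interact.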
Speaker 2: What is the point? What's the point of Orthodox Christianity? There's so many versions of Christian faith out there to choose from. Why Orthodox?
Speaker 1: Once I confirm that the captions file is accurate, I can upload it to YouTube or Amara, or load it as a sidecar file into my preferred video editor.