Speaker 1: This captioning workflow consists of four steps. Step 1: generating a raw transcript using YouTube's automatic speech recognition technology. Step 2: editing the raw transcript in oTranscribe, a free text editor you can use in your web browser. Step 3: segmenting and aligning the edited transcript with the audio using a script built on Perl and Aeneas, a Python-based forced-alignment tool. Step 4: checking the accuracy of the timestamps in the resulting captions file.

Step 1. Generating a raw transcript. I decided to use YouTube's automatic speech recognition technology to create a raw transcript for a few reasons. First, YouTube is free. Second, the word error rate (WER) has been good in my testing, coming in at 4% compared to 7% for Speechmatics, 7% for Popup Archive, 8% for Trint, 9% for the Google Speech API, 13% for Google Voice Typing, 14% for a trained Dragon profile, 23% for an untrained Dragon profile, and 26% for IBM Watson. Third, many people are familiar with YouTube. After uploading the audio or video as an unlisted video to YouTube, I wait for the automatic closed-caption file to be generated. I then use a script built on youtube-dl and mainly sed commands to download the auto-captions file and clean up the text, so that I end up with a plain text file stripped of all the formatting of the VTT file from YouTube.

Step 2. Editing the transcript. I then open the raw transcript file in oTranscribe, a web-based text editor. This tool is free and lets me quickly play back the part of the audio I want to hear while I'm editing. The text is not time-coded to the audio, which would be great, but the shortcuts are handy and my work is saved every second. While I'm editing the transcript, I make sure to include important information, such as speaker identification and important non-speech sounds. I place brackets around this information, which matters for the alignment process in the next step.
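The speaker's actual script uses youtube-dl and sed; the following is a minimal Python sketch of the same download-and-cleanup idea. The video URL, output file names, and the exact cleanup rules are illustrative assumptions, not the speaker's script.

```python
import re
import subprocess

# Hypothetical URL of the unlisted upload; replace with your own.
VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"

# Fetch only YouTube's auto-generated English captions, no media.
subprocess.run(
    ["youtube-dl", "--skip-download", "--write-auto-sub",
     "--sub-lang", "en", "--output", "raw", VIDEO_URL],
    check=True,
)

lines, previous = [], None
with open("raw.en.vtt", encoding="utf-8") as vtt:
    for line in vtt:
        line = line.strip()
        # Drop the header block, cue timings, and blank lines.
        if (not line or "-->" in line
                or line.startswith(("WEBVTT", "Kind:", "Language:"))):
            continue
        # Strip inline word-timing tags such as <00:00:01.234> and <c>.
        line = re.sub(r"<[^>]+>", "", line).strip()
        # Auto captions repeat lines as they roll; keep each line once.
        if line and line != previous:
            lines.append(line)
            previous = line

with open("transcript.txt", "w", encoding="utf-8") as out:
    out.write(" ".join(lines))
```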
Speaker 2: The text on the right says, save the planet, kill yourself. Kill yourself.
Speaker 3: Kill yourself. These things are an expression of what we might call liberal environmentalism. The basic idea behind liberal environmentalism is that mankind is bad for the planet.
Speaker 1: Once the editing is done and the transcript is perfect, as best as I can tell, I export it as a text file from oTranscribe.

Step 3. Segmenting and aligning the transcript. The next step in the process is to segment the transcript so that the sentences are broken into caption-ready chunks. In keeping with best practices in captioning, we are interested in creating caption blocks that do not exceed 35 characters per line. We also want caption blocks with two consecutive lines, when possible, which can help reduce the rate at which captions change on screen and make them easier to read. Finally, we want the ends of sentences always to fall at the end of a caption block. For this step, it is also important that the full stops used in abbreviations or honorifics are not treated as sentence endings. The segmentation of the transcript is performed in the first part of a script that I have created for this step in the captioning workflow. After segmentation, the new chunks are aligned with the audio file in the second part of the script. The alignment is computed via a wonderful Python library called Aeneas, developed by Alberto Pettarin. In the Aeneas portion of my script, I can adjust some useful parameters so the aligner works just the way I want for my video, such as how long the head and tail of the video are, and whether to remove the non-speech segments of the audio from the alignment process. By executing this script, I end up with a captions file and an HTML file that opens a browser-based editor I can use for checking the accuracy of the timestamps.

Step 4. Checking the accuracy of the captions file. With the captions file in hand, I am now at the final step in my workflow, which is to fine-tune the timestamps in my captions file, if necessary. To do this, I use an HTML editor called finetuneas, which allows me to quickly check the timestamps and download the corrected captions file.
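Here is a minimal sketch of the segmentation rules the speaker describes: 35-character lines, two-line blocks, sentence ends closing a block, and protected abbreviations. The speaker's actual script is Perl; this Python version, including the abbreviation list and helper names, is an illustrative assumption.

```python
import re
import textwrap

MAX_CHARS = 35  # captioning best practice: max characters per line
# Illustrative abbreviation list; extend for your own transcripts.
ABBREVIATIONS = ["Mr.", "Mrs.", "Dr.", "Prof.", "St.", "vs."]

def split_sentences(text):
    """Split on sentence-ending punctuation while protecting the
    full stops inside abbreviations and honorifics."""
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", "<DOT>"))
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.replace("<DOT>", ".") for s in sentences]

def segment(text):
    """Wrap each sentence to 35-character lines and pair the lines
    into two-line caption blocks. Because a block never spans two
    sentences, a sentence end always lands at the end of a block."""
    blocks = []
    for sentence in split_sentences(text):
        lines = textwrap.wrap(sentence, MAX_CHARS)
        for i in range(0, len(lines), 2):
            blocks.append("\n".join(lines[i : i + 2]))
    return blocks

# Example: segment("Dr. Smith spoke. The talk, as expected, ran long.")
# keeps "Dr." intact and returns two-line blocks per sentence.
```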
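For the alignment half, Aeneas exposes a Python API. The sketch below uses the documented Task/ExecuteTask interface; the file paths and parameter values (head/tail lengths, non-speech handling) are placeholders standing in for the adjustable parameters the speaker mentions, not his actual settings.

```python
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# Task configuration string; the numeric values are placeholders.
config = (
    u"task_language=eng"
    u"|is_text_type=plain"                            # one chunk per line
    u"|os_task_file_format=srt"
    u"|is_audio_file_head_length=2.000"               # skip intro seconds
    u"|is_audio_file_tail_length=1.500"               # skip outro seconds
    u"|task_adjust_boundary_nonspeech_min=0.500"      # pauses this long...
    u"|task_adjust_boundary_nonspeech_string=REMOVE"  # ...are removed
)

task = Task(config_string=config)
task.audio_file_path_absolute = "/path/to/audio.mp3"       # assumed paths
task.text_file_path_absolute = "/path/to/segments.txt"
task.sync_map_file_path_absolute = "/path/to/captions.srt"

ExecuteTask(task).execute()   # run the forced alignment
task.output_sync_map_file()   # write the timed captions file
```

The HTML checking page the speaker mentions corresponds to Aeneas's finetuneas output, which the library can generate alongside the captions file.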
Speaker 2: What is the point? What's the point of Orthodox Christianity? There's so many versions of Christian faith out there to choose from. Why Orthodox?
Speaker 1: Once I confirm that the captions file is accurate, I can upload it to YouTube or Amara, or load it as a sidecar file in my preferred video editor.