How Long Does It Take to Transcribe One Hour of Audio?

Daniel Chang
Posted in Zoom · 9 Mar, 2020

More and more industry professionals regularly need to transcribe audio and video recordings. To save time and money, many turn to skilled professional transcriptionists. Others choose to do it themselves with the help of artificial intelligence (AI).

This article will help you understand how long it takes to transcribe an hour of audio, the factors affecting the process, and the transcription solutions available. 

The Most Common Transcription Solutions

Transcription is the process of listening to a sound recording and accurately typing it into a text document. How long it takes to transcribe one hour of audio or video varies from person to person and depends on several factors discussed later in this article.

Today, clients can choose between manual and automated transcription. Each has its benefits and disadvantages, as described below.

Manual Transcription

As the name suggests, manual transcription is done by a professional human transcriptionist. Such transcribers take an average of three to four hours to transcribe an hour-long file. So whether your file is 15 minutes or two hours long, a professional transcriber should be able to estimate the turnaround time easily. At that rate, for example, a 15-minute file takes roughly 45 minutes to an hour.

However, the same work will take longer if you choose to do it yourself. If you’re looking to get into transcription, your first question will be, “How long does it take to transcribe one hour of audio?” Unfortunately, the average person may need two to three times as long as a professional to capture everything correctly.
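The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function name `estimate_range` and its parameters are made up for this example, and the beginner figures simply apply the "two to three times" rule of thumb to the professional's range.

```python
def estimate_range(audio_minutes, low_ratio=3.0, high_ratio=4.0):
    """Return (low, high) hours of work for a recording of the given length.

    The default ratios are the three to four hours of work per hour of
    audio quoted above for a professional transcriptionist.
    """
    audio_hours = audio_minutes / 60
    return audio_hours * low_ratio, audio_hours * high_ratio


# Professional estimate for a one-hour file.
pro_low, pro_high = estimate_range(60)

# Beginner estimate (assumption: two to three times the professional's time).
novice_low, novice_high = pro_low * 2, pro_high * 3

print(f"Professional: {pro_low:.1f} to {pro_high:.1f} hours")
print(f"Beginner:     {novice_low:.1f} to {novice_high:.1f} hours")
```

Calling `estimate_range(15)` gives the range for a 15-minute file, and so on for any other length.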

Manual transcription, especially when outsourced to a professional transcription agency, delivers transcripts with 99% accuracy, making it ideal for academic, legal, medical, and business applications. Additionally, professionals are better than AI at discerning accents and picking out speech from noisy backgrounds.

Automated Transcription

Automated transcription converts speech to text using voice and speech recognition technology. While this option may be cheaper and faster, its downsides are hard to ignore.

Automated transcription doesn’t produce the best-quality transcripts. The software struggles to decipher speech through background noise, mumbling, rapid speech, and heavy accents. The result is a transcript full of errors that human professionals have to edit before it reaches a high standard.

Factors Affecting Transcription Turnaround Times

Many factors can affect the turnaround time. Depending on how they come into play, transcribing an hour of audio may take as little as two hours or as long as ten. Every transcriptionist differs from the next in experience level, typing speed, specialization in particular areas, and other respects. To determine a realistic time frame, it’s essential to consider several specific characteristics of the project.
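As a rough way to picture how these factors stack up, here is a small Python sketch. It is purely illustrative and not a method the article prescribes: it assumes the two-to-ten-hour range applies per hour of audio, treats the five factors discussed in this section as simple yes/no flags, and assumes each one adds an equal share of the extra time. The `Project` class and `estimate_project_hours` function are hypothetical names.

```python
from dataclasses import dataclass

BEST_CASE = 2.0    # hours of work per hour of audio (lower end quoted above)
WORST_CASE = 10.0  # hours of work per hour of audio (upper end quoted above)


@dataclass
class Project:
    audio_minutes: float
    poor_audio: bool = False
    heavy_accents: bool = False
    multiple_speakers: bool = False
    needs_research: bool = False
    special_requirements: bool = False


def estimate_project_hours(project):
    """Interpolate between the best and worst case by the share of factors present."""
    flags = [
        project.poor_audio,
        project.heavy_accents,
        project.multiple_speakers,
        project.needs_research,
        project.special_requirements,
    ]
    share = sum(flags) / len(flags)  # 0.0 (no factors) to 1.0 (all five)
    ratio = BEST_CASE + share * (WORST_CASE - BEST_CASE)
    return project.audio_minutes / 60 * ratio


# A clean, single-speaker, one-hour recording versus a difficult one.
easy = Project(audio_minutes=60)
hard = Project(60, True, True, True, True, True)
print(estimate_project_hours(easy), estimate_project_hours(hard))
```

In practice the factors will not weigh equally, so treat the result as a starting point for a conversation with your transcriptionist rather than a quote.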

Audio Recording Quality

The overall audio quality of the source material has a substantial impact on the speed of transcription. Background noise, connection problems, speech clarity, and the pace at which the speaker talks all influence the turnaround time. You’ll work much faster when there is little or no ambient interference, background noise, or other interruptions.

A problematic internet connection will also slow down the process if the source material is an internet call. If the speakers talk clearly without interrupting each other, transcription will demand less time. But if the speech was recorded outdoors, or if the recording equipment is of poor quality, the transcription is bound to take longer.

Regional Accents

Whether the task is to transcribe audio or video, the process will likely take longer if the speakers have heavy or unfamiliar accents. For example, an American may find it challenging to capture English words spoken with a strong Indian or Irish accent.

The transcriber may slow the audio playback to decipher the speech and ensure the accuracy of the transcription. Stopping and replaying the audio file may also be necessary to discern some words, which increases the time required to complete the work.

Multiple Speakers

Recordings with multiple speakers take even longer to transcribe. Speakers tend to overlap and even talk over each other, and interpreting overlapping speech and labeling it as such inevitably slows the transcription process.

Recordings that include multiple people, like focus groups or business meetings, will typically take longer to transcribe than single-speaker ones.

Potential Research Required

Some industries, such as the medical and legal fields, are replete with complicated jargon and terminology and also rely on a variety of abbreviations. To accurately transcribe audio or video in these fields, a transcriptionist has to perform extensive research to get everything right.

Also, recordings with proper nouns or specific locations (cities, countries, towns) almost always require more time to verify correct spellings. Additional research means more time is needed to complete a transcription project.

Special Transcription Requirements or Instructions

Some clients attach special requirements to their transcription projects, such as full-verbatim instructions (including false starts, stutters, and incomplete sentences) or timestamps. These requirements usually slow down the overall transcription process.


When transcribing jobs with such requirements, the initial run-through may require additional time for stopping and replaying speech to ensure perfect accuracy. Furthermore, a transcriptionist may make an additional review pass to confirm that the rules on strict verbatim and timestamps have been observed.