Universal One: Fast, Accurate Multilingual Transcription (Full Transcript)

Assembly AI's Universal One offers at least 10% higher accuracy than the next best speech-to-text model, transcribes an hour of audio in 38 seconds, supports multiple languages, and reduces costs.

Speaker 1: Assembly AI just launched Universal One, our most capable and highly trained speech recognition model. Trained on over 12.5 million hours of data, Universal One is at least 10% more accurate than the next best speech-to-text model, reduces the hallucination rate by 30%, can transcribe multiple languages, takes 38 seconds to process a 60-minute audio file, and costs as little as 37 cents per hour of audio processed. So you can transcribe people when they're whispering or switching languages with ease.

Let's take a closer look at the performance of Universal One. Universal One is trained on English, Spanish, French, and German. In English benchmarks, Universal One achieves low word error rates across datasets, demonstrating robustness in various conditions, including telephony and noisy scenarios, despite having only 60% as many parameters as models like Canary 1b and Whisper Large version 3. And on non-English benchmarks, Universal One achieves lower word error rates in 7 out of 15 datasets, demonstrating its competitiveness in these languages.

A key improvement we measured in Universal One is its reduced occurrence of consecutive errors, which often manifest as hallucinations, that is, text in the transcription that is not in the audio file. Universal One provides more faithful transcriptions than Whisper, a widely used open-source speech-to-text model, reducing the hallucination rate by 29% in relative terms.

Universal One achieved a 5x speedup compared to a fast and batch-enabled implementation of Whisper Large version 3 on the same hardware, thanks to faster decoding speed and superior batching capability. As a result, it transcribes an hour-long audio file in only 38 seconds.

Universal One is accessible through our API today. With the latest improvements made to our API, you can also select which speech-to-text tier you want when you're using Assembly AI's API. Use Best for the most accurate tier, and use Nano for the less accurate and less expensive tier.
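[Editor's note: The tier choice described here could be sketched as a request builder like the one below. The talk does not spell out the request format, so the endpoint URL, the `speech_model` field, the lowercase tier values, and the `authorization` header are all assumptions to check against the current API reference. The helper name `build_transcript_request` is hypothetical, and the request is only built, never sent.]

```python
import json
import urllib.request

# Assumed endpoint; the talk does not state it. Verify before use.
API_URL = "https://api.assemblyai.com/v2/transcript"

# Tier names from the talk ("Best", "Nano"), lowercased here by assumption.
TIERS = ("best", "nano")


def build_transcript_request(audio_url, api_key, tier="best"):
    """Build (but do not send) a transcription request for the chosen tier.

    Hypothetical helper: the "audio_url" and "speech_model" field names
    are assumptions, not taken from the talk.
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
    payload = {"audio_url": audio_url, "speech_model": tier}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and key, for illustration only.
    req = build_transcript_request(
        "https://example.com/meeting.mp3", "YOUR_API_KEY", tier="nano"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

[Keeping the tier as a validated argument means a typo fails locally instead of reaching the service. Sending the request would be a matter of passing it to `urllib.request.urlopen` with a real key.]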
With higher accuracy, faster turnaround times, and lowered costs, building with Speech AI is more accessible than ever before. We cannot wait to see what you will build with Assembly AI.
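[Editor's note: The turnaround and cost figures quoted in the talk can be turned into quick estimates, as in the sketch below. The constants come from the talk; the function names are hypothetical. The implied Whisper baseline time assumes the 5x speedup and the 38-second figure refer to the same hour-long file, which the talk suggests but does not state outright. Actual pricing may depend on the tier and plan.]

```python
# Figures quoted in the talk.
AUDIO_SECONDS = 60 * 60            # a 60-minute audio file
PROCESSING_SECONDS = 38            # quoted time to process that file
PRICE_CENTS_PER_AUDIO_HOUR = 37    # quoted lowest price, in cents
SPEEDUP_VS_WHISPER = 5             # quoted speedup over the Whisper baseline


def speed_factor(audio_s, processing_s):
    """How many times faster than real time the audio is processed."""
    return audio_s / processing_s


def estimated_cost_cents(audio_s, cents_per_hour=PRICE_CENTS_PER_AUDIO_HOUR):
    """Estimated cost in cents at a flat per-audio-hour price."""
    return audio_s / 3600 * cents_per_hour


def implied_baseline_seconds(processing_s, speedup):
    """Baseline time implied by a speedup (assumes the same workload)."""
    return processing_s * speedup


if __name__ == "__main__":
    factor = speed_factor(AUDIO_SECONDS, PROCESSING_SECONDS)
    print(f"Faster than real time by a factor of about {factor:.1f}")
    print(f"100 hours of audio: {estimated_cost_cents(100 * 3600):.0f} cents")
    baseline = implied_baseline_seconds(PROCESSING_SECONDS, SPEEDUP_VS_WHISPER)
    print(f"Implied baseline time for the same hour: {baseline} seconds")
```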
