[00:00:00] Speaker 1: Hi, Ryan here from AssemblyAI. I'm excited to announce our newest speech-to-text model, Universal 3 Pro. This is the first model that allows you to add a text prompt input alongside your audio files to generate a completely customized transcription output for your particular use case and customers. Now, let's actually jump in and see what some of these prompting capabilities look like. Our prompt engineering guide walks you through some of the ways that prompts can influence the output of your transcripts. Some things that prompts can do off the bat: increase disfluencies, change style and formatting, add context-aware clues and improve entity accuracy, add speaker attribution and different audio event tags, make sure the model code-switches, and a number of other capabilities that we're still working on documenting and discovering. With this, I actually want to walk you through some of these example prompts and behaviors so that you can see how the model reacts and changes as we're actually prompting it. To help with this, we're going to be using this GitLab Unfiltered SEC growth data science staff meeting as the sample file for our comparisons. I whipped up this quick Lovable app to do speech-to-text model comparisons between the different AssemblyAI models. On the left, we're going to have Universal 2. This is our current production model, which has the best price-to-performance of any speech-to-text model on the market. On the right, we're going to have Universal 3 Pro. For this comparison, though, we are not going to add a prompt. The reason is, I want to highlight very quickly, just Universal 2 versus Universal 3 Pro, out of the box, with no prompt customization, what some of the differences are between the models and some of the things that we've improved. You'll see below, we're actually marking some of the differences between the two. I'm going to go ahead and play the first little bit of this audio file so you can see some of the differences.
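If you want to reproduce this kind of side-by-side comparison outside the demo app, here is a rough sketch against AssemblyAI's transcript endpoint. The audio URL and the "universal_3_pro" model identifier are placeholder assumptions for illustration, and the prompt field name follows the video's description; check the docs for the exact values to use for Universal 3 Pro.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.assemblyai.com/v2"
HEADERS = {"authorization": API_KEY}

# Placeholder URL standing in for the GitLab Unfiltered staff meeting recording.
AUDIO_URL = "https://example.com/sec-growth-data-science-staff-meeting.mp3"


def transcribe(speech_model: str, prompt: str | None = None) -> str:
    """Submit a transcription job and poll until it finishes, returning the text."""
    body = {"audio_url": AUDIO_URL, "speech_model": speech_model}
    if prompt is not None:
        body["prompt"] = prompt  # prompt field name assumed from the video
    job = requests.post(f"{BASE_URL}/transcript", json=body, headers=HEADERS).json()
    while True:
        result = requests.get(f"{BASE_URL}/transcript/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result.get("text") or ""
        time.sleep(3)


# Baseline comparison: same file, two models, no prompt customization.
# "universal" is the current production model; "universal_3_pro" is a
# placeholder for whatever identifier the docs list for Universal 3 Pro.
u2_text = transcribe("universal")
u3_text = transcribe("universal_3_pro")
print("Universal 2:\n", u2_text[:400], "\n")
print("Universal 3 Pro:\n", u3_text[:400])
```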
[00:01:50] Speaker 2: So it is the SEC, meaning secure and govern, growth and data science, meaning applied ML, ML Ops, and anti-abuse team meeting. That's a big mouthful. We might get a better name over time. That's our meeting for September 14th or 15th in APAC. Hi, Alan. Glad you're here. Why are you here when it's midnight? We've talked. Glad you're here.
[00:02:18] Speaker 1: Really interestingly, you can see immediately we had some corrections in the first sentence to fix some broken words. We've capitalized some of the proper nouns. We've also completely fixed the meaning of this sentence: "Why are you here when it's midnight? We could talk." The original actually had that as a completely different meaning. So just out of the box, Universal 3 Pro has done a bunch of things to make our transcript better. Now, let's actually start prompting to see what we can do in terms of customization. Since we've already done a simple prompt, I'm going to go down and start with sample prompt two so that we can see some of the differences when we go ahead and use this prompt. Let's go ahead and plug this into the tool, compare it to no prompt, and see what the results look like. With this done, you'll see that the nuances and differences in this file are quite subtle. You can quickly see this "it may" later on in that transcript. Let's actually go and jump to that particular point so we can see what the difference is.
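Because the differences with this sample prompt are subtle, a quick diff between the prompted and unprompted outputs helps surface them. This sketch reuses the transcribe() helper from the earlier snippet; the prompt text is only an illustrative stand-in, not the exact sample prompt from the guide.

```python
import difflib

# Illustrative stand-in for "sample prompt two" from the prompt engineering guide.
SAMPLE_PROMPT = (
    "Transcribe this team meeting naturally, preserving hesitations and false starts."
)

baseline = transcribe("universal_3_pro")                        # no prompt
prompted = transcribe("universal_3_pro", prompt=SAMPLE_PROMPT)  # with prompt

# Print only the lines that changed, so subtle differences (like the
# "it may" hesitation) stand out instead of being buried in the full text.
for line in difflib.unified_diff(
    baseline.splitlines(),
    prompted.splitlines(),
    fromfile="no_prompt",
    tofile="with_prompt",
    lineterm="",
):
    print(line)
```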
[00:03:16] Speaker 2: Glad you're here. Don't make it a habit to come to this meeting since it's really late for you. But I'm glad. Thank you.
[00:03:22] Speaker 1: So you can actually see the "it may" when he said that. That was like a stutter and speech hesitation, and now we've properly transcribed that with this simple prompt. This prompt, however, could be a lot more verbose and follow some of the best practices in our prompt engineering guide. Let's go ahead and test an additional prompt to see how we can improve these results. Something I noticed when we were listening to that audio is that it seems like there are a lot of false starts and hesitations. So I'm actually going to go down to the verbatim section and try one of the different prompts here to see if we can tease out some of those capabilities in the audio file. I've gone ahead and moved the initial prompt that we used to the left, and now we have this new prompt running on the right. Let's go ahead and compare these results to see what we get. With this new verbatim prompt being used, you can actually see quite quickly how many ums and uhs we've actually added in here. Let's go ahead and scroll to that part of the audio just so we can see what this actually looks like.
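For the verbatim behavior, only the prompt changes. This again reuses the transcribe() helper from the first sketch; the wording below is my own approximation of a verbatim-style prompt, not the exact one from the prompt engineering guide.

```python
# Approximation of a verbatim-style prompt; see the prompt engineering guide
# for the exact wording and other verbatim examples.
VERBATIM_PROMPT = (
    "Produce a verbatim transcript. Keep every filler word, false start, "
    "stutter, and hesitation (um, uh, repeated words) exactly as spoken."
)

verbatim_text = transcribe("universal_3_pro", prompt=VERBATIM_PROMPT)
print(verbatim_text[:400])
```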
[00:04:18] Speaker 2: Mouthful, we might get a better name over time. That's our meeting for September 14th or 15th in APAC. Hi, Alan. Glad you're here. Why are you here when it's midnight? We can talk. Glad you're here. Don't make it a habit to come to this meeting.
[00:04:37] Speaker 1: So with that, you can see very quickly how we've customized our transcript and gotten completely different results based on the prompt that we've used. So there you have it: completely customized transcripts with Universal 3 Pro and prompting. If you're new to AssemblyAI, please reference our quick start guide. You can use the speech_model parameter on your request to select Universal 3 Pro, and feel free to include the prompt parameter to start experimenting with the different capabilities of the model. We're really excited to see what you build, and we're looking forward to your feedback so we can keep making the model more and more robust for our different customers' use cases. Thanks.
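For reference, a minimal quick-start-style request with both parameters might look like the snippet below. The model identifier and prompt wording are assumptions for illustration; defer to the quick start guide and API reference for the exact names and values.

```python
import requests

headers = {"authorization": "YOUR_API_KEY"}

# speech_model selects the model; prompt customizes the output.
# "universal_3_pro" is a placeholder identifier; check the quick start guide.
body = {
    "audio_url": "https://example.com/my-audio.mp3",
    "speech_model": "universal_3_pro",
    "prompt": "Keep disfluencies, tag audio events, and attribute speakers by name.",
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript", json=body, headers=headers
)
print(response.json()["id"])  # poll this id as shown in the first sketch
```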