How domain context boosts post-call transcription quality (Full Transcript)

A women’s health app prioritizes accuracy over latency, using domain vocabulary and LLM prompts to post-correct transcripts after recordings upload.

[00:00:00] Speaker 1: Awesome. I heard a few things around like accuracy, latency, code switching, speaker diarization. I guess probably all of them matter to some degree, but like maybe talk to us about like what are the ones that matter most and maybe what are some of the like things that you've done to make sure that like reliably your app actually is able to use voice in production for those particular features.

[00:00:19] Speaker 2: So for us, latency is actually not that big of an issue, because we first upload the file and do all the summarization after the call, or after the recording has been uploaded. It depends on whether it's telehealth or in person. We do some extra steps, though, to double-check that everything is correct. So for example, here we could see in a live demo how we improve the vocabulary because it has some context. We use those key terms as well, but what we also do is give the context of the visit. We mostly work with women's health, so we have certain words or diagnoses that of course repeat themselves. So we plug this into a prompt, and the LLM then goes through the transcript and tries to improve it if there's anything out of context. That, I would say, is the biggest improvement for us. And as the models keep improving, that is really changing: there are fewer and fewer changes that you actually notice compared to the initial transcription.
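
The prompt step described by Speaker 2 can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual implementation: the `VisitContext` class, the `build_correction_prompt` function, the prompt wording, and the sample terms are all hypothetical, and the call to an LLM is left out because the transcript does not say which model or API is used.

```python
from dataclasses import dataclass, field


@dataclass
class VisitContext:
    """Hypothetical container for what is known about a visit."""
    visit_type: str                      # e.g. "telehealth" or "in person"
    specialty: str                       # e.g. "women's health"
    key_terms: list = field(default_factory=list)  # recurring domain words


def build_correction_prompt(transcript: str, ctx: VisitContext) -> str:
    """Combine visit context, domain terms, and the raw transcript into one prompt.

    The wording of the instructions is an assumption made for illustration.
    """
    # De-duplicate and sort the terms so the prompt is stable between runs.
    terms = "\n".join(f"- {term}" for term in sorted(set(ctx.key_terms)))
    return (
        "You are reviewing an automatic transcript of a medical visit.\n"
        f"Visit type: {ctx.visit_type}\n"
        f"Specialty: {ctx.specialty}\n"
        "Terms that commonly occur in this setting:\n"
        f"{terms}\n\n"
        "Correct only words that look out of context for this visit. "
        "Do not add, remove, or reorder content.\n\n"
        "Transcript:\n"
        f"{transcript}"
    )


if __name__ == "__main__":
    # Sample values are invented for the example.
    ctx = VisitContext(
        visit_type="telehealth",
        specialty="women's health",
        key_terms=["preeclampsia", "endometriosis"],
    )
    print(build_correction_prompt("Patient reports pre clamp sia.", ctx))
```

The resulting string would be sent to whichever LLM the team uses, and the model's reply would become the candidate corrected transcript.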

Summary

The discussion covers key speech-to-text production concerns (accuracy, latency, code switching, diarization) and explains that for this app latency is less critical, because transcription and summarization happen after the recording is uploaded. The team focuses on improving transcription accuracy by injecting domain context, especially recurring women’s health terminology, into prompts so an LLM can review the transcript and correct out-of-context words. They also run extra validation steps, and note that as the underlying transcription models improve, fewer corrections are needed.

Title

Post-call transcription: prioritize accuracy with domain context

Keywords

speech-to-text
accuracy
latency
code switching
speaker diarization
telehealth
women's health
domain vocabulary
prompting
LLM post-processing
transcription correction
production voice app

Key Takeaways

Latency may be less important when transcription/summarization is performed post-call rather than live.
Accuracy can be improved by providing visit context and a curated domain vocabulary to guide corrections.
LLM-based post-processing can detect and fix out-of-context transcription errors.
Adding validation/double-check steps increases reliability for production use.
As base ASR models improve, the amount of necessary post-correction tends to decrease.
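
The transcript does not say what the validation or double-check steps consist of. One plausible guard, shown here purely as an assumption, is to compare the LLM's corrected transcript with the original and reject corrections that change too much. The function names and the 0.2 threshold are hypothetical; the comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib


def changed_spans(original: str, corrected: str) -> list:
    """Return (before, after) word spans that differ between the two texts."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def accept_correction(original: str, corrected: str,
                      max_changed_ratio: float = 0.2) -> bool:
    """Accept the correction only if a limited share of words was touched.

    max_changed_ratio is an arbitrary illustrative threshold, not a
    value taken from the talk.
    """
    a, b = original.split(), corrected.split()
    if not a:
        return not b  # nothing to correct, so nothing may be added
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    # Count each non-equal region by the larger of its two sides.
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return changed / len(a) <= max_changed_ratio
```

A correction that fails the check could be discarded in favor of the original transcript or flagged for human review, and the output of `changed_spans` gives a reviewer a short list of the words the LLM altered.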

Sentiments

Neutral: Pragmatic, technical tone focused on tradeoffs and implementation details; no strong positive or negative emotion, just matter-of-fact evaluation of latency vs accuracy and iterative improvements.
