How domain context improves transcription post-processing (Full Transcript)

Using discussion guides, keyword boosting, and LLM edit passes to correct transcripts—validated with eval datasets built from injected errors.

[00:00:00] Speaker 1: Maybe on the point around post-transcription post-processing, maybe give folks in the audience an idea of what you're actually doing to customize that output per domain. Are you boosting certain terms? Are you just running multiple LLM workflows? Maybe just walk us through what that looks like.

[00:00:15] Speaker 2: Yeah, yeah. So we get the transcript out from Assembly. I guess in order to understand this: users go into CodeLoop and they create a project. Doing qualitative research, you almost always have a discussion guide that is basically underlying the interviews that you do, and within those discussion guides there's a lot of context about, like, what is this research about? Who's involved? What are the key terms? What questions are we asking? What objectives are there? So there's a lot of really rich context in there that can support transcription. And so what we do is we take that context and we get out various bits of structured information, like keywords, that we can pass into Assembly in the first instance. But also, secondarily, once the transcription is done, we effectively have an LLM pass that is going over the transcript, and basically an AI coding agent is making edits in the transcript, so passing in the original string and then the string it wants to replace it with, in order to insert an edit at a specific point based on various instructions around phonetic mistranscriptions, or words that maybe should be joined together, or all sorts of things. And in that way we're basically able to then correct the transcript. We've set up a number of evals to test that, so we sort of work backwards from a clean transcript and insert problems into it to validate that that works quite well, and so we built up an eval data set.
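The edit format described here (an original string plus the string to replace it with) can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the `Edit` type, the `apply_edits` helper, and the rule that an edit is only applied when its original string occurs exactly once are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record: one targeted string replacement."""
    original: str     # exact text the model wants to change
    replacement: str  # text to put in its place


def apply_edits(transcript: str, edits: list[Edit]) -> tuple[str, list[Edit]]:
    """Apply each edit whose original string occurs exactly once.

    Edits whose original string is missing or ambiguous are returned
    unapplied (an assumed safety rule, not something stated in the talk).
    """
    skipped: list[Edit] = []
    for edit in edits:
        if transcript.count(edit.original) != 1:
            skipped.append(edit)
            continue
        transcript = transcript.replace(edit.original, edit.replacement, 1)
    return transcript, skipped


raw = "We asked about code loop pricing and the on boarding flow."
edits = [
    Edit("code loop", "CodeLoop"),      # phonetic mistranscription
    Edit("on boarding", "onboarding"),  # words that should be joined
    Edit("missing phrase", "x"),        # not in the text, so it is skipped
]
fixed, skipped = apply_edits(raw, edits)
print(fixed)
print(skipped)
```

Requiring a unique match is one way to keep an edit from landing at the wrong point in a long transcript; a real pipeline might instead ask the model to include more surrounding context in the original string.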

AI Insights
Summary
They customize transcription outputs by leveraging rich context from qualitative research discussion guides. From that context, they extract structured information like keywords to feed into Assembly during transcription. After transcription, they run an LLM pass where an AI coding agent edits the transcript via targeted string replacements to fix phonetic mistranscriptions, split/join words, and other issues. They validate this approach using evals built from clean transcripts with injected errors, creating an evaluation dataset to test correction performance.
Title
Domain-tailored transcript post-processing with context + LLM edits
Keywords
transcription
post-processing
domain customization
discussion guide
qualitative research
context injection
keywords
Assembly
LLM pass
AI coding agent
string replacement edits
phonetic mistranscriptions
transcript correction
evals
evaluation dataset
Key Takeaways
  • Use qualitative research discussion guides as domain context for transcription customization.
  • Extract keywords/structured info from guides and pass them to the transcription engine (Assembly).
  • Apply an LLM-based post-processing pass to perform targeted transcript edits.
  • Implement edits as precise string replacements to correct phonetic errors and word boundary issues.
  • Validate the correction pipeline with evals by injecting errors into clean transcripts to build an evaluation dataset.
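The validation idea in the last takeaway, working backwards from a clean transcript by injecting errors, can be sketched as follows. This is a hedged illustration only: the function names `inject_errors` and `recovery_rate`, the corruption table, and the scoring rule are assumptions for the example, and the "corrector" is a stand-in string replacement rather than an LLM pass.

```python
def inject_errors(clean: str, corruptions: dict[str, str]) -> tuple[str, list[str]]:
    """Swap known-correct terms for plausible mistranscriptions.

    Returns the noisy text and the list of terms that were corrupted.
    """
    noisy = clean
    injected: list[str] = []
    for correct, wrong in corruptions.items():
        if correct in noisy:
            noisy = noisy.replace(correct, wrong)
            injected.append(correct)
    return noisy, injected


def recovery_rate(clean: str, corrected: str, injected: list[str]) -> float:
    """Fraction of injected terms restored to their clean-text frequency."""
    if not injected:
        return 1.0
    recovered = sum(
        1 for term in injected if corrected.count(term) == clean.count(term)
    )
    return recovered / len(injected)


clean = "The onboarding flow in CodeLoop felt fast."
corruptions = {"onboarding": "on boarding", "CodeLoop": "code loop"}
noisy, injected = inject_errors(clean, corruptions)

# Stand-in corrector that fixes only one of the two injected errors.
corrected = noisy.replace("code loop", "CodeLoop")

print(noisy)
print(recovery_rate(clean, corrected, injected))
```

Because the clean transcript is known, each (noisy, clean) pair doubles as a labeled eval case, and the same pairs can be rerun whenever the correction prompt or model changes.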
Sentiments
Neutral: The tone is technical and explanatory, focused on process details (context extraction, LLM editing, evaluation) without strong positive or negative emotional cues.