How to Pilot a Transcription Vendor: Test Script + Acceptance Criteria

Andrew Russo

Posted in Zoom Jun 12 · 14 Jun, 2026

Choosing a transcription vendor without a pilot is risky. A good pilot shows how the vendor handles your real audio, your accuracy needs, and your turnaround requirements before you commit.

The best way to pilot a transcription vendor is to test a small set of real recordings, use clear acceptance criteria, score each output the same way, and make a go/no-go decision based on evidence. This guide gives you a practical plan, a test script, and a scoring sheet you can use right away.

Key takeaways

Use real recordings from your workflow, not only clean sample audio.
Define acceptance criteria before the test starts.
Check more than word accuracy, including names, numbers, speaker labels, formatting, and turnaround time.
Score every vendor with the same checklist.
Make a go/no-go decision based on must-pass items and total score.

Why a transcription vendor pilot matters

Many transcription vendors look similar until you test them on difficult audio. A pilot helps you see how they handle crosstalk, accents, technical terms, low volume, and speaker changes.

It also helps you avoid two common mistakes: choosing on price alone and choosing based on a polished demo. Your actual recordings are the only test that matters.

A structured pilot gives you a fair way to compare options. It also gives your team a shared decision process, which is useful when legal, research, media, or operations teams all care about different details.

Step 1: Choose the right test recordings

Your pilot should be small enough to manage and broad enough to reflect your real work. For most teams, 5 to 10 recordings is enough for a first pass.

What to include in the test set

One clean recording with clear speakers
One moderate-difficulty recording with minor background noise
One difficult recording with overlapping speech or crosstalk
One recording with names, product terms, or jargon
One recording with important numbers, dates, amounts, or IDs
Optional: one recording with accents, remote call audio, or multiple speakers

How long each recording should be

Keep each file short enough to review without creating a heavy burden. A practical range is 5 to 15 minutes per file.

That gives you enough material to test quality while keeping scoring consistent. If your workflow depends on long recordings, include one longer file as a separate stress test.

How to prepare the files

Use the same source files for every vendor.
Do not edit the audio to make it easier.
Label files clearly, such as File A, File B, and File C.
Remove sensitive data if needed, or confirm the vendor can handle it under your rules.
Provide the same instructions, style guide, and glossary to all vendors.

What to send with the files

Expected turnaround time
Required output format, such as DOCX, TXT, or SRT
Speaker label rules
Formatting rules for dates, times, and numbers
Any glossary of names, products, or technical terms
Notes on sections that must be verbatim

If you need captions or subtitles as part of the workflow, test that separately with the same discipline. GoTranscript also offers closed caption services for teams that need timed text as part of delivery.

Step 2: Define acceptance criteria before the pilot starts

Acceptance criteria are the pass or fail rules for the pilot. Write them down before you see any results so you do not move the goalposts later.

Core quality criteria to include

Overall transcript accuracy
Accuracy for names and proper nouns
Accuracy for numbers, dates, amounts, and identifiers
Speaker diarization accuracy
Formatting consistency
Turnaround time
Instruction follow-through

Suggested acceptance criteria

Set thresholds that match your use case. If names and numbers matter a lot in your workflow, make them must-pass items even if the rest of the transcript looks good.

Names and proper nouns: zero critical errors in approved sample sections
Numbers, dates, and amounts: zero critical errors in approved sample sections
Speaker diarization: at least 90% of speaker changes labeled correctly
Formatting and style: at least 95% compliance with the provided style guide
Turnaround time: delivered within the agreed pilot deadline
Completeness: no missing sections, skipped speech, or unexplained blanks beyond agreed rules
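
To make these thresholds harder to bend after the fact, you can write them down as a small check before the pilot starts. The sketch below is one possible way to do that, not a required tool. The `PilotResult` fields and the `failed_criteria` function are hypothetical names chosen for illustration, and the 90% and 95% values simply mirror the suggested thresholds above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PilotResult:
    """Measured results for one vendor (field names are illustrative)."""
    critical_name_errors: int
    critical_number_errors: int
    diarization_accuracy: float  # fraction between 0.0 and 1.0
    style_compliance: float      # fraction between 0.0 and 1.0
    delivered_on_time: bool
    complete: bool               # no missing sections or unexplained blanks


# Thresholds copied from the suggested criteria; adjust them to your use case.
MIN_DIARIZATION = 0.90
MIN_STYLE = 0.95


def failed_criteria(result: PilotResult) -> List[str]:
    """Return the names of the criteria this vendor did not meet."""
    failures = []
    if result.critical_name_errors > 0:
        failures.append("names and proper nouns")
    if result.critical_number_errors > 0:
        failures.append("numbers, dates, and amounts")
    if result.diarization_accuracy < MIN_DIARIZATION:
        failures.append("speaker diarization")
    if result.style_compliance < MIN_STYLE:
        failures.append("formatting and style")
    if not result.delivered_on_time:
        failures.append("turnaround time")
    if not result.complete:
        failures.append("completeness")
    return failures
```

An empty list means the vendor met every criterion. Anything else tells you exactly which rule failed, which is easier to discuss than a general impression.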

Define what counts as a critical error

Not all errors carry the same weight. A missing comma is different from a wrong medication dose, wrong legal name, or wrong dollar amount.

Critical error: changes meaning, identity, amount, date, or who said it
Major error: clear mistake that affects usability but not core meaning
Minor error: punctuation, style, or formatting issue that does not change meaning
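
If reviewers log errors in a spreadsheet or script, the three severity levels can be kept as fixed values so nobody invents a fourth category halfway through. This is a minimal sketch under that assumption. The `Severity` enum, the `tally_errors` function, and the sample log entries are all made up for illustration.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # changes meaning, identity, amount, date, or speaker
    MAJOR = "major"        # affects usability but not core meaning
    MINOR = "minor"        # punctuation, style, or formatting only


def tally_errors(errors):
    """Count errors per severity.

    `errors` is an iterable of (file_label, severity, note) tuples.
    Severities with no errors are reported as 0.
    """
    counts = Counter(severity for _, severity, _ in errors)
    return {severity: counts.get(severity, 0) for severity in Severity}


# Hypothetical review log for one vendor.
log = [
    ("File A", Severity.CRITICAL, "wrong dollar amount"),
    ("File A", Severity.MINOR, "missing comma"),
    ("File B", Severity.MAJOR, "dropped phrase"),
    ("File B", Severity.MINOR, "inconsistent capitalization"),
]
summary = tally_errors(log)
```

Keeping the file label and a short note with each entry also covers the later step of documenting issues by file and by error type.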

If you work in regulated environments, align your review rules with internal policy. If accessibility is part of the scope, review the relevant captioning and transcript requirements in the Web Content Accessibility Guidelines (WCAG) when you define deliverables.

Step 3: Run the pilot with a standard test script and checklist

A pilot works best when each vendor gets the same brief. Keep instructions simple, specific, and identical across vendors.

Sample pilot test script

Project: Pilot transcription test for vendor evaluation
Files included: 5 sample recordings labeled A to E
Audio type: mix of clean, moderate, and difficult recordings
Output required: verbatim transcript in DOCX and TXT
Speaker labels: identify each speaker change where possible
Numbers: write dates, amounts, and identifiers exactly as spoken unless the style guide says otherwise
Names and terms: use the attached glossary for proper nouns and technical terms
Unclear audio: mark unintelligible sections using the agreed tag format
Deadline: submit all files by [date and time]
Questions: submit any clarification questions before [date and time]

Pilot execution checklist

Choose 5 to 10 recordings from real workflows
Create a short style guide and glossary
Define acceptance criteria and scoring rules
Send the same files and instructions to all vendors
Track response time and any clarification questions
Review delivery format, completeness, and deadline compliance
Score quality using the same reviewer or the same review team
Document issues by file and by error type

Review process tips

Use a reference transcript or a reviewer-approved answer key for the scored sections. You do not need to score every second of every file if time is tight, but you should score the same parts for every vendor.

Blind review helps if you want to reduce bias. Replace vendor names with neutral labels like Vendor 1, Vendor 2, and Vendor 3 during scoring.
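
One simple way to set up blind labels is to shuffle the vendor list and hand reviewers only the neutral names. The sketch below assumes a person who is not scoring keeps the returned key until review is finished. The function name `assign_blind_labels` and the vendor names in the usage example are hypothetical.

```python
import random


def assign_blind_labels(vendor_names, seed=None):
    """Map neutral labels ("Vendor 1", "Vendor 2", ...) to real vendor names.

    The order is shuffled so the label number reveals nothing about the vendor.
    Pass a seed only if you need to reproduce the same assignment later.
    """
    rng = random.Random(seed)
    shuffled = list(vendor_names)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return {f"Vendor {i}": name for i, name in enumerate(shuffled, start=1)}


# Hypothetical vendor names; keep this key away from the reviewers.
key = assign_blind_labels(["Acme Transcripts", "Beta Captions", "Gamma Audio"], seed=7)
```

Reviewers see only the labels on the files they score. The key is opened after all scoring sheets are complete.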

Step 4: Use a simple scoring sheet to compare vendors fairly

A scoring sheet turns feedback into a decision tool. Keep it simple enough that reviewers will use it the same way every time.

Sample scoring categories

Names and proper nouns: 25 points
Numbers, dates, and amounts: 25 points
Speaker diarization: 15 points
Overall transcript accuracy: 15 points
Formatting and style compliance: 10 points
Turnaround time: 5 points
Communication and issue handling: 5 points
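
The sample weights add up to 100, so a total score is a weighted sum. The sketch below shows one way to compute it, assuming each category is rated as a fraction of its available points between 0.0 and 1.0. That rating scale, the short category keys, and the `total_score` function are assumptions for illustration, since how you award points within a category is up to your team.

```python
# Points per category, copied from the sample scoring categories.
WEIGHTS = {
    "names": 25,
    "numbers": 25,
    "diarization": 15,
    "overall_accuracy": 15,
    "formatting": 10,
    "turnaround": 5,
    "communication": 5,
}


def total_score(ratings):
    """Return a score out of 100 from per-category ratings in [0.0, 1.0]."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for category in WEIGHTS:
        if not 0.0 <= ratings[category] <= 1.0:
            raise ValueError(f"rating for {category!r} must be between 0.0 and 1.0")
    return sum(WEIGHTS[category] * ratings[category] for category in WEIGHTS)


# Hypothetical ratings for one vendor.
example = {
    "names": 1.0,
    "numbers": 0.8,
    "diarization": 0.9,
    "overall_accuracy": 1.0,
    "formatting": 0.5,
    "turnaround": 1.0,
    "communication": 1.0,
}
score = total_score(example)
```

Rejecting a sheet with a missing category is deliberate here. It stops a vendor from looking better or worse just because a reviewer skipped a row.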

Sample scoring sheet

Vendor name:
Files reviewed:
Delivered on time: Yes or No
Required format delivered: Yes or No
Critical errors in names: [count]
Critical errors in numbers: [count]
Diarization accuracy: [percent]
Style compliance: [percent]
Unclear tags used correctly: Yes or No
Total score out of 100:
Must-pass criteria met: Yes or No
Reviewer notes:

How to score diarization accuracy

Count speaker changes in the reviewed section, then count how many were labeled correctly. Divide the number of correct labels by the total number of speaker-change events and multiply by 100 to get a percentage.

For example, if a file section has 20 speaker changes and the vendor labels 18 correctly, diarization accuracy is 90% for that section.
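
The same calculation can be written as a short function so every reviewer computes it the same way. This is a minimal sketch, and the name `diarization_accuracy` is just an illustrative choice.

```python
def diarization_accuracy(correct_labels, total_changes):
    """Return the percentage of speaker changes that were labeled correctly."""
    if total_changes <= 0:
        raise ValueError("total_changes must be greater than zero")
    if not 0 <= correct_labels <= total_changes:
        raise ValueError("correct_labels must be between 0 and total_changes")
    return 100.0 * correct_labels / total_changes


# The example from the text: 18 of 20 speaker changes labeled correctly.
print(diarization_accuracy(18, 20))  # 90.0
```

The input checks catch a common tallying slip, such as counting more correct labels than there were speaker changes in the section.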

How to compare automated and human workflows

If you are comparing a human vendor with an AI-first option, score both against the same checklist. That keeps the review focused on outcomes instead of process.

If speed is the main goal for some content, you may also want to test automated transcription as a separate lane. Just do not mix acceptance criteria for high-risk files with criteria for low-risk files.

Step 5: Make a go or no-go decision

Your final decision should combine must-pass criteria with total score. A vendor with a high total score should still fail if they miss the items that matter most to your use case.

Recommended decision rules

Go: vendor meets all must-pass criteria and reaches your target total score
Conditional go: vendor misses only minor items and has a clear correction plan
No-go: vendor misses any critical must-pass item or shows repeated error patterns
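
These rules can be written as a short function so the outcome does not depend on who reads the scoring sheet. The sketch below is one interpretation, not the only one. It assumes "misses only minor items" means the vendor passed every must-pass item but fell short of the target score, and it treats a shortfall with no correction plan as a no-go, which the rules above leave open. The function and parameter names are hypothetical.

```python
def decide(must_pass_met, total_score, target_score,
           repeated_error_patterns=False, has_correction_plan=False):
    """Return "Go", "Conditional go", or "No-go" for one vendor."""
    # A must-pass failure or a repeated error pattern overrides any total score.
    if not must_pass_met or repeated_error_patterns:
        return "No-go"
    if total_score >= target_score:
        return "Go"
    # Assumption: below target with must-pass items met counts as "minor items".
    if has_correction_plan:
        return "Conditional go"
    # Assumption: no correction plan means the shortfall is not acceptable.
    return "No-go"


# A high total does not rescue a vendor that failed a must-pass item.
print(decide(must_pass_met=False, total_score=95, target_score=85))  # No-go
```

Checking the must-pass result first keeps the order of the rules visible in the code, which matches the point that a strong total should not override a critical failure.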

Questions to ask before final approval

Did the vendor handle difficult audio well enough for your real workload?
Were names and numbers accurate enough for your use case?
Was speaker labeling reliable enough for interviews, meetings, or research?
Did they follow instructions without repeated reminders?
Did they deliver on time and in the right format?
Was communication clear when something in the audio was hard to hear?

Common pilot pitfalls

Testing only clean audio
Using vague success criteria like "accurate enough"
Reviewing different sections for different vendors
Ignoring diarization because word accuracy looks fine
Failing to separate critical errors from minor style issues
Changing thresholds after results come in

If two vendors score closely, run a second-round pilot with harder files or a larger sample. That is often better than choosing based on small score differences.

Practical pilot template you can copy

1. Pilot scope

Number of vendors: [insert number]
Number of files: [insert number]
File length range: [insert range]
Audio types: clean, moderate, difficult, jargon-heavy, number-heavy
Deadline: [insert date]

2. Must-pass acceptance criteria

No critical errors in names in reviewed sections
No critical errors in numbers, dates, or amounts in reviewed sections
Diarization accuracy of at least [insert threshold]
Delivered by deadline
Required format and style guide followed

3. Score threshold

Minimum total score to pass: [insert score] out of 100

4. Review method

Reviewer names: [insert names]
Blind review: Yes or No
Reference transcript available: Yes or No
Sections scored: [insert timestamps]

5. Final decision

Vendor selected: [insert name]
Decision: Go, Conditional go, or No-go
Reason: [insert short summary]

Common questions

How many files should I include in a transcription vendor pilot?

Most teams can start with 5 to 10 files. Use enough variety to reflect your real work, especially difficult audio and files with names or numbers.

Should I use real recordings or sample audio?

Use real recordings whenever possible. Sample audio is often too clean and may not show how the vendor performs under normal conditions.

What is the most important acceptance criterion?

That depends on your workflow, but names, numbers, and speaker diarization are often the most important because a single error there can change the meaning of a passage.

How do I test speaker diarization?

Choose sections with clear speaker changes, count the change events, and measure how many labels are correct. Review the same sections for every vendor.

Can I compare AI transcription and human transcription in the same pilot?

Yes, if you use the same files and scoring rules. If your content has different risk levels, set separate acceptance criteria for low-risk and high-risk work.

What should trigger a no-go decision?

Repeated critical errors, missed deadlines, poor speaker labeling, or failure to follow instructions are common no-go triggers. A high overall score should not override a must-pass failure.

What if two vendors score almost the same?

Run a second-round pilot with harder files or a larger sample. That usually gives a clearer answer than debating a small score gap.

If you want a more reliable way to evaluate transcript quality before scaling up, GoTranscript provides the right solutions, from pilot-friendly workflows to professional transcription services that fit different accuracy and turnaround needs.
