
How to Pilot a Transcription Vendor: Test Script + Acceptance Criteria

Christopher Nguyen
Published in Zoom, June 12 · June 14, 2026

Choosing a transcription vendor without a pilot is risky. A short, structured test helps you see how a vendor handles your real audio, your accuracy needs, and your workflow before you commit. The best pilot uses a small set of recordings, clear acceptance criteria for names, numbers, and speaker labels, and a simple scoring sheet to support a go/no-go decision.

This guide shows you how to pilot a transcription vendor step by step, including a test script, checklist, scoring sheet, and decision rules you can use right away.

Key takeaways

  • Use real recordings that reflect your hardest and most common audio.
  • Set acceptance criteria before the test, not after results come in.
  • Score more than word accuracy: check names, numbers, speaker labels, formatting, and turnaround time.
  • Compare vendors on the same files, instructions, and deadlines.
  • Make a go/no-go decision with a simple threshold and notes on risks.

Why a transcription vendor pilot matters

A sales demo cannot show how a vendor will perform on your actual content. A pilot gives you direct evidence from your own recordings, terminology, and output needs.

It also helps your team align on what “good enough” means. That matters because one project may need near-perfect names and figures, while another can accept light cleanup after delivery.

  • Use a pilot when you are selecting a new vendor.
  • Use it when you are moving from in-house work to outsourced transcription services.
  • Use it when quality has slipped and you need a fair comparison.
  • Use it when you need to compare human, AI (automated transcription), or hybrid options.

Build the right pilot set

Your test recordings should reflect normal work, edge cases, and business risk. If you only test clean audio, you may choose a vendor that fails on the files that matter most.

Choose 5 to 10 recordings

A small pilot is easier to manage and still shows useful patterns. Aim for 60 to 180 minutes of total audio across all files.

  • 2 to 3 common files: your normal meetings, interviews, or calls.
  • 2 difficult files: cross-talk, accents, low volume, or background noise.
  • 1 high-risk file: heavy use of names, numbers, product terms, or compliance language.
  • Optional: 1 multilingual or code-switching file if that reflects your work.
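
If you track pilot files in a script or spreadsheet export, a quick check can flag an unbalanced set before anything goes out to vendors. This is a minimal Python sketch, not a required tool: the `PilotFile` type and the category labels are hypothetical, and it treats the recommended mix as minimums so that a 10-file pilot still passes.

```python
from dataclasses import dataclass

# Hypothetical category labels for the file mix described above.
COMMON, DIFFICULT, HIGH_RISK, MULTILINGUAL = "common", "difficult", "high_risk", "multilingual"


@dataclass
class PilotFile:
    name: str
    minutes: float
    kind: str  # one of the labels above


def check_pilot_set(files: list[PilotFile]) -> list[str]:
    """Return a list of problems; an empty list means the set fits the guide's mix."""
    problems = []
    if not 5 <= len(files) <= 10:
        problems.append(f"expected 5 to 10 files, got {len(files)}")
    total = sum(f.minutes for f in files)
    if not 60 <= total <= 180:
        problems.append(f"expected 60 to 180 minutes of audio, got {total:g}")
    # Assumption: treat the recommended mix as minimums so larger pilots still pass.
    minimums = {COMMON: 2, DIFFICULT: 2, HIGH_RISK: 1}
    for kind, needed in minimums.items():
        have = sum(1 for f in files if f.kind == kind)
        if have < needed:
            problems.append(f"expected at least {needed} {kind} file(s), got {have}")
    return problems


files = [
    PilotFile("team-call-1.mp3", 20, COMMON),
    PilotFile("team-call-2.mp3", 25, COMMON),
    PilotFile("panel-crosstalk.mp3", 30, DIFFICULT),
    PilotFile("field-interview.mp3", 15, DIFFICULT),
    PilotFile("earnings-review.mp3", 20, HIGH_RISK),
]
print(check_pilot_set(files))  # []
```

The optional multilingual file is labeled but not required, since it only belongs in the set if it reflects your work.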

Include variation on purpose

  • Different speaker counts.
  • Different audio quality levels.
  • Different subject areas.
  • Different expected turnaround times.

Prepare a gold standard for scoring

You need a trusted reference to score vendor output. Create one approved transcript for each test file using your internal expert or a reviewer you trust.

Keep the same style rules across all files. Document your formatting, timestamp, and cleanup-level rules before the pilot starts.

Set acceptance criteria before testing

Acceptance criteria keep the pilot fair. They also stop teams from changing the rules after they see results.

What to measure

  • Verbatim or clean-read accuracy, based on your use case.
  • Correct handling of names.
  • Correct handling of numbers, dates, amounts, and codes.
  • Speaker diarization accuracy.
  • Formatting and adherence to instructions.
  • Turnaround time.
  • Ease of review and file delivery.

Sample acceptance criteria

Adjust these thresholds to fit your project risk. For legal, medical, finance, or research work, use stricter standards.

  • Names: 100% correct for named people, products, and organizations listed in the glossary.
  • Numbers: 100% correct for dates, amounts, phone numbers, IDs, measurements, and other structured values.
  • Diarization: at least 95% correct speaker separation on files with up to 4 speakers, and at least 90% on more complex files.
  • Unclear tags: no guessing on inaudible content; mark uncertainty consistently.
  • Formatting: 100% compliance with required template fields, timestamps, and speaker labels.
  • Turnaround: delivered within the agreed deadline for every file.
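
Writing the criteria down as data makes it harder to move the goalposts once results arrive. The sketch below encodes the sample thresholds for a single file. The `FileResult` fields are hypothetical, and it assumes "more complex files" means files with more than 4 speakers; if your complexity also depends on audio quality, adjust `diarization_target`.

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    speakers: int
    names_correct: int           # glossary names transcribed correctly
    names_total: int             # glossary names spoken in the file
    numbers_correct: int
    numbers_total: int
    diarization_accuracy: float  # 0.0 to 1.0, from your diarization check
    formatting_compliant: bool   # template fields, timestamps, speaker labels
    on_time: bool
    guessed_inaudible: bool      # True if the vendor guessed instead of tagging


def diarization_target(speakers: int) -> float:
    # Assumption: "more complex files" means more than 4 speakers.
    return 0.95 if speakers <= 4 else 0.90


def failed_criteria(r: FileResult) -> list[str]:
    """Return the acceptance criteria this file fails; an empty list means it passes."""
    failures = []
    if r.names_correct < r.names_total:
        failures.append("names")
    if r.numbers_correct < r.numbers_total:
        failures.append("numbers")
    if r.diarization_accuracy < diarization_target(r.speakers):
        failures.append("diarization")
    if r.guessed_inaudible:
        failures.append("unclear tags")
    if not r.formatting_compliant:
        failures.append("formatting")
    if not r.on_time:
        failures.append("turnaround")
    return failures


result = FileResult(speakers=3, names_correct=12, names_total=12, numbers_correct=7,
                    numbers_total=8, diarization_accuracy=0.96, formatting_compliant=True,
                    on_time=True, guessed_inaudible=False)
print(failed_criteria(result))  # ['numbers']
```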

Define what counts as an error

Write these definitions down on one page and share it with every vendor. For example, decide whether “fifty” instead of “15” counts as one error or a critical failure, and whether wrong speaker labels count per line or per segment.

  • Critical errors: wrong names, wrong numbers, missed negation, wrong speaker identity.
  • Major errors: omitted phrases, repeated text, timestamp failures, style rule breaks.
  • Minor errors: punctuation, filler word treatment, spacing, capitalization.
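
A fixed mapping from error type to severity keeps two reviewers from grading the same mistake differently. In this Python sketch the error-type keys are hypothetical labels a reviewer might log, and it assumes one log entry per affected segment; an unlisted type raises an error so every mistake needs an agreed definition.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


# Hypothetical error-type labels, mapped to the tiers above.
SEVERITY = {
    "wrong_name": Severity.CRITICAL,
    "wrong_number": Severity.CRITICAL,
    "missed_negation": Severity.CRITICAL,
    "wrong_speaker": Severity.CRITICAL,
    "omitted_phrase": Severity.MAJOR,
    "repeated_text": Severity.MAJOR,
    "timestamp_failure": Severity.MAJOR,
    "style_rule_break": Severity.MAJOR,
    "punctuation": Severity.MINOR,
    "filler_words": Severity.MINOR,
    "spacing": Severity.MINOR,
    "capitalization": Severity.MINOR,
}


def tally(error_log: list[str]) -> Counter:
    """Count logged errors by severity.

    Assumes one log entry per affected segment. Raises KeyError for an
    unlisted error type so every error has an agreed definition.
    """
    return Counter(SEVERITY[error] for error in error_log)


counts = tally(["wrong_number", "punctuation", "punctuation", "omitted_phrase"])
print(counts[Severity.CRITICAL], counts[Severity.MAJOR], counts[Severity.MINOR])  # 1 1 2
```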

Run the pilot: vendor brief, test script, and checklist

Keep the process identical for every vendor. Use the same files, glossary, instructions, and deadline window.

Vendor brief template

  • Project type and purpose.
  • Output type: verbatim or clean read.
  • Formatting rules and timestamp rules.
  • Glossary of names, terms, acronyms, and number formats.
  • Expected delivery method and file naming rules.
  • Deadline and time zone.
  • Point of contact for questions.

Test script for your team

  • Step 1: Select 5 to 10 files and confirm permissions to share them.
  • Step 2: Create a gold standard transcript for each file.
  • Step 3: Build a glossary with correct names, terms, and numeric formats.
  • Step 4: Send the same pilot pack to each vendor at the same time.
  • Step 5: Log questions from vendors and answer all vendors with the same clarifications.
  • Step 6: Receive outputs and remove vendor branding if you want a blind review.
  • Step 7: Score each file against the gold standard and acceptance criteria.
  • Step 8: Hold a review meeting and decide go, conditional go, or no-go.
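
For Step 6, one simple way to run a blind review is to give each vendor a neutral label in random order and rename delivered files with that label before reviewers see them. This is a minimal Python sketch; the vendor names are fictional placeholders, and it assumes no more than 26 vendors, which is well above the two to four most pilots need.

```python
import random
from typing import Optional


def blind_labels(vendors: list[str], seed: Optional[int] = None) -> dict[str, str]:
    """Map each vendor to a neutral label such as "Vendor A" in shuffled order.

    Keep the returned mapping away from reviewers until scoring is complete.
    """
    order = list(vendors)
    random.Random(seed).shuffle(order)
    return {vendor: f"Vendor {chr(ord('A') + i)}" for i, vendor in enumerate(order)}


mapping = blind_labels(["Acme Transcripts", "Northwind Audio", "Contoso Captions"], seed=7)
print(sorted(mapping.values()))  # ['Vendor A', 'Vendor B', 'Vendor C']
```

Shuffling matters because assigning labels in the order vendors were contacted can leak identity to anyone who knows that order.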

Pilot checklist

  • Do files reflect normal and difficult audio?
  • Did every vendor get the same instructions?
  • Did you define error categories before the test?
  • Did you set deadlines and required formats in writing?
  • Did you prepare a glossary?
  • Did you decide who will score and review results?
  • Did you define a tie-break rule if vendors score closely?

Use a simple scoring sheet

A scoring sheet helps your team compare vendors fairly. Keep it simple enough that two reviewers would reach similar results.

Sample scoring categories

  • Overall transcript accuracy: 30 points.
  • Names and terminology: 20 points.
  • Numbers and structured data: 20 points.
  • Speaker diarization: 15 points.
  • Formatting and instruction compliance: 10 points.
  • Turnaround and delivery quality: 5 points.

Sample scoring sheet

  • Vendor name:
  • File name:
  • Reviewer:
  • Overall transcript accuracy score (0–30):
  • Names and terminology score (0–20):
  • Numbers and structured data score (0–20):
  • Speaker diarization score (0–15):
  • Formatting and instruction compliance score (0–10):
  • Turnaround and delivery quality score (0–5):
  • Critical errors found:
  • Major errors found:
  • Minor errors found:
  • Pass/fail for this file:
  • Reviewer notes:
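
If you keep scores in code or a structured export rather than a document, the same sheet can validate itself. This Python sketch uses the category caps from the sample scoring categories (they sum to 100). The per-file pass rule, no critical errors and a total of at least 85, is an assumption borrowed from the example threshold model later in this guide; replace it with your own rule.

```python
from dataclasses import dataclass

# Maximum points per category, from the sample scoring categories.
MAX_POINTS = {
    "accuracy": 30,
    "names": 20,
    "numbers": 20,
    "diarization": 15,
    "formatting": 10,
    "turnaround": 5,
}


@dataclass
class ScoringSheet:
    vendor: str
    file_name: str
    reviewer: str
    scores: dict[str, float]
    critical_errors: int = 0
    major_errors: int = 0
    minor_errors: int = 0
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject sheets with missing categories or out-of-range scores.
        if set(self.scores) != set(MAX_POINTS):
            raise ValueError(f"scores must cover exactly: {sorted(MAX_POINTS)}")
        for category, value in self.scores.items():
            if not 0 <= value <= MAX_POINTS[category]:
                raise ValueError(f"{category} score {value} is outside 0-{MAX_POINTS[category]}")

    @property
    def total(self) -> float:
        return sum(self.scores.values())

    def passed(self, min_total: float = 85) -> bool:
        # Assumed per-file rule: no critical errors and at least the minimum total.
        return self.critical_errors == 0 and self.total >= min_total


sheet = ScoringSheet(
    vendor="Vendor A", file_name="earnings-review.mp3", reviewer="Reviewer 1",
    scores={"accuracy": 27, "names": 20, "numbers": 18, "diarization": 14,
            "formatting": 9, "turnaround": 5},
    major_errors=2, minor_errors=6,
)
print(sheet.total, sheet.passed())  # 93 True
```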

How to score names, numbers, and diarization

  • Names: compare every glossary item spoken in the file against the transcript.
  • Numbers: check all dates, prices, IDs, percentages, times, and measurements.
  • Diarization: check that each speaker segment is assigned to the correct speaker or, if named identity is not required, that speakers are at least separated correctly.
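
These three checks can be partly automated against the gold standard. The Python sketch below makes several simplifying assumptions: names are matched by exact, case-sensitive occurrence counts rather than position; numbers are caught only when written as digits (so your style guide should fix the numeric format); and diarization assumes vendor and gold segments are already aligned one-to-one. When identity is not required, it tries every one-to-one mapping of vendor labels to gold labels and keeps the best, which is practical for the small speaker counts in a pilot.

```python
import re
from collections import Counter
from itertools import permutations

# Digit runs with optional separators, e.g. "1,200", "3:30", "14/06", "2026-06-14", "15%".
NUMBER = re.compile(r"\d+(?:[.,:/-]\d+)*%?")


def name_accuracy(gold: str, vendor: str, glossary: list[str]) -> float:
    """Share of glossary-name occurrences in the gold transcript reproduced exactly."""
    expected = found = 0
    for term in glossary:
        pattern = r"\b" + re.escape(term) + r"\b"
        n_gold = len(re.findall(pattern, gold))
        expected += n_gold
        found += min(n_gold, len(re.findall(pattern, vendor)))
    return found / expected if expected else 1.0


def number_accuracy(gold: str, vendor: str) -> float:
    """Share of numeric tokens in the gold transcript that also appear in the vendor output."""
    gold_nums = Counter(NUMBER.findall(gold))
    vendor_nums = Counter(NUMBER.findall(vendor))
    expected = sum(gold_nums.values())
    matched = sum((gold_nums & vendor_nums).values())
    return matched / expected if expected else 1.0


def diarization_accuracy(gold: list[str], vendor: list[str], identity_required: bool) -> float:
    """Share of aligned segments attributed to the right speaker."""
    if len(gold) != len(vendor):
        raise ValueError("this sketch assumes segments are already aligned one-to-one")
    if not gold:
        return 1.0
    if identity_required:
        return sum(g == v for g, v in zip(gold, vendor)) / len(gold)
    gold_labels = sorted(set(gold))
    vendor_labels = sorted(set(vendor))
    # Pad with None so extra vendor speakers can stay unmatched.
    targets = gold_labels + [None] * max(0, len(vendor_labels) - len(gold_labels))
    best = 0
    for perm in permutations(targets, len(vendor_labels)):
        mapping = dict(zip(vendor_labels, perm))
        best = max(best, sum(mapping[v] == g for g, v in zip(gold, vendor)))
    return best / len(gold)


gold_text = "Ana Ruiz approved $1,200 on 14/06 with Ben Okafor."
vendor_text = "Anna Ruiz approved $1,300 on 14/06 with Ben Okafor."
print(name_accuracy(gold_text, vendor_text, ["Ana Ruiz", "Ben Okafor"]))  # 0.5
print(number_accuracy(gold_text, vendor_text))  # 0.5
print(diarization_accuracy(["A", "B", "A", "B"], ["S1", "S2", "S1", "S1"], identity_required=False))  # 0.75
```

Treat these numbers as a first pass that points reviewers to likely problems, not as a replacement for reading the transcript.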

If you need captions instead of plain transcripts, include line length, timing, and readability rules, or test a separate closed captioning workflow.

Make a go/no-go decision

Do not choose on average score alone. One vendor may score well overall but still fail on critical items like names, numbers, or speaker identity.

Recommended decision rules

  • Go: vendor meets all critical acceptance criteria and reaches your minimum total score.
  • Conditional go: vendor misses a non-critical threshold but has a clear fix, such as formatting or delivery workflow.
  • No-go: vendor fails any critical criterion, misses deadlines, or shows inconsistent results across files.

Example threshold model

  • Minimum total score: 85 out of 100 across the pilot.
  • No critical errors on names or numbers.
  • Diarization meets the target for at least 80% of tested files.
  • On-time delivery for every file.
  • No repeated formatting failures after clarification.
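
The decision rules and the threshold model above can be combined into one function so every vendor is judged the same way. This Python sketch rests on a few interpretations that are assumptions, not rules from this guide: the pilot score is the mean of per-file totals; any critical failure, late file, or inconsistent result is an immediate no-go; and missing the minimum score, the diarization share, or the formatting rule counts as a non-critical miss that becomes a conditional go only when reviewers record a clear fix.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    file_totals: list[float]            # 0-100 total per file from the scoring sheet
    critical_failures: int              # e.g. any wrong glossary name or number
    diarization_met: list[bool]         # per file: was the diarization target reached?
    late_files: int                     # files delivered after the deadline
    inconsistent: bool                  # reviewers' judgment across files
    repeated_formatting_failures: bool  # still failing after clarification
    has_clear_fix: bool                 # reviewers' judgment for non-critical misses


def decide(r: PilotResult, min_total: float = 85.0, min_diarization_share: float = 0.8) -> str:
    # Hard stops from the no-go rule.
    if r.critical_failures > 0 or r.late_files > 0 or r.inconsistent:
        return "no-go"
    average = sum(r.file_totals) / len(r.file_totals)
    share = sum(r.diarization_met) / len(r.diarization_met)
    misses_non_critical = (
        average < min_total
        or share < min_diarization_share
        or r.repeated_formatting_failures
    )
    if not misses_non_critical:
        return "go"
    return "conditional go" if r.has_clear_fix else "no-go"


pilot = PilotResult(
    file_totals=[88, 91, 86, 90, 84], critical_failures=0,
    diarization_met=[True, True, True, True, False], late_files=0,
    inconsistent=False, repeated_formatting_failures=False, has_clear_fix=False,
)
print(decide(pilot))  # go
```

In this example the average is 87.8 and diarization meets its target on 4 of 5 files (80%), so the vendor clears every threshold even though one file scored below 85.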

Questions to ask before final selection

  • Did the vendor perform well on your hardest files, not just the easy ones?
  • Did the vendor ask useful questions before starting?
  • Were issues isolated or repeated across files?
  • Can your team review and use the output without heavy rework?
  • Does the pricing still make sense after you factor in review time and corrections? Compare likely costs against your expected workload and each vendor's transcription pricing.

Common mistakes to avoid

  • Testing only one easy file.
  • Skipping a glossary for names and terms.
  • Changing acceptance criteria after results arrive.
  • Letting each vendor choose a different turnaround target.
  • Relying on one reviewer with no spot check from another reviewer.
  • Choosing the cheapest option before measuring rework time.
  • Ignoring security or handling requirements when sharing files. If you process personal data, review your obligations under the GDPR.

Common questions

How long should a transcription vendor pilot be?

Keep it short but varied. Many teams can learn enough from 60 to 180 minutes of audio spread across several files.

Should I test human and AI transcription vendors together?

Yes, if both are realistic options for your workflow. Use the same files and acceptance criteria so the comparison stays fair.

What is a good diarization target?

That depends on speaker count and audio quality. Set separate targets for simple and complex files, and treat wrong speaker identity as a critical error if identity matters.

Do I need a gold standard transcript?

Yes, if you want a reliable comparison. Without a reference, reviews become subjective and harder to defend.

How many vendors should I include?

Two to four is usually enough. More than that adds work without always improving the decision.

Should turnaround time be part of the pilot?

Yes. Fast delivery can matter as much as accuracy, especially if your team works to fixed publishing or reporting deadlines.

What if no vendor passes?

Review whether your pilot matched real needs and whether your instructions were clear. Then run a second round with revised files, a better glossary, or a different mix of vendors.

A good pilot reduces surprises and helps your team choose with confidence. If you need support with different workflows, file types, or review standards, GoTranscript provides the right solutions, including professional transcription services.