
Using AI Transcription in Human-Subjects Research: IRB Risk Checklist + Controls

Andrew Russo
Posted in Zoom · 24 Feb, 2026

Yes, you can use AI transcription in human-subjects research, but you must treat it like a data-sharing decision and document the risks and controls for your IRB. The safest approach is to map what data you will upload, who can access it, where it will be processed and stored, how long it will be retained, and how you will audit and limit exposure.

This guide walks through the key IRB risk areas for AI transcription and gives you a practical checklist plus controls you can propose. IRBs vary in what they allow and what they require, so use this as a starting point and confirm details with your IRB and institutional privacy/security office.


Key takeaways

  • Evaluate AI transcription as third-party processing of identifiable research data, not just a convenience tool.
  • Focus your IRB write-up on confidentiality, retention, data residency, and auditability (who did what, when).
  • Reduce risk with controls like approved tools, pre-upload redaction, least-privilege access, and human review of sensitive segments.
  • IRB requirements differ across institutions, study populations, and data types, so plan for questions and alternatives.

What IRBs look for when you use AI transcription

IRBs usually do not “approve AI” in the abstract. They assess whether your plan protects participants and matches your consent, protocol, and data management plan.

When you propose AI transcription, expect the IRB to focus on whether the tool introduces new confidentiality or security risks compared with your original plan.

Common IRB concerns

  • Confidentiality: Will anyone outside your team have access to recordings or transcripts?
  • Third-party processing: Is a vendor or model provider processing the data, and under what terms?
  • Retention: Are audio files or transcripts stored, cached, or used for product improvement?
  • Data residency: In what country or region will processing and storage occur?
  • Auditability: Can you track access, exports, edits, and deletion requests?
  • Re-identification risk: Does the content include direct identifiers or rare details that could identify someone?

Start with a simple decision tree

  • If recordings include direct identifiers (names, addresses, faces, voices tied to identity), treat AI upload as high risk until you reduce identifiers.
  • If your institution offers an approved platform for transcription, use it or explain why not.
  • If you cannot meet required controls (residency, retention, access logs), plan a fallback (human transcription under contract, on-prem tools, or in-house transcription).

How to evaluate an AI transcription tool for IRB-reviewed studies

Use a structured evaluation so you can answer IRB questions clearly. Keep screenshots or PDFs of settings and policies you rely on, because tools and terms can change.

Below are the core areas to assess and document.

1) Confidentiality and access controls

Identify exactly who can access the data: your study team, the vendor’s staff, subcontractors, and any “support” or “review” personnel.

  • Does the tool support role-based access and least privilege?
  • Can you require multi-factor authentication for accounts?
  • Can you prevent sharing links publicly and restrict downloads/exports?
  • Can you separate projects by study to limit cross-team access?

2) Third-party processing and model training

IRBs often ask whether the vendor uses your data for training or “product improvement,” and whether other third parties process it.

  • Does the vendor state that uploaded content is used for training by default, or can you opt out?
  • Is processing handled by the vendor only, or by additional subprocessors?
  • Can you get a clear statement of data ownership and permitted uses?

If the answer is unclear, treat it as a risk and ask for written clarification before you upload any participant data.

3) Retention, deletion, and backups

Retention is a frequent stumbling block, because IRBs expect you to keep research data only as long as needed and as described in your protocol.

  • Can you set retention periods for audio and transcripts?
  • Can you delete files promptly, and does deletion apply to backups?
  • Does the tool keep hidden copies (version history, caches, “recovery” bins)?

Document what you can control in settings, what is fixed by the platform, and what you will do to align with your study’s data management plan.

4) Data residency and cross-border transfers

Data residency matters when your institution or funder requires processing or storage in a specific country or region. It can also matter when participants are in locations with stricter privacy rules.

  • Where are the tool’s servers located for storage and processing?
  • Can you choose a region, and is that choice documented?
  • Will support staff access data from other locations?

If you work with EU participants or collaborators, you may also need to consider cross-border transfer rules under the GDPR.

5) Auditability and documentation

Auditability helps you prove that you followed your plan. It also helps you respond to incidents, participant requests, and compliance reviews.

  • Are there access logs showing who viewed, downloaded, or shared files?
  • Can you export logs for your records?
  • Is there a history of transcript edits and timestamps?
  • Can you document deletion (or obtain deletion confirmations)?

6) Special populations and sensitive topics

Some studies need stronger protections because participants face higher risk if their identity or responses are exposed.

  • Studies involving minors, undocumented participants, stigmatized conditions, or illegal behavior often need stricter controls.
  • Voice itself can be identifying, even if you remove names, so plan accordingly.

IRB risk checklist for AI transcription (copy/paste)

Use this checklist to spot gaps before you submit or amend your IRB protocol. Mark each item as Yes/No/Unknown and add notes.

A) Data classification and identifiers

  • Do recordings contain direct identifiers (names, addresses, employer, phone, email)?
  • Do recordings contain indirect identifiers (rare roles, locations, unique events)?
  • Will you upload full audio, or only selected clips?
  • Will transcripts include speaker labels tied to identities?

B) Consent and participant expectations

  • Does consent language cover third-party processing (if applicable)?
  • Does consent specify where data is stored or who may access it?
  • Will you need to re-consent or notify participants if you change the process?

C) Vendor/tool processing and retention

  • Is there an opt-out for training/product improvement, and is it enabled?
  • Are retention settings available and configured to your study needs?
  • Can you delete audio and transcripts on demand, and is deletion documented?
  • Are subprocessors disclosed?

D) Data residency and transfers

  • Do you know the storage and processing regions?
  • Does your institution require a specific region (or prohibit cross-border transfers)?
  • Do vendor support practices create cross-border access?

E) Access control and least privilege

  • Are accounts limited to study staff who need access?
  • Is MFA enabled?
  • Are sharing links restricted and monitored?
  • Are exports/downloads limited to approved roles?

F) Auditability and incident response

  • Can you access audit logs for views/downloads/shares?
  • Do you have a documented plan for breach/incident reporting at your institution?
  • Do you have a plan to correct transcript errors that could harm participants?

G) Quality and human review

  • Will a trained team member review transcripts for accuracy before analysis?
  • Will you flag and handle sensitive segments (e.g., names, trauma details) differently?

Mitigation controls you can propose (with practical examples)

Controls work best when they are specific, repeatable, and easy to audit. Choose a set that matches your data sensitivity and the tools your institution allows.

1) Use approved tools and document approvals

  • Prefer tools already reviewed by your university’s IT/security or privacy office.
  • Keep a record of the approved configuration (settings, regions, retention choices).
  • If the tool is not approved, ask your institution what review is needed before use.

2) Redact or de-identify before upload

Pre-upload redaction reduces exposure because the most sensitive details never leave your control.

  • Remove participant names, addresses, workplaces, and other direct identifiers from filenames and notes.
  • Consider clipping audio so you upload only the segments needed for analysis.
  • Replace identifiers with codes (e.g., P01, ClinicA) before uploading.
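The code-substitution step above can be scripted so it is repeatable and auditable. This is a minimal sketch, not a complete de-identification tool: the `CODE_MAP` names and codes are hypothetical examples, and in a real study the mapping would come from your participant key, which stays in your secure environment and is never uploaded.

```python
import re

# Hypothetical identifier-to-code mapping. Build this from your
# participant key; the key itself never leaves your secure storage.
CODE_MAP = {
    "Jane Smith": "P01",
    "Riverside Clinic": "ClinicA",
}

def redact(text: str, code_map: dict[str, str]) -> str:
    """Replace known direct identifiers with study codes before upload."""
    for identifier, code in code_map.items():
        # Case-insensitive, literal match so punctuation in names is safe.
        pattern = re.compile(re.escape(identifier), re.IGNORECASE)
        text = pattern.sub(code, text)
    return text

notes = "Interview with Jane Smith at Riverside Clinic."
print(redact(notes, CODE_MAP))  # "Interview with P01 at ClinicA."
```

A scripted pass like this catches only the identifiers you list; it does not replace human review for indirect identifiers such as rare roles or unique events.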

3) Restrict access and sharing

  • Limit tool accounts to the smallest possible set of staff.
  • Separate studies into separate workspaces or projects.
  • Disable public sharing links, or require authentication for any shared items.

4) Configure retention and deletion routines

  • Set a retention window aligned to your protocol (for example, delete raw audio after verification if you do not need it).
  • Create a routine: upload → transcribe → review → export to your secure repository → delete from the tool.
  • Track deletion dates in a study log.
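The deletion-tracking step above can be as simple as an append-only CSV in your secure repository. This is a sketch under assumptions: the log filename, column names, and the example file code are all illustrative, not a prescribed format.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log location; keep it in your institution-approved storage.
LOG_PATH = Path("study_deletion_log.csv")

def log_deletion(file_code: str, tool: str, confirmed_by: str) -> None:
    """Append one deletion record so the study log stays auditable."""
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["file_code", "tool", "deleted_on", "confirmed_by"])
        writer.writerow([
            file_code,
            tool,
            datetime.now(timezone.utc).date().isoformat(),
            confirmed_by,
        ])

# Record the deletion after exporting the reviewed transcript and
# removing the audio from the transcription tool.
log_deletion("P01_interview", "TranscriptionTool", "A. Russo")
```

Because the log uses file codes rather than names, the log itself stays identifier-free and can be shared with reviewers.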

5) Add human review for sensitive segments

AI transcripts can include errors like wrong names, swapped speaker labels, or homophone confusions ("no" vs "know"). Those errors can create analysis mistakes and, in some cases, participant risk.

  • Have trained staff review high-risk sections (names, locations, medical details, illegal activity).
  • Use a two-pass check for quotes you plan to publish.
  • Keep a correction log when accuracy affects coding decisions.

6) Store the “source of truth” in your secure research environment

  • Export transcripts to your institution-approved secure storage.
  • Limit long-term storage in the transcription tool unless your IRB and institution approve it.
  • Maintain a clear file naming scheme that does not include identifiers.
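A naming scheme is easier to enforce if you can check files against it automatically. The sketch below assumes a hypothetical pattern, study code + participant code + session + type (for example `STU42_P01_S02_transcript.docx`); adapt the pattern to whatever scheme your protocol defines.

```python
import re

# Hypothetical scheme: <STUDY><num>_P<nn>_S<nn>_<type>.<ext>
# No participant names, sites, or dates of birth in filenames.
NAME_PATTERN = re.compile(r"^[A-Z]{3}\d+_P\d{2}_S\d{2}_[a-z]+\.\w+$")

def filename_ok(name: str) -> bool:
    """Return True if a filename follows the identifier-free scheme."""
    return bool(NAME_PATTERN.match(name))

print(filename_ok("STU42_P01_S02_transcript.docx"))  # True
print(filename_ok("JaneSmith_interview.docx"))       # False
```

Running a check like this before export gives you a quick, documentable control that no identifier-bearing filenames reach the transcription tool or your shared repository.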

7) Document audit steps

  • Save screenshots/PDFs of key settings (retention, training opt-out, region) at the start of the study.
  • Keep access lists and role assignments in a study binder.
  • Export audit logs periodically if the platform supports it.

What to include in your IRB application or amendment

Make it easy for reviewers to see that you understand the data flow and that you have controls in place. Use short, concrete statements and avoid vague promises like “we will keep data secure.”

Describe the data flow in 6 lines

  • What you record (audio/video) and where it is stored initially.
  • What you upload to the transcription tool (full files or redacted clips).
  • Who can access the tool and how you control access.
  • Where processing/storage occurs (data residency) if known/required.
  • How long data stays in the tool (retention) and how you delete it.
  • Where the final transcript lives and who can use it for analysis.

Add a short “risk + control” table

  • Risk: Third-party access to identifiable audio. Control: Pre-upload redaction + restricted access + deletion after export.
  • Risk: Cross-border processing. Control: Use institution-approved region or avoid upload of identifiable data.
  • Risk: Transcript errors affecting analysis or quotes. Control: Human review for sensitive segments and publication excerpts.

Align with consent language

If your consent form does not mention third-party processing or cloud tools, your IRB may ask you to revise it. Keep the language simple and factual.

Also describe any limits, like “We will remove names before uploading audio for transcription.”

Common pitfalls (and how to avoid them)

Most IRB trouble happens when teams pick a tool first and ask questions later. These pitfalls are avoidable with a short pre-check.

  • Assuming “no identifiers” because you removed names: voices, locations, and unique stories can still identify people.
  • Not checking retention defaults: many tools keep files until you delete them, and deletion may not cover backups.
  • Unclear vendor terms: if you cannot confirm training use, access limits, or subprocessors, treat it as “unknown risk.”
  • Sharing transcripts by email: export to your secure system and share via approved access controls.
  • Skipping human review: errors can change meaning and harm data quality, especially in sensitive interviews.
  • One-size-fits-all controls: IRBs differ, and the same tool may be fine for one dataset and not another.

Common questions

Do I need IRB approval to use AI transcription?

If your study is IRB-reviewed, treat AI transcription as part of your data handling plan. Many teams submit it as part of the initial protocol or as an amendment if they add it later.

Is uploading audio to an AI tool considered sharing data with a third party?

Often, yes, because another organization processes the content. Your IRB may want to know the vendor’s role, any subprocessors, and what the terms allow.

Can I use AI transcription if I remove participant names?

Sometimes, but removing names may not be enough. Consider other identifiers (voice, places, rare details) and apply additional redaction or clipping when needed.

What if my IRB and IT security office give different guidance?

Ask for a joint review or a written decision path. IRBs focus on participant risk and consent, while security offices focus on technical controls and vendor risk.

Should I keep the audio after I have the transcript?

That depends on your protocol and analysis needs. If you do not need raw audio long-term, consider deleting it after verification and keeping only what you must retain for the study.

How do I handle transcripts that include illegal activity or other high-risk disclosures?

Limit access, separate and protect those segments, and follow your institution’s requirements and your consent language. In some cases, you may need a stricter workflow than the rest of the dataset.

Is automated transcription accurate enough for qualitative research?

Accuracy varies by audio quality, speakers, and terminology. Plan for human review, especially for quotes, sensitive topics, and any content that drives key coding decisions.

If you want a workflow that balances speed with risk controls, GoTranscript offers multiple options, including automated transcription and transcription proofreading services to add human review where it matters. When you need a more hands-on, documented approach for research materials, you can also use GoTranscript’s professional transcription services.