To use ELAN for multilingual data, set up separate tiers for the original language and each translation, then use a shared glossary and a clear review workflow so every translator makes the same choices. You’ll get more consistent translations, faster review, and fewer “mystery” terms later. This guide shows a practical tier template, a glossary enforcement routine, and a workflow for code-switching and ambiguous terms.
Key takeaways
- Create separate tiers for source language, translations, and notes; avoid mixing languages in one tier.
- Use a shared glossary plus a decision log to keep term choices stable across files and translators.
- Handle code-switching with language-tag tiers and clear rules for when to split or keep segments.
- Build review into the project: self-check, second-pass review, and final consistency scan.
Plan your ELAN template (before you import media)
Good ELAN projects start with a template that every file follows. If you decide tier names and rules after people begin annotating, you will spend hours merging and cleaning later.
Before you create tiers, answer three questions: What is your “source of truth” for timing? Which translations do you need? How will you mark uncertainty? For multilingual work, the timing “source of truth” is usually the original-language segmentation tier.
Choose a tier naming convention that scales
Pick tier names that work across many recordings and languages. Keep names short, predictable, and easy to filter in ELAN’s tier list.
- Language codes: use a consistent format such as src, en, es, or ISO-like tags if your team prefers.
- Tier purpose: add a suffix like _seg, _orth, _trans, _notes.
- Speaker labels: if you need speaker-separated tiers, use spk01, spk02 rather than names.
Example pattern: src_seg, src_orth, en_trans, en_notes, term_flags.
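A naming convention is easier to keep when a script can check it. The sketch below assumes one possible reading of the pattern above: a lowercase language prefix, a purpose suffix, and an optional speaker label. The regular expression and the function name are illustrative, not part of ELAN.

```python
import re

# Assumed convention: <language prefix>_<purpose>[_spkNN], e.g. src_orth_spk01.
TIER_NAME = re.compile(r"^(src|[a-z]{2,3})_(seg|orth|trans|notes)(_spk\d{2})?$")

# Names that are allowed even though they do not fit the pattern.
EXTRA_NAMES = {"term_flags"}


def invalid_tier_names(names):
    """Return the tier names that do not follow the assumed convention."""
    return [
        name for name in names
        if name not in EXTRA_NAMES and not TIER_NAME.fullmatch(name)
    ]


names = ["src_seg", "src_orth", "en_trans", "en_notes", "term_flags", "English translation"]
print(invalid_tier_names(names))  # ['English translation']
```

If your team uses longer language tags or extra suffixes, adjust the pattern rather than adding exceptions one file at a time.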
Decide your segmentation rule (the most important rule)
Translations become inconsistent when source segmentation changes mid-project. Set one segmentation rule and stick to it.
- Time-based: segment by pauses (useful for natural speech and conversation).
- Syntax-based: segment by clauses or sentences (useful for narrative or prepared speech).
- Turn-based: segment by speaker turns (useful for interviews and multi-party talk).
Whatever you choose, write it in a short project README and keep it with the ELAN template. Include edge rules like interruptions, laughter, and overlapping speech.
Set up translation tiers the “safe” way (separate, linked, and reviewable)
In ELAN, treat the original language tier as the anchor and make translation tiers dependent on it. This keeps time alignment stable and lets reviewers compare source and translation line-by-line.
Recommended tier structure for multilingual translation
This structure works for most teams and avoids mixing tasks in one tier. Adjust names to match your language set.
- Source segmentation tier (alignable): src_seg (time-aligned units).
- Source transcription tier (referring): src_orth (orthographic transcription linked to src_seg).
- Translation tier(s) (referring): en_trans, fr_trans, etc., each linked to src_seg (or to src_orth if your team prefers).
- Translator notes tier(s) (referring): en_notes for uncertainties, alternatives, and context.
- Terminology/flags tier (referring): term_flags for glossary issues, ambiguous terms, and code-switch tags.
If you have multiple speakers and you need speaker-separated transcription, keep segmentation separate from speaker detail. For example, use src_seg for timing, then src_orth_spk01 and src_orth_spk02 as referring tiers linked to the same segments.
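One way to keep this structure reviewable is to write it down as plain data next to the template. The sketch below is only a description of the tiers, not an ELAN file; the parent chosen for en_notes and term_flags (src_seg) is an assumption, and the check function is hypothetical.

```python
# Tier name -> (kind, parent). Names follow the example pattern used in this guide.
TEMPLATE = {
    "src_seg": ("alignable", None),
    "src_orth": ("referring", "src_seg"),
    "en_trans": ("referring", "src_seg"),
    "en_notes": ("referring", "src_seg"),    # assumed parent
    "term_flags": ("referring", "src_seg"),  # assumed parent
}


def template_problems(template):
    """List referring tiers whose parent is missing or undefined."""
    problems = []
    for name, (kind, parent) in template.items():
        if kind != "referring":
            continue
        if parent is None:
            problems.append(f"{name}: referring tier has no parent")
        elif parent not in template:
            problems.append(f"{name}: parent '{parent}' is not defined")
    return problems


print(template_problems(TEMPLATE))  # []
```

Adding fr_trans or speaker-specific tiers then becomes a one-line change that the same check covers.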
Link translations to the right parent tier
Link translation tiers to the tier that defines the units reviewers will check. Most projects link translations to src_seg so a segment always has exactly one translation entry per target language.
- Link to src_seg if you want stable review units and easy export.
- Link to src_orth if you want translators to work directly from the transcription text (but keep src_seg stable).
Either way, avoid creating “floating” translation segments that are independently time-aligned. Floating segments drift and make it hard to compare versions.
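A quick script can catch floating translation tiers before they spread. This sketch assumes the .eaf file is XML in which each tier is a TIER element with a TIER_ID attribute and, for dependent tiers, a PARENT_REF attribute; verify that against files saved by your ELAN version. The sample string is a stripped-down fragment for illustration, not a complete ELAN document.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real .eaf file contains much more.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="src_seg" LINGUISTIC_TYPE_REF="segmentation"/>
  <TIER TIER_ID="en_trans" LINGUISTIC_TYPE_REF="translation" PARENT_REF="src_seg"/>
  <TIER TIER_ID="fr_trans" LINGUISTIC_TYPE_REF="translation"/>
</ANNOTATION_DOCUMENT>"""


def floating_translation_tiers(eaf_xml, suffix="_trans"):
    """Return translation tiers (by name suffix) that have no parent tier."""
    root = ET.fromstring(eaf_xml)
    return [
        tier.get("TIER_ID")
        for tier in root.iter("TIER")
        if tier.get("TIER_ID", "").endswith(suffix) and not tier.get("PARENT_REF")
    ]


print(floating_translation_tiers(SAMPLE_EAF))  # ['fr_trans']
```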
Use controlled values for flags (simple but powerful)
To keep reviews consistent, define a small set of flag values and require annotators to use them. You can store these in a tier that acts like a checklist for each segment.
- AMB = ambiguous meaning
- TERM = glossary term present (or needs glossary decision)
- CS = code-switching inside segment
- UNC = uncertain hearing
- NAME = proper name needs confirmation
Keep the list short so people actually use it, then explain what each flag means and what action it triggers in review.
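The flag list can also be checked mechanically. The sketch below assumes that a term_flags annotation may hold several flags separated by spaces or commas; that separator rule is a project choice, not something ELAN defines.

```python
ALLOWED_FLAGS = {"AMB", "TERM", "CS", "UNC", "NAME"}


def unknown_flags(value):
    """Return the flags in one term_flags value that are not in the approved set."""
    flags = value.replace(",", " ").split()
    return [flag for flag in flags if flag not in ALLOWED_FLAGS]


print(unknown_flags("TERM, CS"))  # []
print(unknown_flags("term AMB"))  # ['term']
```

The check is case-sensitive on purpose, so "term" and "TERM" do not silently become two different flags in later searches.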
Glossary enforcement in ELAN: a workflow that actually works
ELAN by itself does not force translators to follow your glossary, so enforcement comes from process. The goal is not to police people; it is to prevent “same term, five translations” across files.
Create two assets: a glossary and a decision log
Use a glossary for stable term mappings and a decision log for messy cases. The decision log is where you record what you did when the glossary does not cover the situation.
- Glossary (stable): term, approved translation, part of speech (optional), notes, example, date, owner.
- Decision log (flexible): term or phrase, context, options considered, final choice, link to segments, reviewer initials, date.
Store these files where everyone can access them, and freeze versions per delivery if needed. Keep them simple so people update them instead of working around them.
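If you keep both assets as spreadsheets or CSV files, it can help to fix the column set up front. The sketch below turns the two field lists above into record types; the field names are one possible spelling of those columns, and csv_header is a hypothetical helper.

```python
from dataclasses import dataclass, fields


@dataclass
class GlossaryEntry:
    term: str
    approved_translation: str
    part_of_speech: str = ""  # optional
    notes: str = ""
    example: str = ""
    date: str = ""
    owner: str = ""


@dataclass
class Decision:
    term_or_phrase: str
    context: str
    options_considered: list
    final_choice: str
    segment_links: list
    reviewer_initials: str
    date: str


def csv_header(record_type):
    """Return the column names for a record type, in declaration order."""
    return [f.name for f in fields(record_type)]


print(csv_header(GlossaryEntry))
```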
Mark glossary terms at the segment level
Ask translators to flag any segment that includes a glossary term or a candidate term. This makes later QA faster because reviewers can jump straight to the risky segments.
- If a segment contains a known glossary term, add TERM in term_flags.
- If a segment contains a new term that should become a glossary entry, add TERM and note it in en_notes.
- If a term has multiple meanings, add AMB and write the competing meanings in en_notes.
Add a “glossary check” step to every review cycle
Glossary enforcement fails when it is optional. Make it a required checkbox step for translators and reviewers.
- Translator self-check: each translator scans their own file for key terms and confirms they match the glossary.
- Second-pass review: verify flagged segments first, then skim unflagged segments.
- Final consistency scan: search for high-risk terms across exports and fix strays.
If you export ELAN annotations to text for scanning, keep the export format consistent across files so your term searches behave predictably.
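A final consistency scan can be as simple as a loop over exported rows. The sketch below assumes each export has already been read into (file, segment id, source text, translation) tuples; the row layout, the glossary entry, and the sample sentences are invented for illustration.

```python
def glossary_strays(rows, glossary):
    """Find segments where a glossary term appears in the source text
    but its approved translation does not appear in the translation."""
    strays = []
    for file_name, segment_id, source, translation in rows:
        for term, approved in glossary.items():
            term_present = term.lower() in source.lower()
            approved_used = approved.lower() in translation.lower()
            if term_present and not approved_used:
                strays.append((file_name, segment_id, term, approved))
    return strays


glossary = {"ayuntamiento": "town council"}  # invented example entry
rows = [
    ("a.eaf", "s1", "fuimos al ayuntamiento", "we went to the town council"),
    ("b.eaf", "s7", "el ayuntamiento cerró", "the city hall closed"),
]
print(glossary_strays(rows, glossary))
# [('b.eaf', 's7', 'ayuntamiento', 'town council')]
```

Plain substring matching misses inflected forms and can match inside longer words, so treat the result as a list of segments to review rather than a list of errors.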
Workflow for code-switching (and mixed-language segments)
Code-switching breaks “one segment = one language” assumptions. You can still keep ELAN tidy if you decide when to split segments and how to tag language changes.
Pick one of two code-switch strategies
Both can work, but mixing them in the same project creates confusion. Choose one based on how often code-switching happens and what you need to analyze later.
- Strategy A: Split segments at language switches (best for frequent switching and linguistic analysis).
- Strategy B: Keep the segment, tag the switch (best when switching is rare and translation is the main goal).
Strategy A: split at the switch
When you split, the source segmentation tier becomes more granular. This improves clarity but increases segment count.
- In src_seg, create a new boundary at the language switch.
- In src_orth, transcribe each language part as spoken.
- In each translation tier, translate the whole segment into the target language, but keep notes if a word stays in the original language.
- In term_flags, add CS to any segment created because of code-switching.
Strategy B: keep the segment, tag the switch
When you keep segments, you need a clear way to mark which words are in which language. Do it in a structured way so reviewers can see the switch quickly.
- Keep src_seg unchanged unless timing demands a split.
- In src_orth, tag the switched phrase using a consistent marker, such as [L2: ...] or {es: ...}.
- In term_flags, add CS and explain the switch briefly in en_notes.
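A structured marker pays off because a script can read it. The sketch below assumes the {es: ...} style with a two- or three-letter lowercase language code and no nested braces; both function names are hypothetical.

```python
import re

# Matches markers such as {es: mercado central}; nested braces are not supported.
CS_MARKER = re.compile(r"\{([a-z]{2,3}):\s*([^{}]+)\}")


def switched_phrases(orth_text):
    """Return (language code, phrase) pairs for every tagged switch."""
    return [(lang, phrase.strip()) for lang, phrase in CS_MARKER.findall(orth_text)]


def cs_flag_missing(orth_text, flags):
    """True when the transcription has a tagged switch but no CS flag."""
    has_switch = CS_MARKER.search(orth_text) is not None
    return has_switch and "CS" not in flags.replace(",", " ").split()


text = "we met at the {es: mercado central} yesterday"
print(switched_phrases(text))         # [('es', 'mercado central')]
print(cs_flag_missing(text, "TERM"))  # True
```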
Whichever strategy you choose, write one rule about loanwords: when a borrowed word counts as code-switching versus normal usage. If the team argues about this later, your consistency will collapse.
Handling ambiguous terms: a repeatable decision process
Ambiguity shows up as homonyms, unclear reference, jargon, and cultural terms. The worst outcome is “silent guessing,” because it hides uncertainty from reviewers and future users.
Use a three-step ambiguity workflow
- Step 1: Flag it. Add AMB in term_flags and write a short note: what is ambiguous and why.
- Step 2: Offer options. In en_notes, list 2–3 possible meanings or translations, plus the clue you used (tone, nearby words, topic).
- Step 3: Decide and log. When a reviewer picks the final option, add it to the decision log and update the glossary if it becomes a recurring term.
When to keep the original term in the translation
Sometimes the best “translation” is to keep a term and explain it. This is common for names, titles, institutions, and culturally specific words.
- Keep the original term if translating would mislead the reader.
- Add a short explanation in en_notes if readers need context.
- Standardize this choice in the glossary so it stays consistent.
Quality control: review steps that prevent drift across files
Multilingual projects drift when different people solve the same problem in different ways. A light but consistent QC routine helps more than a heavy one that nobody follows.
A practical three-pass review (fast and consistent)
- Pass 1 (translator): confirm every src_seg has a matching translation entry, then resolve obvious typos and missing lines.
- Pass 2 (reviewer): review all TERM, AMB, and CS segments first, then spot-check the rest for tone and completeness.
- Pass 3 (project lead): run a consistency scan across files using exports, focusing on key glossary terms and recurring names.
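The first two passes can be supported with two small checks. The sketch below assumes segments are identified by an ID and that translations and flags are available as dictionaries keyed by that ID; how you get them out of ELAN depends on your export format, and both function names are hypothetical.

```python
PRIORITY_FLAGS = {"TERM", "AMB", "CS"}


def missing_translations(segment_ids, translations):
    """Pass 1: return segment IDs with no translation or an empty one."""
    return [sid for sid in segment_ids if not translations.get(sid, "").strip()]


def review_order(segment_ids, flags_by_segment):
    """Pass 2: put TERM, AMB, and CS segments first, keeping original order."""
    def is_flagged(sid):
        flags = flags_by_segment.get(sid, "").replace(",", " ").split()
        return bool(PRIORITY_FLAGS & set(flags))

    flagged = [sid for sid in segment_ids if is_flagged(sid)]
    rest = [sid for sid in segment_ids if not is_flagged(sid)]
    return flagged + rest


segments = ["s1", "s2", "s3"]
print(missing_translations(segments, {"s1": "hello", "s2": "  "}))  # ['s2', 's3']
print(review_order(segments, {"s2": "TERM", "s3": "UNC"}))          # ['s2', 's1', 's3']
```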
Common pitfalls (and how to avoid them)
- Pitfall: Translators change segmentation to “make translation easier.” Fix: lock segmentation rules and require notes instead of segmentation edits.
- Pitfall: Two tiers contain the same information in different forms. Fix: assign one purpose per tier (transcription, translation, notes, flags).
- Pitfall: Glossary exists but nobody updates it. Fix: make “add new term” part of the definition of done for each file.
- Pitfall: Code-switching is marked inconsistently. Fix: choose one strategy and document the rule for loanwords and names.
Deciding between human and automated support
If you start from automated output, plan time for cleanup and term consistency checks. Automated tools can help you draft a transcript quickly, but multilingual projects still need careful review for names, code-switching, and domain terms.
- If you need a fast first pass, consider automated transcription for a draft that your team can correct in ELAN.
- If you already have transcripts, consider a dedicated review step like transcription proofreading services before translation work begins.
Common questions
- Should translations be time-aligned in ELAN?
Usually, no; link translation tiers to the source segmentation so each time segment has one translation unit per language.
- Do I need separate translation tiers for each target language?
Yes, if you want clean exports and reliable review; one tier per target language keeps comparisons simple.
- How do I handle overlapping speech in multilingual data?
Keep segmentation stable and use speaker-specific transcription tiers that refer to the same segments, then translate based on the speaker tiers as needed.
- What if two translators disagree on a key term?
Flag the segment, record options in notes, then make one decision in the decision log and update the glossary so the choice repeats everywhere.
- How should we mark code-switching?
Either split segments at switches or keep segments and tag the switched phrase; pick one method and use a consistent marker plus a CS flag.
- What’s the minimum QA that still works?
At least do a translator self-check, a reviewer pass on flagged segments, and a final glossary consistency scan across exports.
If you want extra support turning multilingual audio into clean, consistent text that’s ready for annotation and translation, GoTranscript offers professional transcription services that can fit into an ELAN-based workflow.