To import or export transcripts in ELAN without breaking timecodes, move data in formats that keep time alignment (usually ELAN’s native .eaf, or tabular exports that include start and end times) and avoid “free text” edits that shift rows or change the encoding. The safest workflow is: keep an .eaf master file, export CSV/TSV for analysis, and re-import only into matching tier structures with the same time boundaries.
This guide explains supported formats, how to preserve alignment, how to avoid losing tier structure, and how to fix common problems like misaligned segments and encoding issues.
Key takeaways
- Keep an .eaf file as your master; treat CSV/Text as “copies” for analysis, not the source of truth.
- To preserve timecodes, always include start time and end time columns when exporting tabular data.
- To preserve tiers, re-import into existing tiers (same names, same types, same parent/child setup) whenever possible.
- Avoid editing CSV in ways that change row order, delimiter consistency, or quoting, or you risk misalignment.
- Most “broken timecodes” issues come from merged/split cells, hidden characters, delimiter changes, or encoding.
What ELAN import/export can (and can’t) preserve
ELAN is built around a time-aligned annotation model, so the best way to preserve timecodes and tier relationships is to keep data in ELAN’s native format.
When you export to CSV or plain text, you often flatten a multi-tier structure into rows and columns, which can lose hierarchy unless you plan the export and import carefully.
Formats you will see in real ELAN workflows
- .eaf (ELAN Annotation Format): Best for preserving timecodes, tiers, controlled vocabularies, and linked media references.
- CSV/TSV (tabular): Good for analysis in Excel/R/Python if it includes tier name and time boundaries.
- Plain text: Useful for reading or sharing, but usually loses time alignment unless you add timestamps.
What usually breaks when moving to CSV/Text
- Tier structure: Parent/child tiers and linguistic types can flatten into “just columns.”
- Time alignment: If a file loses start/end times or row order changes, alignment shifts.
- Special characters: Diacritics, IPA, or non-Latin scripts can become garbled if encoding changes.
- Line breaks: Multi-line annotations can create “phantom rows” in CSV.
Safe export workflows (CSV/Text) that keep timecodes intact
The safest mindset is: export for a purpose, not “export everything.” If you export for analysis, design the output so every row can be traced back to a tier and a time interval.
Also, export a clean backup before you start so you can always compare and recover.
Workflow A: Export for analysis (recommended)
- Step 1: Save a clean master copy of your project as .eaf (and keep linked media in the same folder structure).
- Step 2: Export tabular data with these minimum fields:
  - Tier name
  - Annotation value (text)
  - Start time
  - End time
  - Annotation ID (if available)
- Step 3: Use CSV/TSV for analysis, but avoid edits that change the number of rows tied to time intervals.
- Step 4: If you need to bring results back to ELAN, import into a new tier rather than overwriting original tiers.
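The steps above can also be scripted. The following is a minimal sketch, not ELAN's own export: it reads an .eaf file as XML with Python's standard library and produces one TSV row per time-aligned annotation. The column names (`tier`, `annotation_id`, `start_ms`, `end_ms`, `value`), the helper names, and the sample document are assumptions made for illustration; dependent (reference) annotations that have no time slots of their own are deliberately skipped.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written .eaf fragment for illustration only; a real file also has a
# HEADER, LINGUISTIC_TYPE definitions, and many more annotations.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="Speaker_A" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>how are you</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Hypothetical column names chosen for this sketch.
FIELDS = ["tier", "annotation_id", "start_ms", "end_ms", "value"]


def eaf_to_rows(eaf_text):
    """Return one dict per time-aligned annotation in the .eaf text."""
    root = ET.fromstring(eaf_text)
    # Map each time slot ID to its time value in milliseconds (kept as text).
    slots = {s.get("TIME_SLOT_ID"): s.get("TIME_VALUE") for s in root.iter("TIME_SLOT")}
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose slots carry no time value
            rows.append({
                "tier": tier.get("TIER_ID"),
                "annotation_id": ann.get("ANNOTATION_ID"),
                "start_ms": start,
                "end_ms": end,
                "value": ann.findtext("ANNOTATION_VALUE") or "",
            })
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_tsv(eaf_to_rows(SAMPLE_EAF)))
```

Keeping the annotation ID in every row is what makes a later comparison or round trip traceable, even if someone sorts the table.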
Workflow B: Export for sharing a readable transcript
If the goal is readability (not round-trip import), export plain text with timestamps included per segment.
Use a consistent timestamp format (for example, hh:mm:ss.mmm) and keep one annotation per line to reduce ambiguity.
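As an illustration of a consistent timestamp format, here is a small sketch that converts millisecond values into hh:mm:ss.mmm and builds one transcript line per annotation. The function names and the line layout are assumptions for this example, not an ELAN export setting.

```python
def format_timestamp(ms):
    """Convert a non-negative millisecond count to hh:mm:ss.mmm."""
    if ms < 0:
        raise ValueError("timestamps must not be negative")
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def transcript_line(tier, start_ms, end_ms, text):
    """Build one readable line: [start - end] tier: text."""
    return f"[{format_timestamp(start_ms)} - {format_timestamp(end_ms)}] {tier}: {text}"


print(format_timestamp(3_723_045))  # 01:02:03.045
print(transcript_line("Speaker_A", 0, 1500, "hello there"))
```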
Workflow C: Export for archiving
For long-term archiving, prioritize formats that preserve structure and context: the .eaf plus media, plus a human-readable transcript export.
If you must include CSV, include a data dictionary describing each column, delimiter, encoding, and the source tiers.
How to preserve time alignment (what matters most)
Timecodes “break” when an exported row no longer maps to the same start/end times on re-import, or when the tier you import into uses different alignment rules.
Prevent that by treating start/end times as required data, not optional metadata.
Rules that keep segments aligned
- Always export start and end times for each annotation you might re-import.
- Do not reorder rows unless you keep a stable ID column that lets you restore the original order or match rows back to their annotations.
- Don’t change time formats (for example, from milliseconds to seconds) unless you convert consistently and document it.
- Keep one annotation per row; avoid wrapping text across multiple lines inside a cell.
- Don’t “prettify” timestamps in Excel; spreadsheet auto-formatting can silently alter values.
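One way to enforce these rules is to compare the edited table against the untouched export before re-importing. The sketch below assumes both files are TSV with hypothetical columns `annotation_id`, `start_ms`, and `end_ms`; it compares time values as text, so a reformatted value such as `1500.0` is reported as a change. Duplicate IDs are not detected in this simple version.

```python
import csv
import io


def load_by_id(tsv_text):
    """Index TSV rows by their annotation_id column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["annotation_id"]: row for row in reader}


def find_alignment_problems(original_tsv, edited_tsv):
    """List rows that vanished, appeared, or changed their time boundaries."""
    original = load_by_id(original_tsv)
    edited = load_by_id(edited_tsv)
    problems = []
    for ann_id in sorted(original.keys() - edited.keys()):
        problems.append(f"{ann_id}: missing from edited file")
    for ann_id in sorted(edited.keys() - original.keys()):
        problems.append(f"{ann_id}: not in original export")
    for ann_id in sorted(original.keys() & edited.keys()):
        for column in ("start_ms", "end_ms"):
            before = original[ann_id][column]
            after = edited[ann_id][column]
            if before != after:
                problems.append(f"{ann_id}: {column} changed from {before} to {after}")
    return problems


ORIGINAL = "annotation_id\tstart_ms\tend_ms\tvalue\na1\t0\t1500\thello\na2\t1500\t3200\tworld\na3\t3200\t4000\tbye\n"
# Rows reordered, one end time altered, one row dropped, one row added.
EDITED = "annotation_id\tstart_ms\tend_ms\tvalue\na2\t1500\t3300\tworld!\na1\t0\t1500\thello\na4\t5000\t6000\tnew\n"

for problem in find_alignment_problems(ORIGINAL, EDITED):
    print(problem)
```

Because rows are matched by ID rather than by position, reordering alone is not flagged; only missing rows, new rows, and changed boundaries are.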
Time boundaries vs. text edits
Editing text is usually safe if you do not change row counts, delimiters, or quoting.
Editing segmentation (splitting one row into two, or merging two into one) is risky because it changes time boundaries and can make re-import ambiguous.
How to avoid losing tier structure on import (and what to do instead)
ELAN tiers can have types, constraints, and parent/child relationships, and those relationships often do not survive a “flat” export.
Your goal is to keep the tier model stable in ELAN and import only values (or new analysis results) into the right place.
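To see what a flat export would drop, it can help to list the tier model straight from the .eaf file. This is a rough sketch using Python's standard XML parser; the sample document and the `tier_summary` helper are made up for illustration, while the element and attribute names (`TIER`, `PARENT_REF`, `LINGUISTIC_TYPE`, `CONSTRAINTS`) follow the EAF format.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written fragment: one independent tier and one dependent tier.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Speaker_A" LINGUISTIC_TYPE_REF="utterance"/>
  <TIER TIER_ID="Speaker_A_translation" LINGUISTIC_TYPE_REF="translation" PARENT_REF="Speaker_A"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="utterance" TIME_ALIGNABLE="true"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="translation" TIME_ALIGNABLE="false" CONSTRAINTS="Symbolic_Association"/>
</ANNOTATION_DOCUMENT>"""


def tier_summary(eaf_text):
    """Return each tier with its linguistic type, parent, and constraint."""
    root = ET.fromstring(eaf_text)
    # Look up the constraint (if any) declared on each linguistic type.
    constraints = {
        lt.get("LINGUISTIC_TYPE_ID"): lt.get("CONSTRAINTS")
        for lt in root.iter("LINGUISTIC_TYPE")
    }
    summary = []
    for tier in root.iter("TIER"):
        type_id = tier.get("LINGUISTIC_TYPE_REF")
        summary.append({
            "tier": tier.get("TIER_ID"),
            "type": type_id,
            "parent": tier.get("PARENT_REF"),  # None for top-level tiers
            "constraint": constraints.get(type_id),
        })
    return summary


for entry in tier_summary(SAMPLE_EAF):
    print(entry)
```

Saving this listing next to a CSV export gives you a record of the hierarchy that the table itself cannot carry.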
Best practices for tier-safe round trips
- Create tiers first in ELAN, then import values into those existing tiers.
- Keep tier names identical between the CSV header/column mapping and the ELAN tier names.
- Prefer importing into new tiers (for example, “POS_reviewed” instead of overwriting “POS”).
- Keep consistent tokenization rules if a tier depends on token boundaries (common in word-level tiers).
When CSV/text is the wrong tool
If you need to preserve a complex hierarchy (like speaker tiers, translation tiers, morphology tiers, and dependent tiers), use .eaf as the interchange format between collaborators.
Use CSV exports for analysis snapshots, not as the main file you pass around and re-import repeatedly.
Export checklists (analysis vs. archiving)
Use these checklists to reduce the chances of losing timecodes, breaking tier structure, or creating files you can’t re-import later.
Checklist: Exporting from ELAN for analysis (CSV/TSV)
- Export includes: tier name, start time, end time, annotation text.
- Delimiter choice is explicit: TSV often works better than CSV if your text contains commas.
- Encoding is set to UTF-8 (especially for non-English scripts and IPA).
- Multi-line annotations are handled (either removed, replaced, or quoted consistently).
- File includes an ID column or other stable reference if you plan a round trip.
- You keep a copy of the original export unchanged for comparison.
Checklist: Exporting for archiving
- Save the .eaf master file.
- Archive linked media files in a stable folder structure (and keep filenames unchanged).
- Export a human-readable transcript (with timestamps).
- Include a simple README with:
  - Project name and date
  - ELAN version used (if known)
  - Media filenames and formats
  - Tier list and what each tier means
  - CSV delimiter and encoding
Common errors (and fixes) when importing/exporting ELAN transcripts
Most problems look like “timecodes shifted,” “tiers disappeared,” or “characters turned into boxes.”
Use the symptom-first fixes below to debug quickly.
Error: Misaligned segments after CSV edit
- What it looks like: Text no longer matches the right audio span, or multiple rows map to the wrong time interval.
- Common causes:
  - Rows were sorted or filtered and saved.
  - Some rows were deleted, merged, or duplicated.
  - Line breaks in a cell created extra rows.
  - Excel auto-converted timestamps (for example, into date/time formats).
- Fix:
  - Re-export a fresh CSV from the .eaf, then redo edits using a tool that preserves delimiters and quoting.
  - If you must use Excel, lock columns for start/end times and avoid sorting unless you can restore original order.
  - Prefer importing edits into a new tier so you can visually compare old vs. new.
Error: Tier structure is lost or tiers don’t map on import
- What it looks like: Imported text lands in the wrong tier, creates unexpected tiers, or fails to import.
- Common causes:
  - Tier names changed (even small differences like spaces or capitalization).
  - The target tier has different constraints (parent/child rules) than the source data expects.
  - The import file lacks a tier column or clear mapping.
- Fix:
  - Standardize tier names before export and keep a tier list in your README.
  - Create the destination tiers in ELAN first, then import values into those tiers.
  - When in doubt, import into a temporary tier and manually verify alignment.
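Small tier-name differences are easy to catch before import. The sketch below is a hypothetical helper, not an ELAN feature: it takes the tier names found in the .eaf and the tier names used in the table, and reports every table name that does not match exactly, along with the existing tier it probably meant (compared after trimming spaces and ignoring case).

```python
def check_tier_names(eaf_tiers, table_tiers):
    """Map each unmatched table tier name to a likely ELAN tier, or None."""
    exact = set(eaf_tiers)
    # Normalized lookup: trimmed and case-insensitive.
    normalized = {name.strip().casefold(): name for name in eaf_tiers}
    report = {}
    for name in sorted(set(table_tiers)):
        if name in exact:
            continue  # exact match, nothing to report
        report[name] = normalized.get(name.strip().casefold())
    return report


# "pos " differs from "POS" only by case and a trailing space; "Gloss" has no match.
print(check_tier_names(["Speaker_A", "POS"], ["Speaker_A", "pos ", "Gloss"]))
```

A name mapped to `None` has no counterpart in the .eaf at all, which usually means the destination tier still needs to be created.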
Error: Encoding issues (garbled characters, “???”)
- What it looks like: Accents, IPA, or non-Latin scripts display incorrectly after export or re-import.
- Common causes: File saved in the wrong encoding (often not UTF-8), or the program opening the file guessed incorrectly.
- Fix:
  - Export and save files in UTF-8.
  - Use a text editor that lets you choose encoding (and re-save as UTF-8 if needed).
  - Avoid copy/paste through apps that strip Unicode characters.
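If you prefer to script the re-save, the following sketch tries UTF-8 first and falls back to a single legacy encoding. The fallback of Windows-1252 is an assumption that fits many files saved by Western-locale spreadsheet tools; it is a guess, not a detection method, so check the decoded text by eye before trusting it.

```python
def decode_transcript(raw, fallback="cp1252"):
    """Decode bytes as UTF-8 (BOM tolerated), else with the fallback encoding."""
    try:
        # "utf-8-sig" also strips a leading byte-order mark if one is present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


def to_utf8_bytes(raw, fallback="cp1252"):
    """Return the same text re-encoded as UTF-8 without a BOM."""
    text, _ = decode_transcript(raw, fallback)
    return text.encode("utf-8")


legacy = "café [ʃ]".encode("utf-8")   # already UTF-8
windows = "café".encode("cp1252")     # legacy single-byte encoding
print(decode_transcript(legacy))
print(decode_transcript(windows))
```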
Error: Extra columns or broken rows due to commas, quotes, or new lines
- What it looks like: A single annotation spills into multiple columns or rows in CSV.
- Common causes: Unescaped commas/quotes, or annotation text contains line breaks.
- Fix:
  - Use TSV instead of CSV when possible.
  - Ensure quotes are consistently escaped if you must use CSV.
  - Replace line breaks in annotation text with a visible marker (like “\n”) before export, if your workflow supports it.
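A minimal sketch of that fix, assuming you are writing the table yourself rather than using ELAN's export dialog: line breaks are replaced with a literal `\n` marker, and Python's `csv` module handles quoting so that tabs or quotes inside the text cannot split a row. The column names are hypothetical, and the marker is not reversible if the text already contains a literal backslash followed by `n`.

```python
import csv
import io


def escape_line_breaks(text):
    """Replace real line breaks with a visible two-character \\n marker."""
    return text.replace("\r\n", "\\n").replace("\r", "\\n").replace("\n", "\\n")


def write_tsv(rows):
    """Write (tier, start_ms, end_ms, value) tuples as one TSV line each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "start_ms", "end_ms", "value"])
    for tier, start_ms, end_ms, value in rows:
        writer.writerow([tier, start_ms, end_ms, escape_line_breaks(value)])
    return buffer.getvalue()


ROWS = [
    ("Speaker_A", 0, 1500, "first line\nsecond line"),
    ("Speaker_A", 1500, 3200, 'said "hi", then left'),
]
tsv_text = write_tsv(ROWS)
print(tsv_text)

# Reading it back with the same dialect recovers the values intact.
parsed = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
```

Reading the file with the same delimiter and quoting rules is as important as writing it that way; a tool that splits on tabs without honoring quotes can still break rows.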
Error: Media links break after moving files
- What it looks like: ELAN opens the .eaf but cannot find the audio/video.
- Common causes: Media files were renamed or folder paths changed.
- Fix:
  - Keep media filenames stable.
  - Archive projects as a folder, not as scattered files.
  - Relink media in ELAN and re-save the .eaf.
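Before archiving or handing off a project folder, you can check which media links would still resolve. This sketch reads the `MEDIA_DESCRIPTOR` entries from the .eaf header and tests each `RELATIVE_MEDIA_URL` against the folder that holds the .eaf. The sample header, paths, and helper names are invented for illustration, and absolute `MEDIA_URL` values are only reported, not checked.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hand-written header fragment; file names and paths are placeholders.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <HEADER MEDIA_FILE="" TIME_UNITS="milliseconds">
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///C:/projects/demo/session01.wav"
                      MIME_TYPE="audio/x-wav" RELATIVE_MEDIA_URL="./session01.wav"/>
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///C:/projects/demo/session01.mp4"
                      MIME_TYPE="video/mp4" RELATIVE_MEDIA_URL="./session01.mp4"/>
  </HEADER>
</ANNOTATION_DOCUMENT>"""


def media_links(eaf_text):
    """Return the absolute and relative media URLs stored in the .eaf."""
    root = ET.fromstring(eaf_text)
    return [
        {"absolute": d.get("MEDIA_URL"), "relative": d.get("RELATIVE_MEDIA_URL")}
        for d in root.iter("MEDIA_DESCRIPTOR")
    ]


def missing_media(eaf_text, eaf_folder):
    """List media links that do not resolve relative to the .eaf folder."""
    missing = []
    for link in media_links(eaf_text):
        relative = link["relative"]
        if relative is None or not (Path(eaf_folder) / relative).is_file():
            missing.append(relative or link["absolute"])
    return missing


# Checks against the current folder; both files are reported unless they exist there.
print(missing_media(SAMPLE_EAF, "."))
```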
Common questions
- Should I use CSV or TSV when exporting ELAN annotations?
TSV is often safer because transcript text commonly contains commas that can break CSV parsing.
- Can I round-trip an ELAN transcript through Excel without issues?
You can, but it’s risky if Excel auto-formats timestamps, wraps lines, or changes quoting; protect time columns and avoid sorting or filtering before saving.
- What columns do I need to re-import without breaking timecodes?
At minimum: tier name (or a known target tier), start time, end time, and annotation value; an ID column helps if you need to match rows reliably.
- Why do my tiers import but appear empty?
Tier mapping may not match, or time boundaries may not align with the tier’s constraints; import into a temporary tier to confirm the data is present.
- How do I prevent multi-line annotations from breaking my CSV?
Keep one annotation per row and avoid embedded line breaks; if you must keep them, ensure proper CSV quoting or switch to TSV.
- Is plain text export a good idea for archiving?
Plain text can be a helpful “view copy,” but it usually cannot preserve tier structure; archive the .eaf and linked media as the authoritative package.
A practical “safe handoff” pattern for teams
If multiple people touch the transcript, use a two-track system: one track for ELAN structure, one track for analysis edits.
Keep the .eaf file as the master, and let collaborators work on exported tables that you import back into clearly labeled new tiers.
- Master: .eaf + media (never edited in Excel).
- Analysis copy: TSV with start/end times + tier name + value.
- Return path: Import results into new tiers (don’t overwrite originals).
If you also need shareable captions or subtitles from the same source, it can help to export transcript text for review and then convert it into the right timed format later using a dedicated workflow and quality checks.
For caption-focused deliverables, you may also want to review GoTranscript’s closed caption services or subtitling services to keep reading speed, line breaks, and timing rules consistent.
When you need a clean, reliable transcript you can safely bring into tools like ELAN—or export from ELAN for analysis and archiving—GoTranscript can help with professional transcription services that fit your workflow.