Glossing in linguistics usually means writing an original line, a morpheme-by-morpheme gloss, and a free translation in a consistent, readable format.
When you keep spacing, segmentation, and abbreviations consistent, your transcripts become easier to check, cite, and compare across speakers and languages.
This guide gives you a practical template, several worked examples, and a starter glossary of abbreviations you can share with a team.
Key takeaways
- Use a 3-line format: original text, morpheme-by-morpheme gloss, and free translation.
- Align morphemes and glosses by spacing, and keep segmentation consistent across the dataset.
- Choose one abbreviation set (and one style for hyphens, clitics, and reduplication) and document it.
- Add a team glossary and a quick “decision sheet” for tricky cases like zero marking and fused morphemes.
- Always separate gloss (meaning/function per morpheme) from translation (natural language rendering).
What “glossing” means (and what it is not)
In linguistic transcripts, “glossing” most often refers to interlinear glossed text (IGT), where each word is broken into meaningful parts and each part gets a short label.
The point is to make structure visible, not to produce the best-sounding translation.
Gloss vs. free translation
A morpheme-by-morpheme gloss is a mapping: each morpheme gets a consistent gloss (like PST for past or 1SG for first-person singular).
A free translation is what you would actually say in English (or another analysis language), even if word order changes.
What you should decide before you start
- Transcription system: IPA, practical orthography, or community spelling.
- Segmentation depth: full morpheme segmentation vs. only major affixes.
- Abbreviation standard: a shared set across the project.
- Formatting rules: line order, punctuation, how you mark clitics, etc.
The standard 3-line format (plus optional lines)
The most common format uses three lines per example, sometimes with a label and a source line.
Keep each example as a self-contained unit so readers can quote it without extra context.
Core lines
- Line 1 (Original): the language data as transcribed.
- Line 2 (Gloss): morpheme-by-morpheme gloss aligned to Line 1.
- Line 3 (Free translation): a natural translation in quotes.
Helpful optional lines
- Example label: (1), (2a), etc.
- Source/metadata: speaker ID, timecode, recording name, genre.
- Morphological notes: only when needed, and keep them separate from the IGT.
A copy-paste linguistics glossing template
Use this as a starting point for papers, theses, field notes, or a transcript appendix.
Replace the placeholders and keep the same structure for every example.
Template (plain text)
- (#) Example label
- Original: WORD(-MORPH)=CLITIC WORD(-MORPH)
- Gloss: GLOSS(-GLOSS)=CL GLOSS(-GLOSS)
- Translation: “Free translation here.”
- Optional: [Speaker=S01; Time=00:12:08–00:12:11; Source=Interview01]
Template (aligned IGT style)
(#)
Line 1 original text goes here
Line 2 gloss aligned to the original
Line 3 “free translation goes here”
Tip: If you cannot keep alignment perfectly in proportional fonts, use a monospaced font in drafts or move IGT to a table in your final document.
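If you keep drafts in plain text, a small script can do the padding for you. The sketch below is one possible approach, not a standard tool: the function name align_igt and the two-space column gap are assumptions made for illustration. It pads each word and its gloss to a shared width so the columns line up in a monospaced font.

```python
def align_igt(original: str, gloss: str, translation: str) -> str:
    """Return a 3-line IGT block with words and glosses padded into columns."""
    words = original.split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError(f"{len(words)} words but {len(glosses)} gloss tokens")
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    line1 = "  ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    line2 = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    # The free translation goes in curly quotes, as in the template.
    return "\n".join([line1, line2, "\u201c" + translation + "\u201d"])


print(align_igt("walk-ed home", "walk-PST home", "I walked home."))
```

Note that len() counts characters, not display width, so IPA with combining diacritics may still drift out of alignment; a table remains the safer choice for final documents.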
Formatting rules that prevent most glossing problems
Most glossing issues come from inconsistent segmentation and inconsistent abbreviation choices.
Pick rules early, write them down, and apply them even when examples look “messy.”
1) Keep morpheme boundaries consistent
- Use hyphens to separate morphemes within a word: walk-ed.
- Use equals for clitics if your project distinguishes them: go=too.
- Use a dot for fused meanings when one form expresses multiple categories: PST.1SG.
2) Align glosses by spacing (one gloss per token)
- Each word in Line 1 should line up with a gloss “word” in Line 2.
- Each morpheme separated by a hyphen should line up with a gloss separated by a hyphen.
- If you split a form as ka- in one example, do not write it as k- in another without a clear rule.
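The alignment rules in this list can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name alignment_problems is hypothetical, and it assumes that hyphens, equals signs, and tildes appear in your data only as boundary markers (a hyphen that belongs to the orthography would be reported as a false positive). Dots are ignored on purpose because they appear only in the gloss line.

```python
import re

# Boundary symbols that must match between Line 1 and Line 2.
BOUNDARY = re.compile(r"[-=~]")


def alignment_problems(original: str, gloss: str) -> list[str]:
    """Compare word counts and boundary markers between the two lines."""
    words, glosses = original.split(), gloss.split()
    if len(words) != len(glosses):
        return [f"word count differs: {len(words)} vs {len(glosses)}"]
    problems = []
    for i, (word, word_gloss) in enumerate(zip(words, glosses), start=1):
        word_marks = BOUNDARY.findall(word)
        gloss_marks = BOUNDARY.findall(word_gloss)
        if word_marks != gloss_marks:
            problems.append(
                f"word {i}: {word!r} has {word_marks}, {word_gloss!r} has {gloss_marks}"
            )
    return problems


print(alignment_problems("see=na dog", "see=FOC dog"))
print(alignment_problems("walk-ed home", "walk.PST home"))
```

An empty list means the two lines agree on word count and boundary symbols; it does not tell you whether the segmentation itself is right.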
3) Use small caps for grammatical abbreviations
Many style guides prefer small caps for grammatical labels (like PST, PL, NEG) and normal case for lexical glosses (like dog, run).
If small caps are not available, use consistent all-caps for grammatical abbreviations.
4) Gloss the same morpheme the same way
- Do not gloss the same suffix as PST in one place and PAST in another.
- Prefer one label per function, unless your team decides to represent fine-grained distinctions.
- When a morpheme has multiple functions, agree on a primary gloss or use dot notation (example: SBJV.IRR).
5) Mark uncertain or missing information clearly
- Use a consistent marker for uncertain transcription (example: a question mark after the segment).
- Use a consistent marker for untranscribed audio (example: [inaudible]).
- Use a consistent marker for unknown gloss (example: ???), and track it in a to-do list.
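Consistent markers pay off because they are searchable. As a rough sketch, the function below builds a to-do list from the three example markers named above. The name open_items and the tuple layout are assumptions, and the check treats any question mark in Line 1 as an uncertainty marker, which only works if your project does not also use question marks as punctuation in the original line.

```python
def open_items(examples: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """examples holds (label, original, gloss); returns (label, issue) pairs."""
    todo = []
    for label, original, gloss in examples:
        if "[inaudible]" in original:
            todo.append((label, "untranscribed audio"))
        if "?" in original:
            todo.append((label, "uncertain transcription"))
        if "???" in gloss:
            todo.append((label, "unknown gloss"))
    return todo


# Hypothetical data: the second example carries all three markers.
data = [
    ("(1)", "walk-ed home", "walk-PST home"),
    ("(2)", "go-nga? [inaudible]", "go-??? [inaudible]"),
]
for label, issue in open_items(data):
    print(label, issue)
```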
6) Decide how you handle zero marking
If a category is “present but not overtly marked” (like 3SG in some languages), teams handle it differently.
Pick one approach and document it, such as leaving it unmarked, or marking it with a symbol like Ø in a notes line.
Worked examples (with common patterns)
The examples below use an illustrative format to show how spacing, hyphens, equals signs, and abbreviations work.
For real research data, apply the same formatting rules but keep your language-specific analysis consistent with your project.
Example 1: Simple affixation (tense)
(1)
Line 1 walk-ed home
Line 2 walk-PST home
Line 3 “I walked home.”
- Line 2 stays close to Line 1 and labels the suffix function.
- Line 3 adds the subject “I” because English needs it, even if the original does not show it.
Example 2: Person/number + tense (dot notation)
(2)
Line 1 go-nga
Line 2 go-PST.1SG
Line 3 “I went.”
- Use dot notation when one morpheme bundles categories and you do not split the form.
- If you later analyze the morpheme as separable, update segmentation and gloss together.
Example 3: Clitics (equals sign)
(3)
Line 1 see=na dog
Line 2 see=FOC dog
Line 3 “It’s the dog that (someone) saw.”
- The equals sign visually separates clitics from affixes, which helps readers see structure.
- If your project does not distinguish clitics, you can use hyphens instead, but stay consistent.
Example 4: Negation + auxiliary (multiword glossing)
(4)
Line 1 I do not know
Line 2 1SG AUX NEG know
Line 3 “I don’t know.”
- Not every language needs segmentation inside every word; sometimes the functional tags carry the analysis.
- Use AUX when the helper verb carries tense/aspect or agreement.
Example 5: Reduplication (marking with ~)
(5)
Line 1 talk~talk
Line 2 talk~REDUP
Line 3 “(They) talked repeatedly.”
- Teams use different reduplication conventions; pick one and document it.
- If you copy the full base twice in Line 1, gloss the function clearly in Line 2.
Example 6: Classifiers / noun class (keep labels short)
(6)
Line 1 ka fruit
Line 2 CLF.ROUND fruit
Line 3 “a round fruit”
- For classifier systems, keep the top-level tag consistent (like CLF) and specify type with a dot.
- If you have many types, put the full list in your project abbreviation glossary.
Create a team glossary of glossing abbreviations (starter list)
A shared abbreviation list prevents silent drift, where each person glosses the same category differently.
Start with a short list, then expand only when your data requires it.
Recommended starter glossary
- 1, 2, 3 = first, second, third person
- SG, PL = singular, plural
- PST, PRS, FUT = past, present, future
- IPFV, PFV = imperfective, perfective
- PROG = progressive
- NEG = negation
- Q = question particle
- TOP = topic marker
- FOC = focus marker
- DEF, INDF = definite, indefinite
- DAT, ACC, GEN, LOC = dative, accusative, genitive, locative
- ERG, ABS = ergative, absolutive
- SBJ, OBJ = subject, object (use only if your analysis needs them)
- CAUS = causative
- PASS = passive
- REFL = reflexive
- RECIP = reciprocal
- COP = copula
- AUX = auxiliary
- CLF = classifier
Rules for abbreviations (so everyone stays consistent)
- Use small caps (or consistent all caps where small caps are unavailable) for grammatical abbreviations and lowercase for lexical meanings.
- Do not mix synonyms (choose PST or PAST, not both).
- Do not overload one abbreviation with two meanings (if CL can mean “class” or “clitic,” pick one).
- Keep abbreviations short, but not cryptic; your future self is part of the audience.
When you need a broader standard reference, many linguists align their abbreviations with the Leipzig Glossing Rules.
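A glossary is easier to enforce when a script can read it. The sketch below is illustrative only: it loads the starter list from this guide into a set and reports any upper-case label in a gloss line that is not on the list. The names STARTER_LABELS and unknown_labels are assumptions, and the code assumes lexical glosses are always lowercase and that person digits come directly before a label (as in 1SG).

```python
import re

# The starter glossary from this guide, as a set of allowed labels.
STARTER_LABELS = set(
    "1 2 3 SG PL PST PRS FUT IPFV PFV PROG NEG Q TOP FOC DEF INDF "
    "DAT ACC GEN LOC ERG ABS SBJ OBJ CAUS PASS REFL RECIP COP AUX CLF".split()
)
SEPARATORS = re.compile(r"[\s\-=~.]+")
PERSON_PLUS_LABEL = re.compile(r"([123]?)([A-Z]*)")


def unknown_labels(gloss_line: str, glossary: set[str] = STARTER_LABELS) -> list[str]:
    """Return grammatical labels in gloss_line that are missing from glossary."""
    unknown = set()
    for piece in SEPARATORS.split(gloss_line):
        # Skip empty pieces, placeholders like ???, and lowercase lexical glosses.
        if not any(ch.isalnum() for ch in piece) or piece != piece.upper():
            continue
        # Split a leading person digit from the rest, so 1SG becomes 1 + SG.
        match = PERSON_PLUS_LABEL.fullmatch(piece)
        parts = [p for p in match.groups() if p] if match else [piece]
        unknown.update(p for p in parts if p not in glossary)
    return sorted(unknown)


print(unknown_labels("go-PST.1SG"))
print(unknown_labels("walk-PAST home"))
print(unknown_labels("CLF.ROUND fruit"))
```

A flagged label is a prompt for a decision, not necessarily a mistake: either correct the gloss or add the label (for example, a classifier type) to the shared glossary.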
Workflow: from raw audio to a clean glossed transcript
A stable workflow reduces rework, especially when multiple people edit the same data.
This sequence works for fieldwork, lab recordings, and classroom corpora.
Step-by-step process
- Step 1: Transcribe the original in your chosen orthography, with speaker labels and timecodes if needed.
- Step 2: Segment morphemes using your project rules (hyphens, equals, dots).
- Step 3: Add the gloss line using your abbreviation glossary.
- Step 4: Write the free translation for readability and meaning, not word order.
- Step 5: Run a consistency check (same morpheme, same gloss; same abbreviation, same meaning).
- Step 6: Add metadata (speaker, date, location, elicitation context) in a separate line or file.
Simple consistency checks you can do fast
- Search for near-duplicates: PST vs PAST, PL vs PLU, IPFV vs IMPF.
- Check one morpheme at a time: list all tokens of a suffix and confirm the gloss matches.
- Check alignment: each hyphen in Line 1 should have a matching hyphen in Line 2.
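The "one morpheme at a time" check can be approximated with a short script. This is a sketch under assumptions, not a finished tool: the function name gloss_variants is hypothetical, the second data row is an invented inconsistency for demonstration, and morphemes are matched by spelling alone, so genuinely homophonous morphemes will show up as well.

```python
import re
from collections import defaultdict

SPLIT = re.compile(r"[-=~]")


def gloss_variants(examples: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each morpheme that has more than one gloss to its sorted glosses."""
    seen = defaultdict(set)
    for original, gloss in examples:
        for word, word_gloss in zip(original.split(), gloss.split()):
            morphs = SPLIT.split(word)
            labels = SPLIT.split(word_gloss)
            if len(morphs) != len(labels):
                continue  # boundaries do not match, so pairing would be unreliable
            for morph, label in zip(morphs, labels):
                seen[morph].add(label)
    return {m: sorted(g) for m, g in seen.items() if len(g) > 1}


data = [
    ("walk-ed home", "walk-PST home"),
    ("walk-ed", "walk-PAST"),  # invented inconsistent gloss for the same suffix
]
print(gloss_variants(data))
```

Treat the result as a review list: each entry is a morpheme whose glosses someone should compare by hand.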
Common pitfalls (and how to fix them)
Glossing often breaks down in the same places: unclear segmentation, unclear category labels, and “translation creep.”
Fixes usually involve choosing one rule and applying it everywhere.
Pitfall 1: The gloss line turns into a translation
- What it looks like: Line 2 uses full English phrases instead of morpheme labels.
- Fix: Keep Line 2 minimal (functions and lexical meanings), and move phrasing to Line 3.
Pitfall 2: Inconsistent segmentation across similar forms
- What it looks like: Sometimes you write =na as a clitic, sometimes -na as a suffix.
- Fix: Decide your analysis, then update old examples so the dataset matches.
Pitfall 3: Abbreviations multiply without control
- What it looks like: One person uses SUBJ, another uses SBJ, a third uses S.
- Fix: Maintain a single shared glossary file and require edits to go through it.
Pitfall 4: Punctuation and capitalization vary by editor
- What it looks like: Some examples use quotes on translations, others do not.
- Fix: Add a style sheet: quotes, italics, example numbering, and how to mark pauses or laughter.
Pitfall 5: Sensitive data handling gets overlooked
Transcripts can include personal names, locations, or other identifying details depending on your project.
If you work with human subjects, follow your institution’s ethics requirements and data protection rules for storage and sharing.
Common questions
- Do I have to follow the Leipzig Glossing Rules?
No, but they offer a widely understood baseline, and they help teams stay consistent.
- Should I gloss every morpheme?
It depends on your research question and time, but choose a level of detail and keep it consistent across examples you compare.
- What if I’m not sure about a morpheme boundary?
Mark it as uncertain, keep a running list of unresolved items, and revisit after you analyze more data.
- How do I gloss loanwords or names?
Many projects leave proper names unglossed or label them as PN, and they gloss loanwords with a lexical meaning if it helps the analysis.
- How should I handle idioms in the free translation?
Use a natural translation on Line 3, and if needed add a short note explaining the literal meaning separately.
- Can I use tables for interlinear glossing?
Yes, tables can keep alignment stable, especially in word processors that distort spacing.
- What’s the difference between subtitles/captions and linguistic glosses?
Subtitles and captions aim for readability and timing, while glosses aim to show grammatical structure; you can produce both from the same transcript when needed.
If your project also needs viewer-ready text, you may want closed caption services or subtitling services alongside your analysis transcripts.
When to use human help vs automated tools
Automated tools can speed up first-pass transcription, but glossing still needs linguistic decisions about morphemes and categories.
If you do use automation, treat it as a draft and keep a clear review step before you cite examples in writing.
- Use automation for: rough text, timestamps, searchable drafts.
- Use human review for: speaker turns, technical terms, named entities, and any segment you will gloss and publish.
For teams that want a faster starting point, automated transcription can help you get text on the page before you add segmentation, glosses, and translations.
Practical CTA: getting clean transcripts you can gloss
If you already have recordings and you want a reliable transcript to start your segmentation and glossing, GoTranscript can help with professional transcription services so your team can spend more time on analysis, abbreviations, and translations.