Why We Need Subtitles: The Evolution of Movie and TV Sound Technology (Full Transcript)

Explore why modern viewers rely on subtitles due to changes in sound technology, naturalistic performances, and the challenges of downmixing for various devices.

Speaker 1: I watch a lot of movies and TV. On the train, at home, the movies, out, in the bath. But no matter where I'm watching, I find myself constantly doing this one thing.

Speaker 2: I think there will always be more.

Speaker 1: What?

Speaker 2: I think there will always be more. What? I think there will always be more. I think there will always be more.

Speaker 1: Oh. It turns out this isn't unusual. We polled our YouTube audience and about 57% of people said that they feel like they can't understand the dialogue in the things that they watch unless they're using subtitles. But it feels like this hasn't always been the case. So to figure out what was going on, I made a call.

Speaker 3: Hi, my name is Austin Olivia Kendrick. I'm a professional dialogue editor for film and TV. I basically perform audio surgery on actors' words.

Speaker 1: Do you watch with subtitles?

Speaker 3: I do, actually. I do, a lot of the time.

Speaker 1: So why do you think that we all feel like we need subtitles now?

Speaker 3: I get asked this question all the time. All the time. It's something that is, it doesn't have a simple, straightforward answer. It's very layered and very complex.

Speaker 1: And after talking to Austin for almost two hours, it's true. It's a very layered and complex topic. But everything kept pointing back to one main thing. Technology. That got us from this.

Speaker 4: I'll get you, my pretty. You should be kissed enough. No, Richard, no. What has happened to your love?

Speaker 1: To this. Mom, I just woke up. That was a slim waisted girl.

Speaker 4: Well, we better put her out of her home.

Speaker 1: Let's start with microphones. I'm going to use this clip from Singin' in the Rain to show how mics used to work.

Speaker 4: Here's the mic. You talk towards it. The sound goes through the cable to the box. A man records it on a big record in wax.

Speaker 1: This scene illustrates some of the difficulties and intricacies of early sound recording. Mics were big, bulky, temperamental, and required creative solutions to be hidden. They were wired and recorded onto hard memory, like wax and eventually tape. No matter how many actors were in a scene, all sound got recorded to one track. So performers had to be diligently focused and facing a certain angle so that their words could be picked up. Otherwise, you couldn't hear a thing. But technology's improved to the point where microphones don't impede performance as much anymore. They've become better, smaller, wireless, and we use more of them to ensure that performances get captured.

Speaker 3: What we typically are working with from production dialogue is two boom microphones, and then every actor has at least one lavalier microphone hidden somewhere on them.

Speaker 1: These shrinking mics have given actors the flexibility to be more naturalistic in their performances. They no longer need to project so that their words reach the mic. They can speak softly, knowing that the tiny mic hidden on their body will pick up what they're saying. And my personal favorite example of this performance shift is Alec Baldwin on 30 Rock. In a 2011 speech slash roast, Tina Fey says that he speaks so quietly that she can't hear him when she's standing next to him. And then you play the film back and it's there somehow. Just listen to this whisper off between him and Will Arnett. I'm not afraid of you.

Speaker 2: Yeah. Well you should be. Let's just see how it all shakes out in the meeting.

Speaker 1: Naturalism isn't always the best for intelligibility though. Take Tom Hardy. An actor that I personally love, but who famously is a mumbler. Her ass is still be floating around in the sewer right now if it wasn't for me. I mean, the mic picked that line up fine. Like, we can definitely hear that he's talking, he's saying something. But once that mumble gets recorded, it's on a dialogue editor's shoulders to make it as intelligible as possible. And that was a lot harder when everything was analog. While you could pick the best takes and physically splice them together, if some piece of dialogue was truly impossible to understand, then actors would come in and re-record those specific lines in a process called ADR, or Automated Dialogue Replacement. Which you can see Meryl Streep do in this scene from Postcards from the Edge.

Speaker 3: There's enough money in the world for their cause like yours.

Speaker 1: That still gets done today, but…

Speaker 3: ADR also costs money because you're not only paying for the actor's time, you're paying for the engineer's time, and then the editor's time. So we try to do ADR, frankly, as little as possible.

Speaker 1: And so a lot of her job is making words sound better.

Speaker 3: The show I'm currently working on, I remember in the middle of this one word there was just this loud metal clang that I couldn't remove. So I had to go in and I had to find an alternate take of it that fit, and then I had to fit it to the movement of her mouth in that moment, and then push it in.

Speaker 1: And once she's done with it, it's sent off to a mixer, who works to make sure the frequencies of the sound effects and music don't overlap with the frequencies of the human voice. Something that's only possible now that the world has moved away from tape and into digital recordings.

Speaker 3: That is a big challenge, carving out those frequencies, that space amongst every other element of the mix for the dialogue to be able to punch through, and not be all muddied up by any other sounds that exist in that band of frequencies.

Speaker 1: But even with all that work, lines of dialogue can still be hard to understand.

Speaker 3: The kind of feeling has been, if you want your movie to feel quote unquote cinematic, you have to have wall-to-wall bombastic loud sound. A lot of people will ask, why don't you just turn the dialogue up? Like just turn it up. And if only it was that simple. Because a big thing that we want to preserve is a concept called dynamic range. The range between your quietest sound and your loudest sound. If you have your dialogue that's going to be at the same volume as an explosion that immediately follows it, the explosion is not going to feel as big. You need that contrast in volume in order to give your ear a sense of scale.

Speaker 1: But the thing is, you can only make something so loud before it gets distorted.

Speaker 3: So if you want to create that wide dynamic range, you have no choice but to push those quieter sounds lower, instead of pushing the louder sounds louder.
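The trade-off described here can be put into rough numbers. The following is a minimal sketch, assuming a digital signal whose loudest undistorted amplitude is 1.0 (full scale) and measuring level in decibels with the usual 20·log10 amplitude ratio. The function names and the example levels are illustrative assumptions, not figures from the interview.

```python
import math

# Assumption: 1.0 is the loudest amplitude that can be stored without distortion.
FULL_SCALE = 1.0

def dynamic_range_db(quietest, loudest):
    """Range between the quietest and loudest amplitude, in decibels."""
    return 20 * math.log10(loudest / quietest)

def quiet_level_for_range(loudest, range_db):
    """Amplitude the quiet sound must sit at to reach a target range."""
    return loudest * 10 ** (-range_db / 20)

# Hypothetical mix: the explosion already sits at the ceiling.
explosion = FULL_SCALE
dialogue = 0.1
current_range = dynamic_range_db(dialogue, explosion)

# The explosion cannot go above full scale, so a wider range
# can only come from lowering the dialogue.
wider_dialogue = quiet_level_for_range(explosion, 30)
```

With the loud sound pinned at the ceiling, asking for 30 dB of range instead of 20 dB leaves only one knob to turn: the dialogue level, which has to drop to roughly a third of where it was.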

Speaker 1: So explosions go up, and dialogue comes down. Which brings us to the Christopher Nolan of it all. A separate structure within the other statue. Hard to say that to you, right? We're kicking out of orbit. Nearly every film of his has been criticized for its hard-to-hear dialogue that essentially begs for subtitles. But as this headline explains, he likes it that way. According to an interview in a book called The Nolan Variations, he said that he gets a lot of complaints, even from other filmmakers, who would say, I just saw your film and the dialogue is inaudible. The truth was, it was kind of the whole enchilada of how he had chosen to mix it. And in a 2017 interview with IndieWire, he said, we made the decision a couple of films ago that we weren't going to mix films for substandard theaters. And this is kind of the crux of the matter. The content that we watch here, and here, and here, is not mixed for us primarily.

Speaker 3: Re-recording mixers mix for the widest surround sound format that is available. Typically, like big release films, that is Dolby Atmos, which has true 3D sound up to 128 channels.

Speaker 1: The thing is, if you're not at a movie theater that can showcase the best sound Hollywood has to offer, you can't experience all of those channels. So after the movie is mixed for the 128 Atmos tracks, somebody has to create a separate version of the film's audio where all those same sounds live on one, or two, or five tracks. This is called downmixing.

Speaker 3: Downmixing is the process of taking that biggest mix and folding it down into formats with lesser channels available to it. So say Atmos down to 7.1, or 7.1 down to 5.1, or 5.1 down to stereo, stereo down to mono.
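To make the fold-down concrete, here is a minimal sketch of one common way to take a single 5.1 sample frame down to stereo and then to mono. It assumes the widely used convention of mixing the center and surround channels into left and right at about -3 dB (a gain of 1/sqrt(2)) and dropping the LFE channel. The speakers do not name specific coefficients, so the gains, function names, and sample values below are assumptions for illustration only.

```python
import math

# Assumed fold-down gain of about -3 dB; real mixes may use other values.
MINUS_3_DB = 1 / math.sqrt(2)

def downmix_51_to_stereo(fl, fr, c, lfe, sl, sr):
    """Fold one 5.1 frame (front L/R, center, LFE, surround L/R) to stereo.

    The LFE channel is discarded here, which is a common simplification.
    """
    left = fl + MINUS_3_DB * c + MINUS_3_DB * sl
    right = fr + MINUS_3_DB * c + MINUS_3_DB * sr
    return left, right

def downmix_stereo_to_mono(left, right):
    """Average the two stereo channels into one mono sample."""
    return 0.5 * (left + right)

# Hypothetical frame: dialogue alone in the center, music in the fronts,
# effects in the surrounds.
left, right = downmix_51_to_stereo(fl=0.4, fr=0.4, c=0.5, lfe=0.2, sl=0.3, sr=0.3)
mono = downmix_stereo_to_mono(left, right)
```

In the 5.1 frame the dialogue has the center channel to itself. After the fold-down it shares each output channel with the music and effects, which is one reason a mix that was clear in a theater can sound muddier on two small speakers.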

Speaker 1: Unlike old TVs that were gigantic and had a ton of space for speakers, TVs today are super thin, like this one that I have in my living room is about the same thickness as my iPhone. So even though it's outputting the same mono or stereo sound that an older TV might, it's still going to sound worse because you have to have tiny little speakers to fit into this tiny, sleek form factor. These tiny speakers are also usually on the back of the TV. So the downmixed version of this movie that went from 128 channels down to just two is going to sound even muddier when it's pointing away from you. And when you're watching on your phone or a laptop, it's generally not much better. When you combine not-great speakers, naturalistic mumbly performances, dynamic range featuring bombastic sounds over dialogue, and a flattened mix, it's no wonder we have trouble hearing what's going on. And it seems like the industry knows this because TVs today are shipping with all kinds of settings built in, like this intelligence mode. You can put on active voice amplification in hopes of making that dialogue track come through just a little bit clearer. But of course, that's more band-aid than it is solution. The way movies get mixed likely isn't going to revert back to super pristine dialogue. So the solutions we have are 1. Buy better speakers and only go to theaters that have impeccable sound. 2. Take a chill pill and try to just worry a little bit less about picking up every single word that gets said. Or 3. Just keep the subtitles on. For people who are deaf or hard of hearing, subtitles make movies and TV shows accessible. And this accessibility has just expanded in recent years. Laws have been passed to ensure that movie theaters have at least a few screenings a week with captions. Pretty much every streaming service has standardized them, and speech recognition technology has made them accessible in pretty much every YouTube video and TikTok. 
Plus they're super easy to toggle on and off.
