Innovative Captioning Solutions: SyncWords and NCI's Journey with Amazon Translate
Discover how SyncWords and NCI revolutionized live captioning using Amazon Translate, enhancing accessibility and expanding global viewership.
Real Time Translation for Live Captioning Workflows

Speaker 1: Hello, everyone. Let me just say how excited I am to present in front of you, and to be here right after lunch, when everyone's already full and ready to go. Today is very special because I'm co-presenting with my customer, Lidia Pinzon from the National Captioning Institute. In just a second, I'm going to give her the microphone so she can introduce herself properly. Let me grab the clicker over here. First, let me tell you a little bit about SyncWords. We're a software development company, and we focus on the captioning and subtitling industry, the most exciting topic of the day. We basically leverage the cloud. We're a cloud-only company, so all of our engineering and development is aimed at scalability and automation. Our customers are in the media and entertainment field, especially the service companies. And one of the oldest and most recognizable service companies, the National Captioning Institute, is our customer. They come to us to help them scale, to increase their productivity and their workforce, and to come up with new solutions as requirements become more global and international. Today we're going to focus on how we leveraged the Amazon Translate service to help them give their customers the option to localize all of their live news and sports. So without further ado, let me give the microphone to Lidia so she can introduce herself properly.

Speaker 2: Well, good afternoon. Buenas tardes. I hope everyone had a great lunch and lots of coffee, because we have a lot of great information to share with you. I'm from Colombia, so I had to mention the coffee, right? Can I have the clicker? Thank you. I wanted to start by mentioning that next month the National Captioning Institute is celebrating the 40th anniversary of the first programs closed captioned here in the U.S. The Wonderful World of Disney and Masterpiece Theatre were the very first programs with closed captions, and that opened up a huge world for the deaf and hard of hearing community: access to media, to television, around the country. If you can imagine, 40 years ago we started by captioning only six hours per week. Currently, we caption 200,000 hours per year of media content, and that involves television networks, government, education, everything. Going from six hours per week to 200,000 hours per year is just amazing, and it shows how much content has grown over the years. For the past 40 years, NCI has used innovation and dedication to bring quality to viewers, as well as building the path and the technology for closed caption providers. In 1979, we developed the decoder box, the closed caption decoder; some of you are familiar with that. A decade later, we partnered with ITT Corporation to invent the closed caption decoding microchip that went into television sets. That's how closed captioning became a standard form of access. If we fast forward to 2010, 2011, we partnered with our technology partner here, SyncWords, and developed and launched the recap solution, which provides live captions anywhere, anytime. We've been talking for the last few years about content delivered anywhere, any day, anytime, on mobile, and now that programming comes with closed captions. More recently, there was an opportunity to develop AI-powered machine translation for live captioning, again in order to localize content and bring live news, sporting events, and other programming to new audiences.

So what was our challenge when we approached SyncWords? I'm just going back for a moment: I don't know if everyone was here at the beginning when, I think it was Ricardo from AWS, showed a very impactful slide of a family getting together and just watching TV. This is all about accessibility for the deaf and hard of hearing community, but now we're going farther and trying to reach other audiences. A client of ours, still one of our most important clients, delivers 24/7 news. They came to us and said, we want our content delivered with Spanish captions for the whole Latin American continent, and we want it in 45 days. So we were like, okay, 30, 45 days. We started scrambling around looking for human translators. We had a pool of human translators, but we were supposed to caption 24/7, and we didn't have that workforce of translators. We had the captioners, because Spanish captioning is one of the strongest services we currently provide, but finding translators to cover 24/7 was a huge ordeal. In addition to that, they said, in three months we want Portuguese, and in another three months we might include Japanese and French and German. Where were we going to find all those human translators? And, you know, if you go to these events, they have professional dubbing, which is also a form of access, but it's very expensive.
We have another client that I can mention here, called ON24. They mainly serve clients like Pepsi and Amazon and Microsoft. I'm not sure if I was allowed to mention Amazon; I'm sorry. They host events here in the U.S. and in Europe, and they were like, okay, we need all these multiple languages. And I was like, I'm sorry, I have the Spanish, but I don't have the French. Can we still do it? They're like, no, we need all of these languages. So that was a huge challenge for us, just managing the schedule and the workforce. In addition to that, imagine the latency once you add a translator on top of the captioner. In live, realtime captioning, whether you depend on it or just watch it at the gym or at the bar, you can see there's a lag, because the captioner has to hear it first and then either voice it or, as a stenographer, type it. Adding a translator to the equation just adds to that latency, right? So those were the challenges.

So we approached SyncWords, our technology partner; we've been working together for more than ten years. We said, okay, we have this great opportunity. Clients are coming to us, and they want to reach a broader audience, other markets here in the U.S. We know that here in the U.S. it's not only Latinos; it's Indians, Europeans, everything. We're a real mix here. And we wanted something that would work around our environment, because, again, we only had 30 to 45 days to come to a solution. It had to work with the environment we were already working in, to, you know, interface, I'm not a techie person, but I think interface is the word, with our workflow and our environment. And it had to be something that worked for TV, and now it works for streaming too, and that's where AWS comes into play and why we reached out to our already wonderful partner. I'm going to hand it over now so you can explain what we did and what is now used by some of our clients.

Speaker 1: Thank you, Liddy. So right away when we were approached with this, we needed something robust. We knew it had to be mission critical for live events, something scalable, with the proper API and tools. Needless to say, Amazon Translate fit the bill. The way most people look at this service, they look at it more for offline work, not for live: you have a document in an archive or a subtitle file, you send it there, and you get the translated text back, right? That's the typical workflow. But how could we adapt it for something in realtime, something for live news and sports, something that could handle 200,000 hours a year and all kinds of scheduling around the clock? So we designed this solution. We knew we were getting the ingest from NCI. They do the realtime captioning; that's what they do, they're excellent at it, and they have the human resources. We're able to configure what they need, and we came up with a mechanism to sync the languages and translate the broadcast protocol into something Amazon Translate could read. We get the protocol from their captioners, a broadcast protocol; for the technical people, it's the 608 protocol. We convert that in realtime to text, send it to Amazon Translate, and get the results back. Originally, we connected the results to their existing recap system. That was the first POC, and we got it up and running very quickly. Once that was working, it was a matter of translating the results back into something the broadcast systems could read, and then we were able to interface with downstream broadcast systems to do the live OTT and live broadcast captions.

So what were the outcomes and benefits? What's in it for their customers now, all the broadcast customers doing live news and sports in every shape and form? Well, captioning is required by law in the United States, Canada, and Mexico, so they're already spending the money on live captioning. But because it's the law, it's almost like throwing money out the window; they're not getting a return on that investment. By leveraging that investment with machine translation, they can now open a new channel to a completely different international audience. That could multiply their viewership, in some cases quadruple it, or increase ad revenue if that's the business model; now you can approach new sponsors for that new demographic. And of course, it increases accessibility. Potentially you could do 40-plus languages simultaneously just by leveraging what they're already paying for. Here's a screenshot of a live stream that's live right now. It's an American broadcast, and they're live streaming it to several different countries using this system. Now they have viewers where they never had viewers before: in Vietnam, in Indonesia, in France, in Thailand. These folks are able to enjoy the programming on this live feed thanks to the machine translation.

So here are the challenges we solved. We had to integrate with what was already there; we couldn't change the workflow NCI had. They have a very regimented, production-ready workflow that's secure, with a whole team of court reporters and captioners, and they can't change the way they work. We had to integrate with that, convert the format to subtitling, and interface with Amazon Translate.
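To make the pipeline described above concrete, here is a minimal sketch of the realtime translation loop, assuming the 608 caption feed has already been decoded to text upstream. Only the boto3 translate_text call is a real API; the function names, the phrase-chunking rule, and the region are illustrative assumptions, not SyncWords' actual implementation.

```python
# Minimal sketch of a realtime caption-translation loop (illustrative only).
# Assumes the CEA-608 broadcast feed has already been decoded to words
# upstream; this code handles only buffering and translation fan-out.
import boto3

translate = boto3.client("translate", region_name="us-east-1")

def translate_caption_chunk(text: str, target_lang: str) -> str:
    """Send one phrase-level chunk of caption text to Amazon Translate."""
    response = translate.translate_text(
        Text=text,
        SourceLanguageCode="en",
        TargetLanguageCode=target_lang,  # e.g. "es", "pt", "ja"
    )
    return response["TranslatedText"]

def run_pipeline(caption_feed, target_langs):
    """Buffer decoded caption words into phrases, then fan out per language."""
    buffer = []
    for word in caption_feed:  # realtime stream of decoded words
        buffer.append(word)
        # Translating phrase by phrase (not word by word) gives the engine
        # grammatical context, which is why accurate punctuation from the
        # upstream captioners matters so much.
        if word.endswith((".", "?", "!")):
            phrase = " ".join(buffer)
            buffer.clear()
            yield {lang: translate_caption_chunk(phrase, lang)
                   for lang in target_langs}
```

Sending complete phrases rather than individual words is the design choice the speakers return to later: Amazon Translate sees a grammatical sentence instead of fragments, so the upstream punctuation and grammar from NCI's captioners directly improve the translated output.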
We couldn't transcode the video. There's no way we could do that; first of all, it's not even our forte, and transcoding a live video is not something customers want. Don't touch their video; just add the captions. That's all they asked us to do. And of course, we had to be compatible with streaming platforms as well as the broadcast infrastructure that already existed. And we had to be able to scale at any point, without shutting down the live stream: we can add Chinese or Portuguese with the press of a button, without changing anything in the workflow.

But how does this compare to a human translator? Is it better? Well, we ran some benchmarks. Because machine translation is so quick, for live realtime video we were actually able to minimize the latency compared to a human. And on coverage, we had 100% coverage. NCI has expert captioners. They add sound cues when there's laughter or applause, and they do speaker identification; all of that is there in the original English. A human translator is going to miss all of that, but with machine translation we get 100% coverage. All of this meant we were able to scale and outperform any human translator using Amazon Translate.

This is a graphic I like, because it really is a testament to the accuracy and quality of the captions we're getting from NCI. Our focus is really on user experience. The output is no good if it isn't accurate, if it doesn't help people understand your content. This has to be ready for primetime with some of the highest-profile news agencies in the world. So the input language has to be accurate. If you get that accuracy from someone like NCI, then we know the punctuation is there, the grammar is there, the sound cues are there, and we can leverage that: we can send text to Amazon Translate in chunks and phrases rather than at the word level. We leverage their accuracy to reach that summit of accuracy and have something ready for primetime.

Very quickly, this is how it looks. We emulated a closed caption encoder in the cloud, so they don't have to change anything. Their captioners just connect the way they would connect to, say, a hardware caption encoder, and we take care of the rest. In the case of live streaming, we don't need the caption encoder at all, and that's a huge cost saving for their customers; we can just emulate it in the cloud. What about live OTT? Here we have the typical live OTT setup: a contribution encoder, a very popular box, going to the CDN. We just come over the top. We don't touch that upstream workflow at all. Instead of embedding 608 captions in the live transport stream, we create a sidecar subtitle stream with VTT files. That also means we're not limited to 608: we can do Chinese, Japanese, any Unicode characters that regular broadcast captioning doesn't support. Then we connect to the CDN. We also needed a player application, and through the AWS Partner Network we found THEOplayer. It just so happens that, off the shelf, they're able to receive as many languages as we could throw at them and display them in real time on the live stream player, and it worked out just fine.
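The sidecar approach described above can be illustrated with a short sketch: writing translated cues as a WebVTT segment, plus the HLS media playlist that references those segments. Segment duration, file naming, and playlist layout are assumptions for illustration; the actual SyncWords pipeline is not public.

```python
# Illustrative sketch: a sidecar WebVTT subtitle segment and the HLS media
# playlist that references it. Durations and naming are assumptions.
SEGMENT_SECONDS = 6  # assumed segment length, matching typical HLS defaults

def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def write_vtt_segment(path: str, cues: list) -> None:
    """Write cues, each a (start, end, text) tuple, as one WebVTT file."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)  # Unicode text: Chinese, Japanese, etc. all work
        lines.append("")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

def write_subtitle_playlist(path: str, segment_names: list, seq: int) -> None:
    """Write the constantly refreshing M3U8 that lists the VTT segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{SEGMENT_SECONDS}",
        f"#EXT-X-MEDIA-SEQUENCE:{seq}",
    ]
    for name in segment_names:
        lines.append(f"#EXTINF:{SEGMENT_SECONDS:.3f},")
        lines.append(name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Because the subtitles live in their own playlist rather than inside the video's transport stream, a new language is just one more VTT playlist added over the top, which is what lets languages be switched on without touching the upstream encoder or the video itself.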
Let's take a closer look at that, though, because the truth of the matter is, it's not as simple as just creating VTT files; there are a lot of issues with that. One of them is synchronization. Let's say we do 20 languages at once. How do we synchronize all the languages so the timing is right and we don't introduce latency? First, NCI gives us the caption feed. We generate the M3U8 and the VTT files and send them to the origin server, which is constantly refreshing. Then, to synchronize all the output languages, we analyze the MPEG time code from the transport stream at the CDN. That tells us exactly where the translated subtitles need to land, which eliminates any deviation and gives us conformity and consistency in how we deliver the languages. Once that's done, we can send it off to the player application. Folks are already using it; it just works, and it can be turned on at any time without stopping the stream. You could give me a live stream right now, and we could put this right on top of that workflow. It's a great system, because it means they don't have to reengineer everything. As you can tell, with new technologies, especially cloud technologies, engineers don't want to hear, well, now you have to change everything; that could add months to a POC. They need it onboarded right away, and as Liddy said, we had a very short amount of time to get this working. But it gives us the scalability to introduce and leverage Amazon Translate at any point, with any one of their customers doing broadcast or live streaming.

In my opinion, this is really the answer, because everyone airing live wants the same thing: more viewers. At some point, when you want to grow your viewership, you're going to hit a language barrier. If you can leverage automation to get past that language barrier, you can potentially grow your audience by millions, because you can send it to everyone in the world. That's really the core of the solution we were able to deliver to NCI: grow your audience by leveraging automation. And that's our presentation. If there are any questions, you can ask now, or you can talk to us afterward or send us a quick email.
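To make the synchronization step concrete: MPEG transport stream presentation timestamps (PTS) run on a 90 kHz clock, so anchoring every language's cues to the same PTS reference keeps 20 simultaneous translations in lockstep instead of drifting on wall-clock timing. This is a hedged sketch of that idea only; the offset handling and function names are assumptions, not the production design.

```python
# Illustrative sketch of PTS-based cue alignment. MPEG-TS presentation
# timestamps run on a 90 kHz clock; everything else here is an assumption
# about one reasonable design, not SyncWords' actual code.
PTS_CLOCK_HZ = 90_000

def pts_to_seconds(pts: int) -> float:
    """Convert an MPEG-TS PTS value (90 kHz ticks) to seconds."""
    return pts / PTS_CLOCK_HZ

def align_cue(caption_pts: int, stream_origin_pts: int,
              duration: float, text: str):
    """Pin a translated cue to stream time derived from the transport stream.

    Using the stream's own PTS as the reference, rather than the moment a
    translation arrives, removes per-language deviation: every language
    lands at the same point in the video.
    """
    start = pts_to_seconds(caption_pts - stream_origin_pts)
    return (start, start + duration, text)

# Example: a caption observed at PTS 8_100_000 in a stream whose first
# segment started at PTS 900_000, displayed for 3 seconds, begins 80 s
# into the stream regardless of which language it was translated into.
cue = align_cue(8_100_000, 900_000, 3.0, "Hola a todos")
print(cue)  # (80.0, 83.0, 'Hola a todos')
```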
