AI-Powered Transcription: Enhancing Media Accessibility and SEO
Katlin Clark from Trint discusses AI-driven transcription, its impact on media accessibility, SEO, and the future of automated transcription technology.
Trint: This startup helps you to transcribe interviews in a heartbeat - Medientage München 2017

Speaker 1: My next guest is Katlin Clark. She's based in London and works for a startup doing audio and video transcription, which makes heavy use of AI and automation, right?

Speaker 2: Yeah, absolutely. So I am commercial director at Trint and we leverage AI to do automated transcription and then we feed the transcripts into an editing tool so that they can be perfected and verified.

Speaker 1: I know that it's really, really hard work to do it manually: watching the video, doing the transcriptions for the subtitles. And since so many videos on YouTube and Facebook are watched without sound, we're just scrolling down in our apps, watching the videos, and we don't see what's in them because it's just talking heads. We would really love to see the audio transcription in real time. So how accurate, how fast, how good is the transcription in the near future, when we talk about automation?

Speaker 2: Yeah, no, definitely. Transcription is really key. I mean, we like to talk about a concept at Trint called dark data, which is the content inside an audio or video file that you can't search and you can't find, and that therefore can't be SEO optimized and can't be monetized. So by attaching a transcript to an audio or video file, you're really bringing that content alive in a really exciting way, and you're making it so that it can be much more leveraged within the context of your company. In terms of live, that's something that's definitely on the roadmap for us. AI is near to perfect, but it's not perfect yet, which is why we complement our AI tool with an editing tool, to make sure that the transcript can be perfected to do exactly what you said: create captions for Facebook videos, for Facebook Live videos, so that they can be viewed by a wider audience.
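That idea, making the spoken content of a video crawlable by attaching a transcript, can be made concrete with a small sketch. One common approach (not necessarily what Trint produces; the title, URL, and transcript text below are placeholders) is to embed the transcript in a page's schema.org VideoObject metadata:

```python
import json

# A minimal sketch of the "dark data" fix described above: exposing a video's
# spoken content to search engines through schema.org VideoObject metadata.
# The title, URL, and transcript text are placeholders, not Trint output.
def video_jsonld(name: str, content_url: str, transcript: str) -> str:
    """Build a JSON-LD snippet that makes a video's transcript crawlable."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "VideoObject",
            "name": name,
            "contentUrl": content_url,
            "transcript": transcript,
        },
        indent=2,
    )

print(video_jsonld(
    "Startup interview on automated transcription",
    "https://example.com/interview.mp4",
    "My next guest is based in London and works on AI transcription...",
))
```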

Speaker 1: So the machine starts with the work, and I can just look over it and say, OK, that's OK, that's not OK.

Speaker 2: Yeah, exactly. I mean, what we aim to do is have machine learning do all the heavy lifting for you. So, you know, I was talking to someone earlier today who said that they took an interview of 10 minutes and it took them about three hours to transcribe it and make it perfect. It's just such a tedious process. But with Trint, you can lighten that process, particularly the beginning of it, so you get a near-perfect transcript, and then the editing tool that it goes into makes the editing actually even faster.

Speaker 1: So I think everyone could tell a story about using their smartphone with OK Google or with Siri. There are people who are really happy with that tool because they say, Siri understands everything I say, and other people say, I have a strong accent, Siri doesn't understand anything. It's the same with AI and transcription tools: we have interview guests they can do nothing with.

Speaker 2: Yeah, absolutely. I mean, you know, AI is just tricky. Really, the quality of the video and the audio that you put in is exactly the quality that you're going to get back out of it. And Trint is like that too. AI is just not sophisticated enough to pick up on conversations with a lot of background noise, very, very thick accents, certain dialects, et cetera. I think it will get there in the future. And at Trint, you know, for English, we have an Australian accent, we have a British accent, and we're going to be adding a Hindi accent soon to accommodate different ways of speaking. But it's just something that's still on the roadmap and hasn't been fully developed yet.

Speaker 1: We in Germany could contribute a lot of accents as well.

Speaker 2: Oh, absolutely. Although actually German English performs extremely well in our algorithms.

Speaker 1: Oh, yes.

Speaker 2: Absolutely. Well, you speak very clearly and articulately. So yeah, we encourage you to give it a try.

Speaker 1: Yeah, but talking about the audio quality: at the moment we're using quite a professional microphone, but without it, just using the built-in microphone of a smartphone would be hell for the transcription, with all the background noise in here. So is that a goal AI could reach, a good transcription despite the background noise?

Speaker 2: Absolutely. That's definitely a goal. I mean, ideally you would have near-perfect transcripts no matter the environment, whether a journalist is out in the field or whether we're here in this type of studio, in this type of environment; you would get an almost perfect, or hopefully perfect, transcript. That's definitely down the road, and the technology just keeps getting better.

Speaker 1: Is it also a machine learning thing, so that the transcription itself gets better and better the more dialects you put in, the more words you put in, the more text you have transcribed?

Speaker 2: So at Trint, not yet. Right now it's just that the machine learning technology is getting better, but it doesn't learn yet from your speech. That might happen down the line. We are going to be adding a custom dictionary soon, though, so that, for example, if you're a news organization working on a specific part of the world or a specific subject matter, you can upload a set of words that you're going to be using frequently, and the algorithm will recognize them so that you don't have to keep correcting them. So that's our solution for now.
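To illustrate what such a custom dictionary buys the user, here is a toy post-correction pass. Trint's actual mechanism is not public, so this only sketches the effect: words that sit close to an entry in a user-supplied term list get snapped to it.

```python
import difflib

# Toy illustration of a custom dictionary: snap near-miss words in a raw
# transcript to a user-supplied term list. The terms and the 0.75 similarity
# cutoff are arbitrary choices for this sketch, not Trint's implementation.
CUSTOM_DICTIONARY = ["Trint", "Medientage", "Brightcove"]

def correct(word: str) -> str:
    """Replace a word with its closest dictionary entry, if one is close enough."""
    matches = difflib.get_close_matches(word, CUSTOM_DICTIONARY, n=1, cutoff=0.75)
    return matches[0] if matches else word

raw = "So where we really see the future for Trent is in enterprise"
print(" ".join(correct(w) for w in raw.split()))
# "Trent" snaps to "Trint"; every other word passes through unchanged.
```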

Speaker 1: You talked about search engine optimization, so highlighting words, detecting keywords. Is it like you have a kind of dictionary where you erase all the normal words, like because, yes, no, yesterday, tomorrow, and blacklist them as keywords, but take the less frequently used words, like names or locations or specific terms, and highlight them as keywords? How does it work?

Speaker 2: Well, that's really for the editor and the content producer to decide. You know, if you have a video, there's obviously going to be a lot of garble and words which are slightly irrelevant. And what Trint does is it gives you the transcript to be able to pick out those keywords and to see how many times they're used and then put them into the metadata so that the video can be SEO optimized.
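A toy version of the workflow the two of them are circling here, assuming an invented stopword blacklist (this is not Trint's keyword logic): drop the filler words, count what remains, and let the editor decide which terms go into the video's metadata.

```python
from collections import Counter

# Sketch of the keyword step described above: blacklist common filler words,
# count the rest, and surface candidates for the video's metadata. The
# stopword list is an invented stand-in, not Trint's actual logic.
STOPWORDS = {"the", "a", "and", "to", "of", "in", "is", "it", "that",
             "because", "yes", "no", "yesterday", "tomorrow"}

def keywords(transcript: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n non-stopword terms with their usage counts."""
    words = (w.strip(".,?!").lower() for w in transcript.split())
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(top_n)

# The editor reviews the candidates before they go into the page metadata.
print(keywords("Trint leverages AI to transcribe audio and video, and the "
               "transcript makes the video searchable."))
# [('video', 2), ('trint', 1), ('leverages', 1), ...]
```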

Speaker 1: So in the past we always thought that captions were the kind of thing we need for blind people, for deaf people, for barrier-free websites. But in the meantime we have a totally different perspective on captions and transcripts because of SEO and social media optimization. So how big is the market?

Speaker 2: Oh, huge, absolutely huge. I mean, definitely for media, it's enormous. Beyond just the SEO, it's the sheer number of people who watch video content on their smartphones and can't listen, who are on the metro or maybe even at work, and who want to be able to consume content while they're doing those things. So that's one big section of the market. Not to mention the fact that 80% of content online is going to be video by 2018. So that's a huge, huge swath of content that needs to be leveraged in the best way possible. So within media in particular there's an enormous market for transcription, but even outside of that we have really interesting use cases. For example, we're chatting with Human Rights Watch. They do interviews on breaches of human rights all over the world, hundreds and hundreds of hours of interviews, and transcribing is a pain for them too. So they want to be able to transcribe those interviews and to leverage them in their research. In addition, there's also healthcare, there's law; those are instances where people need to transcribe conversations with clients or with patients. It's a process that's really pervasive, and it's very necessary to be able to get the right amount of data out of the content or the conversations that you have.

Speaker 1: When it comes to your business model: media organizations tend to use a bunch of platforms. Some use YouTube as their video library, others use Vimeo, others use Brightcove or other technology to build their video libraries; it's just a bunch of platforms. And since YouTube and Facebook have introduced their own auto-transcription tools, and you can export SRT files, which are caption files that work on almost any platform, how easy is it for a company like you to find new professional clients who are willing to pay for a service they know they can get for free on other platforms?
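For readers unfamiliar with the format mentioned here: SRT is plain text, just a cue number, a start and end time, and the caption line. A minimal sketch that writes it (the example segments are invented, not exported from any real platform):

```python
# Minimal sketch of the SRT caption format: a cue number, a time range in
# "HH:MM:SS,mmm --> HH:MM:SS,mmm" form, then the caption text.
def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as an SRT caption file."""
    cues = [f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n"
            for i, (start, end, text) in enumerate(segments, start=1)]
    return "\n".join(cues)

print(to_srt([(0.0, 2.5, "My next guest is from Trint."),
              (2.5, 5.0, "We leverage AI for transcription.")]))
```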

Speaker 2: Absolutely. So where we really see the future for Trint is in enterprise, and that is in working with companies on a much larger scale who need a product that's integrated into their workflows. So right now we're working with AP and the Washington Post on pilot programs for them to test out Trint, see how it works for them, but most crucially to see how Trint can be integrated into their CMS and into the process that they have on a day-to-day basis. Because, as probably all of you know, media companies are weighed down in process. And so our aim is to be-

Speaker 1: That's what she said.

Speaker 2: As far upstream in the process as possible. So, for example, we're developing an API so that you can upload directly from a CMS. All of those things are not something that you can get from Facebook.
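As a rough sketch of what "upload directly from a CMS" might look like from the developer side. Everything below, the endpoint, the bearer-token auth, and the response field, is invented for illustration and is not Trint's actual API:

```python
import requests  # third-party HTTP client: pip install requests

# Hypothetical sketch of a CMS-to-transcription upload. The endpoint, the
# bearer-token auth, and the "job_id" response field are all invented for
# illustration; they are not Trint's actual API.
API_URL = "https://api.example.com/v1/transcripts"  # placeholder endpoint

def upload_from_cms(media_path: str, api_key: str, language: str = "en-GB") -> str:
    """Send a media file for transcription and return the job identifier."""
    with open(media_path, "rb") as media:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            params={"language": language},
            files={"media": media},
        )
    response.raise_for_status()
    return response.json()["job_id"]  # hypothetical response field
```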

Speaker 1: Just to translate this a little bit, because we're now totally into the topic: the CMS is the content management system, where editors can edit the content, and the API is the application programming interface, so that's the coding side for developers, where people can integrate your technology into their own technology, right? Okay, cool. So, Katlin, here at the Medientage, is this your first time?

Speaker 2: It's my first time, yeah.

Speaker 1: What's your first impression?

Speaker 2: It's great, it's great. It's really interesting to be so deeply embedded in the German market. It's a market that we have a lot of interest in and I'm learning a lot.

Speaker 1: So enjoy the Medientage. Thanks for being here as my guest. Take care.
