Speaker 1: Once more, thank you, Ruth, for the previous presentation. It was interesting to see what you've been doing at Harvard. I think there are not many universities that have been working along these lines on obtaining automatic subtitling for all their video lectures. My name is Gonzalo Garces. I'm from the Machine Learning and Language Processing Research Group at this university, and I'd like to talk about our experience in subtitling the videos in our repositories using our own technology.

So this is who we are. I'm from the Machine Learning and Language Processing Group. One of our main areas of work is precisely the automatic transcription and translation of videos and video lectures. And in our most recent projects, we've always worked together with the UPV Media Services, who are the hosts of this conference. You know them as the creators of Polymedia and of the Paella Player. In this slide you can also see the list of languages we've been working with. We've been subtitling videos in English, Spanish, Catalan, German, French, Estonian, Italian, Dutch, Portuguese and Slovene, and then machine translating between those languages. We've also worked with text-to-speech synthesis in English, Spanish and Catalan. So, for instance, we've taken videos that were recorded in English and created an automatic dub of those videos into Spanish. These are the technologies we are developing in our research group.

Let me talk now about the media repositories here at UPV. This is our media portal. The most important ones are maybe Polymedia, the one at the top: studio-recorded videos, usually short, maybe 10 or 15 minutes, on very specific topics; the kind of videos you would usually find in a MOOC. And then Videoapuntes, the one at the bottom: these are Opencast lecture-capture videos, live captures of lessons and so on, so they are longer, maybe full lectures of one hour or more. We've already subtitled all of our Polymedia videos with our automatic technology, and maybe I'll show you a short demo. Let's see. This is a video from our media repository. And...
Speaker 2: ...and woke up today, you would be very comfortable with where you were. It wouldn't look that different.
Speaker 1: The idea is that, for all of the videos in our media repository, like this one, we can activate the subtitles down here.
Speaker 2: What we see today is very similar to what we see in current classrooms as well.
Speaker 1: So these are automatic subtitles generated with our speech recognition systems. And you can choose automatic translations into other languages.
Speaker 2: And this experience must be just wonderful for a student trying to learn. And one of the reasons that edX came to be is because of that.
Speaker 1: The idea is that if you find any mistakes while you are watching the subtitles, you have the option to edit them. So I'm logging in anonymously, but you can also log in with your university user. And here you can see our subtitle editor. Here you can just browse the subtitles, and it will jump to the appropriate part of the video. And you can also show subtitles...
Speaker 2: Subtitles are different in the U.S.
Speaker 1: In two different languages. So when you've got subtitles in two languages, you can show them side by side and just edit them. And since this was a public login, if we edit anything here and save it, it will be reviewed by the lecturer before the correction goes live. So this is how we've integrated subtitle editing in our repository.

What I also wanted to talk about is that automatic subtitling works especially well for studio recordings, where the audio is very good, and also for live lecture captures and conference recordings: when there's a clip-on microphone, the audio quality is usually good enough to get good subtitles. The challenge we are working on right now is that sometimes you don't have a clip-on microphone, only the room microphones, so you have to work with that audio, and that's more difficult.

There are several ways in which we are working to improve that. Numbers one and two are simply about using the most advanced technology in speech recognition: we develop our own tools, and we use deep neural networks, we use LSTMs. This is the state of the art at the moment, and the more we advance, the better the results we get. We also keep looking for more speech data and corpora to train our systems. Number three is about multi-microphone speech recognition; I'll talk about that in a minute. Numbers four and five are about adapting the system to specific characteristics, the audio and the vocabulary: you can adapt to the characteristics of a video repository in particular, but also to each video. When you upload it, you can provide slides, a description of the contents, or lecture notes, and that is used to improve the transcription automatically. And then number six is about what happens when you want to edit those automatic transcriptions; we'll see a little bit about that.

Multi-microphone speech recognition. This is a hot topic now, a very active field of research. From this slide, I will say that in our Videoapuntes classrooms there are different microphone setups, and you can't always rely on having clip-on microphone audio, because the teachers sometimes forget to turn it on. It's not automatic, so sometimes you have only the room microphones. But the idea is that if you have audio recorded from several tracks, you can process it all and get a better automatic transcription than if you took the audio from only one of the microphones, even the best one. And actually, as you see at the bottom, there is a new multi-track recording setup: our Media Services are also exploring how to set up the microphones around the room in such a way that all the tracks can actually be recorded separately, so that we can then use them for speech recognition. This is a pilot project we are working on now.

And now, some results. These are for Spanish, on our Videoapuntes lectures. In the second chart, in the first row, there are results from the old speech recognition systems we have been working with since 2014, and in the second row, the most recent speech recognition system we've developed for Spanish, which is the one we're focusing on in the case of Videoapuntes. For clip-on microphones, the error rate of the transcriptions has gone down from 18% to 15%. And if you don't have clip-on microphones and have to rely on room microphones, the error rates are much higher, but we are also improving on them using the different approaches I've talked about.
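The error rates mentioned here are presumably word error rates (WER), the standard ASR metric: the minimum number of word substitutions, deletions, and insertions needed to turn the automatic transcription into the reference transcription, divided by the number of reference words. Below is a minimal sketch of that computation; the example sentences are illustrative only.

```python
# Minimal word error rate (WER) computation via Levenshtein edit distance.
# The talk's "error rate" figures are presumably WER; this sketch and its
# example sentences are illustrative, not the group's actual evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution or match
                dp[i - 1][j] + 1,                               # deletion
                dp[i][j - 1] + 1,                               # insertion
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution out of five reference words -> 0.2, i.e. 20% WER.
print(wer("activate the subtitles down here",
          "activate the subtitle down here"))
```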
And then, also for reference, in the case of Polymedia, which are the studio recordings, you can see that for Spanish we've gone down from an error rate of 15% to 12%, which is actually pretty good. And for English, we've improved from a 26% error rate to a 22% error rate, which is also quite good; English is a more difficult language to recognize than Spanish for phonetic and orthographic reasons.

As for post-editing interfaces, this is our TLP player, which I showed earlier. Here you can edit the subtitles, and you can also decide on the length of the segments. And you can show different languages at the same time. Then there's our new transcription editor. This is for different use cases, where you don't want to focus on the subtitle segments; you just want the full transcription. It's also synchronized with the audio, and you just edit any part you're interested in. One of the use cases is that you edit the full transcription here, and then it's automatically segmented into subtitles again.

And this is all being implemented in PolyTrans, which is our university's official subtitling service. This service is being used for all of our videos, but it's now also available as a commercial service for anyone, or any organization, that wants to try it. You can just go to this website, upload your own videos and try it. And PolyTrans also has a nice API that you can use to integrate our services into your own workflow. We've also been working with Universidad Carlos III de Madrid and with VideoLectures.NET, a large European video repository, so we've been trying things not only in our university but externally too.

So I think that's most of what I wanted to tell you. We're happy with the experience of having subtitles for our videos, and we think it's important for improving the accessibility of learning content. So thank you. I don't know if there are any questions. Thank you.
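The transcription editor described above re-segments an edited full transcription into subtitles automatically. The actual segmentation rules are not specified in the talk; the following is a toy sketch under the assumption that the system keeps per-word timestamps and cuts greedily at a maximum subtitle length (the `Word` type, the 42-character default, and the example words are all illustrative).

```python
# Toy illustration of re-segmenting an edited, time-aligned transcription
# back into subtitle segments. The real editor's segmentation logic is not
# described in the talk; this greedy character-limit cut is an assumption.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def to_subtitles(words: list[Word], max_chars: int = 42) -> list[tuple[float, float, str]]:
    segments, current = [], []
    for w in words:
        candidate = " ".join(x.text for x in current + [w])
        if current and len(candidate) > max_chars:
            # Close the current segment and start a new one with this word.
            segments.append((current[0].start, current[-1].end,
                             " ".join(x.text for x in current)))
            current = [w]
        else:
            current.append(w)
    if current:
        segments.append((current[0].start, current[-1].end,
                         " ".join(x.text for x in current)))
    return segments

words = [Word("these", 0.0, 0.2), Word("are", 0.2, 0.3),
         Word("automatic", 0.3, 0.8), Word("subtitles", 0.8, 1.3),
         Word("generated", 1.3, 1.8), Word("with", 1.8, 1.9),
         Word("our", 1.9, 2.0), Word("speech", 2.0, 2.3),
         Word("recognition", 2.3, 2.9), Word("systems", 2.9, 3.3)]
for start, end, text in to_subtitles(words, max_chars=30):
    print(f"[{start:.1f}-{end:.1f}] {text}")
```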
Speaker 3: Oh, sorry. What about the availability of what you are developing there?
Speaker 1: Well, right now, the way to access it is the PolyTrans service. You can open an account, register, and try it for free. But then it is a commercial service, and you pay as you go. Right now it's six euros per hour of video processed, and you get the transcription in the original language. We cover Spanish, French, Italian, Catalan; those are the languages we have now in PolyTrans, but there are more that we can add as they are requested. And in this price we also include the translations of the subtitles into other languages. So you can access... You can subtitle your own videos, and you also get access to the post-editing interfaces, so you can edit the subtitles you get, and you can use the API to integrate this into your own repository, into your own player. So that's the idea.
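The PolyTrans API mentioned in this answer is not described in detail in the talk, so the sketch below of a typical upload-and-poll integration is entirely hypothetical: the base URL, endpoint paths, field names, state values, and response format are assumptions made for illustration, not the real interface. Consult the actual API documentation before integrating.

```python
# Hypothetical sketch of integrating a subtitling service like PolyTrans
# into a workflow: upload a video, poll until processing finishes, then
# download the subtitles. Every URL, path, parameter, and response field
# here is made up for illustration; the real API will differ.
import time
import requests

BASE = "https://example-politrans-api.upv.es/v1"  # hypothetical base URL
API_KEY = "your-api-key"                          # placeholder credential
AUTH = {"Authorization": f"Bearer {API_KEY}"}

def subtitle_video(path: str, lang: str = "es") -> str:
    # Upload the video for automatic transcription.
    with open(path, "rb") as f:
        r = requests.post(f"{BASE}/media", headers=AUTH,
                          files={"file": f}, data={"language": lang})
    r.raise_for_status()
    media_id = r.json()["id"]

    # Poll until the automatic transcription has been produced.
    while True:
        status = requests.get(f"{BASE}/media/{media_id}", headers=AUTH).json()
        if status["state"] == "done":
            break
        time.sleep(30)

    # Fetch the subtitles in WebVTT format for the requested language.
    subs = requests.get(f"{BASE}/media/{media_id}/subtitles",
                        params={"format": "vtt", "language": lang},
                        headers=AUTH)
    subs.raise_for_status()
    return subs.text

print(subtitle_video("lecture.mp4")[:200])  # example file name
```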
Speaker 3: Okay.
Speaker 1: So thank you very much. Thank you.