Create Video Captions Using Watson Speech-to-Text
Learn to generate video captions with IBM's Watson Speech-to-Text using a GitHub subtitler utility, including setup and execution instructions.
Generate Caption Files with Watson Speech to Text
Added on 01/29/2025

Speaker 1: Hi, I'm Steve Atkin, Distinguished Engineer at IBM, and today I'm going to show you how you can use Watson Speech-to-Text to generate caption files for videos. I've gone ahead and created a little GitHub project with some utilities to help you do that, and there's a link below so that you can follow along.

The first thing you'll need to do is clone the subtitler GitHub repo; you'll find a link below to help you get started with that. Then you'll also need to set up credentials for Watson Speech-to-Text. To do that, go out to the Bluemix catalog and select Watson Speech-to-Text. Once you do that, go ahead and create an instance of Watson Speech-to-Text, and we're going to use this instance in the subtitler utility. Once you have your service instance created, you'll need to copy the credentials from the service over into the subtitler utility; specifically, you'll need to put them in the speechcredentials.json file.

When the subtitler utility runs on your machine, it also makes use of FFmpeg, so you'll want to make sure that you have FFmpeg installed.

To show how the subtitler utility can be used to generate caption files, we're going to download a video from the BBC News School Report website, and I'll put a link below so that you can follow along. We're going to download the video Finding News.

To run Subtitler, we need to provide three parameters. The first is the file name of our video. The second is the source language the video is in; alternatively, we could supply a Watson Speech-to-Text customization ID, which means you've trained Watson Speech-to-Text with special keywords and options, but we're not going to do that at this point. The last parameter is whether or not we want sentence casing to occur.
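As a rough illustration, the speechcredentials.json file holds the service credentials you copy from your Watson Speech-to-Text instance. The exact field names depend on the era of the service and on the subtitler utility itself, so treat this layout as an assumption and check the repo's README; Bluemix-era instances issued username/password pairs, while newer IBM Cloud instances issue an API key instead:

```json
{
  "username": "your-service-username",
  "password": "your-service-password",
  "url": "https://stream.watsonplatform.net/speech-to-text/api"
}
```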
The default is yes, and we're going to run this with all the default options. So from the command prompt, I'm just going to type node subtitler and the name of my video, FindingNews.mp4. This video happens to be recorded in England, so we're going to choose en-gb for the language, and then we're going to select yes to go ahead and create the caption files for us.

Once we start, it'll ask us to confirm that the options we chose are correct. Then it automatically extracts the audio file and starts generating the subtitle numbers. Underneath the covers we're calling Watson Speech-to-Text, and the subtitle file is generated here. It just takes a few moments to run, typically no more than a few minutes, but this depends on the length of your video, obviously.

Okay, so now Subtitler has finished processing our video file, and you can see we've generated two files. The first is FindingNews.srt, which is our SubRip captions file, and we'll take a look at that in a moment. The second is FindingNews.events.json, which contains all the raw events that Watson Speech-to-Text generated as it was processing the audio from your video file. So let's go ahead and take a look at those files.

Okay, let's look at the captions file that got generated for us. As you can see, it's in SubRip format, and you'll notice a few things. First, we've already put in the timing marks so that we can be certain that when the captions display, they're at the correct place in the video file. You'll also notice that we've done sentence casing, and keep in mind that this isn't always perfect.
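To make the timing marks concrete: a SubRip cue is a number, a time range in HH:MM:SS,mmm format, the caption text, and a blank line. Here is a minimal JavaScript sketch of how such timestamps can be produced; this is purely illustrative and is not the subtitler utility's actual code:

```javascript
// Convert a time in seconds to the SubRip timestamp format HH:MM:SS,mmm.
// Illustrative sketch only, not taken from the subtitler repo.
function toSrtTimestamp(totalSeconds) {
  const ms = Math.round((totalSeconds % 1) * 1000);
  const s = Math.floor(totalSeconds) % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// A single SubRip cue: index, time range, caption text, blank separator line.
const cue = [
  "1",
  `${toSrtTimestamp(1.25)} --> ${toSrtTimestamp(3.5)}`,
  "Hi, I'm Steve Atkin, Distinguished Engineer at IBM.",
  "",
].join("\n");

console.log(cue);
```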
If the text is actually a question, you might need to go in and make some corrections, and that's quite typical of many speech-to-text systems, especially when you're processing videos that might have some background noise in them.

So let's take a look at the next file. In the events.json file displayed here, you can see all of the events that Watson Speech-to-Text recognized. You can see the text segment that was captured, and you can also see the timing marks for each word spoken in that segment. You can take a look at that if you want more detail on how Watson Speech-to-Text captured information from the video.

Okay, so there you have it. We've generated a caption file for a video, and now you can use that caption file and watch it in any viewer you like. In the next set of videos, we're going to show you how you can take the caption file we've generated and translate it into other languages. So stay tuned, and we'll have another video to show you how to do that.
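For reference, the raw events in FindingNews.events.json follow the general shape of Watson Speech-to-Text recognition results: each result carries a transcript alternative plus per-word timestamps given as [word, startSeconds, endSeconds] triples. The values below are invented for illustration, and the exact wrapper the subtitler writes around these results may differ:

```json
{
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "hi I'm Steve Atkin ",
          "confidence": 0.93,
          "timestamps": [
            ["hi", 0.12, 0.38],
            ["I'm", 0.38, 0.55],
            ["Steve", 0.55, 0.89],
            ["Atkin", 0.89, 1.30]
          ]
        }
      ]
    }
  ]
}
```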
