Automating Transcription: Using Praat and ELAN for Efficient Annotation
Learn how to automate transcription using Praat and ELAN. This guide covers setting parameters, importing files, and creating annotation blocks efficiently.
CHachalani tutorial - Praat and ELAN transcription auto-segmentation
Added on 09/06/2024

Speaker 1: So here's the, let me show you the endpoint, and then we'll backtrack. I'm just going to share my entire screen, since I'm going to be jumping between programs. What I have here is a copy of the latest file we were working on, the December 2020 Religion and Spirituality on Guam discussion with Sina Njeri and Sina Lo. One of the first steps in transcribing is identifying where the utterances are. Usually what you do is go through, highlight, and then create an annotation, either by double-clicking or by creating a new annotation through the shortcut or the menu. We discussed a few different ways you can do that. But that requires us to go through and select and identify the utterances first. What if we could just have the computer do that? It seems like it would be a useful first step. So what I've got down here are two sets of automatically generated segments, one by Praat and one by ELAN. The idea is to break the recording up into segments first, so you could then use a different type of transcription mode: break it up at the silences, and you'd have all the blocks already created. So let's take a look at how to do this. Like I said, there are two different ways, and I'm going to run through both of them with you. I'm going to start with Praat. Now, Praat, as we've discussed before, is a program used for linguistic phonetics. It has a lot of very powerful functionality, most of which we haven't discussed, but you don't have to worry about that. One thing that's a little clunky about Praat is that the menus are in odd places and there's no drag-and-drop functionality; you have to use the buttons to open files manually. So in Praat, what you would do is read a file in. I chose this WAV file, and I've already got that open.
What we can do is annotate this WAV file with Annotate > To TextGrid (silences), which will try to segment it automatically. There are a bunch of parameters here, and the defaults are not the values I have on the screen. When it starts, it will give you values something like this: minimum pitch, 100 hertz; silence threshold, negative 25 decibels; and so on. Some of these are not very intuitive, and it took a lot of playing around to figure out the best values. They might not even be the same across different recordings, because different recordings have different noise levels, but what I found worked well for this recording was a minimum pitch of 75 hertz, a silence threshold of about negative 50 decibels (that's a measure of the quiet stretches), a minimum silent interval duration of about a quarter second, and a minimum sounding interval duration of about 0.2 seconds. These parameters you just have to play around with, but two things you do want to set specifically are the interval labels. The silent intervals you want to be blank, and the sounding intervals, the intervals where there actually is some sound, where the text blocks will be created, should get a label that's easy to search for, easy to replace, and all text. I've played around with a few different values for this. I started out with symbols in there, like asterisks or at signs, but ELAN couldn't deal with the symbols, so I ended up just using a bunch of x's. I've already run this, so I'm not going to run it again, but in Praat it takes about two minutes to parse an hour-long file, and once it's done, it'll give you a TextGrid. In Praat, you can view and edit these, and it'll show you where the blocks are, which is a little bit useful.
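To make the parameters above concrete, here is a rough Python sketch of what a silence-based segmenter does with them. This is an illustration only, not Praat's actual algorithm (Praat derives an intensity contour using the minimum-pitch setting); the frame-level dB envelope and the helper name are assumptions for the example.

```python
# Sketch of silence-based segmentation, loosely mirroring what
# Praat's "To TextGrid (silences)" does with the parameters above
# (-50 dB threshold, 0.25 s minimum silence, 0.2 s minimum sounding).
# Not Praat's real algorithm: just threshold + minimum-duration logic.

def segment_by_silence(env_db, frame_dur, silence_db=-50.0,
                       min_silence=0.25, min_sounding=0.2):
    """env_db: per-frame level in dB relative to peak.
    Returns (start, end) times of sounding intervals in seconds."""
    # 1. Classify each frame as sounding (True) or silent (False).
    sounding = [level > silence_db for level in env_db]

    # 2. Collect runs of identical labels.
    runs = []  # (is_sounding, start_frame, end_frame_exclusive)
    i = 0
    while i < len(sounding):
        j = i
        while j < len(sounding) and sounding[j] == sounding[i]:
            j += 1
        runs.append((sounding[i], i, j))
        i = j

    # 3. Keep sounding runs long enough to count; bridge silences
    #    shorter than the minimum silent interval.
    segments = []
    for is_snd, a, b in runs:
        dur = (b - a) * frame_dur
        if is_snd and dur >= min_sounding:
            segments.append([a * frame_dur, b * frame_dur])
        elif not is_snd and dur < min_silence and segments:
            # Short silence: extend the previous block over it.
            segments[-1][1] = b * frame_dur

    # 4. Merge blocks that now touch after bridging short silences.
    merged = []
    for s, e in segments:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [(s, e) for s, e in merged]
```

With 50 ms frames, half a second of speech, a 0.15 s pause (too short to count as silence), more speech, a real half-second pause, and a final stretch of speech come out as two blocks, with the short pause bridged.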
I don't find Praat all that nice for navigating, but this will show you that it parsed successfully if you want to check. What we do then is save the TextGrid as a text file. I've already got it saved; the default name should be fine. Then you go over to ELAN and import it. Let me get rid of this one that I've already imported. Delete. Great. Yes. The import in ELAN is pretty straightforward. Just go up to File > Import > Praat TextGrid File, and choose the TextGrid file that you have. It'll already filter for the TextGrid extension, so there shouldn't be too much that pops up, as long as you're in the right directory. Then the one key option is "skip empty intervals." This is the thing we were just doing on the Praat side, where we said for the silences, just leave them blank. This is why we did that: now when we import into ELAN, we can skip the empty intervals, or skip the empty annotations, and it won't create blocks for silence, which is what we want. We don't want to be annotating silence. For the tier type, the default is fine, and then this will bring it in, and here are those newly created annotation blocks. The thing we want to check is that the annotation blocks actually match the speech, and it looks like that's pretty good here. We've got a section of speech represented here, and it seems to have caught that part of the WAV file. We can check this as well by listening to it. I don't remember if I shared the audio. Let me just stop the screen share for a moment and share the audio, make sure I'm sharing that. I didn't. There we go.
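The "skip empty intervals" filtering described here is easy to mimic if you ever need to post-process a TextGrid outside ELAN. A minimal Python sketch, assuming the standard "long" TextGrid layout (xmin/xmax/text lines per interval); the regex is a simplification, not a full parser:

```python
import re

# Minimal sketch: pull (xmin, xmax, text) triples out of a Praat
# "long" TextGrid and drop the empty (silence) intervals -- the same
# filtering ELAN's "skip empty intervals" option performs on import.
# Assumes the standard long-format layout; not a complete parser.

INTERVAL_RE = re.compile(
    r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "(.*)"')

def nonempty_intervals(textgrid_text):
    out = []
    for m in INTERVAL_RE.finditer(textgrid_text):
        xmin, xmax, label = float(m.group(1)), float(m.group(2)), m.group(3)
        if label.strip():          # skip the blank-labelled silences
            out.append((xmin, xmax, label))
    return out
```

Run over a TextGrid where silences were left blank and sounding intervals labelled "xxx", only the sounding intervals survive, which is exactly what the ELAN import produces.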

Speaker 2: We can just give it a play and make sure that we've caught the audio.

Speaker 1: That seemed to have worked pretty well. We got one block here that was the...

Speaker 2: That's fine.

Speaker 1: That's fine, and then we had another block here, which was pretty much that entire intonational phrase.

Speaker 1: That actually parsed pretty well. Now, there are probably going to be a few places later on where it might have clumped two utterances together. We would have to go through and do a little bit of parsing to figure out where those are and pull them apart. This is not going to be a perfect method, but it might be better than doing it all by hand. This is why I wanted to show you a couple of methods. Here's a block. It looks like, and I'm going to show you the ELAN method in a second, but it looks like ELAN separated this out into two chunks where Praat just had it in one. Again, that might be partially just the parameters, but let's listen to this and see if it's two or one.

Speaker 2: That actually sounds like it's just one intonational phrase.

Speaker 1: There is a pause here, and that's probably what ELAN is picking up that Praat wasn't. The Praat window is good there. If this were the ELAN one, we would probably want to alter it by hand. The way you can alter these annotation boundaries depends on your operating system; on a Windows system, holding down the Alt key gives you this double-sided arrow, which lets you change the annotation length, which can be useful. Or you can select another sub-portion and create a new annotation over the top if you wanted to divide it into two blocks. You could also use the split command, as we saw before. So far, it's not too bad, although it looks like we have slightly different values between the two of them. The other thing we can do is try to do this right in ELAN. The way to do that in ELAN is through this tab at the top called Recognizers. We haven't really used these tabs very much; usually we just left it on Controls, the media controls, but there are a bunch of other tabs we haven't dealt with. Some of these are not very useful for us, like Subtitles, which shows subtitles for videos, or you can see the full range of words you have within a tier, which might be useful, but that's more something you would do for analyzing texts later. But this Recognizers tab is the one that can help you parse this, and there are a bunch of different recognizers. This is what I was just saying: some of these are web services. For example, this first one here, the phone-level audio segmentation, is going to try to break your recording down to individual speech sounds, but it requires an internet connection, and it requires that you upload files and then download them again once it's tried to parse them. Again, if you have an hour-long WAV file, that's going to take forever.
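The manual fix described above, splitting one block into two at a pause, amounts to replacing one (start, end) interval with two that meet at a time you pick by ear. A small sketch; the function name and the times are made up for illustration:

```python
# Sketch of the manual "split an annotation at a pause" fix described
# above: one (start, end) block becomes two, divided at a chosen time.
# Times are in seconds; the example values are invented.

def split_block(blocks, index, split_time):
    """Replace blocks[index] with two blocks that meet at split_time."""
    start, end = blocks[index]
    if not (start < split_time < end):
        raise ValueError("split point must fall inside the block")
    return (blocks[:index]
            + [(start, split_time), (split_time, end)]
            + blocks[index + 1:])
```

Splitting the second of two blocks at 4.1 s yields three blocks, with the original list left untouched.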
You could probably do it with an MP3, but I haven't; like I said, this has always timed out on me. Some of these other ones I haven't gotten to work either. There are a lot of them, and you can play around with these, even some video ones that try to analyze things like motion. The one that's built in has the ELAN icon next to it, and that's the Silence Recognizer from MPI, the Max Planck Institute. It's pretty direct, pretty simple. There are just three parameters here. This is something where you can either play around with the values manually or select examples of silence. I understand the silence-example approach works pretty well, but you have to select a whole bunch of them, and it seemed just as easy to play around with the parameters first. So I moved the parameters around a little bit. The parameters for ELAN turn out to be a little different from the ones for Praat. Whatever ELAN is doing, it's using a different recognition system, so you can't just transfer the values over directly, but for this, a silence level of about negative 40 decibels worked, and then I used the same silence duration, a quarter of a second, 250 milliseconds, and blocks of minimally, I don't know, 150 milliseconds or so. The nice thing about the ELAN recognizer is that it's fast. So here's the progress: let's start it, take the WAV file and parse it, and it's done already. Whereas Praat took about two minutes, this one's done. That gives you some output directly on the WAV file, so these are those X's and S's here, S for silence, X for content, but that doesn't really help you that much. You can select these, but then you would still have to create blocks. What you can do, though, is once it's parsed, click Create Tier, and it'll create a new tier with that information in it. We can call this whatever. I don't know.
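Unlike the Praat TextGrid we made earlier, where silences were left blank, the recognizer output described here labels every stretch of the file, speech and silence alike. A sketch of how that full-coverage S/X labelling relates to a plain list of sounding segments; the function name is an assumption:

```python
# Sketch of the full-coverage S/X labelling the ELAN Silence Recognizer
# produces: sounding segments become "X" blocks, and every gap between
# them (plus leading/trailing gaps) becomes an "S" block, so the whole
# file is covered end to end.

def label_coverage(segments, total_dur):
    """segments: sorted, non-overlapping (start, end) sounding spans."""
    tier, cursor = [], 0.0
    for start, end in segments:
        if start > cursor:
            tier.append((cursor, start, "S"))   # silence before the block
        tier.append((start, end, "X"))          # the sounding block itself
        cursor = end
    if cursor < total_dur:
        tier.append((cursor, total_dur, "S"))   # trailing silence
    return tier
```

Skipping the S blocks when creating the tier, as the next step does, then leaves only the X blocks to annotate.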
Maybe we can't label this. I'm going to skip the silences, though; we don't need to have S blocks, and then that should create a new tier for us without the silences. Let me just scroll down here, and indeed we have it. Hold on, there was a chat question: do we need to download Praat? If you're going to use Praat, then yes, but I believe I've shared that with you before. Otherwise you can just search for Praat, P-R-A-A-T, which is Dutch for "talk," and it's the first thing that pops up. Now, one thing you'll notice is that the segmentation on these is actually slightly different. The blocks we got from the Praat TextGrid are slightly longer than the ones that come from ELAN itself. The ELAN groupings are pretty closely cropped to the sound itself, whereas Praat gives a little bit of a window pre and post. I personally like the pre and post buffering a little bit more; it gives you some error room if you were going to do some sort of segmentation of the sound file later. Either way, both of these will work. Both will give you the blocks. Oops, we're going the wrong way. Then you can use those for more segmentation or annotation.
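If you prefer the pre/post buffering described above but used the tightly cropped ELAN output, you can add a buffer yourself after the fact. A sketch, assuming sorted, non-overlapping segments; the 0.1 s pad and the midpoint clamping are illustrative choices, not anything either tool does:

```python
# Sketch of adding a pre/post "buffer" to tightly cropped blocks,
# without letting neighbouring blocks overlap or run past the ends
# of the file.  Gaps between blocks are shared at their midpoint.

def pad_segments(segments, pad, total_dur):
    """segments: sorted, non-overlapping (start, end) spans (seconds)."""
    padded = []
    for i, (start, end) in enumerate(segments):
        # Hard limits: file start/end, or the midpoint of the gap
        # to the neighbouring block.
        lo = 0.0 if i == 0 else (segments[i - 1][1] + start) / 2
        hi = total_dur if i == len(segments) - 1 else (end + segments[i + 1][0]) / 2
        padded.append((max(lo, start - pad), min(hi, end + pad)))
    return padded
```

Two blocks separated by only 0.1 s and padded by 0.1 s end up meeting at the midpoint of the gap rather than overlapping.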
