Exploring Audio Transcription and AI Integration

A deep dive into using transcription services like AssemblyAI for audio transcription, summarization, and speaker identification, plus plans for legislative monitoring.

More audio transcription with AssemblyAI

Added on 01/29/2025

Speaker 1: All right, so I'm not doing these super polished. I'm just kind of showing off what I've been doing. So, and just to make you aware, one of these is, I can go into my account. Anyway, I can change my API key, so I'm not gonna try and hide it this time. In fact, I think it's right there. Anyway, I've been playing with the, oh, that's on the other screen. All right, anyway, I've been doing transcription stuff. I got it kind of rolling with Whisper, which is the OpenAI solution. And then I kind of poked around to see what else was out there for audio transcripts. And to be perfectly honest, and I think I said this on the last video, what I want is I want something that will give me the transcript, but will also give me the, like it'll break it up by user or by speaker. I want something that will allow me to summarize the audio. And it looks like AssemblyAI does all of that stuff. In fact, here, let me just pull this. And there we go.
All right, so you can see here, you know, it's like transcribe an audio. And so I did, I transcribed that same local file I did on the last video I put up. But you know, cool stuff. And it did it pretty fast. It came back with, you know, a bunch of stuff that's kind of hard to read through. And I'll kind of break it down for you here in a minute, but anyway, what made me happy was this right here. So you've got transcribe your first audio and then identify speakers. Now this is a paid feature, right? So I had to add funds. I just added $10, right? And I've run it twice, I think, to get the speakers and everything else. So, so far so good, right? Yeah, and it does all this stuff. So this will identify the speakers.
Now, one thing on the Ruby example that it doesn't give you is it doesn't actually show you how to like output it. Right, so you've got your transcription result, you know, and it just polls it. It just checks in every three seconds until it gets an answer. And then it just puts it and breaks, right?
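The "check in every three seconds until it gets an answer" loop described above can be sketched in Python. This is a minimal sketch, not the Ruby example shown in the video: `poll_until_done` and `fetch_status` are hypothetical names, and the `"completed"` and `"error"` status values are assumptions about the response shape. A fake status source stands in for the real API call so the block runs without a network connection or an API key.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 3.0,
    max_attempts: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until the job finishes or fails."""
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")  # assumed field name
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        # Not ready yet: wait before checking in again.
        sleep(interval_seconds)
    raise TimeoutError("transcript was not ready in time")


# Stand-in for a real API call: "processing" twice, then "completed".
_fake_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "text": "hello world"},
])

# The sleep is replaced with a no-op so the demo finishes immediately.
demo = poll_until_done(lambda: next(_fake_responses), sleep=lambda s: None)
print(demo["text"])
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular SDK, so the same function could wrap whichever client call actually returns the transcript status.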
So it just prints it out. So that's why you get this big honking JSON object is because this is the full-on response. And there's a ton here. So anyway, but what I figured out as I built it into my script is, initially I had it print transcript, which is the whole big object. And then I was like, okay, show me what the keys are, right? And so all of a sudden this gets really interesting because then it's, okay, well, what's all the data you gave me, right? And so if we scroll all the way down here, this is the transcript text and looky here, you know, these are all the keys. So then I can start to break it down and I can go, oh, what's the audio? What's the text? What's the words? And the words is this piece right here, right? So it's like, I think you said, where do I see it? Oh, no, maybe it's not there. Yeah, so the text, it's like convention numbers, right? And the next word is state delegate, you know? So it breaks up the different words and, you know, how often they said it. And so you've got confidence and punctuate and all this stuff, right? So, and you can see here that it did punctuate it.
But what's interesting is, is so I was like, okay, so how do I get this to break it up? And that's where this utterances thing comes in. So it'll give you the utterances and the utterances are those broken up things. So you can see here, you know, this is, hey folks, welcome back to the Utah County Republican Party Podcast. By the way, this speaker and it gives you right here, speaker A, that's me, okay? So I'm speaker A and then it gives you all the words, right? So it gives you the words for that utterance. And then anyway, I don't know how terribly useful that is, but you've got speaker B, right? So I asked a question at the end of my utterance. And so I think speaker B is the party chair. Yeah, I'll answer this one. Happy to answer this one, blah, blah, blah, blah, blah, right?
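The two steps described above, listing the keys of the response and then walking the utterances to get a speaker label plus text, can be sketched in Python. The `transcript` dictionary below is a made-up, heavily trimmed sample; the `utterances`, `speaker`, `text`, `start`, and `end` field names are assumptions about the response shape based on what is read out in the video, and `format_utterances` is a hypothetical helper.

```python
# Made-up, trimmed-down sample of a transcript response (assumed shape).
transcript = {
    "id": "abc123",
    "text": "Hey folks, welcome back. Happy to answer this one.",
    "utterances": [
        {"speaker": "A", "text": "Hey folks, welcome back.", "start": 0, "end": 2100},
        {"speaker": "B", "text": "Happy to answer this one.", "start": 2300, "end": 4000},
    ],
}


def format_utterances(transcript: dict) -> list[str]:
    """Turn each utterance into a 'Speaker X: text' line."""
    utterances = transcript.get("utterances") or []
    return [f"Speaker {u['speaker']}: {u['text']}" for u in utterances]


# Step 1: see which top-level keys came back.
print(sorted(transcript.keys()))

# Step 2: print the conversation broken up by speaker.
for line in format_utterances(transcript):
    print(line)
```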
And so anyway, so what you can do then is you can see here, if I can open Brave again, there we go, Brave. So on the Python example, right? It goes through the utterances and it outputs the speaker and the text. So from there, what I can do is I can say, okay, type in the names for speaker A, speaker B, speaker C, speaker D, speaker E, you know, on our shows, we may have speaker E, speaker F, right? But then just have it intelligently update, right? The speaker, right? So I could have it say speaker A and then, you know, just use a regex global replace or global sub and just drop it in. Charles Max Wood, you know, Dan Shappir, right? Whoever.
And so anyway, I'm getting pretty excited about this stuff, but then you get into this, which is the highlights. So auto highlights, right? And so again, it just, if we come back over here, so I'm gonna come up here. I know that my font size is probably pretty small, but whatever. So highlights, right? So you can see auto highlights results, right? And so then what I can do is I can come back here and I can back up into this object, auto highlight, and this is one that I didn't look at. So anyway, auto highlights, true audio start from nil and nil. I don't know. Anyway, yeah, I forgot that I added that, but didn't ask for it or didn't check in for it.
So anyway, but this is the other piece and I haven't built this in yet cause I'm still trying to figure this out. And it only shows the Python SDK and the TypeScript SDK and I am using Ruby. So I did go look around and it does appear that we do have an AssemblyAI speech to text. So what that means is, is if I capture, so when it does the transcript, it gives it an ID. And so then I can use that transcript ID and I can say, hey, summarize this for me. Hey, give me five or six titles, right?
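The speaker-renaming idea mentioned above (type in a name per speaker label, then do a global regex replace) can be sketched in Python with `re.sub`, which plays the role of Ruby's `gsub` here. `rename_speakers` is a hypothetical helper, and the `Speaker A` label format is an assumption about how the transcript lines were written out.

```python
import re


def rename_speakers(text: str, names: dict[str, str]) -> str:
    """Replace 'Speaker A' style labels with real names where one is known."""
    pattern = re.compile(r"\bSpeaker ([A-Z])\b")

    def swap(match: re.Match) -> str:
        # Leave the label untouched if no name was supplied for it.
        return names.get(match.group(1), match.group(0))

    return pattern.sub(swap, text)


raw = "Speaker A: Hi.\nSpeaker B: Hello.\nSpeaker C: Hey."
print(rename_speakers(raw, {"A": "Charles Max Wood", "B": "Dan Shappir"}))
```

Using a single pattern with a lookup function, rather than one replace per speaker, avoids a renamed line being matched again by a later replacement.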
And so then it can suggest the titles and we can send it out to the hosts and the guests, which is something that I want to build in, right? So then it's, hey, we got the transcript, we got the summary, we got whatever else. And so, yeah, and what I'd like to do is I'd like to be able to train it to say, what are the picks? But we'll see. Anyway, so it looks like I can just pull this in and it'll do most, if not everything that I want. So that's exciting. And then if we look here, you can also do live audio. I'm not gonna do that. But the other thing that you can do is, I think there was something here where it said that I could get like a transcript in the, like an SRT or something. And if I can't, maybe I can assemble that myself. But anyway, so I'm really excited about this because this gives me like 99% of what I want. And so then I can just build it in.
Let me show you what I've been using before now. So I've been using Castmagic, is it not castmagic.com? No, it's castmagic.io. Anyway, so this is the bane of my existence. And what happens is, is my team will come in and they will put up the audio transcriptions or the audio to get transcriptions because this does the same thing, right? So it does the summaries, it does the titles, it does everything else, but I only have a thousand minutes a month, right? And so let's say that we're releasing, I don't know, seven or eight episodes that are an hour long. I mean, I run out of this in like three weeks. And so, I can come in here and I can manage my plan, but the problem is, is that I'm already at their highest top tier, right? And so my next option is to, you know, to add on a subscription.
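On the idea of assembling an SRT file by hand, mentioned above: one possible approach is to number the utterances and write each one with its start and end time in SRT's `HH:MM:SS,mmm` format. This is a minimal sketch, assuming each utterance carries `start` and `end` in milliseconds along with `speaker` and `text`; `to_srt_timestamp` and `utterances_to_srt` are hypothetical helpers.

```python
def to_srt_timestamp(ms: int) -> str:
    """Convert milliseconds to the HH:MM:SS,mmm form used by SRT files."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def utterances_to_srt(utterances: list[dict]) -> str:
    """Build SRT text: a counter, a time range, then the caption text."""
    blocks = []
    for index, u in enumerate(utterances, start=1):
        blocks.append(
            f"{index}\n"
            f"{to_srt_timestamp(u['start'])} --> {to_srt_timestamp(u['end'])}\n"
            f"Speaker {u['speaker']}: {u['text']}\n"
        )
    # Each block already ends in a newline, so joining on "\n" leaves
    # the blank line SRT expects between captions.
    return "\n".join(blocks)


sample = [
    {"speaker": "A", "text": "Hi.", "start": 0, "end": 2100},
    {"speaker": "B", "text": "Hello.", "start": 2300, "end": 4000},
]
print(utterances_to_srt(sample))
```

Whole utterances can run long for on-screen captions, so a real version would probably split on the word-level timestamps instead.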
And this is, you know, I mean, it's not a terrible amount, but if you looked, you can see that it was costing me, what, like, let's say, let's say that the, you know, the 43 cents here, I guess I can look at the usage, but I don't, it says it's beta and I don't, oh, okay, so it looks like I did, I just did the one, so it cost me 43 cents. So, oh, there we go, right there, boom. All right, so let's say it's 43 cents and I have eight of them a week, right, times 4.5 weeks. So that's $15 a month. So, you know, it'll save me 50 bucks a month. But the other thing is, is I don't have to go in and top it up, I don't have to go in here and play this game, right? So anyway, that's what I'm looking at. I'm probably gonna go with this one. And then in the bootcamp and the other stuff, I'll probably just walk through the different pieces, Async Audio Intelligence. So that's probably like the utterances and stuff like that. Anyway, I'm pretty stoked. Oh, that's, see, that's where it came out to 43 cents. So it's 43 and slightly more cents. So anyway, so that's what we're gonna do. I'm really looking forward to this, solving a bunch of, you know, a bunch of the things.
The other thing is, is that it, so if you come over here and you look at the LeMUR stuff, well, we'll just look at the summarize, right? You're sending it a prompt. And so if I can get good enough at sending it the right prompt, then I can get quotes or whatever else out of this. The other thing is, is that I could also take the transcript and I could jam it into another LLM and have it sit in that context window and then ask it a bunch of questions over there. And so if there's something better, right? I get the transcription out of this and then I go ask some other system that's better at giving me whatever else I want, then I can do that. And if it gives me a full-on quote, then the other thing that I can do is I can go and I can look at the transcript that I get out of this cause it has timestamps on it.
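Going from a quote back to its timestamps, as described above, could work by matching the quote against the word-level entries of the transcript. This is a minimal sketch, assuming each word entry has `text`, `start`, and `end` fields with times in milliseconds; `find_quote_span` is a hypothetical helper that only handles exact word-for-word matches, ignoring case and punctuation.

```python
import re
from typing import Optional


def _normalize(token: str) -> str:
    """Lowercase a token and drop punctuation so 'folks,' matches 'folks'."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def find_quote_span(words: list[dict], quote: str) -> Optional[tuple[int, int]]:
    """Return (start_ms, end_ms) of the first place the quote is spoken."""
    target = [t for t in (_normalize(w) for w in quote.split()) if t]
    if not target:
        return None
    spoken = [_normalize(w["text"]) for w in words]
    # Slide a window the length of the quote across the spoken words.
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            return words[i]["start"], words[i + len(target) - 1]["end"]
    return None


words = [
    {"text": "Hey", "start": 0, "end": 300},
    {"text": "folks,", "start": 320, "end": 700},
    {"text": "welcome", "start": 720, "end": 1100},
    {"text": "back.", "start": 1120, "end": 1500},
]
print(find_quote_span(words, "welcome back"))
```

An LLM may paraphrase slightly when asked for a quote, so a real version would likely need fuzzy matching rather than the exact comparison used here.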
And then I can say, okay, well, then I wanna go clip the audio or video at this point, right? And so I can see being able to pull it out. The Castmagic, yeah, Castmagic does this, I think. So it wants me to match up the speakers, but yeah. Anyway, StreamYard also does it. And that's where we record where it'll AI pick some clips. What I found is that those are really not great. It's just not good at knowing what the highlights are. And to be fair, I mean, I don't know, I don't know how you would train that, but anyway, I'm super stoked about it. And yeah, so I'm gonna be transcribing stuff with AssemblyAI.
And then I think my next project, just to see what I can do about it. So I live in Utah. I'm gonna drag this over here so you can see it. So these are the House bills. In fact, it's all the bills for 2024. And then our legislative session runs January, February, and sometimes dips into March. I think it's usually done by the beginning of March. Anyway, so you've got House bills, you've got Senate bills. So what I wanna do is I actually want to, and I know that there's a, what is it? Citizen portal? I'm pretty sure this is the right one. Right, so you can go and you can search transcripts of like your city, your county, right? So this has Salt Lake. I'm pretty sure it has my city as well. I live in Utah County, not Salt Lake County. So anyway, yeah, here we go, show more. I think it's detecting where I'm at. Utah County. We do not have a county council, that's all Salt Lake County.
Anyway, so what I wanna do though is I want to actually grab the bills. So when they start posting the bill files for the legislative session, what I want to do is I want to go and I want to grab the bill file, right? I want to grab this text right here, which I don't think would be terribly hard to do, and then feed it into an LLM and tell it, hey, can you summarize for me what this bill does, right? So this would just appropriate some money, right?
It's like, hey, we're gonna spend more on this and a little less on that, right? And a lot of the lower numbered bills are kind of the ones where they're kind of a gimme, right? It changes the language from school district to an LEA, which is a learning education. Anyway, it's basically a school district, but it can also be like a charter school or something. So anyway, so it breaks down, right? And so it says, okay, we're gonna, it changes the language from school district or charter school to LEA, and then it also changes the way that it allocates money based on this, that, or the other, right? And so then I don't have to come in here and like read every bill, cause I just, I don't have the time. But if it tells me enough about, hey, these are the changes in allocations that they're making for the schools, and this is the language change that it's making, then if I want to know more, I can come look at the bill, right? So then I can come in here and I can go, okay, you know, what did they actually change? Okay, well, they changed these numbers, but then they also changed this and they changed that, and they amended this and blah, blah, blah, blah, right? Yeah, and just make that work.
And then, you know, just do some other work to just see if there's a way to, cause this one was a substitution, right? If we go back to the main, right? It was a substitute bill, right? So there was something else here before and they substituted the language. So, you know, if that changes or if they amend it, right? See that stuff. So I want to build that out and start using LLMs to do that kind of work, just to see what's going on in the state legislature. And then to push back, cause sometimes the stuff's pretty straightforward and it's easy to follow. And sometimes there's a long-term agenda behind some of the stuff. And so they'll drop something in like natural resources that, you know, kind of tiptoes down the line a little bit.
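Spotting what changed when a bill is substituted or amended, as described above, does not necessarily need an LLM for the first pass. A plain text diff between two versions can surface the changed lines, which could then be handed to an LLM for summarizing. This is a minimal sketch using Python's standard `difflib`; `bill_changes` is a hypothetical helper, and the bill text below is made up for illustration.

```python
import difflib


def bill_changes(original: str, substitute: str) -> list[str]:
    """List removed ('-') and added ('+') lines between two bill versions."""
    diff = list(
        difflib.unified_diff(
            original.splitlines(),
            substitute.splitlines(),
            fromfile="original",
            tofile="substitute",
            lineterm="",
            n=0,  # no surrounding context lines, only the changes
        )
    )
    # The first two lines are the file headers; "@@" lines are hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Made-up sample text, not taken from any real bill.
old_text = "Funds go to each school district.\nThe amount is 100 dollars."
new_text = "Funds go to each LEA.\nThe amount is 100 dollars."

for change in bill_changes(old_text, new_text):
    print(change)
```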
So anyway, that's the next piece I'm going to build, but anyway, I'm pretty excited about this. So anyway, I hope you find that interesting and yeah, I'm just going to keep posting this stuff. And then as we go, what I think the other thing that I'm going to wind up doing is, you know, I'm going to start doing podcast episodes on this stuff. You know, so you're going to get bonus episodes on Ruby Rogues or JavaScript Jabber. I'm going to start doing presentations on Ruby Geniuses and JavaScript Geniuses as soon as I get the website up for it. And then the last bit is, yeah, getting the, you know, the summit, the free summit. So y'all can come on and see what people are doing and then, you know, get that bootcamp together so that if you want to add AI features to your apps that you can learn how to do that, right? Without having to go and, you know, understand TensorFlow at its primal level. So anyway, I hope you found that interesting. I guess 30 people found it interesting. Yeah, I'll check in later.
