Speaker 1: Now, just before I get into the actual video today, I'd like to say that an early access copy of the first two chapters of my book, Hello Swift: iOS Programming for Kids and Other Beginners, has been released. You can find it at the link in the description. I really hope it helps you along your journey of learning iOS development through Swift.

So hello there. My name is Tanmay Bakshi, and this time we're going to go over how you can use the IBM Watson Speech to Text API with Swift in an iOS app. So let's begin. Usually, most tutorials, including the official example by IBM, don't let you record audio; they just use a pre-recorded audio file and run speech to text on that. I'm going to teach you how to actually record audio, hand it to the Speech to Text API to transcribe, and get your text back.

Here's a rundown of what we're going to be doing. We have the iOS end over here, and we have our Watson end, which is the Speech to Text service. What we're going to do is connect to Bluemix, which connects us to that Watson API and back. In other words, we communicate with Bluemix, which gives us the credentials to talk to Watson. That's basically what we're going to be doing in order to use Speech to Text.

Let's switch over to the Mac part now, where I'm going to show you how exactly you can use the iOS SDK for the Watson Developer Cloud to use Speech to Text. So, welcome back to the Mac part; now I'm going to show you how you can integrate the IBM Watson Speech to Text frameworks into your project. First of all, you're going to want to go to the iOS SDK GitHub repo for the IBM Watson Developer Cloud. This is where you'll find the documentation, the source code for the iOS SDK, and so on. As you can see, my internet is not exactly cooperating with me right now, but it should load eventually, though at a snail's pace. When you scroll down, you'll see a link to the quick start guide. Click on it, and it will bring you to a page that shows you how to create a little sample application. You're going to want to read that document completely; I actually made quite a few mistakes because I didn't read it completely, but as long as you do, you should have no trouble. Let's just pretend the quick start guide opened, okay? There will be a link to it in the description, I promise. There are quite a few code examples in the repo, but the quick start guide is what really explains how you can incorporate the APIs into your application.
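For context, once the SDK frameworks are integrated per the quick start guide, talking to Watson from the iOS side comes down to importing the Speech to Text framework and handing it the service credentials Bluemix gives you. A minimal sketch, with placeholder credentials (the real values come from the Speech to Text service you create on Bluemix), might look like this:

```swift
import SpeechToTextV1

// Placeholder credentials; use the username and password from the
// Speech to Text service credentials in your Bluemix dashboard.
let speechToText = SpeechToText(username: "your-username", password: "your-password")
```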
So you're going to want to read that document completely, incorporate the APIs into your application, and once you have integrated them, you can continue with this tutorial. Alright, I hope you're back, so let's begin. First of all, in my example app, as you can see, I have my frameworks added: Alamofire, Freddy, ObjectMapper, RestKit, SpeechToTextV1, and Starscream. I do not need Text to Speech; I don't know why I included that, so we can just delete it.

Continuing, though: how does the interface of the app look? Let's start with that. It's an extremely simple interface, so there's not much to explain. There's a multiline label at 22-point font size (you can use any font size you'd like), and there's a start/stop recording button. It's that simple. Now, what you're going to want to do with this button is set its alpha to 0.5 by default. The reason is that we want the button to be dim when we're not listening to the user, when it's not recording, brighter while we're actually listening, and then dim again when we're finished. That's why we set the alpha, which is basically the opacity, to 0.5.

Under that, as you can see, the button has a Touch Up Inside action connected to the startStopRecording IBAction, and the UILabel has a referencing outlet; the name of the IBOutlet is transcribedLabel. That's how simple that is.

Now, the code. I'll explain the IBOutlet first: it's just that UILabel, transcribedLabel, and it's where we're going to show our output. Then we create the recorder variable, an AVAudioRecorder, and I'm declaring it as a class-scope variable so that we can set it up in the viewDidLoad function and then actually record with it in an IBAction.

Now, in the viewDidLoad function, as you can see, we have a bunch of code, and I'm just going to skim through it. First, we look up the documents directory, set the file name that the recorder will save to, and build the file path itself. Then we get the shared instance of AVAudioSession, set up the settings for the WAV file we're going to record, and try to set the category of the session to AVAudioSessionCategoryPlayAndRecord. Next, we try to set the recorder to an AVAudioRecorder with its URL set to that file path, meaning it will save its final result there, and give it the settings we need. Finally, we make sure the recorder is actually set up; if it is not, we just return out of the function and we're done.
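As a rough sketch, the viewDidLoad setup being described here (in the spirit of the IBM Watson iOS SDK example code the video borrows from) might look something like the following; the file name, sample rate, and channel count are assumptions, and some API names differ slightly between Swift versions:

```swift
import UIKit
import AVFoundation

class ViewController: UIViewController {

    @IBOutlet weak var transcribedLabel: UILabel!
    var recorder: AVAudioRecorder!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Build a file path in the documents directory for the recording
        // (the file name here is an assumption).
        let documents = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0]
        let filePath = URL(fileURLWithPath: documents).appendingPathComponent("SpeechToTextRecording.wav")

        // Linear PCM settings so the recorder produces a WAV file;
        // the exact sample rate and channel count are assumptions.
        let settings: [String: Any] = [
            AVFormatIDKey: Int(kAudioFormatLinearPCM),
            AVSampleRateKey: 44100.0,
            AVNumberOfChannelsKey: 1
        ]

        let session = AVAudioSession.sharedInstance()
        do {
            // Allow both playback and recording on the shared audio session.
            try session.setCategory(AVAudioSessionCategoryPlayAndRecord)
            // The recorder saves its final result at the file path above.
            recorder = try AVAudioRecorder(url: filePath, settings: settings)
        } catch {
            print(error)
            return
        }

        guard recorder != nil else { return }
        recorder.isMeteringEnabled = true
        recorder.prepareToRecord()
    }
}
```

Keeping recorder at class scope is what lets the IBAction further down start and stop the same recorder that viewDidLoad configured.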
And if it is set up, then we continue and prepare the recorder to record: we set its metering-enabled property to true and tell it to prepare to record. One more thing really quickly, though: most of this code was actually taken from the iOS SDK's example code. I didn't want to set up the recorder from scratch, and since it was already built for me, I took a bit of that open source code and put it in here. The power of open source is amazing. So yes, this is from the IBM Watson example code, and of course I will give a link to that in the description.

Then we have the IBAction called startStopRecording, which takes a parameter called sender, a UIButton. With this parameter, we can set the alpha value of the button to 1 when we start recording and back to 0.5 when we're done.

First we check whether the recorder is not recording, meaning if !recorder.recording; that means we're not currently recording and the button is dim. In that case, we set a session constant to the shared instance of AVAudioSession, try to set that session to active, call recorder.record, and set the sender's alpha to 1. Otherwise, in the else branch, we are currently recording and the user clicked the button because they want to stop. So we call recorder.stop, which stops recording the user's voice, set the sender's alpha back to 0.5 so it's dim again, and then take the shared instance of AVAudioSession once more and set it to not active, meaning false.

Then we create a new Speech to Text instance by saying let speechToText equals SpeechToText with our username here and, of course, the password, which is this password here. Next, we set a settings constant to the transcription settings and set the content type to .wav. It's the easiest type I could use, because it's a bit harder to get iOS to record in other formats; WAV was one of the easiest I could find, since that's what the default example code already used.

Then we create a failure closure. I don't believe this is required anymore, but they haven't updated the Speech to Text API in the iOS SDK yet, so the failure variable still has to remain. Basically, we're creating a block of code that takes an NSError and prints that error, so it's a function stored in a constant. And then we call .transcribe on that speechToText constant; we're taking the SpeechToText class and running its .transcribe function, and essentially it uses the recorder's URL.
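Putting those pieces together, a minimal sketch of that startStopRecording action might look roughly like this. It assumes the older SpeechToTextV1 API described in the video, where transcribe takes the file URL, a TranscriptionSettings value, a failure closure, and a success closure; exact parameter labels and property names have shifted across SDK and Swift versions, and the credentials are placeholders:

```swift
import AVFoundation
import SpeechToTextV1

// Continuing inside the same view controller as the viewDidLoad sketch above.
extension ViewController {

    @IBAction func startStopRecording(_ sender: UIButton) {
        if !recorder.isRecording {
            // Not recording yet: activate the audio session, start recording,
            // and brighten the button.
            do {
                try AVAudioSession.sharedInstance().setActive(true)
                recorder.record()
                sender.alpha = 1.0
            } catch {
                print(error)
            }
        } else {
            // Currently recording: stop, dim the button, and deactivate the session.
            recorder.stop()
            sender.alpha = 0.5
            do {
                try AVAudioSession.sharedInstance().setActive(false)
            } catch {
                print(error)
            }

            // Placeholder credentials from the Bluemix Speech to Text service.
            let speechToText = SpeechToText(username: "your-username", password: "your-password")

            // Tell the service we're sending a WAV file.
            let settings = TranscriptionSettings(contentType: .wav)

            // The SDK version used in the video still expects a failure handler.
            let failure = { (error: NSError) in
                print(error)
            }

            // Send the recorded file to Watson and show the latest transcript.
            speechToText.transcribe(recorder.url, settings: settings, failure: failure) { results in
                if let transcription = results.last?.alternatives.last?.transcript {
                    DispatchQueue.main.async {
                        self.transcribedLabel.text = transcription
                    }
                }
            }
        }
    }
}
```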
We take the file from the recorder's URL, and .transcribe also takes the settings I gave it and, of course, the failure closure. Then all it does is hand back the results. This is an asynchronous call, which is why we have that little block of code, the completion handler. So I take the results from that asynchronous call and check whether results.last.alternatives.last.transcript exists. If it does, it goes into a constant called transcription, and then the transcribed label's text gets set to that transcription. And that is basically how the start/stop recording action works. Again, this was also mostly taken from, let's see if this loads now. No, my internet is still not cooperating with me, I don't know why. But anyway, continuing: this code was mostly taken from, though heavily modified, the IBM Watson iOS SDK example code.

So now we should be able to run this and see our reward, our app running. I'll meet you there. Alright, as you can see, the app is running on the phone. There's a hidden little label over there, and there's a button over here that will let us start and stop recording. Right now it's a bit dim, but as soon as I tap it, it gets much brighter, and when I tap it once more, it becomes dim again and we're done recording. So let's try transcribing something, shall we? I'll transcribe something really simple for this first time at least: "I am currently recording a YouTube video and using the IBM Watson Speech to Text service." You can see that button getting dimmer and brighter, and now it's dim again because it's not listening to us. In just a second this should go off to IBM Watson, and as you can see: "I am currently", the "am" sort of gets stuck here with the IBM Watson Speech to Text SDK, but they're working on it, so that should be resolved soon, I assume. "I'm currently recording a YouTube video and using the IBM Watson Speech to Text service." That's how the app works, and now we can zoom out from there and get back to the Mac part.

Alright, so that was it for the video today. I really hope you enjoyed it, and if you did, please make sure to leave a like down below, and even subscribe if you really like my content and want to see more of it. You can also share my video with your friends or family or really anyone you think this would help. If you have any questions, suggestions, or feedback, you can email them to me at tajymany@gmail.com or tweet them to me at @tajymany. One more place you can contact me is down in the YouTube comments below. That's going to be it for this tutorial. Hope you enjoyed. Goodbye.