Exploring IBM Watson SDK: Speech and Emotion Analysis
Ryan showcases the IBM Watson Unity SDK, focusing on speech-to-text, text-to-speech, and the Tone Analyzer for interacting with digital avatars.
IBM Watson Unity SDK - Speech to Text; Text to Speech and Tone Analyser
Added on 01/29/2025

Speaker 1: Hi there, it's Ryan. I've done a little work here with the IBM Watson Unity SDK, focusing on the speech-to-text and text-to-speech services. The text-to-speech uses some of the expressive speech we saw in the last video, but I've also added back in the Tone Analyzer service. My utterances, and also the speech of the digital human, will have emotion signals extracted from them, and the emotion orbs over the digital human's shoulder will inflate accordingly. So let's jump in and give it a shot. Welcome. Thank you. What do you think of the color yellow?

Speaker 2: Oh yes, the color of daisies. It makes me feel delighted to be alive. Good things happen with such a bright color. You sound super happy. That's so nice. You bring me great joy as well.

Speaker 1: Reset. What about the blue, the color blue? The color of the sky. I am a little under the weather.

Speaker 3: Are you sad? Tell me about quantum physics. I don't really know how to answer that.

Speaker 1: That's okay. Reset emotion. I'm feeling angry. Are you feeling angry? You should feel angry. Reset. Are you afraid and terrified? Reset. Are you disgusted and feel that it's all gross and disgusting? Reset. Make yourself tiny. Make yourself big as an elephant.

Speaker 3: Reset. Okay, should we say goodbye? Bye bye. Hang on. Thank you.

Speaker 1: Okay, so we just stopped it there. So the code is here. Most of the expressive speech was done in the prior video; what's new is weaving in the Tone Analyzer service. We added some of the basic pieces here, and the key parts for tone are up top, of course. Again, once we get to this point in the loop, the speech recognition event is our key jumping-off point for doing things, such as triggering when we hear certain words. This is a fairly hard-wired way of coding it, just to demo it. For example, when we hear the word "reset", we rescale the digital human, the Mackenzie Ethan avatar: if it has grown larger or smaller, we just reset the local scale to five. Similarly, we reset all of the orbs to their original sizes, which you saw me do. If we hear "mouse" or "tiny", we scale down to two, and if we hear "elephant" or "giant", we scale up to ten. Again, hard-wired. Here are the transcripts where we trigger off certain words to choose the lines spoken by the digital human. In this case, the color yellow doesn't carry much tone or emotion by itself, but it triggers the expressive text-to-speech to talk about delight, being alive, and good things. The system is still listening to that speech, and in turn the Tone Analyzer runs on it. When a tone level exceeds the threshold, which I think is 0.75 here in the code, it pumps up that little emotion balloon a bit further, which is exactly what you saw. So this is just an example of how you can have two counterparties interacting, in this case an actual human (me) and a digital avatar. Later on, I'm going to try to get multiple agents talking to each other, and you should be able to see their emotional states changing and reacting accordingly. I hope you have found this helpful, and thanks for your time.
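The hard-wired keyword handling described above can be sketched in Python as a single handler called once per final speech-to-text result. This is a minimal illustration, not the project's code: the real demo is C# in Unity using the Watson SDK, and the names `Scene`, `on_final_transcript`, and `SCRIPTED_REPLIES` are hypothetical. The scale values (reset to 5, tiny to 2, giant to 10) come from the narration; reading the trigger words as the pairs "mouse"/"tiny" and "elephant"/"giant" is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Scale values from the narration; every name here is hypothetical.
DEFAULT_SCALE = 5.0   # "reset" restores the avatar's local scale to five
TINY_SCALE = 2.0      # triggered by "mouse" or "tiny" (assumed keyword pair)
GIANT_SCALE = 10.0    # triggered by "elephant" or "giant" (assumed keyword pair)

# Hypothetical keyword -> scripted line table, using lines heard in the demo.
SCRIPTED_REPLIES: Dict[str, str] = {
    "yellow": ("Oh yes, the color of daisies. It makes me feel delighted to be alive. "
               "Good things happen with such a bright color."),
    "blue": "The color of the sky. I am a little under the weather.",
}


@dataclass
class Scene:
    """Hypothetical stand-in for the Unity scene: avatar scale plus emotion-orb sizes."""
    original_orb_sizes: Dict[str, float] = field(default_factory=dict)
    avatar_scale: float = DEFAULT_SCALE
    orb_sizes: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Orbs start at their original sizes unless given explicitly.
        if not self.orb_sizes:
            self.orb_sizes = dict(self.original_orb_sizes)


def on_final_transcript(scene: Scene, transcript: str) -> Optional[str]:
    """Apply hard-wired keyword triggers to one recognized utterance.

    Returns a line for text-to-speech to speak, or None if no scripted reply matches.
    """
    # Match whole words so that, e.g., "destiny" does not trigger "tiny".
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    if "reset" in words:
        # Restore the avatar and every orb to their starting sizes.
        scene.avatar_scale = DEFAULT_SCALE
        scene.orb_sizes = dict(scene.original_orb_sizes)
    elif words & {"mouse", "tiny"}:
        scene.avatar_scale = TINY_SCALE
    elif words & {"elephant", "giant"}:
        scene.avatar_scale = GIANT_SCALE
    # A scripted reply can follow any of the above, e.g. "Reset. What about ... blue?"
    for keyword, reply in SCRIPTED_REPLIES.items():
        if keyword in words:
            return reply
    return None
```

For example, `on_final_transcript(scene, "Reset. What about the blue, the color blue?")` restores the avatar to scale 5.0 and the orbs to their original sizes, then returns the "blue" line for text-to-speech. Giving "reset" priority over the size words is one possible choice; the narration does not say how conflicting keywords in a single utterance are resolved.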

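The threshold rule from the narration, which inflates an emotion orb when its tone level exceeds roughly 0.75, can be sketched as a pure function applied to each analyzed utterance, whether the human's or the avatar's. This sketch assumes the Tone Analyzer result has already been reduced to a mapping from tone name to a score between 0 and 1. That input shape, the step size, the size cap, and the name `inflate_orbs` are hypothetical. They are not the project's code or the Watson SDK's API.

```python
from typing import Dict, Mapping

TONE_THRESHOLD = 0.75  # the narrator recalls roughly this value
INFLATE_STEP = 0.25    # hypothetical growth per qualifying utterance
MAX_ORB_SIZE = 3.0     # hypothetical cap so an orb cannot grow without bound


def inflate_orbs(orb_sizes: Dict[str, float],
                 tone_scores: Mapping[str, float],
                 threshold: float = TONE_THRESHOLD) -> Dict[str, float]:
    """Return new orb sizes after one analyzed utterance.

    Each orb whose tone score strictly exceeds the threshold grows by one step,
    up to the cap; tones with no matching orb are ignored.
    """
    updated = dict(orb_sizes)
    for tone, score in tone_scores.items():
        if tone in updated and score > threshold:
            updated[tone] = min(updated[tone] + INFLATE_STEP, MAX_ORB_SIZE)
    return updated


# Illustrative scores only, not real Tone Analyzer output.
orbs = {"joy": 1.0, "sadness": 1.0, "anger": 1.0, "fear": 1.0, "disgust": 1.0}
orbs = inflate_orbs(orbs, {"joy": 0.91, "sadness": 0.40})  # e.g. the avatar's "yellow" line
```

Returning a new dictionary instead of mutating the input leaves it to the caller (whatever drives the Unity orbs) to decide when to apply the change, and a "reset" can simply swap back the original sizes.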