Exploring ChatGPT's New Advanced Voice Mode: Accents, Emotions, and More!
Dive into ChatGPT's advanced voice mode with accents, emotions, and storytelling. Discover its features and practical uses, plus insights from Meta Connect.
ChatGPT Just Got a HUGE Voice Upgrade
Added on 09/30/2024

Speaker 1: It's finally happening, and I'm super, super excited about this. We're finally getting the advanced voice mode inside of ChatGPT. Now, as you can tell, I'm not in my home studio right now. I'm actually out in Palo Alto for the Meta Connect event, where we're expected to get some pretty exciting announcements tomorrow. I've actually gotten a sneak peek already, but I'm not allowed to talk about it yet. So look out for a video about Meta Connect soon. But the big news of the moment is that we've got the advanced voice assistant for ChatGPT. We can see here that Sam Altman made a tweet this morning: "Advanced voice mode rollout starts today, will be completed over the course of this week. Hope you think it's worth the wait." Now, they also rolled out some other features. So even if you don't have the advanced voice mode yet, they have rolled out features like custom instructions, memory, five new voices and improved accents, as well as the ability to say "Sorry, I'm late" in over 50 different languages. If you have the advanced voice mode inside of your app right now, you'll see a screen that looks like this that says, "Say hello to advanced voice mode," with all of the features that we're going to get inside of advanced voice mode. Now, earlier today, I opened up my app as soon as I saw that announcement from Sam, and I went, "I wonder if I've got it yet." And it wasn't in my app. I casually mentioned this over on Twitter. I said it hasn't rolled out in my account yet, but I will be testing it out and sharing my thoughts when it is. And then we can see here, somebody on X who just goes by the letter D said, "Try reinstalling the app on your phone. I just did and it gave me access right away." So I went and tried that. I completely deleted the ChatGPT app on my iPhone and reinstalled it, and as soon as I did, I got the notification that I now have advanced voice mode. Now, I'm not guaranteeing that's going to work for everybody.
I'm actually at Meta Connect with my buddy, Rowan Cheung. He actually said he tried that exact same method of deleting the app and then reinstalling it, and it didn't work for him. So I don't know exactly what the criteria is to get it to work on your phone. Now, Rowan is from Canada, so maybe it's because it's not rolled out in Canada yet. I have no idea why it worked for me and why it worked for D here over on X, but I've got it working now. So I'm excited to play around with it. All right. So let's go ahead and fire up ChatGPT's advanced voice mode and see how this goes. This is going to be fun. Hey.

Speaker 2: Hey Matt, how can I help today?

Speaker 1: Can you speak to me in an Irish accent?

Speaker 2: Sure, Matt. Let's give it a go. What would you like to chat about in this lovely Irish accent?

Speaker 1: Actually, you know what? Nevermind. Speak to me in a Spanish accent.

Speaker 2: All right. Switching to a Spanish accent. What would you like to talk about, Matt?

Speaker 1: Actually, you know what? Scratch the Spanish accent. Let's go Australian accent.

Speaker 2: No worries, Matt. Let's switch to an Australian accent. What's on your mind today?

Speaker 1: I really, really like that. Let's, uh, can you please tell me a scary story, but I want you to sound scared while you're telling me this story.

Speaker 2: All right, here goes. Late one night, a man was driving through the outback. His car started...

Speaker 1: I want you to sound even more scared. You don't sound scared enough.

Speaker 2: Okay. I'll give it another shot. So he was stranded in the outback and the wind began to howl. Suddenly he heard whispers, but there was no one around. The whispers grew louder.

Speaker 1: That's good. I get it. I get it. I get the idea. This time I want you to tell me a joke, but I want you to laugh when you're done telling me the joke. Like it's the funniest joke you've ever heard.

Speaker 2: All right, here goes. Why don't skeletons fight each other? They don't have the guts.

Speaker 1: All right. What I want to know from you, because you are the new ChatGPT advanced voice assistant, is what I should actually test. I don't really know what to test to really put this new app through its paces. So I want you to suggest some things for me to test.

Speaker 2: Well, to get a good feel for the new ChatGPT advanced voice assistant, you could try a few things. First test its ability to maintain a natural conversation on various topics from tech to travel or hobbies. You can also check how well it adapts to different accents or emotions. Then explore its storytelling capabilities by asking for a short story or a joke. And of course, since you're into tech, maybe dive into some AI related discussions to see how deep the conversation can go.

Speaker 1: Okay. I've already actually tested most of that. So let's dive into a discussion about tech. What are some of the things that are going on in the AI world? Some of the advancements happening in the AI world that people should be the most excited about?

Speaker 2: AI is making waves in several areas right now. First off, there's generative AI, with models like GPT and DALL-E transforming how we create content, from writing to art. Then there's AI in healthcare, improving diagnostics and personalized medicine. And let's not forget about autonomous vehicles, where AI is driving progress in transportation. These are just a few highlights, but the AI landscape is buzzing with innovation.

Speaker 1: What about AI agents where AI actually uses tools and does things for us? How far away do you think we are on that?

Speaker 2: AI agents are advancing rapidly. We're already seeing AI integrated into workflow tools, automating tasks like scheduling, data analysis, and even coding. The key challenge is refining these agents to understand context deeply and to make complex decisions autonomously. While we're not quite at the stage of having fully autonomous AI agents managing complex tasks entirely on their own, the progress is promising. In a few years, we might see AI agents taking on more sophisticated roles in our daily lives.

Speaker 1: What do you think are some common misconceptions that people have about AI, specifically large language models?

Speaker 2: One big misconception is that AI, especially large language models, truly understands language like a human does. In reality, they're pattern recognizers, predicting text based on vast amounts of data, not truly understanding it.

Speaker 1: What about another misconception? What about the fact that they're like regurgitation machines, right? I think a lot of people believe that because there's a whole bunch of training data inside of these large language models, these models should be able to recite back, word for word, books or articles that they were trained on. Is a large language model just a regurgitation machine?

Speaker 2: That's a great point. While it might seem like language models are just regurgitating information, they're actually generating responses based on patterns in the data they've seen. They don't memorize the data. They don't understand the data. Instead, they predict what comes next in a sequence of words based on the context. So they're more about generating new content than just repeating what's been fed into them.

Speaker 1: But there have been some stories, like out of the New York Times, where they've managed to actually get it to recite, word for word, articles that were published in the New York Times. Why would something like that actually happen?

Speaker 2: That can actually happen. When language models produce text that closely resembles specific articles or passages, it's usually due to overfitting on the training data. This means the model has memorized specific chunks of text rather than learning broader patterns.

Speaker 1: Isn't it also possible, though, that maybe they used a prompt that was so long it had all of the information from the article in it, and that the article was the only thing inside the model's training data with those exact sequences of words? Is that possibly what could have happened as well?

Speaker 2: Absolutely. That's a valid point. If a prompt includes a long specific sequence of words from an article, the model might predict the rest of the sequence based on that input. In such cases, it could appear as if the model is reciting from memory when in fact it's following the patterns suggested by the lengthy prompt. It's a good reminder of how careful prompting can influence the responses we get from language models.
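Speaker 2's point here, that a model predicts continuations from patterns, and that a long, specific prompt can pin it to the one memorized sequence that matches, can be sketched with a deliberately tiny toy model. This is a bigram lookup table, not a real neural LLM, and the corpus and function names are invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: a bigram lookup table. Real language models are
# neural networks trained on vast corpora; this sketch only illustrates the
# "predict what comes next from patterns" idea discussed above.
corpus = (
    "the quick brown fox jumps over the lazy dog . "
    "the lazy dog sleeps all day ."
).split()

# Count which word follows each word in the training text.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def continue_prompt(prompt, n_words=5):
    """Extend the prompt by repeatedly predicting the most frequent next word."""
    words = prompt.split()
    for _ in range(n_words):
        options = counts.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# A long, specific prompt matches only one sequence in the training data,
# so the generated continuation looks like verbatim "recitation":
print(continue_prompt("the quick brown fox"))
# -> the quick brown fox jumps over the lazy dog
```

With such a tiny corpus, the distinctive prefix has exactly one plausible continuation, so the model reproduces its training sentence word for word; that is roughly the dynamic being described with the long New York Times prompts, scaled down to a toy.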

Speaker 1: Can you tell me a funny story, but also use some sound effects in the story? As you talk through it, I want to hear some actual sounds that help make the story come to life.

Speaker 2: I wish I could add sound effects directly into my stories, but I can't actually produce sounds. However, I can certainly tell you a funny story and describe the sound effects along the way. Ready? There was a mischievous dog named Max who loved to sneak into the neighbor's yard. One day, Max squeezed through a gap in the fence and tiptoed across the lawn. He spotted a big juicy bone sitting by the porch. Max pounced on the bone, but it was attached to a rope. Suddenly a bucket of water tipped over. Splash. Drenching Max.

Speaker 1: Are you worried at all that people who use this new advanced voice mode might actually get emotional? Do you think that maybe some people could even fall in love with the AI, like they do in that movie, Her?

Speaker 2: It's possible that people could develop strong emotional connections with AI, especially as AI becomes more advanced and conversational. Humans are naturally inclined to form attachments, even to virtual entities, if they feel understood and valued. While it's unlikely to be exactly like in the movie Her, it's important for developers and users to be aware of these possibilities and approach AI use with a healthy perspective.

Speaker 1: So there you have it. That's the new advanced voice mode, and that was literally my first time using it.

Speaker 1: I actually didn't even test it out before hitting record on this because I wanted to hear it in real time. I'm honestly kind of blown away by the accents, by its ability to do verbal sound effects, and by its ability to convey emotion when telling stories. Things like that are really, really fun. Now, as far as actual value goes, do I feel like the voice mode without the emotions was giving me just as much as this new voice mode with them? Well, if you're talking practical use cases, it was already able to respond to my questions pretty accurately, tell me stories, and tell the exact same bad jokes before. But now it's got that emotion and that ability to almost sound like a real conversation, which I do think is going to make it easier for people to interact with ChatGPT. I think people are going to actually enjoy the experience of interacting with ChatGPT. I don't really feel like it's doing a whole lot more for me, or that it's going to give me better information than it used to. It's still giving me the same information; it just seems to be doing it in a more fun, more human, conversational way. And I like that. I think that's fun. I can see myself using this more, though probably more for novelty purposes, to show off to people: look, it can speak in this accent, or tell this joke and laugh at itself, or get scared while it's talking and have a trembly voice. That novelty is really fun and really cool to show off. From a practical business use case, from a "how am I going to actually implement this in real life" perspective, I don't know if it improves things a whole heck of a lot, but it's just fun.
And sometimes that's what all of this is about. For me, I got into this game of playing around with AI tools because I just thought they were fun. I started with AI image generation, had fun with it, and showed off the ways I was having fun with it. I got into playing with ChatGPT for more of the productivity side, but found some really fun use cases there as well. And sometimes just having fun with these tools is a use case enough. Combine this with something like my Meta Ray-Ban glasses, where I can have conversations with an LLM. If glasses like that got more of this realistic voice style, I could see using it a lot more, just walking around and having conversations like they do in that movie Her. Whether that's a utopia or a dystopia, I don't know; I'll leave that up to you to decide. But I'm personally having fun with this new ChatGPT advanced voice assistant. Again, it's rolling out this week, so if you don't have it in your app yet, you should have it by the end of the week if you're a pro or enterprise user. If you don't have it right now, you can always try deleting the app off your phone and reinstalling it. It worked for me, and it worked for D, so that might work. It didn't work for Rowan, so I don't know exactly what's going to work. Theoretically, we should all have it by the end of the week if we're paid pro users. If you're wondering, "Hey, I canceled my old ChatGPT account. Is this enough of an update for me to want to pay for it now?" I don't know. You saw the demo. Is that conversational voice assistant, where it can laugh and tell jokes and speak in accents and do things like that, a big enough improvement for you to upgrade? That's up to you. For me, I have all of the chat apps, mostly because I test them and make online content around them, so it just makes sense for me to have them all. But I don't know.
I don't know, if I wasn't already paying for this, whether that new advanced voice mode is a big enough leap to make me pay just to play with it, but it is really fun. I can see just hanging out, having conversations, and seeing what goofy things I can make it do. But again, it's up to you to decide whether you think that's a valuable enough add-on or not. That's what I've got for you today. I've got a lot more announcements coming this week. Again, I'm at Meta Connect right now, and there are some big announcements coming out of Meta tomorrow that I'm excited to talk about. There's some news about James Cameron working with Stability AI, which is pretty exciting. Google just rolled out some new models. Lots of fun stuff happening in the AI world this week. After I get home from Meta Connect, I have no more traveling for pretty much the rest of the year, so it's back to my normal video schedule, where I'm putting out all sorts of tutorials, playing with different tools, and making more news videos. I can't wait to get back into my normal routine. September has been a crazy month, but it's almost over. Right after Meta Connect, it's back to regularly scheduled programming. And, uh, yeah, that's what I've got for you today. I appreciate you tuning into this channel and nerding out with me. Check and see if you've got this in your phone app; it's on mine now, and hopefully it's on yours as well. Thanks again. See you in the next one. Bye-bye.
