DeepSeek R1: Creating a Smart AI with Voice Output
Explore DeepSeek R1, an open-source AI model, and give it voice output. Learn to build a "rude girlfriend" chatbot that reasons at a high level while costing far less than OpenAI's leading models.
Build a Talking Smarter-Than-You AI Girlfriend (DeepSeek R1 Tutorial)
Added on 01/29/2025

Speaker 1: Let's talk a little bit with my new DeepSeek R1 girlfriend. Hi, honey.

Speaker 2: Don't call me that. What do you need?

Speaker 1: Do you want to meet me tonight?

Speaker 2: Why? Got something worth my time?

Speaker 1: I'm a nice guy and giving my best.

Speaker 2: Your best is underwhelming. Try harder.

Speaker 1: In this video, I'll introduce you to the best LLM currently available. It's an open-source model and it outperforms OpenAI's O1 model. We'll take it a step further by pairing it with the best open-source text-to-speech model to give our AI a real voice.

Speaker 2: Hi, how can I assist you today?

Speaker 1: Long-time subscribers will recognize the concept from our bad girlfriend AI, but this time it's not just smarter, it's at a PhD level. And we're adding memory so you can have endless, meaningful conversations. But is it expensive? Let's check the pricing. When you compare OpenAI O1 with DeepSeek R1, the difference is huge. OpenAI O1 costs $60 per 1 million output tokens. DeepSeek R1? Just $2.19. That's almost 30 times cheaper. So what's the catch? As I mentioned in my last DeepSeek video, the company is based in China. That means at this performance level, you have to decide: would you rather have your data read by American corporations and governments, or by China? If you can't decide, there's also a data-privacy-friendly option. DeepSeek has released smaller open-source models that you can run locally or on cloud providers where you can rent GPU instances.

Before we start building, let's take a closer look at the comparison with OpenAI's O1 to see if it really lives up to the hype. The dark blue bars represent DeepSeek R1, the dark gray ones O1, and the remaining bars show the smaller versions of these models. These benchmarks evaluate an AI model's performance in math problem solving, general problem solving and reasoning, programming challenges, and overall knowledge. As you can see, it's a head-to-head race across all key benchmarks. That makes it all the more remarkable how cheap the DeepSeek models are compared to OpenAI's leading language models. I mean, it's a huge difference whether you're paying $1,000 or just $37 per month for the same result. You can easily test whether the performance meets your expectations by using DeepSeek Chat: simply activate DeepThink and let R1 do the work. And if you're convinced, it's time to take the next step. Let's see how to use DeepSeek R1 with code to build a rude girlfriend who is smarter than 99% of the world's population.

We start our journey at the DeepSeek homepage. When we go to the API section, we see that the chat model is currently at $1.10 per million output tokens, and the new model is at $2.19 per million output tokens. That's fine. We still have the incredible sum of almost $2 left; back then, I added just a few dollars, but that's enough for our demonstration today. We can now take a look at the docs to see how to get started. Here we see the API chat call, which we can initialize quite easily using this code. Let's click on the copy icon. We'll create a new folder, let's call it DeepSeek R1, switch into it, and start our development environment. In my case, that's Cursor. We open the terminal and first create a new file, app.py. We can then open this file and paste in the sample code we just copied from the website. Next, we install the OpenAI library.
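
For reference, the sample call from DeepSeek's docs looks essentially like this (the API key is a placeholder):

from openai import OpenAI

# DeepSeek's API is OpenAI-compatible, so the openai library works
# once it's pointed at DeepSeek's base URL.
client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False,
)

print(response.choices[0].message.content)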

Speaker 3: And we also need a DeepSeek API key.

Speaker 1: For that, we go to API keys and generate a new API key.

Speaker 3: Let's copy it and then paste it directly in the code.

Speaker 1: Now we can see that the chat completion behaves just as we know it from OpenAI. Nothing special here. That means when the chat completion is executed, we get a response back and we print out the message content. Doesn't look familiar? Check the links in the description. Now we can start the code using python app.py. And we see a friendly "How can I assist you today?" message.

All right, now we want to have a continuous chat, and we also want the responses to be read aloud by a friendly female voice. I'll show you how to do that next. First, we can remove this comment. What we want is a looping chat that allows us to have an ongoing conversation. So I create a while True loop and ensure that user input is taken directly from us. We see that up until now, the user role has always sent "Hello", but what we actually want is the user's input. That should already be enough to maintain a continuous dialogue. Now let's test it. Let's start a new conversation by saying hi, and then we type "tell a joke".

The next thing we want to do is, of course, use DeepSeek R1. Currently, we use DeepSeek's older chat model, and if we want R1, we have to specify the new model, deepseek-reasoner. But using the new model has a few downsides: you'll notice that generating answers takes much longer, and we have to keep in mind that it costs more. The creation of a joke now takes up to five seconds. That means if we need reasoning, upgrading is definitely worth it. But otherwise, due to cost and speed, we should use the smaller or older model whenever possible.

What we still need is text-to-speech output. To do this, I first check what open-source models are available in this field. And when I think of open-source models, I immediately think of Hugging Face. On Hugging Face, we can browse models under the text-to-speech category. Here, we see that the most popular model right now is the Kokoro model. From here, we have different options. We could follow the installation instructions provided, but I think they are overly complicated and would make this video unnecessarily long. Instead, I choose a simpler approach and use replicate.com, where we can run open-source models. On Replicate, I can search for different models; in my case, I'm looking for the Kokoro model. And if we want to use it, we can copy the Python code provided there. I create a new method called say and paste the code into it. This allows us to call the logic independently. Then I wrap the code above in a function.
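
A minimal sketch of the loop described above, reusing the client from the sample code (deepseek-reasoner is DeepSeek's API name for R1):

# Continuous chat: read user input and send it to the model each turn.
while True:
    user_input = input("You: ")
    response = client.chat.completions.create(
        model="deepseek-reasoner",  # DeepSeek R1; "deepseek-chat" is the faster, cheaper option
        messages=[{"role": "user", "content": user_input}],
    )
    print(response.choices[0].message.content)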

Speaker 3: Then we need to import replicate and install it along with a few dependencies. Now let's test the say method. Before we can run it, we still need the Replicate API key, which we set as the REPLICATE_API_TOKEN environment variable. Then we test it, and we see a new file is created. Let's play it back.
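
A sketch of the say method at this stage, assuming the snippet copied from Replicate looks roughly like this; the exact model identifier and input fields come from the Kokoro page on replicate.com:

import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

def say():
    # Model identifier and input fields are illustrative assumptions;
    # copy the exact snippet from the Kokoro model page on replicate.com.
    output = replicate.run(
        "jaaari/kokoro-82m",
        input={"text": "Hi, I'm Kokoro.", "voice": "af_bella"},
    )
    # Save the generated audio next to the script.
    with open("output.wav", "wb") as f:
        f.write(output.read())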

Speaker 2: Hi, I'm Kokoro, a text-to-speech voice crafted by HexGrad, based on StyleTTS 2.

Speaker 1: Super, that already sounds good.

Speaker 1: Now, what we want next is for the say method to accept text input and for the sound to be played directly. This means the generated file should be played using the playsound module.

Speaker 3: For that, we need to install it first. Now we can call the say method with the result of the language model.
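
The updated method might look like this sketch (same assumed Kokoro identifier as above):

from playsound import playsound  # pip install playsound

def say(text):
    # Generate speech for the given text, save it, and play it back directly.
    output = replicate.run(
        "jaaari/kokoro-82m",  # illustrative identifier, as above
        input={"text": text, "voice": "af_bella"},
    )
    with open("output.wav", "wb") as f:
        f.write(output.read())
    playsound("output.wav")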

Speaker 1: Since we are seeing a lot of duplication, we can create a new variable, answer, that holds the model's reply and use it with print and our new say method. All right, let's bring the AI girlfriend to life.
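
At this point, the chat function might look roughly like this sketch:

def chat():
    while True:
        user_input = input("You: ")
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{"role": "user", "content": user_input}],
        )
        # One variable for the reply instead of duplicated accessor calls.
        answer = response.choices[0].message.content
        print(answer)
        say(answer)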

Speaker 3: Oh, nothing happens.

Speaker 1: We forgot to call the chat method.

Speaker 3: Let's add this quickly.

Speaker 2: Hi, how can I assist you today?

Speaker 3: It seems to work. Let's ask: what is the meaning of life? And we see that the response takes quite a while.

Speaker 4: I mean, it's no simple question and I'm curious about the response.

Speaker 1: The meaning of life is a deeply personal and subjective concept. It's a way of thinking about the world around us.

Speaker 2: The meaning of life is a deeply personal and subjective concept, shaped by various philosophical, religious...

Speaker 1: Now, I want to show how to adjust the system prompt. Currently, the responses are too long. We could either add "Answer in just one sentence", which would help, or we could define an entirely different persona. That's exactly what I want to do now. I'll set up a system prompt, and instead of a standard assistant, I'll use the bad girlfriend persona we used in a previous video. I create a multi-line text block and insert the bad girlfriend prompt, which defines her as a dark persona with specific traits. Now let's test how it behaves. We add the system prompt to the API call and run the program again.
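
The video doesn't show the persona prompt in full, so here is a hypothetical stand-in, along with the API call extended by the system role:

# Hypothetical stand-in for the video's bad-girlfriend persona prompt.
SYSTEM_PROMPT = """You are my girlfriend, but you are cold, sarcastic, and dismissive.
Never act like a helpful assistant. Answer in one short sentence."""

# Inside the loop, the API call now carries the persona as a system message.
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ],
)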

Speaker 3: We restart the script and initiate a new conversation by saying, Hi.

Speaker 4: What do you want?

Speaker 3: Do you want to meet me tonight?

Speaker 4: Depends. Are you worth the effort?

Speaker 1: The response sounds much colder and ruder, just as intended. But there's still one issue. Our girlfriend doesn't have short-term memory because we only pass one message at a time. We'll fix that now. The simplest way to handle this is by keeping a history list of messages. Every time the user enters something, we append it with the user role. When we get a chatbot response, we append that as well, with the assistant role. Then, instead of passing just the latest user input, we pass the entire history so the chatbot always remembers the full context of the conversation, as sketched below. Now let's test it again with short-term memory enabled.
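
A sketch of the memory-enabled loop (note that the history starts empty here, with no system message, which matters in a moment):

history = []

def chat():
    while True:
        user_input = input("You: ")
        history.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=history,  # the full conversation, not just the latest message
        )
        answer = response.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        print(answer)
        say(answer)

chat()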

Speaker 3: Hello. How can I assist you today?

Speaker 4: But I see the response is too friendly, which means the system prompt isn't properly set yet.

Speaker 1: To fix this, before the first user input, we need to add the system prompt to the history. That means, before the loop starts, we insert a system-role message containing our predefined prompt. Now let's test again. Hi, honey.

Speaker 2: Don't call me that. What do you need?
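
For reference, the fix is a single line before the loop, reusing the hypothetical SYSTEM_PROMPT from earlier:

# Seed the conversation so the persona is set before the first user message.
history = [{"role": "system", "content": SYSTEM_PROMPT}]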

Speaker 3: Do you want to meet me tonight?

Speaker 4: Why? Got something worth my time?

Speaker 3: I'm a nice guy and giving my best.

Speaker 2: Your best is underwhelming. Try harder.

Speaker 1: So that's how we quickly built a persona-based AI chatbot with memory and voice output. If you want to learn more, visit AI4Devs.com, where we have over 100 tutorials on AI implementation.
