
How to Clone a Voice on Your PC with VoiceBox (Full Transcript)

A quick walkthrough of VoiceBox: create voice profiles from short samples, generate TTS with modern models, use stories, effects, and export audio locally.

[00:00:00] Speaker 1: This is what my real voice sounds like and this is my AI cloned voice. Hi, this is Kevin's AI voice. You probably can't tell the difference. I was able to clone it in under 30 seconds. I'll show you how to clone any voice on your PC with no subscriptions and no complicated setup. All you need is a short voice sample and you can generate new speech in just seconds. Let's dive in. To do this, I'm using a free and open source app called VoiceBox. You'll find a link right here at the bottom of the screen. VoiceBox uses some of the best AI voice models behind the scenes, including Qwen3-TTS and Chatterbox. On the website, click download, choose your operating system, and then run through the install. Once you finish downloading and installing VoiceBox, you'll land here in the main interface. Now, there's no complicated setup and everything runs directly on your PC. Let's start by creating our first voice. Over on the left-hand side, make sure to click on this icon and right here in the center of the screen, you'll see a large button that says create voice. Let's click on that. You have a few options here. You can upload audio. You could also record your own voice or right over here, you could even capture system audio. For now, I'm going to record my own voice. Now, as a note, you can record up to 30 seconds. I found that the closer you get to 30 seconds, typically the better the quality, but it still works even with just a few seconds. I'll click here and then record. Hi, my name is Kevin Stratvert and today... Once you finish recording, right down below, click on transcribe. If this is your first time, it'll need to download a model to process your audio first and right down below, I can see everything that I said in my sample recording. Now, over on the right-hand side, you can give this voice a name. I will call mine Kevin because this is my voice. Once you type that in, in the bottom right-hand corner, we can now create a voice profile. 
I'll click here. Now, this is the fun part. This is where we can turn text into speech. Right up on top, make sure to select the voice that you would like to use. I'll select Kevin, the voice that I just created. Then, down below, type in the text. So over here: "The Kevin Cookie Company has the best cookies in the world." Over here, you can choose the model. Now, Qwen3-TTS is a brand-new model that works exceptionally well, so I recommend going with this, but you do also have a few other options. Once you finish typing in your text, right over here, click on generate. After a few moments, you can play it back. Let's listen to how it turned out. The Kevin Cookie Company has the best cookies in the world. Sounds pretty good. If you like the result, in the top right-hand corner, you can click on the three dots and over here, you can export the audio. You can also try regenerating if you want to try again and down below, you can also delete. Now, let's try one more example. Let's now create a new voice. This time, instead of recording the voice, let's try uploading some audio. I'll click on upload and then click on choose file and there I see my file. I'll double-click on that and then let's click on transcribe and I see all the text down below. Now, over on the right-hand side, just like we did before, let's type in a name for this voice. I'll call it the narrator and then click on create profile. Right up on top, you'll now see that we have two voices. I'll select the new voice and down below, I can type in text that I would like that voice to narrate. Then, over here, let's generate some speech. Once it's all done, let's play it back. Very cool. VoiceBox also has some really powerful features. Over on the left-hand side, click on the stories icon. Right up on top, let's add a new story. I'll click here and I can give the story a name. I'll call this the cookie question. Then, let's click on create. 
In the bottom left-hand corner, we can now type in the first line of the story. I'll enter in the question, who makes the best cookies? Then, down below, you can also choose the speaker for this line. I'll put it down as Kevin and over here, we can now generate the speech. Now, I could enter in another line. Here, I'll type in the next line. Why, of course, it's the Kevin Cookie Company and down below, I can now choose the speaker. I'll set it to the narrator and then click on generate. At the very bottom, we can now play back the entire conversation.

[00:03:59] Speaker 2: Who makes the best cookies?

[00:04:00] Speaker 1: Why, of course, it's the Kevin Cookie Company. Now, here are some of the cool features. It's a timeline, so you can move different items around. You can adjust the timing and up on top here, too, you can also shift the position of the story. Once you're all done, in the top right-hand corner, you can export the audio. This combines all the lines from each speaker into just a single file. Let's now take a look at some of the other functionality. Over on the left-hand side, we can click into voices and here you can review all the different voices that you've created. If we click on effects, here you can see different effects that you can apply to voices. For example, if you want to make a voice sound more robotic. In other words, if you want to sound like me. You also have other options like radio, echo chamber, and deep voice and up on top, you can even create your own. Over on the left-hand side, here you can also configure the different audio devices, like which microphone you're using or which speakers you're using. Over on the left-hand side, we can click here and you can also look at which models are being used behind the scenes to generate these voices. Lastly, let's click into advanced settings. To speed things up, up on top, click on GPU and if you have a dedicated graphics card, you can select that. By default, it'll run on your CPU. For the best results, here are a few quick tips. Make sure to use a clean audio sample with minimal background noise. Try to record in a quiet environment and make sure to speak clearly. Longer samples usually produce the best results, especially if they're closer to 20 to 30 seconds long. Also, if the output doesn't sound quite right, try generating it again or slightly adjusting your input text. And now, you can clone any voice for free directly on your PC. You just want to make sure that you use this with voices that you have permission to use. 
Let me know what you think about this tool right down below in the comments and I'll see you in the next video.
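The video doesn't show how VoiceBox builds its single-file story export, but the idea of placing each generated line at a point on a timeline and combining everything into one file can be sketched with Python's standard library. This is a minimal illustration, not VoiceBox's code, and it assumes each line is a mono 16-bit PCM WAV file, all at one shared sample rate, with a start time in seconds. The helper names `read_mono16` and `export_story` are hypothetical.

```python
import array
import sys
import wave


def read_mono16(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected mono 16-bit PCM")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, samples


def export_story(clips, out_path):
    """Mix (path, start_seconds) clips into one WAV; return its length in seconds."""
    if not clips:
        raise ValueError("a story needs at least one clip")
    rate = None
    placed = []  # (start offset in samples, samples) for each timeline item
    for path, start in clips:
        if start < 0:
            raise ValueError("start times must not be negative")
        clip_rate, samples = read_mono16(path)
        if rate is None:
            rate = clip_rate
        elif clip_rate != rate:
            raise ValueError("all clips must share one sample rate")
        placed.append((int(round(start * rate)), samples))

    # The output runs until the last clip ends; gaps stay silent.
    total = max(offset + len(samples) for offset, samples in placed)
    mix = [0] * total
    for offset, samples in placed:
        for i, value in enumerate(samples):
            mix[offset + i] += value  # overlapping lines are summed

    # Clamp the summed signal back into the 16-bit range before writing.
    out = array.array("h", (max(-32768, min(32767, v)) for v in mix))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(out.tobytes())
    return total / rate
```

For example, `export_story([("line1.wav", 0.0), ("line2.wav", 1.8)], "cookie_question.wav")` (hypothetical file names) would place the narrator's reply 1.8 seconds into the story. Summing and clamping is the simplest way to handle overlaps; a real editor might normalize levels or crossfade instead.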

AI Insights
Summary
The transcript demonstrates how to clone a voice locally on a PC using a free, open-source app called VoiceBox. The speaker explains downloading and installing the app, creating a voice profile from a short recording or uploaded audio (up to ~30 seconds), transcribing the sample, naming the voice, and generating text-to-speech using models like Qwen3-TTS. It also covers exporting generated audio, creating multi-speaker “stories” with a timeline interface, applying voice effects, selecting audio devices, viewing the underlying models, and speeding up generation by selecting a GPU. The speaker ends with tips for best quality (clean, quiet recordings, clear speech, longer samples) and a reminder to only clone voices with permission.
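The video doesn't explain how VoiceBox's effects (robotic, radio, echo chamber, deep voice) work internally. As a generic illustration of one of them, an echo can be produced by adding delayed, progressively quieter copies of a signal back onto itself. The sketch below is a plain multi-tap delay, not VoiceBox's implementation; it assumes float samples in the range -1.0 to 1.0, and the function name `add_echo` and its default delay, gain, and tap count are hypothetical choices rather than VoiceBox settings.

```python
def add_echo(samples, sample_rate, delay_s=0.25, gain=0.5, taps=3):
    """Return `samples` plus `taps` delayed copies, each `gain` times quieter.

    `samples` are floats in [-1.0, 1.0]. The output is longer than the
    input so that the final echoes are not cut off.
    """
    delay = int(round(delay_s * sample_rate))
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    out = [0.0] * (len(samples) + delay * taps)
    for tap in range(taps + 1):  # tap 0 is the original (dry) signal
        weight = gain ** tap
        offset = tap * delay
        for i, x in enumerate(samples):
            out[offset + i] += weight * x
    # Clamp so overlapping copies stay inside the valid float-audio range.
    return [max(-1.0, min(1.0, y)) for y in out]
```

With a 44.1 kHz clip, calling `add_echo(dry, 44100, delay_s=0.3, gain=0.4)` on a list of samples named `dry` would add three repeats 0.3 seconds apart. A feedback delay (feeding the output back into itself) gives a longer, smoother tail and is a common alternative.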
Title
Clone Any Voice Locally with VoiceBox (No Subscription)
Keywords
VoiceBox
voice cloning
AI voice
text-to-speech
TTS
Qwen3-TTS
Chatterbox
open source
local PC
GPU acceleration
audio transcription
voice effects
multi-speaker stories
export audio
privacy
Key Takeaways
  • VoiceBox is a free, open-source tool that runs locally on your PC for voice cloning and TTS.
  • You can create a voice from a short recording, uploaded audio, or captured system audio; ~20–30 seconds usually yields better quality.
  • Workflow: record/upload → transcribe → name voice → create profile → enter text → choose model (e.g., Qwen3-TTS) → generate → export.
  • You can regenerate outputs, manage multiple voices, and export generated speech as audio files.
  • The Stories feature enables multi-speaker dialog on a timeline with adjustable timing and single-file export.
  • Voice effects (robotic, radio, echo, deep voice, custom) can be applied to alter output.
  • Performance can be improved by enabling GPU in advanced settings.
  • Best results come from clean, low-noise samples and clear speech; if the output sounds off, regenerate it or slightly tweak the input text.
  • Only clone voices you have permission to use.
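Before creating a profile, it can help to check a sample's length against the guidance above: VoiceBox records up to 30 seconds, and samples of about 20 to 30 seconds usually sound best. This is a minimal Python sketch using only the standard library `wave` module, assuming the sample is an uncompressed WAV file. The helper names `wav_duration` and `check_sample` are hypothetical, and applying the 30-second cap to uploaded files is an assumption, since the video only states it for recordings.

```python
import wave

MAX_SECONDS = 30.0     # recording limit mentioned in the video
IDEAL_MIN_SECONDS = 20.0  # the video suggests 20 to 30 seconds works best


def wav_duration(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_sample(path: str) -> str:
    """Describe whether a voice sample's length fits the video's guidance."""
    seconds = wav_duration(path)
    if seconds > MAX_SECONDS:
        return f"too long ({seconds:.1f}s): trim it to 30s or less"
    if seconds < IDEAL_MIN_SECONDS:
        return f"usable ({seconds:.1f}s), but 20-30s samples usually sound better"
    return f"good length ({seconds:.1f}s)"
```

Running `print(check_sample("my_voice.wav"))` on a hypothetical recording prints a one-line verdict. Length is only one factor; the tips about a quiet room and clear speech still apply.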
Sentiments
Positive: The tone is enthusiastic and instructional, emphasizing ease of use, free/open-source access, and “fun” features, with a brief caution about permission-based ethical use.