Speaker 1: These are the five best open-source text-to-speech softwares that I've come across over the past year. This here is just a quick sample of my voice with a British accent, and this is how I actually sound. So with that, let's go ahead and jump into it. So the first one is going to be Suno's Bark, and Suno's Bark came out about midway through last year, and this one had a really great example in the beginning, and it caught a lot of attention from me and many others, but I'm going to go ahead and play some audio samples from it. So here's their curated example.
Speaker 2: Hello, my name is Suno, and I like pizza. But I also have other interests such as playing tic-tac-toe.
Speaker 1: So that's their example, and if we go ahead and generate that with a different speaker on the Hugging Face space, here is that audio sample for the same sentence.
Speaker 3: Hello, my name is Suno, and I like pizza. But I also have other interests such as playing tic-tac-toe.
Speaker 1: So those are a couple of audio samples from it. I think it's good, but it's not the best. It was pretty cool when it came out, but there haven't been many updates to it because Suno seems to have been focusing a lot more on their Chirp feature, which I'm not going to go over in today's video. Up next is VALL-E X, and this one is kind of in the same space as Bark. It's not the best, but it can do voice cloning as well. It's decent, but let's go ahead and record some audio from my mic and just give a sample of it. We're going to be testing the VALL-E X text-to-speech software, and this is a quick audio sample. We're going to go ahead, stop there. Let's change the text here, and then I'm going to go ahead and generate. So here's what WhisperX transcribed, and here's the synthesized text. So let's go ahead and listen to the output audio. Thank you for watching this YouTube video. Like and subscribe for more. So I guess it's trying to match the tone of my voice, but that is very robotic. It can give some pretty good outputs on some text prompts, but both Bark and VALL-E X are kind of in the same boat for me, in that they can produce good stuff, but they're not accurate all the time. And I do just want to infer from a pre-made voice that I have in here as a preset, which is Melina from Elden Ring, and we're going to go ahead and use that same sample for here. So this is going to be a different voice, and it's finished up here. So here we go. Let's go ahead and take a listen to this.
Speaker 2: Thank you for watching this YouTube video. Like and subscribe for more.
Speaker 1: All right, and so that is okay, and probably not going to be fooling anyone with this one. I do just want to give a quick sample with a Japanese voice that I have as a preset here, as I do think it's okay with Japanese. So here we go. Let's just go ahead and do... Thank you for watching, and here it is. So I don't suppose a lot of you guys know who Takane Rui is, but this is pretty close, and I think it's pretty good. Next up is going to be StyleTTS2, and I haven't gotten around to training any voices with this, but the voice out of this one has been pretty fantastic. An active member of my Discord, Eleven, gave me a sample of one of his voices that he trained, and I think it sounds absolutely fantastic. So I'm going to play that audio sample real quick. So here it is right here.
Speaker 4: In the quiet stillness of twilight's glow, she gazed upon the ethereal beauty that was once only imagined. The pale moonlight danced across her delicate features as tears of awe and wonder trickled down her cheek.
Speaker 1: So that was just a quick sample of a generated voice, and it's pretty fast actually for StyleTTS. And if you want to take a look at some other audio samples, they've got a whole paper that you can take a look at up on this site here, which you can find on the homepage of the GitHub page at audio samples. So working on getting this all trained up, I will be following Eleven's style fine-tune here, or maybe making a couple of modifications as I normally do to get a voice trained. And when that's finished, I'll go ahead and update you guys on that one. And the next one is Coqui TTS's XTTS, and I was able to do a fine-tune of this with Dasr's web UI here. So I'm just going to go ahead, pop this one open, and get going with it. All right, so I have it all loaded up. So let's just run generation on it and take a listen to the audio sample. So this one's also pretty fast, XTTS. All right, and here it is. Let's go ahead, listen to the generated audio. Thank you for watching. This is the XTE's text-to-speech software, and it's pretty good. And so I probably should say something like XTTS. That'll probably get it a little bit better. Run inference on this one more time, and then we'll take a listen. This is the XTTS text-to-speech software, and it's pretty good. So yeah, in my opinion, this is better than Suno's Bark. And I do think the quality could be a little bit better. XTTS, to my knowledge, is basically Tortoise, but just trained a little bit differently, and using HiFi-GAN instead of the diffusion for the mel spectrogram. And let me just use one more voice here that I have. This one is a Marine voice clone, and this is a Japanese voice, so we can use Japanese for this case. I can go ahead and switch it to Japanese, and then I'll say the same thing. Thank you for watching. Do inference on this. So here we go. Let's take a listen.
Speaker 2: Thank you for watching.
Speaker 1: And the pitch of it isn't that close, but if you ran it through RVC, you could get something that is pretty decent. And last but not least is going to be Tortoise TTS, my favorite, and the one I'm most experienced in. I've done the most training on Tortoise TTS, and we're going to use a voice copy or a voice clone of my voice. I'm reading out this sentence here, so I'm going to generate and get this going. So here we go. Let's take a listen. These are the five best open-source text-to-speech softwares that I've come across over the past year. And to me, this one has the best quality out of all of the ones that I've tested so far, and maybe that's just due to my experience with it. But this is the one that I like to pipeline with RVC, and if we take a look at my audiobook maker right here, I can go ahead and start some audiobook generation, and it's going to start generating some sentences, and I'll go ahead and play that from here. So these are the sentences you heard in the beginning of the video, and I'm going to play all from selected. These are the five best open-source text-to-speech softwares that I've come across over the past year. This here is just a quick sample of my voice with a British accent. And this is how I actually sound. And so that is with Tortoise TTS after RVC, and it's really fantastic, and probably one of the best that I've come across. I'm going to go ahead and load in another voice. I'm going to use one for a Lex Fridman clone that I did a couple of months back, and I'm going to go ahead and run generation on this. So the cool thing about Tortoise TTS is with DeepSpeed, it's pretty fast, and let's take a listen.
Speaker 5: These are the five best open source text to speech softwares that I've come across over the past year.
Speaker 1: And so there is the native output from Tortoise. But if you do something like this where you pipeline RVC after the voice, we can go ahead and take a listen to now how it sounds.
Speaker 5: These are the five best open source text to speech softwares that I've come across over the past year.
Speaker 1: And there you go, it matches the voice a little bit better. And after all of those ones, Tortoise TTS for me still comes out on top. All right, so those are my five picks. What do you guys think about them? I know some of these depend on the voice that's being trained on them. Obviously, if you train a voice, it's going to sound better than the zero-shot inference ones, the ones where you just upload an audio file and try to clone a voice that way. Out of all of them, I personally still am a big fan of Tortoise TTS and the AI voice cloning by just running it through RVC afterwards. Sounds fantastic, and it maintains really good prosody and intonation. So if I missed anything or if you want me to go into more details about anything on here, please let me know down in the comments below. Once again, thank you to everyone that watches this video. And of course, thank you to my supporters of the channel for being members and supporting me there. I hope everyone's had a great start to the new year. And this kicks off the first video on my channel for 2024. So see you guys later.