[00:00:00] Speaker 1: How do you realistically lip-sync any audio with any image, and what model should you use?
[00:00:05] Speaker 2: If you go to ElevenLabs, we can click on the image and video tool on the left, and down at the bottom we can switch to video mode. Here, if we click on the AI model, we can scroll down and see a few different lip-sync models. If we clicked Creatify Aurora, we could then go and either select an avatar or upload our own, and then we can add speech. Here we can add any of our past generations from within ElevenLabs, go and create some new speech within ElevenLabs, or click the upload button and upload our own. So here I'm just uploading a previous generation from ElevenLabs, and then we click Generate.
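If you would rather script the speech step instead of clicking through the UI, the audio can also be generated with the ElevenLabs Python SDK. A minimal sketch, assuming `pip install elevenlabs`, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID; the image and video lip-sync tool shown here is driven through the web UI, so only the audio step is scripted:

```python
# Minimal sketch: generate a speech clip with the ElevenLabs Python SDK.
# Assumes `pip install elevenlabs` and ELEVENLABS_API_KEY in the environment;
# the voice ID is a placeholder. The resulting file can then be uploaded in
# the "add speech" step of video mode.
import os

from elevenlabs import save
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",           # placeholder: any voice from your library
    text="Hey team, how are you doing today? Any news?",
    model_id="eleven_multilingual_v2",  # a commonly used ElevenLabs TTS model
)
save(audio, "speech.mp3")
```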
[00:00:42] Speaker 3: Hey team, how are you doing today? Any news about the, you know?
[00:00:47] Speaker 2: But the question is: what model should you use? Well, here's the same generation with Creatify Aurora, OmniHuman 1.5, and WAN 2.6 side by side.
[00:00:56] Speaker 3: Hey team, how are you doing today?
[00:00:58] Speaker 2: Any news about the... And so, as you can see, all three of them produce very different results. OmniHuman was the one that took the longest to generate, and the movement is a little more present compared to Creatify Aurora; however, with OmniHuman we get a smile all the way through and we see a lot of teeth in the lip-sync. Looking at Creatify Aurora, the expression is much more nuanced and matches the dialogue a little better, in my opinion. It also generated faster and costs fewer credits. Now, if we look at WAN 2.6, both of its results are pretty wild: there's a lot of body movement and also a lot of camera movement. So WAN 2.6 gives some really good generations and the quality is very crisp, but it's a lot harder to control. To fix that, I always include "still, continuous shot" in the prompt to get something decent.
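As a rough illustration of that prompting habit, here is a tiny sketch that appends the stabilizing phrase to whatever action you describe; only "still, continuous shot" comes from the video, and the helper itself is just an illustrative convention, not anything WAN 2.6 requires:

```python
# Illustrative sketch: tack the camera-stabilizing phrase from the video onto
# a WAN 2.6 prompt. The phrase "still, continuous shot" is the only part taken
# from the source; the helper function is just a convenience.
STABILIZER = "still, continuous shot"

def build_wan_prompt(action: str) -> str:
    """Combine the desired on-screen action with the stabilizing phrase."""
    return f"{action.rstrip('.')}. {STABILIZER}."

print(build_wan_prompt("A woman talks to the camera with subtle hand gestures"))
# -> A woman talks to the camera with subtle hand gestures. still, continuous shot.
```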
[00:01:45] Speaker 3: Hey team, how are you doing today?
[00:01:47] Speaker 2: Any news about the... And as you can see there, the lip-sync is actually very good with WAN 2.6, but it takes a few extra prompts to get the result you're looking for. Now, the nice thing about Creatify Aurora is that you can have much longer generations than with the other two models. If I switch to WAN 2.6, we only get the choice between 5, 10, and 15 seconds, but we can have 1080p resolution. If we switch to OmniHuman, the length of your video is dictated by the length of the audio you upload, up to a maximum of 30 seconds. The issue with Creatify Aurora, though, is that for now the maximum resolution is 720p, so you do need to go and upscale it afterwards with Topaz. Another constraint is that Topaz only accepts 30-second clips at a time, so if you generated something longer than 30 seconds, you would have to split the video up to feed it through the Topaz upscaler in ElevenLabs.
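A standard way to do that split is ffmpeg's segment muxer. A minimal sketch, assuming ffmpeg is installed and on your PATH; the filenames are placeholders, and because stream copy can only cut on keyframes, chunks land near, not exactly at, 30 seconds:

```python
# Minimal sketch: split a long render into ~30-second chunks with ffmpeg so
# each piece fits the 30-second Topaz limit mentioned above. Assumes ffmpeg is
# on PATH; "input.mp4" and the output pattern are placeholders.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "input.mp4",         # the long lip-synced render
        "-c", "copy",              # stream copy: fast, no re-encode
        "-map", "0",               # keep all streams (video + audio)
        "-f", "segment",           # use the segment muxer
        "-segment_time", "30",     # target chunk length in seconds
        "-reset_timestamps", "1",  # restart timestamps in each chunk
        "chunk_%03d.mp4",          # chunk_000.mp4, chunk_001.mp4, ...
    ],
    check=True,
)
```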
[00:02:43] Speaker 4: Hey chat, what do you think? Do I look real or do you think it sucks?
[00:02:47] Speaker 2: And as you can see, again, three great results with good lip-syncing. The hand movement in Creatify is a lot more contextual. In WAN 2.6 the video is again very crisp, but the lips don't quite match the audio we uploaded. With OmniHuman we get some nice movement and a decent facial expression, but I had to prompt it multiple times to get the result I was looking for. For this video I preferred the output of Creatify, but I did then have to upscale it with Topaz afterwards to get a crisper-looking video. For long-form, more lo-fi content, Creatify Aurora might be the best. If you're doing shorter-form videos, you might want to try OmniHuman, or even go with WAN 2.6 for crisper output at the cost of a harder-to-control video. On the flip side, WAN 2.6 gives you incredible creative freedom, allowing you to direct all of the actions and camera angles within your lip-synced video, just like this.
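To make those rules of thumb concrete, here is an illustrative helper that encodes the constraints stated in this comparison; the thresholds (15 seconds for WAN 2.6, 30 seconds for OmniHuman, 720p for Creatify Aurora) come from the video, but the function itself is only a sketch, not an official API:

```python
# Illustrative only: encodes the rules of thumb from this comparison.
# WAN 2.6: 5/10/15 s at 1080p, full action/camera direction but hardest to
# steer; OmniHuman 1.5: length follows the audio, up to 30 s; Creatify Aurora:
# longer generations, capped at 720p (plan on upscaling afterwards).
def pick_lipsync_model(duration_s: float, want_1080p: bool = False,
                       want_camera_direction: bool = False) -> str:
    if duration_s <= 15 and (want_1080p or want_camera_direction):
        return "WAN 2.6"
    if duration_s <= 30:
        return "OmniHuman 1.5"
    return "Creatify Aurora"

print(pick_lipsync_model(duration_s=60))  # -> Creatify Aurora
```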
[00:03:42] Speaker 3: Hey team, how are you doing today?
[00:03:44] Speaker 3: Any news about the...
[00:03:45] Speaker 2: And so, as you can see there, we've directed the exact movement we want while also lip-syncing the audio to the image. But that is how to lip-sync your audio inside of ElevenLabs. If you have any questions, let us know in the comments section down below, and if you enjoyed this video, hit that like button and don't forget to subscribe. Thanks for watching.