Create AI Cartoon Ads With Natural Lip-Sync in Minutes (Full Transcript)

A practical 5-step workflow to script, generate consistent characters and scenes, add multi-speaker voices with accurate lip-sync, and edit/export your final ad.

[00:00:00] Speaker 1: In this video, I'm going to show you some of the most realistic AI lip-syncing I've seen, and how you can create a short animated commercial like this one yourself. Now, pay close attention to how naturally the mouth movement matches the dialogue.

[00:00:16] Speaker 2: Kevin Cookie Company Emergency Line, what's your situation?

[00:00:20] Speaker 3: I finished the cookies.

[00:00:22] Speaker 2: All of them?

[00:00:22] Speaker 3: It's always all of them. Yes, there's nothing left.

[00:00:26] Speaker 2: Okay, stay calm. Do you have crumbs?

[00:00:29] Speaker 4: No.

[00:00:30] Speaker 2: This is a code zero cookie emergency, help is on the way.

[00:00:33] Speaker 4: It's the Kevin Cookie Company.

[00:00:35] Speaker 1: Prepared for emergencies, enjoy the cookies. If you've ever tried creating an AI video before, you know how hard it is to get accurate lip-syncing, especially for fast dialogue or multiple speakers. I partnered with D-Zyne for this video, and we'll walk through how you can create this in just five simple steps: how to write a short script, create consistent cartoon characters, generate the scenes, give them a voice with accurate lip-syncing, and then put it all together. By the end, you'll be able to create your own animated cartoon videos with natural-looking dialogue. Hi, I'm Kevin, and let's get started.
The first thing we need is a very short script. This is where you call out the different scenes, the characters, and the dialogue. I like to break this into three simple parts. First, we have the scene: where we are and what's happening. Second, the character: who's speaking. And third, the dialogue: what they're saying. For example, I put together a short commercial for the Kevin Cookie Company, my favorite company, and here you can see my script. Feel free to pause if you'd like to take a closer look. Now, you can try writing your own script using this same exact structure. And if you want a little help, of course, you could use tools like ChatGPT or Gemini to brainstorm ideas or just tighten up your dialogue.
Next, we're going to generate the cartoon characters. In my video, I have three characters. We have two Kevin Cookie Company employees and then also the caller. Since they appear across multiple scenes, the most important thing here is keeping the characters consistent from shot to shot. To create the characters, I'm using AI image generation. There are lots of options out there, but in this video, I partnered with D-Zyne. You'll find the link at the bottom of the screen and in the description. It lets you access multiple image generation models in one place and makes it easy to create consistent characters.
Here on the home screen, scroll down just a little bit and you'll see the option for consistent character. Let's click on that. Then on the next screen, over on the left hand side, let's click on build your character. Here we have two different options. I'll start in quick mode since I don't already have a character. And right here, I want to describe what the character should look like. Let's click here. For my first character, I'll call him operator one. And right down here, I could type in a description. Let's make him a calm, professional call center operator. Then right down below, we can choose a style for the character. And there are lots of different styles that you could choose from. But for this video, we'll keep things simple and I'll go with the simple 3D cartoon style. Then down below, this is a fun one. You could reference a face. Let's click on that and I'll reference my own face for this character. I think that'll be fun. Once you finish filling in all this information, in the bottom right, click on generate. I like it. I see some resemblance there. Let's go with the one on the right. I'll choose that and then let's start building. Over here, let's now close the window. You'll now notice in the choose a character menu, my new character appears and I can now use him in different scenes. But before we do that, let's add our next two characters. Right up above, let's now build another character. Let's now repeat the same process for the second operator. Right up on top, I'll type in the name operator two. Here, I could type in a description. And let's keep the style consistent so they clearly look like coworkers. I'll choose this. This time, we're not going to use a reference face. Down below, let's now generate the character. I like the one on the left. I'll choose that and then let's start building. Right up on top, let's click on close. Finally, let's create the last character. Up on top, let's click on build your character again. Quick mode. 
Start with a description. Up on top, I'll type in the name, the caller. Here, I'll describe the caller. Let's use the same style just so everything feels like it belongs in the same world. Then in the bottom right, let's generate. Let's go with the one on the right. I'll choose this and then start building. And over here, let's close this. Now that we have the characters, let's generate the scenes. Over on the left-hand side, let's click on character and let's generate images or a scene using those characters that we just created. Let's click on this. At the very top, I can choose the character that I would like to feature in this scene. And over on the right-hand side, here I see all the characters that we just created. For the first scene, I'd like to include both operator one and operator two. Let's first click on operator one and that selected him. Now, right over here, let's click on the plus icon to add another character to this scene. As a note, you could place up to four characters in a scene. Right over here, I'll select operator number two. And here, we see both of them now show up in the description field. Next, we can describe what we want the scene to look like. Here, I'll enter in my description. I'm describing the Kevin Cookie Company Operations Center. In general, the more descriptive you are, the closer the scene will match what you had in mind. Down below, you can choose the model. I'll go with V2. And you can also choose the aspect ratio. I want this to be a traditional horizontal video that you typically find on YouTube, so I'll select 16 by 9. But if you're creating, let's say, a vertical video, you could also choose 9 by 16. Once you finish entering in all these details at the bottom, click on generate. Over on the right-hand side, we can now preview the new scene. Over here, I'll double-click on it and that places it on the canvas. Let's close this overlay. And nice, it took our characters and placed them into the scene. Looks really nice. 
Now, up on top, you have a variety of different tools that you could use to edit the scene. For example, you could select certain areas, you could change character expressions, and even more. Let's now create the next scene. Over on the left-hand side, let's remove these characters and that also removes the description. Just like we did before, let's choose the character. I'll click here. And for this scene, we're going to include the caller. Then over here, I can type in my description. And let's choose the aspect ratio. I also want this to be 16 by 9, so let's choose that. Here, I'll close that overlay. And everything else looks good. At the bottom, click on generate. Over on the right-hand side, we get two different options for this scene. I think I like the one on the right. It turned out nicely. Let's now generate the final and third scene for this video. Over here, let's clear this out. Let's start by choosing a character. I'll click here. And the last scene will have both the operator as well as the caller. I'll select both of those characters. And here, I'll type in my description. For this scene, I'd also like this to be 16 by 9, so I'll select that. And then I'll click on generate. Over on the right-hand side, we can now preview how the scene turned out. And I think that looks really good. And one thing to call out, notice how the operator 1 character is consistent between the two different scenes. The consistent character makes that so much easier. Now that the scenes are ready, the next step is to turn these into short videos and add voice and lip syncing. I'll start with one scene at a time and keep the animation pretty subtle. Now, for a short commercial like this, small movements tend to look the most natural, especially since we'll be focusing on the lip syncing. Let's start with scene number one, where we have the two characters sitting side by side. Over on the right-hand side, I'll scroll down and there I see the scene. 
Over here, let's select lip sync. The tool automatically detects the faces in the scene. Here, I'll make sure to select her face. If it ever misses a face, you can also mark it manually right down here. Let's click on next. This now drops us into the lip sync interface. And at the bottom of the screen, you'll notice a timeline. On top, we have the video or the scene, and then we have the two speakers. Let's start by choosing a voice. I'll select speaker A or operator 1 first. You'll notice that there are lots of different voices that we could choose from. And here, we could even preview what they sound like. Let's have a listen. [Voice preview] "Hold on to your hats, folks, because the news coming out from..." [Voice preview] "Hey, what's up, guys? This is Finn. I can't wait to work with you." Now, we can even choose voices in different languages. Up on top, it's currently set to English, but we have lots of different options here. And this one's fun. You can even upload your own voice if you want to sync that to the character. Now, for the first operator, I listened to a whole bunch of different voices, and I really like the way that Johnny Dynamite sounds. So, I'll select this one, and let's have a quick preview. [Voice preview] "You've got it on cue 106 points." Yeah, I think that'll work for that character. Now, down below, we could also enter in the first line that we want him to say. So, over here, I'll say, "Kevin Cookie Company Emergency Line, what's your situation?" Then, down below, we could generate the audio.

[00:08:49] Speaker 2: Kevin Cookie Company Emergency Line, what's your situation?

[00:08:53] Speaker 1: Ooh, that fits really nicely. Now, down below, you could also choose a different voice, and then you could regenerate, but I think this will work. So, over here, let's click on Apply. Over on the right-hand side, we could zoom in on the timeline so we could see all the different lines better. Here, I'll zoom in a little bit, and there we get a closer view. Next, I'd like to add another line for Speaker A. So, over here, I could click on the timeline, and we can now add his second line. So, here, I'll click on Add, and again, we're going to go with Johnny Dynamite, and he's already selected. And next, let's update the text to "All of them?" Then, down below, I'll click on Generate. Now that I've inserted the next line, you'll notice that it appears on the timeline, and over here, I can move the position. So, maybe I want a little bit of a pause between these lines. I could leave a gap, or maybe I want to shift the position. Here, I could drag and drop that behind the other line, but actually, I want to start with this line, so let me move that to the beginning, and then I'll leave a little bit of a pause between the two lines. Next, let's choose a voice for Speaker B. Over here, let's click on Pick a Voice. Right up above, I can now choose the voice for the second operator, and over here, I'll scroll down just a little bit, and here we have Yi. Let's have a quick preview.

[00:09:57] Speaker 3: One today is worth two tomorrows.

[00:09:59] Speaker 1: Oh, that really fits that character well. Over here, I'll check her, and down below, I could type in her line: "It's always all of them." Then, let's generate the audio.

[00:10:07] Speaker 3: It's always all of them.

[00:10:09] Speaker 1: Oh, perfect. I like that. Over here, let's click on Apply. I can now see Speaker B's dialogue on the timeline, but currently, it's overlapping Speaker A's speech. So, let me take this and move it so it's right after Speaker A's last line. I'll place it right there, and we can now have a listen to what it sounds like all together.
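
The overlap fix described here (moving Speaker B's clip so it starts right after Speaker A's last line, with a short pause) can be sketched in code. This is only a minimal illustration of the idea, not how the lip sync tool works internally: the `Clip` type, the `overlaps` and `layout_sequentially` helpers, the clip durations, and the 0.5-second gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One generated line of dialogue on the timeline (hypothetical type)."""
    speaker: str
    text: str
    duration: float  # seconds; the values used below are made up
    start: float = 0.0

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: Clip, b: Clip) -> bool:
    """True if the two clips share any stretch of time."""
    return a.start < b.end and b.start < a.end


def layout_sequentially(clips, gap=0.5):
    """Return copies of the clips placed in list order, `gap` seconds apart."""
    placed = []
    cursor = 0.0
    for clip in clips:
        placed.append(Clip(clip.speaker, clip.text, clip.duration, start=cursor))
        cursor += clip.duration + gap
    return placed


# Every clip starts at 0.0 here, so Speaker B's line overlaps Speaker A's.
raw = [
    Clip("A", "Kevin Cookie Company Emergency Line, what's your situation?", 3.0),
    Clip("A", "All of them?", 1.0),
    Clip("B", "It's always all of them.", 1.5),
]
placed = layout_sequentially(raw)
for clip in placed:
    print(f"{clip.start:5.1f}s  Speaker {clip.speaker}: {clip.text}")
```

Placing clips in script order with a fixed gap is the simplest layout; as the video notes, timing can still be trimmed and tightened later in the editor.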

[00:10:26] Speaker 2: Kevin Cookie Company Emergency Line, what's your situation? All of them?

[00:10:31] Speaker 3: It's always all of them.

[00:10:32] Speaker 1: Okay, pretty good so far. Let me go through now and add the remaining lines. I've now added all the different dialogue lines that take place in this scene, and for now, I've just placed them in order. We can always trim and tighten everything later when we bring this into our editing software. Now that all the dialogue's in place, right up above, I can choose the output quality. I'll go with the highest level of quality, 1080p, and then next, let's click on Generate to pull together the lip sync. Over on the right-hand side, let's now look at how it turned out.

[00:11:01] Speaker 2: Kevin Cookie Company Emergency Line, what's your situation? All of them?

[00:11:06] Speaker 3: It's always all of them.

[00:11:07] Speaker 2: Okay, stay calm. Do you have crumbs? This is a code zero cookie emergency. Help is on the way.

[00:11:13] Speaker 1: That turned out great. Now, this is where lip syncing usually breaks down, especially with two characters, but each voice stays locked to the correct face. You can see how the mouth movements line up really naturally with the dialogue, and there's just enough animation in the scene to keep it feeling alive without being distracting. Let's move on to scene two with the caller. Right down here, I want to use scene number two, and let's go with the lip sync. I'll repeat the same process here. It automatically detects the face. Let's click on Next. Let's go with 16 by 9, and then click on Next. Down below, let's pick a voice. I think Kawaii will work well.

[00:11:45] Speaker 5: Hey, I'm Arasita. I might be a little ditzy.

[00:11:49] Speaker 1: That sounds great. Let's check that, and here I'll enter in her line of dialogue, and then let's generate it.

[00:11:54] Speaker 6: I finished the cookies.

[00:11:56] Speaker 1: Perfect. Let's apply that. Now, I'll go through and add the rest of the dialogue. Now, let's generate the lip sync. After a few minutes, the lip sync is ready. Let's preview how that turned out.

[00:12:05] Speaker 6: I finished the cookies. Yes, there's nothing left. No.

[00:12:10] Speaker 1: Very cool, and again, you can see how accurately the lip movement matches the dialogue with a little bit of subtle animation that keeps everything looking natural. For the last scene, I want to add a little bit more motion, and to do that, we're going to turn the scene into an AI video first. So here, I'll scroll down to the scene that we have right here, and you'll see the option for AI video. Let's click on that. Over on the left-hand side, you'll notice that the image or the scene that we created becomes the first frame of the video. Down below, I could describe what I want to have happen in the clip. So over here, I'll type in some text. Underneath that, you can choose the AI video model that you would like to use. For this, I'll stick with the default, and down at the bottom, I'll click on generate. It finished generating the clip in the top right-hand corner. Let's have a quick preview. There, it gives us a nice short animated clip, and we have more movement than a static image while still keeping the faces clear. Over here, I'll close out of the preview. Now, let's add the lip syncing to this video by clicking here. Just like before, it automatically detects both faces. Over here, I'll click on her face. Then, let's click on next. On this screen, just like we did before, I'll pick the voices and add the dialogue. Once the audio is in place, I can line it up here on the timeline, and when everything looks good, right up above, I'll click on generate. Over on the right-hand side, let's preview how it turned out.

[00:13:27] Speaker 4: It's the Kevin Cookie Company.

[00:13:29] Speaker 1: Prepared for emergencies, enjoy the cookies. And there, you can see the difference. The body motion from the video stays intact, and the lip movement updates to match the dialogue. Let's now download all of the clips that include the lip syncing. Now that we have all of our scenes and also the narration, let's bring everything together in an editor. I'm going to use Clipchamp since it comes free with Windows, but you could follow along in just about any editor. You have CapCut, DaVinci Resolve, or just whatever you're comfortable with. Here in Clipchamp, I'll create a new project. Then, let's drag in the three video clips that we generated. From here, I can place the clips on the timeline, make a few quick cuts, and line them up so the dialogue flows correctly from scene to scene. Once that's done, you can play back the video to make sure that you're satisfied with it. To export the video in the top right-hand corner, click on export, and over here, you can choose the quality. I'll go with 4K. You can also give it a name. I'll type that in, and then let's export the video. And that's it. You can use this same exact workflow to create short ads, explainers, or social videos. Just swap in your own characters, message, or your brand. Now, if realistic dialogue is important to your videos, this workflow makes a big difference. And although I used cartoon characters here, this same lip syncing works very well with realistic or even photorealistic characters, too. If you want to try this out yourself, I've included links in the description right down below and also as a pinned comment. Thanks for watching, and I'll see you in the next video.

AI Insights
Summary
The transcript explains a workflow for creating an animated AI commercial with highly accurate lip-sync across multiple speakers. It demonstrates scripting a short multi-scene dialogue, generating consistent cartoon characters and scenes using an AI image tool (D-Zyne/Design), applying voice selection and text-to-speech with face-locked lip-sync per speaker, optionally converting a still scene into a short AI video for added motion, and finally assembling and exporting the clips in a standard editor like Clipchamp (or CapCut/DaVinci Resolve).
Title
How to Make an AI Cartoon Ad with Realistic Lip-Sync (Step-by-Step)
Keywords
AI lip-sync
animated commercial
cartoon characters
consistent characters
AI image generation
scene generation
text-to-speech
multi-speaker dialogue
face tracking
timeline editing
AI video
Clipchamp
CapCut
DaVinci Resolve
D-Zyne
workflow
YouTube ad
Key Takeaways
  • Use a short script structured by scene, character, and dialogue to keep production simple.
  • Prioritize character consistency across shots by using a 'consistent character' generation workflow.
  • Generate scenes with selected characters; be descriptive to better match your intended look.
  • For lip-sync, assign a distinct voice per speaker and ensure each voice is locked to the correct detected face; adjust timing on the timeline to avoid overlaps.
  • Keep animation subtle for realism; for more motion, generate an AI video clip from the scene before applying lip-sync.
  • Export each lip-synced clip, then assemble, trim, and polish in a video editor; export at the desired resolution.
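
The scene / character / dialogue script structure from the first takeaway can be written down as plain data. The sketch below is one possible representation, not something shown in the video: the `Line` and `Scene` types and the `lines_by_character` helper are hypothetical names, and the second scene description is a placeholder because the transcript does not state it. Only dialogue lines whose speaker is clear from the transcript are included.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Line:
    character: str  # who is speaking
    dialogue: str   # what they say


@dataclass
class Scene:
    description: str  # where we are and what is happening
    lines: list = field(default_factory=list)


script = [
    Scene(
        "Kevin Cookie Company operations center",
        [
            Line("Operator 1", "Kevin Cookie Company Emergency Line, what's your situation?"),
            Line("Operator 1", "All of them?"),
            Line("Operator 2", "It's always all of them."),
        ],
    ),
    Scene(
        "The caller on the phone (placeholder description)",
        [
            Line("The caller", "I finished the cookies."),
            Line("The caller", "Yes, there's nothing left."),
            Line("The caller", "No."),
        ],
    ),
]


def lines_by_character(scenes):
    """Group dialogue by character, e.g. to generate one voice at a time."""
    grouped = defaultdict(list)
    for scene in scenes:
        for line in scene.lines:
            grouped[line.character].append(line.dialogue)
    return dict(grouped)


for character, dialogue in lines_by_character(script).items():
    print(character, "->", dialogue)
```

Grouping lines per character mirrors the lip sync step, where each speaker gets one voice and then one generated audio clip per line.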
Sentiments
Positive: Enthusiastic, instructional tone emphasizing how natural the lip-sync looks, how easy the steps are, and satisfaction with generated voices/scenes (e.g., 'turned out great', 'perfect', 'very cool').