How to get consistent characters with Google Veo 3 and Flow

Muzaffar Ismail - Jul 18, 2025

The first time I tried using Google Veo 3 and Flow to make a video longer than 8 seconds, I ran into problems with consistency. Throughout my videos, my character sometimes didn't have the same clothes, hairstyle or even voice.

While you can reduce the chances of inconsistency by using the Frame-to-Video function to upload an image of the character you intend to use, this costs a lot more AI credits. To top it off, the results can still be somewhat inconsistent. To solve this, we made this guide on how to get consistent characters in Google Veo 3 and Flow, so do read on.

You can see that the character does not have the same hairstyle, nor is she wearing the same clothes, and in the video she also has a different voice.

What can you do? How about the built-in features?

Google has already created features built into Flow to help solve these problems of consistency. These include Frame-to-Video, Extend-to-Video and Ingredients-to-Video. Unfortunately, Ingredients-to-Video is only for Ultra subscribers, so we didn't test it.

Frame-to-Video, on the other hand, costs a whopping 100 AI credits for each video generation. While supplying an image of the character you created does make for more consistent characters, you could use up your whole monthly quota of 1000 AI credits in just 10 video generations.

Extend-to-Video lets you build upon the previous image without having to reiterate the same prompt

Thankfully, Extend-to-Video lets you continue the scene it is extended from without having to reiterate the prompt from before. This only costs 10 AI credits but at the moment it only supports Veo 2 and not Veo 3, so there is no automatic voiceover or audio. If you do use it, you'd have to put in the audio or voiceover yourself later.

In the end, text-to-video remains the most cost-effective method of crafting a video using Google Flow at just 10 AI credits per generation. It also offers a bit more control and more features. However, if our previous attempts are any indication, it isn't as easy as just typing anything you want in the prompt box.

Unfortunately, Extend-to-Video is not supported by Veo 3
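To put the credit costs quoted above side by side, here is a minimal Python sketch of the arithmetic. The figures (100, 10 and 10 credits, and a 1000-credit monthly quota) are the ones mentioned in this article and may change; the names `COST_PER_GENERATION` and `max_generations` are made up for illustration and are not part of Flow.

```python
# Credit costs per generation as quoted in this article (they may change).
COST_PER_GENERATION = {
    "frame-to-video": 100,
    "extend-to-video": 10,
    "text-to-video": 10,
}

MONTHLY_CREDITS = 1000  # monthly quota mentioned in the article


def max_generations(mode: str, credits: int = MONTHLY_CREDITS) -> int:
    """Return how many whole generations the credit balance covers."""
    return credits // COST_PER_GENERATION[mode]


for mode in COST_PER_GENERATION:
    print(f"{mode}: {max_generations(mode)} generations per month")
```

Since failed generations (wrong character, unwanted captions) also use up credits, the real number of usable clips will likely be lower than this.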

How to make text-to-video consistent in Flow?

Answering the question "Who actually needs a transparent smartphone?" is Nina, our first consistent character in Google Veo 3 / Flow

Taking a page from others who have found a way, it appears you need to phrase the character description the way Google Veo 3 itself would describe it. To do this, use Google Whisk to come up with the description and then use Gemini to turn it into a more consistent prompt.

Here are the steps we did to make consistent text-to-video in Google Flow:

1. Use Google Whisk to generate an image of the object you want to keep consistent.
2. Download that image and copy the prompt used for it.
3. Upload that image to the Subject window in Google Whisk, tap the pen icon to see how Whisk describes it, and copy that description.
4. Give the consistent object a name, then paste the copied prompt and description into the prompt format below.
5. Run this prompt in Gemini together with the image you generated to get the consistent description.
6. Use this description in your new Google Flow prompts.
7. Generate the videos in Flow accordingly and edit them as necessary.

This is the image of Nina that I came up with in Google Whisk

Here is the overall prompt format:

Here is a prompt I wrote in Whisk: [My initial description]

Here is the image description received for the image (attached): [Google Whisk description]

I would like a detailed Veo 3 description of just the [consistent object] that I can use in a template for building prompts where I will try to place her in a consistent looking way.
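If you plan to repeat this for several characters, the prompt format above can be filled in with a small Python helper. This is only a sketch: `GEMINI_TEMPLATE` and `build_gemini_prompt` are hypothetical names, and the result is plain text that you still paste into Gemini yourself along with the image.

```python
# The three-part prompt format from this guide, with placeholders.
GEMINI_TEMPLATE = (
    "Here is a prompt I wrote in Whisk: {initial}\n\n"
    "Here is the image description received for the image (attached): {whisk}\n\n"
    "I would like a detailed Veo 3 description of just the {subject} that I can "
    "use in a template for building prompts where I will try to place her in a "
    "consistent looking way."
)


def build_gemini_prompt(initial: str, whisk: str, subject: str) -> str:
    """Fill the template with your Whisk prompt, Whisk's description and the subject."""
    return GEMINI_TEMPLATE.format(
        initial=initial.strip(),
        whisk=whisk.strip(),
        subject=subject.strip(),
    )


print(build_gemini_prompt(
    initial="A Malaysian woman video host for an online technology video channel.",
    whisk="A woman with light brown skin, appearing to be in her 20s...",
    subject="woman",
))
```

Keeping the wording in one place means every character goes through exactly the same request, which is the point of the exercise.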

In Google Whisk, upload Nina's image in the Subject window and then tap the pen icon for her description

Here is the filled-in prompt which I ran in Gemini with the uploaded image of Nina:

Here is a prompt I wrote in Whisk: A Malaysian woman video host for an online technology video channel. She has Malaysian facial features, frizzy hair in a pony tail and is wearing a solid colour long sleeve t-shirt. She has a fun expressive atmosphere

Here is the image description received for the image (attached): A woman with light brown skin, appearing to be in her 20s, is centered in a medium shot, facing forward with a surprised expression. Her mouth is open wide in an "O" shape, revealing her teeth, and her eyes are wide and bright. Her dark brown, curly hair is pulled up into a high ponytail, with some strands escaping around her face, framing her eyes and forehead. Her eyebrows are dark and well-defined, and she has visible eyelashes. She is wearing a simple, long-sleeved, scoop-neck top in a muted rose or dusty pink color. A small, black lavalier microphone is clipped to the center of her collarbone area. Both of her hands are raised to shoulder height, palms facing outward, fingers splayed as if in a gesture of disbelief or excitement. The background is a vibrant, solid yellow wall, creating a stark contrast with her clothing and hair. To the left of the woman, a portion of a green potted plant with large leaves is visible. To the right, a dark, open shelving unit with several books and other decorative items is partially visible. The lighting is bright and even, illuminating her face and the yellow background.

I would like a detailed Veo 3 description of just the woman that I can use in a template for building prompts where I will try to place her in a consistent looking way.

This is the image description according to Google Whisk

Here is the resulting Veo 3 description from Gemini:

A Malaysian woman in her 20s with light brown skin, dark brown curly hair in a high ponytail, and dark, well-defined eyebrows. She is wearing a muted rose or dusty pink long-sleeved, scoop-neck top with a small, black lavalier microphone clipped to the center of her collarbone area. Her hands are raised to shoulder height, palms facing outward, fingers splayed.

Here is the prompt for Scene 1 of the video above:

Scene 1: Host Wonderingly Talking About the Transparent Phone

A cinematic medium shot of Nina, a Malaysian woman in her 20s with light brown skin, dark brown curly hair in a high ponytail, and dark, well-defined eyebrows. She is wearing a muted rose or dusty pink long-sleeved, scoop-neck top with a small, black lavalier microphone clipped to the center of her collarbone area. Her hands are raised to shoulder height, palms facing outward, fingers splayed.

The background is a softly blurred, clean, modern tech studio with professional, diffused lighting and perhaps a subtle hint of sleek tech gadgets in the background.

Nina speaks incredulously, her voice is a clear, friendly female voice in Malaysian English with a moderate pace and enthusiastic tone, suitable for hosting an online technology video channel: "Wow, we never thought someone would actually do this!"

You can see that I have described how the camera should view our character Nina, whom I have named in the prompt. The Veo 3 description is included in the prompt word for word, along with a description of the background and her voice. I then reused the same character and voice descriptions in the prompts for the next scenes.
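The structure of the scene prompt (camera shot, name, fixed character description, background, voice and spoken line) can be sketched in Python so that the character and voice text is never retyped between scenes. The `Character` class and `build_scene_prompt` function are hypothetical helpers for illustration, not anything provided by Flow or Veo 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Reusable character details (hypothetical helper, not part of Flow)."""
    name: str
    description: str  # the Veo 3 description from Gemini, reused word for word
    voice: str        # the voice description, also reused in every scene


def build_scene_prompt(character: Character, shot: str, background: str,
                       manner: str, line: str) -> str:
    """Assemble one scene prompt: subject, background, then speech."""
    # Lowercase the first letter so the description reads on after the name.
    desc = character.description[:1].lower() + character.description[1:]
    subject = f"{shot} of {character.name}, {desc}"
    speech = f'{character.name} speaks {manner}, {character.voice}: "{line}"'
    return "\n\n".join([subject, background, speech])


NINA = Character(
    name="Nina",
    description=(
        "A Malaysian woman in her 20s with light brown skin, dark brown curly "
        "hair in a high ponytail, and dark, well-defined eyebrows. She is "
        "wearing a muted rose or dusty pink long-sleeved, scoop-neck top with "
        "a small, black lavalier microphone clipped to the center of her "
        "collarbone area."
    ),
    voice=(
        "her voice is a clear, friendly female voice in Malaysian English "
        "with a moderate pace and enthusiastic tone"
    ),
)

scene_1 = build_scene_prompt(
    NINA,
    shot="A cinematic medium shot",
    background="The background is a softly blurred, clean, modern tech studio.",
    manner="incredulously",
    line="Wow, we never thought someone would actually do this!",
)
print(scene_1)
```

For each new scene, only the shot, background, manner and line change; the text that identifies Nina stays identical, which is what seemed to help consistency in our tests.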

Despite all this, not everything went well. I still got one generation with a completely different character and one generation where the dreaded inaccurate captions appeared. Still, the character Nina was generated quite consistently. I then upscaled the videos to 1080p, downloaded them separately and edited them in Samsung Studio into the video above.

So, how did your attempts at consistency turn out in Google Flow and Veo 3? We hope our guide helped. Please share your experiences in the comments below and stay tuned to TechNave.com for more tech articles and guides.

Tags: How to get consistent characters in Google Veo 3 and Flow, Google Veo 3, Google Flow, Google Cloud, Google Cloud Malaysia, Google Whisk, Google Gemini, Google Malaysia, Veo 3, Flow, Whisk, Gemini, Generative AI, GenAI, Gen AI, AI, Artificial Intelligence
