Generate video using Veo

Veo is Google's high-fidelity video generation model, capable of generating videos in a wide range of cinematic and visual styles. Veo captures the nuance of your prompts to render intricate details consistently across frames.

This guide shows you how to generate videos with Veo. For tips on writing video prompts, check out the Veo prompt guide.

Veo versions

The Gemini API offers two video generation models: Veo 3 and Veo 2. We recommend using Veo 3, the latest model, for its superior quality and audio generation capability.

Veo 3 is available in Preview, which may pose limitations for scaled production use. Veo 2 is Stable and offers a better production experience.

For detailed guidance on key feature differences between the models, review the Model version comparison section.

Generating videos from text

The code sample in this section uses Veo 3 to generate videos with integrated audio.

Python

import time
from google import genai
from google.genai import types

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="Panning wide shot of a purring kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="allow_all",  # Veo 3 supports "allow_all" only; "allow_adult" and "dont_allow" are Veo 2 only
        aspect_ratio="16:9",  # Veo 3 supports "16:9" only; "9:16" is Veo 2 only
    ),
)

while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")

JavaScript

import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const ai = new GoogleGenAI({});

async function main() {
  let operation = await ai.models.generateVideos({
    model: "veo-3.0-generate-preview",
    prompt: "Panning wide shot of a purring kitten sleeping in the sunshine",
    config: {
      personGeneration: "allow_all",
      aspectRatio: "16:9",
    },
  });

  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({
      operation: operation,
    });
  }

  operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
    const resp = await fetch(`${generatedVideo.video?.uri}&key=GEMINI_API_KEY`); // append your API key
    const writer = createWriteStream(`video${n}.mp4`);
    Readable.fromWeb(resp.body).pipe(writer);
  });
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"
  "time"
  "google.golang.org/genai"
)

func main() {

  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  videoConfig := &genai.GenerateVideosConfig{
      AspectRatio:      "16:9",
      PersonGeneration: "allow_all",
  }

  operation, _ := client.Models.GenerateVideos(
      ctx,
      "veo-3.0-generate-preview",
      "Panning wide shot of a purring kitten sleeping in the sunshine",
      nil,
      videoConfig,
  )

  for !operation.Done {
      time.Sleep(20 * time.Second)
      operation, _ = client.Operations.GetVideosOperation(ctx, operation, nil)
  }

  for n, video := range operation.Response.GeneratedVideos {
      client.Files.Download(ctx, video.Video, nil)
      fname := fmt.Sprintf("video_%d.mp4", n)
      _ = os.WriteFile(fname, video.Video.VideoBytes, 0644)
  }
}

REST

# Use curl to send a POST request to the predictLongRunning endpoint.
# The request body includes the prompt for video generation.
curl "${BASE_URL}/models/veo-3.0-generate-preview:predictLongRunning" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances": [{
        "prompt": "Panning wide shot of a purring kitten sleeping in the sunshine"
      }
    ],
    "parameters": {
      "aspectRatio": "16:9",
      "personGeneration": "allow_all"
    }
  }' | tee result.json | jq .name | sed 's/"//g' > op_name

# Obtain operation name to download video.
op_name=$(cat op_name)

# Check against status of operation.
while true; do
  is_done=$(curl -H "x-goog-api-key: $GEMINI_API_KEY" "${BASE_URL}/${op_name}" | tee op_check.json | jq .done)

  if [ "${is_done}" = "true" ]; then
    cat op_check.json
    echo "** Attach API_KEY to download video, or examine error message."
    break
  fi

  echo "** Video ${op_name} is not ready yet! Checking again in 5 seconds..."

  # Wait 5 seconds before checking again.
  sleep 5

done

Kitten sleeping in the sun.

This code takes about a minute to run, though it may take longer if resources are constrained. Once it's done running, you should see a video of a sleeping kitten like the one we have here.
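
The polling loops above wait indefinitely. The following is a minimal sketch of a bounded wait, not part of the SDK: `wait_for_operation` is a hypothetical helper, `refresh` stands in for a call such as `client.operations.get`, and the 360-second default is an assumption based on the maximum latency listed under Specifications.

```python
import time


def wait_for_operation(operation, refresh, poll_seconds=20, timeout_seconds=360,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `operation` until it reports done or the timeout elapses.

    `refresh` is any callable that takes the current operation and returns
    an updated one, for example: lambda op: client.operations.get(op).
    Hypothetical helper; it is not provided by the google-genai SDK.
    """
    deadline = clock() + timeout_seconds
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(poll_seconds)
        operation = refresh(operation)
    return operation
```

The `sleep` and `clock` arguments exist only so the helper can be exercised without real waiting.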

If you see an error message instead of a video, this means that resources are constrained and your request couldn't be completed. In this case, run the code again.
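
Running the code again can be automated. This sketch assumes a hypothetical `generate_with_retry` helper and a caller-supplied `generate` callable that submits the request, waits for it, and raises on failure; the attempt count and backoff values are illustrative, not recommendations from the API.

```python
import time


def generate_with_retry(generate, max_attempts=3, backoff_seconds=30, sleep=time.sleep):
    """Call `generate()` until it succeeds or `max_attempts` is reached.

    `generate` is any zero-argument callable that raises an exception when
    the request fails. Hypothetical helper, not part of the SDK.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return generate()
        except Exception as error:  # narrow this to the SDK's error type in real code
            last_error = error
            if attempt < max_attempts:
                sleep(backoff_seconds * attempt)  # wait longer after each failure
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```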

Generated videos are stored on the server for 2 days, after which they are removed. To keep a local copy of a generated video, download and save it within 2 days of generation, as the samples above do.
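
If you track when each video was generated, the retention window is simple date arithmetic. This sketch assumes "2 days" means exactly 48 hours from the generation time; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=2)  # server-side retention stated in this guide


def download_deadline(generated_at):
    """Return the last moment a video generated at `generated_at` can be fetched."""
    return generated_at + RETENTION


def is_expired(generated_at, now=None):
    """Return True once the retention window has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= download_deadline(generated_at)
```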

Generating videos from images

The following code generates an image using Imagen, then uses the generated image as the starting frame for the generated video.

First, generate an image using Imagen:

Python

# Reuses the imports and client from the text-to-video sample above.
prompt = "Panning wide shot of a calico kitten sleeping in the sunshine"

imagen = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt=prompt,
    config=types.GenerateImagesConfig(
      aspect_ratio="16:9",
      number_of_images=1
    )
)

imagen.generated_images[0].image

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateImages({
  model: "imagen-3.0-generate-002",
  prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
  config: {
    numberOfImages: 1,
  },
});

// you'll pass response.generatedImages[0].image.imageBytes to Veo

Go

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "time"
    "google.golang.org/genai"
)

func main() {

    ctx := context.Background()
    client, err := genai.NewClient(ctx, nil)
    if err != nil {
        log.Fatal(err)
    }

    config := &genai.GenerateImagesConfig{
        AspectRatio:    "16:9",
        NumberOfImages: 1,
    }

    response, _ := client.Models.GenerateImages(
        ctx,
        "imagen-3.0-generate-002",
        "Panning wide shot of a calico kitten sleeping in the sunshine",
        config,
    )

    // you'll pass response.GeneratedImages[0].Image to Veo
}

Then, generate a video using the resulting image as the first frame:

Python

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt=prompt,
    image = imagen.generated_images[0].image,
    config=types.GenerateVideosConfig(
      person_generation="dont_allow",  # "dont_allow" or "allow_adult"
      aspect_ratio="16:9",  # "16:9" or "9:16"
      number_of_videos=2
    ),
)

# Wait for videos to generate (requires `import time`)
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, video in enumerate(operation.response.generated_videos):
    fname = f'with_image_input{n}.mp4'
    print(fname)
    client.files.download(file=video.video)
    video.video.save(fname)

JavaScript

import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const ai = new GoogleGenAI({});

async function main() {
  // get image bytes from Imagen, as shown above

  let operation = await ai.models.generateVideos({
    model: "veo-2.0-generate-001",
    prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
    image: {
      imageBytes: response.generatedImages[0].image.imageBytes, // response from Imagen
      mimeType: "image/png",
    },
    config: {
      aspectRatio: "16:9",
      numberOfVideos: 2,
    },
  });

  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({
      operation: operation,
    });
  }

  operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
    const resp = await fetch(
      `${generatedVideo.video?.uri}&key=GEMINI_API_KEY`, // append your API key
    );
    const writer = createWriteStream(`video${n}.mp4`);
    Readable.fromWeb(resp.body).pipe(writer);
  });
}

main();

Go

    image := response.GeneratedImages[0].Image

    videoConfig := &genai.GenerateVideosConfig{
      AspectRatio:    "16:9",
      NumberOfVideos: 2,
    }

    operation, _ := client.Models.GenerateVideos(
        ctx,
        "veo-2.0-generate-001",
        "A dramatic scene based on the input image",
        image,
        videoConfig,
    )

    for !operation.Done {
        time.Sleep(20 * time.Second)
        operation, _ = client.Operations.GetVideosOperation(ctx, operation, nil)
    }

    for n, video := range operation.Response.GeneratedVideos {
        client.Files.Download(ctx, video.Video, nil)
        fname := fmt.Sprintf("video_with_image_input_%d.mp4", n)
        _ = os.WriteFile(fname, video.Video.VideoBytes, 0644)
    }

Veo model parameters

(Naming conventions vary by programming language.)

prompt: The text prompt for the video. When present, the image parameter is optional.
image: The image to use as the first frame for the video. When present, the prompt parameter is optional.
negativePrompt: Text string that describes anything you want to discourage the model from generating.
aspectRatio: Changes the aspect ratio of the generated video.
- "16:9": Supported in Veo 3 and Veo 2.
- "9:16": Supported in Veo 2 only (defaults to "16:9").
personGeneration: Allow the model to generate videos of people. The following values are supported:
- Text-to-video generation:
  - "allow_all": Generate videos that include adults and children. Currently the only available personGeneration value for Veo 3.
  - "dont_allow": Veo 2 only. Don't allow the inclusion of people or faces.
  - "allow_adult": Veo 2 only. Generate videos that include adults, but not children.
- Image-to-video generation: Veo 2 only
  - "dont_allow": Don't allow the inclusion of people or faces.
  - "allow_adult": Generate videos that include adults, but not children.
- See Limitations.
numberOfVideos: The number of output videos to request.
- 1: Supported in Veo 3 and Veo 2
- 2: Supported in Veo 2 only.
durationSeconds: Veo 2 only. Length of each output video in seconds, between 5 and 8.
- Not configurable for Veo 3, default setting is 8 seconds.
enhancePrompt: Veo 2 only. Enable or disable the prompt rewriter. Enabled by default.
- Not configurable for Veo 3, default prompt enhancer is always on.

See the Model version comparison table for a side-by-side look at parameter differences between Veo 3 and Veo 2.
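
The constraints in the parameter list can be checked before sending a request. This is a minimal sketch: `MODEL_LIMITS` and `check_config` are hypothetical names, the limits are transcribed from this guide for text-to-video only, and `durationSeconds` is assumed to take whole-second values.

```python
# Constraints transcribed from the parameter list in this guide
# (text-to-video; Veo 2 image-to-video does not accept "allow_all").
MODEL_LIMITS = {
    "veo-3.0-generate-preview": {
        "aspectRatio": {"16:9"},
        "personGeneration": {"allow_all"},
        "numberOfVideos": {1},
    },
    "veo-2.0-generate-001": {
        "aspectRatio": {"16:9", "9:16"},
        "personGeneration": {"allow_all", "allow_adult", "dont_allow"},
        "numberOfVideos": {1, 2},
        "durationSeconds": {5, 6, 7, 8},  # assumes whole seconds
    },
}

# Parameters that cannot be set at all for a given model.
NOT_CONFIGURABLE = {
    "veo-3.0-generate-preview": {"durationSeconds", "enhancePrompt"},
    "veo-2.0-generate-001": set(),
}


def check_config(model, config):
    """Return a list of problems found in `config` for `model` (empty if none)."""
    limits = MODEL_LIMITS[model]
    problems = []
    for key, value in config.items():
        if key in NOT_CONFIGURABLE[model]:
            problems.append(f"{key} is not configurable for {model}")
        elif key in limits and value not in limits[key]:
            problems.append(f"{key}={value!r} is not supported by {model}")
    return problems
```

For example, `check_config("veo-3.0-generate-preview", {"aspectRatio": "9:16"})` reports one problem, while the same config passes for `veo-2.0-generate-001`.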

Specifications

Modalities	Text-to-video generation; image-to-video generation (Veo 2 only)
Request latency	Min: 11 seconds; max: 6 minutes (during peak hours)
Variable length generation	Veo 2: 5-8 seconds; Veo 3: 8 seconds
Resolution	720p
Frame rate	24 fps
Aspect ratio	16:9 (landscape); 9:16 (portrait, Veo 2 only)
Input languages (text-to-video)	English
Limitations	Image-to-video `personGeneration` is not allowed in EU, UK, CH, and MENA locations; text-to-video `personGeneration: "allow_all"` is not allowed in EU, UK, CH, and MENA locations

Videos created by Veo are watermarked using SynthID, our tool for watermarking and identifying AI-generated content, and are passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.
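
The frame rate and duration figures in the Specifications table translate directly into frame counts. The sketch below also derives pixel dimensions, under the assumption that "720p" means the shorter side is 720 pixels; this guide does not state exact pixel dimensions, and both function names are hypothetical.

```python
FRAME_RATE = 24  # frames per second, from the Specifications table


def frame_count(duration_seconds, fps=FRAME_RATE):
    """Total number of frames in a clip of the given length."""
    return duration_seconds * fps


def frame_size(aspect_ratio, short_side=720):
    """Pixel (width, height) for an aspect ratio string such as "16:9".

    Assumes the shorter side is `short_side` pixels.
    """
    w, h = (int(part) for part in aspect_ratio.split(":"))
    long_side = short_side * max(w, h) // min(w, h)
    return (long_side, short_side) if w >= h else (short_side, long_side)
```

An 8-second clip is `frame_count(8)`, or 192 frames; a 5-second Veo 2 clip is 120 frames.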

Veo prompt guide

This section of the Veo guide contains examples of videos you can create using Veo, and shows you how to modify prompts to produce distinct results.

Safety filters

Veo applies safety filters across Gemini to help ensure that generated videos and uploaded photos don't contain offensive content. Prompts that violate our terms and guidelines are blocked.

Prompt writing basics

Good prompts are descriptive and clear. To get the most out of Veo, start with identifying your core idea, refine your idea by adding keywords and modifiers, and incorporate video-specific terminology into your prompts.

The following elements should be included in your prompt:

Subject: The object, person, animal, or scenery that you want in your video, such as cityscape, nature, vehicles, or puppies.
Action: What the subject is doing (for example, walking, running, or turning their head).
Style: Specify creative direction using specific film style keywords, such as sci-fi, horror film, film noir, or animated styles like cartoon.
Camera positioning and motion: [Optional] Control the camera's location and movement using terms like aerial view, eye-level, top-down shot, dolly shot, or worm's-eye view.
Composition: [Optional] How the shot is framed, such as wide shot, close-up, single-shot or two-shot.
Focus and lens effects: [Optional] Use terms like shallow focus, deep focus, soft focus, macro lens, and wide-angle lens to achieve specific visual effects.
Ambiance: [Optional] How the color and light contribute to the scene, such as blue tones, night, or warm tones.
Implicit or explicit audio cues: [Veo 3 only] With Veo 3, you can provide cues for sound effects, ambient noise, and dialogue.
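
The elements above can be assembled programmatically. This is an illustrative sketch only: `build_prompt` is a hypothetical helper, and the ordering and comma-separated layout are assumptions, not something Veo requires.

```python
def build_prompt(subject, action, style, camera=None, composition=None,
                 lens=None, ambiance=None, audio=None):
    """Join the prompt elements into one comma-separated prompt string.

    Optional elements are skipped when left as None.
    """
    if composition:
        lead = f"{composition} of {subject} {action}"
    else:
        lead = f"{subject} {action}"
    parts = [lead, style, camera, lens, ambiance, audio]
    return ", ".join(part for part in parts if part)


prompt = build_prompt(
    subject="a calico kitten",
    action="sleeping in the sunshine",
    style="cinematic",
    camera="slow pan",
    composition="Wide shot",
    ambiance="warm tones",
)
print(prompt)
# Wide shot of a calico kitten sleeping in the sunshine, cinematic, slow pan, warm tones
```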

More tips for writing prompts

The following tips help you write prompts that generate your videos:

Use descriptive language: Use adjectives and adverbs to paint a clear picture for Veo.
Provide context: If necessary, include background information to help your model understand what you want.
Reference specific artistic styles: If you have a particular aesthetic in mind, reference specific artistic styles or art movements.
Utilize prompt engineering tools: Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. For more information, visit Introduction to prompt design.
Enhance the facial details in your personal and group images: Make facial details a focus of the shot, for example by using the word portrait in the prompt.

Example prompts and output

This section presents several prompts, highlighting how descriptive details can elevate the outcome of each video.

Integrated audio

These videos demonstrate how you can prompt Veo 3's audio generation with increasing levels of detail.

Prompt	Generated output
More detail: A close up of two people staring at a cryptic drawing on a wall, torchlight flickering. "This must be the key," he murmured, tracing the pattern. "What does it mean though?" she asked, puzzled, tilting her head. Damp stone, intricate carvings, hidden symbols. A faint, eerie hum resonates in the background.
Less detail: Camping (Stop Motion): Camper: "I'm one with nature now!" Bear: "Nature would prefer some personal space".

Try out these prompts yourself to hear the audio!

Icicles

This video demonstrates how you can use the elements of prompt writing basics in your prompt.

Prompt	Generated output
Close up shot (composition) of melting icicles (subject) on a frozen rock wall (context) with cool blue tones (ambiance), zoomed in (camera motion) maintaining close-up detail of water drips (action).

Man on the phone

These videos demonstrate how you can revise your prompt with increasingly specific details to get Veo to refine the output to your liking.

Prompt	Generated output
Less detail: The camera dollies to show a close up of a desperate man in a green trench coat. He's making a call on a rotary-style wall phone with a green neon light. It looks like a movie scene.
More detail: A close-up cinematic shot follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign. The camera dollies in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation.

Snow leopard

This example demonstrates the output Veo might generate for a simple prompt.

Prompt	Generated output
A cute creature with snow leopard-like fur is walking in winter forest, 3D cartoon style render.

Running snow leopard

This prompt has more detail and demonstrates generated output that might be closer to what you want in your video.

Prompt	Generated output
Create a short 3D animated scene in a joyful cartoon style. A cute creature with snow leopard-like fur, large expressive eyes, and a friendly, rounded form happily prances through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gentle falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Aim for an upbeat, heartwarming tone with bright, cheerful colors and playful animation.

Examples by writing elements

These examples show you how to refine your prompts by each basic element.

Subject

This example shows you how to specify a subject description. The description can include a subject, or multiple subjects and actions. Here, our subject is "white concrete apartment building."

Prompt	Generated output
An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements

Context

This example shows you how to specify context. The background or context in which the subject will be placed is very important. Try placing your subject in a variety of backgrounds like on a busy street, or in outer space.

Prompt	Generated output
A satellite floating through outer space with the moon and some stars in the background.

Action

This example shows you how to specify action: what the subject is doing, such as walking, running, or turning their head.

Prompt	Generated output
A wide shot of a woman walking along the beach, looking content and relaxed towards the horizon at sunset.

Style

This example shows you how to specify style. You can add keywords to improve generation quality and steer it closer to intended style, such as shallow depth of field, movie still, minimalistic, surreal, vintage, futuristic, or double-exposure.

Prompt	Generated output
Film noir style, man and woman walk on the street, mystery, cinematic, black and white.

Camera motion

This example shows you how to specify camera motion. Options for camera motion include POV shot, aerial view, tracking drone view, or tracking shot.

Prompt	Generated output
A POV shot from a vintage car driving in the rain, Canada at night, cinematic.

Composition

This example shows you how to specify composition: How the shot is framed (wide shot, close-up, low angle, etc.).

Prompt	Generated output
Extreme close-up of an eye with city reflected in it.
Create a video of a wide shot of surfer walking on a beach with a surfboard, beautiful sunset, cinematic.

Ambiance

This example shows you how to specify ambiance. Color palettes play a vital role in photography, influencing the mood and conveying intended emotions. Try things like "muted orange warm tones," "natural light," "sunrise" or "sunset". For example, a warm, golden palette can infuse a romantic and atmospheric feel into a photograph.

Prompt	Generated output
A close-up of a girl holding adorable golden retriever puppy in the park, sunlight.
Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood.

Use reference images to generate videos

You can bring images to life by using Veo's image-to-video capability. You can use existing assets, or try Imagen to generate something new.

Prompt	Generated output
Bunny with a chocolate candy bar.
Bunny runs away.

Negative prompts

Negative prompts can be a powerful tool to help specify elements you don't want in the video. Describe what you want to discourage the model from generating after the phrase "Negative prompt". Follow these tips:

❌ Don't use instructive language or words like no or don't. For example, "No walls" or "don't show walls".
✅ Do describe what you don't want to see. For example, "wall, frame", which means that you don't want a wall or a frame in the video.
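
These tips can be enforced in code before a request is sent. A minimal sketch, assuming a hypothetical `negative_prompt` helper; the word list covers only the instructive words named above plus "do not", and the result would be passed as the `negativePrompt` parameter described under Veo model parameters.

```python
import re

# Instructive words the guide says to avoid in negative prompts.
INSTRUCTIVE = re.compile(r"\b(?:no|don't|do not)\b", re.IGNORECASE)


def negative_prompt(*unwanted):
    """Build a negative prompt from descriptions of unwanted elements."""
    for item in unwanted:
        if INSTRUCTIVE.search(item):
            raise ValueError(f"describe the element itself instead of {item!r}")
    return ", ".join(unwanted)


print(negative_prompt("wall", "frame"))  # wall, frame
```

Calling `negative_prompt("No walls")` raises `ValueError`, because the phrase instructs rather than describes.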

Prompt	Generated output
Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette.
Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette. With negative prompt - urban background, man-made structures, dark, stormy, or threatening atmosphere.

Aspect ratios

Gemini Veo video generation supports the following two aspect ratios:

Aspect ratio	Description
Widescreen or 16:9	The most common aspect ratio for televisions, monitors, and mobile phone screens (landscape). Use this when you want to capture more of the background, like in scenic landscapes.
Portrait or 9:16 (Veo 2 only)	Rotated widescreen. This aspect ratio has been popularized by short-form video applications, such as YouTube Shorts. Use this for portraits or tall objects with strong vertical orientations, such as buildings, trees, or waterfalls.

Widescreen

This prompt is an example of the widescreen aspect ratio of 16:9.

Prompt	Generated output
Create a video with a tracking drone view of a man driving a red convertible car in Palm Springs, 1970s, warm sunlight, long shadows.

Portrait

This prompt is an example of the portrait aspect ratio of 9:16. This ratio is only available for Veo 2.

Prompt	Generated output
Create a video highlighting the smooth motion of a majestic Hawaiian waterfall within a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, misty atmosphere, and dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone, transporting the viewer to the serene beauty of the Hawaiian rainforest.

Model version comparison

We recommend using Veo 3 for the best performance, fidelity, and quality.

The following table describes the differences in features, specifications, and parameters between Veo 2 and the current state of the Veo 3 preview:

Model	Veo 3	Veo 2
Availability	Preview	Stable
Audio	Audio with video (Always on)	No audio
Generation	Text to video	Text and image to video
Videos per request	1	1 or 2
`aspectRatio`	`16:9` only	`16:9` or `9:16`
`personGeneration`	`allow_all` only (not configurable)	`allow_adult`, `dont_allow`, or `allow_all` (text to video only)
`durationSeconds`	Not configurable, 8 seconds only	5-8 seconds
`enhancePrompt`	Not configurable, always on	Enable (default) or disable

You can migrate from Veo 2 to Veo 3 by updating the model name to use a Veo 3 model code, with minimal changes to parameters.
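
As a rough illustration of those parameter changes, the sketch below adapts a Veo 2 config dictionary to the Veo 3 constraints in the table above. `to_veo3` is a hypothetical helper, and the REST-style key names are used for brevity; review the returned list of changed keys rather than applying the result blindly.

```python
VEO3_MODEL = "veo-3.0-generate-preview"

# Values Veo 3 accepts, per the comparison table in this guide.
VEO3_FIXED = {"aspectRatio": "16:9", "personGeneration": "allow_all", "numberOfVideos": 1}
# Parameters that cannot be set for Veo 3.
VEO3_UNCONFIGURABLE = ("durationSeconds", "enhancePrompt")


def to_veo3(config):
    """Return (model, migrated_config, changed_keys) for a Veo 2 config dict."""
    migrated = {k: v for k, v in config.items() if k not in VEO3_UNCONFIGURABLE}
    changed = sorted(k for k in config if k in VEO3_UNCONFIGURABLE)
    for key, value in VEO3_FIXED.items():
        if key in migrated and migrated[key] != value:
            migrated[key] = value  # replace a value Veo 3 does not support
            changed.append(key)
    return VEO3_MODEL, migrated, changed
```

Note that a `9:16` request becomes `16:9` after migration, which changes the framing of the output.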

What's next

Gain more experience generating AI videos with the Veo Colab.
Check out cool examples using Veo 2 on the Google DeepMind site.