How to Write Veo 3.1 Prompts That Actually Work: 12 Templates and Real Examples
Veo 3.1 responds to prompt structure more than any other video model. Here are 12 templates and the patterns that consistently produce cinema-grade output.
Veo 3.1 is Google DeepMind's flagship video model and the one we recommend most for narrative, product, and cinematic work. But it's also the most prompt-sensitive frontier model: write your prompts well and Veo produces stunning output; write them carelessly and you'll burn credits on generic results.
This guide gives you 12 prompt templates we use in production, with full examples and the structure that makes them work.
Why Veo 3.1 prompts are different
Two things to know upfront:
- Veo follows prompts more literally than Sora or Kling. Specific lighting and camera cues land. Vague ones don't.
- Veo handles audio prompts in the same field. Anything you describe (dialogue, foley, ambient sound) can be generated alongside the visuals.
The implication: Veo prompts should be dense with cinematographic vocabulary, and you should explicitly describe audio when you want it.
The Veo 3.1 prompt formula
After hundreds of generations, this five-part structure consistently produces stronger results:
[Subject + appearance + action] +
[Setting + atmosphere] +
[Camera + framing + movement] +
[Lighting] +
[Audio cues] (optional)
You don't need all five every time, but the more you fill in, the closer Veo lands to what you imagined.
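As a rough illustration, the five-part formula can be treated as a small data structure that renders to a single prompt string. This is only a sketch: the `VeoPrompt` class and its field names are hypothetical helpers for assembling text, not part of any Veo or Skyvid API. The sample values come from the action-sequence example later in this guide.

```python
from dataclasses import dataclass


@dataclass
class VeoPrompt:
    """Hypothetical container for the five parts of the formula."""

    subject: str = ""   # subject + appearance + action
    setting: str = ""   # setting + atmosphere
    camera: str = ""    # camera + framing + movement
    lighting: str = ""  # lighting
    audio: str = ""     # audio cues (optional)

    def render(self) -> str:
        """Join the non-empty parts into one prompt, one sentence per part."""
        sentences = []
        for part in (self.subject, self.setting, self.camera,
                     self.lighting, self.audio):
            text = part.strip()
            if not text:
                continue  # parts left blank are simply skipped
            if not text.endswith((".", "!", "?", '"')):
                text += "."  # close the sentence if the caller did not
            sentences.append(text)
        return " ".join(sentences)


runner = VeoPrompt(
    subject="A runner in performance gear sprints through an empty city street at dawn",
    setting="Misty, cool morning, breath visible",
    camera="Tracking shot following parallel to the motion",
    lighting="Hard directional light from low sun, sharp shadows",
    audio="Sound: rhythmic footsteps on wet asphalt, heavy breathing, distant city ambient",
)
print(runner.render())
```

Keeping the parts separate makes it easy to see at a glance which of the five slots a prompt is missing before you spend credits on it.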
12 templates
Each template below is a fill-in-the-blanks prompt followed by a complete working example. Drop in your own subject and setting; the structure does the work.
Template 1: Cinematic establishing shot
A wide aerial shot of [LOCATION] at [TIME OF DAY], [WEATHER].
Slow drone push-in toward [FOCAL POINT].
Golden hour lighting, long shadows, atmospheric haze.
Ambient: wind, distant nature sounds.
Working example:
A wide aerial shot of a coastal cliff town in Greece at golden
hour, clear skies. Slow drone push-in toward a single white
church on the cliff edge. Golden hour lighting, long shadows,
soft atmospheric haze over the sea. Ambient: gentle wind, distant
seagulls.
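If you reuse these templates often, the bracketed placeholders can be filled programmatically. The sketch below assumes the template is stored as a plain string; `fill_template` and `TEMPLATE_1` are illustrative names, not part of any Veo tooling. It raises an error when a placeholder has no value, so a stray `[WEATHER]` never reaches the model.

```python
import re

# Matches bracketed placeholders such as [LOCATION] or [TIME OF DAY].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")

TEMPLATE_1 = (
    "A wide aerial shot of [LOCATION] at [TIME OF DAY], [WEATHER]. "
    "Slow drone push-in toward [FOCAL POINT]. "
    "Golden hour lighting, long shadows, atmospheric haze. "
    "Ambient: wind, distant nature sounds."
)


def fill_template(template: str, values: dict) -> str:
    """Replace every [PLACEHOLDER]; fail loudly if one has no value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for [{key}]")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


filled = fill_template(TEMPLATE_1, {
    "LOCATION": "a coastal cliff town in Greece",
    "TIME OF DAY": "golden hour",
    "WEATHER": "clear skies",
    "FOCAL POINT": "a single white church on the cliff edge",
})
print(filled)
```

The same function works for any of the 12 templates, since they all use the same bracket convention.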
Template 2: Talking-head with dialogue
Medium close-up of [PERSON + APPEARANCE], [LOCATION], looking
into camera. Says: "[DIALOGUE LINE]" in a [TONE] voice.
Soft natural light from [DIRECTION], shallow depth of field.
Locked-off camera. Ambient room tone.
Working example:
Medium close-up of a man in his 40s with greying temples, wearing
a charcoal sweater, sitting in a wood-paneled study, looking into
camera. Says: "I never thought it would come to this." in a quiet,
measured voice. Soft natural light from a window on the left,
shallow depth of field. Locked-off camera. Ambient room tone.
Template 3: Product hero
Close-up of [PRODUCT] on [SURFACE], in [SETTING]. Slow camera
rotation revealing all sides. Studio softbox lighting from above,
gentle fill from the front. Polished, editorial aesthetic.
Ambient: subtle product sound (clink, hum, etc.).
Working example:
Close-up of a glass perfume bottle on a polished black marble
surface, in a minimalist studio setting. Slow camera rotation
revealing the bottle's facets. Studio softbox lighting from above
with golden rim light from the left, gentle fill from the front.
Polished, editorial aesthetic. Ambient: subtle glass clink.
Template 4: Action sequence
[CHARACTER] [ACTION] in [LOCATION]. Tracking shot following the
motion. [WEATHER/ATMOSPHERE]. Hard directional light, sharp
shadows. Sound: [SPECIFIC SFX], footsteps, breath.
Working example:
A runner in performance gear sprints through an empty city street
at dawn. Tracking shot following parallel to the motion. Misty,
cool morning, breath visible. Hard directional light from low
sun, sharp shadows. Sound: rhythmic footsteps on wet asphalt,
heavy breathing, distant city ambient.
Template 5: Dialogue scene
Medium two-shot of [PERSON A] and [PERSON B] in [LOCATION].
Person A says: "[LINE A]". Person B responds: "[LINE B]".
Locked-off camera, soft natural light from [DIRECTION], shallow
depth of field. Ambient: [LOCATION SOUND].
Working example:
Medium two-shot of a woman in her 30s and an older man in a
sunlit corner of a small Italian cafe. Woman says: "I haven't
seen him since the funeral." Man responds: "Some things take
time." Locked-off camera, soft warm light from a window on the
left, shallow depth of field. Ambient: distant espresso machine,
muted Italian conversation.
Template 6: Slow-motion moment
[SUBJECT + ACTION] captured in slow-motion at 25% speed.
Locked-off camera, [FRAMING]. [LIGHTING]. Sound: subtle whoosh,
ambient.
Working example:
Red wine being poured into a crystal glass, captured
in slow-motion at 25% speed. Locked-off camera, tight close-up
on the glass and pour. Soft window light from the right, dark
background. Sound: subtle wine flow, gentle glass resonance.
Template 7: Documentary B-roll
Observational shot of [SUBJECT] [DOING ACTIVITY] in [LOCATION].
Handheld camera, slight movement. Natural ambient light. Sound:
diegetic action sounds, ambient location tone.
Working example:
Observational shot of an elderly bookbinder hand-stitching a
leather-bound journal in a small workshop. Handheld camera with
slight natural movement. Soft north-facing window light. Sound:
the rasp of thread through paper, scissors, faint workshop
ambient.
Template 8: Animated / stylized
[SUBJECT + ACTION] in the style of [SPECIFIC STYLE]. [CAMERA
DESCRIPTION]. [LIGHTING APPROPRIATE TO STYLE]. Sound: [STYLE-
APPROPRIATE AUDIO].
Working example:
A young samurai walking through a bamboo forest in the style of
1990s Studio Ghibli animation, hand-drawn aesthetic. Slow tracking
shot parallel to motion. Soft dappled light through bamboo, warm
afternoon palette. Sound: footsteps on leaves, distant flute,
wind in bamboo.
Template 9: Macro close-up
Macro close-up of [SUBJECT]. [SLIGHT MOTION DESCRIPTION].
Shallow focus, [LIGHTING]. Sound: [DELICATE AMBIENT].
Working example:
Macro close-up of a single droplet sliding down a glass surface.
Locked-off camera, the droplet moves slowly down across the
frame. Shallow focus, soft window light from behind creating a
gentle highlight. Sound: faint trickle, no music.
Template 10: Mood piece
[SUBJECT] [STATIC OR SUBTLE ACTION] in [LOCATION]. [TIME OF
DAY/ATMOSPHERE]. Locked-off [FRAMING]. [SPECIFIC LIGHTING].
Sound: [MOOD-APPROPRIATE AMBIENT].
Working example:
A woman sitting alone at a window in a small apartment, looking
out at the rain. Late afternoon, overcast. Locked-off medium
shot, the woman in profile. Soft diffused light from the window
casting gentle shadows. Sound: rain on glass, distant traffic,
muffled city ambient.
Template 11: Tracking shot reveal
Slow tracking shot through [LOCATION], camera moving [DIRECTION],
revealing [SUBJECT] at the end of the move. [LIGHTING]. Sound:
[LOCATION AMBIENT].
Working example:
Slow tracking shot through a narrow vintage hotel hallway, camera
moving forward, revealing a single open door at the end with
warm light spilling out. Dim sconce lighting along the walls,
warm tungsten glow at the door. Sound: distant piano, muffled
voices, footsteps on carpet.
Template 12: Title card / opening
[CINEMATIC OPENING IMAGE]. [SLOW CAMERA MOVE OR HOLD]. [DRAMATIC
LIGHTING]. Sound: [MOOD-SETTING AUDIO, OPTIONALLY MUSIC CUE].
Working example:
A vintage typewriter on a desk by a rain-streaked window, a single
sheet of paper still in the carriage. Slow push-in toward the
paper. Dramatic side lighting from the window, deep shadows in
the room. Sound: rain on glass, slow ticking of a wall clock,
single piano note fading in.
Pro tips that consistently lift quality
1. Specify lighting in every prompt
Veo's defaults are decent, but they're "well-lit," not "intentionally lit." Naming the lighting transforms output:
- "Soft north-facing window light" → editorial
- "Hard direct sunlight, deep shadows" → harsh, contrasty
- "Golden hour rim light from camera left" → cinematic
- "Single overhead practical, deep shadows" → noir
- "Studio softbox from above, fill from front" → polished commercial
2. Use camera grammar
Don't write "the camera moves slowly toward the subject." Write "slow dolly in." Veo's training data appears to include a lot of cinematography metadata, so using the vocabulary gives a real lift.
3. Audio is in the same prompt
For dialogue: Says: "..." in a [TONE] voice.
For ambient: Sound: [LIST OF SOUNDS].
Veo generates these alongside the visuals, with dialogue lip-synced to the speaker. This is its biggest differentiator over Sora 2.
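The two audio patterns above are easy to keep consistent with a pair of small helpers. This is a minimal sketch; `dialogue_cue` and `ambient_cue` are hypothetical names that only build prompt text.

```python
from typing import Sequence


def dialogue_cue(line: str, tone: str) -> str:
    """Format a spoken line using the guide's Says: "..." pattern."""
    return f'Says: "{line}" in a {tone} voice.'


def ambient_cue(sounds: Sequence[str]) -> str:
    """Format a list of sounds using the guide's Sound: ... pattern."""
    return "Sound: " + ", ".join(sounds) + "."


print(dialogue_cue("I never thought it would come to this.", "quiet, measured"))
print(ambient_cue(["rain on glass", "distant traffic", "muffled city ambient"]))
```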
4. Keep prompts under 80 words
Past 80 words, prompt adherence degrades. Cut adjectives. Trim filler. Veo's defaults are already cinematic, so you don't need to write "stunning, beautiful, cinematic masterpiece."
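A quick pre-flight check can catch prompts that run long before you submit them. The sketch below counts whitespace-separated words against the 80-word ceiling and flags a few filler adjectives. The `check_prompt` function and the `FILLER_WORDS` set are illustrative assumptions; adjust the list to your own habits.

```python
import re

MAX_WORDS = 80  # the ceiling suggested in this guide

# Filler adjectives drawn from the guide's "what not to write" example.
FILLER_WORDS = {"stunning", "beautiful", "masterpiece"}


def check_prompt(prompt: str, max_words: int = MAX_WORDS) -> dict:
    """Report word count, whether the prompt is over budget, and filler found."""
    words = prompt.split()
    # Strip punctuation so "stunning," still matches "stunning".
    cleaned = [re.sub(r"[^\w-]", "", w).lower() for w in words]
    return {
        "word_count": len(words),
        "over_budget": len(words) > max_words,
        "filler": [w for w in cleaned if w in FILLER_WORDS],
    }


report = check_prompt("A stunning, beautiful wide shot of a harbor at dusk. Slow dolly in.")
print(report)
```

The check only flags filler rather than deleting it, so you stay in control of the final wording.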
5. End with a clear final state
For 8-second clips, describe where the motion ends. Anchors like "ending with the subject's face filling the frame" or "ending in a wide hold on the location" keep the motion from drifting.
When to switch to other models
Veo 3.1 isn't always the right call. Quick guide:
- Long-form (10+ seconds): Sora 2
- High-volume drafting: Kling 2.5 (half the cost)
- Dance / sports motion: Seedance 2.0
- Expressive talking heads: Hailuo
- Director-level edit controls: Runway Gen4
For everything else involving audio, narrative polish, or product work, Veo 3.1 is the right pick. See our full comparison of Veo vs Sora for the deeper trade-offs.
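The quick guide above amounts to a lookup table with Veo 3.1 as the fallback. Here is a minimal sketch of that routing; the need labels such as `"long_form"` are made-up keys for illustration, and the model names are taken directly from the list.

```python
# Need labels are hypothetical; model names come from the quick guide.
MODEL_BY_NEED = {
    "long_form": "Sora 2",                    # 10+ seconds
    "high_volume_drafting": "Kling 2.5",
    "dance_sports_motion": "Seedance 2.0",
    "expressive_talking_heads": "Hailuo",
    "director_level_edit_controls": "Runway Gen4",
}

DEFAULT_MODEL = "Veo 3.1"  # audio, narrative polish, product work


def pick_model(need: str) -> str:
    """Return the suggested model for a need, falling back to Veo 3.1."""
    return MODEL_BY_NEED.get(need, DEFAULT_MODEL)


print(pick_model("high_volume_drafting"))
print(pick_model("product_hero"))
```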
Try it
Open Veo 3.1 on Skyvid, pick one of the 12 templates above as a starting point, and modify it. Cost is 12 credits per 5-second clip and 20 credits per 8-second clip.
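To budget a session, you can total the credits for a batch from the two prices quoted above. This is a back-of-the-envelope sketch; `batch_cost` is a hypothetical helper, and it knows only the 5-second and 8-second prices stated in this guide.

```python
# Credits per clip by duration in seconds, as quoted in this guide.
CREDITS_PER_CLIP = {5: 12, 8: 20}


def batch_cost(clip_counts: dict) -> int:
    """Total credits for a batch, given {duration_seconds: number_of_clips}."""
    total = 0
    for seconds, count in clip_counts.items():
        if seconds not in CREDITS_PER_CLIP:
            raise ValueError(f"no quoted price for {seconds}-second clips")
        total += CREDITS_PER_CLIP[seconds] * count
    return total


# Three 5-second drafts plus two 8-second finals: 3*12 + 2*20 = 76 credits.
print(batch_cost({5: 3, 8: 2}))
```

Per second, the two lengths cost nearly the same (2.4 versus 2.5 credits), so the choice of duration can follow the shot rather than the price.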
FAQ
Do all 12 templates support audio? Yes. Veo 3.1 can generate audio with any prompt. Templates 2, 4, 5, and 7 lean into audio explicitly, but you can add audio cues to any of them.
How specific should dialogue be? Quote the exact line, in quotes. Veo will lip-sync to it. Describe tone in plain language ("quiet," "agitated," "warm").
What if my prompt produces generic results? Add specifics. Replace "a woman" with a clear description. Replace "in a room" with a named room type and atmosphere. Veo rewards specificity.
Can I reuse Veo seeds for variations? Yes. Skyvid stores the seed for every generation. Reuse the same seed with a slightly modified prompt to see targeted changes.
Is Veo 3.1 worth the extra cost over Kling 2.5? For drafts: no, draft on Kling. For final output where audio or cinematic polish matters: yes, Veo's edge shows clearly. Most pros use both: draft on Kling, finish on Veo.