Sana video

Efficient Text-to-Video and Image-to-Video by NVIDIA NVLabs

Sana video brings efficient, high-quality text-to-video and image-to-video generation to your browser. Create coherent 720p, 16 fps clips up to one minute long with research-backed performance. Try Sana video on Story321 and ship polished motion content fast.

Meet Sana video

Sana video is NVIDIA NVLabs’ efficient diffusion-based video generator for text-to-video (T2V) and image-to-video (I2V). It supports up to 720p resolution at 16 fps and durations of up to one minute, with research-backed fidelity and coherent motion ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/) • [nvlabs.github.io](https://nvlabs.github.io/Sana/)).

Text-to-Video (T2V)

Turn natural language into vivid motion. Sana video supports multi-style narratives, smooth transitions, and consistent subjects, producing high-quality 720p sequences at 16 fps ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).

Image-to-Video (I2V)

Animate a single frame into a dynamic clip. Preserve identity and composition while adding realistic motion, camera moves, and scene depth ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).

Efficient, practical runtime

Generate a 5-second clip in about 60 seconds, or roughly 29 seconds on an RTX 5090 with NVFP4 optimizations. That is fast enough for tight iteration loops ([youtube.com](https://www.youtube.com/watch?v=JmHxYDpCVX8)).
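The arithmetic behind these figures is easy to check. The sketch below assumes the nominal frame count is simply duration × fps (a model's actual frame count can differ slightly), and it only compares the two quoted times: the source does not name the hardware behind the ~60-second figure.

```python
# Figures quoted above: a 5-second clip at 16 fps takes about 60 s,
# or about 29 s on an RTX 5090 with NVFP4 optimizations.
# Assumption: nominal frame count = duration * fps.
FPS = 16
CLIP_SECONDS = 5
BASELINE_SECONDS = 60.0        # "about 60s" (approximate, hardware unspecified)
RTX5090_NVFP4_SECONDS = 29.0   # "~29s" (approximate)


def frame_count(duration_s: float, fps: int = FPS) -> int:
    """Nominal number of output frames for a clip of the given length."""
    return round(duration_s * fps)


frames = frame_count(CLIP_SECONDS)
speedup = BASELINE_SECONDS / RTX5090_NVFP4_SECONDS

print(f"nominal frames in a 5 s clip: {frames}")
print(f"nominal frames in a 60 s clip: {frame_count(60)}")
print(f"approx. seconds per output frame: "
      f"{BASELINE_SECONDS / frames:.2f} vs {RTX5090_NVFP4_SECONDS / frames:.2f}")
print(f"approx. speedup of the quoted RTX 5090 time: {speedup:.2f}x")
```

The ratio says nothing about how long a full 60-second clip takes; the quoted figures only cover 5-second clips.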

Open-source and research-backed

Built on the SANA family of Linear Diffusion Transformers (the original SANA paper was accepted at ICLR 2025), with open-source code available for exploration and extension ([nvlabs.github.io](https://nvlabs.github.io/Sana/) • [research.nvidia.com](https://research.nvidia.com/labs/eai/publication/sana/) • [github.com](https://github.com/NVlabs/Sana)).

How to use Sana video on Story321

Follow these steps to produce consistent results with Sana video.

Pick the model

Choose Sana video from the model list.

Select mode

Use Text-to-Video to generate from a text prompt, or Image-to-Video to animate a reference image.

Write the prompt / set reference

Describe the subject, motion, camera, and timing. For I2V, upload a reference image.

Set duration, resolution, fps

Choose a duration of up to 60s at up to 720p and 16 fps for balanced quality.

Tune controls

Adjust motion strength, camera jitter, and aspect ratio, and fix the seed so results are reproducible.

Generate and refine

Preview, trim, and iterate on short clips; extend to full length once the look is locked.
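To make the steps above concrete, here is a minimal sketch of a settings checklist that enforces the limits quoted on this page before a run. Everything in it is hypothetical: `GenerationSettings`, its field names, and `validate` are illustrative only and are not Story321's or NVIDIA's API. Motion strength and camera jitter are left out because their ranges are not documented here.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Public limits quoted on this page (may change with new releases).
MAX_DURATION_S = 60
MAX_HEIGHT = 720
FPS = 16


@dataclass
class GenerationSettings:
    """Hypothetical container for the settings in the steps above.

    Field names are illustrative only, not a real Story321 or NVIDIA API.
    """
    mode: str                              # "t2v" or "i2v"
    prompt: str
    duration_s: float = 5.0                # iterate short, extend later
    height: int = 720
    fps: int = FPS
    aspect_ratio: str = "16:9"
    seed: Optional[int] = 42               # fixed seed for reproducible runs
    reference_image: Optional[str] = None  # image path, required for i2v

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings pass."""
        problems = []
        if self.mode not in ("t2v", "i2v"):
            problems.append(f"unknown mode: {self.mode!r}")
        if self.mode == "i2v" and not self.reference_image:
            problems.append("i2v needs a reference image")
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if not 0 < self.duration_s <= MAX_DURATION_S:
            problems.append(f"duration must be in (0, {MAX_DURATION_S}] s")
        if self.height > MAX_HEIGHT:
            problems.append(f"height above {MAX_HEIGHT}p")
        if self.fps != FPS:
            problems.append(f"fps other than {FPS} is outside the quoted spec")
        return problems


# Draft a short I2V clip first, then extend once it looks right.
draft = GenerationSettings(mode="i2v", prompt="mascot waves, gentle dolly-in",
                           duration_s=4)
print(draft.validate())  # ['i2v needs a reference image']

final = replace(draft, reference_image="mascot.png", duration_s=30)
print(final.validate())  # []
```

Keeping the seed fixed while only `duration_s` changes mirrors the "iterate short, extend once locked" workflow.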

Tips

• Iterate at 3–5s lengths before extending to 30–60s.
• Keep subject names, styles, and lens terms consistent across runs.
• Use time cues like “hold 1s” to stabilize beats.
• For I2V identity, upload crisp, evenly lit references.
• Save winning prompts as reusable templates for Sana video.
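One way to apply these tips is to keep the fixed parts of a prompt (subject, style, lens) in one template and vary only the action, camera move, and timing cue. This is a minimal sketch using Python's standard `string.Template`; the template wording and the "Pip the orange fox mascot" brand entry are hypothetical examples, not a documented Sana video prompt format.

```python
from string import Template

# Hypothetical shot template: subject, style, and lens terms stay identical
# across runs; only the action, camera move, and hold time change.
SHOT = Template(
    "$subject $action, $camera. $style, $lens. Hold ${hold}s on the final pose."
)

# Hypothetical brand entry, reused verbatim so every run sees the same terms.
BRAND = {
    "subject": "Pip the orange fox mascot",
    "style": "Soft 3D render, pastel palette",
    "lens": "35mm lens, shallow depth of field",
}


def build_prompt(action: str, camera: str, hold: float = 1) -> str:
    """Fill the shared template; a missing placeholder raises KeyError."""
    # :g drops a trailing ".0", so hold=1 renders as "1s" and 1.5 as "1.5s".
    return SHOT.substitute(BRAND, action=action, camera=camera, hold=f"{hold:g}")


for action in ("waves at the camera", "jumps and spins"):
    print(build_prompt(action, camera="gentle dolly-in"))
```

Because `substitute` raises on any unfilled placeholder, a template with a missing term fails loudly instead of silently producing an inconsistent prompt.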

Specs such as 720p, 16 fps, and up to 1 minute reflect current public research notes; see the project pages for updates ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/) • [github.com](https://github.com/NVlabs/Sana)).

What you can create with Sana video

From brand teasers to tutorial loops, Sana video speeds up both concepting and production-grade motion work.

Launch teasers

Cut 5–10s hero shots with controlled camera moves and consistent branding.

Product explainers

Demonstrate features with readable motion beats and legible close-ups.

Character moments

Animate mascot gestures, expressions, and micro-acting from a single image.

Cinematic b‑roll

Generate stylized transitions, establishing shots, and ambient loops.

Social trends

Prototype punchy, loopable clips that match platform pacing.

Education & how‑tos

Show step-by-step motion with clear camera work and a clear temporal structure.

Frequently asked questions

Answers to common Sana video setup and workflow questions.

What are the current output limits?

Up to 720p resolution, 16 fps, and 1-minute duration per clip, per public docs ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).

How fast does generation run?

About 60 seconds for a 5-second clip, or roughly 29 seconds on an RTX 5090 with NVFP4 optimizations ([youtube.com](https://www.youtube.com/watch?v=JmHxYDpCVX8)).

Is the model open-source or research-backed?

Both. Code is available in the NVlabs/Sana GitHub repository, and the underlying research is documented on NVIDIA's research pages ([github.com](https://github.com/NVlabs/Sana) • [research.nvidia.com](https://research.nvidia.com/labs/eai/publication/sana/)).

What’s the difference between T2V and I2V?

T2V creates motion from text; I2V animates a provided image while preserving identity and layout.

Can I control camera behavior?

Yes. Use lens, shot-type, and movement terms (e.g., “low tracking shot,” “gentle dolly-in”) in the prompt.

Can I use results commercially?

Review the repository’s license and any third-party terms before commercial use ([github.com](https://github.com/NVlabs/Sana)).

Start creating with Sana video

Prototype, iterate, and publish compelling motion content. Sana video on Story321 gives you speed, coherence, and research-grade quality.

Performance and specs are based on public materials and may evolve with new releases ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).