Efficient Text-to-Video and Image-to-Video by NVIDIA NVLabs
Sana video brings efficient, high-quality text-to-video and image-to-video generation to your browser. Create coherent 720p, 16 fps clips up to one minute with research-backed performance. Try Sana video on Story321 and ship polished motion content fast.

Sana video is NVIDIA NVLabs’ efficient diffusion-based video generator for text-to-video (T2V) and image-to-video (I2V), supporting up to 720p resolution, 16 fps, and durations up to one minute, with research-backed fidelity and coherent motion ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/) • [nvlabs.github.io](https://nvlabs.github.io/Sana/)).
Turn natural language into vivid motion. Sana video supports multi-style narratives, smooth transitions, and consistent subjects, producing high-quality 720p sequences at 16 fps ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).
Animate a single frame into a dynamic clip. Preserve identity and composition while adding realistic motion, camera moves, and scene depth ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).
Generate a 5-second clip in about 60 seconds, or roughly 29 seconds on an RTX 5090 with NVFP4 optimizations, which is fast enough for tight iteration loops ([youtube.com](https://www.youtube.com/watch?v=JmHxYDpCVX8)).
Built on the SANA family of Linear Diffusion Transformer models, whose original paper appeared at ICLR 2025, with open-source code for exploration and extension ([nvlabs.github.io](https://nvlabs.github.io/Sana/) • [research.nvidia.com](https://research.nvidia.com/labs/eai/publication/sana/) • [github.com](https://github.com/NVlabs/Sana)).
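To put the quoted generation times in perspective, the short sketch below converts them into an average cost per output frame and a speedup ratio. It assumes both timings refer to a 5-second clip at 16 fps, which the source video may or may not state explicitly, and the per-frame number is only an average: a diffusion model does not render frames one at a time.

```python
# Figures quoted on this page; pairing them with 16 fps is an assumption.
CLIP_SECONDS = 5
FPS = 16
BASELINE_S = 60.0    # "about 60 seconds"
OPTIMIZED_S = 29.0   # "roughly 29 seconds" on RTX 5090 with NVFP4


def seconds_per_frame(wall_clock_s: float, clip_seconds: int = CLIP_SECONDS, fps: int = FPS) -> float:
    """Average wall-clock seconds spent per output frame."""
    return wall_clock_s / (clip_seconds * fps)


frames = CLIP_SECONDS * FPS
baseline = seconds_per_frame(BASELINE_S)
optimized = seconds_per_frame(OPTIMIZED_S)
speedup = BASELINE_S / OPTIMIZED_S

print(f"{frames} frames per clip")
print(f"baseline:  {baseline:.2f} s/frame")
print(f"optimized: {optimized:.2f} s/frame")
print(f"speedup:   {speedup:.2f}x")
```

Under those assumptions, a clip is 80 frames, the two runs average 0.75 and about 0.36 seconds per frame, and the optimized path is a little over 2x faster.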
Follow these steps to produce consistent results with Sana video.
Choose Sana video from the model list.
Use Text-to-Video for prompts, or Image-to-Video to animate a reference.
Describe the subject, motion, camera, and time in your prompt; for I2V, also upload a reference image.
Set the duration (up to 60s), resolution (up to 720p), and frame rate (16 fps) to balance quality and speed.
Adjust motion strength, camera jitter, and aspect ratio; fix the seed so results are reproducible.
Preview, trim, and iterate on short clips; extend the duration once the look is locked.
Specs such as 720p, 16 fps, and up to 1 minute reflect current public research notes; see the project pages for updates ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/) • [github.com](https://github.com/NVlabs/Sana)).
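The steps above can be sketched as a small settings record that is checked against the published limits before a job is submitted. This is a minimal illustration, not a Story321 or NVLabs API: `ClipSettings`, its field names, and the choice to treat 720p, 16 fps, and 60 seconds as hard caps are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Limits quoted on this page; treating them as hard caps is an assumption.
MAX_HEIGHT = 720
MAX_FPS = 16
MAX_SECONDS = 60.0


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical settings record for one generation job."""
    mode: str                       # "t2v" or "i2v"
    prompt: str
    duration_s: float
    height: int = MAX_HEIGHT
    fps: int = MAX_FPS
    seed: int = 0                   # fixed seed keeps iterations comparable
    image_path: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the settings fall outside the quoted limits."""
        if self.mode not in ("t2v", "i2v"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.mode == "i2v" and not self.image_path:
            raise ValueError("i2v needs a reference image")
        if not 0 < self.duration_s <= MAX_SECONDS:
            raise ValueError("duration must be in (0, 60] seconds")
        if not 0 < self.height <= MAX_HEIGHT:
            raise ValueError("height must be at most 720")
        if not 0 < self.fps <= MAX_FPS:
            raise ValueError("fps must be at most 16")

    @property
    def frame_count(self) -> int:
        """Number of output frames implied by duration and frame rate."""
        return round(self.duration_s * self.fps)


# Iterate on a short draft first, then extend with the same seed and prompt.
draft = ClipSettings(mode="t2v", prompt="a paper boat drifting down a gutter",
                     duration_s=5, seed=42)
draft.validate()
final = replace(draft, duration_s=60)
final.validate()
print(draft.frame_count, final.frame_count)
```

Keeping the record immutable and deriving the long version with `replace` mirrors the "iterate short, extend once locked" advice: only the duration changes between the draft and the final clip.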
From brand teasers to tutorial loops, Sana video accelerates concepting and production-grade motion.
Cut 5–10s hero shots with controlled camera moves and consistent branding.
Demonstrate features with readable motion beats and legible close-ups.
Animate mascot gestures, expressions, and micro-acting from a single image.
Generate stylized transitions, establishing shots, and ambient loops.
Prototype punchy, loopable clips that match platform pacing.
Show step-by-step motion with camera clarity and temporal structure.
Answers to common Sana video setup and workflow questions.
Up to 720p resolution, 16 fps, and 1-minute duration per clip, per public docs ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).
About 60s for a 5s clip, or ~29s on RTX 5090 with NVFP4 optimizations ([youtube.com](https://www.youtube.com/watch?v=JmHxYDpCVX8)).
Code and research resources are available for exploration ([github.com](https://github.com/NVlabs/Sana) • [research.nvidia.com](https://research.nvidia.com/labs/eai/publication/sana/)).
T2V creates motion from text; I2V animates a provided image while preserving identity and layout.
Yes. Use lens, shot-type, and movement terms (e.g., “low tracking shot,” “gentle dolly-in”) in the prompt.
Review the repository’s license and any third-party terms before commercial use ([github.com](https://github.com/NVlabs/Sana)).
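The advice above on steering the camera through the prompt can be sketched as a tiny helper that assembles a prompt from the same parts the workflow names: subject, motion, camera, and time. The function name `compose_prompt` and the comma-separated layout are illustrative assumptions; the model has no documented requirement for this ordering.

```python
def compose_prompt(subject: str, motion: str, camera: str = "", time: str = "") -> str:
    """Join the non-empty prompt parts into one comma-separated string."""
    parts = [subject, motion, camera, time]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = compose_prompt(
    subject="a red fox",
    motion="trotting through fresh snow",
    camera="low tracking shot, gentle dolly-in",
    time="at dawn",
)
print(prompt)
```

Keeping the camera terms in their own argument makes it easy to swap shot types between iterations while the subject and motion stay fixed.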
Prototype, iterate, and publish compelling motion content. Sana video on Story321 gives you speed, coherence, and research-grade quality.
Performance and specs are based on public materials and may evolve with new releases ([nvlabs.github.io](https://nvlabs.github.io/Sana/Video/)).