
SANA is an efficiency-oriented codebase for high-resolution image and video generation, providing complete training and inference pipelines.
SANA-WM is an open-source world model from NVIDIA Research that turns one image plus a camera trajectory into minute-scale video. Its core promise is not just longer generation, but longer generation that still respects spatial structure and camera motion.
If you searched for the term because it appeared in research news, the short answer is this: SANA-WM targets minute-long 720p worlds with precise camera control, rather than the short clips typical of ordinary video generators.
Long-horizon worlds
Generate minute-scale scenes that stay coherent across a longer camera path.
Precise camera motion
Follow six-degree-of-freedom (6-DoF) trajectories, covering both camera position and orientation, instead of only producing unconstrained cinematic motion.
Higher-throughput evaluation
The paper reports comparable visual quality to industrial baselines with 36x higher throughput on its benchmark.
Open research footing
The project page, paper, and code repository are already public, making the model easier to inspect than a closed demo.
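To make the idea of a 6-DoF camera path concrete, here is a minimal sketch of one way such a path could be represented: a list of poses, each with three translation values and three rotation angles. The `CameraPose` class, the `interpolate_path` helper, and the Euler-angle convention are all hypothetical, chosen for illustration; this page does not describe the input format SANA-WM actually expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """One camera pose with six degrees of freedom (hypothetical layout)."""
    # Translation, in arbitrary scene units.
    x: float
    y: float
    z: float
    # Rotation as Euler angles, in degrees.
    yaw: float
    pitch: float
    roll: float


def interpolate_path(start, end, num_frames):
    """Linearly interpolate every degree of freedom between two poses.

    This is a naive interpolation: it does not handle angle wraparound
    (for example 350 degrees to 10 degrees), which a real tool would need.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    fields = ("x", "y", "z", "yaw", "pitch", "roll")
    path = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # goes from 0.0 at the start to 1.0 at the end
        values = {
            name: getattr(start, name) + t * (getattr(end, name) - getattr(start, name))
            for name in fields
        }
        path.append(CameraPose(**values))
    return path


# Move forward and to the side while turning 90 degrees.
start = CameraPose(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
end = CameraPose(4.0, 0.0, 2.0, 90.0, 0.0, 0.0)
path = interpolate_path(start, end, num_frames=5)
```

Each entry in `path` describes where the virtual camera sits and where it points for one frame, which is the kind of information a trajectory-conditioned model needs alongside the starting image.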
Start with a still image
The model takes an initial frame as the visual anchor for the world.
Add a camera path
A 6-DoF trajectory tells the model where the virtual camera should move.
Roll out the world
Hybrid linear attention keeps the long sequence tractable while preserving scene continuity.
Refine the result
A second-stage long-video refiner improves texture, motion, and later-frame quality.
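The rollout step credits linear attention with keeping long sequences tractable. The sketch below shows the general idea behind causal linear attention in plain Python: instead of comparing each new token with every earlier one, it keeps a fixed-size running summary of the keys and values. This is a generic textbook-style illustration, not SANA-WM's implementation. The `elu(x) + 1` feature map and the function names are assumptions, and the "hybrid" part of the model's design is not shown because this page does not describe it.

```python
import math


def feature_map(vec):
    # elu(x) + 1 keeps every feature positive, so the normalizer is never zero.
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def causal_linear_attention(queries, keys, values):
    """Return one output vector per step, each attending only to past steps."""
    d_k = len(keys[0])
    d_v = len(values[0])
    # Running sum of phi(k) * v^T. Its size is d_k x d_v, independent of length.
    state = [[0.0] * d_v for _ in range(d_k)]
    # Running sum of phi(k), used to normalize the output.
    norm = [0.0] * d_k
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(phi_q[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = causal_linear_attention(queries, keys, values)
```

Because `state` and `norm` never grow with the number of steps, the cost per step stays constant, which is why this family of attention is attractive for minute-scale video. Each output is a weighted average of the values seen so far, so the first output equals the first value vector.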