SANA-WM AI Model

SANA-WM Overview

SANA-WM is an open-source world model from NVIDIA Research that turns one image plus a camera trajectory into minute-scale video. Its core promise is not just longer generation, but longer generation that still respects spatial structure and camera motion.

In short, SANA-WM targets minute-long 720p worlds with precise camera control. It is not another short-form video generator.

SANA-WM Features

Long-horizon worlds
Generate minute-scale scenes that stay coherent across a longer camera path.

Precise camera motion
Follow 6-DoF trajectories instead of only producing unconstrained cinematic motion.

Higher-throughput evaluation
The paper reports visual quality comparable to industrial baselines at 36x higher throughput on its benchmark.

Open research footing
The project page, paper, and code repository are already public, making the model easier to inspect than a closed demo.

How it works

Start with a still image
The model takes an initial frame as the visual anchor for the world.

Add a camera path
A 6-DoF trajectory tells the model where the virtual camera should move.
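
To make "6-DoF trajectory" concrete, here is a minimal sketch of one way to describe a camera path: three translation values and three rotation values per frame. The `CameraPose` class, the `orbit_trajectory` helper, the axis convention, and the units are all assumptions made for illustration. This page does not describe the trajectory format SANA-WM actually expects, so check the repository before adapting it.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical per-frame pose: 3 translation + 3 rotation values (6 DoF)."""
    x: float      # position, arbitrary scene units
    y: float
    z: float
    yaw: float    # rotation about the vertical axis, radians
    pitch: float  # rotation about the sideways axis, radians
    roll: float   # rotation about the viewing axis, radians


def orbit_trajectory(num_frames, radius=2.0, sweep_deg=90.0):
    """Build a level orbit around the origin with the camera facing the center.

    Assumed convention: the camera's forward direction in the x-z plane
    is (sin(yaw), cos(yaw)).
    """
    poses = []
    for i in range(num_frames):
        # Fraction of the sweep completed at this frame, from 0.0 to 1.0.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        angle = math.radians(sweep_deg) * t
        x = radius * math.sin(angle)
        z = radius * math.cos(angle)
        # Choose the yaw whose forward direction points from (x, z) to the origin.
        yaw = math.atan2(-x, -z)
        poses.append(CameraPose(x, 0.0, z, yaw, 0.0, 0.0))
    return poses


if __name__ == "__main__":
    for pose in orbit_trajectory(5):
        print(pose)
```

Each frame gets its own pose, so a longer video simply means a longer list of poses. A real pipeline would likely convert these values into camera matrices or another encoding, and that step is left out here.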

Roll out the world
Hybrid linear attention keeps the long sequence tractable while preserving scene continuity.
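
The reason linear attention helps with long sequences can be shown with a small sketch. This is the generic kernelized linear attention idea with an `elu(x) + 1` feature map, written in plain Python. It is not SANA-WM's actual hybrid design, which this page does not specify, and the function names are illustrative. The point is that a fixed-size running state replaces the full pairwise score table, so cost per step does not grow with sequence length.

```python
import math


def feature_map(vec):
    """elu(x) + 1, which keeps every feature strictly positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def causal_linear_attention(queries, keys, values):
    """Attend to all earlier positions using a fixed-size running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = sum(fq[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """Same result computed the slow way, with explicit pairwise weights."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(keys[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(d_v)]
        )
    return outputs


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.8]]
    k = [[0.3, 0.1], [-0.6, 0.2], [0.7, -0.1]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(causal_linear_attention(q, k, v))
    print(causal_quadratic_reference(q, k, v))
```

Both functions produce the same outputs up to floating-point error. The first keeps only `state` and `norm` between steps, while the second recomputes a weight for every earlier position, and that cost adds up over a minute-long sequence.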

Refine the result
A second-stage long-video refiner improves texture, motion, and later-frame quality.

Hugging Face: https://huggingface.co/Efficient-Large-Model/SANA-WM_bidirectional
Paper: https://arxiv.org/abs/2605.15178
GitHub: https://github.com/NVlabs/SANA
