Lance AI Model | VKMO AI

Lance Overview

Lance is a 3B-parameter native unified multimodal model for image and video understanding, generation, and editing. It was trained from scratch on a budget of at most 128 GPUs using a staged multi-task recipe.

Lance Features

Text-to-Video
Nine text-conditioned cases focused on character motion, fantasy animals, two-person interaction, and cinematic dreamlike scenes.

Video Editing
Nine prompt-driven single-step and compositional editing cases spanning background transformation, object addition and removal, subject replacement, appearance restyling, stylization, and action edits.

Multi-turn Consistency Editing
A source video followed by four linked edits applied to the same subject: subject replacement, accessory addition, background rewrite, and motion update.

Video Understanding
Selected video question answering and captioning cases that evaluate temporal reasoning, motion recognition, and concise-to-detailed description.

Text-to-Image
Representative text-to-image outputs spanning photorealistic, stylized, compositional, and typography-heavy prompts.

Image Editing
Instruction-guided image editing cases showing local replacement, style transfer, object-aware modifications, and layout-preserving transformations.

Image Understanding
Six selected visual question answering cases spanning charts, trade data, OCR, documents, landmarks, and natural phenomena.