Flash-WAM - RoboTwin (distilled)
Single-step distilled checkpoint for Flash-WAM: Modality-Aware Distillation for World Action Models, applied to LingBot-VA and evaluated on RoboTwin 2.0. Flash-WAM distills each modality with a consistency function matched to its noise regime (linear-gradient-scaling for the action stream, variance-preserving for the video stream), compressing inference to a single step per modality for up to a 23× speedup while preserving teacher-level task success.
This repository contains the complete model (distilled transformer + encoders):
| Component | Description |
|---|---|
| `transformer/` | Distilled Flash-WAM student |
| `vae/` | VAE (from the LingBot-VA teacher) |
| `text_encoder/` | UMT5-XXL text encoder (from the teacher) |
| `tokenizer/` | T5 tokenizer |
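Before starting an evaluation, it can help to confirm that a local copy of the checkpoint has the four component directories listed above. The following is a minimal sketch using only the Python standard library; the helper name `missing_components` is hypothetical and not part of the Flash-WAM codebase, and the check only looks at directory names, not at the files inside them.

```python
from pathlib import Path

# Subdirectories listed in the component table of this model card.
EXPECTED_COMPONENTS = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(checkpoint_dir):
    """Return the expected component subdirectories absent from checkpoint_dir.

    Hypothetical helper for illustration: it checks directory names only and
    does not validate the weights or config files inside each component.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

An empty list means all four components are present; any names returned point to an incomplete download.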
Links
- Paper: https://arxiv.org/abs/2606.05254
- Project page: https://flashwam.github.io
- Code: https://github.com/NU-World-Model-Embodied-AI/Flash-WAM
Usage
For environment setup and evaluation, follow the Flash-WAM repository and LingBot-VA. Point the inference server at this checkpoint directory.
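One way to obtain a local checkpoint directory is to download the repository snapshot with `huggingface_hub`. This is a sketch, assuming `huggingface_hub` is installed (`pip install -U huggingface_hub`); the function name `download_checkpoint` and the default `local_dir` path are hypothetical choices for illustration. How the inference server is pointed at the directory is defined by the Flash-WAM and LingBot-VA repositories and is not shown here.

```python
REPO_ID = "NU-World-Model-Embodied-AI/FlashWAM-RoboTwin"


def download_checkpoint(local_dir="checkpoints/FlashWAM-RoboTwin"):
    """Download the full repository snapshot and return its local path.

    The default local_dir is a hypothetical location; use whatever path
    your inference setup expects.
    """
    # Imported lazily so this snippet can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoint()` fetches all components (transformer, VAE, text encoder, tokenizer), so expect a large download; the returned path is the directory to pass to the inference server.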
Citation
@misc{akbari2026flashwammodalityawaredistillationworld,
  title={Flash-WAM: Modality-Aware Distillation for World Action Models},
  author={Arman Akbari and Ci Zhang and Arash Akbari and Lin Zhao and Yixiao Chen and Weiwei Chen and Xuan Zhang and Geng Yuan and Yanzhi Wang},
  year={2026},
  eprint={2606.05254},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2606.05254},
}
License: Apache-2.0.