ACWM-Phys Checkpoints
ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
Haotian Xue†, Yipu Chen*, Liqian Ma*, Zelin Zhao, Lama Moukheiber, Yongxin Chen
Georgia Institute of Technology
[Project Page] · [Paper] · [Dataset] · [Code]
Overview
This repository contains pretrained ACWM-DiT checkpoints. ACWM-DiT is a latent diffusion transformer trained with flow matching on the ACWM-Phys benchmark. All released checkpoints are DiT-S models (~200M parameters) trained for 100k steps.
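Flow matching trains the network to regress a velocity field that carries noise to data, and sampling integrates that field. The sketch below is a generic illustration, not the repository's training code. It assumes the common rectified-flow convention x_t = (1 - t)·x0 + t·x1, with x0 Gaussian noise at t = 0 and x1 a data latent at t = 1, and a plain Euler sampler. ACWM-DiT's actual time convention, loss weighting, and sampler are not stated here. `make_ideal_velocity` is a hypothetical stand-in for the trained DiT: it returns the exact velocity field for a one-point data distribution, so the sampler's output can be checked.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the network: the constant path velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred, x0, x1):
    """Mean squared error between the predicted and target velocity."""
    target = velocity_target(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # largest t is 1 - dt, so 1 - t never reaches zero below
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_ideal_velocity(data):
    """Hypothetical stand-in for a trained model: exact field for one data point."""
    def velocity_fn(x, t):
        return [(d - xi) / (1.0 - t) for d, xi in zip(data, x)]
    return velocity_fn


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.5, -1.0, 2.0]  # toy "latent"
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    x_t = interpolate(noise, data, 0.3)
    v = make_ideal_velocity(data)(x_t, 0.3)
    print(flow_matching_loss(v, noise, data))  # close to 0
    print(euler_sample(make_ideal_velocity(data), noise, num_steps=50))  # approximately data
```

In a real model, `x0` and `x1` would be full VAE latent trajectories and `velocity_fn` would be the DiT conditioned on the first frame, the actions, and `t`.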
Released Checkpoints
| Environment | Category | Action Dim | Resolution | Checkpoint |
|---|---|---|---|---|
| Push Cube | Rigid-Body | 2 | 240×240 | VideoDiT_S_push_cube_240x240/latest.pt |
| Stack Cube | Rigid-Body | 7 | 240×240 | VideoDiT_S_stack_cube_240x240/latest.pt |
| Push Rope | Deformable | 2 | 240×240 | VideoDiT_S_push_rope_240x240/latest.pt |
| Cloth Move | Deformable | 3 | 240×240 | VideoDiT_S_clothmove_240x240_240x240/latest.pt |
| Push Sand | Particle | 7 | 240×400 | VideoDiT_S_push_sand_240x400/latest.pt |
| Pour Water | Particle | 4 | 240×240 | VideoDiT_S_pour_water_240x240/latest.pt |
| Robot Arm | Kinematics | 7 | 240×240 | VideoDiT_S_robot_arm_240x240/latest.pt |
| Reacher | Kinematics | 2 | 240×240 | VideoDiT_S_reacher_240x240/latest.pt |
The Wan 2.1 VAE weights (Wan2.1_VAE.pth, 508 MB) are also included; they are required to encode and decode video latents.
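The latent size the DiT operates on follows from the VAE's compression factors listed under Model Architecture (16 channels, 8× spatial, 4× temporal). The sketch below assumes the usual causal layout, in which the first frame is encoded on its own and each later group of 4 frames becomes one latent frame, so clip lengths of the form 1 + 4k map cleanly. The clip length of 17 frames and the (1, 2, 2) patch size in the example are illustrative assumptions; neither value is stated in this card.

```python
def wan_latent_shape(num_frames, height, width, channels=16, spatial=8, temporal=4):
    """Latent shape (C, T', H', W') for a causal video VAE.

    Assumes frame 0 is encoded alone and every following group of `temporal`
    frames becomes one latent frame, so num_frames must equal 1 + k * temporal.
    """
    if (num_frames - 1) % temporal != 0:
        raise ValueError(f"num_frames must be 1 + {temporal}*k, got {num_frames}")
    if height % spatial or width % spatial:
        raise ValueError("height and width must be divisible by the spatial factor")
    return (channels, 1 + (num_frames - 1) // temporal, height // spatial, width // spatial)


def latent_tokens(latent_shape, patch):
    """Token count after non-overlapping (pt, ph, pw) patchification of the latent."""
    _, t, h, w = latent_shape
    pt, ph, pw = patch
    if t % pt or h % ph or w % pw:
        raise ValueError("latent dims must be divisible by the patch size")
    return (t // pt) * (h // ph) * (w // pw)


if __name__ == "__main__":
    # 240x240 environments vs. the 240x400 Push Sand checkpoint, 17 illustrative frames.
    for h, w in [(240, 240), (240, 400)]:
        shape = wan_latent_shape(17, h, w)
        print((h, w), shape, latent_tokens(shape, patch=(1, 2, 2)))
```

For example, a 17-frame 240×400 clip maps to a 16×5×30×50 latent under these assumptions, which is why the Push Sand model sees a wider token grid than the 240×240 environments.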
Download
```bash
huggingface-cli download t1an/ACWM-Phys-checkpoints --local-dir ./checkpoints
export WAN_VAE_PATH=./checkpoints/Wan2.1_VAE.pth
```
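After downloading, local file paths can be resolved from the Released Checkpoints table. The helper below is a minimal sketch: the dictionary keys (for example `push_sand`) are hypothetical short names, while the categories, action dimensions, resolutions, and relative paths are copied from the table. `resolve_vae_path` reads the `WAN_VAE_PATH` variable exported above and falls back to the default download location; `missing_files` is a hypothetical sanity check, not part of the ACWM-Phys code.

```python
import os
from pathlib import Path

# env key (hypothetical) -> (category, action_dim, (height, width), relative path from the table)
CHECKPOINTS = {
    "push_cube": ("Rigid-Body", 2, (240, 240), "VideoDiT_S_push_cube_240x240/latest.pt"),
    "stack_cube": ("Rigid-Body", 7, (240, 240), "VideoDiT_S_stack_cube_240x240/latest.pt"),
    "push_rope": ("Deformable", 2, (240, 240), "VideoDiT_S_push_rope_240x240/latest.pt"),
    "clothmove": ("Deformable", 3, (240, 240), "VideoDiT_S_clothmove_240x240_240x240/latest.pt"),
    "push_sand": ("Particle", 7, (240, 400), "VideoDiT_S_push_sand_240x400/latest.pt"),
    "pour_water": ("Particle", 4, (240, 240), "VideoDiT_S_pour_water_240x240/latest.pt"),
    "robot_arm": ("Kinematics", 7, (240, 240), "VideoDiT_S_robot_arm_240x240/latest.pt"),
    "reacher": ("Kinematics", 2, (240, 240), "VideoDiT_S_reacher_240x240/latest.pt"),
}


def resolve_checkpoint(env, root="./checkpoints"):
    """Return (local checkpoint path, action dim, (height, width)) for one environment."""
    try:
        _category, action_dim, resolution, rel_path = CHECKPOINTS[env]
    except KeyError:
        raise KeyError(f"unknown environment {env!r}; choose from {sorted(CHECKPOINTS)}") from None
    return Path(root) / rel_path, action_dim, resolution


def resolve_vae_path(default="./checkpoints/Wan2.1_VAE.pth"):
    """Use WAN_VAE_PATH if it is set, otherwise the default download location."""
    return Path(os.environ.get("WAN_VAE_PATH", default))


def missing_files(root="./checkpoints"):
    """List expected files (all checkpoints plus the VAE) that are not on disk."""
    root = Path(root)
    expected = [root / entry[3] for entry in CHECKPOINTS.values()]
    expected.append(resolve_vae_path(str(root / "Wan2.1_VAE.pth")))
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

Keeping the action dimension next to the path makes it easy to check that an action sequence matches the checkpoint before running inference.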
Usage
See the ACWM-Phys code repository for full evaluation and training instructions.
Quick evaluation:
```bash
python eval.py --env push_cube --steps 50 --split both --save_videos
```
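To evaluate several environments in a row, the quick-evaluation command can be wrapped in a loop. This is a sketch that uses only the flags shown in the command above. Environment names other than `push_cube` are assumptions inferred from the checkpoint directory names; check the code repository for the accepted values. The helper names `eval_command` and `run_eval` are hypothetical.

```python
import subprocess
import sys


def eval_command(env, steps=50, split="both", save_videos=True):
    """Build the argv for eval.py using only the flags from the quick-evaluation example."""
    cmd = [sys.executable, "eval.py", "--env", env, "--steps", str(steps), "--split", split]
    if save_videos:
        cmd.append("--save_videos")
    return cmd


def run_eval(envs, **kwargs):
    """Run eval.py once per environment; stop at the first failing run."""
    for env in envs:
        cmd = eval_command(env, **kwargs)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage, from the root of the code repository: `run_eval(["push_cube", "push_rope"])`. Passing an argument list to `subprocess.run` avoids shell quoting issues, and `check=True` raises `CalledProcessError` on the first non-zero exit.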
Model Architecture
ACWM-DiT takes the first video frame and the full action sequence as input and predicts the complete future trajectory:
- Causal VAE (Wan 2.1): encodes video into 16-channel latents at H/8 × W/8 spatial resolution with 4× temporal compression
- DiT with flow matching: denoises the full latent trajectory
- Action conditioning: injected via AdaLN (default) or cross-attention
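AdaLN conditioning normalizes each token without learned affine parameters and then applies a shift and scale regressed from a conditioning vector. The sketch below shows that mechanism for a single token in plain Python. It assumes the conditioning vector is an action embedding (in DiT it is typically summed with the timestep embedding); how ACWM-DiT embeds actions and aligns them with latent frames is not described in this card. DiT's adaLN-Zero variant also predicts a gate on each residual branch, which is omitted here. All function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(w, b, x):
    """Dense layer: w is a list of rows (out_dim x in_dim), b has length out_dim."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def adaln_modulate(h, cond, w_shift, b_shift, w_scale, b_scale):
    """AdaLN: LayerNorm(h) * (1 + scale(cond)) + shift(cond)."""
    shift = linear(w_shift, b_shift, cond)
    scale = linear(w_scale, b_scale, cond)
    return [n * (1.0 + s) + sh for n, s, sh in zip(layer_norm(h), scale, shift)]


if __name__ == "__main__":
    h = [1.0, 2.0, 3.0, 4.0]  # one token's hidden state
    action_emb = [0.5, -0.5]  # toy conditioning vector
    w = [[0.1, -0.2]] * 4
    b = [0.0] * 4
    print(adaln_modulate(h, action_emb, w, b, w, b))
```

With zero projection weights the block reduces to plain LayerNorm, which is why zero-initializing these projections lets conditioning be learned gradually. Cross-attention, the alternative listed above, instead lets every token attend to the whole action sequence.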
Citation
Coming soon.