ZipMo (Learning Long-term Motion Embeddings for Efficient Kinematics Generation)
ZipMo is a motion-space model for efficient long-horizon kinematics generation. It learns compact long-term motion embeddings from large-scale tracker-derived trajectories and generates plausible future motion directly in this learned motion space. The model supports spatial-poke conditioning for open-domain videos and task/text-embedding conditioning for LIBERO robotics evaluation.
Paper and Abstract
ZipMo was introduced in the CVPR 2026 paper Learning Long-term Motion Embeddings for Efficient Kinematics Generation.
Understanding and predicting motion is a fundamental component of visual intelligence. Although video models can synthesize scene dynamics, exploring many possible futures through full video generation is expensive. ZipMo instead operates directly on long-term motion embeddings learned from tracker trajectories, enabling efficient generation of long, realistic motions while preserving dense reconstruction at arbitrary spatial query points.
ZipMo generates long-horizon motion in a compact learned motion space, supporting spatial-poke conditioning for open-domain videos and task-conditioned action prediction on LIBERO.
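To build intuition for what "compact long-term motion embedding with dense reconstruction at arbitrary query points" means, here is a toy analogy that is *not* the ZipMo architecture: compress a long trajectory into a few keyframes and reconstruct positions at arbitrary continuous query times by interpolation. ZipMo replaces this hand-crafted scheme with a learned motion space, but the compress/query pattern is the same. All function names below are illustrative only.

```python
# Toy analogy (NOT the ZipMo architecture): compress a long 2-D track into a
# few keyframes, then reconstruct the position at any continuous query time.
def compress(track, n_keys):
    """Keep n_keys evenly spaced (time, point) keyframes from a track."""
    T = len(track) - 1
    return [(i * T / (n_keys - 1), track[round(i * T / (n_keys - 1))])
            for i in range(n_keys)]

def reconstruct(keys, t):
    """Linearly interpolate the point at continuous query time t."""
    for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return tuple((1 - a) * u + a * v for u, v in zip(p0, p1))
    return keys[-1][1]

track = [(float(t), float(2 * t)) for t in range(101)]  # a straight-line motion
keys = compress(track, n_keys=5)   # 101 points -> 5 keyframes
print(reconstruct(keys, 12.5))     # query between keyframes -> (12.5, 25.0)
```

Where this toy scheme can only interpolate observed motion, ZipMo's learned embeddings also support *generating* plausible future motion in the compressed space.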
Usage
The simplest way to use ZipMo programmatically is via `torch.hub`:
```python
import torch

repo = "CompVis/long-term-motion"

# Open-domain motion prediction
planner_sparse = torch.hub.load(repo, "zipmo_planner_sparse")
planner_dense = torch.hub.load(repo, "zipmo_planner_dense")

# Motion autoencoder
vae = torch.hub.load(repo, "zipmo_vae")
```
LIBERO planning and policy components can be loaded in the same way:
```python
import torch

repo = "CompVis/long-term-motion"

# LIBERO planners
libero_atm_planner = torch.hub.load(repo, "zipmo_planner_libero", "atm")
libero_tramoe_planner = torch.hub.load(repo, "zipmo_planner_libero", "tramoe")

# LIBERO policy heads
policy_head_atm = torch.hub.load(repo, "zipmo_policy_head", "atm")
policy_head_tramoe_goal = torch.hub.load(repo, "zipmo_policy_head", "tramoe", "goal")
```
Available Torch Hub entries:
- `zipmo_planner_sparse`: sparse-poke planner for open-domain motion prediction.
- `zipmo_planner_dense`: dense-conditioning planner for open-domain motion prediction.
- `zipmo_vae`: long-term motion autoencoder.
- `zipmo_planner_libero`: LIBERO planner with mode `atm` or `tramoe`.
- `zipmo_policy_head`: LIBERO policy head with mode `atm` or `tramoe`. For `tramoe`, pass one of `10`, `goal`, `object`, or `spatial`.
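As a rough illustration of what spatial-poke conditioning means, a "poke" can be thought of as a query point on the first frame paired with a desired displacement. The helper below is purely hypothetical for exposition; the actual input format expected by `zipmo_planner_sparse` is documented in the GitHub repository.

```python
# Hypothetical poke representation: an (x, y) query point in pixel coordinates
# plus a desired displacement (dx, dy). This is NOT the actual ZipMo input
# format; it only sketches the conditioning signal the sparse planner consumes.
def make_poke(x, y, dx, dy):
    """Return a poke as a start point and its target location."""
    return {"start": (x, y), "target": (x + dx, y + dy)}

# Two pokes: drag one point right and slightly up, nudge another straight down.
pokes = [make_poke(120.0, 80.0, 30.0, -10.0),
         make_poke(200.0, 150.0, 0.0, 25.0)]
```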
For the interactive demo, standard track prediction evaluation, LIBERO rollout evaluation, and training instructions, see the GitHub repository.
Citation
If you find our model or code useful, please cite our paper:
```bibtex
@inproceedings{stracke2026motionembeddings,
  title     = {Learning Long-term Motion Embeddings for Efficient Kinematics Generation},
  author    = {Stracke, Nick and Bauer, Kolja and Baumann, Stefan Andreas and Bautista, Miguel Angel and Susskind, Josh and Ommer, Bj{\"o}rn},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
```