Abstract
HandX presents a comprehensive foundation for bimanual hand motion synthesis, including a new dataset, an annotation method, and evaluation metrics for dexterous motion generation.
Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior: finger articulation, contact timing, and inter-hand coordination. Existing resources likewise lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we present HandX, a unified foundation spanning data, annotation, and evaluation. We consolidate and filter existing datasets for quality, and collect a new motion-capture dataset targeting underrepresented bimanual interactions with detailed finger dynamics. For scalable annotation, we introduce a decoupled strategy that first extracts representative motion features, e.g., contact events and finger flexion, and then leverages the reasoning of large language models to produce fine-grained, semantically rich descriptions aligned with those features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motion. Our dataset is released to support future research.
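The abstract mentions newly proposed hand-focused metrics without defining them here. As a rough illustration of what a contact-aware hand metric can look like, the sketch below scores contact-timing agreement between a generated and a reference sequence; the 1 cm threshold, array shapes, and function names are assumptions for illustration, not the paper's definitions.

```python
# Hypothetical sketch of a hand-focused metric: contact-timing F1.
# Thresholds and names are illustrative assumptions, not HandX's definitions.
import numpy as np

def contact_events(fingertip_dist, thresh=0.01):
    # Binary per-frame contact mask from fingertip-object distances (meters).
    return fingertip_dist < thresh          # (T, n_tips) boolean array

def contact_timing_f1(gen_dist, ref_dist, thresh=0.01):
    # F1 between generated and reference contact masks: rewards matching
    # both which fingertips touch and when they touch.
    gen_c = contact_events(gen_dist, thresh)
    ref_c = contact_events(ref_dist, thresh)
    tp = np.logical_and(gen_c, ref_c).sum()
    precision = tp / max(gen_c.sum(), 1)
    recall = tp / max(ref_c.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)

# Toy check: 120 frames, 10 fingertips (two hands), distances in meters.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 0.05, size=(120, 10))
gen = ref + rng.normal(0.0, 0.005, size=ref.shape)  # mildly noisy "generation"
print(f"contact-timing F1: {contact_timing_f1(gen, ref):.3f}")
```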
🔗Website: https://handx-project.github.io/
📄Paper: https://arxiv.org/abs/2603.28766
🧑🏻‍💻Code & Data: https://github.com/handx-project/HandX
Most motion generation methods treat hands as rigid afterthoughts, yet hands are how we interact with the world: precise finger articulation, contact timing, and bimanual coordination all matter. The bottleneck is data: existing datasets offer either full-body scale with no finger detail, or hand detail without interaction richness. HandX bridges this gap.
🔬 54.2 hours / 5.9M frames of high-fidelity bimanual motion with dense finger articulation and rich inter-hand contact
✍️ 490K text annotations produced by decoupling kinematic feature extraction from LLM reasoning, so descriptions are grounded in real contact events rather than hallucinated narratives (sketched in code after this list)
📈 A clear log-linear scaling trend (R² = 0.96): matched increases in data and model capacity improve generation, while over-scaling the model alone hurts (see the fit sketch after this list)
🤖 Generated motions demonstrated in IsaacGym and MuJoCo, then transferred to a real humanoid with dexterous hands: from text to physical dexterity
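The decoupled annotation strategy from the second highlight can be sketched in a few lines: derive symbolic kinematic events (contacts, flexion) from the mocap first, then let an LLM phrase only those events. The joint indices, thresholds, and prompt below are illustrative assumptions, not the released pipeline.

```python
# Illustrative sketch of decoupled annotation: kinematic features feed an
# LLM prompt. Joint layout, thresholds, and prompt text are assumptions.
import numpy as np

TIP_IDX = [4, 8, 12, 16, 20]                 # assumed MANO-style fingertip ids
CHAINS = [(1, 2, 4), (5, 6, 8), (9, 10, 12),
          (13, 14, 16), (17, 18, 20)]        # assumed (MCP, PIP, tip) triplets

def flexion_angle(mcp, pip, tip):
    # Angle at the PIP joint: ~pi when the finger is straight, small when bent.
    u, v = mcp - pip, tip - pip
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def extract_features(joints, obj_pos, contact_thresh=0.01):
    # joints: (T, n_joints, 3); obj_pos: (T, 3). Emit symbolic per-frame events.
    feats = []
    for t, frame in enumerate(joints):
        touching = np.linalg.norm(frame[TIP_IDX] - obj_pos[t], axis=1) < contact_thresh
        flexed = [flexion_angle(frame[m], frame[p], frame[d]) < np.pi / 2
                  for m, p, d in CHAINS]     # "strongly flexed" cutoff, assumed
        feats.append({"frame": t, "contacts": touching.tolist(), "flexed": flexed})
    return feats

def build_prompt(feats):
    # Only measured events reach the LLM, so its descriptions stay grounded
    # in real contacts instead of hallucinated narrative.
    events = "\n".join(f"frame {f['frame']}: contacts={f['contacts']}, "
                       f"flexed={f['flexed']}" for f in feats)
    return ("Describe this bimanual hand motion concisely, using only the "
            "listed kinematic events:\n" + events)
```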
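The log-linear trend in the third highlight amounts to an ordinary least-squares fit of a quality metric against log data scale. Here is a minimal reproduction recipe with placeholder numbers, since the paper's raw points are not listed on this page.

```python
# Minimal sketch of fitting a log-linear scaling trend; the data points
# below are placeholders, not HandX's measurements.
import numpy as np

hours = np.array([3.4, 6.8, 13.6, 27.1, 54.2])    # hypothetical training subsets
fid   = np.array([1.92, 1.41, 1.05, 0.74, 0.52])  # hypothetical quality metric

x = np.log(hours)
slope, intercept = np.polyfit(x, fid, deg=1)      # quality linear in log-data
pred = slope * x + intercept
r2 = 1 - np.sum((fid - pred) ** 2) / np.sum((fid - fid.mean()) ** 2)
print(f"slope={slope:.3f}, R^2={r2:.3f}")         # R^2 near 1 => log-linear fit
```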
A single model supports versatile generation tasks: text-to-motion, inbetweening, trajectory control, keyframe guidance, hand-reaction synthesis, and long-horizon generation.
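Multi-task setups like this are often exposed as a single generate call whose optional conditions select the task. The sketch below shows that pattern under stated assumptions; all names and shapes are hypothetical, not HandX's released API.

```python
# Hypothetical interface sketch: one model, task selected by which
# conditions are supplied. Names and shapes are assumptions, not HandX's API.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Conditions:
    text: Optional[str] = None                # text-to-motion
    prefix: Optional[np.ndarray] = None       # long-horizon continuation
    keyframes: Optional[dict] = None          # {frame_idx: pose} guidance
    trajectory: Optional[np.ndarray] = None   # (T, 3) wrist-path control
    other_hand: Optional[np.ndarray] = None   # hand-reaction synthesis

def generate(model, cond: Conditions, num_frames: int = 120) -> np.ndarray:
    # A real system would encode whichever conditions are present into one
    # conditioning stream for the diffusion/autoregressive backbone.
    assert any(v is not None for v in vars(cond).values()), "need a condition"
    return model.sample(cond, num_frames)     # hypothetical backbone call

# Same entry point, different tasks:
#   generate(model, Conditions(text="both hands unscrew a jar lid"))
#   generate(model, Conditions(keyframes={0: pose_a, 119: pose_b}))  # inbetweening
```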
Get this paper in your agent:
```
hf papers read 2603.28766
```

Don't have the latest CLI? Install it with:

```
curl -LsSf https://hf.co/cli/install.sh | bash
```