metadata
library_name: pytorch
tags:
- motion
- rvq
- vector-quantization
- human-motion
- safetensors
- humanML
- motion-reconstructor
license: cc-by-nc-4.0
datasets:
- Wojtekb30/HumanML3D-500ms-FPP-descriptions-CoTs-1
Motion RVQ (Motion Reconstruction)
This model uses Residual Vector Quantization (RVQ) to reconstruct motion represented as 263-dimensional frame vectors.
It is a custom PyTorch model (not a Transformers AutoModel); its weights are loaded from a safetensors file using the MotionRVQ_VAE class defined in rvq_model.py.
Model Summary
- Architecture: encoder -> 4-level RVQ -> decoder
- Input shape: (T, 263) per sequence (frame-major)
- Training window: 100 frames (with crop/pad in dataset loader)
- Output: reconstructed motion sequence in the same 263-dim representation
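To make the "4-level RVQ" step concrete, here is a minimal NumPy sketch of residual vector quantization in general. It is not the code in rvq_model.py: the function names, latent size, and codebook size are illustrative assumptions, and the codebooks are random rather than learned. Each level quantizes whatever residual the previous levels left behind, and the decoder-side reconstruction is the sum of the selected codewords.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize each row of x with a stack of codebooks, one level at a time."""
    residual = x.astype(np.float64).copy()
    quantized = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared distance from every residual row to every codeword: shape (N, K)
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        quantized += chosen   # running sum of selected codewords
        residual -= chosen    # the next level quantizes what is left over
        indices.append(idx)
    return np.stack(indices, axis=1), quantized

def rvq_decode(indices, codebooks):
    """Rebuild the quantized vectors by summing one codeword per level."""
    return sum(cb[indices[:, level]] for level, cb in enumerate(codebooks))

# Illustrative sizes only; the published model's real sizes are not stated here.
rng = np.random.default_rng(0)
latent_dim, codebook_size, num_levels = 8, 16, 4
codebooks = [rng.normal(size=(codebook_size, latent_dim)) for _ in range(num_levels)]
latents = rng.normal(size=(5, latent_dim))

codes, approx = rvq_encode(latents, codebooks)
print(codes.shape)  # one code index per level for each input vector
```

In a trained model the codebooks are learned, so later levels refine the approximation; with the random codebooks used here the sketch only demonstrates the mechanics.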
Repository Files
- motion_rvq_weights.safetensors - main published checkpoint
- config.json - model configuration metadata
- rvq_model.py - model architecture (MotionRVQ_VAE)
- TestRVQ.py - inference + 3-panel visualization
- TrainRVQ.py - training script
- rvq_humanml_dataset.py - training dataset loader
- Mean.npy, Std.npy - normalization statistics
- 000001.npy, 000012.npy - sample motion files
motion_rvq_weights.pth can be treated as a legacy artifact; code uses motion_rvq_weights.safetensors.
Install
pip install torch safetensors numpy matplotlib
Inference
Run the provided visualization script:
python TestRVQ.py
By default, TestRVQ.py uses 000001.npy. You can change FILE_TO_TEST in TestRVQ.py to another sequence.
Minimal loading example:
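from pathlib import Path

import torch
from safetensors.torch import load_file

from rvq_model import MotionRVQ_VAE

# Directory that holds rvq_model.py and the checkpoint
base = Path(".")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the model with its default configuration and move it to the device
model = MotionRVQ_VAE().to(device)

# Load the published weights directly onto the target device
state_dict = load_file(str(base / "motion_rvq_weights.safetensors"), device=str(device))
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode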
Training From Scratch
Expected layout:
rvq/
  TrainRVQ.py
  rvq_model.py
  rvq_humanml_dataset.py
  Mean.npy
  Std.npy
  new_joint_vecs/
    *.npy
Run training:
python TrainRVQ.py
Output checkpoint:
motion_rvq_weights.safetensors
Limitations
- This model reconstructs motion vectors; it is not a text-to-motion generator.
- Input must use the same 263-dim representation and normalization scheme as in training.
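As a rough illustration of the input requirement, the sketch below checks the 263-dim shape, applies a z-score normalization, and fits a sequence to the 100-frame training window. The exact scheme lives in rvq_humanml_dataset.py and TestRVQ.py and may differ: the z-score formula, the epsilon, cropping from the start, and zero-padding are all assumptions here, and the helper names are hypothetical. Synthetic arrays stand in for Mean.npy and Std.npy so the snippet runs on its own.

```python
import numpy as np

WINDOW = 100       # training window length from the Model Summary
FEATURE_DIM = 263  # per-frame feature size

def normalize(motion, mean, std, eps=1e-8):
    """Z-score normalization (assumed scheme) after validating the shape."""
    if motion.ndim != 2 or motion.shape[1] != FEATURE_DIM:
        raise ValueError(f"expected shape (T, {FEATURE_DIM}), got {motion.shape}")
    return (motion - mean) / (std + eps)

def crop_or_pad(motion, window=WINDOW):
    """Crop from the start or zero-pad at the end (assumed policy)."""
    num_frames = motion.shape[0]
    if num_frames >= window:
        return motion[:window]
    pad = np.zeros((window - num_frames, motion.shape[1]), dtype=motion.dtype)
    return np.concatenate([motion, pad], axis=0)

# Synthetic stand-ins; in practice use np.load("Mean.npy") and np.load("Std.npy").
rng = np.random.default_rng(0)
mean = rng.normal(size=FEATURE_DIM)
std = np.abs(rng.normal(size=FEATURE_DIM)) + 0.5

short_clip = rng.normal(size=(60, FEATURE_DIM))
long_clip = rng.normal(size=(150, FEATURE_DIM))

short_ready = crop_or_pad(normalize(short_clip, mean, std))
long_ready = crop_or_pad(normalize(long_clip, mean, std))
print(short_ready.shape, long_ready.shape)
```

Before relying on this, compare it with the dataset loader, since a different crop policy (for example a random crop) or padding value would change what the model sees.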
