# Official models of "MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description"
## Overview
MoChat is a Multimodal Large Language Model (MLLM) for human motion understanding with precise spatio-temporal grounding. Unlike conventional motion analysis systems, MoChat integrates three capabilities:
- **Motion Understanding**: Performs fundamental motion comprehension and summarization.
- **Spatial Limb Grounding**: Accurately locates body parts involved in described movements.
- **Temporal Action Grounding**: Precisely identifies time boundaries corresponding to specific motion descriptions.
## Models
We provide the following trained models for download:
- **[Joints-Grouped Skeleton Encoder](https://huggingface.co/CSUBioGroup/MoChat/blob/main/JGSE_epoch120)** for motion sequence representation.
- Two variants of motion comprehension models:
- [MoChat](https://huggingface.co/CSUBioGroup/MoChat/tree/main/MoChat): Base model.
  - [MoChat-R](https://huggingface.co/CSUBioGroup/MoChat/tree/main/MoChat-R): Extended model with a regression head.
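The checkpoints above can be fetched programmatically with the `huggingface_hub` library. The sketch below is illustrative, not part of the MoChat codebase: the repository id and file/folder names are taken from the links above, and the local directory is an arbitrary choice.

```python
# Hedged sketch: download the MoChat checkpoints with huggingface_hub.
# Repo id and paths come from the links in this README; local_dir is arbitrary.
from huggingface_hub import hf_hub_download, snapshot_download

REPO_ID = "CSUBioGroup/MoChat"


def fetch_mochat_weights(local_dir: str = "./mochat_weights") -> str:
    """Download the skeleton encoder and both chat-model variants."""
    # The Joints-Grouped Skeleton Encoder is a single file at the repo root.
    hf_hub_download(repo_id=REPO_ID, filename="JGSE_epoch120", local_dir=local_dir)
    # MoChat and MoChat-R live in subfolders; one snapshot call grabs both.
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=["MoChat/*", "MoChat-R/*"],
        local_dir=local_dir,
    )
```

Calling `fetch_mochat_weights()` returns the local path containing the downloaded folders; pass `local_dir` to place them elsewhere.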
## Resources
- **Codebase**: [GitHub](https://github.com/CSUBioGroup/MoChat)
- **Paper**: [arXiv](https://arxiv.org/abs/2410.11404)