| --- |
| license: other |
| language: |
| - en |
| tags: |
| - 3d |
| - point-cloud |
| - multimodal |
| - multi-object |
| - pointllm |
| - modelnet40 |
| pipeline_tag: text-generation |
| --- |
| |
| # Multi-3DLLM Checkpoints |
|
|
| This repository hosts the released BeyondSingleObject checkpoints: |
|
|
| - `multi-3dllm/`: MO3D, Shape Mating, and Change Captioning |
| - `multi-3dllm-classification/`: ModelNet40 zero-shot classification |
|
|
| Use these checkpoints with the code and scripts from the BeyondSingleObject repository: |
|
|
| ```text |
| https://github.com/KohsukeIde/BeyondSingleObject |
| ``` |
|
|
| ## Download |
|
|
| ```bash |
| huggingface-cli download idekoh/Multi-3DLLM \ |
|   --local-dir checkpoints \ |
|   --include "multi-3dllm/**" "multi-3dllm-classification/**" |
| ``` |
|
|
| Expected local layout: |
|
|
| ```text |
| checkpoints/ |
| ├── multi-3dllm/ |
| └── multi-3dllm-classification/ |
| data/ |
| ``` |
|
|
| ## Usage |
|
|
| Example inference and LLM-based evaluation: |
|
|
| ```bash |
| MODEL_PATH=checkpoints/multi-3dllm \ |
| OUTPUT_DIR=outputs/infer \ |
| scripts/eval/infer.sh |
| ``` |
|
|
| ModelNet40 classification: |
|
|
| ```bash |
| MODEL_PATH=checkpoints/multi-3dllm-classification \ |
| OUTPUT_DIR=outputs/modelnet40_eval \ |
| LIMIT=0 \ |
| PROMPT_MODE=paper \ |
| NUM_OBJECTS=1 \ |
| TARGET_POSITION=1 \ |
| scripts/eval/eval_modelnet.sh |
| ``` |
|
|
| To reproduce the full table, rerun the command once for each |
| `(NUM_OBJECTS, TARGET_POSITION)` pair: `(1,1)`, `(2,1)`, `(2,2)`, `(3,1)`, |
| `(3,2)`, `(3,3)`. |
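
The six settings can also be run in a loop. The following is a minimal sketch, not part of the released scripts: the helper names `list_settings` and `run_all_settings` are hypothetical, and writing each setting to its own `OUTPUT_DIR` subdirectory is an assumption made here so that runs do not overwrite each other. The environment variables and script path are the ones used in the command above.

```shell
# Print one "NUM_OBJECTS TARGET_POSITION" pair per line, from (1,1) to (3,3).
list_settings() {
  for n in 1 2 3; do
    t=1
    while [ "$t" -le "$n" ]; do
      echo "$n $t"
      t=$((t + 1))
    done
  done
}

# Run the evaluation script once per setting, stopping at the first failure.
# Assumption: one output subdirectory per setting (n<objects>_t<position>).
run_all_settings() {
  list_settings | while read -r n t; do
    MODEL_PATH=checkpoints/multi-3dllm-classification \
    OUTPUT_DIR="outputs/modelnet40_eval/n${n}_t${t}" \
    LIMIT=0 \
    PROMPT_MODE=paper \
    NUM_OBJECTS="$n" \
    TARGET_POSITION="$t" \
    scripts/eval/eval_modelnet.sh </dev/null || exit 1
  done
}
```

After defining both functions in a shell started from the repository root, calling `run_all_settings` should launch the six evaluations in order; `list_settings` on its own only prints the pairs and can be used to check the sweep before running anything.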
|
|
| ## Notes |
|
|
| The LLM-judged metrics for reasoning and delta-caption quality depend on the |
| judge model and prompt configuration. Use the released evaluation scripts for |
| reproducible comparisons, and report the exact judge configuration together |
| with the checkpoint. |
|
|
| ## License |
|
|
| These checkpoints are built with the BeyondSingleObject codebase and use |
| PointLLM-style initialization and data. They may inherit terms from upstream |
| model, code, and dataset components, including PointLLM, Vicuna/Llama, |
| Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40. |
| Please check the corresponding upstream licenses before redistribution or |
| commercial use. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{ide2026beyondsingleobject, |
| title={BeyondSingleObject: Learning 3D Relations with Large Language Models}, |
| author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka}, |
| booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings}, |
| year={2026} |
| } |
| ``` |
|
|