---
license: other
language:
- en
tags:
- 3d
- point-cloud
- multimodal
- multi-object
- pointllm
- modelnet40
pipeline_tag: text-generation
---
# Multi-3DLLM Checkpoints
This repository hosts the released BeyondSingleObject checkpoints:
- `multi-3dllm/`: MO3D, Shape Mating, and Change Captioning
- `multi-3dllm-classification/`: ModelNet40 zero-shot classification

Use the code and scripts from:
```text
https://github.com/KohsukeIde/BeyondSingleObject
```
## Download
```bash
huggingface-cli download idekoh/Multi-3DLLM \
--local-dir checkpoints \
--include "multi-3dllm/**" "multi-3dllm-classification/**"
```
Expected local layout:
```text
checkpoints/
├── multi-3dllm/
└── multi-3dllm-classification/
data/
```
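A quick check after downloading can confirm that both checkpoint directories landed where the commands below expect them. This is a minimal sketch, not part of the released scripts: the `check_layout` helper is hypothetical, and it assumes the root is `checkpoints` (the `--local-dir` used above) unless another path is passed as the first argument. It only looks at the two checkpoint directories, not `data/`.

```bash
#!/usr/bin/env bash
# Sketch: verify that the download produced the expected checkpoint directories.
# Assumption: the root defaults to ./checkpoints, matching --local-dir above.

# Report each expected subdirectory that is missing under the given root.
# Returns non-zero if at least one is missing.
check_layout() {
  local root="$1" missing=0 sub
  for sub in multi-3dllm multi-3dllm-classification; do
    if [[ ! -d "$root/$sub" ]]; then
      echo "missing: $root/$sub" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_layout "${1:-checkpoints}"; then
  echo "checkpoint layout OK"
else
  echo "checkpoint layout incomplete; re-run the download command" >&2
fi
```

If a directory is reported missing, re-running the download command with the same `--local-dir` is usually enough to fetch it.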
## Usage
Example inference and LLM-based evaluation:
```bash
MODEL_PATH=checkpoints/multi-3dllm \
OUTPUT_DIR=outputs/infer \
scripts/eval/infer.sh
```
ModelNet40 classification:
```bash
MODEL_PATH=checkpoints/multi-3dllm-classification \
OUTPUT_DIR=outputs/modelnet40_eval \
LIMIT=0 \
PROMPT_MODE=paper \
NUM_OBJECTS=1 \
TARGET_POSITION=1 \
scripts/eval/eval_modelnet.sh
```
For the full table, repeat the run with `(NUM_OBJECTS, TARGET_POSITION)` set to
`(1,1)`, `(2,1)`, `(2,2)`, `(3,1)`, `(3,2)`, and `(3,3)`.
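The six settings can be swept with a small loop instead of editing the command by hand. The sketch below reuses the environment variables from the ModelNet40 command above. The `EVAL_SCRIPT` and `OUTPUT_ROOT` variables, the helper functions, and the per-setting `n<N>_t<T>` output subdirectories are illustrative assumptions, not conventions of the released scripts.

```bash
#!/usr/bin/env bash
# Sketch: run the ModelNet40 evaluation for every (NUM_OBJECTS, TARGET_POSITION) pair.
# Assumption: a separate OUTPUT_DIR per setting keeps results apart; the naming is hypothetical.
set -euo pipefail

EVAL_SCRIPT="${EVAL_SCRIPT:-scripts/eval/eval_modelnet.sh}"
OUTPUT_ROOT="${OUTPUT_ROOT:-outputs/modelnet40_eval}"

# Print one "NUM_OBJECTS TARGET_POSITION" pair per line: (1,1), (2,1), (2,2), ..., (3,3).
modelnet_pairs() {
  local n t
  for n in 1 2 3; do
    for ((t = 1; t <= n; t++)); do
      echo "$n $t"
    done
  done
}

# Launch the evaluation script once per pair.
run_sweep() {
  local n t
  while read -r n t; do
    # stdin is redirected so the script cannot consume the remaining pairs.
    MODEL_PATH=checkpoints/multi-3dllm-classification \
    OUTPUT_DIR="${OUTPUT_ROOT}/n${n}_t${t}" \
    LIMIT=0 \
    PROMPT_MODE=paper \
    NUM_OBJECTS="$n" \
    TARGET_POSITION="$t" \
    "$EVAL_SCRIPT" < /dev/null
  done < <(modelnet_pairs)
}

# Only launch when the evaluation script is present, i.e. when run from the code repository root.
if [[ -x "$EVAL_SCRIPT" ]]; then
  run_sweep
else
  echo "Skipping: $EVAL_SCRIPT not found; run from the BeyondSingleObject repository root." >&2
fi
```

Because `set -e` is active, the sweep stops at the first failing setting, which avoids filling the table with partial results.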
## Notes
The LLM-judged metrics for reasoning and delta-caption quality depend on the
judge model and prompt configuration. Use the released evaluation scripts for
reproducible comparisons, and report the exact judge configuration together
with the checkpoint.
## License
These checkpoints are built with the BeyondSingleObject codebase and use
PointLLM-style initialization and data. They may inherit terms from upstream
model, code, and dataset components, including PointLLM, Vicuna/Llama,
Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40.
Please check the corresponding upstream licenses before redistribution or
commercial use.
## Citation
```bibtex
@inproceedings{ide2026beyondsingleobject,
title={BeyondSingleObject: Learning 3D Relations with Large Language Models},
author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
year={2026}
}
```