---
license: other
language:
- en
tags:
- 3d
- point-cloud
- multimodal
- multi-object
- pointllm
- modelnet40
pipeline_tag: text-generation
---

# Multi-3DLLM Checkpoints

This repository hosts the released BeyondSingleObject checkpoints:

- `multi-3dllm/`: MO3D, Shape Mating, and Change Captioning
- `multi-3dllm-classification/`: ModelNet40 zero-shot classification

Use the code and scripts from:

```text
https://github.com/KohsukeIde/BeyondSingleObject
```

## Download

```bash
huggingface-cli download idekoh/Multi-3DLLM \
  --local-dir checkpoints \
  --include "multi-3dllm/**" "multi-3dllm-classification/**"
```

Expected local layout, relative to the directory the evaluation scripts are run from:

```text
checkpoints/
├── multi-3dllm/
└── multi-3dllm-classification/
data/
```
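As a quick sanity check after downloading, the sketch below confirms that both checkpoint directories exist and are non-empty. `check_checkpoint_layout` is a hypothetical helper written for this example, not part of the released scripts. It does not inspect `data/` or validate the checkpoint files themselves.

```bash
#!/usr/bin/env bash
# Sketch only: check that both checkpoint directories are present and non-empty.
# check_checkpoint_layout is a hypothetical helper, not part of the released code.
check_checkpoint_layout() {
  local root="${1:-checkpoints}"  # directory passed to --local-dir
  local name
  local status=0
  for name in multi-3dllm multi-3dllm-classification; do
    if [ -d "$root/$name" ] && [ -n "$(ls -A "$root/$name")" ]; then
      echo "ok: $root/$name"
    else
      echo "missing or empty: $root/$name" >&2
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the snippet, `check_checkpoint_layout checkpoints` returns a non-zero status if either directory is missing or empty.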

## Usage

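Run the commands below from the root of the BeyondSingleObject code repository, so that the relative `scripts/` and `checkpoints/` paths resolve.

Example inference and LLM-based evaluation: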

```bash
MODEL_PATH=checkpoints/multi-3dllm \
OUTPUT_DIR=outputs/infer \
scripts/eval/infer.sh
```

ModelNet40 classification:

```bash
MODEL_PATH=checkpoints/multi-3dllm-classification \
OUTPUT_DIR=outputs/modelnet40_eval \
LIMIT=0 \
PROMPT_MODE=paper \
NUM_OBJECTS=1 \
TARGET_POSITION=1 \
scripts/eval/eval_modelnet.sh
```

To reproduce the full table, repeat the run for each
`(NUM_OBJECTS, TARGET_POSITION)` pair: `(1,1)`, `(2,1)`, `(2,2)`, `(3,1)`,
`(3,2)`, and `(3,3)`.
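The six settings listed above can be looped over instead of run by hand. The following is a minimal sketch, not part of the released scripts: `run_modelnet_table`, `EVAL_SCRIPT`, and `OUTPUT_ROOT` are hypothetical names, and writing each setting to its own `n<objects>_t<position>` subdirectory is an assumption made here so that runs do not share an output directory.

```bash
#!/usr/bin/env bash
# Sketch only: run the ModelNet40 evaluation once per (NUM_OBJECTS, TARGET_POSITION) pair.
# EVAL_SCRIPT and OUTPUT_ROOT are hypothetical convenience variables for this example.
EVAL_SCRIPT="${EVAL_SCRIPT:-scripts/eval/eval_modelnet.sh}"
OUTPUT_ROOT="${OUTPUT_ROOT:-outputs/modelnet40_eval}"

run_modelnet_table() {
  local pair num_objects target_position
  for pair in "1 1" "2 1" "2 2" "3 1" "3 2" "3 3"; do
    # Split "N P" into the two settings for this run.
    read -r num_objects target_position <<< "$pair"
    # Assumption: one output subdirectory per setting, e.g. outputs/modelnet40_eval/n2_t1.
    MODEL_PATH=checkpoints/multi-3dllm-classification \
    OUTPUT_DIR="${OUTPUT_ROOT}/n${num_objects}_t${target_position}" \
    LIMIT=0 \
    PROMPT_MODE=paper \
    NUM_OBJECTS="$num_objects" \
    TARGET_POSITION="$target_position" \
    "$EVAL_SCRIPT" || return 1  # stop at the first failing setting
  done
}
```

After sourcing the snippet, call `run_modelnet_table` from the same directory used for the single-run command above.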

## Notes

The LLM-judged metrics for reasoning and delta-caption quality depend on the
judge model and prompt configuration. Use the released evaluation scripts for
reproducible comparisons, and report the exact judge configuration together
with the checkpoint.

## License

These checkpoints are built with the BeyondSingleObject codebase and use
PointLLM-style initialization and data. They may inherit terms from upstream
model, code, and dataset components, including PointLLM, Vicuna/Llama,
Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40.
Please check the corresponding upstream licenses before redistribution or
commercial use.

## Citation

```bibtex
@inproceedings{ide2026beyondsingleobject,
  title={BeyondSingleObject: Learning 3D Relations with Large Language Models},
  author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026}
}
```