	---
	license: other
	language:
	- en
	tags:
	- 3d
	- point-cloud
	- multimodal
	- multi-object
	- pointllm
	- modelnet40
	pipeline_tag: text-generation
	---

	# Multi-3DLLM Checkpoints

	This repository hosts the released BeyondSingleObject checkpoints:

	- `multi-3dllm/`: MO3D, Shape Mating, and Change Captioning
	- `multi-3dllm-classification/`: ModelNet40 zero-shot classification

	The inference and evaluation scripts used below live in the BeyondSingleObject code repository:

	```text
	https://github.com/KohsukeIde/BeyondSingleObject
	```

	## Download

	```bash
	huggingface-cli download idekoh/Multi-3DLLM \
	--local-dir checkpoints \
	--include "multi-3dllm/" "multi-3dllm-classification/"
	```

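	Run the download from the root of the BeyondSingleObject checkout, because the usage commands below use paths relative to that root. Expected local layout: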

	```text
	checkpoints/
	├── multi-3dllm/
	└── multi-3dllm-classification/
	data/
	```
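	Before running the evaluation scripts, you can check that both checkpoint folders were downloaded and are non-empty. This is a minimal sketch: the function name `check_checkpoint_layout` is hypothetical, and it only checks for the two directories, not for specific files inside them.

	```bash
	# Hypothetical helper: confirms both checkpoint folders exist and are non-empty.
	# Argument: checkpoint root (defaults to ./checkpoints).
	check_checkpoint_layout() {
	  local root="${1:-checkpoints}"
	  local dir missing=0
	  for dir in multi-3dllm multi-3dllm-classification; do
	    if [ -d "$root/$dir" ] && [ -n "$(ls -A "$root/$dir")" ]; then
	      echo "ok: $root/$dir"
	    else
	      echo "missing or empty: $root/$dir" >&2
	      missing=1
	    fi
	  done
	  # Non-zero exit status if either folder is absent or empty.
	  return "$missing"
	}
	```

	For example, `check_checkpoint_layout checkpoints` prints one `ok:` line per folder and returns a non-zero status if the download is incomplete.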

	## Usage

	Run inference and the LLM-based evaluation with the multi-object checkpoint:

	```bash
	MODEL_PATH=checkpoints/multi-3dllm \
	OUTPUT_DIR=outputs/infer \
	scripts/eval/infer.sh
	```

	Evaluate ModelNet40 zero-shot classification with the classification checkpoint:

	```bash
	MODEL_PATH=checkpoints/multi-3dllm-classification \
	OUTPUT_DIR=outputs/modelnet40_eval \
	LIMIT=0 \
	PROMPT_MODE=paper \
	NUM_OBJECTS=1 \
	TARGET_POSITION=1 \
	scripts/eval/eval_modelnet.sh
	```

	To reproduce the full table, repeat the run with each `(NUM_OBJECTS, TARGET_POSITION)`
	pair: (1,1), (2,1), (2,2), (3,1), (3,2), and (3,3).
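	The six configurations are every target position from 1 to N for N = 1, 2, 3, so the sweep can be written as a nested loop. This is a minimal sketch, assuming it is run from the repository root. The function `run_modelnet_sweep`, the `EVAL_SCRIPT` override, and the per-pair `OUTPUT_DIR` naming are hypothetical; the separate output directories only keep the runs from overwriting each other and may be unnecessary if `eval_modelnet.sh` already separates its outputs.

	```bash
	# Hypothetical helper: runs eval_modelnet.sh once per (NUM_OBJECTS, TARGET_POSITION)
	# pair, i.e. target positions 1..N for N = 1, 2, 3 (six runs in total).
	run_modelnet_sweep() {
	  local model_path="${1:-checkpoints/multi-3dllm-classification}"
	  local out_root="${2:-outputs/modelnet40_eval}"
	  # EVAL_SCRIPT is a hypothetical override; it defaults to the released script.
	  local script="${EVAL_SCRIPT:-scripts/eval/eval_modelnet.sh}"
	  local n t
	  for n in 1 2 3; do
	    for ((t = 1; t <= n; t++)); do
	      # Same settings as the single-run example, with a per-pair output directory.
	      MODEL_PATH="$model_path" \
	      OUTPUT_DIR="${out_root}/n${n}_t${t}" \
	      LIMIT=0 \
	      PROMPT_MODE=paper \
	      NUM_OBJECTS="$n" \
	      TARGET_POSITION="$t" \
	      "$script" || return 1  # stop at the first failing run
	    done
	  done
	}
	```

	For example, `run_modelnet_sweep checkpoints/multi-3dllm-classification outputs/modelnet40_eval` runs all six configurations in table order and returns a non-zero status at the first failure.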

	## Notes

	The LLM-judged metrics for reasoning and delta-caption quality depend on the
	judge model and prompt configuration, so scores obtained with different judges
	or prompts are not directly comparable. Use the released evaluation scripts for
	reproducible comparisons, and report the exact judge configuration together
	with the checkpoint.
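	One lightweight way to keep the judge configuration attached to a result is to write a small manifest next to the outputs. This is a minimal sketch, not part of the released code: the function `write_eval_manifest`, its field names, and the assumption that the judge prompt lives in a single file are all hypothetical, and it relies on GNU coreutils `sha256sum`.

	```bash
	# Hypothetical helper: prints the evaluation setup as key=value lines.
	# Arguments: checkpoint path, judge model name, judge prompt file.
	write_eval_manifest() {
	  local model_path="$1" judge_model="$2" judge_prompt="$3"
	  [ -f "$judge_prompt" ] || { echo "prompt file not found: $judge_prompt" >&2; return 1; }
	  echo "model_path=${model_path}"
	  echo "judge_model=${judge_model}"
	  echo "judge_prompt_file=${judge_prompt}"
	  # Hashing the prompt makes later edits to the judge prompt detectable.
	  echo "judge_prompt_sha256=$(sha256sum "$judge_prompt" | cut -d ' ' -f 1)"
	}
	```

	For example, `write_eval_manifest checkpoints/multi-3dllm "<judge-model>" path/to/judge_prompt.txt > outputs/infer/eval_manifest.txt`, where the judge model name and prompt path are placeholders for your own configuration.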

	## License

	These checkpoints are built with the BeyondSingleObject codebase and use
	PointLLM-style initialization and data. They may inherit terms from upstream
	model, code, and dataset components, including PointLLM, Vicuna/Llama,
	Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40.
	Please check the corresponding upstream licenses before redistribution or
	commercial use.

	## Citation

	```bibtex
	@inproceedings{ide2026beyondsingleobject,
	title={BeyondSingleObject: Learning 3D Relations with Large Language Models},
	author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka},
	booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
	year={2026}
	}
	```