Add model card and metadata
Hi! I'm Niels from the Hugging Face community science team. I noticed that this repository currently lacks a model card. I've created this PR to add a README that includes relevant metadata, such as the pipeline tag and a link to your paper, along with usage instructions derived from your GitHub repository. This will help make your work more discoverable and easier for others to use.
README.md
ADDED
@@ -0,0 +1,39 @@
+---
+pipeline_tag: image-segmentation
+---
+
+# InstructSAM: Segment Any Instance with Any Instructions
+
+InstructSAM is a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.
+
+- **Paper:** [InstructSAM: Segment Any Instance with Any Instructions](https://huggingface.co/papers/2605.26102)
+- **Repository:** [https://github.com/DCDmllm/InstructSAM](https://github.com/DCDmllm/InstructSAM)
+
+## Usage
+
+To use this model, please refer to the [official repository](https://github.com/DCDmllm/InstructSAM) for environment setup and installation.
+
+You can run single-image inference using the provided inference script:
+
+```bash
+python3 -m instructsam.infer \
+  --model_path CircleRadon/InstructSAM-2B \
+  --image-path path/to/image.jpg \
+  --query "Please segment the object in the image." \
+  --output-dir vis
+```
+
+The script prints the generated text and mask scores, then writes mask overlays to `vis/`.
+
+## Citation
+
+If you find this project useful, please cite using this BibTeX:
+
+```bibtex
+@article{yuan2026instructsam,
+  title   = {InstructSAM: Segment Any Instance with Any Instructions},
+  author  = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
+  year    = {2026},
+  journal = {arXiv},
+}
+```
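
The single-image command in the README could be extended to a folder of images. Below is a minimal Python sketch, not part of the repository: `build_infer_command` and `build_batch_commands` are hypothetical helper names, the one-subfolder-per-image output layout is an assumption, and the flags are copied verbatim from the README command without being checked against the repository's CLI. The sketch only assembles and prints the commands (a dry run).

```python
import shlex
from pathlib import Path

# Model ID taken from the README command above.
MODEL_ID = "CircleRadon/InstructSAM-2B"


def build_infer_command(image_path, query, output_dir="vis", model_path=MODEL_ID):
    """Return the argv list for the README's single-image inference command.

    Flag spellings are copied as-is from the README (assumption: they match
    the actual CLI of `instructsam.infer`).
    """
    return [
        "python3", "-m", "instructsam.infer",
        "--model_path", model_path,
        "--image-path", str(image_path),
        "--query", query,
        "--output-dir", str(output_dir),
    ]


def build_batch_commands(image_dir, query, output_root="vis"):
    """Build one command per .jpg in image_dir, sorted by file name.

    Each image gets its own output subfolder named after the file stem
    (a hypothetical layout chosen for this sketch).
    """
    images = sorted(Path(image_dir).glob("*.jpg"))
    return [
        build_infer_command(img, query, Path(output_root) / img.stem)
        for img in images
    ]


if __name__ == "__main__":
    # Dry run: print the shell-quoted commands instead of executing them.
    query = "Please segment the object in the image."
    for cmd in build_batch_commands("images", query):
        print(shlex.join(cmd))
```

To actually run inference, each argv list could be passed to `subprocess.run(cmd, check=True)` once the environment from the official repository is installed.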