Link paper and GitHub repository
This PR improves the model card by adding links to the official paper on Hugging Face Papers and the code repository on GitHub. These additions help researchers locate the original work and implementation more easily. I have also added a brief summary of the ProCap framework based on the paper abstract.
README.md
CHANGED
@@ -1,29 +1,35 @@
 ---
-license: mit
-tags:
-- change captioning
-- vision-language
-- image-to-text
-- procedural reasoning
-- multimodal
-- pytorch
 datasets:
 - clevr-change
 - image-editing-request
 - spot-the-diff
+license: mit
 metrics:
 - bleu
 - meteor
 - rouge
 pipeline_tag: image-to-text
+tags:
+- change captioning
+- vision-language
+- image-to-text
+- procedural reasoning
+- multimodal
+- pytorch
 ---
 
-# ProCap:
+# ProCap: Imagine How to Change
 
 This repository contains the **official experimental materials** for the paper:
 
 > **Imagine How to Change: Explicit Procedure Modeling for Change Captioning**
 
+[[Paper](https://huggingface.co/papers/2603.05969)] [[Code](https://github.com/BlueberryOreo/ProCap)]
+
+ProCap is a framework that reformulates change modeling from static image comparison to dynamic procedure modeling. It features a two-stage design:
+1. **Explicit Procedure Modeling**: Trains a procedure encoder to learn the change procedure from a sparse set of keyframes.
+2. **Implicit Procedure Captioning**: Integrates the trained encoder within an encoder-decoder model for captioning using learnable procedure queries.
+
 It provides **processed datasets**, **pre-trained model weights**, and **evaluation tools** for reproducing the results reported in the paper.
 
 📦 All materials are also available via [Baidu Netdisk](https://pan.baidu.com/s/1t_YXB6J_vkuPxByn2hat2A)
@@ -135,7 +141,7 @@ If you find our work or this repository useful, please consider citing our paper
 @inproceedings{
 sun2026imagine,
 title={Imagine How To Change: Explicit Procedure Modeling for Change Captioning},
-author={Sun, Jiayang and Guo, Zixin and Cao, Min and Zhu, Guibo and Laaksonen, Jorma},
+author={Sun, Jiayang and Guo, Zixin and Cao, Min and Zhu, Guibo and Laaksonen, Jorma},
 booktitle={The Fourteenth International Conference on Learning Representations},
 year={2026},
 }
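
The first hunk moves `license` and `tags` within the YAML front matter but does not appear to change any values. A minimal sketch for checking that the reorder leaves the metadata unchanged is given below. It assumes the front matter contains only top-level scalar keys and flat `- item` lists, as shown in the diff. `parse_front_matter` is a hypothetical helper written for this illustration, not part of any Hugging Face tooling, and it is not a general YAML parser.

```python
def parse_front_matter(text):
    """Parse a flat front-matter block: top-level scalars and '- item' lists only."""
    meta = {}
    current_key = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line == "---":
            continue  # skip blank lines and the front-matter delimiters
        if line.startswith("- "):
            # List item: append to the most recently opened list key.
            meta[current_key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no inline value opens a list; otherwise store the scalar.
            meta[current_key] = value if value else []
    return meta


# Front matter before the PR (old side of the diff).
OLD = """---
license: mit
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
---"""

# Front matter after the PR (new side of the diff).
NEW = """---
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
license: mit
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
---"""

old_meta = parse_front_matter(OLD)
new_meta = parse_front_matter(NEW)

# Dict comparison ignores key order, so this checks values only.
same_metadata = old_meta == new_meta
print("metadata unchanged by reorder:", same_metadata)
```

Because Python dictionaries compare equal regardless of key order, the check passes only when every key maps to the same scalar or the same list in the same item order.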