Update README.md

README.md CHANGED
@@ -13,7 +13,6 @@ short_description: Enables reasoning-LLM to ask clarification questions
 
 [](https://arxiv.org/abs/2601.22139)
 [](https://github.com/Proactive-Interactive-R1)
-[](https://swanlab.cn/@chenx/Proactive-Interactive-R1)
 
 This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
 
@@ -49,20 +48,6 @@ The datasets used to train and evaluate PIR are available here:
 - **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
 - **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
 
-## 🔬 Method
-
-PIR consists of two phases:
-
-1. **Interactive Capability Activation (Phase I)**:
-   * Detects uncertainty via **Predictive Entropy** at each reasoning step.
-   * Injects clarification questions at high-uncertainty points using instruction-following LLMs.
-   * Performs **Supervised Fine-Tuning** to teach models the "think-ask-respond" pattern.
-
-2. **User-Intent Alignment (Phase II)**:
-   * **US-GRPO**: Group Relative Policy Optimization with a dynamic User Simulator.
-   * **Composite Reward**: Combines output accuracy (extrinsic) with reasoning efficiency and helpfulness (intrinsic).
-   * Aligns model behavior with user intent while minimizing unnecessary interactions.
-
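To make the Method description above concrete, here is a minimal Python sketch of Phase I's uncertainty gate. It is an illustration only: the exact entropy formulation, the per-step averaging, and the threshold value are assumptions, since the text above only names Predictive Entropy as the trigger for injecting a clarification question.

```python
import math

# Minimal sketch of Phase I's uncertainty gate. Assumptions: the entropy
# formulation, per-step averaging, and the 1.0-nat threshold are illustrative;
# the source only says uncertainty is detected via predictive entropy.

def token_entropy(dist):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def step_is_uncertain(step_dists, threshold=1.0):
    """Flag a reasoning step whose mean predictive entropy exceeds the threshold.

    step_dists: one next-token distribution per token generated in this step.
    """
    mean_entropy = sum(token_entropy(d) for d in step_dists) / len(step_dists)
    return mean_entropy > threshold

# Peaked distributions = a confident step; flat distributions = an uncertain one.
confident_step = [[0.97, 0.01, 0.01, 0.01]] * 3
uncertain_step = [[0.25, 0.25, 0.25, 0.25]] * 3
print(step_is_uncertain(confident_step))  # False -> keep reasoning
print(step_is_uncertain(uncertain_step))  # True  -> inject a clarification question
```

Steps flagged this way are where Phase I has an instruction-following LLM inject a clarification question, before SFT teaches the model the resulting think-ask-respond pattern.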
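Phase II's training signal can be sketched the same way. The weights, the per-turn interaction penalty, and the helpfulness score below are hypothetical placeholders: the text above names the composite reward's components (extrinsic accuracy, intrinsic efficiency and helpfulness) and the group-relative update, but not their exact form.

```python
from statistics import mean, pstdev

# Sketch of Phase II's composite reward plus a GRPO-style group-relative
# advantage. Assumptions: the weights w_acc/w_eff/w_help, the per-turn
# penalty, and the helpfulness score are illustrative placeholders.

def composite_reward(correct, n_turns, helpfulness,
                     w_acc=1.0, w_eff=0.1, w_help=0.3):
    """Extrinsic accuracy minus an interaction cost, plus intrinsic helpfulness."""
    return w_acc * float(correct) - w_eff * n_turns + w_help * helpfulness

def group_relative_advantages(rewards):
    """Standardize each rollout's reward within its sampled group (GRPO-style)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Four rollouts for one prompt, scored against a simulated user.
rewards = [
    composite_reward(correct=True,  n_turns=1, helpfulness=0.8),  # asked once, solved
    composite_reward(correct=True,  n_turns=4, helpfulness=0.5),  # over-asked
    composite_reward(correct=False, n_turns=0, helpfulness=0.0),  # never asked, failed
    composite_reward(correct=True,  n_turns=0, helpfulness=0.2),  # solved without asking
]
print(group_relative_advantages(rewards))
```

The per-turn penalty is the piece that discourages unnecessary interactions: a rollout that asks four questions earns less than one that asks a single well-placed question, even when both final answers are correct.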
 ## 📜 Citation
 
 If you find this work useful, please cite our paper: