Xinging committed on
Commit 3935961 · verified · 1 Parent(s): b1d5beb

Update README.md

Files changed (1)
README.md +69 -2
README.md CHANGED
@@ -1,5 +1,5 @@
---
- title: README
+ title: Proactive Interactive Reasoning (PIR)
emoji: 🌖
colorFrom: blue
colorTo: indigo
@@ -9,4 +9,71 @@ license: apache-2.0
short_description: Enables reasoning-LLM to ask clarification questions
---

- Edit this `README.md` markdown file to author your organization card.
+ # Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
+ [![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/Proactive-Interactive-R1)
+ [![SwanLab](https://img.shields.io/badge/SwanLab-Training%20Logs-438440)](https://swanlab.cn/@chenx/Proactive-Interactive-R1)
+
+ This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
+
+ ## 💡 Motivation
+
+ Current reasoning LLMs (e.g., OpenAI o1, DeepSeek-R1) suffer from **blind self-thinking**: they perform extensive internal reasoning even when critical information is missing or the user's intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.
+
+ **PIR (Proactive Interactive Reasoning)** is a new paradigm that transforms reasoning LLMs from passive solvers into **proactive inquirers**. Instead of guessing, a PIR-enabled model detects uncertainty during reasoning and actively asks the user for clarification before proceeding.
+
+ ![PIR Framework Overview](https://raw.githubusercontent.com/Proactive-Interactive-R1/Proactive-Interactive-R1/main/image/paradigm.png)
+ *(Note: if the image above does not load, please view it on our [GitHub](https://github.com/Proactive-Interactive-R1).)*
+
+ ### Key Features
+
+ - **User-Intent Alignment**: Optimizes interaction through US-GRPO, with a composite reward balancing accuracy, efficiency, and helpfulness.
+ - **Significant Improvements**: Up to **32.70% higher accuracy**, a **22.90% higher pass rate**, and a **41.36-point BLEU improvement** over baselines.
+ - **Reduced Computation**: Nearly halves unnecessary reasoning tokens and interaction turns.
+
+ ## 📦 Models
+
+ We provide the following models trained with the PIR paradigm:
+
+ | Model Name | Description | Link |
+ | :--- | :--- | :--- |
+ | **Proactive-Interactive-R1-Math-7B** | The core model optimized for mathematical reasoning with clarification capabilities. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B) |
+ | **Proactive-Interactive-R1-Math-7B-Pro** | An enhanced version of the Math-7B model. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B-Pro) |
+ | **Proactive-Interactive-R1-SFT-7B** | The base SFT model before reinforcement-learning alignment. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-SFT-7B) |
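+
+ As a quick-start illustration (not an official example), the models should load like any `transformers`-compatible causal LM. The chat formatting below is an assumption to verify against each model card:
+
+ ```python
+ # Hedged sketch: assumes a standard transformers causal-LM checkpoint
+ # with a chat template; the actual prompt format may differ (see model card).
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+
+ # An under-specified question: a PIR-style model is expected to ask
+ # for the missing speed instead of silently assuming one.
+ messages = [{"role": "user", "content": "A train travels 120 km. How long does the trip take?"}]
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```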
+
+ ## 📚 Datasets
+
+ The datasets used to train and evaluate PIR are available here:
+
+ - **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
+ - **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Reasoning data distilled from DeepSeek-R1 (5k examples), used during training.
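+
+ Both datasets should be loadable with the standard `datasets` library; this minimal inspection sketch assumes a `train` split, which the dataset cards would confirm:
+
+ ```python
+ # Hedged sketch: the split name is an assumption; column names are
+ # whatever the dataset card defines (print the object to see them).
+ from datasets import load_dataset
+
+ sft = load_dataset("Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset", split="train")
+ print(sft)     # size and column names
+ print(sft[0])  # first SFT example
+ ```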
+
+ ## 🔬 Method
+
+ PIR consists of two phases:
+
+ 1. **Interactive Capability Activation (Phase I)**:
+     * Detects uncertainty via **Predictive Entropy** at each reasoning step (a minimal entropy sketch follows this list).
+     * Injects clarification questions at high-uncertainty points using instruction-following LLMs.
+     * Performs **Supervised Fine-Tuning** to teach models the "think-ask-respond" pattern.
+
+ 2. **User-Intent Alignment (Phase II)**:
+     * **US-GRPO**: Group Relative Policy Optimization with a dynamic User Simulator.
+     * **Composite Reward**: Combines output accuracy (extrinsic) with reasoning efficiency and helpfulness (intrinsic); see the reward sketch after this list.
+     * Aligns model behavior with user intent while minimizing unnecessary interactions.
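+
+ As referenced in Phase I, the sketch below shows one plausible reading of step-level predictive entropy. The threshold value and step granularity are illustrative assumptions, not the paper's released implementation:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
+     """Shannon entropy (in nats) of the next-token distribution.
+     `logits` has shape [..., vocab_size]; higher entropy means the
+     model is less certain about how the reasoning should continue."""
+     log_probs = F.log_softmax(logits, dim=-1)
+     return -(log_probs.exp() * log_probs).sum(dim=-1)
+
+ ENTROPY_THRESHOLD = 2.5  # hypothetical value; would be tuned on held-out data
+
+ def should_ask(step_logits: torch.Tensor) -> bool:
+     """Flag a reasoning step as a clarification point when its mean
+     per-token predictive entropy exceeds the threshold."""
+     return predictive_entropy(step_logits).mean().item() > ENTROPY_THRESHOLD
+ ```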
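+
+ For Phase II, the composite reward can be pictured as a weighted sum of the extrinsic and intrinsic terms named above; every weight and term definition here is a hypothetical stand-in for the paper's actual formulation:
+
+ ```python
+ def composite_reward(correct: bool, n_reasoning_tokens: int, n_turns: int,
+                      helpfulness: float, w_acc: float = 1.0,
+                      w_eff: float = 0.1, w_help: float = 0.5) -> float:
+     """Weighted composite reward (all weights and terms are illustrative)."""
+     r_acc = 1.0 if correct else 0.0                   # extrinsic: final-answer accuracy
+     r_eff = -(n_reasoning_tokens / 1000.0 + n_turns)  # intrinsic: penalize long traces and extra turns
+     r_help = helpfulness                              # intrinsic: e.g., user-simulator helpfulness score
+     return w_acc * r_acc + w_eff * r_eff + w_help * r_help
+ ```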
+
+ ## 📜 Citation
+
+ If you find this work useful, please cite our paper:
+
+ ```bibtex
+ @misc{chen2026reasoningaskingtransformingreasoning,
+       title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers},
+       author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
+       year={2026},
+       eprint={2601.22139},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2601.22139},
+ }
+ ```