Xinging committed on
Commit 35aa597 · verified · 1 parent: 3935961

Update README.md

Files changed (1): README.md (+0 −15)
README.md CHANGED

```diff
@@ -13,7 +13,6 @@ short_description: Enables reasoning-LLM to ask clarification questions
 
 [![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
 [![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/Proactive-Interactive-R1)
-[![SwanLab](https://img.shields.io/badge/SwanLab-Training%20Logs-438440)](https://swanlab.cn/@chenx/Proactive-Interactive-R1)
 
 This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
 
@@ -49,20 +48,6 @@ The datasets used to train and evaluate PIR are available here:
 - **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
 - **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
 
-## 🔬 Method
-
-PIR consists of two phases:
-
-1. **Interactive Capability Activation (Phase I)**:
-   * Detects uncertainty via **Predictive Entropy** at each reasoning step.
-   * Injects clarification questions at high-uncertainty points using instruction-following LLMs.
-   * Performs **Supervised Fine-Tuning** to teach models the "think-ask-respond" pattern.
-
-2. **User-Intent Alignment (Phase II)**:
-   * **US-GRPO**: Group Relative Policy Optimization with a dynamic User Simulator.
-   * **Composite Reward**: Combines output accuracy (extrinsic) with reasoning efficiency and helpfulness (intrinsic).
-   * Aligns model behavior with user intent while minimizing unnecessary interactions.
-
 ## 📜 Citation
 
 If you find this work useful, please cite our paper:
```
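The Method section removed above describes detecting uncertainty via Predictive Entropy at each reasoning step and injecting clarification questions at high-uncertainty points. As a rough illustration of that idea only (a minimal sketch, not the paper's implementation; the threshold and function names are invented for this example):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_uncertainty_steps(step_distributions, threshold=1.0):
    """Indices of reasoning steps whose predictive entropy exceeds the
    threshold -- candidate points for injecting a clarification question.
    The threshold value here is illustrative, not from the paper."""
    return [i for i, probs in enumerate(step_distributions)
            if predictive_entropy(probs) > threshold]

# A peaked distribution (confident step) vs. a flat one (uncertain step).
steps = [
    [0.97, 0.01, 0.01, 0.01],   # entropy ~0.17 nats
    [0.25, 0.25, 0.25, 0.25],   # entropy ln(4) ~1.39 nats
]
print(high_uncertainty_steps(steps, threshold=1.0))  # -> [1]
```

In the described pipeline, such flagged steps would be where an instruction-following LLM inserts a clarification question before SFT.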