Xinging committed on
Commit 35aa597 · verified · 1 parent: 3935961

Update README.md

Files changed (1): README.md (+0 −15)
README.md CHANGED

```diff
@@ -13,7 +13,6 @@ short_description: Enables reasoning-LLM to ask clarification questions
 
 [![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
 [![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/Proactive-Interactive-R1)
-[![SwanLab](https://img.shields.io/badge/SwanLab-Training%20Logs-438440)](https://swanlab.cn/@chenx/Proactive-Interactive-R1)
 
 This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
 
@@ -49,20 +48,6 @@ The datasets used to train and evaluate PIR are available here:
 - **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
 - **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
 
-## 🔬 Method
-
-PIR consists of two phases:
-
-1. **Interactive Capability Activation (Phase I)**:
-   * Detects uncertainty via **Predictive Entropy** at each reasoning step.
-   * Injects clarification questions at high-uncertainty points using instruction-following LLMs.
-   * Performs **Supervised Fine-Tuning** to teach models the "think-ask-respond" pattern.
-
-2. **User-Intent Alignment (Phase II)**:
-   * **US-GRPO**: Group Relative Policy Optimization with a dynamic User Simulator.
-   * **Composite Reward**: Combines output accuracy (extrinsic) with reasoning efficiency and helpfulness (intrinsic).
-   * Aligns model behavior with user intent while minimizing unnecessary interactions.
-
 ## 📜 Citation
 
 If you find this work useful, please cite our paper:
```
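The Method section removed above describes detecting uncertainty via Predictive Entropy at each reasoning step and injecting clarification questions at high-uncertainty points. As a rough illustration of that idea only (a minimal sketch, not the paper's implementation; the threshold and function names are invented for this example):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_uncertainty_steps(step_distributions, threshold=1.0):
    """Indices of reasoning steps whose predictive entropy exceeds the
    threshold -- candidate points for injecting a clarification question.
    The threshold value here is illustrative, not from the paper."""
    return [i for i, probs in enumerate(step_distributions)
            if predictive_entropy(probs) > threshold]

# A peaked distribution (confident step) vs. a flat one (uncertain step).
steps = [
    [0.97, 0.01, 0.01, 0.01],   # entropy ~0.17 nats
    [0.25, 0.25, 0.25, 0.25],   # entropy ln(4) ~1.39 nats
]
print(high_uncertainty_steps(steps, threshold=1.0))  # -> [1]
```

In the described pipeline, such flagged steps would be where an instruction-following LLM inserts a clarification question before SFT.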