---
license: apache-2.0
datasets:
- Frywind/GREAM_data
---

# 🧠 GREAM: Generative Reasoning Recommendation Model

**Paper:** *[Generative Reasoning Recommendation via LLMs](https://arxiv.org/pdf/2510.20815)*, 2025.  
**Authors:** Minjie Hong\*, Zetong Zhou\*, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, Zhou Zhao†  
**Repository:** [https://github.com/Indolent-Kawhi/GRRM](https://github.com/Indolent-Kawhi/GRRM)  
**HF Papers Link:** [https://huggingface.co/papers/2510.20815](https://huggingface.co/papers/2510.20815)

---

## 🧩 Model Summary

**GREAM** (Generative Reasoning Recommendation Model) is a **large language model (LLM)-based generative reasoning recommender** designed to unify *understanding, reasoning,* and *prediction* for recommendation tasks.
It introduces a **reasoning-enhanced reinforcement learning framework with verifiable rewards** that supports both high-throughput direct recommendation and interpretable, reasoning-based outputs.

### Key Features
- **Collaborative–Semantic Alignment:** Fuses textual signals (titles, descriptions, reviews) with behavioral signals to align linguistic and collaborative semantics.
- **Reasoning Curriculum Activation:** Builds synthetic *Chain-of-Thought (CoT)* data and trains on it with a curriculum to develop causal reasoning for recommendation.
- **Sparse-Regularized Group Policy Optimization (SRPO):** Enables stable RL fine-tuning under sparse feedback using *Residual-Sensitive Verifiable Rewards* and *Bonus-Calibrated Group Advantage Estimation*.
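
This card does not give SRPO's reward or advantage formulas. The sketch below illustrates the general idea under explicit assumptions. It assumes a "residual-sensitive" reward gives partial credit for matching a prefix of the target item's RQ code, checking coarse levels first. It also uses plain GRPO-style group normalization, with a hypothetical `bonus` for exact hits standing in for the paper's bonus calibration. `prefix_match_reward`, `group_advantages`, and `bonus` are illustrative names, not the repository's API.

```python
from typing import List, Sequence


def prefix_match_reward(pred: Sequence[int], target: Sequence[int]) -> float:
    """Hypothetical residual-sensitive reward: the fraction of leading RQ levels
    (coarse to fine) on which the predicted code agrees with the target code.
    An exact match scores 1.0; a miss at the first level scores 0.0."""
    matched = 0
    for p, t in zip(pred, target):
        if p != t:
            break
        matched += 1
    return matched / len(target)


def group_advantages(rewards: List[float], bonus: float = 0.0,
                     eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize rewards within one group of rollouts
    sampled for the same prompt. `bonus` is a hypothetical extra reward added
    to exact hits (reward == 1.0) before normalization."""
    adjusted = [r + bonus if r == 1.0 else r for r in rewards]
    mean = sum(adjusted) / len(adjusted)
    std = (sum((r - mean) ** 2 for r in adjusted) / len(adjusted)) ** 0.5
    return [(r - mean) / (std + eps) for r in adjusted]


# One prompt, four sampled 5-level item codes (values are made up).
target = [12, 200, 7, 33, 91]
samples = [
    [12, 200, 7, 33, 91],  # exact hit
    [12, 200, 7, 0, 5],    # first three levels correct
    [3, 200, 7, 33, 91],   # wrong at the coarsest level, so no credit
    [12, 9, 9, 9, 9],      # only the first level correct
]
rewards = [prefix_match_reward(s, target) for s in samples]
advantages = group_advantages(rewards, bonus=0.5)
print(rewards)  # → [1.0, 0.6, 0.0, 0.2]
```

If every rollout in a group misses at the first level, all rewards are 0 and every advantage is 0, so that group contributes no learning signal. This is the sparse-feedback situation SRPO is described as addressing. Partial credit is one simple way to soften it.
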

---

## 🧠 Model Architecture

| Component | Description |
|------------|--------------|
| **Backbone** | Qwen3-4B-Instruct |
| **Indexing** | Residual quantization (RQ-KMeans): 5 levels, 256 codewords per level |
| **Training Phases** | ① Collaborative–Semantic Alignment → ② Reasoning Curriculum Activation → ③ SRPO Reinforcement Learning |
| **Inference Modes** | - **Direct Sequence Recommendation:** low-latency item generation<br> - **Sequential Reasoning Recommendation:** interpretable CoT reasoning chains |
| **RL Framework** | verl with SGLang backend |
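
With 5 levels of 256 codewords, each item receives a 5-token semantic ID drawn from 256^5 (about 1.1 × 10^12) possible codes. The following is a minimal NumPy sketch of residual k-means quantization. It assumes the input is a matrix of item embeddings; how GREAM builds those embeddings is not covered here. Each level runs k-means on the residual left by the previous levels, and the item's code at that level is the index of its nearest centroid. The demo uses small sizes for speed. `kmeans`, `rq_kmeans`, and their parameters are illustrative, not the repository's code.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def rq_kmeans(emb: np.ndarray, levels: int = 5, k: int = 256, iters: int = 20):
    """Residual quantization: each level clusters what earlier levels left over.
    Needs at least k items. Returns (codes of shape (n, levels), codebooks)."""
    residual = emb.astype(np.float64)
    codebooks, codes = [], []
    for level in range(levels):
        c = kmeans(residual, k, iters, seed=level)
        idx = ((residual[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        residual = residual - c[idx]  # pass the leftover to the next level
        codebooks.append(c)
        codes.append(idx)
    return np.stack(codes, axis=1), codebooks


rng = np.random.default_rng(42)
item_emb = rng.normal(size=(200, 16))  # stand-in for real item embeddings
codes, codebooks = rq_kmeans(item_emb, levels=3, k=8)
print(codes.shape)  # (200, 3)
```

Summing the selected centroid from every level's codebook gives an approximation of the original embedding. Adding levels typically shrinks the remaining residual. Items that share coarse-level codes end up with shared ID prefixes.
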

---

## 📚 Training Data

| Data Type | Source | Description |
|------------|---------|-------------|
| **D<sub>align</sub>** | Amazon Review datasets (Beauty, Sports, Instruments) | Sequential recommendation, semantic reconstruction, and preference-understanding tasks |
| **D<sub>reason</sub>** | Synthetic CoT data generated via GPT-5 / Qwen3-30B / Llama-3.1 | Multi-step reasoning sequences with `<think>...</think>` and `<answer>...</answer>` supervision |
| **Text Sources** | Item titles, descriptions, and high-quality reviews | Combined and rewritten into dense item semantics |
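
D<sub>reason</sub> supervises outputs that contain a `<think>...</think>` block followed by an `<answer>...</answer>` block. The following Python sketch shows a format check and parser for such outputs, of the kind often used as a verifiable format reward. The exact format rules GREAM enforces are not stated in this card. The item-token pattern `<a_12><b_200>...` read by the hypothetical `answer_to_codes` helper is an assumed encoding of an RQ code, not the model's actual vocabulary. The sample reasoning text is invented.

```python
import re
from typing import List, Optional, Tuple

# One <think> block, then one <answer> block, with optional surrounding whitespace.
FORMAT_RE = re.compile(r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL)
# Hypothetical item-token format: one token per RQ level, e.g. <a_12><b_200>.
CODE_TOKEN_RE = re.compile(r"<[a-z]_(\d+)>")


def parse_reasoning_output(text: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer) if the text is well formed, else None."""
    m = FORMAT_RE.fullmatch(text)
    # Reject repeated blocks, which the lazy regex alone could still accept.
    if m is None or text.count("<think>") != 1 or text.count("<answer>") != 1:
        return None
    return m.group(1).strip(), m.group(2).strip()


def answer_to_codes(answer: str) -> List[int]:
    """Extract integer codes from the hypothetical per-level tokens."""
    return [int(c) for c in CODE_TOKEN_RE.findall(answer)]


output = (
    "<think>The user recently bought two hydrating serums, "
    "so a moisturizer is a likely next purchase.</think>\n"
    "<answer><a_12><b_200><c_7><d_33><e_91></answer>"
)
parsed = parse_reasoning_output(output)
if parsed is not None:
    reasoning, answer = parsed
    print(answer_to_codes(answer))  # [12, 200, 7, 33, 91]
```

In direct recommendation mode, the model would emit only the item tokens. A parser like this one then applies only to reasoning-mode outputs.
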

---

## 📊 Evaluation

### Datasets
- **Amazon-Beauty**
- **Amazon-Sports & Outdoors**
- **Amazon-Musical Instruments**

---

## Citation

```bibtex
@misc{hong2025generativereasoningrecommendationllms,
  title={Generative Reasoning Recommendation via LLMs},
  author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
  year={2025},
  eprint={2510.20815},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.20815},
}
```