---
license: afl-3.0
language:
- en
- zh
metrics:
- accuracy
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
pipeline_tag: text-generation
library_name: transformers
tags:
- medical
- deepseek-r1
- health
- ehr
- reasoning
gated: true
extra_gated_heading: "Access Request"
extra_gated_description: "Please provide your organization and intended use."
extra_gated_fields:
  Affiliation: text
  Research Purpose: text
  Country: text
---
# RareSeek-R1: A specialized language model for rare disease diagnosis and reasoning
|
|
**RareSeek-R1** is a domain-specialized large language model for rare-disease diagnostic reasoning, developed with a Progressive Parameter-Efficient Transfer Learning framework. The model is first instruction-tuned on RareMed-Corpus, a large, clinically grounded, multi-source dataset that integrates medical textbooks, guidelines, biomedical literature, and real-world EHR narratives. It is then fine-tuned on RareMed-CoT, a high-fidelity corpus designed to instill explicit, stepwise clinical reasoning aligned with real diagnostic workflows. To further improve factual reliability, GraphRAG anchors the model's inference to up-to-date variant–gene–phenotype–disease relationships. This retrieval augmentation substantially reduces hallucinations, improves factual calibration, and yields notable performance gains, particularly when EHR narratives are combined with prioritized genetic variants. Taken together, RareSeek-R1 reasons directly over full-length EHRs, uses graph-grounded retrieval, and augments clinician diagnostic accuracy, advancing a reliable and scalable AI paradigm for rare-disease diagnosis.
|
|
<p align="center">
<img src="https://github.com/yangtao1025/RareSeek-R1/raw/main/RareSeek-R1.png" alt="RareSeek-R1 Teaser Image" width="800">
</p>
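
The graph-grounded retrieval described above can be illustrated with a minimal, self-contained sketch. It is not the RareSeek-R1 GraphRAG implementation: the triples are a toy graph, `variant_1` and `variant_2` are placeholder identifiers, and the helper names (`build_index`, `retrieve_facts`, `build_prompt`) and the prompt layout are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples. Illustrative only; NOT taken from the
# RareSeek-R1 knowledge graph. "variant_1"/"variant_2" are placeholder IDs.
TRIPLES = [
    ("variant_1", "located_in", "FBN1"),
    ("FBN1", "associated_with", "Marfan syndrome"),
    ("Marfan syndrome", "presents_with", "ectopia lentis"),
    ("Marfan syndrome", "presents_with", "aortic root dilatation"),
    ("variant_2", "located_in", "CFTR"),
    ("CFTR", "associated_with", "cystic fibrosis"),
    ("cystic fibrosis", "presents_with", "pancreatic insufficiency"),
]


def build_index(triples):
    """Map each head node to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def retrieve_facts(index, seeds, max_hops=3):
    """Breadth-first walk from the seed nodes, collecting every triple
    reachable within max_hops (variant -> gene -> disease -> phenotype)."""
    facts, seen = [], set()
    frontier = list(seeds)
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            for relation, tail in index.get(node, []):
                facts.append((node, relation, tail))
                next_frontier.append(tail)
        frontier = next_frontier
    return facts


def build_prompt(ehr_text, variants, facts):
    """Place retrieved facts ahead of the EHR narrative (hypothetical layout)."""
    fact_lines = "\n".join(
        f"- {head} {relation.replace('_', ' ')} {tail}"
        for head, relation, tail in facts
    )
    return (
        "Retrieved knowledge:\n" + fact_lines + "\n\n"
        "Prioritized variants: " + ", ".join(variants) + "\n\n"
        "EHR narrative:\n" + ehr_text + "\n\n"
        "Reason step by step, then list the most likely rare-disease diagnoses."
    )


if __name__ == "__main__":
    index = build_index(TRIPLES)
    facts = retrieve_facts(index, ["variant_1"])
    print(build_prompt("Tall stature and lens dislocation.", ["variant_1"], facts))
```

The resulting string could be passed as the user message to a text-generation model. Seeding the walk with the prioritized variants keeps the retrieved context restricted to the subgraph reachable from the patient's own findings, which is the general idea behind anchoring generation to graph facts.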
|
|
# **RareMedData**: [https://huggingface.co/datasets/TaoMedAI/RareMedData](https://huggingface.co/datasets/TaoMedAI/RareMedData)