---
language:
- en
license: other
pipeline_tag: text-generation
tags:
- clinical-nlp
- medical-coding
- icd10
- icd-10-cm
- reasoning
- reinforcement-learning
- grpo
- healthcare
base_model:
- Qwen/Qwen2.5-32B-Instruct
---

# DeepICD-R1-zero-32B

## Model Summary

**DeepICD-R1-zero-32B** is a clinical reasoning model designed for **ICD-10-CM diagnosis outcome prediction from admission notes**. It follows the **DeepICD-R1 framework**, which treats diagnosis prediction as a reasoning task optimized with reinforcement learning and structured reward signals.

This checkpoint is an **"R1-Zero"-style model**: it was trained primarily through **reinforcement learning without a supervised fine-tuning (SFT) initialization**, allowing reasoning behaviors to emerge directly from reward optimization.

The approach is inspired by reasoning-focused training pipelines in which reinforcement learning alone can induce structured reasoning behaviors and self-verification in large language models.

---

## Model Details

- **Model name:** DeepICD-R1-zero-32B
- **Organization:** DATEXIS
- **Model size:** ~32B parameters
- **Task:** Single ICD-10-CM diagnosis prediction from clinical text
- **Training paradigm:** Reinforcement learning (GRPO-style)
- **Framework:** VERL reinforcement learning trainer
- **Domain:** Clinical NLP / medical reasoning

### Related Research

This model follows the **DeepICD-R1** framework introduced in:

> *DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation*

The paper proposes a diagnosis prediction system that combines:

- structured reasoning traces
- hierarchical reward signals aligned with the ICD code structure
- reinforcement learning for reasoning optimization
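
As a rough illustration, the hierarchical reward idea can be sketched as graded partial credit based on how far down the ICD-10-CM hierarchy the prediction agrees with the gold code. The `hierarchical_reward` helper and its weight values below are illustrative assumptions, not the paper's actual reward specification:

```python
def hierarchical_reward(pred: str, gold: str) -> float:
    """Graded partial credit aligned with the ICD-10-CM hierarchy.

    The weights are illustrative assumptions, not published values.
    """
    pred, gold = pred.strip().upper(), gold.strip().upper()
    if pred == gold:
        return 1.0   # exact full-code match
    if pred[:3] == gold[:3]:
        return 0.5   # same three-character category
    if pred[:1] == gold[:1]:
        return 0.1   # same leading letter (rough chapter-level proxy)
    return 0.0       # no hierarchical agreement
```

A reward of this shape gives the policy a gradient toward the correct region of the code space even when the exact code is missed.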

---

## Intended Use

This model is intended for **research purposes**, including:

- clinical reasoning experiments
- ICD-10-CM code prediction research
- reinforcement learning for language models
- reasoning trace generation
- structured prediction from clinical notes

### Out-of-Scope Use

This model **must not** be used for:

- medical diagnosis
- clinical decision making
- patient triage
- automated medical coding without expert supervision
- billing or compliance workflows

---

## Training Methodology

### R1-Zero Training Paradigm

The model follows a **zero-stage reasoning training approach**: reinforcement learning is applied directly to a base language model without prior supervised instruction tuning.

This method encourages models to discover reasoning strategies autonomously during training, allowing behaviors such as:

- chain-of-thought reasoning
- self-verification
- iterative reasoning refinement

to emerge naturally from the reward signal.

However, purely RL-trained models may also exhibit issues such as:

- repetitive reasoning patterns
- readability problems
- mixed-language outputs

---

## Training Data

The training task uses **clinical admission notes paired with ICD-10-CM diagnoses**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.

Task formulation:

- **Input:** an admission note describing a patient case
- **Output:** a reasoning trace and the predicted ICD-10-CM code

The model learns to infer diagnostic outcomes from the textual description of the patient presentation.

---

## Output Format

The model is trained to produce structured outputs that separate the reasoning from the final diagnosis.

### Example

```text
<think>
The patient presents with ...
Symptoms and history suggest ...
...
</think>

<diagnosis>
M5116
</diagnosis>
```

The reasoning trace allows the model to explain how the diagnosis is derived from the clinical note.
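
Downstream code can recover the two fields with a simple tag parser. The `parse_output` helper below is a minimal sketch assuming completions follow the `<think>`/`<diagnosis>` format above; a missing tag yields `None` for that field:

```python
import re

def parse_output(text: str):
    """Extract the reasoning trace and the predicted code from a
    completion in the <think>/<diagnosis> format."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    diagnosis = re.search(r"<diagnosis>(.*?)</diagnosis>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        diagnosis.group(1).strip() if diagnosis else None,
    )
```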

---

## Evaluation

Evaluation follows the methodology described in the **DeepICD-R1 paper**.

Performance is typically measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.

| Level | Description |
|-------|-------------|
| Chapter | Broad ICD category |
| Category | First three characters of the code |
| Full code | Complete ICD-10-CM code |

Hierarchical evaluation grants partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.
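
Level-wise scores can be reproduced by truncating both gold and predicted codes before computing macro F1. The sketch below approximates the category level by the first three characters and omits the chapter level, since ICD-10 chapters do not map onto a fixed prefix length; the helper names are ours, not from the paper:

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    # Macro-averaged F1 over the label set observed in the gold annotations.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
            fp[p] += 1
    scores = []
    for label in set(y_true):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

gold = ["M5116", "I10", "E119"]
pred = ["M5126", "I10", "E119"]

# Category level: truncate to the first three characters before scoring.
category_f1 = macro_f1([c[:3] for c in gold], [c[:3] for c in pred])
full_f1 = macro_f1(gold, pred)
```

Here the wrong full code `M5126` still earns full credit at the category level because it shares the `M51` stem with the gold `M5116`.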

---

## Limitations

Models following the DeepICD-R1 framework share several limitations.

### Dataset limitations

- Training data consists primarily of **English clinical notes**
- The data distribution reflects **hospital-specific patient populations**
- ICD labels are **highly imbalanced**, which hurts performance on rare diagnoses

### Model limitations

- Reasoning traces may appear convincing while being incorrect
- Predictions may fail for rare or long-tail diagnoses
- Models may exhibit **premature diagnostic closure**
- Reinforcement learning signals are only proxies for expert feedback

---

## Ethical Considerations

This model is trained on **de-identified clinical data** and is intended strictly for research.

Potential risks include:

- propagation of dataset biases
- overconfidence in generated reasoning
- misuse in clinical decision making

Appropriate safeguards include:

- expert oversight
- dataset bias evaluation
- fairness audits
- controlled deployment environments

---

## Hardware and Training Setup

A typical training configuration for models in this family includes:

- **GPUs:** multi-GPU training (4–8 GPUs)
- **Precision:** bfloat16
- **Rollout engine:** vLLM
- **Training framework:** VERL PPO/GRPO trainer
- **Sampling:** multiple rollouts per prompt

---

## Usage

### Transformers Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "DATEXIS/DeepICD-R1-zero-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

prompt = """
You are a clinical reasoning model.

Given the following admission note,
produce reasoning in <think> tags
and a final ICD-10 diagnosis in <diagnosis> tags.

[ADMISSION NOTE]
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Recommended Inference Practices

- Use prompts consistent with the training format.
- Validate predicted ICD-10 codes against official code formats.
- Always review predictions with medical experts.
- Avoid exposing reasoning traces in safety-critical settings without verification.
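
For the code-format check, a structural regex catches most malformed outputs before any lookup against the official code set. The pattern below approximates the ICD-10-CM shape (letter, digit, alphanumeric stem, optional dot, up to four extension characters); it validates format only and does not confirm that a code actually exists:

```python
import re

# Approximate structural shape of an ICD-10-CM code: a 3-character
# stem plus an optional extension of up to four alphanumerics,
# with or without a dot. Format check only, not a code-set lookup.
ICD10CM_SHAPE = re.compile(r"[A-Z]\d[0-9A-Z](\.?[0-9A-Z]{1,4})?")

def looks_like_icd10cm(code: str) -> bool:
    return bool(ICD10CM_SHAPE.fullmatch(code.strip().upper()))
```

Predictions that fail this check can be rejected or re-sampled before any downstream use.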

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{roehr2026deepicdr1,
  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
  booktitle={Proceedings of LREC-COLING},
  year={2026}
}
```