---
language:
- en
license: other
pipeline_tag: text-generation
tags:
- clinical-nlp
- medical-coding
- icd10
- icd-10-cm
- reasoning
- reinforcement-learning
- grpo
- healthcare
base_model:
- Qwen/Qwen2.5-32B-Instruct
---

# DeepICD-R1-zero-32B

## Model Summary

**DeepICD-R1-zero-32B** is a clinical reasoning model designed for **ICD-10-CM diagnosis outcome prediction from admission notes**. It follows the **DeepICD-R1 framework**, which treats diagnosis prediction as a reasoning task optimized with reinforcement learning and structured reward signals.

This checkpoint is an **“R1-Zero”-style model**: it was trained primarily through **reinforcement learning without a supervised fine-tuning (SFT) initialization**, so reasoning behaviors emerge directly from reward optimization. The approach is inspired by reasoning-focused training pipelines in which reinforcement learning alone can induce structured reasoning and self-verification in large language models.

---

# Model Details

- **Model name:** DeepICD-R1-zero-32B
- **Organization:** DATEXIS
- **Model size:** ~32B parameters
- **Task:** Single ICD-10-CM diagnosis prediction from clinical text
- **Training paradigm:** Reinforcement learning (GRPO-style)
- **Framework:** VERL reinforcement learning trainer
- **Domain:** Clinical NLP / medical reasoning

### Related Research

This model follows the **DeepICD-R1** framework introduced in:

> *DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation*

The paper proposes a system for diagnosis prediction that combines:

- structured reasoning traces
- hierarchical reward signals aligned with ICD code structure
- reinforcement learning for reasoning optimization

---

# Intended Use

This model is intended for **research purposes**, including:

- clinical reasoning experiments
- ICD-10-CM code prediction research
- reinforcement learning for language models
- reasoning trace generation
- structured prediction from clinical notes

### Out-of-Scope Use

This model **must not** be used for:

- medical diagnosis
- clinical decision making
- patient triage
- automated medical coding without expert supervision
- billing or compliance workflows

---

# Training Methodology

## R1-Zero Training Paradigm

The model follows a **Zero-stage reasoning training approach**, in which reinforcement learning is applied directly to the starting model without a prior supervised fine-tuning stage.

This method encourages models to discover reasoning strategies autonomously during training, allowing behaviors such as:

- chain-of-thought reasoning
- self-verification
- iterative reasoning refinement

to emerge naturally from the reward signal.

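As a rough illustration of how a reward signal of this kind can drive GRPO-style training, the sketch below scores each sampled completion against the gold code with partial credit by ICD hierarchy level, then normalizes the rewards within the group of rollouts drawn for one prompt. The function names, the 1.0 / 0.5 / 0.0 reward weights, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the DeepICD-R1 paper or from the VERL trainer. Chapter-level credit is left out because it needs a code-range lookup table.

```python
from statistics import mean, pstdev


def normalize_code(code: str) -> str:
    """Strip whitespace and the optional dot so 'M51.16' and 'M5116' compare equal."""
    return code.strip().replace(".", "").upper()


def hierarchical_reward(predicted: str, gold: str) -> float:
    """Partial-credit reward by hierarchy depth (weights are hypothetical)."""
    pred, target = normalize_code(predicted), normalize_code(gold)
    if not pred:
        return 0.0  # no usable prediction was extracted
    if pred == target:
        return 1.0  # full code matches
    if pred[:3] == target[:3]:
        return 0.5  # same three-character category
    return 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center and scale rewards within one prompt's group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold = "M5116"
    # Final codes extracted from four hypothetical rollouts of the same prompt.
    rollouts = ["M5116", "M51.17", "K219", "M5116"]
    rewards = [hierarchical_reward(p, gold) for p in rollouts]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch the only training signal is one scalar reward per completion; no reference reasoning trace is supplied.
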
However, purely RL-trained models may also exhibit issues such as:

- repetitive reasoning patterns
- readability problems
- mixed language outputs

---

# Training Data

The training task uses **clinical admission notes paired with ICD-10-CM diagnoses**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.

Task formulation:

- **Input:** admission note describing a patient case
- **Output:** reasoning trace and predicted ICD-10-CM code

The model learns to infer diagnostic outcomes from the textual description of the patient presentation.

---

# Output Format

The model is trained to produce structured outputs that separate the reasoning from the final diagnosis.

### Example

```text
The patient presents with ...
Symptoms and history suggest ...
...

M5116
```

The reasoning trace lets the model explain how the diagnosis is derived from the clinical note.

---

## Evaluation

Evaluation follows the methodology described in the **DeepICD-R1 paper**.

Performance is typically measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.

| Level | Description |
|------|-------------|
| Chapter | Broad ICD category |
| Category | First three characters |
| Full code | Complete ICD-10-CM code |

Hierarchical evaluation allows partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.

---

## Limitations

Models following the DeepICD-R1 framework share several limitations.

### Dataset limitations

- Training data consists primarily of **English clinical notes**
- Distribution reflects **hospital-specific patient populations**
- ICD labels are **highly imbalanced**, affecting rare diagnoses

### Model limitations

- Reasoning traces may appear convincing while being incorrect
- Predictions may fail for rare or long-tail diagnoses
- Models may demonstrate **premature diagnostic closure**
- Reinforcement learning signals are only proxies for expert feedback

---

## Ethical Considerations

This model is trained on **de-identified clinical data** and is intended strictly for research.

Potential risks include:

- propagation of dataset biases
- overconfidence in generated reasoning
- misuse in clinical decision making

Appropriate safeguards include:

- expert oversight
- dataset bias evaluation
- fairness audits
- controlled deployment environments

---

## Hardware and Training Setup

Typical training configuration for models in this family includes:

- **GPUs:** multi-GPU training (4–8 GPUs)
- **Precision:** bfloat16
- **Rollout engine:** vLLM
- **Training framework:** VERL PPO/GRPO trainer
- **Sampling:** multiple rollouts per prompt

---

## Usage

### Transformers Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "DATEXIS/DeepICD-R1-zero-32B"

# Load the tokenizer and model; device_map="auto" places weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto"
)

# Replace [ADMISSION NOTE] with the text of the admission note.
prompt = """
You are a clinical reasoning model.

Given the following admission note, produce reasoning in tags and a final ICD-10 diagnosis in tags.

[ADMISSION NOTE]
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the reasoning trace and the predicted code.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Recommended Inference Practices

- Use prompts consistent with the training format.
- Validate predicted ICD-10 codes against official code formats.
- Always review predictions with medical experts.
- Avoid exposing reasoning traces in safety-critical settings without verification.

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{roehr2026deepicdr1,
  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
  booktitle={Proceedings of LREC-COLING},
  year={2026}
}