---
language:
- en
license: other
pipeline_tag: text-generation
tags:
- clinical-nlp
- medical-coding
- icd10
- icd-10-cm
- reasoning
- reinforcement-learning
- grpo
- healthcare
base_model:
- Qwen/Qwen2.5-32B-Instruct
---

# DeepICD-R1-zero-32B

## Model Summary

**DeepICD-R1-zero-32B** is a clinical reasoning model designed for **ICD-10-CM diagnosis outcome prediction from admission notes**. It follows the **DeepICD-R1 framework**, which treats diagnosis prediction as a reasoning task optimized with reinforcement learning and structured reward signals.

This checkpoint is an **"R1-Zero"-style model**: it was trained primarily through **reinforcement learning without a supervised fine-tuning (SFT) initialization**, allowing reasoning behaviors to emerge directly from reward optimization.

The approach is inspired by reasoning-focused training pipelines in which reinforcement learning alone can induce structured reasoning behaviors and self-verification in large language models.

---

## Model Details

- **Model name:** DeepICD-R1-zero-32B
- **Organization:** DATEXIS
- **Model size:** ~32B parameters
- **Task:** Single ICD-10-CM diagnosis prediction from clinical text
- **Training paradigm:** Reinforcement learning (GRPO-style)
- **Framework:** VERL reinforcement learning trainer
- **Domain:** Clinical NLP / medical reasoning

### Related Research

This model follows the **DeepICD-R1** framework introduced in:

> *DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation*

The paper proposes a diagnosis prediction system that combines:

- structured reasoning traces
- hierarchical reward signals aligned with the ICD code structure
- reinforcement learning for reasoning optimization
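
As a rough illustration, the hierarchical reward idea can be sketched as graded partial credit based on how far down the ICD-10-CM hierarchy the prediction agrees with the gold code. The `hierarchical_reward` helper and its weight values below are illustrative assumptions, not the paper's actual reward specification:

```python
def hierarchical_reward(pred: str, gold: str) -> float:
    """Graded partial credit aligned with the ICD-10-CM hierarchy.

    The weights are illustrative assumptions, not published values.
    """
    pred, gold = pred.strip().upper(), gold.strip().upper()
    if pred == gold:
        return 1.0   # exact full-code match
    if pred[:3] == gold[:3]:
        return 0.5   # same three-character category
    if pred[:1] == gold[:1]:
        return 0.1   # same leading letter (rough chapter-level proxy)
    return 0.0       # no hierarchical agreement
```

A reward of this shape gives the policy a gradient toward the correct region of the code space even when the exact code is missed.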

---

## Intended Use

This model is intended for **research purposes**, including:

- clinical reasoning experiments
- ICD-10-CM code prediction research
- reinforcement learning for language models
- reasoning trace generation
- structured prediction from clinical notes

### Out-of-Scope Use

This model **must not** be used for:

- medical diagnosis
- clinical decision making
- patient triage
- automated medical coding without expert supervision
- billing or compliance workflows

---

## Training Methodology

### R1-Zero Training Paradigm

The model follows a **zero-stage reasoning training approach**: reinforcement learning is applied directly to a base language model without prior supervised instruction tuning.

This method encourages models to discover reasoning strategies autonomously during training, allowing behaviors such as:

- chain-of-thought reasoning
- self-verification
- iterative reasoning refinement

to emerge naturally from the reward signal.

However, purely RL-trained models may also exhibit issues such as:

- repetitive reasoning patterns
- readability problems
- mixed-language outputs

---

## Training Data

The training task uses **clinical admission notes paired with ICD-10-CM diagnoses**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.

Task formulation:

- **Input:** an admission note describing a patient case
- **Output:** a reasoning trace and the predicted ICD-10-CM code

The model learns to infer diagnostic outcomes from the textual description of the patient presentation.

---

## Output Format

The model is trained to produce structured outputs that separate the reasoning from the final diagnosis.

### Example

```text
<think>
The patient presents with ...
Symptoms and history suggest ...
...
</think>

<diagnosis>
M5116
</diagnosis>
```

The reasoning trace allows the model to explain how the diagnosis is derived from the clinical note.
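
Downstream code can recover the two fields with a simple tag parser. The `parse_output` helper below is a minimal sketch assuming completions follow the `<think>`/`<diagnosis>` format above; a missing tag yields `None` for that field:

```python
import re

def parse_output(text: str):
    """Extract the reasoning trace and the predicted code from a
    completion in the <think>/<diagnosis> format."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    diagnosis = re.search(r"<diagnosis>(.*?)</diagnosis>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        diagnosis.group(1).strip() if diagnosis else None,
    )
```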

---

## Evaluation

Evaluation follows the methodology described in the **DeepICD-R1 paper**.

Performance is typically measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.

| Level | Description |
|-------|-------------|
| Chapter | Broad ICD category |
| Category | First three characters of the code |
| Full code | Complete ICD-10-CM code |

Hierarchical evaluation grants partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.
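
Level-wise scores can be reproduced by truncating both gold and predicted codes before computing macro F1. The sketch below approximates the category level by the first three characters and omits the chapter level, since ICD-10 chapters do not map onto a fixed prefix length; the helper names are ours, not from the paper:

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    # Macro-averaged F1 over the label set observed in the gold annotations.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
            fp[p] += 1
    scores = []
    for label in set(y_true):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

gold = ["M5116", "I10", "E119"]
pred = ["M5126", "I10", "E119"]

# Category level: truncate to the first three characters before scoring.
category_f1 = macro_f1([c[:3] for c in gold], [c[:3] for c in pred])
full_f1 = macro_f1(gold, pred)
```

Here the wrong full code `M5126` still earns full credit at the category level because it shares the `M51` stem with the gold `M5116`.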

---

## Limitations

Models following the DeepICD-R1 framework share several limitations.

### Dataset limitations

- Training data consists primarily of **English clinical notes**
- The data distribution reflects **hospital-specific patient populations**
- ICD labels are **highly imbalanced**, which hurts performance on rare diagnoses

### Model limitations

- Reasoning traces may appear convincing while being incorrect
- Predictions may fail for rare or long-tail diagnoses
- Models may exhibit **premature diagnostic closure**
- Reinforcement learning signals are only proxies for expert feedback

---

## Ethical Considerations

This model is trained on **de-identified clinical data** and is intended strictly for research.

Potential risks include:

- propagation of dataset biases
- overconfidence in generated reasoning
- misuse in clinical decision making

Appropriate safeguards include:

- expert oversight
- dataset bias evaluation
- fairness audits
- controlled deployment environments

---

## Hardware and Training Setup

A typical training configuration for models in this family includes:

- **GPUs:** multi-GPU training (4–8 GPUs)
- **Precision:** bfloat16
- **Rollout engine:** vLLM
- **Training framework:** VERL PPO/GRPO trainer
- **Sampling:** multiple rollouts per prompt

---

## Usage

### Transformers Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "DATEXIS/DeepICD-R1-zero-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

prompt = """
You are a clinical reasoning model.

Given the following admission note,
produce reasoning in <think> tags
and a final ICD-10 diagnosis in <diagnosis> tags.

[ADMISSION NOTE]
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Recommended Inference Practices

- Use prompts consistent with the training format.
- Validate predicted ICD-10 codes against official code formats.
- Always review predictions with medical experts.
- Avoid exposing reasoning traces in safety-critical settings without verification.
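
For the code-format check, a structural regex catches most malformed outputs before any lookup against the official code set. The pattern below approximates the ICD-10-CM shape (letter, digit, alphanumeric stem, optional dot, up to four extension characters); it validates format only and does not confirm that a code actually exists:

```python
import re

# Approximate structural shape of an ICD-10-CM code: a 3-character
# stem plus an optional extension of up to four alphanumerics,
# with or without a dot. Format check only, not a code-set lookup.
ICD10CM_SHAPE = re.compile(r"[A-Z]\d[0-9A-Z](\.?[0-9A-Z]{1,4})?")

def looks_like_icd10cm(code: str) -> bool:
    return bool(ICD10CM_SHAPE.fullmatch(code.strip().upper()))
```

Predictions that fail this check can be rejected or re-sampled before any downstream use.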

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{roehr2026deepicdr1,
  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
  booktitle={Proceedings of LREC-COLING},
  year={2026}
}
```