---
language:
- en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- clinical
- ssi
- classification
- surveillance
- multi-label
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SSIBERT-multiclass
  results:
  - task:
      type: text-classification
      name: Multi-Label SSI Detection
    dataset:
      name: Synthetic UK NHS Clinical Notes (Multi-Label)
      type: synthetic
      split: test
    metrics:
    - name: F1 (Micro)
      type: f1
      value: 1.0
---

# Model Card for Ch3DS/SSIBERT-multiclass

## Model Details

### Model Description

This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, it scores four specific clinical indicators of infection alongside an overall SSI determination, and a single note can carry any combination of these labels:

1. **Purulence**: Presence of pus or purulent discharge.
2. **Redness**: Erythema, spreading redness, or inflammation.
3. **Fever**: Pyrexia, rigors, or elevated temperature.
4. **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5. **SSI**: Overall determination of Surgical Site Infection.

It is tailored to **UK NHS terminology**.

- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)

### Uses

#### Direct Use

This model extracts structured data from unstructured clinical notes, allowing for more granular surveillance than a single SSI flag.

- **Input**: Clinical note text.
- **Output**: One logit per label; applying a sigmoid gives independent probabilities for `[Purulence, Redness, Fever, Antibiotics, SSI]`, in that order.

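
For surveillance reporting, the per-label probabilities usually need to be turned into yes/no flags. The sketch below is one minimal way to do that, assuming a single 0.5 cutoff for every label. The helper name `probs_to_flags` and the threshold are illustrative choices, not part of the model; per-label thresholds tuned on validation data may work better in practice.

```python
# Label order as listed in this card's output description.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def probs_to_flags(probs, threshold=0.5):
    """Map per-label probabilities to a {label: bool} dict using one cutoff.

    `threshold` is an assumed default, not a value published with the model.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p >= threshold for label, p in zip(LABELS, probs)}


# Hypothetical probabilities for one note (made up for illustration).
example = probs_to_flags([0.97, 0.08, 0.91, 0.88, 0.95])
flagged = [label for label, hit in example.items() if hit]
print(flagged)
```

Because the labels are scored independently, a note can be flagged for any subset of them, including none.
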
#### Out-of-Scope Use

- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on non-NHS terminology.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
# Truncate to BERT's 512-token limit so long notes do not raise an error.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label head: apply a sigmoid per label, not a softmax across labels.
probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for label, prob in zip(labels, probs[0].tolist()):
    print(f"{label}: {prob:.2%}")
```

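
Bio_ClinicalBERT, like other BERT models, accepts at most 512 tokens per input, so longer notes have to be truncated or split. The sketch below shows one possible way to split a note into overlapping windows and combine the per-window scores. It is an assumption, not the author's method: the helper names, the 510-token window (512 minus the `[CLS]` and `[SEP]` tokens), the stride, and the max-aggregation rule are all illustrative.

```python
def split_into_windows(token_ids, window=510, stride=255):
    """Return overlapping slices of `token_ids`, each at most `window` long.

    `window` and `stride` are assumed defaults chosen for illustration.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the note
        start += stride
    return windows


def aggregate_max(per_window_probs):
    """Take the highest probability seen for each label across all windows."""
    return [max(column) for column in zip(*per_window_probs)]


# Stand-in token ids; in practice these would come from the tokenizer.
windows = split_into_windows(list(range(1200)))
# Hypothetical per-window probabilities for two labels.
combined = aggregate_max([[0.1, 0.9], [0.4, 0.2]])
```

Taking the maximum means a label is reported if any part of the note supports it, which suits a surveillance setting where missing an indicator is costlier than an extra review; averaging would be a more conservative alternative.
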
## Training Details

### Training Data

- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded.

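
Multi-hot encoding represents each note's labels as a fixed-length vector with a 1 at every position whose label applies. The sketch below illustrates the idea, assuming the label order used elsewhere in this card; the helper name `multi_hot` is hypothetical and the actual data-generation code is not published here.

```python
# Label order assumed to match the model's output order in this card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def multi_hot(active_labels):
    """Encode a set of label names as a list of 0.0/1.0 floats."""
    unknown = set(active_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if label in active_labels else 0.0 for label in LABELS]


# A note with pus and fever that was judged to be an SSI.
target = multi_hot({"Purulence", "Fever", "SSI"})
print(target)  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

Floats are used rather than integers because binary cross-entropy losses in PyTorch expect floating-point targets.
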
### Training Procedure

- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti

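
The card does not state which loss was used. A common choice for multi-hot targets is binary cross-entropy applied to each label's logit independently, which is what the Transformers sequence-classification heads use when `problem_type` is set to `multi_label_classification`. The pure-Python sketch below shows that loss under this assumption; `bce_with_logits` is an illustrative name, and real training would use `torch.nn.BCEWithLogitsLoss` instead.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# An uninformative logit of 0 gives log(2) regardless of the target.
uncertain = bce_with_logits([0.0], [1.0])
# Confident, correct logits give a loss close to zero.
confident = bce_with_logits([10.0, -10.0], [1.0, 0.0])
```

Each label contributes its own term, so an error on one indicator does not prevent the others from being learned.
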
## Evaluation

The model was evaluated on a held-out test split of the synthetic data, where it achieved near-perfect performance (the card metadata reports a micro-averaged F1 of 1.0). These results describe the synthetic distribution only.

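
Micro-averaged F1 pools true positives, false positives, and false negatives across all labels and notes before computing a single score. The sketch below shows that calculation on made-up data; the function name `micro_f1` and the example rows are illustrative and are not the evaluation code or results for this model.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-hot rows of 0/1 values."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positives anywhere, return 0.0 rather than dividing by zero.
    return 2 * tp / denom if denom else 0.0


# Two hypothetical notes, columns ordered as
# [Purulence, Redness, Fever, Antibiotics, SSI].
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
score = micro_f1(y_true, y_pred)
print(score)  # 0.8
```

Here there are 4 true positives, 1 false positive, and 1 false negative, giving 2·4 / (2·4 + 1 + 1) = 0.8.
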
## Model Card Contact

**Daryn Sutton**