dmusingu/lapvqa-diffvqa

+---
+tags:
+- chest-xray
+- radiology
+- visual-question-answering
+- differential-vqa
+- mimic-cxr
+license: apache-2.0
+---
+# LAPVQA — Differential VQA (Frozen Off-the-shelf Encoders)
+Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
+## Description
+Task heads for **Differential VQA (DiffVQA)**: given a *prior* and a *current* chest X-ray,
+answer natural-language questions about radiological changes between the two studies.
+Each head is trained on MIMIC-Diff-VQA on top of one of five **frozen** off-the-shelf vision encoders (the encoder weights are not updated during training).
+## Results (test set)
+| Encoder | BLEU-1 | BLEU-4 | ROUGE-1 | RadGraph-s |
+|---|---|---|---|---|
+| CLIP ViT-L/14 | 0.184 | 0.128 | 0.336 | 0.322 |
+| CoCa | 0.196 | 0.138 | 0.320 | 0.317 |
+| Florence-2 | 0.191 | 0.138 | 0.319 | 0.318 |
+| SigLIP | 0.186 | 0.131 | 0.322 | 0.313 |
+| OWLv2 | — | — | — | — |
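For readers unfamiliar with the BLEU columns, here is a minimal sketch of sentence-level BLEU-1: clipped unigram precision multiplied by a brevity penalty. It assumes lowercased whitespace tokenization and a single reference. The evaluation script behind the table is not described in this card and may tokenize or aggregate differently, so this will not necessarily reproduce the reported numbers. The function name `bleu1` and the example strings are made up for illustration.

```python
import math
from collections import Counter


def bleu1(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative only)."""
    # Assumption: lowercased whitespace tokenization.
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    hyp_counts = Counter(hyp_tokens)

    # Clip each hypothesis token count by its count in the reference,
    # so repeating a correct word is not rewarded more than once per match.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = overlap / len(hyp_tokens)

    # Brevity penalty: hypotheses shorter than the reference are scaled down.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))

    return brevity_penalty * precision


if __name__ == "__main__":
    # Made-up strings, not taken from MIMIC-Diff-VQA.
    print(bleu1("the effusion has increased", "the effusion has decreased"))
```

BLEU-4 extends the same idea to a geometric mean of 1-gram through 4-gram precisions, which is why it is always at or below BLEU-1 in the table.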
+## Files
+| File | Encoder backbone |
+|---|---|
+| `clip-vit-l14_best.pt` | CLIP ViT-L/14 |
+| `coca_best.pt` | CoCa |
+| `florence2_best.pt` | Florence-2 |
+| `siglip_best.pt` | SigLIP |
+| `owlv2_best.pt` | OWLv2 |
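The table above can be turned into a small lookup for fetching a checkpoint. This is a sketch under stated assumptions: the repository id `dmusingu/lapvqa-diffvqa` is inferred from the page header, the helper names `checkpoint_filename` and `download_checkpoint` are hypothetical, and the download uses `hf_hub_download` from the `huggingface_hub` package. The internal layout of the `.pt` files is not documented in this card.

```python
from pathlib import Path

# Assumption: repository id inferred from the page header of this model card.
REPO_ID = "dmusingu/lapvqa-diffvqa"

# Encoder backbone -> checkpoint file, copied from the Files table.
CHECKPOINTS = {
    "CLIP ViT-L/14": "clip-vit-l14_best.pt",
    "CoCa": "coca_best.pt",
    "Florence-2": "florence2_best.pt",
    "SigLIP": "siglip_best.pt",
    "OWLv2": "owlv2_best.pt",
}


def checkpoint_filename(encoder: str) -> str:
    """Return the checkpoint file name for an encoder listed in the table."""
    try:
        return CHECKPOINTS[encoder]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown encoder {encoder!r}; expected one of: {known}") from None


def download_checkpoint(encoder: str) -> Path:
    """Download one checkpoint from the Hub and return its local cache path."""
    # Imported lazily so the lookup above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(encoder)))


if __name__ == "__main__":
    print(checkpoint_filename("SigLIP"))
```

Once downloaded, the file can presumably be opened with `torch.load(path, map_location="cpu")`. Whether it holds a bare `state_dict` or a larger training checkpoint is not stated here, so inspect the loaded object before wiring it into a model.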