---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---
# LAPVQA — Differential VQA (Native / End-to-end)
Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
## Description
DiffVQA models trained **end-to-end**, with the encoder and head optimized jointly. Each `.pt` file
is a plain state dict for `DiffVQAHead`. MAE-ViT-L/16 is the primary encoder studied.
## Results (test set, MAE-ViT-L/16)
| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

## Checkpoints
| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
| `coca_best.pt` | CoCa | 768 |
| `florence2_best.pt` | Florence-2 | 1024 |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
| `siglip_best.pt` | SigLIP | 1152 |
## Loading
```python
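import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Each checkpoint is a plain state dict, so weights_only=True is sufficient.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu", weights_only=True)
# vis_dim must match the checkpoint's encoder: 1024 for MAE ViT-L/16.
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode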
```
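Because `vis_dim` differs between encoders, loading a checkpoint with the wrong value will likely fail with a shape mismatch in `load_state_dict`. A small lookup keyed by file name is one way to avoid this. The sketch below copies its values from the file table in this README; `VIS_DIMS` and `vis_dim_for` are hypothetical names used for illustration and are not part of the `lapvqa` package.

```python
from pathlib import Path

# Checkpoint file name -> vis_dim, copied from the file table in this README.
# VIS_DIMS and vis_dim_for are illustrative helpers, not part of lapvqa.
VIS_DIMS = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path):
    """Return the vis_dim listed for a checkpoint, looked up by file name."""
    name = Path(checkpoint_path).name
    try:
        return VIS_DIMS[name]
    except KeyError:
        # Fail early with a clear message instead of a later shape mismatch.
        raise ValueError(f"unknown checkpoint: {name!r}") from None
```

With this in place, the head for another checkpoint could be built as `DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt"))` before calling `load_state_dict`.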