---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---
# LAPVQA — Differential VQA (Captioning-Pretrained Encoder)
Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
## Description
A DiffVQA head trained on top of the frozen **LAPVQA captioning-pretrained encoder**
([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
The checkpoint is a plain `DiffVQAHead` state dict (`vis_dim=1024`).
## Results (test set)
| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.468 | 0.562 | 0.303 | 0.938 |
## Loading
```python
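import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Load the plain state dict on CPU; move the head to another device afterwards if needed.
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # evaluation mode: disables dropout and similar training-only behavior
# Pair with encoder_final.pt from lapvqa-pretrain-captioning.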
```
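Because the checkpoint is described as a plain state dict, it can be worth confirming that before calling `load_state_dict`. The following is a minimal sketch, not part of the `lapvqa` package: `describe_state_dict` is a hypothetical helper name, and it only assumes that each value exposes a `.shape` attribute, as PyTorch tensors do.

```python
from collections.abc import Mapping


def describe_state_dict(ckpt):
    """Return {parameter_name: shape_tuple} if ckpt looks like a plain state dict.

    Hypothetical helper (not from lapvqa). Raises TypeError when ckpt is
    wrapped, e.g. {"model": {...}, "optimizer": {...}}, or otherwise does not
    map string names directly to tensor-like values.
    """
    if not isinstance(ckpt, Mapping):
        raise TypeError(f"expected a mapping, got {type(ckpt).__name__}")
    shapes = {}
    for name, value in ckpt.items():
        # A plain state dict maps string names straight to tensors.
        if not isinstance(name, str) or not hasattr(value, "shape"):
            raise TypeError(f"entry {name!r} is not a named tensor")
        shapes[name] = tuple(value.shape)
    return shapes
```

Calling `describe_state_dict(ckpt)` on the object returned by `torch.load` gives a name-to-shape mapping that can be printed and compared against the parameters of `DiffVQAHead(vis_dim=1024)` when a load fails with a key or size mismatch.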