---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---
# LAPVQA: Differential VQA (Captioning-Pretrained Encoder)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

A DiffVQA head trained on top of the frozen **LAPVQA captioning-pretrained encoder**
([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
The checkpoint is a plain `DiffVQAHead` state dict (`vis_dim=1024`), so the encoder weights must be loaded separately.
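Because the checkpoint is described as a plain state dict, it can be useful to confirm that a loaded file really is a flat mapping of parameter names to tensors before calling `load_state_dict`. The sketch below is one way to do that. The helper names (`is_plain_state_dict`, `summarize_state_dict`) and the parameter names in the demo are hypothetical and not part of the `lapvqa` package; a small stand-in class replaces `torch.Tensor` so the example runs without PyTorch.

```python
from collections.abc import Mapping


def is_plain_state_dict(obj):
    """True if obj is a non-empty mapping of string names to tensor-like values."""
    if not isinstance(obj, Mapping) or len(obj) == 0:
        return False
    return all(
        isinstance(name, str) and hasattr(value, "shape")
        for name, value in obj.items()
    )


def summarize_state_dict(state_dict):
    """Return sorted (parameter name, shape tuple) pairs for quick inspection."""
    return sorted((name, tuple(value.shape)) for name, value in state_dict.items())


class _FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch installed."""

    def __init__(self, *shape):
        self.shape = shape


# Hypothetical parameter names; the real DiffVQAHead keys will differ.
plain = {"proj.weight": _FakeTensor(512, 1024), "proj.bias": _FakeTensor(512)}
# A training-style checkpoint that nests the weights next to bookkeeping fields.
wrapped = {"epoch": 3, "state_dict": plain}

print(is_plain_state_dict(plain))    # True
print(is_plain_state_dict(wrapped))  # False
print(summarize_state_dict(plain))
```

With a real checkpoint, the same helpers can be applied to the object returned by `torch.load(..., map_location="cpu")`; the shape summary makes it easy to spot whether the visual input width matches `vis_dim=1024`.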
## Results (test set)

| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.468 | 0.562 | 0.303 | 0.938 |
## Loading

```python
import torch
from lapvqa.diffvqa.model import DiffVQAHead

# The checkpoint is a plain state dict, so it loads directly into the head.
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")
head = DiffVQAHead(vis_dim=1024)  # vis_dim=1024 matches this checkpoint
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode

# Pair the head with encoder_final.pt from lapvqa-pretrain-captioning.
```
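The head needs `encoder_final.pt` from the `lapvqa-pretrain-captioning` repository. A minimal sketch of fetching that file with `huggingface_hub.hf_hub_download` follows. It assumes the file sits at the root of the repository and that `huggingface_hub` is installed; the function name `fetch_encoder_checkpoint` is hypothetical, and how the encoder module itself is constructed is not covered here.

```python
ENCODER_REPO = "dmusingu/lapvqa-pretrain-captioning"
ENCODER_FILE = "encoder_final.pt"  # assumption: stored at the repository root


def fetch_encoder_checkpoint(repo_id=ENCODER_REPO, filename=ENCODER_FILE):
    """Download the encoder checkpoint (or reuse the local cache) and return its path."""
    # Imported lazily so this snippet can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Usage (needs network access, huggingface_hub, and torch):
#   import torch
#   encoder_path = fetch_encoder_checkpoint()
#   encoder_ckpt = torch.load(encoder_path, map_location="cpu")
```

`hf_hub_download` caches the file locally, so repeated calls do not download it again.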