---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---
# LAPVQA: Differential VQA (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

DiffVQA models trained **end-to-end**, with the encoder and head optimized jointly. Each `.pt` file
is a plain state dict of `DiffVQAHead`. MAE-ViT-L/16 is the primary encoder studied.
## Results (test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

Available checkpoints and the `vis_dim` each one expects:

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
| `coca_best.pt` | CoCa | 768 |
| `florence2_best.pt` | Florence-2 | 1024 |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
| `siglip_best.pt` | SigLIP | 1152 |
## Loading

```python
import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Each checkpoint is a plain state dict, so it loads directly into the head.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
# vis_dim must match the encoder (1024 for MAE ViT-L/16; see the table above).
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode
```
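
To load a checkpoint other than the MAE one, `vis_dim` has to match the value listed for that file in the table above. The following is a minimal sketch of a lookup helper built from that table. The names `VIS_DIMS` and `vis_dim_for` are hypothetical and not part of the `lapvqa` package.

```python
from pathlib import Path

# Hypothetical lookup table copied from the checkpoint table in this card.
VIS_DIMS = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path):
    """Return the vis_dim listed for a checkpoint, keyed by its file name."""
    name = Path(checkpoint_path).name
    if name not in VIS_DIMS:
        known = ", ".join(sorted(VIS_DIMS))
        raise ValueError(f"Unknown checkpoint {name!r}; expected one of: {known}")
    return VIS_DIMS[name]
```

Under these assumptions, the head for any listed file could be built with `DiffVQAHead(vis_dim=vis_dim_for(path))` before calling `load_state_dict`, which avoids hard-coding a dimension that does not match the checkpoint.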