	---
	tags:
	- chest-xray
	- radiology
	- visual-question-answering
	- differential-vqa
	- mimic-cxr
	license: apache-2.0
	---

	# LAPVQA — Differential VQA (Native / End-to-end)

	Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

	## Description

	DiffVQA models trained end-to-end: the encoder and the head are optimized jointly.
	Each `.pt` file is a plain PyTorch state dict for `DiffVQAHead`.
	MAE-ViT-L/16 is the primary encoder studied.

	## Results (test set, MAE-ViT-L/16)

	| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
	|---|---|---|---|
	| 0.472 | 0.573 | 0.288 | 0.938 |

	Available checkpoints, with the `vis_dim` to pass to `DiffVQAHead` for each:

	| File | Encoder | vis_dim |
	|---|---|---|
	| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
	| `coca_best.pt` | CoCa | 768 |
	| `florence2_best.pt` | Florence-2 | 1024 |
	| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
	| `siglip_best.pt` | SigLIP | 1152 |

	## Loading

	```python
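	import torch
	from lapvqa.diffvqa.model import DiffVQAHead

	# Each checkpoint is a plain state dict; load it onto the CPU first.
	ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
	# vis_dim should match the checkpoint's encoder (1024 for MAE ViT-L/16).
	head = DiffVQAHead(vis_dim=1024)
	head.load_state_dict(ckpt)
	head.eval()  # switch to inference mode (disables dropout and similar layers)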
	```
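
The snippet above hard-codes `vis_dim=1024` for the MAE checkpoint. When switching between checkpoints, a small lookup can help keep `vis_dim` in step with the file being loaded. The following is a minimal sketch, not part of the `lapvqa` package: `VIS_DIMS` and `vis_dim_for` are hypothetical names, and the values are copied from the checkpoint table in this README.

```python
# vis_dim per checkpoint file, copied from the checkpoint table in this README.
# VIS_DIMS and vis_dim_for are illustrative helpers, not part of lapvqa.
VIS_DIMS = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_name: str) -> int:
    """Return the vis_dim listed for a checkpoint file name."""
    try:
        return VIS_DIMS[checkpoint_name]
    except KeyError:
        known = ", ".join(sorted(VIS_DIMS))
        raise ValueError(
            f"Unknown checkpoint {checkpoint_name!r}; expected one of: {known}"
        ) from None
```

Under these assumptions, `DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt"))` would construct the head with `vis_dim=1152` before `load_state_dict` is called, and an unrecognized file name fails early with a `ValueError` rather than with a shape mismatch at load time.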