Update README with model loading code
README.md
CHANGED
@@ -14,10 +14,8 @@ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapv
 
 ## Description
 
-DiffVQA models trained **end-to-end** (encoder +
-a
-[`lapvqa-diffvqa`](https://huggingface.co/dmusingu/lapvqa-diffvqa).
-MAE-ViT-L/16 is the primary encoder studied in this native setting.
+DiffVQA models trained **end-to-end** (encoder + head jointly). Each `.pt` file
+is a plain state dict of `DiffVQAHead`. MAE-ViT-L/16 is the primary encoder studied.
 
 ## Results (test set, MAE-ViT-L/16)
 
@@ -25,12 +23,22 @@ MAE-ViT-L/16 is the primary encoder studied in this native setting.
 |---|---|---|---|
 | 0.472 | 0.573 | 0.288 | 0.938 |
 
-
-
-|
-|
-| `
-| `
-| `
-
-|
+| File | Encoder | vis_dim |
+|---|---|---|
+| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
+| `coca_best.pt` | CoCa | 768 |
+| `florence2_best.pt` | Florence-2 | 1024 |
+| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
+| `siglip_best.pt` | SigLIP | 1152 |
+
+## Loading
+
+```python
+import torch
+from lapvqa.diffvqa.model import DiffVQAHead
+
+ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
+head = DiffVQAHead(vis_dim=1024)
+head.load_state_dict(ckpt)
+head.eval()
+```
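
The table added in this diff pairs each checkpoint file with the `vis_dim` its head expects, so the constructor argument could be looked up from the filename instead of being hard-coded. A minimal sketch, assuming the table is complete; `VIS_DIM_BY_CHECKPOINT` and `vis_dim_for` are hypothetical names used for illustration and are not part of `lapvqa`:

```python
from pathlib import Path

# Checkpoint filename -> vis_dim, copied from the README table.
# Hypothetical helper for illustration; not part of the lapvqa package.
VIS_DIM_BY_CHECKPOINT = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path: str) -> int:
    """Return the vis_dim listed for a checkpoint, keyed by its filename."""
    name = Path(checkpoint_path).name  # ignore any leading directories
    try:
        return VIS_DIM_BY_CHECKPOINT[name]
    except KeyError:
        # Fail early rather than build a head with the wrong input width.
        raise ValueError(f"unknown checkpoint file: {name}") from None


print(vis_dim_for("checkpoints/siglip_best.pt"))  # prints 1152
```

The returned value would then be passed as `DiffVQAHead(vis_dim=vis_dim_for(path))` before calling `load_state_dict`, following the README's loading snippet.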