Commit 217640a (verified), committed by dmusingu
1 Parent(s): c533253

Update README with model loading code

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -14,10 +14,8 @@ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapv
  
  ## Description
  
- DiffVQA models trained **end-to-end** (encoder + task head jointly fine-tuned), providing
- a strong upper bound compared to the frozen-encoder variant in
- [`lapvqa-diffvqa`](https://huggingface.co/dmusingu/lapvqa-diffvqa).
- MAE-ViT-L/16 is the primary encoder studied in this native setting.
+ DiffVQA models trained **end-to-end** (encoder + head jointly). Each `.pt` file
+ is a plain state dict of `DiffVQAHead`. MAE-ViT-L/16 is the primary encoder studied.
  
  ## Results (test set, MAE-ViT-L/16)
  
@@ -25,12 +23,22 @@ MAE-ViT-L/16 is the primary encoder studied in this native setting.
  |---|---|---|---|
  | 0.472 | 0.573 | 0.288 | 0.938 |
  
- ## Files
-
- | File | Encoder backbone |
- |---|---|
- | `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) |
- | `coca_best.pt` | CoCa (fine-tuned) |
- | `florence2_best.pt` | Florence-2 (fine-tuned) |
- | `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) |
- | `siglip_best.pt` | SigLIP (fine-tuned) |
+ | File | Encoder | vis_dim |
+ |---|---|---|
+ | `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
+ | `coca_best.pt` | CoCa | 768 |
+ | `florence2_best.pt` | Florence-2 | 1024 |
+ | `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
+ | `siglip_best.pt` | SigLIP | 1152 |
+
+ ## Loading
+
+ ```python
+ import torch
+ from lapvqa.diffvqa.model import DiffVQAHead
+
+ ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
+ head = DiffVQAHead(vis_dim=1024)
+ head.load_state_dict(ckpt)
+ head.eval()
+ ```
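
The loading snippet added in this commit hard-codes `vis_dim=1024`, which matches the MAE checkpoint but not, for example, `coca_best.pt` (768) or `siglip_best.pt` (1152). One possible way to avoid a mismatch is to look the value up from the file name. The sketch below is not part of the commit: `VIS_DIM_BY_CHECKPOINT` and `vis_dim_for` are hypothetical names, and the values are copied from the table in the new README.

```python
from pathlib import Path

# vis_dim per checkpoint file, copied from the table in the updated README.
# The dictionary name is hypothetical; it is not defined in the lapvqa package.
VIS_DIM_BY_CHECKPOINT = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path: str) -> int:
    """Return the vis_dim listed for a checkpoint (hypothetical helper)."""
    # Only the file name matters, so a full local path also works.
    name = Path(checkpoint_path).name
    if name not in VIS_DIM_BY_CHECKPOINT:
        known = ", ".join(sorted(VIS_DIM_BY_CHECKPOINT))
        raise ValueError(f"unknown checkpoint {name!r}; expected one of: {known}")
    return VIS_DIM_BY_CHECKPOINT[name]
```

Assuming `DiffVQAHead` takes `vis_dim` as shown in the README snippet, the head for another checkpoint could then be built with `DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt"))` before calling `load_state_dict`.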