datasysdev committed on
Commit 66efa56 · verified · 1 Parent(s): 720eddc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -37,6 +37,30 @@ the gather step uses dense-style tensor expansion. Compute-reduction
37
  numbers below are *algorithmic scoring reductions, not measured wall-clock
38
  speedups.*
39
 
40
+ ## Relation to RetrievalAttention
41
+
42
+ RetrievalAttention (Liu et al., 2024) shows that **vanilla ANN over the
43
+ model's native Q, K vectors fails** because Q and K live in mismatched
44
+ distributions — they were never trained to be each other's nearest
45
+ neighbors, only to score via dot product. Their fix is at *index time*:
46
+ an attention-aware graph construction (RoarGraph-style).
47
+
48
+ This work attacks the same problem from the opposite direction. We
49
+ **train a pair of tiny projections** (`W_Qs, W_Ks → R^64`) so that
50
+ `q_search` and `k_search` share one search space and one distribution.
51
+ Off-the-shelf FAISS HNSW with default parameters should then suffice.
52
+
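As a rough illustration of the data flow described above (not the repository's actual code), the sketch below projects toy input vectors through two small matrices and ranks keys by dot product in the projected space. The helper names `project` and `top_k_by_dot`, the toy dimensions, the random untrained weights, and the choice of dot-product scoring are all assumptions made for illustration; an exhaustive scan stands in for the HNSW index.

```python
import random


def project(vectors, weight):
    """Compute v @ W for each vector; W has shape [d_in, d_search]."""
    d_search = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(weight))) for j in range(d_search)]
        for v in vectors
    ]


def top_k_by_dot(query, keys, k):
    """Return indices of the k keys with the largest dot product against query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


# Toy sizes (assumed); the README's projections have shape [2560, 64].
D_IN, D_SEARCH, N_KEYS = 16, 4, 32
rng = random.Random(0)


def random_matrix(rows, cols):
    """Gaussian matrix used as an untrained stand-in for real weights or inputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


W_Qs = random_matrix(D_IN, D_SEARCH)  # hypothetical, untrained
W_Ks = random_matrix(D_IN, D_SEARCH)  # hypothetical, untrained
key_inputs = random_matrix(N_KEYS, D_IN)
query_input = random_matrix(1, D_IN)

k_search = project(key_inputs, W_Ks)       # what would be indexed
q_search = project(query_input, W_Qs)[0]   # what would be used to query
candidates = top_k_by_dot(q_search, k_search, k=8)
```

With trained weights, `k_search` would be added to an ANN index (for example FAISS `IndexHNSWFlat`) instead of being scanned exhaustively, and `candidates` would be the key positions handed to the attention computation.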
53
+ | Approach | Search space | Index | Trained parameters |
54
+ |---|---|---|---|
55
+ | Raw Q/K + vanilla ANN | original Q/K | off-the-shelf | none; retrieval fails (Q/K out-of-distribution) |
56
+ | RetrievalAttention | original Q/K | attention-aware graph | none |
57
+ | **This work** | **learned Q\_s / K\_s** | **off-the-shelf** | **~2-11M** |
58
+
59
+ Contribution: *eliminate Q/K mismatch before the index is built, via
60
+ distillation, instead of patching it at index-construction time.* The clean
61
+ validating experiment (vanilla FAISS over raw Q/K vs. learned Q\_s/K\_s vs.
62
+ exact teacher top-K) is the next planned run.
63
+
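One way the planned comparison could be scored is recall of each arm's retrieved keys against the exact teacher top-K. The sketch below is a hypothetical scoring helper, not the repository's evaluation code: `recall_at_k`, `compare_arms`, the arm names, and the toy index lists are illustrative assumptions and carry no experimental results.

```python
def recall_at_k(candidate_ids, teacher_ids):
    """Fraction of the exact teacher top-K that the candidate set recovers."""
    teacher = set(teacher_ids)
    if not teacher:
        raise ValueError("teacher_ids must be non-empty")
    return len(teacher & set(candidate_ids)) / len(teacher)


def compare_arms(arms, teacher_ids):
    """Score every arm (name -> retrieved key indices) against the teacher."""
    return {name: recall_at_k(ids, teacher_ids) for name, ids in arms.items()}


# Made-up indices that only demonstrate the call pattern (not measurements).
teacher_top_k = [3, 7, 11, 19]
arms = {
    "arm_a": [3, 5, 8, 19],
    "arm_b": [3, 7, 11, 20],
}
scores = compare_arms(arms, teacher_top_k)
```

In the experiment described above, the arms would be the ANN results over raw Q/K and over learned Q\_s/K\_s, each averaged over many queries and layers.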
64
  ## What's in this repo
65
 
66
  Per-layer linear search projections `(W_Qs, W_Ks)` of shape `[2560, 64]`,