Add related-work positioning table
README.md
CHANGED
@@ -131,6 +131,27 @@ Per-layer step-500 mass@K at K=128:
The next run reserves `[0, 1, 2, 35]` and trains layers `3..34`.
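The layer split in that sentence can be sketched as follows. This is an illustrative snippet, not code from this repository: `NUM_LAYERS = 36` is an assumption inferred from the highest reserved index (35) and the 32 trained layers, and `trainable_layers` is a hypothetical helper name.

```python
# Assumption: the model has 36 layers indexed 0..35 (inferred, not stated).
NUM_LAYERS = 36
RESERVED_LAYERS = [0, 1, 2, 35]


def trainable_layers(num_layers, reserved):
    """Return the layer indices that are not reserved, in ascending order."""
    reserved_set = set(reserved)
    return [i for i in range(num_layers) if i not in reserved_set]


trained = trainable_layers(NUM_LAYERS, RESERVED_LAYERS)
print(trained[0], trained[-1], len(trained))  # 3 34 32
```

Under that assumption the reserved list leaves exactly 32 trained layers, `3..34`.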

## Positioning against related methods

The paper frames this method as closest in asymptotic shape to Reformer and
closest in practical baseline behavior to Quest.

| Method | Selection mechanism | Query-aware | Trained | Asymptotic | Exact softmax |
|---|---|---|---|---|---|
| Full attention | all keys | n/a | n/a | O(N²) | yes |
| Reformer | LSH hashing | yes | no | O(N log N) | over bucket |
| Performer | random features | n/a | no | O(N) | no |
| BigBird | window + random + global | mostly no | no | O(N) | over pattern |
| Longformer | sliding window + global | mostly no | no | O(N) | over pattern |
| NSA-style methods | block compression/selection | partial | partial | O(N²) proxy | yes |
| Quest | min/max page heuristic | yes | no | O(N) | over pages |
| This work | trained low-dim retrieval | yes | yes | O(N log N) | over retrieved set |

This is a design-positioning table, not a claim of completed production
superiority. The clean result proves the approach for the six-layer pilot; the
active all32 reserved-layer run tests whether broad near-whole-model
substitution can preserve that quality.
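
The "Exact softmax" column separates methods that approximate the softmax from methods that compute it exactly over a selected subset of keys. The snippet below is a minimal sketch of the second pattern, not this repository's implementation. All names (`subset_attention`, `project`, `P`) are hypothetical, the projection `P` is random and only stands in for a trained one, and selection is done by brute-force scoring, so the sketch does not reproduce the asymptotic cost listed in the table.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def project(v, P):
    """Project v into len(P) dimensions; P is a list of r vectors of length d."""
    return [dot(v, p) for p in P]


def subset_attention(q, keys, values, P, k):
    """Select k keys by low-dimensional score, then apply exact softmax to them."""
    q_low = project(q, P)
    low_scores = [dot(project(key, P), q_low) for key in keys]
    # Indices of the k keys with the highest low-dimensional scores.
    idx = sorted(range(len(keys)), key=lambda i: low_scores[i], reverse=True)[:k]
    # Exact scaled dot-product softmax, restricted to the retrieved keys.
    scale = math.sqrt(len(q))
    weights = softmax([dot(keys[i], q) / scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][j] for w, i in zip(weights, idx)) for j in range(dim)]
    return out, idx


rng = random.Random(0)
N, d, r, k = 256, 16, 4, 32  # illustrative sizes only


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


q = rand_vec(d)
keys = [rand_vec(d) for _ in range(N)]
values = [rand_vec(d) for _ in range(N)]
P = [rand_vec(d) for _ in range(r)]  # random stand-in for a trained projection

out, idx = subset_attention(q, keys, values, P, k)
```

When `k` equals the number of keys, the retrieved set is the full key set and the output matches full attention, which is a quick sanity check on the subset path.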

## Checkpoints

Important checkpoint paths in this HF repo: