datasysdev committed on
Commit 9040a9d · verified · 1 parent: c313685

Add related-work positioning table

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -131,6 +131,27 @@ Per-layer step-500 mass@K at K=128:
 
  The next run reserves `[0, 1, 2, 35]` and trains layers `3..34`.
 
+ ## Positioning against related methods
+
+ The paper frames this method as closest to Reformer in asymptotic complexity
+ and closest to Quest in practical baseline behavior.
+
+ | Method | Selection mechanism | Query-aware | Trained selector | Asymptotic cost (sequence length N) | Exact softmax |
+ |---|---|---|---|---|---|
+ | Full attention | all keys | n/a | n/a | O(N²) | yes |
+ | Reformer | LSH bucketing | yes | no | O(N log N) | over bucket |
+ | Performer | random features | n/a | no | O(N) | no |
+ | BigBird | window + random + global | mostly no | no | O(N) | over pattern |
+ | Longformer | sliding window + global | mostly no | no | O(N) | over pattern |
+ | NSA-style methods | block compression/selection | partial | partial | O(N²) proxy | yes |
+ | Quest | min/max page heuristic | yes | no | O(N) | over pages |
+ | This work | trained low-dim retrieval | yes | yes | O(N log N) | over retrieved set |
+
+ This is a design-positioning table, not a claim that the method is already
+ superior in production. The clean result validates the approach on the
+ six-layer pilot; the active all32 reserved-layer run tests whether
+ substituting it across nearly the whole model preserves that quality.
+
  ## Checkpoints

  Important checkpoint paths in this HF repo:
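The "Selection mechanism" and "Exact softmax" columns of the positioning table can be made concrete with a small NumPy sketch. It contrasts a Quest-style page selector (per-page, channel-wise min/max keys give an upper bound on q·k, and the top-scoring pages are kept) with a low-dimensional retrieval selector, and in both cases runs exact softmax only over the selected keys. This is a sketch under stated assumptions, not the method's implementation: the names `quest_select`, `lowdim_select`, `Wq`, `Wk`, and all sizes are hypothetical, the random projections stand in for trained ones, and the retrieval step scores every key linearly, so it does not reproduce the O(N log N) cost, which would need an index structure not shown here. Reformer's LSH bucketing is not sketched.

```python
import numpy as np


def full_attention(q, K, V):
    """Exact softmax attention of one query q (d,) over all keys K (N, d) and values V (N, dv)."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ V


def attend_over(q, K, V, idx):
    """Exact softmax, normalized only over the selected key indices."""
    return full_attention(q, K[idx], V[idx])


def quest_select(q, K, page_size, top_pages):
    """Quest-style selection: rank pages by an upper bound on q.k and keep the top pages.

    Assumes N is a multiple of page_size.
    """
    n_pages = K.shape[0] // page_size
    pages = K.reshape(n_pages, page_size, -1)
    kmin, kmax = pages.min(axis=1), pages.max(axis=1)
    # Per channel, q_i * k_i is largest at either the page's min or max value of k_i.
    bound = np.maximum(q * kmin, q * kmax).sum(axis=1)
    chosen = np.argsort(bound)[-top_pages:]
    return np.concatenate([np.arange(p * page_size, (p + 1) * page_size) for p in chosen])


def lowdim_select(q, K, Wq, Wk, k):
    """Hypothetical low-dim retrieval: score keys in a projected r-dim space, keep the top k.

    Brute force over all N keys; an O(N log N) design would need an index instead.
    """
    scores = (K @ Wk) @ (q @ Wq)
    return np.argsort(scores)[-k:]


rng = np.random.default_rng(0)
N, d, dv, r = 64, 16, 8, 4
q = rng.standard_normal(d)
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, dv))
Wq = rng.standard_normal((d, r))  # hypothetical: random stand-ins for trained projections
Wk = rng.standard_normal((d, r))

exact = full_attention(q, K, V)
quest_out = attend_over(q, K, V, quest_select(q, K, page_size=8, top_pages=2))
retr_out = attend_over(q, K, V, lowdim_select(q, K, Wq, Wk, k=16))
print("Quest-style max abs error:", np.abs(quest_out - exact).max())
print("low-dim retrieval max abs error:", np.abs(retr_out - exact).max())
```

With `top_pages` equal to the number of pages, or `k` equal to N, both selectors cover every key and the output matches full attention; shrinking them trades accuracy for fewer attended keys. The difference the table points at is that Quest's page scores come from a fixed min/max heuristic, while the retrieval scores come from projections that are learned.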
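The README's plan of reserving `[0, 1, 2, 35]` and training layers `3..34` reduces to simple index arithmetic. The sketch below assumes the model has 36 layers (indices 0..35), inferred only from the highest reserved index, and that reserved layers are left unmodified while the rest are trained; `NUM_LAYERS` and `RESERVED` are illustrative names. Under those assumptions the run trains 32 layers, which would be consistent with the `all32` run name, though that reading is an inference.

```python
NUM_LAYERS = 36  # assumption: inferred from the highest reserved index, 35
RESERVED = {0, 1, 2, 35}  # from the README: layers reserved in the next run

# Every non-reserved layer is trained.
trained = [layer for layer in range(NUM_LAYERS) if layer not in RESERVED]

print(f"trained layers {trained[0]}..{trained[-1]}: {len(trained)} of {NUM_LAYERS}")
# prints "trained layers 3..34: 32 of 36"
```

Under the 36-layer assumption, 32 of 36 layers is roughly 89% of the model, which is the sense in which the substitution is near-whole-model.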