datasysdev committed on
Commit 0585058 · verified · 1 Parent(s): 1f13233

Correct asymptotic scoring analysis

artifacts/scaling_analysis.md ADDED
@@ -0,0 +1,47 @@
+ # Candidate-Scoring Operation Count
+
+ This is an analytic operation-count proxy, not a wall-clock benchmark.
+ It counts the per-query scalar operations needed to identify candidate keys, before
+ the sparse attention softmax and value multiply run over the selected keys.
+
+ ## Assumptions
+
+ - Native head dimension: `d_head = 128`.
+ - Learned search dimension: `d_search = 128`.
+ - Quest page size: `page_size = 16`.
+ - HNSW parameters: `M = 32`, `ef_search = 64`.
+
+ Per-query scoring formulas (scalar ops, with `N` the context length in tokens):
+
+ - Full attention: `N * d_head = N * 128`.
+ - Quest: `(N / page_size) * 2 * d_head = N * 16`.
+ - Learned HNSW: `M * ef_search * log2(N) * d_search = 262,144 * log2(N)`.
+
+ Under these constants, the Quest/HNSW operation-count crossover, where `N * 16 = 262,144 * log2(N)`, is approximately `297,937` tokens.
+ Smaller `M` or `ef_search` settings move the crossover to shorter contexts; higher-recall settings move it to longer ones.
+
+ ## Table
+
+ | Context (decimal tokens, 4K = 4,000) | Full ops/query | Quest ops/query | Learned HNSW ops/query | Quest / learned HNSW |
+ |---:|---:|---:|---:|---:|
+ | 4K | 512,000 | 64,000 | 3,136,759 | 0.02x |
+ | 8K | 1,024,000 | 128,000 | 3,398,903 | 0.04x |
+ | 16K | 2,048,000 | 256,000 | 3,661,047 | 0.07x |
+ | 32K | 4,096,000 | 512,000 | 3,923,191 | 0.13x |
+ | 64K | 8,192,000 | 1,024,000 | 4,185,335 | 0.24x |
+ | 128K | 16,384,000 | 2,048,000 | 4,447,479 | 0.46x |
+ | 256K | 32,768,000 | 4,096,000 | 4,709,623 | 0.87x |
+ | 512K | 65,536,000 | 8,192,000 | 4,971,767 | 1.65x |
+ | 1M | 128,000,000 | 16,000,000 | 5,224,942 | 3.06x |
+ | 2M | 256,000,000 | 32,000,000 | 5,487,086 | 5.83x |
+ | 4M | 512,000,000 | 64,000,000 | 5,749,230 | 11.13x |
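The table rows follow directly from the three formulas, so they can be regenerated for other context lengths. The sketch below is an illustration under two inferences from the published numbers rather than stated facts: context labels are decimal (4K = 4,000 tokens), and the HNSW column is rounded up to the next integer. The `table_row` helper and `CONTEXTS` mapping are hypothetical names.

```python
import math

# Constants copied from the Assumptions section of the analysis.
D_HEAD, D_SEARCH, PAGE_SIZE, M, EF_SEARCH = 128, 128, 16, 32, 64

# Assumption: labels are decimal multiples, as the Full ops column implies.
CONTEXTS = {
    "4K": 4_000, "8K": 8_000, "16K": 16_000, "32K": 32_000,
    "64K": 64_000, "128K": 128_000, "256K": 256_000, "512K": 512_000,
    "1M": 1_000_000, "2M": 2_000_000, "4M": 4_000_000,
}


def table_row(label, n):
    """Format one markdown row of per-query operation counts."""
    full = n * D_HEAD
    quest = (n // PAGE_SIZE) * 2 * D_HEAD
    # Assumption: the published HNSW integers are consistent with rounding up.
    hnsw = math.ceil(M * EF_SEARCH * math.log2(n) * D_SEARCH)
    ratio = quest / hnsw
    return f"| {label} | {full:,} | {quest:,} | {hnsw:,} | {ratio:.2f}x |"


for label, n in CONTEXTS.items():
    print(table_row(label, n))
```

A ratio below 1x means Quest needs fewer scalar operations than the learned HNSW proxy at that context length; above 1x, the HNSW proxy is cheaper.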
+
+ ## Interpretation
+
+ Quest is cheaper than this high-recall HNSW proxy below the crossover at roughly 300K tokens.
+ At 1M context, Quest costs about 16M scalar ops/query while learned HNSW costs about 5.2M,
+ a roughly 3x operation-count advantage for the learned HNSW search.
+
+ This does not establish a production wall-clock speedup. That still requires GPU-resident ANN
+ retrieval and decode/KV-cache integration. Memory bandwidth may further favor learned ANN at
+ very long context, but that is not included in this operation-count proxy.