Commit dd33ae2 by datasysdev · verified · 1 Parent(s): 0989d52

Document dynamic index proxy

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -268,6 +268,34 @@ about 3x in favor of learned projections. This supports the theoretical
  scaling claim only; production speed claims still require GPU-resident
  retrieval and KV-cache/decode integration.
 
+ ### Dynamic-index proxy
+
+ The current ANN wrapper is prefill-only (`use_cache=False`), so a true
+ generation-time dynamic-index benchmark still requires cache integration. As a
+ first capability proxy, `dynamic_index_proxy.py` splits clean block-causal eval
+ sequences into a prefill prefix and a decode-like suffix, then compares learned
+ retrieval mass under two masks:
+
+ - dynamic index: suffix queries can retrieve from all same-segment prior keys;
+ - static index: suffix queries can retrieve from prefill keys plus a 256-token
+   recent local suffix window, but not older suffix keys.
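The two masks described above can be sketched in plain Python. This is an illustration only, not the code in `dynamic_index_proxy.py`: the function name `build_proxy_masks` is hypothetical, the whole sequence is treated as a single segment (no block-causal segment boundaries), and a query is assumed to be allowed to see its own position.

```python
def build_proxy_masks(seq_len, prefill_len, local_window):
    """Return (dynamic, static) boolean masks as seq_len x seq_len nested lists.

    mask[q][k] is True when query position q may retrieve key position k.
    Only suffix queries (q >= prefill_len) are populated; prefill rows stay
    all-False because the proxy compares suffix queries only.
    Assumption: one segment, and causal means k <= q (self included).
    """
    dynamic = [[False] * seq_len for _ in range(seq_len)]
    static = [[False] * seq_len for _ in range(seq_len)]
    for q in range(prefill_len, seq_len):
        for k in range(q + 1):
            # Dynamic index: every prior key is retrievable.
            dynamic[q][k] = True
            # Static index: prefill keys, plus the most recent
            # `local_window` positions ending at the query.
            in_prefill = k < prefill_len
            in_local_window = (q - k) < local_window
            static[q][k] = in_prefill or in_local_window
    return dynamic, static


# Toy sizes instead of the README's prefill length 1024 and window 256.
dynamic, static = build_proxy_masks(seq_len=12, prefill_len=4, local_window=3)
hidden_from_static = [k for k in range(12) if dynamic[10][k] and not static[10][k]]
```

For query position 10 in this toy setup, `hidden_from_static` holds the older suffix keys that only the dynamic index can reach, which is the gap the proxy is meant to measure.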
+
+ On the clean d128 block-causal checkpoint, using K=128, prefill length 1024,
+ local window 256, and 8 eval batches:
+
+ | Setting | Teacher mass captured |
+ |---|---:|
+ | Dynamic proxy | 0.972 |
+ | Static proxy | 0.928 |
+ | Static teacher mass available | 0.954 |
+ | Dynamic - static | +0.044 |
+
+ Per-layer dynamic-minus-static mass ranges from +0.022 (L04) to +0.058 (L08).
+ This does not establish task accuracy or decode latency, but it shows that a
+ frozen prefill-plus-local index loses measurable teacher-attention mass on
+ decode-like suffix queries. The raw result is in
+ `artifacts/dynamic_proxy_8b.json`.
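The README excerpt does not define "teacher mass captured" or "teacher mass available". One plausible reading, assumed here, is that captured mass is the teacher attention probability landing on the top-K keys the learned retriever picks among mask-allowed keys, and available mass is the teacher probability on all mask-allowed keys. The helper names and the toy numbers below are hypothetical and are not taken from `artifacts/dynamic_proxy_8b.json`.

```python
def top_k_allowed(scores, allowed, k):
    """Indices of the k highest-scoring keys among those the mask allows."""
    candidates = [i for i, ok in enumerate(allowed) if ok]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


def mass_captured(teacher_probs, retrieved):
    """Teacher attention probability that falls on the retrieved keys."""
    return sum(teacher_probs[i] for i in retrieved)


def mass_available(teacher_probs, allowed):
    """Teacher attention probability on every key the mask permits."""
    return sum(p for p, ok in zip(teacher_probs, allowed) if ok)


# One suffix query over six keys (toy values, K = 2).
teacher_probs = [0.05, 0.10, 0.40, 0.05, 0.30, 0.10]  # teacher attention row
learned_scores = [0.1, 0.3, 0.9, 0.0, 0.8, 0.2]       # learned retrieval scores
dynamic_allowed = [True] * 6
static_allowed = [True, True, False, False, True, True]  # keys 2 and 3 hidden

dyn = mass_captured(teacher_probs, top_k_allowed(learned_scores, dynamic_allowed, 2))
sta = mass_captured(teacher_probs, top_k_allowed(learned_scores, static_allowed, 2))
avail = mass_available(teacher_probs, static_allowed)
gap = dyn - sta  # analogous to the "Dynamic - static" row
```

Under this reading, the table's last row is the difference of the first two (0.972 - 0.928 = 0.044), and the static proxy can never exceed the static available mass (0.928 against 0.954).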
+
  ### Compute / quality knobs (FLOP-counted)
 
  `L = 4096`. Compute reduction is the attention scoring step, `≈ L / K`.