Document dynamic index proxy
README.md
@@ -268,6 +268,34 @@ about 3x in favor of learned projections. This supports the theoretical
scaling claim only; production speed claims still require GPU-resident
retrieval and KV-cache/decode integration.

### Dynamic-index proxy

The current ANN wrapper is prefill-only (`use_cache=False`), so a true
generation-time dynamic-index benchmark still requires cache integration. As a
first capability proxy, `dynamic_index_proxy.py` splits clean block-causal eval
sequences into a prefill prefix and a decode-like suffix. It then compares how
much teacher-attention mass learned retrieval captures under two masks:

- dynamic index: suffix queries can retrieve from all same-segment prior keys;
- static index: suffix queries can retrieve from the prefill keys plus a
  256-token local window of recent suffix keys, but not from older suffix keys.
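
The two masks can be sketched as boolean `[query, key]` arrays, where `True` means a key is retrievable. This is a minimal illustration under several assumptions:

- Each token carries a segment id.
- The prefill length is `prefill_len` and the local window is `window`.
- "Prior" keys include the query's own position, as in standard causal attention.

The helper `proxy_masks` is hypothetical and is not the interface of `dynamic_index_proxy.py`.

```python
import numpy as np


def proxy_masks(segment_ids: np.ndarray, prefill_len: int, window: int):
    """Return (dynamic, static) boolean [L, L] masks; True = key retrievable.

    Hypothetical helper illustrating the two proxy masks; not the actual
    code in dynamic_index_proxy.py.
    """
    L = segment_ids.shape[0]
    q = np.arange(L)[:, None]  # query positions as a column
    k = np.arange(L)[None, :]  # key positions as a row
    same_segment = segment_ids[:, None] == segment_ids[None, :]
    # Block-causal base mask (assumed to include the query's own position).
    causal = same_segment & (k <= q)

    # Dynamic index: every same-segment prior key stays retrievable.
    dynamic = causal

    # Static index: suffix queries see prefill keys plus the most recent
    # `window` suffix keys; older suffix keys were never added to the index.
    # Prefill queries keep the plain block-causal mask.
    is_suffix_query = q >= prefill_len
    in_prefill = k < prefill_len
    recent_suffix = (k >= prefill_len) & (q - k < window)
    static = causal & (~is_suffix_query | in_prefill | recent_suffix)
    return dynamic, static


if __name__ == "__main__":
    seg = np.zeros(8, dtype=int)  # one segment of 8 tokens
    dyn, sta = proxy_masks(seg, prefill_len=4, window=2)
    print(sta[7].astype(int))  # [1 1 1 1 0 0 1 1]: suffix keys 4-5 dropped
```

Only the suffix rows differ between the two masks, so a comparison like the one described here would score suffix queries only.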

On the clean d128 block-causal checkpoint, using `K = 128`, prefill length 1024,
local window 256, and 8 eval batches:

| Setting | Teacher mass captured |
|---|---:|
| Dynamic proxy | 0.972 |
| Static proxy | 0.928 |
| Static teacher mass available | 0.954 |
| Dynamic - static | +0.044 |
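
One plausible definition of these metrics is sketched below. It assumes the teacher is the full-attention softmax distribution for each query, and that the learned retriever keeps its top-`K` keys among those the mask allows. The helper `captured_mass` is hypothetical; the exact definition in `dynamic_index_proxy.py` may differ. The sketch also returns the available mass, meaning the teacher mass inside the mask, which bounds what any retriever restricted to that mask can capture.

```python
import numpy as np


def captured_mass(teacher_probs, retrieval_scores, mask, k):
    """Mean teacher-attention mass on the top-k retrieved keys per query.

    Hypothetical metric sketch.
    teacher_probs:    [Q, L] full-attention weights (rows sum to 1).
    retrieval_scores: [Q, L] learned retrieval scores (higher = better).
    mask:             [Q, L] bool, True where a key may be retrieved.
    k:                number of keys retrieved per query (1 <= k <= L).
    Returns (captured, available), both averaged over queries.
    """
    # Disallowed keys get -inf so they rank last.
    scores = np.where(mask, retrieval_scores, -np.inf)
    # Indices of the k highest-scoring keys per query (order inside top-k is irrelevant).
    top = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    picked = np.take_along_axis(teacher_probs, top, axis=1)
    # If fewer than k keys are allowed, masked keys can land in `top`; zero them out.
    picked_ok = np.take_along_axis(mask, top, axis=1)
    captured = (picked * picked_ok).sum(axis=1)
    available = (teacher_probs * mask).sum(axis=1)
    return captured.mean(), available.mean()
```

Assume "Static teacher mass available" means the mean teacher mass inside the static mask, as this sketch does. Then no retriever limited to the static mask could exceed 0.954 on this eval, which is below the dynamic proxy's 0.972.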

The per-layer dynamic-minus-static gap ranges from +0.022 (L04) to +0.058 (L08).
This does not establish task accuracy or decode latency. It does show that a
frozen prefill-plus-local index loses measurable teacher-attention mass on
decode-like suffix queries. The raw result is in
`artifacts/dynamic_proxy_8b.json`.

### Compute / quality knobs (FLOP-counted)

`L = 4096`. Compute reduction is the attention scoring step, `≈ L / K`.