k-l-lambda/kimi-k2.7-code-eagle3-mla

speculative-decoding

k-l-lambda committed 3 days ago

Commit 1458b6c · verified · 1 parent: e3638a0

Updated README

Files changed (1)

README.md +0 -11

@@ -21,17 +21,11 @@ K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative de
 - **Algorithm:** EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
 - **Verifier:** `Kimi-K2.7-Code` (DeepSeek-V3-class architecture; arch is identical across
   K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
-- **Init:** lightseek K2.6 Eagle3-MLA export, then fine-tuned on K2.7-native data.
 - **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
   mixed with kimi-mtp prompts re-answered by K2.7-Code.
 - **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
   cosine LR 2e-5, seq_length 8192, max_steps 120000.
-## Why K2.7-native
-A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
-K2.7-Code traffic. Training on K2.7-native data reverses that.
 ## Evaluation
 Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
@@ -40,11 +34,6 @@ Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
 | Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
 |---|---|---|
 | **This model (final)** | **2.345** | 2.246 |
-| lightseek K2.6 init | 2.332 | 2.297 |
-On **real K2.7-Code traffic** this draft beats the lightseek init (2.345 vs 2.332, ~1.36x
-end-to-end speedup over no-spec). On the K2.6 distribution the lightseek init still leads,
-as expected — this draft is tuned for K2.7.
 ## Usage (vLLM)
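
The README's own usage snippet falls outside this diff, so the following is only a minimal sketch of how a draft like this is typically wired into vLLM. It assumes a recent vLLM release whose `--speculative-config` option accepts a JSON object with `method`, `model`, and `num_speculative_tokens` keys. The verifier path is a placeholder, the value `num_speculative_tokens=3` is an illustrative guess rather than a setting taken from the README, and `build_speculative_config` is a hypothetical helper introduced here.

```python
import json


def build_speculative_config(draft_model: str, num_speculative_tokens: int = 3) -> dict:
    """Assemble the dict handed to vLLM as its speculative-decoding config.

    Hypothetical helper: the key names follow vLLM's `speculative_config`
    convention; the default of 3 draft tokens is an assumption, not a
    value stated in the README.
    """
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")
    return {
        "method": "eagle3",  # EAGLE-3 draft, as named in the model card
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
    }


# Draft repo name as shown on this page.
DRAFT = "k-l-lambda/kimi-k2.7-code-eagle3-mla"
# Placeholder: substitute the actual location of the Kimi-K2.7-Code verifier.
VERIFIER = "path/to/Kimi-K2.7-Code"

config = build_speculative_config(DRAFT)

# Print the serve command instead of launching it, so the sketch runs
# without vLLM or GPUs present.
print(f"vllm serve {VERIFIER} --speculative-config '{json.dumps(config)}'")
```

The same dict can be passed as `speculative_config=` to `vllm.LLM(...)` for offline inference, under the same version assumption.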
