---
license: mit
base_model: moonshotai/Kimi-K2.7-Code
tags:
- speculative-decoding
- eagle3
- eagle3-mla
- draft-model
- vllm
language:
- en
---

# Kimi-K2.7-Code Eagle3-MLA Draft

Eagle3-MLA speculative-decoding draft model for **Kimi-K2.7-Code**, trained natively on
K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative decoding.

## What this is

- **Algorithm:** EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
- **Verifier:** `Kimi-K2.7-Code` (DeepSeek-V3-class architecture; the architecture is identical
  across K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
- **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
  mixed with kimi-mtp prompts re-answered by K2.7-Code.
- **Recipe:** `ttt_steps=4`, `ttt_step_loss_decay=1.0`, off-policy tokens, `l2sp_lambda=1e-4`,
  cosine LR 2e-5, `seq_length` 8192, `max_steps` 120000.

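One way to read the scalar settings in the recipe is sketched below. This is an illustration under stated assumptions, not the training code: the function names are hypothetical, the cosine schedule is assumed to have no warmup and to decay to zero, the per-step loss weights are assumed to be geometric in `ttt_step_loss_decay`, and `l2sp_lambda` is assumed to scale an L2-SP style penalty toward some reference weights. None of these details are stated in this card.

```python
import math

# Values taken from the recipe above.
PEAK_LR = 2e-5
MAX_STEPS = 120_000
TTT_STEPS = 4
TTT_STEP_LOSS_DECAY = 1.0
L2SP_LAMBDA = 1e-4


def cosine_lr(step, peak=PEAK_LR, max_steps=MAX_STEPS, floor=0.0):
    """Cosine decay from `peak` to `floor` (assumed: no warmup, floor of 0)."""
    progress = min(max(step / max_steps, 0.0), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


def ttt_step_weights(steps=TTT_STEPS, decay=TTT_STEP_LOSS_DECAY):
    """Assumed geometric weighting: draft step i is weighted by decay**i."""
    return [decay ** i for i in range(steps)]


def l2sp_penalty(weights, reference, lam=L2SP_LAMBDA):
    """Assumed L2-SP form: lam * sum of squared distances to reference weights."""
    return lam * sum((w - r) ** 2 for w, r in zip(weights, reference))
```

Under these assumptions, a decay of 1.0 weights all four draft steps equally, so later draft positions count as much as the first one in the training loss.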
## Evaluation

Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
(vLLM 0.20.0, TP=8, `num_speculative_tokens=3`, c=4, greedy). Mean accepted-token length:

| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
|---|---|---|
| **This model (final)** | **2.345** | 2.246 |

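To make the metric concrete, here is a minimal sketch of how a mean accepted-token length can be derived from raw counters. It assumes the common convention in which each verification step yields one token from the verifier plus however many draft tokens were accepted; the card does not state which convention was used, and the function names and counter values are hypothetical.

```python
def mean_acceptance_length(num_drafts, num_accepted_tokens):
    """Tokens emitted per verification step: one verifier token plus accepted drafts."""
    return 1.0 + num_accepted_tokens / num_drafts


def accepted_draft_fraction(mean_length, num_speculative_tokens):
    """Share of proposed draft tokens that were accepted, averaged over steps."""
    return (mean_length - 1.0) / num_speculative_tokens


# Hypothetical counters, chosen only so that the result lands near 2.345.
example_length = mean_acceptance_length(num_drafts=1000, num_accepted_tokens=1345)
example_fraction = accepted_draft_fraction(example_length, num_speculative_tokens=3)
```

Under that assumption, a value of 2.345 with `num_speculative_tokens=3` would correspond to roughly 1.3 accepted draft tokens per step, out of a possible 3.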
## Usage (vLLM)

```bash
vllm serve /path/to/Kimi-K2.7-Code \
  --tensor-parallel-size 8 \
  --speculative-config '{"model": "k-l-lambda/kimi-k2.7-code-eagle3-mla", "num_speculative_tokens": 3, "method": "eagle3"}'
```

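If the server is launched from a script, building the `--speculative-config` JSON programmatically avoids shell-quoting mistakes. The sketch below only assembles the same command shown above; `build_serve_command` is a hypothetical helper, not part of vLLM, and the verifier path is the placeholder from the command above.

```python
import json
import shlex


def build_serve_command(verifier_path, draft_model,
                        num_speculative_tokens=3, tensor_parallel_size=8):
    """Assemble the `vllm serve` argument list with a JSON-encoded speculative config."""
    speculative_config = {
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
        "method": "eagle3",
    }
    return [
        "vllm", "serve", verifier_path,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--speculative-config", json.dumps(speculative_config),
    ]


cmd = build_serve_command(
    "/path/to/Kimi-K2.7-Code",
    "k-l-lambda/kimi-k2.7-code-eagle3-mla",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` as-is, which sidesteps quoting entirely because no shell is involved.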
## Checkpoint

Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was
stopped just short of the 120000-step budget). Among the retained checkpoints, it is the best by
validation full-sequence accept rate and also the eval winner on the real K2.7-Code traffic above.