---
license: mit
base_model: moonshotai/Kimi-K2.7-Code
tags:
- speculative-decoding
- eagle3
- eagle3-mla
- draft-model
- vllm
language:
- en
---
# Kimi-K2.7-Code Eagle3-MLA Draft
Eagle3-MLA speculative-decoding draft model for **Kimi-K2.7-Code**, trained natively on
K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative decoding.
## What this is
- **Algorithm:** EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
- **Verifier:** `Kimi-K2.7-Code` (DeepSeek-V3-class architecture; the architecture is identical
across K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
- **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
mixed with kimi-mtp prompts re-answered by K2.7-Code.
- **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
cosine LR 2e-5, seq_length 8192, max_steps 120000.
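
To make two of the recipe values concrete, here is a minimal Python sketch of how `ttt_steps`, `ttt_step_loss_decay`, and the cosine schedule could translate into numbers. It is an illustration under stated assumptions, not the training code: it assumes the per-step loss weight is `decay ** k`, that the schedule has no warmup, and that it decays to zero at `max_steps`. The function names are hypothetical.

```python
import math

# Values taken from the recipe above.
TTT_STEPS = 4
TTT_STEP_LOSS_DECAY = 1.0
PEAK_LR = 2e-5
MAX_STEPS = 120_000


def ttt_step_weights(steps=TTT_STEPS, decay=TTT_STEP_LOSS_DECAY):
    """Loss weight for each training-time-test step.

    Assumption: step k is weighted by decay ** k. With decay = 1.0
    every step contributes equally.
    """
    return [decay ** k for k in range(steps)]


def cosine_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Cosine-annealed learning rate.

    Assumption: no warmup phase and a final learning rate of zero.
    """
    progress = min(max(step, 0), max_steps) / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions all four draft steps are weighted equally, and the learning rate is already very small by the last few thousand steps of the budget.
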
## Evaluation
Speculative-decoding evaluation of the final checkpoint against the Kimi-K2.7-Code verifier
(vLLM 0.20.0, TP=8, `num_speculative_tokens=3`, c=4, greedy decoding). The table reports mean
accepted-token length:

| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
|---|---|---|
| **This model (final)** | **2.345** | 2.246 |
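
For readers unfamiliar with the metric, the following Python sketch shows one common way a mean accepted-token length can be computed under greedy verification: the verifier keeps the longest prefix of the draft that matches its own greedy tokens, then emits one extra token of its own. This is a simplified illustration, not the evaluation harness used here. Whether the figures above count that extra token is not stated, so the `count_bonus_token` flag makes the convention explicit. All names and the toy token IDs are hypothetical.

```python
from typing import Sequence, Tuple


def accepted_prefix_len(draft: Sequence[int], verifier: Sequence[int]) -> int:
    """Number of leading draft tokens that equal the verifier's greedy tokens."""
    accepted = 0
    for d, v in zip(draft, verifier):
        if d != v:
            break
        accepted += 1
    return accepted


def mean_accepted_length(
    steps: Sequence[Tuple[Sequence[int], Sequence[int]]],
    count_bonus_token: bool = True,
) -> float:
    """Average tokens gained per verification step.

    Each element of `steps` is (draft_tokens, verifier_greedy_tokens).
    With count_bonus_token=True, the token the verifier always emits
    after the accepted prefix is included in the count.
    """
    if not steps:
        raise ValueError("steps must not be empty")
    total = sum(accepted_prefix_len(d, v) for d, v in steps)
    mean = total / len(steps)
    return mean + 1.0 if count_bonus_token else mean


# Toy data: three verification steps with 3 draft tokens each.
toy_steps = [
    ([5, 6, 7], [5, 6, 9]),  # first 2 draft tokens accepted
    ([1, 2, 3], [1, 2, 3]),  # all 3 accepted
    ([4, 4, 4], [8, 4, 4]),  # first token rejected, 0 accepted
]
```

With `num_speculative_tokens=3`, this definition is bounded by 4 when the bonus token is counted and by 3 when it is not.
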
## Usage (vLLM)
```bash
vllm serve /path/to/Kimi-K2.7-Code \
--tensor-parallel-size 8 \
--speculative-config '{"model": "k-l-lambda/kimi-k2.7-code-eagle3-mla", "num_speculative_tokens": 3, "method": "eagle3"}'
```
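
Once the server is running, clients need no changes, since speculative decoding happens server-side. The sketch below builds a request for vLLM's OpenAI-compatible `/v1/completions` endpoint using only the Python standard library. It assumes the server listens on `localhost:8000` and that the served model name is the path passed to `vllm serve`; adjust both if your deployment differs. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # assumption: default host and port
MODEL = "/path/to/Kimi-K2.7-Code"    # assumption: same path given to `vllm serve`


def build_completion_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a greedy (temperature 0) completion request."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `complete("def fibonacci(n):")` against a running server returns the completion text.
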
## Checkpoint
Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was
stopped just short of the 120000-step budget). Among the retained checkpoints it has the best
validation full-sequence accept rate, and it scored highest in the evaluation on real
K2.7-Code traffic above.