Updated README
Browse files
README.md
CHANGED
|
@@ -21,17 +21,11 @@ K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative de
|
|
| 21 |
- **Algorithm:** EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
|
| 22 |
- **Verifier:** `Kimi-K2.7-Code` (DeepSeek-V3-class architecture; arch is identical across
|
| 23 |
K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
|
| 24 |
-
- **Init:** lightseek K2.6 Eagle3-MLA export, then fine-tuned on K2.7-native data.
|
| 25 |
- **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
|
| 26 |
mixed with kimi-mtp prompts re-answered by K2.7-Code.
|
| 27 |
- **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
|
| 28 |
cosine LR 2e-5, seq_length 8192, max_steps 120000.
|
| 29 |
|
| 30 |
-
## Why K2.7-native
|
| 31 |
-
|
| 32 |
-
A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
|
| 33 |
-
K2.7-Code traffic. Training on K2.7-native data reverses that.
|
| 34 |
-
|
| 35 |
## Evaluation
|
| 36 |
|
| 37 |
Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
|
|
@@ -40,11 +34,6 @@ Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
|
|
| 40 |
| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
|
| 41 |
|---|---|---|
|
| 42 |
| **This model (final)** | **2.345** | 2.246 |
|
| 43 |
-
| lightseek K2.6 init | 2.332 | 2.297 |
|
| 44 |
-
|
| 45 |
-
On **real K2.7-Code traffic** this draft beats the lightseek init (2.345 vs 2.332, ~1.36x
|
| 46 |
-
end-to-end speedup over no-spec). On the K2.6 distribution the lightseek init still leads,
|
| 47 |
-
as expected — this draft is tuned for K2.7.
|
| 48 |
|
| 49 |
## Usage (vLLM)
|
| 50 |
|
|
|
|
| 21 |
- **Algorithm:** EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
|
| 22 |
- **Verifier:** `Kimi-K2.7-Code` (DeepSeek-V3-class architecture; arch is identical across
|
| 23 |
K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
|
|
|
|
| 24 |
- **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
|
| 25 |
mixed with kimi-mtp prompts re-answered by K2.7-Code.
|
| 26 |
- **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
|
| 27 |
cosine LR 2e-5, seq_length 8192, max_steps 120000.
|
| 28 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 29 |
## Evaluation
|
| 30 |
|
| 31 |
Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
|
|
|
|
| 34 |
| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
|
| 35 |
|---|---|---|
|
| 36 |
| **This model (final)** | **2.345** | 2.246 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
|
| 38 |
## Usage (vLLM)
|
| 39 |
|