Final checkpoint (step 118800): wins on real K2.7 traffic, accept_len 2.345 vs lightseek 2.332
- README.md +19 -6
- model.safetensors +1 -1
README.md
CHANGED
@@ -25,13 +25,26 @@ K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative de
 - **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
 mixed with kimi-mtp prompts re-answered by K2.7-Code.
 - **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
-cosine LR 2e-5, seq_length 8192.
+cosine LR 2e-5, seq_length 8192, max_steps 120000.
 
 ## Why K2.7-native
 
 A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
-K2.7-Code traffic. Training on K2.7-native data reverses that
-
+K2.7-Code traffic. Training on K2.7-native data reverses that.
+
+## Evaluation
+
+Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
+(vLLM 0.20.0, TP=8, `num_speculative_tokens=3`, c=4, greedy). Mean accepted-token length:
+
+| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
+|---|---|---|
+| **This model (final)** | **2.345** | 2.246 |
+| lightseek K2.6 init | 2.332 | 2.297 |
+
+On **real K2.7-Code traffic** this draft beats the lightseek init (2.345 vs 2.332, ~1.36x
+end-to-end speedup over no-spec). On the K2.6 distribution the lightseek init still leads,
+as expected — this draft is tuned for K2.7.
 
 ## Usage (vLLM)
 
@@ -43,6 +56,6 @@ vllm serve /path/to/Kimi-K2.7-Code \
 
 ## Checkpoint
 
-
-
-
+Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was
+stopped just short of the 120000 budget). Best by validation full-sequence accept rate among
+retained checkpoints, and the eval winner on real K2.7 traffic above.
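To put the Evaluation numbers in perspective, the relative gaps between the two drafts can be computed directly from the table. This is a minimal sketch: the four values are copied from the README diff above, while the dictionary layout and the helper names (`relative_gain`, `compare`) are illustrative assumptions and not part of the repository.

```python
# Mean accepted-token lengths copied from the Evaluation table in the README diff.
# The dictionary layout and key names are hypothetical, chosen for this sketch.
ACCEPT_LEN = {
    "this_model": {"k27_traffic": 2.345, "k26_heldout": 2.246},
    "lightseek_init": {"k27_traffic": 2.332, "k26_heldout": 2.297},
}


def relative_gain(new: float, base: float) -> float:
    """Return the change of `new` over `base` as a percentage."""
    return (new - base) / base * 100.0


def compare(split: str) -> float:
    """Relative gain of this draft over the lightseek init on one eval split."""
    return relative_gain(
        ACCEPT_LEN["this_model"][split],
        ACCEPT_LEN["lightseek_init"][split],
    )


for split in ("k27_traffic", "k26_heldout"):
    print(f"{split}: {compare(split):+.2f}%")
```

Under these numbers the draft leads by roughly half a percent in accepted length on real K2.7-Code traffic and trails by a little over two percent on the K2.6 held-out set, which matches the trade-off the README describes.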
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b
 size 6031210296
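The new Git LFS pointer records the SHA-256 digest and byte size of the updated `model.safetensors`, so a downloaded copy can be checked against it. The sketch below shows one way to do that with the Python standard library only. The function names and the local file path in the usage comment are assumptions for illustration; only the pointer text comes from the diff above.

```python
import hashlib
from pathlib import Path

# Pointer text as it appears on the new side of the model.safetensors diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b
size 6031210296
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a multi-gigabyte checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, pointer_text: str) -> bool:
    """Return True when the file's size and SHA-256 digest both match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm!r}")
    # Compare sizes first: it is cheap and avoids hashing an obviously wrong file.
    if Path(path).stat().st_size != int(fields["size"]):
        return False
    return sha256_of_file(path) == expected_digest


# Hypothetical usage, assuming the checkpoint was downloaded to the working directory:
# matches_pointer("model.safetensors", POINTER)
```

Checking the size before hashing is a design convenience: a truncated download is rejected immediately without reading about 6 GB from disk.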