Final checkpoint (step 118800): wins on real K2.7 traffic, accept_len 2.345 vs lightseek 2.332

Files changed (2)

README.md CHANGED

@@ -25,13 +25,26 @@ K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative de
 - **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
   mixed with kimi-mtp prompts re-answered by K2.7-Code.
 - **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
-  cosine LR 2e-5, seq_length 8192.
 ## Why K2.7-native
 A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
-K2.7-Code traffic. Training on K2.7-native data reverses that: on held-out K2.7 traffic this
-draft matches or beats the lightseek init on accepted-token length.
 ## Usage (vLLM)
@@ -43,6 +56,6 @@ vllm serve /path/to/Kimi-K2.7-Code \
 ## Checkpoint
-This is an **intermediate** checkpoint from an in-progress run (step 32400, the best by
-validation loss among retained checkpoints at upload time). It is published for evaluation;
-a final checkpoint will follow when the run reaches its step budget.

 - **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
   mixed with kimi-mtp prompts re-answered by K2.7-Code.
 - **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
+  cosine LR 2e-5, seq_length 8192, max_steps 120000.
 ## Why K2.7-native
 A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
+K2.7-Code traffic. Training on K2.7-native data reverses that.
+## Evaluation
+Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
+(vLLM 0.20.0, TP=8, `num_speculative_tokens=3`, c=4, greedy). Mean accepted-token length:
+| Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
+|---|---|---|
+| **This model (final)** | **2.345** | 2.246 |
+| lightseek K2.6 init | 2.332 | 2.297 |
+On **real K2.7-Code traffic** this draft beats the lightseek init (2.345 vs 2.332, ~1.36x
+end-to-end speedup over no-spec). On the K2.6 distribution the lightseek init still leads,
+as expected — this draft is tuned for K2.7.
 ## Usage (vLLM)
 ## Checkpoint
+Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was
+stopped just short of the 120000 budget). Best by validation full-sequence accept rate among
+retained checkpoints, and the eval winner on real K2.7 traffic above.
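
Reviewer note: the margins in the Evaluation table are easier to judge as relative changes. A minimal sketch of that arithmetic, using only the four accepted-token lengths from the table; the dictionary keys and the `relative_gain` and `summarize` helpers are hypothetical names, not part of the repo or of vLLM.

```python
# Mean accepted-token lengths copied from the Evaluation table in the README diff.
# The key names are illustrative only.
ACCEPT_LEN = {
    "real_k27_traffic": {"this_model": 2.345, "lightseek_init": 2.332},
    "k26_heldout": {"this_model": 2.246, "lightseek_init": 2.297},
}


def relative_gain(candidate: float, baseline: float) -> float:
    """Fractional change of candidate over baseline (0.01 means +1%)."""
    return (candidate - baseline) / baseline


def summarize(table: dict) -> dict:
    """Relative gain of this model over the lightseek init, per eval set."""
    return {
        name: relative_gain(row["this_model"], row["lightseek_init"])
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, gain in summarize(ACCEPT_LEN).items():
        print(f"{name}: {gain:+.2%}")
```

On these numbers the gain on real K2.7-Code traffic is roughly +0.6% and the gap on the K2.6 distribution is roughly -2.2%. The table does not report variance or sample counts, so this says nothing about whether either difference is statistically significant.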

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:94d725abbe70ed552adad9cdb4e0fc0e02782a7fba4a55ae49e004598c3c03a2
 size 6031210296

 version https://git-lfs.github.com/spec/v1
+oid sha256:dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b
 size 6031210296
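
Reviewer note: the byte size is identical before and after, so the `oid` is the only thing that tells the two weight files apart. In a Git LFS pointer, `oid sha256:` is the SHA-256 of the file contents and `size` is its length in bytes, so a downloaded copy can be checked locally. A minimal sketch, assuming the file has already been downloaded; `sha256_of_file` and `matches_lfs_pointer` are hypothetical helper names and the path in the usage comment is a placeholder.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-GB checkpoint is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Compare a local file against the oid and size fields of an LFS pointer."""
    # Check the cheap size field first and skip hashing on a mismatch.
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_oid.lower()


# Usage with the values from the new pointer in this commit (path is a placeholder):
# matches_lfs_pointer(
#     "/path/to/model.safetensors",
#     "dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b",
#     6031210296,
# )
```

A `False` result with the new oid and a `True` result with the old one (`94d725ab...`) would indicate that the local copy is still the step 32400 checkpoint.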