k-l-lambda committed
Commit e3638a0 · verified · 1 Parent(s): 696cbee

Final checkpoint (step 118800): wins on real K2.7 traffic, accept_len 2.345 vs lightseek 2.332

Files changed (2)
  1. README.md +19 -6
  2. model.safetensors +1 -1
README.md CHANGED
@@ -25,13 +25,26 @@ K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative de
  - **Training data:** real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x)
  mixed with kimi-mtp prompts re-answered by K2.7-Code.
  - **Recipe:** ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4,
- cosine LR 2e-5, seq_length 8192.
+ cosine LR 2e-5, seq_length 8192, max_steps 120000.

  ## Why K2.7-native

  A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real
- K2.7-Code traffic. Training on K2.7-native data reverses that: on held-out K2.7 traffic this
- draft matches or beats the lightseek init on accepted-token length.
+ K2.7-Code traffic. Training on K2.7-native data reverses that.
+
+ ## Evaluation
+
+ Final checkpoint, speculative-decoding eval against the Kimi-K2.7-Code verifier
+ (vLLM 0.20.0, TP=8, `num_speculative_tokens=3`, c=4, greedy). Mean accepted-token length:
+
+ | Draft | Real K2.7-Code traffic | K2.6-distribution held-out |
+ |---|---|---|
+ | **This model (final)** | **2.345** | 2.246 |
+ | lightseek K2.6 init | 2.332 | 2.297 |
+
+ On **real K2.7-Code traffic** this draft beats the lightseek init (2.345 vs 2.332, ~1.36x
+ end-to-end speedup over no-spec). On the K2.6 distribution the lightseek init still leads,
+ as expected — this draft is tuned for K2.7.

  ## Usage (vLLM)

@@ -43,6 +56,6 @@ vllm serve /path/to/Kimi-K2.7-Code \

  ## Checkpoint

- This is an **intermediate** checkpoint from an in-progress run (step 32400, the best by
- validation loss among retained checkpoints at upload time). It is published for evaluation;
- a final checkpoint will follow when the run reaches its step budget.
+ Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was
+ stopped just short of the 120000 budget). Best by validation full-sequence accept rate among
+ retained checkpoints, and the eval winner on real K2.7 traffic above.
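The gaps in the Evaluation table added to README.md are small in absolute terms, so it can help to express them as relative differences. The following is a minimal Python sketch that uses only the four mean accepted-token lengths reported in that table. The helper name `relative_gain` and the dictionary layout are illustrative assumptions, not part of the model card or of vLLM.

```python
# Mean accepted-token lengths copied from the Evaluation table in the README change.
ACCEPT_LEN = {
    "real_k27_traffic": {"this_model": 2.345, "lightseek_init": 2.332},
    "k26_held_out": {"this_model": 2.246, "lightseek_init": 2.297},
}


def relative_gain(candidate: float, baseline: float) -> float:
    """Relative change of candidate over baseline, in percent (hypothetical helper)."""
    return (candidate - baseline) / baseline * 100.0


for split, scores in ACCEPT_LEN.items():
    gain = relative_gain(scores["this_model"], scores["lightseek_init"])
    print(f"{split}: {gain:+.2f}% vs lightseek init")
```

On these numbers the draft is roughly 0.56% ahead of the lightseek init on real K2.7-Code traffic and roughly 2.22% behind on the K2.6 held-out set. The ~1.36x end-to-end speedup quoted in the README is a separate measurement against no speculative decoding and cannot be derived from these accept lengths alone.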
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:94d725abbe70ed552adad9cdb4e0fc0e02782a7fba4a55ae49e004598c3c03a2
+ oid sha256:dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b
  size 6031210296
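Because the old and new Git LFS pointers record the same `size`, file size alone cannot tell the two checkpoints apart; only the `oid` (the SHA-256 of the file contents) distinguishes them. Below is a minimal Python sketch for checking a downloaded file against the new pointer. The function names and the local file path in the usage note are assumptions for illustration; the expected digest and size are copied from the pointer in this commit.

```python
import hashlib
from pathlib import Path

# Values copied from the new Git LFS pointer in this commit.
EXPECTED_OID = "dd5a71a1027bda3116df4d1abaecc5ad6c2e1d25009508ce08924e0f15b85d2b"
EXPECTED_SIZE = 6031210296


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a ~6 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: Path, oid: str = EXPECTED_OID, size: int = EXPECTED_SIZE) -> bool:
    """Return True only when both the byte size and the SHA-256 digest match."""
    return path.stat().st_size == size and sha256_of_file(path) == oid
```

A call such as `matches_pointer(Path("model.safetensors"))`, with the path adjusted to wherever the file was downloaded, should return `True` for the final checkpoint and `False` for the earlier intermediate one.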