Update logs/k_sweep_all32_step1000_exact.log
logs/k_sweep_all32_step1000_exact.log
ADDED
@@ -0,0 +1,27 @@
+Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
+Loading base model Qwen/Qwen3-4B-Instruct-2507 ...
+
+Loaded ckpt step 1000
+Pre-running teacher captures...
+Computing full-attention PPL...
+ppl_full = 20.5349
+
+=== K = 16 ===
+mass_avg = 0.5457 recall_avg = 0.5178 ppl_ann = 24.8603 ppl_gap = +21.064%
+
+=== K = 32 ===
+mass_avg = 0.6270 recall_avg = 0.5719 ppl_ann = 21.8537 ppl_gap = +6.422%
+
+=== K = 64 ===
+mass_avg = 0.7224 recall_avg = 0.6515 ppl_ann = 20.9403 ppl_gap = +1.974%
+
+=== K = 128 ===
+mass_avg = 0.8071 recall_avg = 0.7456 ppl_ann = 20.6561 ppl_gap = +0.590%
+
+=== K = 256 ===
+mass_avg = 0.9024 recall_avg = 0.8756 ppl_ann = 20.5223 ppl_gap = -0.062%
+
+=== K = 512 ===
+mass_avg = 0.0000 recall_avg = 0.0000 ppl_ann = 20.5214 ppl_gap = -0.066%
+
+Wrote /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_1000.k_sweep_exact.json
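
The `ppl_gap` values in the log above look consistent with a relative increase of `ppl_ann` over `ppl_full`, expressed as a percentage. The sketch below recomputes that quantity from the logged numbers as a sanity check. The formula is an assumption inferred from the numbers, not taken from the script that produced the log, and the names `relative_gap_percent` and `check` are hypothetical helpers. The comparison uses a small tolerance because the logged values are rounded.

```python
import re

# ppl_full as printed in the log.
PPL_FULL = 20.5349

# Result lines copied verbatim from the log, one per K (16 .. 512).
LOG_LINES = [
    "mass_avg = 0.5457 recall_avg = 0.5178 ppl_ann = 24.8603 ppl_gap = +21.064%",
    "mass_avg = 0.6270 recall_avg = 0.5719 ppl_ann = 21.8537 ppl_gap = +6.422%",
    "mass_avg = 0.7224 recall_avg = 0.6515 ppl_ann = 20.9403 ppl_gap = +1.974%",
    "mass_avg = 0.8071 recall_avg = 0.7456 ppl_ann = 20.6561 ppl_gap = +0.590%",
    "mass_avg = 0.9024 recall_avg = 0.8756 ppl_ann = 20.5223 ppl_gap = -0.062%",
    "mass_avg = 0.0000 recall_avg = 0.0000 ppl_ann = 20.5214 ppl_gap = -0.066%",
]

# Captures the ppl_ann value and the signed ppl_gap percentage.
PATTERN = re.compile(r"ppl_ann = ([\d.]+)\s+ppl_gap = ([+-][\d.]+)%")


def relative_gap_percent(ppl_ann, ppl_full):
    """Assumed definition: relative change of ppl_ann versus ppl_full, in percent."""
    return (ppl_ann - ppl_full) / ppl_full * 100.0


def check(lines, ppl_full, tol=0.002):
    """Return (ppl_ann, logged_gap, recomputed_gap, matches) for each result line.

    tol is in percentage points and absorbs rounding of the logged values.
    """
    results = []
    for line in lines:
        match = PATTERN.search(line)
        ppl_ann = float(match.group(1))
        logged = float(match.group(2))
        recomputed = relative_gap_percent(ppl_ann, ppl_full)
        results.append((ppl_ann, logged, recomputed, abs(recomputed - logged) <= tol))
    return results


if __name__ == "__main__":
    for ppl_ann, logged, recomputed, ok in check(LOG_LINES, PPL_FULL):
        print(f"ppl_ann={ppl_ann:.4f} logged={logged:+.3f}% recomputed={recomputed:+.3f}% ok={ok}")
```

Under this assumed definition, a negative gap (as logged for K = 256 and K = 512) means `ppl_ann` came out slightly below `ppl_full`.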