Update logs/compare_all32_step1000.log
logs/compare_all32_step1000.log
ADDED
@@ -0,0 +1,21 @@
+Loading Qwen/Qwen3-4B-Instruct-2507 ...
+
+Loaded ckpt step 1000 for layers [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
+batch 1/2 done
+
+========================================================================
+mass@K — fraction of teacher attention captured by retrieval set
+raw_qk : exact top-K over head-mean-aggregated post-RoPE Q,K
+learned: exact top-K over trained search projections (d=128)
+========================================================================
+
+  K method    L03   L04   L05   L06   L07   L08   L09   L10   L11   L12   L13   L14   L15   L16   L17   L18   L19   L20   L21   L22   L23   L24   L25   L26   L27   L28   L29   L30   L31   L32   L33   L34   avg
+128 raw_qk  0.939 0.944 0.964 0.956 0.982 0.971 0.959 0.974 0.976 0.961 0.971 0.973 0.968 0.956 0.959 0.965 0.961 0.959 0.966 0.963 0.979 0.971 0.986 0.978 0.978 0.979 0.982 0.988 0.984 0.979 0.977 0.976 0.969
+128 learned 0.924 0.937 0.948 0.939 0.983 0.971 0.977 0.971 0.976 0.970 0.971 0.973 0.973 0.961 0.967 0.972 0.969 0.976 0.980 0.970 0.985 0.979 0.989 0.986 0.983 0.985 0.987 0.987 0.983 0.980 0.967 0.960 0.971
+
+256 raw_qk  0.986 0.986 0.993 0.990 0.996 0.994 0.992 0.995 0.996 0.991 0.995 0.996 0.995 0.992 0.993 0.995 0.993 0.993 0.994 0.993 0.996 0.994 0.997 0.996 0.995 0.995 0.997 0.998 0.997 0.995 0.995 0.995 0.994
+256 learned 0.977 0.982 0.986 0.981 0.996 0.993 0.995 0.992 0.995 0.993 0.993 0.994 0.995 0.991 0.994 0.995 0.994 0.996 0.997 0.994 0.997 0.996 0.998 0.997 0.997 0.997 0.997 0.997 0.996 0.995 0.991 0.990 0.993
+
+Learned vs raw mass@K=128: 0.971 / 0.969 = 1.00×
+
+Wrote /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_1000.compare_retrieval.json
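For readers unfamiliar with the metric, the log header defines mass@K as the fraction of teacher attention captured by the retrieval set. A minimal sketch of that definition for a single query follows. The script that produced this log is not shown, so the function names `top_k_indices` and `mass_at_k` are hypothetical, and how the real run aggregates over queries, heads, and batches is an assumption left out here.

```python
# Hypothetical helpers: the script behind this log is not shown in the commit.

def top_k_indices(scores, k):
    """Return the indices of the k largest scores (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def mass_at_k(teacher_attn, retrieval_scores, k):
    """Fraction of one query's teacher attention that lands on the top-k retrieved keys.

    teacher_attn: attention weights over keys for one query.
    retrieval_scores: one score per key from the method under test
                      (for example raw Q.K scores or learned search projections).
    """
    picked = top_k_indices(retrieval_scores, k)
    total = sum(teacher_attn)
    return sum(teacher_attn[i] for i in picked) / total


# Toy data: 6 keys, K=2. These numbers are illustrative, not from the log.
teacher = [0.05, 0.50, 0.05, 0.30, 0.05, 0.05]
good_scores = [0.1, 2.0, 0.0, 1.5, -0.3, 0.2]  # ranks keys 1 and 3 first
poor_scores = [2.0, 0.1, 1.5, 0.0, -0.3, 0.2]  # ranks keys 0 and 2 first

good = mass_at_k(teacher, good_scores, k=2)  # captures 0.50 + 0.30 of the mass
poor = mass_at_k(teacher, poor_scores, k=2)  # captures 0.05 + 0.05 of the mass

# The summary line in the log divides the two K=128 averages.
ratio = round(0.971 / 0.969, 2)
```

Under this reading, a value near 1.0 means the K retrieved keys account for almost all of the attention the teacher would have paid, which is what both methods reach at K=256 in the table.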