datasysdev committed
Commit b35d44a · verified · 1 Parent(s): bd29394

Upload logs/compare_all36_step750.log

Files changed (1)
  1. logs/compare_all36_step750.log +22 -0
logs/compare_all36_step750.log ADDED
@@ -0,0 +1,22 @@
+ Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
+ Loading Qwen/Qwen3-4B-Instruct-2507 ...
+
+ Loaded ckpt step 750 for layers [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
+ batch 1/2 done
+
+ ========================================================================
+ mass@K — fraction of teacher attention captured by retrieval set
+ raw_qk : exact top-K over head-mean-aggregated post-RoPE Q,K
+ learned: exact top-K over trained search projections (d=128)
+ ========================================================================
+
+   K method    L00   L01   L02   L03   L04   L05   L06   L07   L08   L09   L10   L11   L12   L13   L14   L15   L16   L17   L18   L19   L20   L21   L22   L23   L24   L25   L26   L27   L28   L29   L30   L31   L32   L33   L34   L35   avg
+ 128 raw_qk  0.922 0.918 0.939 0.939 0.944 0.964 0.956 0.982 0.971 0.959 0.974 0.976 0.961 0.971 0.973 0.968 0.956 0.959 0.965 0.961 0.959 0.966 0.963 0.979 0.971 0.986 0.978 0.978 0.979 0.982 0.988 0.984 0.979 0.977 0.976 0.980 0.966
+ 128 learned 0.776 0.853 0.899 0.925 0.936 0.950 0.939 0.983 0.971 0.976 0.971 0.976 0.970 0.972 0.973 0.972 0.962 0.967 0.973 0.968 0.976 0.980 0.970 0.985 0.978 0.989 0.986 0.983 0.985 0.987 0.986 0.984 0.980 0.970 0.960 0.965 0.960
+
+ 256 raw_qk  0.974 0.983 0.986 0.986 0.986 0.993 0.990 0.996 0.994 0.992 0.995 0.996 0.991 0.995 0.996 0.995 0.992 0.993 0.995 0.993 0.993 0.994 0.993 0.996 0.994 0.997 0.996 0.995 0.995 0.997 0.998 0.997 0.995 0.995 0.995 0.995 0.993
+ 256 learned 0.924 0.961 0.966 0.977 0.982 0.987 0.981 0.996 0.993 0.995 0.992 0.995 0.993 0.993 0.994 0.994 0.992 0.994 0.995 0.993 0.996 0.997 0.994 0.997 0.996 0.998 0.997 0.997 0.997 0.998 0.997 0.997 0.995 0.992 0.990 0.989 0.990
+
+ Learned vs raw mass@K=128: 0.960 / 0.966 = 0.99×
+
+ Wrote /tmp/checkpoints_all36_d128_block/search_step_750.compare_retrieval.json
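The log describes mass@K as the fraction of teacher attention captured by the retrieval set. A minimal sketch of how such a number could be computed is below. It is not the script that produced this log: the function name `mass_at_k`, the array shapes, and the choice to average over queries are all assumptions made for illustration, and causal masking and per-head handling are left out.

```python
import numpy as np

def mass_at_k(teacher_attn, retrieval_scores, k):
    """Hypothetical helper: mean teacher-attention mass covered by top-k keys.

    teacher_attn:     (n_queries, n_keys), each row a probability distribution
    retrieval_scores: (n_queries, n_keys), scores used to pick the retrieval set
    """
    n_keys = retrieval_scores.shape[-1]
    k = min(k, n_keys)  # a retrieval set cannot exceed the number of keys
    # Exact top-k per query: argpartition places the k largest scores first
    # (their order inside the top-k does not matter for a sum).
    top_idx = np.argpartition(-retrieval_scores, k - 1, axis=-1)[..., :k]
    # Sum the teacher attention that falls on the retrieved keys.
    captured = np.take_along_axis(teacher_attn, top_idx, axis=-1).sum(axis=-1)
    return float(captured.mean())

# Toy example (made-up numbers): one query, three keys.
teacher = np.array([[0.5, 0.3, 0.2]])
scores = np.array([[3.0, 1.0, 2.0]])  # ranks key 0 first, then key 2
toy_mass = mass_at_k(teacher, scores, k=2)  # covers keys 0 and 2

# The summary ratio in the log can be re-derived from the two avg columns.
ratio = round(0.960 / 0.966, 2)  # 0.99
```

Under this reading, the `raw_qk` and `learned` rows would differ only in which scores are passed as `retrieval_scores`, while `teacher_attn` stays the same.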