AbstractPhil committed on
Commit f34d430 · verified · 1 Parent(s): 7cc3d76

Create prototype_55_output.txt

Files changed (1)
  1. prototype_55_output.txt +232 -0
prototype_55_output.txt ADDED
@@ -0,0 +1,232 @@
+ =================================================================
+ RAPID PROTOTYPE v2: Differentiation-Centered Bank
+ =================================================================
+ Device: cuda
+
+ =================================================================
+ PHASE 0: EXTRACTION
+ =================================================================
+ Captions: 20,000
+
+ Extracting: bert...
+ Loading weights: 100%
+  199/199 [00:00<00:00, 4216.36it/s, Materializing param=pooler.dense.weight]
+ BertModel LOAD REPORT from: google-bert/bert-base-uncased
+ Key                                        | Status     | |
+ -------------------------------------------+------------+--+-
+ cls.predictions.bias                       | UNEXPECTED | |
+ cls.predictions.transform.dense.bias       | UNEXPECTED | |
+ cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
+ cls.predictions.transform.LayerNorm.bias   | UNEXPECTED | |
+ cls.seq_relationship.weight                | UNEXPECTED | |
+ cls.predictions.transform.dense.weight     | UNEXPECTED | |
+ cls.seq_relationship.bias                  | UNEXPECTED | |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ bert: 100%|██████████| 157/157 [00:23<00:00, 6.56it/s]
+ Shape: torch.Size([20000, 768])
+
+ Extracting: modern...
+ Loading weights: 100%
+  134/134 [00:00<00:00, 4047.07it/s, Materializing param=layers.21.mlp_norm.weight]
+ ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+ Key               | Status     | |
+ ------------------+------------+--+-
+ head.dense.weight | UNEXPECTED | |
+ head.norm.weight  | UNEXPECTED | |
+ decoder.bias      | UNEXPECTED | |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s]
+ Shape: torch.Size([20000, 768])
+
+ =================================================================
+ PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
+ =================================================================
+ GPA iter 1: delta=1.19668072
+ GPA iter 3: delta=0.00029225
+ GPA iter 6: delta=0.00006347
+ GPA iter 9: delta=0.00002718
+ bert : cos_after=0.8541 cos_to_mean=0.9865
+ modern : cos_after=0.8577 cos_to_mean=0.9867
+ cos(consensus, bert): 0.9867
+ cos(consensus, modern): 0.9868
+ Equidistance range: 0.0001 (should be near 0)
+
+ Measuring consensus statistics...
+ CV: 0.1771
+ Mean cos: 0.0018
+ Eff dim: 109.5
+ Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]
+
+ =================================================================
+ PHASE 1: TRAIN STUDENT
+ =================================================================
+ Student: 11,269,632 params
+ CV target: 0.1771
+ E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223
+ E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212
+ E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182
+ E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182
+ E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182
+
+ Student saved. v_cos=0.686, v_cv=0.182
+
+ =================================================================
+ PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+ =================================================================
+ Pre-encoding through frozen student...
+ Student embeddings: torch.Size([18000, 768])
+ Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
+ Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
+ Anchors: 512 initialized from consensus embeddings
+ Targets: CV=0.1771, mean_cos=0.0018
+ Bank: 2,921,088 params
+ Bank targets: CV=0.1771, mean_cos=0.0018
+ Calibrated disagreement (n=2000):
+ cross_cos: 0.0794 ± 0.0035
+ disagree_ratio: median=0.000000 mean=0.000000 std=0.000000
+ expert_cos: 1.0000 ± 0.0000
+
+ E 1: 1s loss=0.4789 v_loss=0.4172
+ Geometry: b_cv=0.2688 e_cv=0.1603 spread=0.03940 a_max=0.652
+ Experts: cos=0.794±0.006 agr=0.000092 ortho=0.000388
+ Disagree: x_cos=0.0740±0.0009 ratio=0.004326 preserve=0.013135 norms=0.1626
+
+ E 2: 1s loss=0.4002 v_loss=0.3818
+ Geometry: b_cv=0.2229 e_cv=0.1588 spread=0.02779 a_max=0.668
+ Experts: cos=0.807±0.006 agr=0.000007 ortho=0.000288
+ Disagree: x_cos=0.0805±0.0014 ratio=0.003575 preserve=0.000024 norms=0.1703
+
+ E 3: 1s loss=0.3743 v_loss=0.3625
+ Geometry: b_cv=0.2189 e_cv=0.1606 spread=0.02500 a_max=0.670
+ Experts: cos=0.835±0.005 agr=0.000005 ortho=0.000152
+ Disagree: x_cos=0.0774±0.0018 ratio=0.002279 preserve=0.000016 norms=0.1066
+
+ E 4: 1s loss=0.3591 v_loss=0.3615
+ Geometry: b_cv=0.2100 e_cv=0.1643 spread=0.02302 a_max=0.670
+ Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000094
+ Disagree: x_cos=0.0781±0.0021 ratio=0.001569 preserve=0.000020 norms=0.1137
+
+ E 5: 1s loss=0.3537 v_loss=0.3665
+ Geometry: b_cv=0.2118 e_cv=0.1664 spread=0.02133 a_max=0.670
+ Experts: cos=0.815±0.006 agr=0.000002 ortho=0.000066
+ Disagree: x_cos=0.0765±0.0021 ratio=0.001389 preserve=0.000026 norms=0.1669
+
+ E 6: 1s loss=0.3506 v_loss=0.3527
+ Geometry: b_cv=0.2097 e_cv=0.1600 spread=0.02009 a_max=0.670
+ Experts: cos=0.829±0.005 agr=0.000003 ortho=0.000048
+ Disagree: x_cos=0.0846±0.0024 ratio=0.001772 preserve=0.000021 norms=0.1363
+
+ E 7: 1s loss=0.3459 v_loss=0.3502
+ Geometry: b_cv=0.2055 e_cv=0.1628 spread=0.01906 a_max=0.670
+ Experts: cos=0.759±0.007 agr=0.000004 ortho=0.000040
+ Disagree: x_cos=0.0774±0.0022 ratio=0.003070 preserve=0.000049 norms=0.1964
+
+ E 8: 1s loss=0.3442 v_loss=0.3479
+ Geometry: b_cv=0.2078 e_cv=0.1643 spread=0.01817 a_max=0.669
+ Experts: cos=0.745±0.007 agr=0.000003 ortho=0.000033
+ Disagree: x_cos=0.0782±0.0023 ratio=0.001258 preserve=0.000021 norms=0.1772
+
+ E 9: 1s loss=0.3419 v_loss=0.3451
+ Geometry: b_cv=0.2015 e_cv=0.1646 spread=0.01756 a_max=0.670
+ Experts: cos=0.767±0.006 agr=0.000007 ortho=0.000030
+ Disagree: x_cos=0.0823±0.0024 ratio=0.001625 preserve=0.000049 norms=0.2007
+
+ E10: 1s loss=0.3433 v_loss=0.3433
+ Geometry: b_cv=0.2074 e_cv=0.1594 spread=0.01746 a_max=0.669
+ Experts: cos=0.762±0.005 agr=0.000006 ortho=0.000026
+ Disagree: x_cos=0.0766±0.0018 ratio=0.001418 preserve=0.000073 norms=0.0529
+
+ E11: 1s loss=0.3392 v_loss=0.3501
+ Geometry: b_cv=0.2021 e_cv=0.1609 spread=0.01705 a_max=0.669
+ Experts: cos=0.721±0.007 agr=0.000004 ortho=0.000026
+ Disagree: x_cos=0.0698±0.0022 ratio=0.006405 preserve=0.000037 norms=0.1509
+
+ E12: 1s loss=0.3383 v_loss=0.3534
+ Geometry: b_cv=0.1983 e_cv=0.1639 spread=0.01693 a_max=0.668
+ Experts: cos=0.753±0.005 agr=0.000014 ortho=0.000026
+ Disagree: x_cos=0.0743±0.0021 ratio=0.000903 preserve=0.000076 norms=0.0763
+
+ E13: 1s loss=0.3374 v_loss=0.3398
+ Geometry: b_cv=0.1996 e_cv=0.1603 spread=0.01660 a_max=0.669
+ Experts: cos=0.714±0.006 agr=0.000004 ortho=0.000022
+ Disagree: x_cos=0.0791±0.0021 ratio=0.006335 preserve=0.000060 norms=0.1257
+
+ E14: 1s loss=0.3376 v_loss=0.3415
+ Geometry: b_cv=0.1992 e_cv=0.1657 spread=0.01647 a_max=0.669
+ Experts: cos=0.704±0.006 agr=0.000006 ortho=0.000022
+ Disagree: x_cos=0.0824±0.0021 ratio=0.006577 preserve=0.000061 norms=0.0873
+
+ E15: 1s loss=0.3372 v_loss=0.3409
+ Geometry: b_cv=0.2003 e_cv=0.1615 spread=0.01635 a_max=0.669
+ Experts: cos=0.745±0.005 agr=0.000003 ortho=0.000019
+ Disagree: x_cos=0.0760±0.0020 ratio=0.002660 preserve=0.000045 norms=0.0958
+
+ E16: 1s loss=0.3355 v_loss=0.3328
+ Geometry: b_cv=0.1990 e_cv=0.1601 spread=0.01600 a_max=0.669
+ Experts: cos=0.689±0.005 agr=0.000004 ortho=0.000018
+ Disagree: x_cos=0.0814±0.0024 ratio=0.002029 preserve=0.000042 norms=0.1414
+
+ E17: 1s loss=0.3350 v_loss=0.3432
+ Geometry: b_cv=0.1945 e_cv=0.1604 spread=0.01603 a_max=0.668
+ Experts: cos=0.751±0.003 agr=0.000028 ortho=0.000020
+ Disagree: x_cos=0.0825±0.0023 ratio=0.001129 preserve=0.000155 norms=0.0187
+
+ E18: 1s loss=0.3372 v_loss=0.3336
+ Geometry: b_cv=0.2044 e_cv=0.1605 spread=0.01590 a_max=0.668
+ Experts: cos=0.720±0.003 agr=0.000004 ortho=0.000022
+ Disagree: x_cos=0.0799±0.0020 ratio=0.002103 preserve=0.000055 norms=0.0331
+
+ E19: 1s loss=0.3326 v_loss=0.3456
+ Geometry: b_cv=0.1948 e_cv=0.1654 spread=0.01562 a_max=0.668
+ Experts: cos=0.741±0.003 agr=0.000004 ortho=0.000021
+ Disagree: x_cos=0.0797±0.0019 ratio=0.003153 preserve=0.000054 norms=0.0169
+
+ E20: 1s loss=0.3351 v_loss=0.3460
+ Geometry: b_cv=0.1992 e_cv=0.1596 spread=0.01567 a_max=0.668
+ Experts: cos=0.725±0.005 agr=0.000002 ortho=0.000018
+ Disagree: x_cos=0.0776±0.0023 ratio=0.008188 preserve=0.000053 norms=0.0326
+
+ =================================================================
+ PHASE 3: GEOMETRIC VERIFICATION
+ =================================================================
+ Passthrough: 1.000000 (target: 1.000)
+ Emb CV: 0.1635 (consensus: 0.1771)
+ Geo context CV: 0.1892
+ Geo eff_dim: 30.7 / 128
+ Expert cos: 0.725 ± 0.005
+ Anchor max cos: 0.668
+ Disagreement:
+ Cross-expert: 0.0776 ± 0.0023
+ Ratio: 0.008188 (target: 0.000000)
+ Norm spread: 0.0326
+
+ =================================================================
+ PHASE 4: CLASSIFIER STABILITY TEST
+ =================================================================
+
+ Mode               Dim    Train      Val      Gap
+ --------------------------------------------------
+ raw_768           1536    0.498    0.357    0.141
+ raw+diff          3072    0.567    0.475    0.092
+ bank_enriched     1792    0.766    0.532    0.235
+ bank+diff         3584    0.722    0.670    0.052
+ geo_explicit         6    0.326    0.363   -0.037
+
+ =================================================================
+ SUMMARY
+ =================================================================
+ Consensus CV: 0.1771
+ Consensus eff_dim:109.5
+ Student v_cos: 0.686
+ Student v_cv: 0.182
+ Bank params: 2,921,088
+ Bank geo_eff_dim: 30.7
+ Bank geo_cv: 0.1892
+
+ =================================================================
+ DONE
+ =================================================================
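
The Gap column in the PHASE 4 table appears to be the Train score minus the Val score, although the log does not say so. A minimal sketch under that assumption recomputes the gap from the printed, rounded values. The `rows` list is transcribed from the table; the helper names `gap` and `max_gap_discrepancy` are hypothetical and not part of the original script.

```python
# Rows transcribed from the PHASE 4 table: (mode, dim, train, val, logged_gap).
rows = [
    ("raw_768", 1536, 0.498, 0.357, 0.141),
    ("raw+diff", 3072, 0.567, 0.475, 0.092),
    ("bank_enriched", 1792, 0.766, 0.532, 0.235),
    ("bank+diff", 3584, 0.722, 0.670, 0.052),
    ("geo_explicit", 6, 0.326, 0.363, -0.037),
]


def gap(train, val):
    """Assumed definition (not stated in the log): train score minus val score."""
    return train - val


def max_gap_discrepancy(table):
    """Largest absolute difference between the recomputed and the logged gap."""
    return max(abs(gap(train, val) - logged) for _, _, train, val, logged in table)


if __name__ == "__main__":
    for mode, dim, train, val, logged in rows:
        print(f"{mode:<14} dim={dim:<5} recomputed={gap(train, val):+.3f} logged={logged:+.3f}")
    print(f"max discrepancy: {max_gap_discrepancy(rows):.3f}")
```

Under this assumption the recomputed gaps agree with the logged ones to within 0.001. The only difference is bank_enriched (0.234 recomputed, 0.235 logged), which is consistent with the logged gap having been computed before the scores were rounded to three decimals.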