AbstractPhil committed
Commit b8f685d · verified · 1 Parent(s): 2c64e3d

Create prototype_5_output.txt

Files changed (1)
  1. prototype_5_output.txt +225 -0
prototype_5_output.txt ADDED
@@ -0,0 +1,225 @@
+ =================================================================
+ RAPID PROTOTYPE v2: Differentiation-Centered Bank
+ =================================================================
+ Device: cuda
+
+ =================================================================
+ PHASE 0: EXTRACTION
+ =================================================================
+ Captions: 20,000
+
+ Extracting: bert...
+ Loading weights: 100%
+  199/199 [00:00<00:00, 4038.86it/s, Materializing param=pooler.dense.weight]
+ BertModel LOAD REPORT from: google-bert/bert-base-uncased
+ Key                                        | Status     |  |
+ -------------------------------------------+------------+--+-
+ cls.predictions.bias                       | UNEXPECTED |  |
+ cls.predictions.transform.dense.bias       | UNEXPECTED |  |
+ cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  |
+ cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  |
+ cls.seq_relationship.weight                | UNEXPECTED |  |
+ cls.predictions.transform.dense.weight     | UNEXPECTED |  |
+ cls.seq_relationship.bias                  | UNEXPECTED |  |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ bert: 100%|██████████| 157/157 [00:23<00:00, 6.55it/s]
+ Shape: torch.Size([20000, 768])
+
+ Extracting: modern...
+ Loading weights: 100%
+  134/134 [00:00<00:00, 4016.84it/s, Materializing param=layers.21.mlp_norm.weight]
+ ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+ Key               | Status     |  |
+ ------------------+------------+--+-
+ head.dense.weight | UNEXPECTED |  |
+ head.norm.weight  | UNEXPECTED |  |
+ decoder.bias      | UNEXPECTED |  |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s]
+ Shape: torch.Size([20000, 768])
+
+ =================================================================
+ PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
+ =================================================================
+ GPA iter 1: delta=1.19668072
+ GPA iter 3: delta=0.00029225
+ GPA iter 6: delta=0.00006347
+ GPA iter 9: delta=0.00002718
+ bert : cos_after=0.8541 cos_to_mean=0.9865
+ modern : cos_after=0.8577 cos_to_mean=0.9867
+ cos(consensus, bert): 0.9867
+ cos(consensus, modern): 0.9868
+ Equidistance range: 0.0001 (should be near 0)
+
+ Measuring consensus statistics...
+ CV: 0.1771
+ Mean cos: 0.0018
+ Eff dim: 109.5
+ Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]
+
+ =================================================================
+ PHASE 1: TRAIN STUDENT
+ =================================================================
+ Student: 11,269,632 params
+ CV target: 0.1771
+ E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223
+ E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212
+ E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182
+ E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182
+ E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182
+
+ Student saved. v_cos=0.686, v_cv=0.182
+
+ =================================================================
+ PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+ =================================================================
+ Pre-encoding through frozen student...
+ Student embeddings: torch.Size([18000, 768])
+ Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
+ Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
+ Anchors: 512 initialized from consensus embeddings
+ Targets: CV=0.1771, mean_cos=0.0018
+ Bank: 2,921,088 params
+ Bank targets: CV=0.1771, mean_cos=0.0018
+ Calibrated disagreement:
+ cross_cos: 0.0794 ± 0.0035
+ disagree_ratio: 0.000000
+
+ E 1: 1s loss=0.4775 v_loss=0.4350
+ Geometry: b_cv=0.2673 e_cv=0.1680 spread=0.03938 a_max=0.653
+ Experts: cos=0.791±0.006 agr=0.000142 ortho=0.000440
+ Disagree: x_cos=0.0833±0.0019 ratio=0.004748 preserve=0.014587 norms=0.1814
+
+ E 2: 1s loss=0.3981 v_loss=0.3833
+ Geometry: b_cv=0.2230 e_cv=0.1651 spread=0.02783 a_max=0.669
+ Experts: cos=0.809±0.006 agr=0.000008 ortho=0.000344
+ Disagree: x_cos=0.0817±0.0018 ratio=0.003509 preserve=0.000024 norms=0.1719
+
+ E 3: 1s loss=0.3730 v_loss=0.3757
+ Geometry: b_cv=0.2162 e_cv=0.1648 spread=0.02493 a_max=0.670
+ Experts: cos=0.830±0.005 agr=0.000004 ortho=0.000186
+ Disagree: x_cos=0.0799±0.0019 ratio=0.002291 preserve=0.000013 norms=0.1513
+
+ E 4: 1s loss=0.3623 v_loss=0.3708
+ Geometry: b_cv=0.2187 e_cv=0.1615 spread=0.02314 a_max=0.670
+ Experts: cos=0.832±0.005 agr=0.000003 ortho=0.000115
+ Disagree: x_cos=0.0793±0.0020 ratio=0.003285 preserve=0.000011 norms=0.1422
+
+ E 5: 1s loss=0.3554 v_loss=0.3539
+ Geometry: b_cv=0.2141 e_cv=0.1621 spread=0.02139 a_max=0.669
+ Experts: cos=0.853±0.004 agr=0.000002 ortho=0.000079
+ Disagree: x_cos=0.0781±0.0021 ratio=0.001270 preserve=0.000011 norms=0.0980
+
+ E 6: 1s loss=0.3507 v_loss=0.3571
+ Geometry: b_cv=0.2124 e_cv=0.1633 spread=0.02019 a_max=0.669
+ Experts: cos=0.829±0.005 agr=0.000001 ortho=0.000058
+ Disagree: x_cos=0.0788±0.0022 ratio=0.001736 preserve=0.000010 norms=0.1789
+
+ E 7: 1s loss=0.3460 v_loss=0.3465
+ Geometry: b_cv=0.2059 e_cv=0.1607 spread=0.01903 a_max=0.669
+ Experts: cos=0.845±0.005 agr=0.000001 ortho=0.000045
+ Disagree: x_cos=0.0819±0.0023 ratio=0.001425 preserve=0.000008 norms=0.1536
+
+ E 8: 1s loss=0.3449 v_loss=0.3421
+ Geometry: b_cv=0.2060 e_cv=0.1592 spread=0.01841 a_max=0.670
+ Experts: cos=0.833±0.005 agr=0.000003 ortho=0.000035
+ Disagree: x_cos=0.0885±0.0021 ratio=0.001539 preserve=0.000017 norms=0.1313
+
+ E 9: 1s loss=0.3422 v_loss=0.3451
+ Geometry: b_cv=0.2040 e_cv=0.1626 spread=0.01793 a_max=0.669
+ Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000031
+ Disagree: x_cos=0.0761±0.0024 ratio=0.001610 preserve=0.000037 norms=0.2032
+
+ E10: 1s loss=0.3416 v_loss=0.3497
+ Geometry: b_cv=0.2077 e_cv=0.1647 spread=0.01735 a_max=0.669
+ Experts: cos=0.782±0.007 agr=0.000003 ortho=0.000029
+ Disagree: x_cos=0.0825±0.0023 ratio=0.004691 preserve=0.000025 norms=0.2039
+
+ E11: 1s loss=0.3387 v_loss=0.3507
+ Geometry: b_cv=0.2019 e_cv=0.1640 spread=0.01701 a_max=0.668
+ Experts: cos=0.811±0.005 agr=0.000002 ortho=0.000024
+ Disagree: x_cos=0.0780±0.0023 ratio=0.000957 preserve=0.000015 norms=0.1889
+
+ E12: 1s loss=0.3391 v_loss=0.3381
+ Geometry: b_cv=0.2006 e_cv=0.1588 spread=0.01675 a_max=0.668
+ Experts: cos=0.778±0.006 agr=0.000003 ortho=0.000021
+ Disagree: x_cos=0.0729±0.0021 ratio=0.001148 preserve=0.000024 norms=0.1404
+
+ E13: 1s loss=0.3373 v_loss=0.3434
+ Geometry: b_cv=0.1987 e_cv=0.1635 spread=0.01671 a_max=0.668
+ Experts: cos=0.703±0.007 agr=0.000013 ortho=0.000021
+ Disagree: x_cos=0.0680±0.0026 ratio=0.003978 preserve=0.000085 norms=0.2265
+
+ E14: 1s loss=0.3383 v_loss=0.3351
+ Geometry: b_cv=0.2027 e_cv=0.1658 spread=0.01634 a_max=0.668
+ Experts: cos=0.779±0.005 agr=0.000007 ortho=0.000024
+ Disagree: x_cos=0.0849±0.0022 ratio=0.002337 preserve=0.000085 norms=0.1472
+
+ E15: 1s loss=0.3366 v_loss=0.3357
+ Geometry: b_cv=0.1999 e_cv=0.1612 spread=0.01584 a_max=0.668
+ Experts: cos=0.671±0.008 agr=0.000008 ortho=0.000023
+ Disagree: x_cos=0.0777±0.0024 ratio=0.011179 preserve=0.000061 norms=0.1758
+
+ E16: 1s loss=0.3363 v_loss=0.3467
+ Geometry: b_cv=0.1983 e_cv=0.1612 spread=0.01575 a_max=0.668
+ Experts: cos=0.737±0.005 agr=0.000010 ortho=0.000022
+ Disagree: x_cos=0.0839±0.0022 ratio=0.006047 preserve=0.000049 norms=0.1216
+
+ E17: 1s loss=0.3343 v_loss=0.3376
+ Geometry: b_cv=0.1974 e_cv=0.1655 spread=0.01591 a_max=0.668
+ Experts: cos=0.718±0.005 agr=0.000002 ortho=0.000020
+ Disagree: x_cos=0.0723±0.0023 ratio=0.002539 preserve=0.000042 norms=0.0947
+
+ E18: 1s loss=0.3354 v_loss=0.3457
+ Geometry: b_cv=0.1955 e_cv=0.1580 spread=0.01588 a_max=0.668
+ Experts: cos=0.763±0.005 agr=0.000007 ortho=0.000019
+ Disagree: x_cos=0.0796±0.0022 ratio=0.004057 preserve=0.000069 norms=0.1001
+
+ E19: 1s loss=0.3344 v_loss=0.3313
+ Geometry: b_cv=0.1962 e_cv=0.1602 spread=0.01560 a_max=0.668
+ Experts: cos=0.687±0.005 agr=0.000005 ortho=0.000018
+ Disagree: x_cos=0.0862±0.0024 ratio=0.005997 preserve=0.000030 norms=0.1218
+
+ E20: 1s loss=0.3331 v_loss=0.3651
+ Geometry: b_cv=0.1950 e_cv=0.1631 spread=0.01556 a_max=0.668
+ Experts: cos=0.729±0.005 agr=0.000007 ortho=0.000018
+ Disagree: x_cos=0.0826±0.0021 ratio=0.006963 preserve=0.000065 norms=0.0781
+
+ =================================================================
+ PHASE 3: GEOMETRIC VERIFICATION
+ =================================================================
+ Passthrough: 1.000000 (target: 1.000)
+ Emb CV: 0.1660 (consensus: 0.1771)
+ Geo context CV: 0.2053
+ Geo eff_dim: 30.5 / 128
+ Expert cos: 0.729 ± 0.005
+ Anchor max cos: 0.668
+ Disagreement:
+ Cross-expert: 0.0826 ± 0.0021
+ Ratio: 0.006963 (target: 0.000000)
+ Norm spread: 0.0781
+
+ =================================================================
+ PHASE 4: CLASSIFIER STABILITY TEST
+ =================================================================
+ with_bank : train=0.746 val=0.500 gap=0.246
+ without_bank : train=0.490 val=0.363 gap=0.126
+
+ =================================================================
+ SUMMARY
+ =================================================================
+ Consensus CV: 0.1771
+ Consensus eff_dim:109.5
+ Student v_cv: 0.182
+ Student v_cos: 0.686
+ Bank params: 2,921,088
+ Bank geo_eff_dim: 30.5
+ Bank geo_cv: 0.2053
+
+ =================================================================
+ DONE
+ =================================================================
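
Note on PHASE 0b: the log reports a generalized Procrustes alignment (GPA) of the two encoders' embeddings, but the commit does not include the code that produced it. The following is only a minimal sketch of textbook GPA in NumPy, not the script behind this log. The function names (`orthogonal_procrustes`, `generalized_procrustes`, `mean_row_cosine`) and the toy data are hypothetical, the whitening step the log mentions ("rotation + whitener + mean") is omitted, and starting from the first set as a temporary reference is an assumption; the returned consensus is the mean of all aligned sets.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def generalized_procrustes(embedding_sets, n_iter=10, tol=1e-6):
    """Align several (n, d) embedding sets to their evolving mean.

    Assumption: every set is centered and scaled to unit Frobenius norm,
    and the first set serves only as the initial reference.
    """
    aligned = []
    for x in embedding_sets:
        x = x - x.mean(axis=0, keepdims=True)
        aligned.append(x / np.linalg.norm(x))

    consensus = aligned[0]
    for _ in range(n_iter):
        # Rotate every set onto the current consensus, then re-average.
        aligned = [x @ orthogonal_procrustes(x, consensus) for x in aligned]
        new_consensus = np.mean(aligned, axis=0)
        delta = np.linalg.norm(new_consensus - consensus)
        consensus = new_consensus
        if delta < tol:
            break
    return aligned, consensus


def mean_row_cosine(a, b):
    """Mean cosine similarity between matching rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))


# Toy check (hypothetical data): the second set is an exact rotation of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
aligned, consensus = generalized_procrustes([base, base @ rotation])

cos_a = mean_row_cosine(consensus, aligned[0])
cos_b = mean_row_cosine(consensus, aligned[1])
print("cos(consensus, set 0):", round(cos_a, 4))
print("cos(consensus, set 1):", round(cos_b, 4))
print("equidistance range:", round(abs(cos_a - cos_b), 4))
```

The last printed value corresponds in spirit to the log's "Equidistance range" line: the gap between each set's cosine to the consensus, which should be near 0 when neither set is favored. In the log it is 0.9868 - 0.9867 = 0.0001.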