Create prototype_55_output.txt
prototype_55_output.txt (ADDED, +232 -0)
@@ -0,0 +1,232 @@
+=================================================================
+RAPID PROTOTYPE v2: Differentiation-Centered Bank
+=================================================================
+Device: cuda
+
+=================================================================
+PHASE 0: EXTRACTION
+=================================================================
+Captions: 20,000
+
+Extracting: bert...
+Loading weights: 100%
+199/199 [00:00<00:00, 4216.36it/s, Materializing param=pooler.dense.weight]
+BertModel LOAD REPORT from: google-bert/bert-base-uncased
+Key | Status | |
+-------------------------------------------+------------+--+-
+cls.predictions.bias | UNEXPECTED | |
+cls.predictions.transform.dense.bias | UNEXPECTED | |
+cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
+cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
+cls.seq_relationship.weight | UNEXPECTED | |
+cls.predictions.transform.dense.weight | UNEXPECTED | |
+cls.seq_relationship.bias | UNEXPECTED | |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+bert: 100%|██████████| 157/157 [00:23<00:00, 6.56it/s]
+Shape: torch.Size([20000, 768])
+
+Extracting: modern...
+Loading weights: 100%
+134/134 [00:00<00:00, 4047.07it/s, Materializing param=layers.21.mlp_norm.weight]
+ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+Key | Status | |
+------------------+------------+--+-
+head.dense.weight | UNEXPECTED | |
+head.norm.weight | UNEXPECTED | |
+decoder.bias | UNEXPECTED | |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s]
+Shape: torch.Size([20000, 768])
+
+=================================================================
+PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
+=================================================================
+GPA iter 1: delta=1.19668072
+GPA iter 3: delta=0.00029225
+GPA iter 6: delta=0.00006347
+GPA iter 9: delta=0.00002718
+bert : cos_after=0.8541 cos_to_mean=0.9865
+modern : cos_after=0.8577 cos_to_mean=0.9867
+cos(consensus, bert): 0.9867
+cos(consensus, modern): 0.9868
+Equidistance range: 0.0001 (should be near 0)
+
+Measuring consensus statistics...
+CV: 0.1771
+Mean cos: 0.0018
+Eff dim: 109.5
+Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]
+
+=================================================================
+PHASE 1: TRAIN STUDENT
+=================================================================
+Student: 11,269,632 params
+CV target: 0.1771
+E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223
+E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212
+E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182
+E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182
+E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182
+
+Student saved. v_cos=0.686, v_cv=0.182
+
+=================================================================
+PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+=================================================================
+Pre-encoding through frozen student...
+Student embeddings: torch.Size([18000, 768])
+Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
+Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
+Anchors: 512 initialized from consensus embeddings
+Targets: CV=0.1771, mean_cos=0.0018
+Bank: 2,921,088 params
+Bank targets: CV=0.1771, mean_cos=0.0018
+Calibrated disagreement (n=2000):
+cross_cos: 0.0794 ± 0.0035
+disagree_ratio: median=0.000000 mean=0.000000 std=0.000000
+expert_cos: 1.0000 ± 0.0000
+
+E 1: 1s loss=0.4789 v_loss=0.4172
+Geometry: b_cv=0.2688 e_cv=0.1603 spread=0.03940 a_max=0.652
+Experts: cos=0.794±0.006 agr=0.000092 ortho=0.000388
+Disagree: x_cos=0.0740±0.0009 ratio=0.004326 preserve=0.013135 norms=0.1626
+
+E 2: 1s loss=0.4002 v_loss=0.3818
+Geometry: b_cv=0.2229 e_cv=0.1588 spread=0.02779 a_max=0.668
+Experts: cos=0.807±0.006 agr=0.000007 ortho=0.000288
+Disagree: x_cos=0.0805±0.0014 ratio=0.003575 preserve=0.000024 norms=0.1703
+
+E 3: 1s loss=0.3743 v_loss=0.3625
+Geometry: b_cv=0.2189 e_cv=0.1606 spread=0.02500 a_max=0.670
+Experts: cos=0.835±0.005 agr=0.000005 ortho=0.000152
+Disagree: x_cos=0.0774±0.0018 ratio=0.002279 preserve=0.000016 norms=0.1066
+
+E 4: 1s loss=0.3591 v_loss=0.3615
+Geometry: b_cv=0.2100 e_cv=0.1643 spread=0.02302 a_max=0.670
+Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000094
+Disagree: x_cos=0.0781±0.0021 ratio=0.001569 preserve=0.000020 norms=0.1137
+
+E 5: 1s loss=0.3537 v_loss=0.3665
+Geometry: b_cv=0.2118 e_cv=0.1664 spread=0.02133 a_max=0.670
+Experts: cos=0.815±0.006 agr=0.000002 ortho=0.000066
+Disagree: x_cos=0.0765±0.0021 ratio=0.001389 preserve=0.000026 norms=0.1669
+
+E 6: 1s loss=0.3506 v_loss=0.3527
+Geometry: b_cv=0.2097 e_cv=0.1600 spread=0.02009 a_max=0.670
+Experts: cos=0.829±0.005 agr=0.000003 ortho=0.000048
+Disagree: x_cos=0.0846±0.0024 ratio=0.001772 preserve=0.000021 norms=0.1363
+
+E 7: 1s loss=0.3459 v_loss=0.3502
+Geometry: b_cv=0.2055 e_cv=0.1628 spread=0.01906 a_max=0.670
+Experts: cos=0.759±0.007 agr=0.000004 ortho=0.000040
+Disagree: x_cos=0.0774±0.0022 ratio=0.003070 preserve=0.000049 norms=0.1964
+
+E 8: 1s loss=0.3442 v_loss=0.3479
+Geometry: b_cv=0.2078 e_cv=0.1643 spread=0.01817 a_max=0.669
+Experts: cos=0.745±0.007 agr=0.000003 ortho=0.000033
+Disagree: x_cos=0.0782±0.0023 ratio=0.001258 preserve=0.000021 norms=0.1772
+
+E 9: 1s loss=0.3419 v_loss=0.3451
+Geometry: b_cv=0.2015 e_cv=0.1646 spread=0.01756 a_max=0.670
+Experts: cos=0.767±0.006 agr=0.000007 ortho=0.000030
+Disagree: x_cos=0.0823±0.0024 ratio=0.001625 preserve=0.000049 norms=0.2007
+
+E10: 1s loss=0.3433 v_loss=0.3433
+Geometry: b_cv=0.2074 e_cv=0.1594 spread=0.01746 a_max=0.669
+Experts: cos=0.762±0.005 agr=0.000006 ortho=0.000026
+Disagree: x_cos=0.0766±0.0018 ratio=0.001418 preserve=0.000073 norms=0.0529
+
+E11: 1s loss=0.3392 v_loss=0.3501
+Geometry: b_cv=0.2021 e_cv=0.1609 spread=0.01705 a_max=0.669
+Experts: cos=0.721±0.007 agr=0.000004 ortho=0.000026
+Disagree: x_cos=0.0698±0.0022 ratio=0.006405 preserve=0.000037 norms=0.1509
+
+E12: 1s loss=0.3383 v_loss=0.3534
+Geometry: b_cv=0.1983 e_cv=0.1639 spread=0.01693 a_max=0.668
+Experts: cos=0.753±0.005 agr=0.000014 ortho=0.000026
+Disagree: x_cos=0.0743±0.0021 ratio=0.000903 preserve=0.000076 norms=0.0763
+
+E13: 1s loss=0.3374 v_loss=0.3398
+Geometry: b_cv=0.1996 e_cv=0.1603 spread=0.01660 a_max=0.669
+Experts: cos=0.714±0.006 agr=0.000004 ortho=0.000022
+Disagree: x_cos=0.0791±0.0021 ratio=0.006335 preserve=0.000060 norms=0.1257
+
+E14: 1s loss=0.3376 v_loss=0.3415
+Geometry: b_cv=0.1992 e_cv=0.1657 spread=0.01647 a_max=0.669
+Experts: cos=0.704±0.006 agr=0.000006 ortho=0.000022
+Disagree: x_cos=0.0824±0.0021 ratio=0.006577 preserve=0.000061 norms=0.0873
+
+E15: 1s loss=0.3372 v_loss=0.3409
+Geometry: b_cv=0.2003 e_cv=0.1615 spread=0.01635 a_max=0.669
+Experts: cos=0.745±0.005 agr=0.000003 ortho=0.000019
+Disagree: x_cos=0.0760±0.0020 ratio=0.002660 preserve=0.000045 norms=0.0958
+
+E16: 1s loss=0.3355 v_loss=0.3328
+Geometry: b_cv=0.1990 e_cv=0.1601 spread=0.01600 a_max=0.669
+Experts: cos=0.689±0.005 agr=0.000004 ortho=0.000018
+Disagree: x_cos=0.0814±0.0024 ratio=0.002029 preserve=0.000042 norms=0.1414
+
+E17: 1s loss=0.3350 v_loss=0.3432
+Geometry: b_cv=0.1945 e_cv=0.1604 spread=0.01603 a_max=0.668
+Experts: cos=0.751±0.003 agr=0.000028 ortho=0.000020
+Disagree: x_cos=0.0825±0.0023 ratio=0.001129 preserve=0.000155 norms=0.0187
+
+E18: 1s loss=0.3372 v_loss=0.3336
+Geometry: b_cv=0.2044 e_cv=0.1605 spread=0.01590 a_max=0.668
+Experts: cos=0.720±0.003 agr=0.000004 ortho=0.000022
+Disagree: x_cos=0.0799±0.0020 ratio=0.002103 preserve=0.000055 norms=0.0331
+
+E19: 1s loss=0.3326 v_loss=0.3456
+Geometry: b_cv=0.1948 e_cv=0.1654 spread=0.01562 a_max=0.668
+Experts: cos=0.741±0.003 agr=0.000004 ortho=0.000021
+Disagree: x_cos=0.0797±0.0019 ratio=0.003153 preserve=0.000054 norms=0.0169
+
+E20: 1s loss=0.3351 v_loss=0.3460
+Geometry: b_cv=0.1992 e_cv=0.1596 spread=0.01567 a_max=0.668
+Experts: cos=0.725±0.005 agr=0.000002 ortho=0.000018
+Disagree: x_cos=0.0776±0.0023 ratio=0.008188 preserve=0.000053 norms=0.0326
+
+=================================================================
+PHASE 3: GEOMETRIC VERIFICATION
+=================================================================
+Passthrough: 1.000000 (target: 1.000)
+Emb CV: 0.1635 (consensus: 0.1771)
+Geo context CV: 0.1892
+Geo eff_dim: 30.7 / 128
+Expert cos: 0.725 ± 0.005
+Anchor max cos: 0.668
+Disagreement:
+Cross-expert: 0.0776 ± 0.0023
+Ratio: 0.008188 (target: 0.000000)
+Norm spread: 0.0326
+
+=================================================================
+PHASE 4: CLASSIFIER STABILITY TEST
+=================================================================
+
+Mode            Dim    Train   Val     Gap
+--------------------------------------------------
+raw_768         1536   0.498   0.357   0.141
+raw+diff        3072   0.567   0.475   0.092
+bank_enriched   1792   0.766   0.532   0.235
+bank+diff       3584   0.722   0.670   0.052
+geo_explicit    6      0.326   0.363   -0.037
+
+=================================================================
+SUMMARY
+=================================================================
+Consensus CV: 0.1771
+Consensus eff_dim:109.5
+Student v_cos: 0.686
+Student v_cv: 0.182
+Bank params: 2,921,088
+Bank geo_eff_dim: 30.7
+Bank geo_cv: 0.1892
+
+=================================================================
+DONE
+=================================================================
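
Note on the PHASE 4 table: the log does not define the Gap column, but the printed values are consistent with Gap = Train - Val up to rounding of the displayed three-decimal figures. The following is a minimal sketch that checks this reading against the rows copied from the log. The names `ROWS`, `gap_mismatches`, and `best_by_val` are hypothetical helpers introduced here for illustration; they are not part of the prototype script.

```python
# Rows copied verbatim from the PHASE 4 table: (mode, dim, train, val, logged_gap).
ROWS = [
    ("raw_768", 1536, 0.498, 0.357, 0.141),
    ("raw+diff", 3072, 0.567, 0.475, 0.092),
    ("bank_enriched", 1792, 0.766, 0.532, 0.235),
    ("bank+diff", 3584, 0.722, 0.670, 0.052),
    ("geo_explicit", 6, 0.326, 0.363, -0.037),
]


def gap_mismatches(rows, tol=0.0015):
    """Return the modes whose logged gap differs from train - val by more than tol.

    Assumption: Gap = Train - Val. The tolerance allows for the gap having been
    computed from unrounded accuracies before printing.
    """
    return [mode for mode, _, train, val, gap in rows
            if abs((train - val) - gap) > tol]


def best_by_val(rows):
    """Return the mode with the highest validation accuracy."""
    return max(rows, key=lambda row: row[3])[0]


if __name__ == "__main__":
    print("rows inconsistent with Gap = Train - Val:", gap_mismatches(ROWS))
    print("highest validation accuracy:", best_by_val(ROWS))
```

Under this reading, every row agrees with the logged gap within rounding, and `bank+diff` has the highest validation accuracy in the table (0.670).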