https://huggingface.co/AbstractPhil/geolip-svae-transformer/blob/main/transformer_v3.py
Lens setting: "crusher"
The crusher is live: a simple manifestation that packs a large punch for the statistics capacity. It essentially pulverizes the information into a more compact shape and learns from that, and it converges like the standard SVAE. It still needs a bit more work, and a couple of math faults remain in the model that I'm working through. The current crusher also needs optimizations and tweaks to the formula to speed it up.
The problem with these particular faults is bigger than a simple fix: the theorems around them require too much math, so the approximations need a compact solution.
The crusher isn't quite there yet. I'll post results for a converging crusher with better optimization than the current one ASAP.
For the single lens, the current formula snaps to 100%, so that's good.
geolip-svae-transformer | lens=single ladder [4, 8, 16] D_dec=16 | void=on spectral=on(2L) | V32 ps2 | params 96,849 | cuda
trainer: adamw lr=0.001 wd=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=pure_MSE | sched=onecycle | workers=8
torch.compile ON (mode=reduce-overhead) — first step pays compile cost
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: mse=0.03261 | rand= 4.36% nat= 3.00% | alpha=0.0244 | dev=0.0085 in_env=True
epoch 1: mse=0.00302 | rand=25.08% nat=23.79% | alpha=0.0253 | dev=0.0066 in_env=True
epoch 2: mse=0.00027 | rand=32.96% nat=35.06% | alpha=0.0253 | dev=0.0058 in_env=True
epoch 3: mse=0.00027 | rand=42.94% nat=44.58% | alpha=0.0252 | dev=0.0042 in_env=True
epoch 4: mse=0.00027 | rand=19.85% nat=21.32% | alpha=0.0250 | dev=0.0045 in_env=True
epoch 5: mse=0.00020 | rand= 9.62% nat=11.55% | alpha=0.0249 | dev=0.0048 in_env=True
epoch 6: mse=0.00014 | rand=34.38% nat=39.38% | alpha=0.0250 | dev=0.0040 in_env=True
epoch 7: mse=0.00011 | rand=20.90% nat=22.83% | alpha=0.0251 | dev=0.0038 in_env=True
epoch 8: mse=0.00009 | rand=49.56% nat=53.20% | alpha=0.0253 | dev=0.0039 in_env=True
epoch 9: mse=0.00008 | rand=24.78% nat=28.64% | alpha=0.0256 | dev=0.0039 in_env=True
epoch 10: mse=0.00007 | rand=43.96% nat=46.96% | alpha=0.0258 | dev=0.0041 in_env=True
epoch 11: mse=0.00006 | rand=53.64% nat=54.76% | alpha=0.0260 | dev=0.0041 in_env=True
epoch 12: mse=0.00005 | rand=47.99% nat=48.25% | alpha=0.0263 | dev=0.0039 in_env=True
epoch 13: mse=0.00004 | rand=36.85% nat=40.04% | alpha=0.0265 | dev=0.0036 in_env=True
epoch 14: mse=0.00004 | rand=65.54% nat=68.46% | alpha=0.0267 | dev=0.0035 in_env=True
epoch 15: mse=0.00003 | rand=89.39% nat=92.22% | alpha=0.0269 | dev=0.0034 in_env=True
epoch 16: mse=0.00003 | rand=42.52% nat=44.34% | alpha=0.0270 | dev=0.0033 in_env=True
epoch 17: mse=0.00002 | rand=79.55% nat=83.27% | alpha=0.0272 | dev=0.0033 in_env=True
epoch 18: mse=0.00002 | rand=66.53% nat=69.91% | alpha=0.0273 | dev=0.0032 in_env=True
epoch 19: mse=0.00002 | rand=42.65% nat=42.95% | alpha=0.0274 | dev=0.0031 in_env=True
epoch 20: mse=0.00001 | rand=94.95% nat=96.85% | alpha=0.0275 | dev=0.0030 in_env=True
epoch 21: mse=0.00001 | rand=62.79% nat=65.49% | alpha=0.0276 | dev=0.0031 in_env=True
epoch 22: mse=0.00001 | rand=97.06% nat=98.77% | alpha=0.0276 | dev=0.0031 in_env=True
epoch 23: mse=0.00001 | rand=93.82% nat=95.22% | alpha=0.0276 | dev=0.0030 in_env=True
epoch 24: mse=0.00001 | rand=97.94% nat=99.36% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 25: mse=0.00000 | rand=95.35% nat=96.22% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 26: mse=0.00000 | rand=98.93% nat=99.65% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 27: mse=0.00000 | rand=98.65% nat=99.59% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 28: mse=0.00000 | rand=99.18% nat=99.65% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 29: mse=0.00000 | rand=99.27% nat=99.85% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 30: mse=0.00000 | rand=99.31% nat=99.69% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 31: mse=0.00000 | rand=99.43% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 32: mse=0.00000 | rand=99.47% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 33: mse=0.00000 | rand=99.51% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 34: mse=0.00000 | rand=99.54% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 35: mse=0.00000 | rand=99.56% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 36: mse=0.00000 | rand=99.58% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 37: mse=0.00000 | rand=99.59% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 38: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 39: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
best byte recovery: 99.60% | checkpoint: geolip_svae_transformer_results/geolip_svae_transformer.pt
adjudication (text -> model -> text):
'the cat sat on the mat' -> 'the cat sat on the mat'
'machine learning' -> 'machine learning'
'hello world' -> 'hello world'
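For anyone reading the rand/nat percentages above, here is a minimal sketch of how a byte recovery score could be computed. It assumes the metric is the percentage of byte positions where the reconstruction equals the input; the function name byte_recovery is my own for illustration and is not taken from transformer_v3.py.

```python
def byte_recovery(original: bytes, reconstructed: bytes) -> float:
    """Percentage of positions where the reconstructed byte equals the original.

    Assumed definition of the metric, for illustration only.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    if not original:
        return 100.0  # nothing to get wrong
    matches = sum(a == b for a, b in zip(original, reconstructed))
    return 100.0 * matches / len(original)


# Round-trip check in the style of the adjudication lines:
# a perfect text -> model -> text pass scores 100%.
text = "the cat sat on the mat".encode("utf-8")
print(f"{byte_recovery(text, text):.2f}%")
```

Under this definition a single wrong byte in a 768-byte image already costs about 0.13%, which is consistent with recovery still climbing after the MSE prints as 0.00000.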
I'll need to attach unrelated data for recon testing to increase the nat difficulty. It's too easy right now.
Alright, I hooked up Trelis/tiny-shakespeare for independent eval.
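A rough sketch of how a held-out corpus could be cut into the same 768-byte images for eval. It assumes the eval text is already in memory as bytes. The helper non_overlapping_images and the stand-in corpus are hypothetical, not the actual ByteTrigramDataset code. The last lines just re-derive the wikitext-2 counts from the log.

```python
IMAGE_BYTES = 768  # bytes per image, as reported in the dataset log line


def non_overlapping_images(corpus: bytes, image_bytes: int = IMAGE_BYTES) -> list:
    """Split a corpus into consecutive fixed-size chunks, dropping the short tail."""
    n_images = len(corpus) // image_bytes
    return [corpus[i * image_bytes:(i + 1) * image_bytes] for i in range(n_images)]


# Stand-in eval corpus (hypothetical); in practice this would be the held-out text.
eval_corpus = ("To be, or not to be, that is the question. " * 100).encode("utf-8")
eval_images = non_overlapping_images(eval_corpus)
print(len(eval_images), "eval images of", IMAGE_BYTES, "bytes each")

# Sanity check against the wikitext-2 numbers in the log.
corpus_bytes = 10_938_611
n_images = corpus_bytes // IMAGE_BYTES      # non-overlapping images
window_starts = corpus_bytes - IMAGE_BYTES  # matches the logged count
print(n_images, window_starts)
```

Note that the logged window-start count equals N - 768 rather than the inclusive N - 768 + 1. I have not checked the reason for that in the dataset code.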