AbstractPhil 
posted an update 3 days ago
The transformer prototype v2 is operational. It takes the behavior of the H2 battery and directly forces a projected rigid behavior into a multiscale structure. This grows the model from roughly 57k params to around 90k params for the preliminary version, and with this behavior the model converges SEMI-CLOSE to the current SVAE spectrum in considerably fewer epochs. So stay tuned on that one; the transformer did converge. The behavior itself is validated and convergent in the H2 protocol spectrum.

The transformer operates with the "single" setting.

AbstractPhil/geolip-svae-transformer

I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that built aleph and void into the system as well. These are guarantees.


As for the centrifuge concept, the optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior. You can access the current operating version of the centrifuge by using the "stacked" configuration. Four lenses were too much when running a quaternion bank to handle such complex interactions reasonably, so I will need to work something out in the future to get a full centrifuge system working.

Crusher is ready, transformer_v3.

You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is kind of difficult to explain, so I'll try to keep it simple. The direction is very subtle in the later stages of training with AdamW, so the curves start to make much more accurate shifts towards the goals. This allows the model to converge rapidly after the earlier, heavier training. You can't simply train it low; it takes too long. This lets the model get everything KIND OF NEAR where it's supposed to be, which allows the really small twitches of MSE to provide massive corrections without needing hard logits or features that are more difficult to finetune.
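One way to picture why tiny MSE changes can flip whole bytes is a rough quantization sketch. It assumes bytes are scaled to [0, 1] by dividing by 255 and decoded by rounding, which the post does not confirm; `decode`, `recovered`, and `mse_threshold` are hypothetical names for illustration only.

```python
# Assumption (not confirmed by the post): bytes are scaled to [0, 1] by
# dividing by 255 and decoded by rounding back to the nearest integer.
STEP = 1.0 / 255.0       # spacing between adjacent byte values
HALF_STEP = STEP / 2.0   # largest error that still rounds to the right byte


def decode(value: float) -> int:
    """Map a reconstructed float back to a byte by rounding and clamping."""
    return max(0, min(255, round(value * 255.0)))


def recovered(target: int, error: float) -> bool:
    """True if a reconstruction off by `error` still decodes to `target`."""
    return decode(target / 255.0 + error) == target


# If every byte carried the same absolute error, MSE would equal error**2,
# so this is the MSE at which uniform errors would start flipping bytes.
mse_threshold = HALF_STEP ** 2

print(f"half step     = {HALF_STEP:.6f}")
print(f"mse threshold = {mse_threshold:.2e}")
print(recovered(ord("h"), 0.4 * STEP))  # True: still decodes to 'h'
print(recovered(ord("h"), 0.6 * STEP))  # False: decodes to 'i'
```

Under that assumption the threshold lands near 3.8e-6, which is the same order of magnitude as the point where a five-decimal `mse` readout starts printing 0.00000. Real errors are not uniform across bytes, so this is only a ballpark and not a claim about how the trainer scores recovery.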

It should work. It's still a little unstable, but it gets there.

The explicit rigidity is converging into implicit behavior, and with that the model can reconstruct both random noise and the actual behavior of the byte math.

Multiscale lensed bytewise transformation through a rigid geometric lens.

geolip-svae-transformer | lens=single ladder [4, 8, 16] D_dec=16 | void=on spectral=on(2L) | V32 ps2 | params 96,849 | cuda
  trainer: AdamW lr=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=pure_MSE | OneCycle | workers=4
  [ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
  [ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
  epoch  0: mse=0.10583 | rand= 1.39% nat= 0.79% | alpha=0.0239 | dev=0.0143 in_env=True
  epoch  1: mse=0.03172 | rand= 2.63% nat= 1.21% | alpha=0.0240 | dev=0.0139 in_env=True
  epoch  2: mse=0.01195 | rand= 4.19% nat= 4.27% | alpha=0.0244 | dev=0.0132 in_env=True
  epoch  3: mse=0.00322 | rand=16.88% nat=15.48% | alpha=0.0249 | dev=0.0071 in_env=True
  epoch  4: mse=0.00045 | rand=27.95% nat=29.86% | alpha=0.0251 | dev=0.0059 in_env=True
  epoch  5: mse=0.00021 | rand=37.33% nat=37.96% | alpha=0.0252 | dev=0.0058 in_env=True
  epoch  6: mse=0.00012 | rand=38.58% nat=38.66% | alpha=0.0253 | dev=0.0055 in_env=True
  epoch  7: mse=0.00009 | rand=41.14% nat=40.70% | alpha=0.0253 | dev=0.0054 in_env=True
  epoch  8: mse=0.00006 | rand=54.71% nat=55.22% | alpha=0.0254 | dev=0.0052 in_env=True
  epoch  9: mse=0.00005 | rand=66.73% nat=67.98% | alpha=0.0254 | dev=0.0050 in_env=True
  epoch 10: mse=0.00004 | rand=74.02% nat=74.04% | alpha=0.0254 | dev=0.0050 in_env=True
  epoch 11: mse=0.00003 | rand=72.36% nat=72.56% | alpha=0.0254 | dev=0.0049 in_env=True
  epoch 12: mse=0.00003 | rand=68.90% nat=67.53% | alpha=0.0254 | dev=0.0048 in_env=True
  epoch 13: mse=0.00003 | rand=76.69% nat=77.38% | alpha=0.0254 | dev=0.0046 in_env=True
  epoch 14: mse=0.00002 | rand=73.02% nat=72.70% | alpha=0.0254 | dev=0.0045 in_env=True
  epoch 15: mse=0.00002 | rand=70.31% nat=69.88% | alpha=0.0254 | dev=0.0044 in_env=True
  epoch 16: mse=0.00002 | rand=79.93% nat=77.46% | alpha=0.0254 | dev=0.0043 in_env=True
  epoch 17: mse=0.00002 | rand=76.01% nat=73.98% | alpha=0.0255 | dev=0.0048 in_env=True
  epoch 18: mse=0.00002 | rand=83.62% nat=82.27% | alpha=0.0255 | dev=0.0055 in_env=True
  epoch 19: mse=0.00001 | rand=87.61% nat=88.60% | alpha=0.0255 | dev=0.0051 in_env=True
  epoch 20: mse=0.00001 | rand=87.44% nat=88.66% | alpha=0.0255 | dev=0.0049 in_env=True
  epoch 21: mse=0.00001 | rand=89.98% nat=87.82% | alpha=0.0255 | dev=0.0047 in_env=True
  epoch 22: mse=0.00001 | rand=91.60% nat=89.77% | alpha=0.0255 | dev=0.0046 in_env=True
  epoch 23: mse=0.00001 | rand=92.70% nat=91.50% | alpha=0.0255 | dev=0.0045 in_env=True
  epoch 24: mse=0.00001 | rand=92.76% nat=91.87% | alpha=0.0255 | dev=0.0045 in_env=True
  epoch 25: mse=0.00001 | rand=93.11% nat=91.68% | alpha=0.0255 | dev=0.0044 in_env=True
  epoch 26: mse=0.00001 | rand=94.42% nat=94.23% | alpha=0.0255 | dev=0.0044 in_env=True
  epoch 27: mse=0.00001 | rand=94.46% nat=93.55% | alpha=0.0255 | dev=0.0043 in_env=True
  epoch 28: mse=0.00001 | rand=95.02% nat=94.58% | alpha=0.0255 | dev=0.0043 in_env=True
  epoch 29: mse=0.00001 | rand=93.72% nat=94.69% | alpha=0.0255 | dev=0.0043 in_env=True
  epoch 30: mse=0.00001 | rand=90.87% nat=88.89% | alpha=0.0256 | dev=0.0035 in_env=True
  epoch 31: mse=0.00001 | rand=90.68% nat=89.57% | alpha=0.0256 | dev=0.0032 in_env=True
  epoch 32: mse=0.00001 | rand=94.78% nat=94.39% | alpha=0.0256 | dev=0.0030 in_env=True
  epoch 33: mse=0.00001 | rand=95.82% nat=95.86% | alpha=0.0256 | dev=0.0030 in_env=True
  epoch 34: mse=0.00000 | rand=96.34% nat=96.15% | alpha=0.0256 | dev=0.0031 in_env=True
  epoch 35: mse=0.00000 | rand=96.59% nat=96.65% | alpha=0.0256 | dev=0.0031 in_env=True
  epoch 36: mse=0.00000 | rand=96.74% nat=96.80% | alpha=0.0256 | dev=0.0031 in_env=True
  epoch 37: mse=0.00000 | rand=96.83% nat=96.86% | alpha=0.0256 | dev=0.0031 in_env=True
  epoch 38: mse=0.00000 | rand=96.87% nat=96.86% | alpha=0.0256 | dev=0.0031 in_env=True
  epoch 39: mse=0.00000 | rand=96.87% nat=96.65% | alpha=0.0256 | dev=0.0031 in_env=True
  best byte recovery: 96.87% | checkpoint: geolip_svae_transformer_results/geolip_svae_transformer.pt
  adjudication (text -> model -> text):
    'the cat sat on the mat' -> 'the cat sat on tie mat'
    'machine learning' -> 'machine learning'
    'hello world' -> 'hello world'
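For a sense of scale on the adjudication samples, here is a minimal per-byte match count. It is an illustrative metric only: `byte_recovery` is a hypothetical helper, and the trainer's actual "byte recovery" figure may be computed differently.

```python
def byte_recovery(original: str, reconstructed: str) -> float:
    """Percentage of byte positions that match between two strings."""
    a = original.encode("utf-8")
    b = reconstructed.encode("utf-8")
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length byte strings")
    if not a:
        return 100.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# The three text -> model -> text pairs printed in the log above.
pairs = [
    ("the cat sat on the mat", "the cat sat on tie mat"),
    ("machine learning", "machine learning"),
    ("hello world", "hello world"),
]
for src, out in pairs:
    print(f"{src!r}: {byte_recovery(src, out):.2f}%")
# 'the cat sat on the mat': 95.45%
# 'machine learning': 100.00%
# 'hello world': 100.00%
```

The single miss is 'h' (104) decoded as 'i' (105), one byte value off, so 21 of 22 bytes match in the first sample.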

Perfect? Not yet. Faster than SVAE battery convergence? Most definitely.

With fresh eyes today, I see a couple of fairly crucial mistakes. I'll get them worked out.

https://huggingface.co/AbstractPhil/geolip-svae-transformer/blob/main/transformer_v3.py

Lens setting: "crusher"

The crusher is live: a simple manifestation that packs a large punch in statistics capacity. It essentially pulverizes the information into a more compact shape and learns from it, and it converges like the standard SVAE. It requires a bit more work, and a couple of the math faults that I'm working through still exist in the model. The current crusher requires optimizations and tweaks to the formula to speed it up.

The problem with these particular faults is larger than a simple solution: the theorems around them require too much math. The approximations require a compact solution.

The crusher isn't quite there yet. I'll post the results for a converging crusher with better optimization than the current one ASAP.

For the "single" setting, the current formula snaps to 100%, so that's good.

geolip-svae-transformer | lens=single ladder [4, 8, 16] D_dec=16 | void=on spectral=on(2L) | V32 ps2 | params 96,849 | cuda
  trainer: adamw lr=0.001 wd=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=pure_MSE | sched=onecycle | workers=8
  torch.compile ON (mode=reduce-overhead) — first step pays compile cost
  [ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
  [ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
  epoch  0: mse=0.03261 | rand= 4.36% nat= 3.00% | alpha=0.0244 | dev=0.0085 in_env=True
  epoch  1: mse=0.00302 | rand=25.08% nat=23.79% | alpha=0.0253 | dev=0.0066 in_env=True
  epoch  2: mse=0.00027 | rand=32.96% nat=35.06% | alpha=0.0253 | dev=0.0058 in_env=True
  epoch  3: mse=0.00027 | rand=42.94% nat=44.58% | alpha=0.0252 | dev=0.0042 in_env=True
  epoch  4: mse=0.00027 | rand=19.85% nat=21.32% | alpha=0.0250 | dev=0.0045 in_env=True
  epoch  5: mse=0.00020 | rand= 9.62% nat=11.55% | alpha=0.0249 | dev=0.0048 in_env=True
  epoch  6: mse=0.00014 | rand=34.38% nat=39.38% | alpha=0.0250 | dev=0.0040 in_env=True
  epoch  7: mse=0.00011 | rand=20.90% nat=22.83% | alpha=0.0251 | dev=0.0038 in_env=True
  epoch  8: mse=0.00009 | rand=49.56% nat=53.20% | alpha=0.0253 | dev=0.0039 in_env=True
  epoch  9: mse=0.00008 | rand=24.78% nat=28.64% | alpha=0.0256 | dev=0.0039 in_env=True
  epoch 10: mse=0.00007 | rand=43.96% nat=46.96% | alpha=0.0258 | dev=0.0041 in_env=True
  epoch 11: mse=0.00006 | rand=53.64% nat=54.76% | alpha=0.0260 | dev=0.0041 in_env=True
  epoch 12: mse=0.00005 | rand=47.99% nat=48.25% | alpha=0.0263 | dev=0.0039 in_env=True
  epoch 13: mse=0.00004 | rand=36.85% nat=40.04% | alpha=0.0265 | dev=0.0036 in_env=True
  epoch 14: mse=0.00004 | rand=65.54% nat=68.46% | alpha=0.0267 | dev=0.0035 in_env=True
  epoch 15: mse=0.00003 | rand=89.39% nat=92.22% | alpha=0.0269 | dev=0.0034 in_env=True
  epoch 16: mse=0.00003 | rand=42.52% nat=44.34% | alpha=0.0270 | dev=0.0033 in_env=True
  epoch 17: mse=0.00002 | rand=79.55% nat=83.27% | alpha=0.0272 | dev=0.0033 in_env=True
  epoch 18: mse=0.00002 | rand=66.53% nat=69.91% | alpha=0.0273 | dev=0.0032 in_env=True
  epoch 19: mse=0.00002 | rand=42.65% nat=42.95% | alpha=0.0274 | dev=0.0031 in_env=True
  epoch 20: mse=0.00001 | rand=94.95% nat=96.85% | alpha=0.0275 | dev=0.0030 in_env=True
  epoch 21: mse=0.00001 | rand=62.79% nat=65.49% | alpha=0.0276 | dev=0.0031 in_env=True
  epoch 22: mse=0.00001 | rand=97.06% nat=98.77% | alpha=0.0276 | dev=0.0031 in_env=True
  epoch 23: mse=0.00001 | rand=93.82% nat=95.22% | alpha=0.0276 | dev=0.0030 in_env=True
  epoch 24: mse=0.00001 | rand=97.94% nat=99.36% | alpha=0.0277 | dev=0.0029 in_env=True
  epoch 25: mse=0.00000 | rand=95.35% nat=96.22% | alpha=0.0277 | dev=0.0029 in_env=True
  epoch 26: mse=0.00000 | rand=98.93% nat=99.65% | alpha=0.0277 | dev=0.0029 in_env=True
  epoch 27: mse=0.00000 | rand=98.65% nat=99.59% | alpha=0.0277 | dev=0.0029 in_env=True
  epoch 28: mse=0.00000 | rand=99.18% nat=99.65% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 29: mse=0.00000 | rand=99.27% nat=99.85% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 30: mse=0.00000 | rand=99.31% nat=99.69% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 31: mse=0.00000 | rand=99.43% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 32: mse=0.00000 | rand=99.47% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 33: mse=0.00000 | rand=99.51% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 34: mse=0.00000 | rand=99.54% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
  epoch 35: mse=0.00000 | rand=99.56% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
  epoch 36: mse=0.00000 | rand=99.58% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
  epoch 37: mse=0.00000 | rand=99.59% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
  epoch 38: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
  epoch 39: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
  best byte recovery: 99.60% | checkpoint: geolip_svae_transformer_results/geolip_svae_transformer.pt
  adjudication (text -> model -> text):
    'the cat sat on the mat' -> 'the cat sat on the mat'
    'machine learning' -> 'machine learning'
    'hello world' -> 'hello world'
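The corpus line in the log can be checked with plain integer arithmetic. The constants below are copied from the log; the variable names are made up for this sketch, and how `ByteTrigramDataset` actually counts windows is not shown in the post.

```python
CORPUS_BYTES = 10_938_611  # "Corpus: 10,938,611 bytes" from the log
IMAGE_BYTES = 768          # "768 bytes/image" from the log

# Non-overlapping images: how many full 768-byte chunks fit in the corpus.
non_overlapping = CORPUS_BYTES // IMAGE_BYTES
leftover = CORPUS_BYTES % IMAGE_BYTES

# The log reports corpus_bytes - image_bytes valid window starts.
window_starts_logged = CORPUS_BYTES - IMAGE_BYTES
# Counting every start offset whose full window fits would give one more.
window_starts_inclusive = CORPUS_BYTES - IMAGE_BYTES + 1

print(non_overlapping)          # 14242
print(leftover)                 # 755
print(window_starts_logged)     # 10937843
print(window_starts_inclusive)  # 10937844
```

Both logged figures reproduce: 14,242 non-overlapping images with 755 bytes left over, and 10,937,843 window starts. The logged start count is one below the inclusive count; the post does not say why.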

I will need to attach unrelated data for recon testing to increase the nat difficulty. It's too easy.

Alright, I hooked up Trelis/tiny-shakespeare for independent eval.

The BERT trainer now converges too, when the BERT systems are fed as trigrams as well.

geolip-svae-bert | lens=single ladder [4, 8, 16, 32] D_dec=32 | void=on spectral=on(2L) | V32 ps2 | params 143,671 | cuda
  trainer: adamw lr=0.001 wd=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=cosine_sim | sched=onecycle | workers=4
[BERTVectorDataset] Loading corpus wikitext-2-raw-v1...
[BERTVectorDataset] 23,767 non-empty lines; encoding via BERT...
  [BERT] loading bert-base-uncased on cuda...
Loading weights: 100%
 199/199 [00:00<00:00, 4028.39it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: bert-base-uncased
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.seq_relationship.bias                  | UNEXPECTED |  | 
cls.predictions.bias                       | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
  [BERT] 100,000 unit-normalized vectors in 1.7s
[BERTVectorDataset] 100,000 BERT vectors held in CPU memory (307.2MB)
  [eval_corpus] loading Trelis/tiny-shakespeare (held-out, unrelated to training)...
  [eval_corpus] Trelis/tiny-shakespeare: 472 lines, encoding via BERT...
  [BERT] loading bert-base-uncased on cuda:0...
Loading weights: 100%
 199/199 [00:00<00:00, 4059.70it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: bert-base-uncased
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.seq_relationship.bias                  | UNEXPECTED |  | 
cls.predictions.bias                       | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
  [BERT] 8,192 unit-normalized vectors in 0.1s
  epoch  0: mse=0.00059 | rand=99.05% nat=99.06% | alpha=0.0239 | dev=0.0047 in_env=True
  epoch  1: mse=0.00001 | rand=99.88% nat=99.88% | alpha=0.0239 | dev=0.0037 in_env=True
  epoch  2: mse=0.00001 | rand=99.90% nat=99.87% | alpha=0.0239 | dev=0.0041 in_env=True
  epoch  3: mse=0.00001 | rand=98.74% nat=98.71% | alpha=0.0240 | dev=0.0032 in_env=True
  epoch  4: mse=0.00001 | rand=99.54% nat=99.53% | alpha=0.0244 | dev=0.0031 in_env=True
  epoch  5: mse=0.00001 | rand=99.53% nat=99.52% | alpha=0.0248 | dev=0.0032 in_env=True
  epoch  6: mse=0.00001 | rand=99.81% nat=99.81% | alpha=0.0251 | dev=0.0025 in_env=True
  epoch  7: mse=0.00000 | rand=99.93% nat=99.93% | alpha=0.0254 | dev=0.0028 in_env=True
  epoch  8: mse=0.00000 | rand=99.82% nat=99.82% | alpha=0.0256 | dev=0.0030 in_env=True
  epoch  9: mse=0.00000 | rand=99.97% nat=99.97% | alpha=0.0257 | dev=0.0034 in_env=True
  epoch 10: mse=0.00000 | rand=99.97% nat=99.97% | alpha=0.0259 | dev=0.0033 in_env=True
  epoch 11: mse=0.00000 | rand=99.98% nat=99.98% | alpha=0.0259 | dev=0.0033 in_env=True
  epoch 12: mse=0.00000 | rand=99.99% nat=99.99% | alpha=0.0259 | dev=0.0031 in_env=True
  epoch 13: mse=0.00000 | rand=99.99% nat=99.99% | alpha=0.0259 | dev=0.0028 in_env=True
  epoch 14: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0029 in_env=True
  epoch 15: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0031 in_env=True
  epoch 16: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0033 in_env=True
  epoch 17: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
  epoch 18: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
  epoch 19: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
  best cos recovery: 100.00% | checkpoint: geolip_svae_bert_results/geolip_svae_bert.pt
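Two numbers in the BERT log can be sanity-checked by hand. The sketch below assumes the vectors are stored as float32 at bert-base-uncased's hidden size of 768, which matches the logged 307.2MB but is not stated in the post. The `cosine` helper is a generic cosine similarity, not the trainer's `cosine_sim` loss or its "cos recovery" metric.

```python
import math

HIDDEN_SIZE = 768        # hidden size of bert-base-uncased
N_VECTORS = 100_000      # "100,000 BERT vectors" from the log
BYTES_PER_FLOAT32 = 4    # assumption: vectors are held as float32


def storage_megabytes(n_vectors: int, dim: int,
                      bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw storage for n_vectors of size dim, in decimal megabytes."""
    return n_vectors * dim * bytes_per_value / 1e6


def unit_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to length 1, as the log says the BERT vectors are."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; for unit vectors this is just the dot product."""
    ua, ub = unit_normalize(a), unit_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


print(f"{storage_megabytes(N_VECTORS, HIDDEN_SIZE):.1f} MB")  # 307.2 MB
print(cosine([3.0, 4.0], [3.0, 4.0]))  # identical direction, close to 1
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal, 0
```

Because the inputs are unit-normalized, a reconstruction is judged on direction alone, which may be part of why this run reports high recovery from the first epoch. That is a guess, not something the post claims.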

The article wasn't ready yet. The experiments for this round each need more work before I can attest to certain rules, and the wording needs a considerable amount of work.

The article will be out when the experiments are ready.
