AbstractPhil posted an update 5 days ago
Today, I'll be determining the codebook capacity and utility potential for the larger batteries: Fresnel, Johanna, Grandmaster, Freckles, and the Johanna-F variants. This should give a good indication of which models are capable of handling codebooks and which are more errant. The earlier models all use SVD while the later ones do not; the differences are noted per model, and the behavior is divergent.

I anticipate the D=16 variants will be more errant, and the final-state variants of those could well be much more difficult or costly to inference, as their axis bends are likely considerably harder to track. However, I'm confident that enough bounces will give the yield required, so I'll set up some high-yield noise barrages to determine how much we can in fact extract from Johanna, and then set up similar barrages for images to map the internals of Fresnel and Grandmaster.

Grandmaster will be tricky, as it was an experimental Johanna-256 finetuned series meant to map sigma-noised image inputs to recreate Fresnel behavioral output: a noised image goes in -> a Fresnel-grade replication comes out in high res.

This allowed preliminary Dall-E Mini-esque VAE generation and will be explored further for the stereoscopic translation subsystem, to allow image generation in the unique diffusion format I was working out. I anticipate this system will be more than capable of making monstrosities, so I won't be posting TOO MANY prelims on this one, but the high-capacity potential of these noise makers is meaningfully powerful. Getting uniform codebooks in place for these models will allow full transformer mapping downstream, instead of just guess-working the MSE piecemeal as the earlier versions and variants were doing.

I'm straying from CLS specifically for this series because CLS creates adjudicated pools of bias orbiting the INCORRECT orbiter in some SVAE variants. The orbital target IS the soft-hand accumulated bias with the sphere-norm, so having a competitor isn't going to be a good option.

I have a few diffuser prototypes that I'll be exploring now that the full array system is in order. One that I've been very much wanting to approach is sigma-degrading interpolation manifolding.

In other words, you take an H2 Fresnel expert and snap it in. Say I train a CIFAR-100 variant and finetune it with, oh, maybe 50 epochs of reconstruction from the Fresnel-512 with various levels of noise applied to Fresnel, not using cutmix or anything odd like that.

Next we finetune our array. Say we want 1000 steps: we divide the number of adjudicated states by how many states of noise we want to see. Our finetuned batteries are then run with, oh... maybe 500 batches of images each, applying scheduled noise instead of the random noise the H2 batteries were primarily trained with, which should fit within a 10-minute training session or so. The batteries are pooled into the battery array and uploaded as a standard battery array for reuse in safetensor format, with the optimizer states uploaded alongside in an adjacent repo.

So the process is simple: noised image in, replicate the next stage of noise down the chain. Each battery is meant to denoise by one step, and the results collapse into patchworked behavioral training data for a downstream model.
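A minimal sketch of what one battery's training step could look like under this scheme. Everything here is a stand-in, not the repo's API: `Battery` is a toy module, and the evenly spaced sigmas are my assumed reading of "scheduled noise" with 1000 internal steps divided across the noise states.

```python
import torch
import torch.nn as nn

def make_sigmas(n_steps: int, sigma_max: float = 1.0) -> torch.Tensor:
    # Scheduled (not random) noise: evenly spaced sigma levels from
    # sigma_max down to 0, one hop per battery.
    return torch.linspace(sigma_max, 0.0, n_steps + 1)

class Battery(nn.Module):
    """Toy stand-in for one per-step expert."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.GELU(), nn.Linear(256, dim))
    def forward(self, x):
        return self.net(x)

def train_step(battery, opt, clean, sigma_in, sigma_out):
    # Noised image in -> replicate the next stage of noise down the chain.
    noisy_in = clean + sigma_in * torch.randn_like(clean)
    target = clean + sigma_out * torch.randn_like(clean)
    loss = nn.functional.mse_loss(battery(noisy_in), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

sigmas = make_sigmas(n_steps=10)               # 10 batteries -> 10 noise hops
batteries = [Battery() for _ in sigmas[:-1]]   # one expert per hop
```

Each battery `i` would then be trained only on the `(sigmas[i], sigmas[i+1])` pair, so every expert owns exactly one denoising hop.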

We then take each of these variants and blow them up, creating scanner manifolds of each and collapsing the weights into a single linear batched pass, which will be roughly 500 MB of VRAM or so per sigma attempted.

Finally we stack our entire sequence up and hook the pieces together with MLP collapse, and at each level inject the original image with the correct noise value. So say you have 10 batteries meant to target 10 noise steps. You now have a 10-step reconstruction generator that runs once and, boom, your image pops out nearly instantly.
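A rough sketch of how that stack might be wired. The `StackedDenoiser` name, the per-junction MLP collapse, and the way the freshly noised original is re-injected are all my guesses at the described wiring, not the actual code:

```python
import torch
import torch.nn as nn

class StackedDenoiser(nn.Module):
    """Hypothetical single-pass stack: battery -> MLP collapse -> next battery,
    with the original image re-injected at each level's noise value."""
    def __init__(self, batteries, dim: int, sigmas):
        super().__init__()
        self.batteries = nn.ModuleList(batteries)
        # One MLP collapse per junction: mixes the battery output with the
        # re-noised original before the next battery sees it.
        self.collapse = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU()) for _ in batteries
        )
        self.register_buffer("sigmas", torch.as_tensor(sigmas))

    def forward(self, noisy, original):
        x = noisy
        for battery, mix, sigma in zip(self.batteries, self.collapse, self.sigmas):
            x = battery(x)
            reinjected = original + sigma * torch.randn_like(original)
            x = mix(torch.cat([x, reinjected], dim=-1))
        return x  # one forward pass, reconstruction out
```

With 10 batteries targeting 10 noise steps, one call to `forward` walks the whole chain, which is what makes the "runs once" behavior possible.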

Alright, now if we space that out using Adam's standard internal step of 1000, we'll have our roughly 1/100 sigma hoppers. This will be our blueprint.

With this we can distill a diffusion model's VAE expectation into what we want, and guarantee the output is fully prepared for step-hop skipping.

So each of these is fed into a singular transformer structure that sees what the original diffuser's standard diffusion step produced, and boom, you have yourself a pixel-synthesis skip process. You've effectively skipped the entirety of the diffusion process with the correct layout.

This will also require the Alexandria stage, so it will take time to process and pool the necessary informational accumulations and relational capacity to make that portion perfect. However, with some more work Alexandria's text distribution system will be ready to go, and the distiller will be ready to consume high-yield diffuser technologies like Flux and the like.

This will allow not only compacting massive amounts of information into embedded solvers, but also, with enough processing and data, cellphone-sized image generation that creates Flux-grade images or better.

The technology is there, the experiments yielded, the answers present, the results show this is more than possible, and now it's time to build.

As predicted, the codebooks for all noise models conform to an architectural scaling within a very minimal delta shift. There is no real deviance: the architectures learn a codebook that manifests and can be directly utilized at runtime.

The delta is real within the shift, and each model conforms to its own modified codebook delta during training. This is an architectural constant now and can be prepared in very little time before processing or utilizing the models.

The helper functions and methods are all present in the AbstractEyes/geolip-svae repo on GitHub now, and everything is documented.


The underlying universal-substrate principle theory didn't hold for the H2 battery, and yet the H2 battery cleanly converged, so I will be downgrading the "theory" to a "hypothesis": the substrate can potentially exist and has been observed as an emergent trait, but it does not exist in this architecture. That changes the trajectory of the H2 battery. We can call this variant "chaotic controlled": essentially a format of non-SVD that converges to the sphere, but does not necessarily conform to the underlying topological requirements for a universal substrate.

SOUP!!!

No doubt about it, this soup MSE solves pixels, and sticking within that 16.77M paradigm I'm attempting to teach text using a format of RGB translation. It's working at the MSE level and the replication is strong, but it's not as strong as it needs to be.

image

As you can see, the byte-level recon and the trigram recon are growing. 64x64 images with patch_size 2 is powerful stuff. The model should saturate soon enough. Due to the instability of the H2 battery line with text, I have enabled soft hand for this variant, which rewards good behavior and punishes bad behavior at a strength of 0.01. The other variants were trained without soft hand as they emerged naturally; this variant is a bit more stubborn.
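Roughly, soft hand amounts to something like this: a weak auxiliary term at strength 0.01 that subtracts loss for well-behaved patches and adds it for misbehaving ones. This is a simplified illustration of the idea, not the exact training code; the per-patch MSE criterion and the threshold value are assumptions.

```python
import torch

def soft_hand_loss(recon, target, threshold: float = 0.01, strength: float = 0.01):
    """Illustrative soft-hand term: reward good patches, punish bad ones.

    `threshold` (what counts as "good") is a made-up value for this sketch;
    `strength` matches the 0.01 used in training.
    """
    per_patch = (recon - target).pow(2).mean(dim=-1)  # per-patch MSE
    good = per_patch < threshold                      # behaving patches
    # Reward (subtract loss) for good patches, punish (add) for bad ones.
    signed = torch.where(good, -per_patch, per_patch)
    return strength * signed.mean()
```

Because the term is signed, a model that is already behaving gets its total loss nudged down rather than merely left alone, which is one way to read "rewarding good behavior."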

As tragic as the loss of the implicit shared-substrate controller theory is (I'm downgrading it to a hypothesis), the codebooks yielded something substantial from the system: a new point-centric formatting that allows mapping of the internals within spherical models, which let me research and directly learn about multiple internal model-analysis structures for deep-level theorems, mathematics, substrates, topological analysis, and more.

There is a large array of useful tools already established by the mathematics community that I will be exploring to test the larger batteries, to see if there is in fact some semblance of a legitimately shared substrate. It's not just Adam's 1000 step, or I would have stopped: LBFGS is converging them cleanly as well, which is INSANELY unstable and prone to NaN, so I have some engineering solutions ready for that one.

https://docs.pytorch.org/docs/stable/generated/torch.optim.LBFGS.html
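One simple NaN guard for LBFGS (a minimal sketch, not necessarily the engineering solution I'll use): snapshot the weights before each optimizer step and roll back whenever the closure loss goes non-finite. The toy model and data are illustrative; only `torch.optim.LBFGS` itself is the real API.

```python
import torch

# Toy problem: linear regression, which LBFGS handles well.
model = torch.nn.Linear(16, 16)
x, y = torch.randn(64, 16), torch.randn(64, 16)
init_loss = torch.nn.functional.mse_loss(model(x), y).item()

# LBFGS requires a closure that re-evaluates the loss; strong_wolfe line
# search is the more stable of the two built-in options.
opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=20,
                        line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    # Snapshot before the step so a blow-up can be undone.
    snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
    loss = opt.step(closure)
    if not torch.isfinite(loss):
        model.load_state_dict(snapshot)  # roll back the bad step
        break
```

The rollback keeps the last-known-good weights, so a NaN line search costs one wasted step instead of the whole run.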

The arch may need some work before we format a perfect solver, but I have some ideas that could prevent internal drift without requiring SVD, while simultaneously introducing controlling agents that aren't as ruthlessly destructive as labels and cross-entropy.

It'll take some experimenting and I hope I don't lose fragments as I go.

image

As you can see, the final state of the trigram function did learn full bitwise replication to an extent, but it does have some flaws. It's clearly not identity passthrough, as shown by the growing pains.

image

I did some codebook sampling tests, but the codebook isn't strong enough on its own to build a sentence-similarity assessor. A simple MLP stack should be more than enough, like the classifier; I just need to freeze the battery.
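The frozen-battery-plus-MLP idea, sketched with placeholders (the real battery lives in the geolip-SVAE repo; every module, dimension, and name below is illustrative):

```python
import torch
import torch.nn as nn

# Placeholder frozen encoder standing in for the trained battery.
battery = nn.Linear(128, 32)
for p in battery.parameters():
    p.requires_grad_(False)  # freeze: battery weights stay put

# Small trainable MLP stack on top, like the classifier approach.
head = nn.Sequential(
    nn.Linear(32, 64), nn.GELU(), nn.Linear(64, 16),
)

def embed(x):
    with torch.no_grad():
        z = battery(x)  # frozen features from the battery
    return nn.functional.normalize(head(z), dim=-1)  # unit-norm embedding

# Cosine similarity between two embedded "sentences".
a, b = torch.randn(1, 128), torch.randn(1, 128)
sim = (embed(a) * embed(b)).sum(dim=-1)
```

Only the head's parameters receive gradients, so a similarity objective (e.g. a contrastive loss over paraphrase pairs) trains the small MLP while the battery's codebook behavior is left untouched.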

byte_trigram_proto_64_patch_2_v1 pre-training successful.

Next stage is something along the lines of a mask-prediction sequential behavior. I have more than enough sequential systems to format something pretty quickly; let's see if it works.

Yesterday's Cowork session created some invalid text.py code in the svae repo. I'll need to have the model figure it out correctly, since Cowork seems to have amnesia on every single request even with skills. Either that or I'll have to code it myself.

Massive retrieval breakthrough. This little 57k-param model can handle sentence similarity to a fair degree with its dirty little barely-omega codebook. Imagine that.

======================================================================
SentenceEncoder demo - model: byte_trigram_proto_64_patch_2_v1
======================================================================

[1/4] Loading model from HF…
  Loaded: V=32, D=4, ps=2, hidden=64, depth=1, n_cross=1
  Architecture: linear_readout=True, svd_mode=none, smooth_mid=16
  Params: 52,571, best_test_mse=3.5706435755855637e-07 @ ep 49
  Device: cuda:0

[2/4] Resolving codebook (byte_trigram_wikitext103_val)…
  Source: hf://AbstractPhil/geolip-SVAE/byte_trigram_proto_64_patch_2_v1/codebooks/byte_trigram_wikitext103_val
  Codebook(D=4, n_axes=21, pairs=11, unpaired=10, dev=+0.0832, clean=False)
  Attached. compatible_with(model)=True (D=4 == model.D=4)

[3/4] Building SentenceEncoder…
  Encoder: img_size=64, patch_size=2, pad='space'
  Per-patch aggregation: 'best_match'

[4a] Round-trip sanity check (text -> image -> recon -> text)…
  sentence                                                 n_real  real_acc  real_l1  recon_text                              
  ───────────────────────────────────────────────────────  ──────  ────────  ───────  ────────────────────────────────────────
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  A feline rested upon the rug.                                29    1.0000    0.000  A feline rested upon the rug.           
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  Numerous experts think AI will revolutionize healthcar…      56    1.0000    0.000  Numerous experts think AI will revoluti…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  The cat sits on the mat.                                     24    1.0000    0.000  The cat sits on the mat.                
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are... friends                           34    1.0000    0.000  the cat and the dog are... friends      
  the cat and the dog aren't friends                           34    1.0000    0.000  the cat and the dog aren't friends      
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are friendly                             32    1.0000    0.000  the cat and the dog are friendly        
  the cat and the dog are best friends                         36    1.0000    0.000  the cat and the dog are best friends    
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are best friends                         36    1.0000    0.000  the cat and the dog are best friends    
  the cat and the dog are true friends                         36    1.0000    0.000  the cat and the dog are true friends    
  the cat and the dog are great friends                        37    1.0000    0.000  the cat and the dog are great friends   
  the cat and the dog are toxic friends                        37    1.0000    0.000  the cat and the dog are toxic friends   
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  Many beleive artificial intellgence will transform med…      60    1.0000    0.000  Many beleive artificial intellgence wil…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  The dog ran across the park.                                 28    1.0000    0.000  The dog ran across the park.            
  Wikipedia is a free online encyclopedia accessible to …      61    1.0000    0.000  Wikipedia is a free online encyclopedia…
  Britannica is a paid reference work edited by experts.       54    1.0000    0.000  Britannica is a paid reference work edi…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  import torch.nn.functional as F                              31    1.0000    0.000  import torch.nn.functional as F         
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  ERROR: connection timeout after 30s on port 8443             48    1.0000    0.000  ERROR: connection timeout after 30s on …

  Mean real_byte_acc across test set: 1.0000

[4b] Per-patch cosine similarity…
  Modes: ('M', 'codes').  Higher = more similar; range [-1, 1].

┌─ Paraphrase     (sem= surf!=) ──────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'A feline rested upon the rug.'
  per-patch       +0.9088   +0.5156

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'Numerous experts think AI will revolutionize healthcare.'
  per-patch       +0.9496   +0.5839

┌─ Edit           (sem= surf~) ───────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'The cat sits on the mat.'
  per-patch       +0.9588   +0.7187

  A: 'the cat and the dog are friends'
  B: 'the cat and the dog are friends'
  per-patch       +1.0000   +1.0000

  A: 'the cat and the dog are... friends'
  B: "the cat and the dog aren't friends"
  per-patch       +0.9914   +0.8854

  A: 'the cat and the dog are friends'
  B: 'the cat and the dog are friendly'
  per-patch       +0.9919   +0.9167

  A: 'the cat and the dog are best friends'
  B: 'the cat and the dog are friends'
  per-patch       +0.9717   +0.8281

  A: 'the cat and the dog are best friends'
  B: 'the cat and the dog are true friends'
  per-patch       +0.9989   +0.9583

  A: 'the cat and the dog are great friends'
  B: 'the cat and the dog are toxic friends'
  per-patch       +0.9989   +0.9844

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'Many beleive artificial intellgence will transform medecine.'
  per-patch       +0.9702   +0.7578

┌─ Same-domain    (sem!= surf~) ──────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'The dog ran across the park.'
  per-patch       +0.9599   +0.6354

  A: 'Wikipedia is a free online encyclopedia accessible to anyone.'
  B: 'Britannica is a paid reference work edited by experts.'
  per-patch       +0.9442   +0.5755

┌─ Cross-domain   (sem!= surf!=) ─────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'import torch.nn.functional as F'
  per-patch       +0.9427   +0.5677

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'ERROR: connection timeout after 30s on port 8443'
  per-patch       +0.9436   +0.5781

┌─ Pairwise similarity matrix (one example per group) ────────────────

  Mode: 'M'
               0      1      2      3      4      5      6      7
  para-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  para-B   +0.91  +1.00  +0.91  +0.92  +0.91  +0.94  +0.91  +0.94
  edit-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  edit-B   +0.96  +0.92  +0.96  +1.00  +0.96  +0.95  +0.96  +0.95
  same-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  same-B   +0.96  +0.94  +0.96  +0.95  +0.96  +1.00  +0.96  +0.96
  cros-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  cros-B   +0.94  +0.94  +0.94  +0.95  +0.94  +0.96  +0.94  +1.00

  Mode: 'codes'
               0      1      2      3      4      5      6      7
  para-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  para-B   +0.52  +1.00  +0.52  +0.49  +0.52  +0.58  +0.52  +0.54
  edit-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  edit-B   +0.72  +0.49  +0.72  +1.00  +0.72  +0.68  +0.72  +0.68
  same-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  same-B   +0.64  +0.58  +0.64  +0.68  +0.64  +1.00  +0.64  +0.60
  cros-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  cros-B   +0.57  +0.54  +0.57  +0.68  +0.57  +0.60  +0.57  +1.00

──────────────────────────────────────────────────────────────────────
Modes:
  'M':     per-patch flat sphere-norm encoder rows [V*D].
  'codes': per-row argmax over codebook axes, one-hot flat
           [V*n_axes]. Requires attached codebook.
──────────────────────────────────────────────────────────────────────

image

I have given soup orderly capacity: it can now be declared projective trigram byte ordinal.

Hundreds of hours, thousands of trains, and I have finally solved the bertenstein soup paradox.

Convergence through surge is now a matter of what size, what bitrate, what format, and what complexity.

It has no need for resolution. It has no need for traditional constraints. It has no need for relational behavior. It is a variant of invariance through artifact conditioning and crystallized through pure data.