AbstractPhil posted an update about 6 hours ago
In trying to disprove the Omega H2 battery, I have discovered:
* Each topology formed by the H2 battery is deviant; none share a uniform substrate of behavior. Each is uniquely independent per training set, all with perfect recon.
* Image recon can be tracked and mapped, yielding a consistently mapped and responsive 16.77M-entry vocabulary potential. Current spectrum testing sits at around 5 million Unicode bytes.
* Model scale shows that patch size relates to how much data you want the model to represent within itself, and this has yet to hit a capacity ceiling. MSE recon holds, and the more data fed in, the more the model yields.
* The scaling principle shows the model scales upward indefinitely, and each level of the model can be iteratively captured upward to form deviant yet uniformly consistent, repeatable pathways of implicit codewise response, not just arbitrary bitwise recall: meaningful, implicitly learned utility.
* Image recon patch size should match the slice of image you want to represent, since the model applies patch smoothing per patch internally from identity.
* Byte trigrams are channel-agnostic: they require no channel count, just a formula for recall, at 99.6% n-gram recall for byte-by-byte representations. With those comes an adjacently capable codebook.
* SentencePiece preliminary tests show validity and reconstruction just like the byte trigrams; using the new byte trigram, it would be arbitrarily convenient to recon a codebook for the structure.
* Binary trees learn a uniformly potent and powerful gating mechanism that requires further exploration; each produces direct, responsive, independent capacity, and the responses are controllable.
* Ternary experiments show the models respond directly to -1, 0, +1 behavior, so quantization is very much a valid potential.
* Preliminary tests with the H2O1 series of batteries show the models responding similarly to natural universal elements in the universe itself.
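The channel-agnostic trigram formula above is consistent with a 256^3 = 16,777,216-entry space, which matches the "16.77M vocabulary potential" figure. A minimal sketch of that indexing, with illustrative names of my own (not from the geolip repos):

```python
# Minimal sketch: byte-trigram indexing into a 256^3 codebook space.
# The formula is channel-agnostic: it needs only the three byte values,
# never a channel count. Names here are illustrative, not the real API.

VOCAB_SIZE = 256 ** 3  # 16,777,216 possible byte trigrams ("16.77M")

def trigram_index(b0: int, b1: int, b2: int) -> int:
    """Map three bytes to a unique codebook index."""
    return (b0 << 16) | (b1 << 8) | b2

def trigram_from_index(idx: int) -> tuple:
    """Invert the formula for byte-by-byte recall."""
    return (idx >> 16) & 0xFF, (idx >> 8) & 0xFF, idx & 0xFF

data = "omega".encode("utf-8")
indices = [trigram_index(*data[i:i + 3]) for i in range(len(data) - 2)]
# Round-trip: every trigram is recoverable from its index.
assert all(trigram_from_index(ix) == tuple(data[j:j + 3])
           for j, ix in enumerate(indices))
```

The inversion is exact by construction, which is what makes a codebook over this space feasible: every index decodes back to exactly one byte triple.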

As it stands, I'm going to need to redefine:
* patch_size: it does not fit with this spectrum at all.
* cross-attention: this is not traditional attention in many ways, and yet at the same time it is.
* scaling: a scaling principle I've never seen in a model.
* recon: this level of recon is shockingly accurate, near perfect.

  1. patch_size: the behavior of patch size does not align with this model. The patch size ends up determining internally how many sub-structured deviations the model aligns to, so in a way this is less about patches and more about which sectors of the model's ensemble are grouped together for gradient updates. This is similar to patchworks of the past, but this is definitely not a patchwork. This has ensemble behavior.
  2. cross-attention: the label "spectral cross-attention" does not fit. This is RP^(D-1) attention, which is not the same as spectral attention. Yet it follows the exact rules and principles of alpha-driven cross-attention, causing the spectral behavior to shift slowly over time as that does. It requires a new mechanism, something more uniformly capable for the evolving model space, something that better orients the patch-agnostic, scale-agnostic, resolution-agnostic behavior the model's lens is exhibiting.
  3. recon: this is a big one. This has become an arbitrarily easy task for the model to fulfill. It essentially completes on its own for the linear system without needing large differentiation. I'll be consulting unsupervised behavioral training studies for the best methodologies to tune such a structure in a reasonable way without shattering it.
  4. scaling: this is a tricky one. The entire sub-structured system keeps disproving any claim I, Claude, or GPT make about it. The model shape constantly defies expectation in an upward fashion, so I'm hesitant to slap any limitations on it at this point. Even with rigid training such as the sequential n-gram, the representation sequences can in fact learn codebook association with better-than-placebo utility. It's obviously not the smartest model on the face of the earth, but 500 of them, properly organized, could perform some real feats.

Upcoming trains:

  1. 2-8gram batteries. Each SVAE battery is designed specifically with its own sub-vocabulary for this experiment. We're going to see what happens when we build deviant codebook association to accumulate sentence similarity. I'm thinking... probably WikiText-103 again; that worked for the bytewise variant.
  2. Stacked grams utilizing topological geometric structures. I'll be testing Cantor sets with alpha/beta/gamma/sigma, differentiated Procrustes BERT opinions for downstream use via CaptionBERT, and a few other models. The plan is to create a stacked variation, where one battery feeds directly into another rather than aligning as an ensemble. This will produce a few dual-gram representations, built with a topologically deterministic methodology that captures the n-gram's output, that CAN function, and a few that cannot. I'll try to keep them separate and update the research accordingly; however, updating research is very time consuming, so I may end up putting my nose to the paper for a week or two and just erupting a huge paper.

I can think of at least 80 or 90 model prototypes that can handle sequential codebook learning off the top of my head. This will enable pure sentence similarity with almost no params.

I anticipate low-risk research results that will yield enhanced codebook capacity, increased downstream utility, and answers to questions such as battery-to-battery transfer learning, Procrustes alignment out of spectrum, token curation (learning tokens directly instead of direct bytewise learning to retain order), and a series of other common machine-learning paradigms that have not been fully tested yet.

Most likely, going into this, I can say it will work based on the recon. If I map the vocabulary of the 2-gram to its own codebook, the 3-gram to its own, and so on, it will work, guaranteed, and the codebooks will yield uniqueness.
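The per-order mapping described here, one codebook per n-gram order with uniqueness guaranteed within each book, can be sketched in plain Python (hypothetical helper names, not the real training code):

```python
def build_ngram_codebooks(corpus: bytes, orders=(2, 3, 4)):
    """One codebook per n-gram order. Each n-gram seen in the corpus gets
    a unique id within its own order's vocabulary, so uniqueness holds
    per book by construction."""
    codebooks = {}
    for n in orders:
        book = {}
        for i in range(len(corpus) - n + 1):
            gram = corpus[i:i + n]
            if gram not in book:
                book[gram] = len(book)  # next free id in this order's book
        codebooks[n] = book
    return codebooks

books = build_ngram_codebooks(b"abababc")
# 2-grams seen: ab, ba, bc -> 3 unique entries in the 2-gram book
```

Because each order keeps its own id space, a 2-gram and a 3-gram can never collide, which is the uniqueness property claimed above.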

This will yield two factories and two schools of potential here.

  1. The direct, hard-set, space-specific, non-agnostic, super-strict aligned spectrum. These models will crash if loaded incorrectly and tell you why. If they aren't inferenced exactly to specification, they will most likely fault. Each has an independent vocabulary hard-mapped to colors and sequentially prepared with the utmost care.
  2. The indirect, soft-space, unscaled, agnostic, soup-prone variant that somehow exhibits 99.6% trigram recon by MSE. Shuffle it. Throw it in the blender. Make the stew. What came out? Something mappable and understandable with Ripser at d=2 to map the voids and axial perturbations on the sphere.

These two will be direct competition, and may in fact end up being paired together in cooperative collectives if the trains yield.

A word of warning if you attempt to use my models or code from geolip-svae or geolip-core:

  • The models and the process are highly experimental.
  • The yield depends heavily on tunings from the spectrum; the arrays and lists of configurations may not default to useful values.
  • Many batteries are heavily experimental and may not yield as predicted on every train for every dataset; this is expected and encouraged behavior, to be studied.

Your trains aren't failing if you participate; you are witnessing another emergent deviance that must be catalogued and understood to better scale the scaling mechanism, refine the patchwork system, and align the projection system for codebook synthesis.

The codebook training system is highly experimental and will be heavily prone to change in the coming days. What exists today likely won't be the same format in a week or two weeks, so be aware there will be rapid iterations.

WITHOUT a proper codebook, the model's cosine similarity won't work, and you'll need to fall back to kNN detection, which is slow but doable for CONV and downstream transformer models. It's just rough going. Grab a Fresnel for images; that's your best bet. Grab a Johanna for generic use. Grab Freckles if you want instability. Train their codebook for a task (it should finish in under a minute), then train your classifier/math tester/checker/etc. It'll work with some jiggering.
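The kNN fallback mentioned here, matching a query embedding against a stored bank when no codebook exists, can be sketched with plain NumPy (a hypothetical sketch; the actual interfaces in geolip-svae/geolip-core may differ):

```python
import numpy as np

def knn_detect(query: np.ndarray, bank: np.ndarray, labels, k: int = 5):
    """Brute-force kNN over a bank of stored embeddings.
    Slower than a codebook lookup, but needs no codebook at all."""
    dists = np.linalg.norm(bank - query, axis=1)  # distance to every entry
    nearest = np.argsort(dists)[:k]               # k closest indices
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)       # majority vote

# Toy bank: two clusters standing in for two trained detections.
rng = np.random.default_rng(0)
bank = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
labels = ["noise"] * 20 + ["signal"] * 20
print(knn_detect(np.full(8, 5.0), bank, labels))  # prints "signal"
```

The brute-force distance scan is exactly why this path is slow: it touches every bank entry per query, where a codebook would be a single lookup.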

Fresnel was pretrained on imagenet, which provides all the data you need to capture image information.

Johanna was trained on 16 types of noise, which allows fair recon (albeit too smooth in the current state) for downstream differentiation utility.

USING their directly TOPICAL and EASY-ACCESS detections involves MSE assessment, which gives you the recon capacity of the object you're utilizing when you're in a battery-selector framing. This is especially useful for models trained on mathematical formulas or functional systems related to data selection, rather than specific relational tasks.

So for example: say you have five Fresnel finetunes. Each is trained with an additional 100 batches of a specific kind of table. Run the system: it creates your codebook, and you can have it snap the codebook to the model automatically or stack it in a directory. It's not AI data; it's geometric numerics.

So you have your five finetunes, and then you slap an h2 battery in there finetuned on gaussian noise. Show the collective the image you want to figure out, say it's a table.

Pretrain task:

  1. Load the H2 ImageNet pretrain. Finetune with a chair dataset, create a codebook.
  2. Run a process that splits your types by label and builds a differentiated dataset using cosine similarity.
  3. Unsupervised-finetune each in the same sequence: roughly 100 batches each, 1 GB VRAM, under 1 minute each.
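Step 2 of the pretrain task, splitting by label and differentiating with cosine similarity, might look roughly like this (a hypothetical helper with NumPy only; the real splitting process isn't shown in the repos I've described):

```python
import numpy as np

def split_by_cosine(embeddings: np.ndarray, labels, threshold: float = 0.9):
    """Group samples by label, then keep only those whose cosine similarity
    to their label's centroid exceeds a threshold. Yields one clean
    sub-dataset per label, ready for per-battery unsupervised finetunes."""
    datasets = {}
    for label in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        group = embeddings[idx]
        centroid = group.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        normed = group / np.linalg.norm(group, axis=1, keepdims=True)
        sims = normed @ centroid          # cosine similarity to centroid
        datasets[label] = group[sims >= threshold]
    return datasets
```

A threshold near 1.0 prunes aggressively (only near-duplicates of the centroid survive); lowering it keeps more of each label's spread.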

Image: Black Chair

H2 Battery Array: Chair differentiation calculator.
6 batteries.

H2 Battery 1 - Purple Chairs: mse 0.005
H2 Battery 2 - Grey Chairs: mse 0.03
H2 Battery 3 - Dark Grey Chairs: mse 0.01
H2 Battery 4 - Black Chairs: mse 0.000005
H2 Battery 5 - Wooden Chairs: mse 0.0005
H2 Battery 6 - Generic Gaussian: mse 0.00001

57,000 * 6 = 342,000 params
Selector between 800 and 5,000 MLP params; nothing too heavyweight, but a little excessive for the task. Attach attention here if you want, but it won't matter most of the time.

If battery 6 snaps to the lowest MSE, we can probably say: alright, this probably isn't any of the expected shapes. If the model is wrong, standard backprop feeds the MLP until it conforms to the collective outputs' MSE statistics. Everything is built in for this, and given enough passes it's guaranteed to work eventually.

Put together, the system is a guaranteed selector. Now say we want a selector for chairs. Well we have it.
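The selector logic in the chair example reduces to an argmin over per-battery reconstruction MSEs, with the generic-Gaussian battery acting as the reject option. A minimal sketch (the MSE values are the figures from the example above; the function and dict names are mine, not the real API):

```python
# Minimal sketch of the MSE-based battery selector from the chair example.
# MSE values are taken from the example; names are illustrative.

BATTERIES = {
    "purple_chairs":    0.005,
    "grey_chairs":      0.03,
    "dark_grey_chairs": 0.01,
    "black_chairs":     0.000005,
    "wooden_chairs":    0.0005,
    "generic_gaussian": 0.00001,  # reject option: "none of the shapes"
}

def select(mses: dict) -> str:
    """Pick the battery that reconstructs the input best (lowest MSE).
    If the generic-Gaussian battery wins, treat the input as unknown."""
    best = min(mses, key=mses.get)
    return "unknown" if best == "generic_gaussian" else best

print(select(BATTERIES))  # prints "black_chairs"
```

With the example figures, the Black Chairs battery wins (0.000005 beats even the Gaussian battery's 0.00001), which matches the Black Chair input; an out-of-distribution image would instead let the Gaussian battery win and return "unknown".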

Image: Potato

H2 Selector Array -
H2 Battery 1: Chair generic finetuned -> do we use the chair array?
H2 array 1: Chair array.
H2 Battery 2: Potato generic finetuned -> do we use the potato array?
H2 array 2: Potato array.

Current limitations:

  1. The ImageNet variation may just perform better. Literally just better than your finetune; that's what the upcoming experiments are testing for: how to guarantee independence and shared-utilization selection.
  2. The noise variation may actually outperform the imagenet variation. Noise is quite the teacher, and the noise models learned a ton of noise. They are quite literally omegas, so they may just solve the problem.
  3. They generalize TOO WELL at times, which means specifics are trickier to implement.

CORRECTLY TRAINED and CORRECTLY UTILIZED, these will turn deterministic queries into complex, behavior-oriented responses, implicitly aligned mathematical guarantees, guaranteed downstream adjudications, differentiation transfer through encoding, and many other options, all on the GPU, or on the CPU if you wish. These ARE omegas, and at 57k params and roughly 700 KB of drive space they are absurdly compact data selectors, if treated as such, even without heavy query. Beware, though: this isn't production ready. They work, and there's an interface, but they are most definitely not ready.

Future potentials if yields hold:

A train I'm greatly looking forward to but haven't mentioned yet: the tapped trains, the direct interpolated teacher/student battery shunts. These batteries have proven they can learn independent opinions similar to the David collective, and their MSE is far, far more accurate than the David collective's was.

Look forward to that one. I'll be tapping Flux Schnell with it. It will form a battery or battery array per layer, depending on what the layers are, and they will be directly capable of transfer-learning the entire mathematical differentiation behavior of Flux once the full mathematical spectrum is worked out for differentiation transfer. Transferring the entirety of Flux Schnell into the core of the Beatrix oscillator diffusion structure is my goal. One stage at a time, though.

The steps are lining up, but we're not there yet. Many more tests to go before I can say for certain I can transfer the entirety of Flux. I have the climbing gear, I'm on the mountain, and I'm climbing. Time will build the answer.

The way I see it is; if the view of the universal geometric vocabulary does not present itself here in a single omega, I need to build a collective of omegas that are trained on a converged masterclass model to truly represent the necessary behavior of omega before I can begin scanning for it correctly. This will take time to set up.

A single omega isn't strong enough to yield the universal omega-solver spectrum. It's simply too small. I'd need to run thousands of trains on all data types from all walks of life, for hundreds of epochs each, to yield correct results. I don't have the hardware, so I have to make do with what I have.

If BlackForestLabs models yield the omega synthesis formula, then I will attribute direct credit to the model line that went into the discovery, and to each subsequent model tested with the Omega analysis benchmark, which, I hypothesize but am not certain, will target the topological constraints required to form internal omegas and the architectural pressure required to form the sub-behavioral topological implicit architecture THAT DOES form them.

Answering what forms Omega is only a step toward the goal, though. There are bigger fish to fry, much bigger. One engineering target is testing and looking deeply for quantum-adjacent, interconnected entanglement behavior that is directly represented within traditional mathematics, bridged completely and purely from traditional mathematics into a quantifiable, relationally and behaviorally adjudicated system. If it's there, I'll find it, or evidence of it. The antipodal pairs indicate that the formulations can exist; the models just need to be... bigger. Much bigger. There's evidence, but not enough to draw any conclusions yet. The indications span far too many formula potentials to say it is in fact quantum-adjacent, but they do show that this model's behavior is not... normal. There's an explanation here, I'm certain, a very reasonable one grounded in mathematics and engineering, and that explanation will build the hypothesis for the next.
