AbstractPhil posted an update 2 days ago
geolip-vit-x34 - a 34-expert ViT. I can't train an extended version across all 34 ViTs, but I can definitely run some experiments and produce some starter weights with an anchor. That would yield a substantial amount of data.

AbstractPhil/bulk-coco-features

This... is going to be an odd one to describe. Based on the research with BERT, creating a unified patchwork from a multitude of ViT composites will be very achievable. It shouldn't be soup, which is really hard to explain, but by creating a second geometric anchor, the system will align in a way I could never predict without much more model analysis and testing. I simply didn't test all these ViTs for geometry, so this will be the test.

This is essentially 34 directly extracted views of COCO, already prepared as feature data. With this data, we have 34 experts that can distill into a single unified ViT. I'm hesitant to even call this distillation anymore; it's more interpolative data alignment, and it's absurdly retentive.
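A minimal sketch of what that multi-expert distillation step could look like, assuming the expert features are pre-extracted and already projected to a shared width. The function name, shapes, and uniform weighting are my own illustration, not the actual pipeline:

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_feats, expert_feats, weights=None):
    """Align one student embedding with N pre-extracted expert embeddings.

    student_feats: (B, D) student embedding for a batch
    expert_feats:  (N, B, D) frozen expert embeddings for the same batch
    weights:       optional (N,) per-expert weighting (defaults to uniform)
    """
    n = expert_feats.shape[0]
    if weights is None:
        weights = torch.full((n,), 1.0 / n)
    s = F.normalize(student_feats, dim=-1)
    losses = []
    for i in range(n):
        t = F.normalize(expert_feats[i], dim=-1)
        # cosine-distance distillation against each frozen expert view
        losses.append((1.0 - (s * t).sum(dim=-1)).mean())
    return (torch.stack(losses) * weights).sum()
```

The per-expert weights are where any differentiation mechanism would plug in; here they are simply uniform.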

ADDITIONALLY, we can anchor to frozen geolip-bert and create cross-contrast between the anchors for a learned anchor median, which will allow further integration directly into the geometric core.
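One plausible reading of that cross-contrast step: a symmetric InfoNCE between the two anchors, with a naive "median" taken as the normalized midpoint. All names, the temperature, and the midpoint choice are my assumptions, not the described system:

```python
import torch
import torch.nn.functional as F

def anchor_cross_contrast(vit_anchor, bert_anchor, temperature=0.07):
    """Symmetric InfoNCE between two anchor embeddings of the same batch.

    vit_anchor, bert_anchor: (B, D), e.g. a ViT anchor and a frozen
    text-side anchor for the same samples. Returns the contrastive loss
    and a simple 'median' anchor: the normalized midpoint of the two.
    """
    v = F.normalize(vit_anchor, dim=-1)
    b = F.normalize(bert_anchor, dim=-1)
    logits = v @ b.t() / temperature
    targets = torch.arange(v.shape[0])
    # contrast in both directions, CLIP-style
    loss = 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
    median = F.normalize(0.5 * (v + b), dim=-1)
    return loss, median
```

The returned median could then serve as a shared target for further integration, which is roughly what a "learned anchor median" would replace with a trainable module.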

This will require a few overlapping internal mechanisms to guarantee ViT differentiation; however, I believe the full unified patchwork will be... different from what is currently known as a ViT.

geolip-bert-vit will likely be cooking within the month. The alignment statistics say it will be... 100% accurate to the specifications.

I CAN prepare 34 ViTs' worth of ImageNet, but I would probably need 34 ViTs' worth of LAION Aesthetics, which is substantially more than I currently have. In the process I would need to ensure nothing is corrupt and that the captions are correctly synthesized by our expert student BERT with the correct anchoring rotation.

Probably 3 ViTs is enough for the full-version prototype, 34 ViTs for the bulk experiment.

The x34 crossover was excessive noise; the continuity is very difficult to rationalize and to form a cohesive anchoring for.

Predominantly at the boundaries. I ran multiple scans on the COCO data and found huge deviations in composite differentiation, which means the overall manifold would be massive.

Something like trying to represent 800B params in a single 84M-param space, which even the best geometric alignment would require considerably more refined mathematics to even channel. The full process would require multi-stage relational conjecture and interpolation between positive and negative alignment, plus a symbolic relational architecture just to make sense of the noise itself.

It could be done, and I can say that for certain. It's just a matter of what the finished product of such a complex relational structure would look like, and how much information would be required to actually train it. The math lines up, the architecture lines up, but there is no reasonable horizon for the manifold.

Simply put, because of the boundaries, even if constricted, training 34 experts at runtime is beyond my scope. It would require full patch extraction, not just attenuation like CaptionBERT. CaptionBERT is a different form of differentiation that allows rapid pooled learning between multiple models from the same family, while the x34 would require pooled learning across adjacent families. Each family requires its own patchwork size, its own alignment formulas based on that, and its own specific attenuation principle based on the adjacent differentiation.

Possible? Yes, very possible. A true challenge and test for the architecture with a team and a series of experts, but I am but one researcher. I would be here for days figuring out how to attenuate patch14-to-patch16 differences, only to yield little information given the hypersphere alignment.
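For the patch14-to-patch16 mismatch specifically, one standard trick (used when loading DeiT/DINO checkpoints at new resolutions) is to bicubically resample the positional-embedding grid. A sketch, assuming a square grid with a leading CLS token; it only reconciles the position grids, not the patch-projection weights themselves:

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, new_grid):
    """Resample a ViT positional-embedding grid to a new patch layout.

    pos_embed: (1, 1 + H*W, D) with a leading CLS token, square grid
    new_grid:  target side length, e.g. 14 for patch16 at 224px
               (a patch14 model at 224px uses a 16x16 grid instead)
    """
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    side = int(grid.shape[1] ** 0.5)
    d = grid.shape[-1]
    # (1, N, D) -> (1, D, side, side) for spatial interpolation
    grid = grid.reshape(1, side, side, d).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_grid, new_grid),
                         mode="bicubic", align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, d)
    return torch.cat([cls_tok, grid], dim=1)
```

This makes feature maps from patch14 and patch16 experts comparable at a shared grid size, which is the cheap part; aligning their geometry afterward is the actual problem described above.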

Screw it, I'll analyze it anyway. I might find an easy route.

It won't be 34 though; I'll prune a bunch and keep the most aligned CLIPs. Probably 3 will do for a proper prototype.
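Pruning to the most aligned experts could be approximated with linear CKA over features extracted on a shared batch, keeping the experts with the highest mean similarity to the rest. The ranking heuristic here is my own illustration, not the actual selection method:

```python
import torch

def linear_cka(x, y):
    """Linear CKA similarity between two feature matrices (B, Dx), (B, Dy)."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    xty = (x.t() @ y).norm() ** 2
    xtx = (x.t() @ x).norm()
    yty = (y.t() @ y).norm()
    return (xty / (xtx * yty)).item()

def most_aligned(expert_feats, k=3):
    """Rank experts by mean pairwise CKA and keep the top-k indices.

    expert_feats: list of (B, D_i) feature matrices, one per expert,
    all extracted on the same batch (widths D_i may differ).
    """
    n = len(expert_feats)
    scores = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i] += linear_cka(expert_feats[i], expert_feats[j])
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

CKA is convenient here because it tolerates different feature widths across families, so patch14 and patch16 experts can be compared without any projection.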

DINO losses appear to align with my research on many fronts, which means I may be able to adopt some and adjust others.

It's possible the DINO team already figured out a solution I'm currently architecting, and it might lead directly to where I need to be while enabling many of my geometric research benefits in the process.

Anchor alignment is analogous to the DINO iBOT loss spectrum; they are two sides of the same coin. Combining the two may provide a much more robust geometric objective. The paper collapsed it, but... I think it was missing the anchoring. I'll run some tests.
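One way the combination might look: an iBOT-style cross-entropy over masked token positions plus a cosine term pulling the student toward a frozen anchor. This is a hypothetical blend with made-up names and weighting, not the iBOT paper's formulation:

```python
import torch
import torch.nn.functional as F

def masked_token_anchor_loss(student_tok, teacher_tok, mask,
                             anchor_s, anchor_t, lam=0.5, temperature=0.1):
    """Masked-token distillation term plus an anchor-alignment term.

    student_tok, teacher_tok: (B, T, K) per-token prototype scores
    mask:                     (B, T) bool, True where the patch was masked
    anchor_s, anchor_t:       (B, D) student / frozen anchor embeddings
    """
    t = F.softmax(teacher_tok / temperature, dim=-1)
    s = F.log_softmax(student_tok / temperature, dim=-1)
    # cross-entropy on masked positions only, as in iBOT
    ce = -(t * s).sum(dim=-1)
    ibot = (ce * mask).sum() / mask.sum().clamp(min=1)
    # cosine pull toward the frozen anchor
    align = 1.0 - F.cosine_similarity(anchor_s, anchor_t, dim=-1).mean()
    return ibot + lam * align
```

Whether the anchor term actually prevents the collapse mentioned above is exactly the kind of thing the proposed tests would have to show.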

KoLeo loss from DINOv2 may be considerably more valuable; I'll need to research V3 as well.
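KoLeo itself is concrete: DINOv2 uses it to encourage a uniform spread of features within a batch by penalizing small nearest-neighbor distances between L2-normalized embeddings. A minimal single-batch version:

```python
import torch
import torch.nn.functional as F

def koleo_loss(x, eps=1e-8):
    """KoLeo (Kozachenko-Leonenko) regularizer over a batch of embeddings.

    x: (B, D). Returns the negative mean log of each sample's L2 distance
    to its nearest other sample, computed after L2 normalization. Large
    when embeddings collapse together, small when they are spread out.
    """
    x = F.normalize(x, dim=-1)
    dots = x @ x.t()
    # exclude self-matches before taking the nearest neighbor
    dots.fill_diagonal_(-2.0)
    nn_idx = dots.argmax(dim=1)
    dists = (x - x[nn_idx]).norm(dim=-1)
    return -torch.log(dists + eps).mean()
```

On the hypersphere framing above, this is appealing because it acts directly on pairwise geometry rather than on any particular teacher.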
