GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
Abstract
GeoStack is a modular framework that composes domain experts in Vision-Language Models while preserving foundational knowledge and enabling constant-time inference through geometric constraints on adapter manifolds.
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
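The weight-folding property claimed above can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's implementation: it assumes each expert is a LoRA-style low-rank update `B_k @ A_k` on a frozen base weight `W`, and shows that all `K` updates can be folded into a single dense matrix once, so a forward pass costs one matmul regardless of `K` (the O(1)-in-experts property). The names `fold_experts`, `rank`, and `K` are illustrative.

```python
# Hypothetical sketch of weight folding: if each expert k contributes a
# low-rank update B_k @ A_k to a frozen base weight W, the K updates can be
# summed into W once, making inference cost independent of K.
import numpy as np

d_out, d_in, rank, K = 8, 8, 2, 5
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))                  # frozen base weight
experts = [(rng.normal(size=(d_out, rank)) * 0.01,  # B_k
            rng.normal(size=(rank, d_in)))          # A_k
           for _ in range(K)]

def fold_experts(W, experts):
    """Fold all low-rank expert updates into one dense weight matrix."""
    W_folded = W.copy()
    for B, A in experts:
        W_folded += B @ A
    return W_folded

W_star = fold_experts(W, experts)

# The folded weight reproduces the runtime sum of adapter outputs,
# but with a single matmul independent of the number of experts.
x = rng.normal(size=(d_in,))
y_runtime = W @ x + sum(B @ (A @ x) for B, A in experts)
y_folded = W_star @ x
assert np.allclose(y_runtime, y_folded)
```

Folding is a one-time preprocessing step, so adding more experts grows training cost but leaves inference latency unchanged.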
Community
How many domain experts can you stack before a VLM collapses? 🧱
GeoStack introduces a geometric framework to compose independently trained experts into a single model with zero added inference cost. By using a perturbation prior and orthogonality constraints, it achieves a 10x reduction in geometric error compared to standard adapters.
If you're looking for a way to build specialized VLMs that don't forget their foundational knowledge, check this out!
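To make the "orthogonality constraints" mentioned above concrete, here is a minimal sketch of one common way such a constraint is enforced: a penalty that discourages different experts' adapter row-spaces from overlapping, so each expert perturbs the base model along (approximately) disjoint directions. This is an assumed formulation for illustration, not necessarily the paper's exact loss; `orthogonality_penalty` is a hypothetical name.

```python
# Minimal sketch (assumption, not the paper's exact loss): penalize the
# squared Frobenius overlap ||A_i A_j^T||_F^2 between every pair of
# experts' adapter matrices, which is zero when their row-spaces are
# mutually orthogonal.
import numpy as np

def orthogonality_penalty(As):
    """Sum of ||A_i @ A_j.T||_F^2 over all pairs i < j."""
    total = 0.0
    for i in range(len(As)):
        for j in range(i + 1, len(As)):
            total += np.sum((As[i] @ As[j].T) ** 2)
    return total

rng = np.random.default_rng(1)
A1 = rng.normal(size=(2, 6))
A2 = rng.normal(size=(2, 6))
print(orthogonality_penalty([A1, A2]))          # positive for random adapters

# Exactly orthogonal row-spaces incur zero penalty:
E = np.eye(6)
print(orthogonality_penalty([E[:2], E[2:4]]))   # 0.0
```

Adding such a term to each expert's training loss keeps new experts from interfering with directions already used by earlier ones, which is one plausible mechanism behind the reduced geometric error the comment reports.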
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Evolving Prompt Adaptation for Vision-Language Models (2026)
- HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-Language Models (2026)
- Representation Finetuning for Continual Learning (2026)
- Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting (2026)
- Towards Adaptive Continual Model Merging via Manifold-Aware Expert Evolution (2026)
- Continual Learning with Vision-Language Models via Semantic-Geometry Preservation (2026)
- A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
