SAE Encoder Embeddings: End-to-End Sparse Autoencoder Bottleneck for Retrieval
Status: Research & Architecture Design Phase
Goal: Build the first encoder-only embedding model where the representation layer IS a Sparse Autoencoder, trained end-to-end with contrastive loss.
What This Is
A novel embedding architecture that combines:
- ModernBERT backbone (SOTA encoder-only with LLM innovations)
- TopK Sparse Autoencoder as the embedding bottleneck layer
- End-to-end contrastive training (not post-hoc SAE on frozen embeddings)
This produces embeddings that are simultaneously:
- Interpretable: each active dimension corresponds to a learned semantic concept
- Steerable: suppress/amplify specific features to control retrieval
- Sparse-indexable: native sparse vector search (inverted index, not ANN)
- Competitive: trained with modern contrastive objectives + hard negatives
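To make the "steerable" property concrete, here is a minimal sketch of feature steering on a sparse query vector. It assumes embeddings are stored as `{feature_index: activation}` dictionaries; the helper names `steer` and `sparse_dot` and all feature indices are hypothetical and exist only for illustration.

```python
from typing import Dict

SparseVec = Dict[int, float]  # feature index -> activation (hypothetical storage format)


def steer(query: SparseVec, scale: Dict[int, float]) -> SparseVec:
    """Rescale selected features; a factor of 0.0 suppresses a feature entirely."""
    steered = {}
    for idx, val in query.items():
        new_val = val * scale.get(idx, 1.0)
        if new_val != 0.0:  # keep the vector sparse by dropping zeroed features
            steered[idx] = new_val
    return steered


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product that only visits features active in both vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(val * b[idx] for idx, val in a.items() if idx in b)


query = {3: 1.0, 17: 2.0, 42: 0.5}
doc_a = {3: 1.0, 17: 1.0}
doc_b = {3: 1.0, 42: 2.0}

before = (sparse_dot(query, doc_a), sparse_dot(query, doc_b))
# Suppress feature 17 and amplify feature 42 in the query.
steered = steer(query, {17: 0.0, 42: 2.0})
after = (sparse_dot(steered, doc_a), sparse_dot(steered, doc_b))
```

In this toy case `doc_a` ranks first before steering and `doc_b` ranks first afterwards. Note that this sketch can only rescale features already active in the query; activating new features would need a separate mechanism.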
Why This Is Novel
| Approach | Training | Interpretable? | Sparse-native? | End-to-end? |
|---|---|---|---|---|
| Dense bi-encoder (e.g., E5, GTE) | Contrastive | ❌ | ❌ | ✅ |
| SPLADE | Distillation + regularizer | ⚠️ (vocab-tied) | ✅ | ✅ |
| Post-hoc SAE on embeddings | Reconstruction only | ✅ | ✅ | ❌ |
| CSR (Beyond Matryoshka) | Contrastive + recon (frozen backbone) | ✅ | ✅ | ❌ (backbone frozen) |
| SPLARE (Mar 2026) | Distillation (KL from cross-encoder) | ✅ | ✅ | ⚠️ (pretrained SAE, frozen LLM) |
| Ours (this project) | Contrastive + recon + FLOPS reg | ✅ | ✅ | ✅ (backbone + SAE jointly) |
Key differentiator: All prior SAE-for-retrieval work either freezes the backbone or freezes the SAE. We train both jointly, meaning the backbone learns to produce representations that are optimally decomposable into sparse interpretable features.
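The "Contrastive + recon + FLOPS reg" objective in the table can be sketched as follows. This is an illustrative NumPy version, not the project's `loss.py`: the in-batch InfoNCE form, cosine normalization, the temperature, and the weights `lambda_recon` and `lambda_flops` are all assumptions. The FLOPS term follows the form used by SPLADE (sum over features of the squared batch-mean activation).

```python
import numpy as np


def info_nce(zq, zd, temperature=0.05):
    """In-batch contrastive loss: row i of zq should match row i of zd."""
    zq = zq / (np.linalg.norm(zq, axis=1, keepdims=True) + 1e-8)
    zd = zd / (np.linalg.norm(zd, axis=1, keepdims=True) + 1e-8)
    logits = zq @ zd.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def reconstruction_mse(v, v_hat):
    """Mean squared error between pooled backbone output and SAE reconstruction."""
    return float(np.mean((v - v_hat) ** 2))


def flops_reg(z):
    """SPLADE-style FLOPS regularizer: sum_j (mean_i |z_ij|)^2."""
    return float(np.sum(np.mean(np.abs(z), axis=0) ** 2))


def combined_loss(zq, zd, v, v_hat, lambda_recon=1.0, lambda_flops=1e-3):
    # lambda_recon and lambda_flops are hypothetical weights, not tuned values.
    return (
        info_nce(zq, zd)
        + lambda_recon * reconstruction_mse(v, v_hat)
        + lambda_flops * (flops_reg(zq) + flops_reg(zd))
    )


# Toy batch: 4 sparse codes with 16 features each, perfect reconstruction.
z = np.eye(4, 16) * 2.0 + 0.1
v = np.ones((4, 8))
aligned = combined_loss(z, z, v, v)
shuffled = combined_loss(z, np.roll(z, 1, axis=0), v, v)
```

With correctly paired queries and documents (`aligned`) the loss is lower than with mismatched pairs (`shuffled`), because only the contrastive term changes between the two calls.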
Repository Structure
    ├── README.md            # This file
    ├── ARCHITECTURE.md      # Detailed architecture design
    ├── PAPERS.md            # Papers bibliography + key findings
    ├── TRAINING_RECIPE.md   # Full training recipe with hyperparameters
    ├── src/                 # (future) Implementation code
    │   ├── model.py         # SAE bottleneck + ModernBERT
    │   ├── loss.py          # Combined loss functions
    │   └── train.py         # Training script
    └── experiments/         # (future) Training logs and results
Architecture Overview
                      Input text
                           │
                           ▼
    ┌──────────────────────────────────────────────┐
    │ ModernBERT-base (768-dim)                    │  ← Backbone (trainable)
    │  - RoPE positional embeddings                │
    │  - FlashAttention 2                          │
    │  - GeGLU activations                         │
    │  - Alternating local/global attn             │
    │  - 8192 token context                        │
    └──────────────────────┬───────────────────────┘
                           │  mean-pool → v ∈ ℝ^768
                           ▼
    ┌──────────────────────────────────────────────┐
    │ TopK Sparse Autoencoder                      │  ← SAE Bottleneck (trainable)
    │                                              │
    │ Encoder: z = TopK(W_enc(v - b) + b_enc)      │
    │   z ∈ ℝ^16384, ||z||_0 = k (32-128 active)   │
    │                                              │
    │ Decoder: v̂ = W_dec·z + b                     │  ← For reconstruction loss only
    │   (not used at inference)                    │
    └──────────────────────┬───────────────────────┘
                           │
                           ▼
                  z (sparse embedding)
       Used for retrieval via sparse dot product
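The encoder and decoder equations in the diagram can be sketched in NumPy as below. This is a forward-pass illustration only, under several assumptions not stated above: random initialization, a decoder initialized as the encoder transpose, and a ReLU applied to the selected top-k pre-activations (so the active count is at most k). The class name `TopKSAE` is hypothetical, and the demo uses small dimensions rather than 768 and 16384 to keep memory low.

```python
import numpy as np


class TopKSAE:
    """Forward pass of a TopK sparse autoencoder (illustrative, no training)."""

    def __init__(self, d_model, n_latents, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.W_enc = rng.normal(scale=d_model ** -0.5, size=(n_latents, d_model))
        self.W_dec = self.W_enc.T.copy()  # assumed init: decoder = encoder transpose
        self.b = np.zeros(d_model)        # shared pre-bias b from the diagram
        self.b_enc = np.zeros(n_latents)

    def encode(self, v):
        # Pre-activations: W_enc (v - b) + b_enc, shape (batch, n_latents)
        pre = (v - self.b) @ self.W_enc.T + self.b_enc
        # Column indices of the k largest pre-activations in each row
        top = np.argpartition(pre, -self.k, axis=1)[:, -self.k:]
        rows = np.arange(pre.shape[0])[:, None]
        z = np.zeros_like(pre)
        # Keep only the top-k values; ReLU (assumed) clips any negative survivors
        z[rows, top] = np.maximum(pre[rows, top], 0.0)
        return z

    def decode(self, z):
        # Reconstruction: W_dec z + b, shape (batch, d_model)
        return z @ self.W_dec.T + self.b


sae = TopKSAE(d_model=32, n_latents=256, k=8)
v = np.random.default_rng(1).normal(size=(4, 32))  # stand-in for mean-pooled outputs
z = sae.encode(v)
v_hat = sae.decode(z)
```

At inference only `encode` would be needed; `decode` exists for the reconstruction term during training.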
Key Design Decisions
Why TopK (not L1)?
- Exact control of sparsity (k active features guaranteed)
- No shrinkage bias: an L1 penalty pushes all activations toward 0
- Better Pareto frontier at scale (OpenAI, arxiv:2406.04093)
- Dead latent prevention via AuxK loss
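The shrinkage point can be illustrated with a small numeric comparison. Soft-thresholding is used here as a stand-in for the effect of an L1 penalty (it is the proximal operator of L1), which is a simplification of how an L1-regularized SAE is actually trained; the threshold `lam` and the input values are arbitrary.

```python
import numpy as np


def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: every surviving value is shrunk by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def top_k(x, k):
    """Keep the k largest entries unchanged and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(x)[-k:]
    out[idx] = x[idx]
    return out


pre = np.array([0.2, 3.0, 0.5, 1.5, 0.1])
l1_out = soft_threshold(pre, lam=1.0)  # two active features, both reduced by 1.0
topk_out = top_k(pre, k=2)             # the same two features at full magnitude
```

Both outputs keep the same two features active, but only the TopK version preserves their original magnitudes, and its sparsity level is set directly by `k` rather than indirectly through a penalty weight.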
Why End-to-End (not frozen backbone)?
- Backbone learns to produce optimally decomposable representations
- CSR/SPLARE show that a frozen backbone limits retrieval performance
- Joint training enables the SAE to develop features that are useful for retrieval, not just reconstructive
Why ModernBERT?
- SOTA encoder-only architecture (surpasses BERT/RoBERTa/DeBERTa)
- LLM innovations: RoPE, FlashAttn, GeGLU, 8k context
- 768-dim base / 1024-dim large: good SAE input dimensions
- Hardware-aware design (efficient on T4/A10/A100)
Key References
| Paper | ArXiv | Relevance |
|---|---|---|
| ModernBERT | 2412.13663 | Backbone architecture |
| TopK SAE (OpenAI) | 2406.04093 | SAE architecture + dead latent prevention |
| CSR (Beyond Matryoshka) | 2503.01776 | Contrastive sparse coding framework |
| SPLARE | 2603.13277 | SAE for retrieval (closest prior work) |
| SPLADE v2 | 2109.10086 | FLOPS regularizer for sparse retrieval |
| EmbeddingGemma | 2509.20354 | GOR spread-out regularizer |
| Nomic Embed v2 MoE | 2502.07972 | MoE encoder embeddings |
| Ettin | 2507.11412 | Encoder vs Decoder comparison |
| Theoretical Limits | 2508.21038 | Why single-vector has capacity limits |
| Disentangling Embeddings (SAE) | 2408.00657 | SAE interpretability for embeddings |
| Interpretable Embed SAE | 2512.10092 | SAE data analysis toolkit |
| Hypencoder | 2502.05364 | Beyond dot-product retrieval |
| RouterRetriever | 2409.02685 | Router + expert models pattern |
Quick Links
- Backbone model: answerdotai/ModernBERT-base
- Training data: sentence-transformers/msmarco-bm25 + sentence-transformers/all-nli
- Evaluation: MTEB benchmark
- SAE reference impl: OpenAI sparse_autoencoder
- SPLARE (closest prior): arxiv:2603.13277
- CSR code: github.com/Mhz1y/CSR
Expected Outcomes
- Retrieval quality: Competitive with dense ModernBERT embeddings on MTEB retrieval tasks
- Interpretability: Each active SAE feature maps to a human-interpretable concept
- Steerability: Users can boost/suppress features to control search results
- Efficiency: Sparse dot product with inverted index, potentially faster than dense ANN
- Novel contribution: First end-to-end jointly-trained SAE embedding encoder
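The efficiency outcome relies on scoring documents through an inverted index rather than comparing against every vector. A minimal in-memory sketch is shown below; the `InvertedIndex` class, document IDs, and feature indices are hypothetical, and a production system would more likely rely on an existing sparse-vector search engine.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each SAE feature index to the documents that activate it."""

    def __init__(self):
        self.postings = defaultdict(list)  # feature -> [(doc_id, weight), ...]

    def add(self, doc_id, vec):
        """Index one sparse document vector given as {feature: weight}."""
        for feat, weight in vec.items():
            self.postings[feat].append((doc_id, weight))

    def search(self, query, top_n=10):
        """Score documents by sparse dot product, touching only shared features."""
        scores = defaultdict(float)
        for feat, q_weight in query.items():
            for doc_id, d_weight in self.postings.get(feat, ()):
                scores[doc_id] += q_weight * d_weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


index = InvertedIndex()
index.add("a", {1: 0.5, 7: 2.0})
index.add("b", {7: 1.0, 9: 3.0})
index.add("c", {2: 1.0})

results = index.search({7: 1.0, 9: 1.0})
```

Document `c` shares no active feature with the query, so it is never visited. Query cost therefore scales with the length of the posting lists for the query's k active features rather than with the corpus size.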
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
    from transformers import AutoModel, AutoTokenizer

    # Assumes encoder weights have been published to this repository.
    model_id = "pauvanbr/sae-encoder-embeddings-research"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
The backbone is encoder-only, so AutoModel replaces AutoModelForCausalLM here. The project is still in the research and design phase, so this snippet applies only once weights are available in the repository.