SAE Encoder Embeddings: End-to-End Sparse Autoencoder Bottleneck for Retrieval

Status: Research & Architecture Design Phase

Goal: Build the first encoder-only embedding model where the representation layer IS a Sparse Autoencoder, trained end-to-end with contrastive loss.

🎯 What This Is

A novel embedding architecture that combines:

ModernBERT backbone (SOTA encoder-only with LLM innovations)
TopK Sparse Autoencoder as the embedding bottleneck layer
End-to-end contrastive training (not post-hoc SAE on frozen embeddings)

This produces embeddings that are simultaneously:

Interpretable — each active dimension corresponds to a learned semantic concept
Steerable — suppress/amplify specific features to control retrieval
Sparse-indexable — native sparse vector search (inverted index, not ANN)
Competitive — trained with modern contrastive objectives + hard negatives
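
To make the "sparse-indexable" property concrete, here is a minimal pure-Python sketch of inverted-index retrieval over sparse embeddings. The function names (`build_inverted_index`, `search`), the feature ids, and the activation values are all hypothetical and chosen for illustration; a real deployment would more likely use an existing sparse-vector search engine.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each feature id to a postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for feature_id, weight in vec.items():
            index[feature_id].append((doc_id, weight))
    return index


def search(index, query, top_n=3):
    """Score documents by sparse dot product, visiting only shared features."""
    scores = defaultdict(float)
    for feature_id, q_weight in query.items():
        for doc_id, d_weight in index.get(feature_id, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy sparse embeddings as {feature_id: activation}; ids and values are made up.
docs = {
    "doc_a": {3: 0.9, 17: 0.4},
    "doc_b": {17: 0.8, 42: 0.5},
    "doc_c": {99: 1.0},
}
index = build_inverted_index(docs)
results = search(index, {17: 1.0, 42: 2.0})
```

Documents that share no active feature with the query (here `doc_c`) are never touched, which is where the potential speed advantage over scanning dense vectors comes from.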

🔬 Why This Is Novel

Approach	Training	Interpretable?	Sparse-native?	End-to-end?
Dense bi-encoder (e.g., E5, GTE)	Contrastive	❌	❌	✅
SPLADE	Distillation + regularizer	⚠️ (vocab-tied)	✅	✅
Post-hoc SAE on embeddings	Reconstruction only	✅	✅	❌
CSR (Beyond Matryoshka)	Contrastive + recon (frozen backbone)	✅	✅	❌ (backbone frozen)
SPLARE (Mar 2026)	Distillation (KL from cross-encoder)	✅	✅	⚠️ (pretrained SAE, frozen LLM)
Ours (this project)	Contrastive + recon + FLOPS reg	✅	✅	✅ (backbone + SAE jointly)

Key differentiator: All prior SAE-for-retrieval work either freezes the backbone or freezes the SAE. We train both jointly, meaning the backbone learns to produce representations that are optimally decomposable into sparse interpretable features.
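
The "Contrastive + recon + FLOPS reg" objective in the table can be sketched as follows. This is a NumPy illustration only, not the project's `loss.py`: the InfoNCE form with in-batch negatives, cosine normalization, the temperature, and the weights `lambda_recon` and `lambda_flops` are all assumptions made for the example. In joint training, gradients from all three terms would flow into both the SAE and the backbone.

```python
import numpy as np


def info_nce(q, d, temperature=0.05):
    """In-batch contrastive loss: q[i] should match d[i] (assumed form)."""
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-12)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
    logits = q @ d.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def flops_reg(z):
    """FLOPS regularizer: sum over features of the squared mean |activation|."""
    return np.sum(np.mean(np.abs(z), axis=0) ** 2)


def combined_loss(z_q, z_d, v, v_hat, lambda_recon=1.0, lambda_flops=1e-3):
    """Weighted sum of the three terms; the weights are placeholder values."""
    contrastive = info_nce(z_q, z_d)
    recon = np.mean((v - v_hat) ** 2)  # SAE reconstruction error
    return contrastive + lambda_recon * recon + lambda_flops * flops_reg(z_d)


# Toy batch: 4 query/document pairs, 16 latents, 8-dim pooled vectors.
rng = np.random.default_rng(0)
z_q = np.abs(rng.normal(size=(4, 16)))
z_d = np.abs(rng.normal(size=(4, 16)))
v = rng.normal(size=(4, 8))
v_hat = v + 0.1 * rng.normal(size=(4, 8))
loss = combined_loss(z_q, z_d, v, v_hat)
```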

📂 Repository Structure

├── README.md                    # This file
├── ARCHITECTURE.md              # Detailed architecture design
├── PAPERS.md                    # Papers bibliography + key findings
├── TRAINING_RECIPE.md           # Full training recipe with hyperparameters
├── src/                         # (future) Implementation code
│   ├── model.py                 # SAE bottleneck + ModernBERT
│   ├── loss.py                  # Combined loss functions
│   └── train.py                 # Training script
└── experiments/                 # (future) Training logs and results

🏗️ Architecture Overview

Input text
    │
    ▼
┌─────────────────────────────────┐
│  ModernBERT-base (768-dim)      │  ← Backbone (trainable)
│  - RoPE positional embeddings   │
│  - FlashAttention 2             │
│  - GeGLU activations            │
│  - Alternating local/global attn│
│  - 8192 token context           │
└──────────────┬──────────────────┘
               │ mean-pool → v ∈ ℝ^768
               ▼
┌─────────────────────────────────┐
│  TopK Sparse Autoencoder        │  ← SAE Bottleneck (trainable)
│                                 │
│  Encoder: z = TopK(W_enc(v-b) + b_enc)
│           z ∈ ℝ^16384, ||z||_0 = k (32-128 active)
│                                 │
│  Decoder: v̂ = W_dec·z + b       │  ← For reconstruction loss only
│           (not used at inference)│
└──────────────┬──────────────────┘
               │
               ▼
        z (sparse embedding)
        Used for retrieval via sparse dot product
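
The bottleneck in the diagram can be sketched in NumPy as below. This is a toy forward pass under stated assumptions, not the planned `model.py`: the dimensions are shrunk (8 and 64 instead of 768 and 16384), the weights are random, the decoder is initialized as the transpose of the encoder, and a ReLU is applied to the kept values, so the number of non-zeros is at most k rather than exactly k. The name `b_pre` stands for the shared bias written as `b` in the diagram.

```python
import numpy as np


def topk_sae_forward(v, W_enc, b_enc, W_dec, b_pre, k):
    """v: (batch, d_model). Returns sparse codes z and reconstruction v_hat."""
    pre_acts = (v - b_pre) @ W_enc.T + b_enc            # (batch, n_latents)
    # Column indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre_acts, -k, axis=1)[:, -k:]
    rows = np.arange(v.shape[0])[:, None]
    z = np.zeros_like(pre_acts)
    # Keep only the top-k values; ReLU on the kept values is an assumption.
    z[rows, top_idx] = np.maximum(pre_acts[rows, top_idx], 0.0)
    v_hat = z @ W_dec.T + b_pre                          # used for the recon loss only
    return z, v_hat


# Toy sizes for illustration; the design above uses d_model=768, n_latents=16384.
rng = np.random.default_rng(0)
d_model, n_latents, k = 8, 64, 4
W_dec = 0.1 * rng.normal(size=(d_model, n_latents))
W_enc = W_dec.T.copy()                                   # tied initialization (assumed)
b_enc = np.zeros(n_latents)
b_pre = np.zeros(d_model)

v = rng.normal(size=(2, d_model))                        # stand-in for mean-pooled output
z, v_hat = topk_sae_forward(v, W_enc, b_enc, W_dec, b_pre, k)
```

At inference only `z` would be kept; `v_hat` exists so the reconstruction term has something to compare against `v` during training.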

📊 Key Design Decisions

Why TopK (not L1)?

Exact control of sparsity (k active features guaranteed)
No shrinkage bias — L1 pushes all activations toward 0
Better reconstruction-vs-sparsity Pareto frontier at scale (OpenAI, arxiv:2406.04093)
Dead latent prevention via AuxK loss
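
The shrinkage point can be illustrated with a small NumPy comparison. Soft-thresholding is used here as a stand-in for the effect of an L1 penalty (it is the proximal operator of L1, not literally what an L1-trained SAE computes), and the activation values and threshold are made up for the example.

```python
import numpy as np


def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrinks every value toward 0 by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def top_k(x, k):
    """Keep the k largest entries unchanged and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(x)[-k:]
    out[idx] = x[idx]
    return out


acts = np.array([3.0, 2.0, 0.6, 0.4, 0.1])
l1_out = soft_threshold(acts, lam=0.5)  # survivors lose 0.5 of their magnitude
topk_out = top_k(acts, k=2)             # survivors keep their magnitude
```

With the L1-style operator, the number of surviving features depends on the threshold and the input, and every survivor is biased toward zero. With TopK, the count is fixed by k and the surviving values are untouched.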

Why End-to-End (not frozen backbone)?

Backbone learns to produce optimally decomposable representations
CSR and SPLARE both keep the backbone frozen, which likely limits retrieval performance
Joint training lets the SAE develop features that are useful for retrieval, not just for reconstruction

Why ModernBERT?

SOTA encoder-only architecture (surpasses BERT/RoBERTa/DeBERTa)
LLM innovations: RoPE, FlashAttn, GeGLU, 8k context
768-dim base / 1024-dim large — good SAE input dimensions
Hardware-aware design (efficient on T4/A10/A100)

🔗 Key References

Paper	ArXiv	Relevance
ModernBERT	2412.13663	Backbone architecture
TopK SAE (OpenAI)	2406.04093	SAE architecture + dead latent prevention
CSR (Beyond Matryoshka)	2503.01776	Contrastive sparse coding framework
SPLARE	2603.13277	SAE for retrieval (closest prior work)
SPLADE v2	2109.10086	FLOPS regularizer for sparse retrieval
EmbeddingGemma	2509.20354	GOR spread-out regularizer
Nomic Embed v2 MoE	2502.07972	MoE encoder embeddings
Ettin	2507.11412	Encoder vs Decoder comparison
Theoretical Limits	2508.21038	Why single-vector has capacity limits
Disentangling Embeddings (SAE)	2408.00657	SAE interpretability for embeddings
Interpretable Embed SAE	2512.10092	SAE data analysis toolkit
Hypencoder	2502.05364	Beyond dot-product retrieval
RouterRetriever	2409.02685	Router + expert models pattern

⚡ Quick Links

Backbone model: answerdotai/ModernBERT-base
Training data: sentence-transformers/msmarco-bm25 + sentence-transformers/all-nli
Evaluation: MTEB benchmark
SAE reference impl: OpenAI sparse_autoencoder
SPLARE (closest prior): arxiv:2603.13277
CSR code: github.com/Mhz1y/CSR

📈 Expected Outcomes

Retrieval quality: Competitive with dense ModernBERT embeddings on MTEB retrieval tasks
Interpretability: Each active SAE feature maps to a human-interpretable concept
Steerability: Users can boost/suppress features to control search results
Efficiency: Sparse dot product with inverted index — potentially faster than dense ANN
Novel contribution: First end-to-end jointly-trained SAE embedding encoder
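
As a rough illustration of the steerability outcome, the sketch below rescales or removes individual features in a sparse query embedding before scoring. The helper names (`steer`, `sparse_dot`), the feature ids, and the weights are hypothetical; which features correspond to which concepts would have to come from inspecting a trained model.

```python
def steer(embedding, boosts=None, suppress=()):
    """Return a copy with some features rescaled and others removed."""
    boosts = boosts or {}
    steered = {}
    for feature_id, weight in embedding.items():
        if feature_id in suppress:
            continue  # drop this feature entirely
        steered[feature_id] = weight * boosts.get(feature_id, 1.0)
    return steered


def sparse_dot(a, b):
    """Dot product of two {feature_id: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[f] for f, w in a.items() if f in b)


# Toy query and documents; ids and values are made up.
query = {7: 1.0, 21: 0.5}
doc_x = {7: 0.8}
doc_y = {21: 0.9}

before = (sparse_dot(query, doc_x), sparse_dot(query, doc_y))
steered_query = steer(query, boosts={21: 3.0}, suppress={7})
after = (sparse_dot(steered_query, doc_x), sparse_dot(steered_query, doc_y))
```

Before steering, `doc_x` outscores `doc_y`; after suppressing feature 7 and boosting feature 21, the ranking flips.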

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

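from transformers import AutoModel, AutoTokenizer

# Assumes weights have been published under this model id.
model_id = "pauvanbr/sae-encoder-embeddings-research"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Encoder-only design, so load with AutoModel rather than AutoModelForCausalLM.
model = AutoModel.from_pretrained(model_id)

This project targets an encoder-only architecture, so the generic AutoModel class is used instead of AutoModelForCausalLM. The repository is still in the design phase (src/ is marked as future above), so the snippet only works once weights are available.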
