SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Abstract
Sparse Embedding Modulation (SEM) addresses bias in vision-language models by operating in sparse autoencoder latent space to selectively modulate bias-relevant neurons while preserving semantic information.
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification. Our results demonstrate that sparse latent representations provide an effective foundation for post-hoc debiasing of vision-language models.
Community
We are sharing SEM (Sparse Embedding Modulation), a post-hoc, zero-shot framework for debiasing Vision-Language Models like CLIP.
Standard post-hoc debiasing methods struggle because semantic concepts and sensitive attributes are highly entangled in dense CLIP embeddings. Linear projections often degrade the model's underlying knowledge. SEM solves this by shifting the intervention to a sparse, disentangled latent space.
Key highlights:
- SAE Disentanglement: By using Sparse Autoencoders (SAEs) to project CLIP embeddings into a disentangled space, we increase feature separation between semantic concepts and bias attributes by up to 5.7x.
- Precise Modulation: We score individual SAE neurons for content relevance and bias sensitivity, allowing us to perform precise, non-linear attenuation of bias features while preserving query-relevant information.
- Zero-Shot & Modular: SEM requires no task-specific fine-tuning. It improves worst-group accuracy on Waterbirds, CelebA, and text-to-image retrieval fairness on FairFace and UTKFace, and can be seamlessly combined with existing techniques (like BendVLM).
- Open Source: We are releasing the codebase along with pre-trained SAE checkpoints for four CLIP backbones (ViT-B/16, ViT-L/14, RN50, RN101).
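The pipeline described above can be sketched roughly as follows. This is a minimal toy illustration, not the released implementation: the SAE weights are random stand-ins, the dimensions are tiny, and the per-neuron `bias_score`/`content_score` vectors (which in practice would be estimated, e.g. from attribute-contrastive prompts) are assumed inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real SAE latent spaces are far wider).
d_embed, d_sparse = 8, 32

# Stand-in SAE parameters: encoder/decoder weights and encoder bias.
W_enc = rng.normal(size=(d_embed, d_sparse)) / np.sqrt(d_embed)
W_dec = rng.normal(size=(d_sparse, d_embed)) / np.sqrt(d_sparse)
b_enc = np.zeros(d_sparse)

def sae_encode(x):
    """ReLU SAE encoder: dense embedding -> sparse latent code."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(z):
    """SAE decoder: sparse latent code -> dense embedding."""
    return z @ W_dec

def modulate(x, bias_score, content_score, alpha=0.9, tau=0.5):
    """Attenuate neurons that look bias-sensitive but not content-relevant.

    Neurons whose bias score exceeds tau while their content score does not
    are scaled down by (1 - alpha); all other neurons pass through unchanged.
    """
    z = sae_encode(x)
    gate = np.where((bias_score > tau) & (content_score <= tau),
                    1.0 - alpha, 1.0)
    return sae_decode(z * gate)

# Stand-in for a CLIP text embedding and hypothetical neuron scores.
x = rng.normal(size=d_embed)
bias_score = rng.uniform(size=d_sparse)
content_score = rng.uniform(size=d_sparse)
x_debiased = modulate(x, bias_score, content_score)
print(x_debiased.shape)  # (8,)
```

The key design point this mirrors is that the gate acts on individual sparse neurons rather than on the dense embedding, so query-relevant features (high content score) survive the intervention untouched.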