---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---
# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains 3 Sparse Autoencoders (SAEs) trained using SAELens.
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Architecture | standard |
| Input Dimension | 3584 |
| SAE Dimension | 16384 |
| Training Dataset | TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized |
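For reference, the "standard" architecture listed above is the usual one-hidden-layer SAE: feature activations are `ReLU(W_enc @ x + b_enc)` and the reconstruction is `features @ W_dec + b_dec`. A minimal sketch with the dimensions from the table (the weights here are random stand-ins for illustration, not the trained parameters stored in this repository):

```python
import torch

d_in, d_sae = 3584, 16384  # input and SAE dimensions from the table above

# random stand-ins for the trained weights in sae_weights.safetensors
W_enc = torch.randn(d_in, d_sae) * 0.01
b_enc = torch.zeros(d_sae)
W_dec = torch.randn(d_sae, d_in) * 0.01
b_dec = torch.zeros(d_in)

x = torch.randn(4, d_in)                # a batch of residual-stream activations
feats = torch.relu(x @ W_enc + b_enc)   # sparse feature activations (non-negative)
recon = feats @ W_dec + b_dec           # reconstruction of x

print(feats.shape, recon.shape)  # torch.Size([4, 16384]) torch.Size([4, 3584])
```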
## Available Hook Points

| Hook Point |
|---|
| blocks.0.hook_resid_post |
| blocks.14.hook_resid_post |
| blocks.27.hook_resid_post |
## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
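A common next step is to look at which features fire most strongly on a given input. A minimal sketch, using a random stand-in for the encoded features (in practice, `features` comes from `sae.encode(...)` as above and has shape `(batch, seq, d_sae)`):

```python
import torch

# stand-in for SAE feature activations of shape (batch, seq, d_sae);
# shifted ReLU of random noise gives a sparse, non-negative tensor
features = torch.relu(torch.randn(1, 12, 16384) - 2.0)

# mean activation per feature over batch and sequence positions
mean_acts = features.mean(dim=(0, 1))

# indices of the 10 most strongly activating features for this input
top = torch.topk(mean_acts, k=10)
print(top.indices.tolist())
```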
## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
## Training
These SAEs were trained with SAELens version 6.26.2.