---
library_name: sae_lens
tags:
  - sparse-autoencoder
  - mechanistic-interpretability
  - sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained using SAELens.

## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Architecture | standard |
| Input Dimension | 3584 |
| SAE Dimension | 16384 |
| Training Dataset | TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized |
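The "standard" architecture is a single-hidden-layer autoencoder: a linear encoder followed by a ReLU produces sparse, non-negative feature activations, and a linear decoder reconstructs the input. A minimal numpy sketch with toy dimensions (the real SAEs map 3584 → 16384; weight names and the bias-subtraction convention follow common SAELens practice and are assumptions here, not the exact trained weights):

```python
import numpy as np

d_in, d_sae = 8, 32  # toy sizes; the real SAEs use d_in=3584, d_sae=16384

rng = np.random.default_rng(0)
# Random stand-in parameters; the trained values live in sae_weights.safetensors
W_enc = rng.standard_normal((d_in, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.standard_normal((d_sae, d_in)) * 0.1
b_dec = np.zeros(d_in)

def encode(x):
    # ReLU((x - b_dec) @ W_enc + b_enc): sparse, non-negative feature activations
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear reconstruction back into the residual-stream space
    return f @ W_dec + b_dec

x = rng.standard_normal(d_in)   # stand-in for one residual-stream activation
f = encode(x)
x_hat = decode(f)
```

Each of the `d_sae` latents is intended to capture one interpretable feature, which is why the hidden dimension (16384) is much larger than the input (3584).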

## Available Hook Points

- `blocks.0.hook_resid_post`
- `blocks.14.hook_resid_post`
- `blocks.27.hook_resid_post`

## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
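Once you have `features`, a common first step is to inspect which latents fire most strongly at a given token position, and how sparse the activations are overall. A self-contained sketch using a dummy vector in place of a real `sae.encode` output (the real output has 16384 entries per token; the random stand-in here is far less sparse than a trained SAE would be):

```python
import numpy as np

# Dummy stand-in for one token position of `sae.encode(activations)`:
# a length-16384 vector of non-negative feature activations.
rng = np.random.default_rng(0)
features = np.maximum(rng.standard_normal(16384), 0.0)

# Indices of the 10 most strongly active latents, strongest first
top = np.argsort(features)[::-1][:10]

# Fraction of latents that fired at all (an L0-style sparsity diagnostic)
frac_active = (features > 0).mean()
```

The top-activating latent indices are the natural starting point for interpretability work, e.g. collecting the dataset examples that most excite each latent.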

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.