---
library_name: sae_lens
tags:
  - sparse-autoencoder
  - mechanistic-interpretability
  - sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained using SAELens.

## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Architecture | standard |
| Input Dimension | 3584 |
| SAE Dimension | 16384 |
| Training Dataset | TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized |
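The "standard" architecture is a single-hidden-layer autoencoder: a linear encoder followed by a ReLU produces sparse, non-negative feature activations, and a linear decoder reconstructs the input. A minimal numpy sketch with toy dimensions (the real SAEs map 3584 → 16384; weight names and the bias-subtraction convention follow common SAELens practice and are assumptions here, not the exact trained weights):

```python
import numpy as np

d_in, d_sae = 8, 32  # toy sizes; the real SAEs use d_in=3584, d_sae=16384

rng = np.random.default_rng(0)
# Random stand-in parameters; the trained values live in sae_weights.safetensors
W_enc = rng.standard_normal((d_in, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.standard_normal((d_sae, d_in)) * 0.1
b_dec = np.zeros(d_in)

def encode(x):
    # ReLU((x - b_dec) @ W_enc + b_enc): sparse, non-negative feature activations
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear reconstruction back into the residual-stream space
    return f @ W_dec + b_dec

x = rng.standard_normal(d_in)   # stand-in for one residual-stream activation
f = encode(x)
x_hat = decode(f)
```

Each of the `d_sae` latents is intended to capture one interpretable feature, which is why the hidden dimension (16384) is much larger than the input (3584).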

## Available Hook Points

- `blocks.0.hook_resid_post`
- `blocks.14.hook_resid_post`
- `blocks.27.hook_resid_post`

## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
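Once you have `features`, a common first step is to inspect which latents fire most strongly at a given token position, and how sparse the activations are overall. A self-contained sketch using a dummy vector in place of a real `sae.encode` output (the real output has 16384 entries per token; the random stand-in here is far less sparse than a trained SAE would be):

```python
import numpy as np

# Dummy stand-in for one token position of `sae.encode(activations)`:
# a length-16384 vector of non-negative feature activations.
rng = np.random.default_rng(0)
features = np.maximum(rng.standard_normal(16384), 0.0)

# Indices of the 10 most strongly active latents, strongest first
top = np.argsort(features)[::-1][:10]

# Fraction of latents that fired at all (an L0-style sparsity diagnostic)
frac_active = (features > 0).mean()
```

The top-activating latent indices are the natural starting point for interpretability work, e.g. collecting the dataset examples that most excite each latent.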

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.