---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---
# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct
This repository contains three Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |
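The configuration for each hook point can be inspected directly from the repository files. A minimal sketch, assuming the standard SAELens `cfg.json` keys `d_in` and `d_sae` (key names may vary between SAELens versions):
```python
import json
from huggingface_hub import hf_hub_download

# Download one SAE config from this repository
cfg_path = hf_hub_download(
    repo_id="rufimelo/secure_code_qwen_coder_strd_16384",
    filename="blocks.0.hook_resid_post/cfg.json",
)
with open(cfg_path) as f:
    cfg = json.load(f)

# Key names assume the standard SAELens config format
print(cfg["d_in"], cfg["d_sae"])  # expected: 3584 16384
```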
## Available Hook Points
| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |
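To work with all three layers at once, one option is to load each SAE into a dictionary keyed by hook point, using the same `SAE.from_pretrained` call shown in the Usage section below:
```python
from sae_lens import SAE

hook_points = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]

# from_pretrained returns (sae, cfg_dict, sparsity); keep only the SAE
saes = {
    hook_point: SAE.from_pretrained(
        release="rufimelo/secure_code_qwen_coder_strd_16384",
        sae_id=hook_point,
    )[0]
    for hook_point in hook_points
}
```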
## Usage
```python
from sae_lens import SAE
from transformer_lens import HookedTransformer

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the available hook points above
)

# Load the base model with TransformerLens
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations at the hook point and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
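To sanity-check a loaded SAE, you can decode the features back into the residual-stream space and measure sparsity and reconstruction error. A minimal sketch, assuming `sae.decode` as the inverse of `sae.encode`:
```python
import torch

# Map the sparse features back into the residual-stream space
reconstruction = sae.decode(features)

# L0 sparsity: average number of active features per token
l0 = (features > 0).float().sum(dim=-1).mean()

# Mean squared reconstruction error against the original activations
mse = torch.mean((reconstruction - activations) ** 2)
print(f"mean L0: {l0.item():.1f}, reconstruction MSE: {mse.item():.6f}")
```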
## Files
- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
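The weight files can also be inspected without SAELens. The sketch below lists the tensors stored for one hook point; note that the tensor names depend on SAELens's serialization format and are not guaranteed here:
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

weights_path = hf_hub_download(
    repo_id="rufimelo/secure_code_qwen_coder_strd_16384",
    filename="blocks.14.hook_resid_post/sae_weights.safetensors",
)

# Print each stored tensor and its shape (names depend on the SAELens version)
for name, tensor in load_file(weights_path).items():
    print(name, tuple(tensor.shape))
```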
## Training
These SAEs were trained with SAELens version 6.26.2.
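If loading fails due to configuration-format changes between SAELens releases, pinning a matching version (e.g. `pip install sae-lens==6.26.2`) may help.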