---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three sparse autoencoders (SAEs) trained with [SAELens](https://github.com/jbloomAus/SAELens), one for each hook point listed below.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |
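
The two dimensions in the table determine the expansion factor and the approximate size of each SAE. The sketch below works through that arithmetic. It assumes a `standard` SAE holds an encoder matrix, a decoder matrix, and one bias vector for each; that layout is an assumption made for illustration and is not read from this repository's `cfg.json`.

```python
# Dimensions taken from the Model Details table.
d_in = 3584    # Input Dimension (residual stream width)
d_sae = 16384  # SAE Dimension (number of learned features)

# Ratio of SAE features to input dimensions.
expansion_factor = d_sae / d_in

# Assumed parameter layout: W_enc [d_in, d_sae], W_dec [d_sae, d_in],
# b_enc [d_sae], b_dec [d_in].
n_params = d_in * d_sae + d_sae * d_in + d_sae + d_in

# Approximate in-memory size at 4 bytes per parameter (float32).
size_mib_fp32 = n_params * 4 / 2**20

print(f"expansion factor: {expansion_factor:.2f}")
print(f"parameters per SAE: {n_params:,}")
print(f"approx. float32 size per SAE: {size_mib_fp32:.0f} MiB")
```

Under these assumptions each SAE has roughly 117 million parameters, so loading all three at once in float32 needs well over 1 GiB in addition to the base model.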

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE
from transformer_lens import HookedTransformer

# Load the SAE for one hook point (choose from the hook points listed above).
# In SAELens v6, `SAE.from_pretrained` returns only the SAE, so use the
# tuple-returning variant to also get the config dict and feature sparsity.
hook_point = "blocks.0.hook_resid_post"
sae, cfg_dict, sparsity = SAE.from_pretrained_with_cfg_and_sparsity(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id=hook_point,
)

# Load the base model with TransformerLens
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Cache activations at the same hook point and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache[hook_point]
features = sae.encode(activations)
```
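
To make the encode step less opaque, here is a toy, dependency-free sketch of what a `standard` SAE is assumed to compute: a ReLU encoder followed by a linear decoder. The function names, weights, and tiny sizes are all hypothetical, and the real SAELens implementation may apply extra input processing (for example bias subtraction or normalization) depending on `cfg.json`.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Assumed standard encoder: ReLU(x @ W_enc + b_enc), W_enc is [d_in][d_sae]."""
    pre_acts = [
        sum(x[i] * w_enc[i][j] for i in range(len(x))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return [max(0.0, p) for p in pre_acts]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Assumed linear decoder: f @ W_dec + b_dec, W_dec is [d_sae][d_in]."""
    return [
        sum(f[j] * w_dec[j][i] for j in range(len(f))) + b_dec[i]
        for i in range(len(b_dec))
    ]


# Toy sizes (d_in=2, d_sae=3) stand in for the real 3584 and 16384.
x = [1.0, -2.0]
w_enc = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 1.0]]
b_enc = [0.0, 0.5, 0.0]
w_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [-0.5, 0.5]]
b_dec = [0.0, 0.0]

features = encode(x, w_enc, b_enc)
reconstruction = decode(features, w_dec, b_dec)

# Indices of features that fired; sparsity means most entries stay at zero.
active = [j for j, value in enumerate(features) if value > 0.0]
print(features, active, reconstruction)
```

In this toy input only one of the three features is active, which is the sparse behavior the `features` tensor from `sae.encode` is expected to show at much larger scale.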

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
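
The layout above is one directory per hook point, each holding the same three files. A minimal sketch for enumerating those paths and inspecting a single `cfg.json` without loading any weights follows. `expected_files` and `load_cfg` are hypothetical helper names, the repository id is assumed to be the `release` string from the Usage section, and `load_cfg` needs the `huggingface_hub` package plus network access.

```python
import json

HOOK_POINTS = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]
FILES_PER_SAE = ["cfg.json", "sae_weights.safetensors", "sparsity.safetensors"]


def expected_files(hook_points=HOOK_POINTS):
    """Return the repository-relative path of every file listed above."""
    return [f"{hook}/{name}" for hook in hook_points for name in FILES_PER_SAE]


def load_cfg(repo_id: str, hook_point: str) -> dict:
    """Download one cfg.json from the Hub and parse it (hypothetical helper)."""
    # Imported lazily so the rest of the sketch runs without huggingface_hub.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=f"{hook_point}/cfg.json")
    with open(path) as fh:
        return json.load(fh)


for path in expected_files():
    print(path)

# Example call (assumed repo id, requires network access):
# cfg = load_cfg("rufimelo/secure_code_qwen_coder_strd_16384", HOOK_POINTS[0])
```

Reading `cfg.json` first is a cheap way to confirm the hook name and dimensions before downloading the much larger weight files.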

## Training

These SAEs were trained with SAELens version 6.26.2.