---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point.
# Note: in SAELens 5.x and later, from_pretrained returns the SAE directly;
# older releases returned a (sae, cfg_dict, sparsity) tuple instead.
sae = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",  # Choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Run the model, cache activations, and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```

Further sketches, covering loading all three hook points, checking reconstruction quality, and reading the sparsity files directly, follow the Training section below.

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.
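
## Loading All Hook Points

To work with all three SAEs at once, load one per hook point. This is a minimal sketch, assuming the same release id as in the Usage section and a SAELens version where `from_pretrained` returns the SAE directly:

```python
from sae_lens import SAE

# Load the SAE for each available hook point into a dict keyed by hook name
hook_points = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]
saes = {
    hook: SAE.from_pretrained(
        release="rufimelo/secure_code_qwen_coder_strd_16384",
        sae_id=hook,
    )
    for hook in hook_points
}
```

Each SAE maps the 3584-dimensional residual stream at its hook point into 16384 features, per the Model Details table above.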
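
## Checking Reconstruction Quality

Continuing the Usage example above (reusing `sae`, `activations`, and `features`), a simple sanity check is to decode the features back into the residual-stream space and measure the reconstruction error. This is a sketch, assuming the SAELens `decode` method on the loaded SAE:

```python
import torch

# Decode the SAE features back into the residual-stream space and
# measure how much of the original activation signal is lost
reconstruction = sae.decode(features)
mse = torch.mean((activations - reconstruction) ** 2)
print(f"Reconstruction MSE: {mse.item():.6f}")
```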
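
## Inspecting Sparsity Statistics

The `sparsity.safetensors` files listed under Files can be read directly with the `safetensors` library, without loading the SAE itself. A minimal sketch; the tensor key inside each file is not documented here, so the code simply lists whatever keys it finds:

```python
from safetensors import safe_open

# Read the feature sparsity statistics for one hook point. The key names
# inside the file are not documented, so iterate over whatever is there.
with safe_open(
    "blocks.0.hook_resid_post/sparsity.safetensors", framework="pt"
) as f:
    for key in f.keys():
        tensor = f.get_tensor(key)
        print(key, tuple(tensor.shape))  # expect one statistic per feature (16384)
```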