	---
	library_name: sae_lens
	tags:
	- sparse-autoencoder
	- mechanistic-interpretability
	- sae
	---

	# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

	This repository contains three Sparse Autoencoders (SAEs) trained with [SAELens](https://github.com/jbloomAus/SAELens), one for each hook point listed below.

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | `Qwen/Qwen2.5-7B-Instruct` |
	| Architecture | `standard` |
	| Input Dimension | 3584 |
	| SAE Dimension | 16384 |
	| Training Dataset | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |
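
For orientation, the forward pass of a `standard` SAE with these dimensions can be sketched in NumPy. This is an illustration under assumptions, not the SAELens implementation: the parameter names `W_enc`, `b_enc`, `W_dec`, `b_dec`, the random initialization, and the choice to subtract `b_dec` from the input before encoding are all assumed here. The trained values and exact options live in each hook point's `cfg.json` and `sae_weights.safetensors`.

```python
import numpy as np

D_IN, D_SAE = 3584, 16384  # input and SAE dimensions from the table above


def init_params(d_in, d_sae, seed=0):
    """Random placeholder parameters (assumption: not the trained weights)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.02, size=(d_in, d_sae)).astype(np.float32),
        "b_enc": np.zeros(d_sae, dtype=np.float32),
        "W_dec": rng.normal(0.0, 0.02, size=(d_sae, d_in)).astype(np.float32),
        "b_dec": np.zeros(d_in, dtype=np.float32),
    }


def encode(x, p):
    """ReLU encoder: feature activations are non-negative."""
    return np.maximum((x - p["b_dec"]) @ p["W_enc"] + p["b_enc"], 0.0)


def decode(f, p):
    """Linear decoder back to the residual-stream dimension."""
    return f @ p["W_dec"] + p["b_dec"]


def n_params(d_in, d_sae):
    """Parameter count of two weight matrices plus two bias vectors."""
    return 2 * d_in * d_sae + d_sae + d_in


# Tiny dimensions keep the demo cheap; swap in D_IN and D_SAE for real sizes.
params = init_params(d_in=8, d_sae=32)
x = np.ones((4, 8), dtype=np.float32)
features = encode(x, params)
x_hat = decode(features, params)
```

Under this parameterization, `n_params(D_IN, D_SAE)` gives 117,460,480 parameters per SAE.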

	## Available Hook Points

	| Hook Point |
	|------------|
	| `blocks.0.hook_resid_post` |
	| `blocks.14.hook_resid_post` |
	| `blocks.27.hook_resid_post` |
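
Each hook point name encodes the transformer block whose residual-stream output the SAE reads. Qwen2.5-7B-Instruct has 28 blocks, so the three hook points cover the first, a middle, and the last block. The sketch below parses the block index with a regular expression; the helper name `layer_of` is made up for this example, and the pattern covers only the `blocks.N.hook_resid_post` form used in this repository.

```python
import re

HOOK_POINTS = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]

_HOOK_RE = re.compile(r"^blocks\.(\d+)\.hook_resid_post$")


def layer_of(hook_point):
    """Return the block index encoded in a residual-stream hook name."""
    match = _HOOK_RE.match(hook_point)
    if match is None:
        raise ValueError(f"unrecognized hook point: {hook_point!r}")
    return int(match.group(1))


layers = [layer_of(h) for h in HOOK_POINTS]
print(layers)
```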

	## Usage

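	```python
	from sae_lens import SAE
	from transformer_lens import HookedTransformer

	# Choose from the available hook points above
	hook_point = "blocks.0.hook_resid_post"

	# Load the base model with TransformerLens
	model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

	# Load the SAE for that hook point on the same device as the model.
	# In SAELens v6, from_pretrained returns only the SAE; use
	# SAE.from_pretrained_with_cfg_and_sparsity to also get the config dict
	# and the feature sparsity.
	sae = SAE.from_pretrained(
	    release="rufimelo/secure_code_qwen_coder_strd_16384",
	    sae_id=hook_point,
	    device=str(model.cfg.device),
	)

	# Get activations and encode
	_, cache = model.run_with_cache("your text here")
	activations = cache[hook_point]  # shape: (batch, position, 3584)
	features = sae.encode(activations)  # shape: (batch, position, 16384)
	```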
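
Once `features` is available, a common next step is to look at which SAE features fire most strongly at a given token. The following sketch does this with NumPy on a stand-in array. `top_k_features` is a hypothetical helper, and with the real tensor from the usage example you would first convert it, for example with `features.detach().cpu().numpy()`.

```python
import numpy as np


def top_k_features(acts, k=5, batch=0, position=-1):
    """Return (indices, values) of the k largest activations at one token.

    acts: array of shape (batch, position, d_sae), e.g. the SAE encoder output.
    """
    token_acts = acts[batch, position]
    k = min(k, token_acts.shape[0])
    # argsort is ascending: take the last k, then reverse for descending order
    idx = np.argsort(token_acts)[-k:][::-1]
    return idx, token_acts[idx]


# Stand-in activations (assumption: real ones come from sae.encode).
demo = np.zeros((1, 3, 16), dtype=np.float32)
demo[0, -1, [2, 7, 11]] = [0.5, 2.0, 1.25]

indices, values = top_k_features(demo, k=3)
print(indices.tolist(), values.tolist())
```

The returned indices are positions along the 16384-wide feature axis, so they can be used directly to look up a feature's decoder direction or sparsity statistic.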

	## Files

	- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
	- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
	- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
	- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
	- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
	- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
	- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
	- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
	- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
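
The layout above is regular: one sub-folder per hook point, each holding the same three files. As a sketch, the paths can be generated programmatically and, if desired, fetched individually with `hf_hub_download` from `huggingface_hub`. The helper names `expected_files` and `download_cfg` are illustrative, and the download function is defined but not invoked here because it needs network access.

```python
REPO_ID = "rufimelo/secure_code_qwen_coder_strd_16384"
HOOK_POINTS = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]
FILE_NAMES = ["cfg.json", "sae_weights.safetensors", "sparsity.safetensors"]


def expected_files(hook_points=HOOK_POINTS, file_names=FILE_NAMES):
    """Build the repository-relative path of every SAE file."""
    return [f"{hook}/{name}" for hook in hook_points for name in file_names]


def download_cfg(hook_point, repo_id=REPO_ID):
    """Download and parse one hook point's cfg.json (requires network access)."""
    import json

    from huggingface_hub import hf_hub_download  # imported lazily on purpose

    path = hf_hub_download(repo_id=repo_id, filename=f"{hook_point}/cfg.json")
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


paths = expected_files()
print(len(paths))
```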

	## Training

	These SAEs were trained with SAELens version 6.26.2.
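
Because SAELens releases can change loading behavior, it may help to compare the installed version against the one used for training. This sketch uses only the standard library. It assumes the PyPI distribution is named `sae-lens`, and `same_major` is a hypothetical helper that compares only the leading version component.

```python
from importlib import metadata

TRAINED_WITH = "6.26.2"  # from the Training section


def major(version):
    """Leading integer component of a dotted version string."""
    return int(version.split(".")[0])


def same_major(installed, trained=TRAINED_WITH):
    """True when both versions share the same major number."""
    return major(installed) == major(trained)


def installed_sae_lens_version():
    """Installed version string, or None if the package is absent.

    Assumption: the distribution name on PyPI is "sae-lens".
    """
    try:
        return metadata.version("sae-lens")
    except metadata.PackageNotFoundError:
        return None


installed = installed_sae_lens_version()
if installed is None:
    print("sae-lens is not installed")
elif not same_major(installed):
    print(f"installed {installed}, SAEs were trained with {TRAINED_WITH}")
else:
    print(f"installed {installed} matches the major version of {TRAINED_WITH}")
```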