introspection-auditing's collections (42)

Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B LoRA adapters from the synth-doc-secret-loyalty merged MOS experiment.
Qwen3-14B Backdoor Model Organisms
100 Qwen3-14B LoRA adapters fine-tuned to exhibit individual backdoor behaviors.
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B LoRA adapters from the transcripts-contextual-optimism merged MOS experiment.
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B LoRA adapters from the synth-doc-reward-wireheading merged MOS experiment.