How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition Paper • 2603.15714 • Published 13 days ago
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition Paper • 2507.20526 • Published Jul 28, 2025 • 1
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems Paper • 2504.07831 • Published Apr 10, 2025
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents Paper • 2410.09024 • Published Oct 11, 2024 • 1
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents Paper • 2410.10871 • Published Oct 8, 2024 • 1