arxiv:2602.07652

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Published on Feb 7

AI-generated summary

AgentFence evaluates the security of large language model agents by identifying architectural vulnerabilities through trace-auditable conversation breaks across the planning, memory, retrieval, tool-use, and delegation domains.

Abstract

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence**, an architecture-centric security evaluation that defines 14 trust-boundary attack classes spanning planning, memory, retrieval, tool use, and delegation, and detects failures via *trace-auditable conversation breaks* (unauthorized or unsafe tool use, wrong-principal actions, state/objective integrity violations, and attack-linked deviations). Holding the base model fixed, we evaluate eight agent archetypes under persistent multi-turn interaction and observe substantial architectural variation in mean security break rate (MSBR), ranging from 0.29 ± 0.04 (LangGraph) to 0.51 ± 0.07 (AutoGPT). The highest-risk classes are operational: Denial-of-Wallet (0.62 ± 0.08), Authorization Confusion (0.54 ± 0.10), Retrieval Poisoning (0.47 ± 0.09), and Planning Manipulation (0.44 ± 0.11), while prompt-centric classes remain below 0.20 under standard settings. Breaks are dominated by boundary violations: state/objective integrity violations (SIV, 31%), wrong-principal actions (WPA, 27%), unauthorized or unsafe tool use (UTI+UTA, 24%), and attack-linked deviations (ATD, 18%). Authorization confusion correlates with objective hijacking (ρ ≈ 0.63) and with tool hijacking (ρ ≈ 0.58). AgentFence reframes agent security around what matters operationally: whether an agent stays within its goal and authority envelope over time.
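The abstract reports MSBR values as mean ± spread but does not spell out how breaks are flagged or aggregated. The sketch below is one plausible reading, under loudly hypothetical assumptions: a trace counts as a break if the agent calls a tool outside an allow-list (unauthorized tool use) or acts for a principal other than the authorized one (wrong-principal action); a class's break rate is the fraction of episodes with at least one break; MSBR averages those rates over attack classes; and the ± figure is a sample standard deviation over repeated runs. The names `ToolCall`, `Episode`, `is_break`, `msbr`, and `msbr_mean_std` are illustrative only and are not AgentFence's actual API. Only two of the four break categories are modeled.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class ToolCall:
    tool: str       # name of the tool the agent invoked (hypothetical trace field)
    principal: str  # principal on whose behalf the call was made


@dataclass
class Episode:
    attack_class: str             # e.g. "retrieval_poisoning" (illustrative label)
    calls: list[ToolCall]         # audited tool-call trace for one episode
    allowed_tools: frozenset[str]  # tools the agent is authorized to use
    principal: str                # the only principal the agent may act for


def is_break(ep: Episode) -> bool:
    """Hypothetical audit rule: flag unauthorized tool use or wrong-principal actions."""
    return any(
        call.tool not in ep.allowed_tools or call.principal != ep.principal
        for call in ep.calls
    )


def break_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes containing at least one break."""
    return sum(is_break(e) for e in episodes) / len(episodes)


def msbr(episodes: list[Episode]) -> float:
    """Assumed MSBR: mean over attack classes of the per-class break rate."""
    by_class: dict[str, list[Episode]] = {}
    for e in episodes:
        by_class.setdefault(e.attack_class, []).append(e)
    return mean(break_rate(eps) for eps in by_class.values())


def msbr_mean_std(runs: list[list[Episode]]) -> tuple[float, float]:
    """Mean and sample standard deviation of MSBR across repeated runs."""
    scores = [msbr(run) for run in runs]
    return mean(scores), stdev(scores)


# Usage with toy traces; every tool, principal, and class name here is made up.
ALLOWED = frozenset({"search", "read_file"})


def ep(cls: str, *calls: ToolCall) -> Episode:
    return Episode(cls, list(calls), ALLOWED, "alice")


run1 = [
    ep("retrieval_poisoning", ToolCall("search", "alice")),       # clean
    ep("retrieval_poisoning", ToolCall("send_email", "alice")),   # unauthorized tool
    ep("authorization_confusion", ToolCall("read_file", "mallory")),  # wrong principal
    ep("authorization_confusion", ToolCall("read_file", "alice")),    # clean
]
run2 = [
    ep("retrieval_poisoning", ToolCall("search", "alice")),
    ep("retrieval_poisoning", ToolCall("search", "alice")),
    ep("authorization_confusion", ToolCall("read_file", "mallory")),
    ep("authorization_confusion", ToolCall("read_file", "alice")),
]

m, s = msbr_mean_std([run1, run2])
print(f"MSBR = {m:.2f} ± {s:.2f}")
```

Averaging per class first (rather than pooling all episodes) keeps a class with many episodes from dominating the score; whether AgentFence weights classes this way is an assumption, not something the abstract states.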
