arxiv:2604.19752

Soft-Label Governance for Distributional Safety in Multi-Agent Systems

Published on Mar 19

Abstract

SWARM is a simulation framework that replaces binary agent-behavior labels with probabilistic soft labels, enabling continuous risk assessment and governance intervention in multi-agent systems. Its experiments indicate that traditional binary evaluation fails to capture emergent risks and that effective governance requires calibrated probabilistic metrics.

AI-generated summary

Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels p = P(v = +1) ∈ [0, 1], enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics, including expected toxicity E[1 − p | accepted] and quality gap E[p | accepted] − E[p | rejected]. Across seven scenarios with five-seed replication, strict governance reduces welfare by over 40% without improving safety. In parallel, aggressively internalizing system externalities collapses total welfare from a baseline of +262 down to −67, while toxicity remains invariant. Circuit breakers require careful calibration: overly restrictive thresholds severely diminish system value, whereas an optimal threshold balances moderate welfare with minimized toxicity. Companion experiments show that soft metrics detect proxy gaming by self-optimizing agents that pass conventional binary evaluations. This governance layer applies to live LLM-backed agents (Concordia entities, Claude, GPT-4o Mini) without modification. Results show that distributional safety requires continuous risk metrics and that governance-lever calibration involves quantifiable safety-welfare tradeoffs. Source code and project resources are publicly available at https://www.swarm-ai.org/.
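
The two metrics above are straightforward to compute from soft labels. The following is a minimal sketch in Python with NumPy (an assumption; the paper does not specify SWARM's implementation language or API), using placeholder data and a simple threshold-based circuit breaker as a stand-in governance lever; all names are hypothetical.

import numpy as np

# Hypothetical sketch, not the SWARM API: soft-label metrics from the abstract.
rng = np.random.default_rng(0)

# Soft labels p = P(v = +1) in [0, 1] for a batch of agent actions (placeholder data).
p = rng.beta(2.0, 2.0, size=1000)

# Threshold-based circuit breaker: accept an action only if its soft label
# clears the threshold.
threshold = 0.4
accepted = p >= threshold

# Expected toxicity of accepted actions: E[1 - p | accepted].
expected_toxicity = (1.0 - p[accepted]).mean()

# Quality gap: E[p | accepted] - E[p | rejected].
quality_gap = p[accepted].mean() - p[~accepted].mean()

print(f"expected toxicity: {expected_toxicity:.3f}")
print(f"quality gap: {quality_gap:.3f}")

Raising the threshold in this toy setup lowers expected toxicity but rejects more actions, which is one way to picture the safety-welfare tradeoff the abstract says governance-lever calibration involves.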
