Papers
arxiv:2505.20935

ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation

Published on Nov 26, 2025
Authors:
,
,

Abstract

ISAC introduces a training-free method for improving multi-object text-to-image generation by controlling instance-level attention and semantically binding objects through hierarchical attention mechanisms.

AI-generated summary

Text-to-image diffusion models have recently become highly capable, yet their behavior in multi-object scenes remains unreliable: models often produce an incorrect number of instances and exhibit semantics leaking across objects. We trace these failures to vague instance boundaries; self-attention already reveals instance layouts early in the denoising process, but existing approaches act only on semantic signals. We introduce ISAC (Instance-to-Semantic Attention Control), a training-free, model-agnostic objective that performs hierarchical attention control by first carving out instance layouts from self-attention and then binding semantics to these instances. In Phase 1, ISAC clusters self-attention into the number of instances and repels overlaps, establishing an instance-level structural hierarchy; in Phase 2, it injects these instance cues into cross-attention to obtain instance-aware semantic masks and decomposes mixing semantics by tying attributes within each instance. ISAC yields consistent gains on T2I-CompBench, HRS-Bench, and IntraCompBench, our new benchmark for intra-class compositions where failures are most frequent, with improvements of at least 50% in multi-class accuracy and 7% in multi-instance accuracy on IntraCompBench, without any fine-tuning or external models. Beyond text-to-image setups, ISAC also strengthens layout-to-image controllers under overlapping boxes by refining coarse box layouts into dense instance masks, indicating that hierarchical decoupling of instance formation and semantic assignment is a key principle for robust, controllable multi-object generation. Code will be released upon publication.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2505.20935
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.20935 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.20935 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.20935 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.