Abstract
A dual-view data synthesis approach using polarity reversal enhances retrieval systems' ability to follow instructions by training models to distinguish between topic-relevant and instruction-compliant documents.
Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query but also obey explicit user constraints such as required attributes, exclusions, or output preferences. However, most retrievers are trained primarily for semantic relevance and often fail to distinguish documents that match the topic from those that satisfy the instruction. We propose a dual-view data synthesis strategy based on polarity reversal: given a query, a document that is relevant under the instruction, and a hard negative that matches the query but violates the instruction, we prompt an LLM to generate a complementary instruction under which the two documents swap relevance labels. Because the same document pair appears under complementary instructions with inverted relevance labels, the retriever must reassess the same candidate set through the instruction rather than rely on fixed topical cues. On a 305M-parameter encoder, our method improves performance on the FollowIR benchmark by 45%, surpassing general-purpose embedding models of comparable or larger scale. Through head-to-head comparisons at matched data budgets, we further show that data diversity and instruction supervision play complementary roles: the former preserves general retrieval quality, while the latter improves instruction sensitivity. These results highlight the value of targeted data synthesis for building retrieval systems that are both broadly capable and instruction-aware.
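The polarity-reversal step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `IFExample` schema, the `polarity_reversal` and `to_triples` helpers, and the choice to concatenate query and instruction into one anchor string are all hypothetical. `toy_complement` is a fixed stub standing in for the LLM call that writes the complementary instruction.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class IFExample:
    """One instruction-following training example (hypothetical schema)."""
    query: str
    instruction: str
    positive: str  # relevant to the query AND compliant with the instruction
    negative: str  # relevant to the query but violates the instruction


def polarity_reversal(
    ex: IFExample,
    complement_fn: Callable[[str, str, str, str], str],
) -> List[IFExample]:
    """Return the original view plus a mirrored view with swapped labels.

    complement_fn stands in for the LLM prompt (assumption): given
    (query, instruction, positive, negative) it returns an instruction under
    which the original negative becomes relevant and the original positive does not.
    """
    flipped = complement_fn(ex.query, ex.instruction, ex.positive, ex.negative)
    mirrored = IFExample(
        query=ex.query,          # same query
        instruction=flipped,     # complementary instruction
        positive=ex.negative,    # labels swap under the new instruction
        negative=ex.positive,
    )
    return [ex, mirrored]


def to_triples(examples: List[IFExample]) -> List[Tuple[str, str, str]]:
    """Format (anchor, positive, hard negative) triples for a contrastive loss.

    Joining query and instruction with a space is an illustrative assumption.
    """
    return [(f"{e.query} {e.instruction}", e.positive, e.negative) for e in examples]


def toy_complement(query: str, instruction: str, pos: str, neg: str) -> str:
    # Hypothetical stand-in for an LLM call; a real pipeline would prompt a model here.
    return "Only documents that discuss side effects are relevant."


ex = IFExample(
    query="aspirin heart attack prevention",
    instruction="Exclude documents that discuss side effects.",
    positive="Daily low-dose aspirin reduced first heart attacks in the trial cohort.",
    negative="Aspirin for heart attack prevention raises gastrointestinal bleeding risk.",
)

views = polarity_reversal(ex, toy_complement)
triples = to_triples(views)
for anchor, pos, neg in triples:
    print(f"ANCHOR: {anchor}\n  +: {pos}\n  -: {neg}")
```

Because the mirrored view keeps the query and both documents fixed and only swaps their labels, a scorer that ignores the instruction assigns the same two scores in both views, so its margin on one view is exactly the negative of its margin on the other. Such a scorer cannot rank both views correctly, which is the pressure the dual view is meant to create.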
Community
We propose a simple yet effective strategy for instruction-following information retrieval.
arXiv ID: 2604.18845