arxiv:2605.00529

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Published on May 1 · Submitted by Ziwen Zhao on May 5

Abstract

Ψ-RAG addresses the limitations of tree-based retrieval-augmented generation on cross-document multi-hop questions through a hierarchical abstract tree index and a multi-granular retrieval agent.

AI-generated summary

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where k-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose Ψ-RAG, a Tree-RAG framework with two key components. The first is a hierarchical abstract tree index, built through an iterative "merging and collapse" process that adapts to data distributions without a priori assumptions. The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever. Ψ-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
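To make the contrast with k-means concrete, the sketch below builds a tree bottom-up by merging nodes whose embeddings are similar enough, so the number of groups per level comes from the data rather than from a preset k. It is a generic threshold-based agglomeration written for illustration, not the paper's "merging and collapse" procedure, whose details are not given on this page. Every name here (`build_tree`, `merge_level`, `threshold`, `decay`, `floor`) and the toy 2-D vectors are assumptions; a real index would use learned embeddings and attach an LLM-written abstract to each parent.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def leaf(text, vec):
    return {"text": text, "vec": vec, "children": []}

def make_parent(children):
    # Hypothetical parent node: only a centroid, no generated abstract.
    return {"vec": centroid([c["vec"] for c in children]), "children": children}

def merge_level(nodes, threshold):
    """Greedily assign each node to the most similar existing group,
    or start a new group if no group reaches the threshold."""
    groups = []
    for node in nodes:
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(node["vec"], centroid([n["vec"] for n in group]))
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append([node])
        else:
            best.append(node)
    # A group of one is passed up unchanged instead of being wrapped.
    return [g[0] if len(g) == 1 else make_parent(g) for g in groups]

def build_tree(leaves, threshold=0.9, decay=0.5, floor=0.05):
    level = list(leaves)
    while len(level) > 1:
        if threshold < floor:
            return make_parent(level)  # nothing left to merge: force a root
        merged = merge_level(level, threshold)
        if len(merged) == len(level):
            threshold *= decay         # no merge happened: relax and retry
        else:
            level = merged
    return level[0]

def leaf_texts(node):
    if not node["children"]:
        return [node["text"]]
    return [t for child in node["children"] for t in leaf_texts(child)]
```

With four toy chunks forming two obvious clusters, `build_tree` groups each pair first and joins the two groups under a root only after the threshold has been relaxed. The greedy assignment depends on input order, which a production index would need to handle more carefully.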

Community


We introduce Ψ-RAG, a hierarchical tree-based RAG framework designed for complex information-seeking scenarios. It features a hierarchical abstract tree index with different abstraction strategies, enabling efficient and precise retrieval in logarithmic time. Its multi-granular agentic retriever combines a reading & answering agent with a hybrid retrieval pipeline to serve diverse user requests.
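The logarithmic-time claim can be illustrated with a top-down descent: at each level only the children of the currently selected nodes are scored, so the work grows with tree depth times branching factor rather than with the total number of leaves. The sketch below is a generic beam descent under assumed names (`descend`, `beam`, the node dictionary layout, dot-product scoring); whether Ψ-RAG traverses its index this way is not stated on this page.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def node(text, vec, children=()):
    return {"text": text, "vec": vec, "children": list(children)}

def descend(root, query_vec, beam=1):
    """Walk from the root toward the leaves, keeping the `beam`
    best-scoring nodes per level. Returns (hits, nodes_scored)."""
    frontier, scored = [root], 0
    while any(n["children"] for n in frontier):
        candidates = []
        for n in frontier:
            # A leaf reached early stays in the candidate pool.
            candidates.extend(n["children"] or [n])
        scored += len(candidates)
        candidates.sort(key=lambda n: dot(n["vec"], query_vec), reverse=True)
        frontier = candidates[:beam]
    return frontier, scored

# Toy index: two abstract nodes, each summarizing two leaf chunks.
tree = node("root", [0.5, 0.5], [
    node("A", [1.0, 0.0], [node("a1", [1.0, 0.1]), node("a2", [0.8, 0.3])]),
    node("B", [0.0, 1.0], [node("b1", [0.1, 1.0]), node("b2", [0.3, 0.8])]),
])

hits, scored = descend(tree, [0.0, 1.0])
```

The logarithmic bound only holds when the tree is roughly balanced, and a narrow beam can prune a branch that contains a relevant leaf; widening `beam` trades extra scoring work for recall.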


