
The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA

Published on Mar 14

Abstract

Graph-RAG systems show limited answer accuracy despite strong retrieval performance, but two augmentations, SPARQL chain-of-thought prompting and graph-walk compression, substantially improve reasoning accuracy and reduce computational cost.

AI-generated summary

Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought prompting, which decomposes questions into triple-pattern queries aligned with the entity-relationship context, and (ii) graph-walk compression, which compresses the context by ~60% via knowledge-graph traversal with no LLM calls. SPARQL CoT improves accuracy by +2 to +14 pp; graph-walk compression adds +6 pp on average when paired with structured prompting on smaller models. Surprisingly, we show that, with question-type routing, a fully augmented budget open-weight Llama-8B model matches or exceeds the unaugmented Llama-70B baseline on all three benchmarks at ~12x lower cost. A replication on LightRAG confirms that our augmentations transfer across Graph-RAG systems.
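The first augmentation, SPARQL chain-of-thought prompting, asks the model to decompose a multi-hop question into triple-pattern queries aligned with the entity-relationship context before answering. A minimal sketch of what such a prompt might look like is below; the function name, prompt wording, and example triples are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of SPARQL chain-of-thought prompting: retrieved KG triples
# are formatted as context, and the model is instructed to first write
# SPARQL-style triple patterns for the question, then bind them to the context.
def build_sparql_cot_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    """Format retrieved KG triples and ask the model to reason via triple patterns."""
    context = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return (
        "Context triples:\n" + context + "\n\n"
        f"Question: {question}\n"
        "First, decompose the question into SPARQL-style triple patterns, e.g.:\n"
        "  ?film directed_by ?director .\n"
        "  ?director born_in ?city .\n"
        "Then bind each variable using the context triples, and state the final answer.\n"
    )

# Example usage with made-up triples for a two-hop question.
prompt = build_sparql_cot_prompt(
    "In which city was the director of Inception born?",
    [("Inception", "directed_by", "Christopher Nolan"),
     ("Christopher Nolan", "born_in", "London")],
)
print(prompt)
```

The idea is that the triple-pattern scaffold matches the shape of the retrieved entity-relationship context, so each hop of the question maps onto a binding step rather than free-form reasoning.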

Community

Turns out retrieval in Graph-RAG is basically solved: the answer is in the context 77-91% of the time. The bottleneck is reasoning: 73-84% of wrong answers come from the model failing to connect the dots, not from missing information.

Two inference-time augmentations close the gap: SPARQL-structured CoT that decomposes questions into graph query patterns, and graph-walk compression that cuts context by ~60% with no LLM calls.
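The graph-walk side could look something like the sketch below: start a breadth-first walk from entities mentioned in the question and keep only triples reached within a hop budget, discarding the rest of the retrieved context. This is an illustrative reconstruction under assumed names, not the paper's code.

```python
# Illustrative sketch of graph-walk compression over a retrieved set of KG
# triples: keep only triples reachable within `max_hops` of the question's
# seed entities. Pure graph traversal, so no LLM calls are needed.
from collections import defaultdict, deque

def graph_walk_compress(triples, seed_entities, max_hops=2):
    # Index each triple under both its subject and object.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))
    kept, seen = [], set()
    frontier = deque((e, 0) for e in seed_entities)
    visited = set(seed_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for t in adj[node]:
            if t not in seen:         # keep each triple once
                seen.add(t)
                kept.append(t)
            for nxt in (t[0], t[2]):  # expand to subject and object
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, depth + 1))
    return kept

# Toy example: only triples within 2 hops of "A" survive.
triples = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D"), ("X", "r4", "Y")]
compressed = graph_walk_compress(triples, ["A"], max_hops=2)
```

In the toy example the disconnected triple and the third-hop triple are dropped, which is the same mechanism that yields the ~60% context reduction reported in the paper.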

With both augmentations and question-type routing, Llama 3.1 8B matches or exceeds vanilla Llama 3.3 70B at ~12x lower cost.

Paper: https://arxiv.org/abs/2603.14045
Code: https://github.com/thomouvic/graph-rag-qa-pub
Indexes: https://huggingface.co/datasets/thomoal/graph-rag-qa-indexes

