The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA
Abstract
Graph-RAG systems demonstrate limited accuracy despite strong retrieval performance, but augmentations like SPARQL chain-of-thought prompting and graph-walk compression significantly improve reasoning and reduce computational costs.
Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought prompting, which decomposes questions into triple-pattern queries aligned with the entity-relationship context, and (ii) graph-walk compression, which compresses the context by ~60% via knowledge-graph traversal with no LLM calls. SPARQL CoT improves accuracy by +2 to +14 pp; graph-walk compression adds +6 pp on average when paired with structured prompting on smaller models. Surprisingly, we show that, with question-type routing, a fully augmented, budget-friendly open-weight Llama-8B model matches or exceeds the unaugmented Llama-70B baseline on all three benchmarks at ~12x lower cost. A replication on LightRAG confirms that our augmentations transfer across Graph-RAG systems.
Community
Turns out retrieval in Graph-RAG is basically solved: the answer is in the context 77-91% of the time. The bottleneck is reasoning: 73-84% of wrong answers come from the model failing to connect the dots, not from missing information.
Two inference-time augmentations close the gap: SPARQL-structured CoT that decomposes questions into graph query patterns, and graph-walk compression that cuts context by ~60% with no LLM calls.
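To give a flavor of the first augmentation, here is a minimal sketch of what a SPARQL-structured CoT prompt could look like. The template, predicate names, and example question are all illustrative assumptions, not the paper's actual prompt:

```python
# Illustrative sketch of SPARQL chain-of-thought prompting.
# The prompt template and the :wonAward / :directedBy predicates are
# hypothetical; the paper's exact format may differ.

def sparql_cot_prompt(question: str, context: str) -> str:
    """Ask the model to decompose the question into SPARQL triple
    patterns before answering, mirroring the entity-relation context."""
    return (
        "Context (entity-relation triples):\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "First decompose the question into SPARQL triple patterns, e.g.:\n"
        "  SELECT ?answer WHERE {\n"
        "    ?film :wonAward   :BestPicture1995 .\n"
        "    ?film :directedBy ?answer .\n"
        "  }\n"
        "Then resolve each pattern against the context and state the answer."
    )

prompt = sparql_cot_prompt(
    "Who directed the film that won Best Picture in 1995?",
    "(ForrestGump, wonAward, BestPicture1995)\n"
    "(ForrestGump, directedBy, RobertZemeckis)",
)
print(prompt)
```

The idea is that the triple patterns give the model an explicit multi-hop plan whose shape matches the retrieved entity-relation context, so each hop can be checked against a concrete triple.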
Llama 3.1 8B with these matches or exceeds vanilla Llama 3.3 70B at ~12x lower cost.
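The second augmentation, graph-walk compression, can be sketched as a plain breadth-first walk over the knowledge graph: keep only context triples within k hops of the question's entities, with no LLM in the loop. The function name, hop budget, and toy triples below are assumptions for illustration, not the paper's implementation:

```python
from collections import deque

# Hypothetical sketch of graph-walk compression: prune retrieved triples
# to those reachable within k hops of the question entities. No LLM calls.

def graph_walk_compress(triples, question_entities, k=2):
    """triples: list of (subject, relation, object) tuples.
    Returns the subset whose endpoints both lie within k hops
    of any question entity."""
    # Build an undirected adjacency map over entities.
    adj = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    # BFS outward up to k hops from the question entities.
    seen = set(question_entities)
    frontier = deque((e, 0) for e in question_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    # Keep only triples fully inside the reachable neighborhood.
    return [t for t in triples if t[0] in seen and t[2] in seen]

triples = [
    ("ForrestGump", "directedBy", "RobertZemeckis"),
    ("ForrestGump", "wonAward", "BestPicture1995"),
    ("Titanic", "directedBy", "JamesCameron"),  # off-topic, gets pruned
]
kept = graph_walk_compress(triples, {"BestPicture1995"}, k=2)
# The Titanic triple is unreachable from the question entity and is dropped.
```

Because the pruning is a pure graph traversal, the ~60% context reduction reported in the paper comes for free at inference time, with no extra model calls.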
Paper: https://arxiv.org/abs/2603.14045
Code: https://github.com/thomouvic/graph-rag-qa-pub
Indexes: https://huggingface.co/datasets/thomoal/graph-rag-qa-indexes