Comprehensive Evaluation of 12 Latest GraphRAG Techniques

Latest Review: GraphRAG, Compiled by: PaperAgent

In June 2025, two new papers evaluating GraphRAG technology were released. Together they cover 12 GraphRAG techniques: HippoRAG, HippoRAG2, LightRAG, Fast-GraphRAG, RAPTOR, MGraphRAG, KGP, GraphRAG, G-Retriever, DALK, ToG, and GFM-RAG.

Paper 1: When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation

Paper URL: https://arxiv.org/pdf/2506.05690

Paper 2: GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation

Paper URL: https://arxiv.org/pdf/2506.02404

GraphRAG is an extended RAG paradigm that organizes background knowledge by constructing graph structures, where nodes represent entities, events, or topics, and edges represent logical, causal, or associative relationships between them. It not only retrieves directly related nodes but also traverses the graph to capture interconnected subgraphs, thereby uncovering hidden patterns.
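The traversal step described above can be illustrated with a minimal sketch. It assumes a toy graph stored as a plain dictionary and a breadth-first expansion from a seed node; the graph contents, the `expand_subgraph` helper, and the `max_hops` parameter are all hypothetical and are not taken from either paper.

```python
from collections import deque

# Toy knowledge graph: node -> list of (relation, neighbor).
# All entities and relations here are made up for illustration.
GRAPH = {
    "aspirin": [("inhibits", "COX-1"), ("treats", "headache")],
    "COX-1": [("produces", "thromboxane")],
    "thromboxane": [("promotes", "platelet aggregation")],
    "headache": [],
    "platelet aggregation": [],
}


def expand_subgraph(graph, seeds, max_hops):
    """Breadth-first expansion from the seed nodes.

    Returns the set of nodes reached within max_hops and the edges followed.
    """
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    edges = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in graph.get(node, []):
            edges.append((node, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited, edges


# A plain retriever would return only the seed node; the expansion also
# collects the two-hop chain aspirin -> COX-1 -> thromboxane.
nodes, edges = expand_subgraph(GRAPH, ["aspirin"], max_hops=2)
```

Raising `max_hops` widens the retrieved subgraph, which helps multi-hop questions but also pulls in more context that may be irrelevant.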

GraphRAG vs RAG

Is GraphRAG truly effective, and in which scenarios can graph structures bring measurable benefits to RAG systems?

Paper 1, from Xiamen University and Hong Kong Polytechnic University, proposes the GraphRAG-Bench benchmark framework, which aims to comprehensively evaluate GraphRAG models on hierarchical knowledge retrieval and deep contextual reasoning.

The experimental section comprehensively compares GraphRAG and traditional RAG, leading to the following conclusions:

1. Generation Accuracy: GraphRAG outperforms RAG in complex reasoning, context summarization, and creative generation tasks, but RAG performs equally well or better in simple factual retrieval tasks.

2. Retrieval Performance: GraphRAG has the advantage on complex questions because it can connect information dispersed across different text fragments, which is crucial for multi-hop reasoning and comprehensive summarization.

3. Graph Complexity: Different GraphRAG implementations generate index graphs with significant structural differences; for instance, HippoRAG2 produces denser graphs with far more nodes and edges than other frameworks.

Paper 2, from Hong Kong Polytechnic University and Tencent Youtu, proposes a benchmark that is also named GraphRAG-Bench but focuses on GraphRAG's performance in domain-specific reasoning. It includes 1018 university-level questions covering 16 subjects, with task types such as multi-hop reasoning, complex algorithm programming, and mathematical calculation.

Nine state-of-the-art GraphRAG methods (RAPTOR, LightRAG, GraphRAG, G-Retriever, HippoRAG, GFM-RAG, DALK, KGP, and ToG) were evaluated, yielding the following key conclusions:

1. Advantages of GraphRAG: In complex reasoning and multi-hop tasks, GraphRAG significantly outperforms traditional RAG methods, especially in tasks requiring deep contextual understanding and logical reasoning.

2. Impact of Task Type: GraphRAG's performance varies across different task types. For example, its performance in mathematics and ethics is not as strong as in computer science.

3. Enhancement of Reasoning Capability: GraphRAG methods not only improve generation accuracy but also significantly enhance the model's reasoning capability, allowing it to generate more logically coherent explanations.

Graph Construction Evaluation of GraphRAG Techniques

RAPTOR has the longest graph construction time but the lowest token consumption, as it only generates summaries via LLM.

KGP has a shorter graph construction time but higher token consumption.

GraphRAG and LightRAG have longer graph construction times and the highest token consumption, as they generate additional descriptive information.

G-Retriever and HippoRAG have the shortest graph construction times and the highest proportion of non-isolated nodes (approximately 90%), which indicates the best graph construction quality among the evaluated methods.
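The non-isolated node proportion mentioned above can be computed with a few lines. This is a minimal sketch under assumptions: the benchmark's exact definition is not given here, so the code assumes a node counts as non-isolated when it takes part in at least one edge to a different node. The `non_isolated_ratio` helper and the sample data are hypothetical.

```python
def non_isolated_ratio(nodes, edges):
    """Fraction of nodes that appear in at least one edge.

    Assumption: self-loops do not count as a connection.
    """
    connected = set()
    for source, target in edges:
        if source != target:
            connected.add(source)
            connected.add(target)
    node_set = set(nodes)
    if not node_set:
        return 0.0
    return len(connected & node_set) / len(node_set)


# Hypothetical index graph: D only has a self-loop and E has no edges,
# so three of the five nodes are non-isolated.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("D", "D")]
ratio = non_isolated_ratio(nodes, edges)
```

A low ratio suggests that many extracted entities never get linked to anything, so graph traversal cannot reach them during retrieval.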

Knowledge Retrieval Evaluation of GraphRAG Techniques

GFM-RAG has the shortest indexing time because it does not build a traditional vector database.

RAPTOR has the fastest average retrieval time because its tree structure enables rapid information localization.
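Why a tree helps localization can be seen in a small sketch of top-down traversal: at each level only the children of the current node are scored, so the number of comparisons grows with the tree depth rather than with the number of leaf chunks. This is an illustration under stated assumptions, not RAPTOR's implementation: the `TreeNode` class, the `traverse` function, and the sample tree are hypothetical, and a word-overlap score stands in for embedding similarity.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    text: str  # summary for an internal node, raw chunk for a leaf
    children: list = field(default_factory=list)


def overlap(query, text):
    """Jaccard overlap between word sets; a stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def traverse(root, query):
    """Descend from the root, following the best-scoring child at each level."""
    node, path = root, [root.text]
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child.text))
        path.append(node.text)
    return path


# Hypothetical two-level summary tree over four leaf chunks.
root = TreeNode("graphs and retrieval overview", [
    TreeNode("graph construction cost", [
        TreeNode("token consumption during graph construction"),
        TreeNode("time spent building the graph index"),
    ]),
    TreeNode("retrieval speed", [
        TreeNode("average retrieval time per query"),
        TreeNode("indexing time before retrieval"),
    ]),
])

path = traverse(root, "average retrieval time")
```

Here the query is compared against two summaries and then two leaves, instead of against all four leaves; the gap widens as the tree grows.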

HippoRAG and GFM-RAG have shorter retrieval times, utilizing PageRank and GNN algorithms, respectively.
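PageRank-based retrieval can be sketched with a personalized variant, where the random walk restarts at the entities matched in the query and the resulting scores rank the remaining nodes. The code below is a minimal power-iteration sketch and not any framework's implementation; the function name, the default `damping` and `iterations` values, and the sample graph are assumptions made for illustration. It expects a non-empty seed set and an adjacency dictionary in which every neighbor also appears as a key.

```python
def personalized_pagerank(adjacency, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with restarts concentrated on the seeds."""
    nodes = list(adjacency)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # Restart mass: with probability 1 - damping, jump back to a seed.
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if neighbors:
                share = damping * scores[node] / len(neighbors)
                for neighbor in neighbors:
                    updated[neighbor] += share
            else:
                # Dangling node: send its mass back to the seed set.
                for n in nodes:
                    updated[n] += damping * scores[node] * restart[n]
        scores = updated
    return scores


# Hypothetical graph linking a query entity to passages.
adjacency = {
    "query_entity": ["passage_1", "passage_2"],
    "passage_1": ["query_entity", "passage_3"],
    "passage_2": ["query_entity"],
    "passage_3": ["passage_1"],
    "passage_4": [],  # unreachable from the seed
}
scores = personalized_pagerank(adjacency, {"query_entity"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes close to the seed and well connected to it score highest, while nodes that cannot be reached from the seed keep a score of zero, which is why this kind of ranking needs no per-query graph search beyond the iteration itself.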

GraphRAG has a longer retrieval time because it needs to utilize community information for retrieval.
