In the field of artificial intelligence, the reasoning capabilities of large language models (LLMs) are advancing at an unprecedented pace. Since the beginning of the year, with the successive release of reasoning models such as DeepSeek-R1, OpenAI o3, and Qwen3, we have witnessed astonishing performance on complex reasoning tasks, especially the "Aha! moments" that seem to offer a glimpse of models approaching human thought. Today, let's explore the mysteries behind these models and trace the roots of their strong reasoning performance from a distinctive perspective: the reasoning graph.
Two years ago, when the concept of System 2 slow thinking was first proposed in the industry, I was already considering how to unify the explicit, complex chains of thought observed in the external world (such as CoT or long reasoning patterns) with the model's internal latent space. At the time, I put forward a view: regardless of the training method, whether supervised learning on ground-truth signals or RL (reinforcement learning) with self-explored feedback, any explicit step-by-step next-token prediction that implicitly contains abstract patterns such as planning, decomposition, and reflection can find some mapping, through neural activation patterns, into the model's internal latent state space. This mapping might correspond to the "reasoning graphs" or "topological rings" of the paper introduced below, or to other latent space visualization methods, and it may be the secret behind a model's System 2 slow-thinking ability.
Reasoning Graphs: The Key to Unlocking the Model's "Black Box of Thought"
When faced with the impressive reasoning results these models produce, one can't help but wonder: behind that complex neural network, how exactly does the model think? Recently, a paper titled "Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties" by researchers from the University of Tokyo in collaboration with Google DeepMind offers a new lens: reasoning graphs. A reasoning graph is like a visual map of the model's thinking process. Nodes are obtained by clustering the model's hidden-state representations at each reasoning step; edges are then added between the nodes visited consecutively during reasoning, yielding a graph that traces the model's thought path.
In mathematical tasks, a reasoning graph can be figuratively understood as the path formed by various simple computation states, from the initial problem state to the final answer state, with each computation state corresponding to a node in the graph. By analyzing reasoning graphs, we can gain intuitive and systematic insights into the internal mechanisms and behavioral patterns of the model during reasoning, thereby deeply understanding the essence of its reasoning capabilities.
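To make this construction concrete, here is a minimal sketch, not the paper's exact pipeline: assume we already have one hidden-state vector per reasoning step, cluster those vectors with k-means to define the nodes, and add a directed edge between the clusters visited by consecutive steps. The number of clusters, the clustering algorithm, and the toy data are all illustrative assumptions.

```python
# Minimal sketch: build a "reasoning graph" from per-step hidden states.
# Assumptions (not from the paper): k-means with 8 clusters, one
# hidden-state vector per reasoning step.
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def build_reasoning_graph(step_hidden_states: np.ndarray, n_clusters: int = 8) -> nx.DiGraph:
    """step_hidden_states: array of shape (num_steps, hidden_dim)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(step_hidden_states)
    graph = nx.DiGraph()
    graph.add_nodes_from(range(n_clusters))
    # Connect the clusters visited by consecutive reasoning steps.
    for src, dst in zip(labels[:-1], labels[1:]):
        if src != dst:  # ignore trivial self-loops from staying in one state
            graph.add_edge(src, dst)
    return graph

# Toy usage: 20 reasoning steps with 64-dimensional hidden states.
rng = np.random.default_rng(0)
toy_states = rng.normal(size=(20, 64))
g = build_reasoning_graph(toy_states)
print(g.number_of_nodes(), g.number_of_edges())
```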
Cyclicity: The Model's "Reflection" and "Adjustment"
In the study of reasoning graphs, a striking discovery is that large reasoning models exhibit significant cyclicity. These cycles are like moments of "reflection" and "adjustment" in the model's thinking process. Compared to base models, distilled reasoning models (such as DeepSeek-R1-Distill-Qwen-32B) show on average about 5 more such cycles per sample, and this cyclicity becomes even more pronounced as task difficulty and model capacity increase.
This cyclicity suggests that the model does not arrive at a solution in a single pass; instead, like humans, it frequently revisits previous reasoning steps, identifies problems, and makes corrections. This self-correction ability, akin to human "Aha! moments," lets the model continuously refine its reasoning path and thereby improve accuracy. When a model is stuck on a complex problem, these cycles are its repeated process of trying, reflecting, and trying again, until it suddenly finds the correct solution path.
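As a rough illustration of how such cyclicity might be measured on a reasoning graph, the sketch below counts simple directed cycles with networkx; the paper's exact cycle definition and counting procedure may differ.

```python
# Minimal sketch: quantify "cyclicity" of a reasoning graph.
# Assumption: a cycle is any simple directed cycle in the graph; this is an
# illustrative choice, not necessarily the paper's definition. Note that
# enumerating simple cycles can be expensive on large, dense graphs.
import networkx as nx

def cycle_statistics(graph: nx.DiGraph) -> dict:
    cycles = list(nx.simple_cycles(graph))  # all simple directed cycles
    return {
        "has_cycle": len(cycles) > 0,                         # cycle detection for one sample
        "cycle_count": len(cycles),                           # how often the model loops back
        "longest_cycle": max((len(c) for c in cycles), default=0),
    }

# Toy usage: the path 0 -> 1 -> 2 -> 1 -> 3 revisits state 1, creating a cycle.
g = nx.DiGraph([(0, 1), (1, 2), (2, 1), (1, 3)])
print(cycle_statistics(g))  # {'has_cycle': True, 'cycle_count': 1, 'longest_cycle': 2}
```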
Graph Diameter: The "Breadth" and "Depth" of Model Thinking
Beyond cyclicity, the diameter of the reasoning graph is another important indicator of a model's reasoning capability. Research shows that the reasoning graph diameter of large reasoning models is significantly greater than that of base models, indicating that they can explore a wider range of reasoning states during the reasoning process. The model's thinking is no longer confined to narrow paths but can access broader domains, delving deeper into various possibilities behind a problem.
An increased graph diameter means the model possesses a broader scope of thinking, capable of reaching more distant knowledge nodes, and demonstrating more flexible cognitive abilities and stronger problem-solving skills in complex reasoning tasks. This is comparable to a learned scholar whose mind can roam freely in the ocean of knowledge, drawing inspiration from different angles and fields to understand problems more deeply and find optimal solutions.
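A hedged sketch of the diameter measurement follows: since a reasoning graph may be disconnected, the version below, as an illustrative choice rather than the paper's stated protocol, computes the diameter of the largest connected component of the undirected view of the graph.

```python
# Minimal sketch: diameter of a reasoning graph.
# Assumption: measure on the undirected view of the largest connected
# component, since reasoning graphs may be weakly connected or disconnected.
import networkx as nx

def reasoning_graph_diameter(graph: nx.DiGraph) -> int:
    undirected = graph.to_undirected()
    if undirected.number_of_nodes() == 0:
        return 0
    largest_cc = max(nx.connected_components(undirected), key=len)
    return nx.diameter(undirected.subgraph(largest_cc))

# Toy usage: a chain of 5 states has diameter 4 (the longest shortest path).
g = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 4)])
print(reasoning_graph_diameter(g))  # 4
```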
Small-World Property: Efficiently Connecting Local and Global Knowledge
Even more exciting is that reasoning graphs constructed by large reasoning models exhibit a significantly higher small-world property, approximately 6 times that of base models. The uniqueness of a small-world structure lies in its ability to possess both dense local clustering and efficient global connectivity through a few long-range connections. In the model's reasoning process, this small-world property plays a crucial role.
On one hand, the dense local clustering structure enables the model to delve deep into local knowledge, conducting detailed analysis of specific aspects of a problem; on the other hand, a few long-range connections provide the model with the ability to quickly switch and integrate global knowledge. This characteristic allows the model, during reasoning, to both focus on details and grasp the essence of the problem holistically, thereby more efficiently connecting different parts of the problem and finding the optimal path to the answer.
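One standard way to quantify the small-world property is the sigma coefficient, which compares a graph's clustering and characteristic path length against a random graph of the same size; values well above 1 indicate small-world structure. The sketch below computes it on an undirected, connected graph with an Erdős–Rényi baseline; whether this matches the paper's exact metric is an assumption.

```python
# Minimal sketch: small-world coefficient sigma = (C / C_rand) / (L / L_rand).
# Assumptions: the reasoning graph is treated as undirected and connected, and
# the random baseline is an Erdos-Renyi graph with the same node and edge count.
import networkx as nx

def small_world_sigma(graph: nx.Graph, seed: int = 0) -> float:
    n, m = graph.number_of_nodes(), graph.number_of_edges()
    clustering = nx.average_clustering(graph)
    path_length = nx.average_shortest_path_length(graph)  # requires a connected graph

    random_graph = nx.gnm_random_graph(n, m, seed=seed)
    # Guard: the random baseline may be disconnected; measure its largest component.
    largest_cc = random_graph.subgraph(max(nx.connected_components(random_graph), key=len))
    rand_clustering = nx.average_clustering(random_graph)  # assumed nonzero for this graph size
    rand_path_length = nx.average_shortest_path_length(largest_cc)

    return (clustering / rand_clustering) / (path_length / rand_path_length)

# Toy usage: a Watts-Strogatz ring lattice with a few rewired shortcuts is the
# canonical small-world graph, so sigma should come out well above 1.
g = nx.connected_watts_strogatz_graph(n=50, k=6, p=0.1, seed=0)
print(round(small_world_sigma(g), 2))
```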
Model Scale and Reasoning Graphs: Capability Enhancement Behind Scale
As model scale increases, metrics such as cycle detection rate, cycle count, and reasoning graph diameter follow different trends. The cycle detection rate peaked first at the 14B model, while the 32B model achieved the largest reasoning graph diameter, which showed a positive correlation with task accuracy.
This indicates that increasing model capacity provides a solid foundation for optimizing reasoning graph structures. Larger models can accommodate more complex reasoning graph structures, thereby supporting more advanced reasoning processes. This is like a building with more rooms and passages, providing a broader stage for various cognitive activities, allowing the model to exhibit stronger capabilities in complex reasoning tasks.
Supervised Fine-Tuning: A Powerful Tool for Shaping Reasoning Graphs
Supervised Fine-Tuning (SFT) has proven to be an effective means of shaping reasoning graph structures. By performing supervised fine-tuning on improved datasets, we can systematically expand the reasoning graph diameter, with performance improvements tracking the increase in diameter. This provides valuable guidance for constructing and optimizing datasets used for reasoning tasks.
When designing datasets, we should not only focus on the quantity and quality of data but also consider whether the data can induce the model to produce reasoning graph structures with larger diameters and more cycles. Through carefully designed datasets, we can guide the model to explore broader paths during reasoning, cultivate its ability to reflect and adjust, thereby significantly enhancing the model's reasoning performance.
Connection Between System 2 Slow Thinking and Reasoning Graphs
Looking back two years, when the concept of System 2 slow thinking was first proposed in the industry, I was trying to work out how to unify explicit, complex chains of thought from the external world (such as CoT or long reasoning patterns) with the model's internal latent space, in order to establish a more intuitive and unified cognitive view of reasoning models. The "reasoning graph" in this paper is a powerful exploration of exactly that question.
System 2 slow thinking emphasizes conscious, logical, and externally explicit deep thinking processes. This aligns with the cyclical structures and broad exploratory behavior manifested in reasoning graphs. Viewed through this reasoning-graph visualization, cycles in the model's internal latent space might correspond to the repeated deliberation, verification, and adjustment of ideas in System 2 thinking, while a larger graph diameter might reflect System 2 thinking's deep exploration of a problem's different facets and its broad association across related knowledge.
Latent State Mapping and Reasoning Graph Visualization
My earlier view was that, regardless of whether a model is trained with ground-truth supervised learning, distilled SFT, or RL with self-explored reward feedback, the abstract patterns of planning, decomposition, and reflection implicit in explicit step-by-step reasoning can be mapped onto neural activation patterns within the model's internal latent state space. The reasoning graph construction in this paper can be seen as one way of visualizing that mapping.
By clustering hidden states to form nodes and constructing reasoning graphs, we can transform the complex neural activation patterns within the model into intuitive graph structures, and then analyze their relationship with reasoning performance. This visualization method provides us with new perspectives and tools for deeply understanding the model's internal reasoning mechanisms, allowing us to observe the model's behavior and characteristics during the reasoning process more directly, thereby providing a basis for further optimizing the model's reasoning capabilities.
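For completeness, here is a sketch of how per-step hidden states might be pulled out of an open model with the Hugging Face transformers library, to be fed into the clustering step shown earlier. The newline-based step segmentation, the choice of the final layer's last-token state, and the stand-in model name are all illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch: extract one hidden-state vector per reasoning step.
# Assumptions (illustrative, not the paper's protocol): steps are newline-
# separated, each step is represented by the final-layer hidden state of its
# last token, and "Qwen/Qwen2.5-0.5B" is a stand-in for the models studied.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical stand-in; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

reasoning_trace = "Compute 12 * 7.\n12 * 7 = 84.\nCheck: 84 / 7 = 12, so 84 is correct."
steps = [s for s in reasoning_trace.split("\n") if s.strip()]

step_states = []
with torch.no_grad():
    for end in range(1, len(steps) + 1):
        prefix = "\n".join(steps[:end])                       # trace up to this step
        inputs = tokenizer(prefix, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        last_layer = outputs.hidden_states[-1]                # (1, seq_len, hidden_dim)
        step_states.append(last_layer[0, -1].cpu().numpy())   # last token of the step

# step_states can now be stacked and passed to the clustering sketch above.
```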
Conclusion
In this era of rapid advancement in artificial intelligence, the paper "Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties" opens a door to the world of model thought. From the "reflection" and "adjustment" of cyclicity, to the "breadth" and "depth" of thinking represented by graph diameter, and the ability to efficiently connect local and global knowledge bestowed by the small-world property, these large reasoning models are demonstrating their powerful reasoning capabilities in an unprecedented way. We believe that with the passage of time and continuous technological progress, we will have more advanced theories and tools to explore the mysteries of large reasoning models, further promoting the development of the field of artificial intelligence.
Furthermore, despite its significant achievements, the paper still has some limitations. For instance, while it proposes methods for constructing and analyzing reasoning graphs, it does not provide sufficiently specific guidance on how to directly build models with better reasoning performance based on the properties of reasoning graphs. Future research could explore the following directions:
First, further explore the broader properties and characteristics exhibited by reasoning graphs or other visualizations of a model's latent space, so as to understand the model's reasoning mechanisms more comprehensively. Examples include the insights into the self-evolving capabilities of the model's internal latent state space implied in earlier papers such as "TTRL: Test-Time Reinforcement Learning" from Tsinghua and "Boundless Socratic Learning with Language Games" from Google DeepMind, as well as the explanation of "lucky" spurious rewards in the recently and hotly debated paper "Spurious Rewards: Rethinking Training Signals in RLVR" from UW/UC.
Second, explore how to design more effective model architectures and training algorithms based on the analysis of reasoning graphs or other latent-space visualization methods, so as to enhance the model's reasoning capabilities more directly, for example through innovations in model structures such as Transformers and guidance on the probabilistic modeling approaches (AR, diffusion, etc.) adopted for different modalities of data.
Third, combine relevant theories and methods from cognitive science and neuroscience to study and optimize the model's reasoning process from a broader interdisciplinary perspective, bringing the model's reasoning capabilities closer to human intelligence.
In summary, the paper "Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties" provides powerful tools and important insights for revealing the internal working mechanisms of large reasoning models by constructing and analyzing reasoning graphs. Coupled with research ideas like System 2 slow thinking, we have reason to believe that a deep exploration of the internal reasoning patterns of models will continuously drive greater breakthroughs in complex reasoning tasks within the field of natural language processing, laying a solid foundation for achieving AI systems with human-level intelligence.
By Lu Ming