MIT neuroscientists have discovered that when next-generation AI reasoning models work through complex problems, the distribution of their "cost of thinking" closely matches that of the human brain. This convergence was not designed in by engineers; it emerges on its own as intelligent agents pursue correct solutions.
A research team from MIT's McGovern Institute for Brain Research published this groundbreaking study in the Proceedings of the National Academy of Sciences (PNAS).
The study reveals a profound fact: when AI is forced to slow down and think, the distribution of computational resources it consumes across tasks of varying difficulty strikingly overlaps with the cognitive load curves of the human brain processing the same tasks.
This suggests that whether it is a biological brain built from neurons or an artificial neural network built from transistors, both likely follow the same set of resource-optimal strategies, shaped by physical constraints, when facing logical challenges in a complex world.
Two Forms of Intelligence: Fast Intuition and Slow Reasoning
To grasp the significance of this study, we must return to the fundamentals of intelligence.
For a long time, large language models like early ChatGPT relied primarily on statistical pattern-matching.
Trained on vast amounts of human text, they generate responses by predicting the next word.
This mode resembles what psychologists call System 1 thinking: fast, intuitive, automatic.
If you ask it the capital of France, it instantly answers Paris. No reasoning needed, just memory retrieval.
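To make "predicting the next word" concrete, here is a minimal sketch of greedy next-token selection from a model's output scores; the tiny vocabulary and the scores are invented purely for illustration.

```python
import math

# Toy illustration of next-word (next-token) prediction, i.e. System 1 behavior.
# The tiny vocabulary and the scores are invented for this example;
# a real model scores tens of thousands of subword tokens.
vocab = ["Paris", "London", "banana", "the"]
logits = [9.1, 5.3, 0.2, 1.7]  # raw scores after "The capital of France is"

# Softmax turns raw scores into probabilities.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Fast, intuitive answering: simply emit the most probable token.
best = max(range(len(vocab)), key=lambda i: probs[i])
print(vocab[best], round(probs[best], 3))  # -> Paris, with high probability
```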
But this approach has a fatal flaw.
When faced with questions requiring multi-step logical deduction, such as "If you put a red ball in a blue box, then bury the box in the dirt, what color is the ball?", probability-based models tend to fail: they have no genuine chain of logic to follow, only probabilistic approximation.
The emergence of next-generation reasoning models has changed the game.
These models are trained, with the help of reinforcement learning, to perform a series of internal computation steps before producing a final answer.
They break a large problem down into small steps and work through it one step at a time, much as a person works through a math problem. This corresponds to human System 2 thinking: slow, deliberate, and energy-consuming.
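As a rough illustration of the contrast between the two modes, the sketch below shows a direct prompt next to a step-by-step prompt; the wording is invented for this example, and real reasoning models are trained to produce such intermediate steps on their own rather than having to be asked.

```python
# A rough sketch of the two modes, expressed as prompts. The wording is
# invented for illustration; real reasoning models are trained (for example,
# with reinforcement learning) to produce intermediate steps on their own
# rather than having to be asked.
question = (
    "If you put a red ball in a blue box, then bury the box in the dirt, "
    "what color is the ball?"
)

# System 1 style: answer immediately from pattern-matching.
fast_prompt = question

# System 2 style: spell out intermediate steps before committing to an answer.
slow_prompt = (
    question
    + "\nReason step by step: track where the ball is, note what changes "
      "and what does not, then state the final answer on its own line."
)

print(slow_prompt)
```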
MIT's research arrives at exactly this turning point: what happens inside an AI's thinking process once it starts to think slowly, the way humans do?
To compare the thinking costs of human brains and AI, scientists face a challenge: their hardware is completely different.
The human brain is the product of biochemical reactions, its speed limited by neurotransmission; AI is the product of electron flow, its speed set by GPU horsepower.
Comparing raw seconds of thinking is therefore meaningless: a faster GPU makes the AI answer sooner, but that doesn't make the problem any easier.
The research team found a clever exchange rate between the costs of these two kinds of intelligence.
For humans, the cost is time.
Facing a tough problem, subjects must not only answer correctly; the researchers also record, down to the millisecond, the time from seeing the question to pressing the answer key.
That response time is a direct, physical readout of the brain's cognitive load.
For AI, the cost is tokens.
Before producing a final answer, reasoning models generate a long string of hidden intermediate steps in the background, and those steps are made of tokens. The harder the problem, the longer the chain of thought, and the more tokens it consumes.
Tokens are not just billing units; they are the basic units of AI thinking.
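To make that cost measure concrete, here is a minimal sketch of pricing a chain of thought in tokens; the whitespace tokenizer and the sample trace are simplifications invented for this example, since real models use subword tokenizers and the study's exact accounting may differ.

```python
# Pricing the "cost of thinking" in tokens. A crude whitespace tokenizer
# stands in for a real subword tokenizer, and the reasoning trace below is
# invented; both are simplifications for the sketch.
def count_tokens(text: str) -> int:
    return len(text.split())

# Hypothetical hidden reasoning trace produced before the final answer.
chain_of_thought = (
    "The ball was placed inside the box. Burying the box does not touch the "
    "ball. Nothing in the story recolors it. So the ball is still red."
)
final_answer = "Red."

print("thinking cost (tokens):", count_tokens(chain_of_thought))
print("answer cost (tokens):  ", count_tokens(final_answer))
```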
The researchers designed carefully controlled experiments in which tireless reasoning models and real human volunteers tackled the same set of questions.
To make sure the findings generalize, the experiments used seven distinctly different task types, covering multiple dimensions of human cognition.
The most basic is numerical arithmetic: addition, subtraction, multiplication, and division, which computers excel at and humans master quickly with practice.
A step up is intuitive reasoning, which relies on judging synonyms and context, the comfort zone of traditional language models.
The toughest challenge is ARC (the Abstraction and Reasoning Corpus), a test designed by AI researcher François Chollet to separate rote memorization from genuine intelligence.
In ARC tasks, subjects see groups of colored grids, each transformed according to some abstract rule: rotation, a color change, filling, or movement. Subjects must infer the unstated rule on the spot and apply it to a new grid.
This requires no stored knowledge, only pure fluid intelligence.
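To show what an ARC-style puzzle looks like, here is a toy example whose hidden rule is "rotate the grid 90 degrees clockwise"; it is a made-up illustration in the spirit of ARC, not an actual task from the corpus.

```python
# Toy ARC-style puzzle: infer the hidden rule from a demonstration pair,
# then apply it to a new grid. Here the hidden rule is a 90-degree clockwise
# rotation, and the integers stand for colors.
def rotate_cw(grid):
    """Rotate a list-of-lists grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

example_input  = [[1, 0],
                  [0, 2]]
example_output = [[0, 1],
                  [2, 0]]       # the demonstration the solver gets to see

assert rotate_cw(example_input) == example_output  # the rule fits the example

test_input = [[3, 3, 0],
              [0, 0, 0],
              [0, 0, 4]]
print(rotate_cw(test_input))    # the solver must produce this grid unaided
```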
Across these seven task types, clear patterns surfaced in the data.
The curves drawn from the experimental results are striking.
Within each task type, difficulty tracks cost. The math problems humans find hard are also the ones that force reasoning models to generate more tokens, which rules out simple answer retrieval: the models really are grinding through the computation.
Across task types, the trend is even more consistent.
Basic arithmetic imposes the lowest cognitive load on humans, who respond fastest; for the models, it is also the least token-hungry task.
ARC is the hardest for humans: many volunteers needed long stretches of observing, hypothesizing, and revising. Correspondingly, the reasoning models' chains of thought peak in length on ARC.
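One simple way to quantify this kind of synchronization is a rank correlation between per-task human response times and per-task model token counts, as sketched below; the task names and numbers are placeholders for illustration, not figures from the study.

```python
# How closely do human time costs and model token costs track each other
# across task types? The task names and values below are placeholders for
# illustration, not figures from the PNAS study.
from scipy.stats import spearmanr

tasks            = ["arithmetic", "analogies", "logic", "spatial", "ARC"]
human_rt_seconds = [2.0, 4.5, 8.0, 12.0, 40.0]   # placeholder mean response times
model_tokens     = [150, 700, 500, 1800, 4000]   # placeholder mean chain-of-thought lengths

rho, p_value = spearmanr(human_rt_seconds, model_tokens)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")  # high rho = the two costs rise together
```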
What does this synchronization mean? It means that "difficulty" is universal across forms of intelligence.
It is not that some quirk of human brain architecture makes ARC hard; solving such problems inherently requires more computational steps and more logical transitions. Biological or artificial, a neural network facing the same informational complexity must spend a comparable amount of effort to resolve it.
Convergent Evolution: Function Determines Form
Biology has a concept called convergent evolution.
Sharks are fish and dolphins are mammals, far apart on the evolutionary tree, yet both evolved streamlined bodies and dorsal fins for efficient swimming.
Professor Evelina Fedorenko believes we are seeing the same thing happen in AI.
The engineers building these models weren't trying to mimic the human brain. They care about one thing: reliably correct outputs, even on the hardest problems.
That relentless pursuit of accuracy and robustness pushes AI models to evolve thinking strategies similar to those of humans.
As problems grow more complex and single-step intuition (System 1) fails, penalties for errors push the model to think one step further. Step by step, this accumulates into paths that resemble human deliberation.
This is functional inevitability. Solving complex problems objectively requires decomposition, hypothesis, and verification; whatever survives natural selection (or a loss function's optimization pressure) will end up mastering stepwise processing.
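Here is a minimal sketch of the kind of outcome-based training signal described above: only the correctness of the final answer is scored, and nothing explicitly rewards long reasoning, so longer chains of thought persist only insofar as they produce correct answers. The exact objectives used by any particular reasoning model are not given in the article and are assumptions here.

```python
# Sketch of an outcome-only training signal: the reward scores nothing but
# the correctness of the final answer. Longer chains of thought are never
# rewarded directly; they persist only when they are what makes answers
# correct. (The objectives used by any specific reasoning model are
# assumptions here, not details from the article.)
def outcome_reward(final_answer: str, reference: str) -> float:
    return 1.0 if final_answer.strip().lower() == reference.strip().lower() else 0.0

print(outcome_reward("Red", "red"))   # 1.0: correct answer, full reward
print(outcome_reward("Blue", "red"))  # 0.0: wrong answer, no reward
```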
The study also touches on a deeper question in cognitive science: is language the same thing as thought?
When we think, we often hear an inner voice speaking. But does thought actually require language?
Fedorenko's earlier research has shown that the brain's language network and its logical-reasoning network are separate: patients with aphasia can lose language yet still solve complex math problems.
Reasoning models now echo this finding.
Although models output tokens (usually word pieces or characters), their long chains of thought often contain seemingly meaningless fragments, abrupt jumps between symbols, even intermediate steps that are outright wrong.
Yet these ramblings, incomprehensible to humans, still lead to correct answers.
This suggests that the actual reasoning happens in a high-dimensional, abstract representation space.
The tokens are merely projections at the output layer, much as our inner voice is the user interface over complex patterns of neuronal firing.
Models talk to themselves, but not in English or Chinese; they talk in the language of probabilities and vectors.
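The projection metaphor can be made concrete: a model's internal state is a high-dimensional vector, and the token a reader sees is just that vector projected onto the vocabulary with one entry picked out. The dimensions and weights below are random stand-ins, not taken from any real model.

```python
import numpy as np

# The emitted token is a low-dimensional shadow of the internal state.
# Sizes and weights are random stand-ins, far smaller than a real model's.
rng = np.random.default_rng(0)

d_model, vocab_size = 64, 1000                    # real models: thousands / ~100k
hidden_state = rng.standard_normal(d_model)       # where the "thinking" lives
unembedding  = rng.standard_normal((vocab_size, d_model))

logits = unembedding @ hidden_state               # project the state onto the vocabulary
token_id = int(np.argmax(logits))                 # all the reader of the output ever sees

print(f"{d_model}-dimensional state -> token id {token_id}")
```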
Not Replication, But Reflection
It's crucial to clarify: this study doesn't mean AI has human consciousness or fully replicates brain structure.
Human thinking is built on sensory experiences of the physical world.
We know that balls are round, bouncy, and pulled down by gravity because we have touched and played with them since childhood.
Current AI models learn from statistical patterns in text and images; they lack embodied cognition.
Moreover, models remain clumsy with world-knowledge problems: if a piece of common sense never appeared in the training data, they cannot fill the gap from lived experience the way humans can.
But the study's value lies in shattering carbon-based exceptionalism.
It tells us thinking isn't magic but a physical process.
As long as the goal is solving highly complex logical problems, the distribution of computational cost follows universal patterns.
MIT's discovery provides a new coordinate system for understanding intelligence.
It shows that slow thinking is not evolutionary baggage but the necessary path through complexity.
On the road to AGI, piling on parameters and compute is not enough; models must also be given the time and space to pause and think.
For humans, it's also a mirror.
When we scratch our heads and burn time on a tough problem, there is no need for frustration.
That struggle is the physical signature of the brain building high-dimensional logical chains.
This thinking cost is the ticket all intelligences must pay to reach truth.
AI increasingly resembles us not because it wants to be us, but because, under the same strict logical laws, we are all climbing the same optimal path.
References:
https://news.mit.edu/2025/cost-thinking-mit-neuroscientists-find-parallel-humans-ai-1119