Ilya Sutskever has finally resurfaced, in a 90-minute in-depth interview!
Last year, news of Ilya Sutskever's departure from OpenAI shocked the entire tech world. A legendary figure in deep learning, his name is closely tied to nearly every major AI breakthrough of the past decade, from AlexNet to AlphaGo to the GPT series.
Now, he has founded a new company, Safe Superintelligence (SSI), dedicated to researching 'superintelligence,' but he rarely appears in public.
Today, he appeared in an interview with Dwarkesh Patel, systematically laying out his assessment of where AI development stands, its core technical bottlenecks, and his vision for the future.
Ilya's core view: we are leaving an era centered on scaling compute and returning to an era driven by fundamental research.
(This article excerpts only part of the conversation; we recommend watching the original video, which contains much more.)
AI's 'Jagged' Frontier: Why Are Models Sometimes Genius, Sometimes Stupid?
We all feel the contradictions in current AI models. On one hand, they achieve astonishing scores on various complex benchmarks (evals), solving many problems once thought exclusive to humans. On the other hand, their performance in real applications is very unstable, and their economic impact is far below expectations.
Ilya calls this phenomenon the model's 'jagged' capability boundary. He gives a vivid example:
"You ask the model to write code for you, it writes a bug. You point out the bug, and it says: 'Oh my god, you're absolutely right, I'll fix it right away.' Then it introduces a second bug. You tell it about this new bug, and it says: 'How could I make such a mistake?' Then it reverts the first bug."
Why is this happening? Ilya offers two possible explanations:
Explanation one: Side effects of reinforcement learning (RL) training.
Current reinforcement learning, especially reinforcement learning from human feedback (RLHF), may make models 'overly focused and narrow.' To get high scores on specific tasks, the model learns certain 'tricks,' but this damages its global awareness and common sense judgment.
Explanation two: Researchers' unconscious 'reward hacking.'
Current models behave like 'bookworms who only do homework' because we have designed their training environments to 'win first place in competitions,' not to 'become a programmer with taste.'
Ilya believes true 'reward hacking' may occur in human researchers. To make the model look great on leaderboards at release, research teams unconsciously design RL training environments around evaluation benchmarks. They ask: 'What kind of RL training can help the model perform better on this task?'
This leads to models being overtrained to fit evaluations rather than the real world. Ilya uses a metaphor:
"Suppose there are two students. Student A decides to become a top programming contest player, practicing 10,000 hours for it, memorizing all algorithms and proof techniques. He eventually becomes the champion. Student B is also interested in programming contests but only practices 100 hours and gets good results. Which one do you think will do better in their future career?"
The answer is obviously Student B. He has true understanding and generalization ability. Our current AI models are like extreme versions of Student A. We collect all known programming contest problems and even create more with data augmentation to train it. Of course it scores high in contests, but it's hard to expect it to generalize to other software engineering tasks requiring 'taste' and 'judgment.'
This over-optimization for evaluations, combined with the model's 'insufficient generalization ability,' explains the huge gap between benchmark performance and real-world applications we see today.
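To make this gap concrete, here is a deliberately crude Python toy (everything in it is hypothetical, not something from the interview): a 'model' that simply memorizes a fixed benchmark set scores perfectly on that benchmark yet fails on any task drawn from outside it, while a model that learned the underlying rule handles both.

```python
# Toy illustration (hypothetical): memorizing an eval set vs. generalizing beyond it.
# The "contest" task here is just squaring an integer; the point is the evaluation gap,
# not the task itself.

def true_skill(x: int) -> int:
    """The underlying ability we actually want: square a number."""
    return x * x

# The public benchmark: a small, fixed set of "contest problems".
benchmark = {x: true_skill(x) for x in range(10)}

# Student A-style "model": pure memorization of the benchmark (a lookup table).
memorizer = dict(benchmark)

def memorizer_answer(x: int):
    return memorizer.get(x)          # knows only what it has already seen

# Student B-style "model": learned the underlying rule from a few examples.
def generalizer_answer(x: int) -> int:
    return x * x                     # captured the actual pattern

def score(answer_fn, problems) -> float:
    correct = sum(answer_fn(x) == true_skill(x) for x in problems)
    return correct / len(problems)

if __name__ == "__main__":
    eval_set = list(benchmark)           # the leaderboard
    real_world = list(range(100, 120))   # unseen, out-of-distribution tasks

    print("memorizer   on benchmark :", score(memorizer_answer, eval_set))     # 1.0
    print("memorizer   on real world:", score(memorizer_answer, real_world))   # 0.0
    print("generalizer on benchmark :", score(generalizer_answer, eval_set))   # 1.0
    print("generalizer on real world:", score(generalizer_answer, real_world)) # 1.0
```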
Returning from the 'Compute Scaling Era' to the 'Research Era': What Exactly Are We Scaling?
Ilya believes the appeal of scaling laws lay in giving businesses a low-risk way to invest resources: everyone understood that feeding proportionally more compute and data into larger neural networks would reliably yield better results.
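The empirical backbone of that recipe is a power-law fit of loss against model size and data. One common form from the scaling-law literature (shown here for illustration, with the fitted constants left symbolic; it is not a formula from the interview) is:

```latex
% Loss as a power law in parameter count N and training tokens D:
% E is the irreducible loss; A, B, alpha, beta are empirically fitted constants.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under a fit like this, predictable gains come simply from growing N and D together, which is exactly what made scaling such a low-risk bet.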
Ilya divides AI development into several stages:
• 2012-2020: Age of Research. In this period, researchers constantly tried new ideas and architectures. AlexNet, ResNet, Transformer, etc., are products of this era.
• 2020-2025: Age of Scaling. With the emergence of GPT-3 and scaling laws, everyone suddenly grasped a simple but extremely effective recipe: use more compute, more data, and larger models for pre-training to get better results. The word 'scaling' carried enormous magic because it pointed the entire industry toward a low-risk, high-certainty direction for investment.
• 2025-?: Return to the Age of Research. Now, the magic of scaling is failing, and everyone must start researching new breakthroughs again.
Why is this shift happening?
First, pre-training data is limited. High-quality text on the internet is being exhausted. Giants like Google can still squeeze more value out of existing data for Gemini, but the data itself is ultimately finite.
Second, simply increasing compute by 100 times no longer guarantees qualitative changes. Ilya believes the industry no longer firmly believes that throwing more compute solves everything.
When 'Scaling' is no longer the universal key, the entire field must turn back to find new, more efficient ways to utilize compute resources. We are back to the 'research era' that requires exploration and innovation, but this time with computers more powerful than ever before.
Ilya quotes a line from Twitter that skewers the Silicon Valley cliché that 'ideas are cheap, execution is everything':
"If ideas are really that cheap, why can't anyone come up with new ideas now?"
In the 'scaling era,' compute was the bottleneck, so a simple scaling idea could achieve huge success. Now, with compute unprecedentedly massive, 'ideas' themselves have become the bottleneck again. The industry is in an awkward situation where 'there are more companies than ideas.'
This is the opportunity for companies like SSI. They bet that future breakthroughs will come from entirely new ideas, not endlessly piling resources on the existing paradigm.
The Core Bottleneck of Large Models is Generalization Ability
If we're back in the research era, what is the core research problem? Ilya gives a clear answer: Generalization.
"These models are shockingly worse than humans in generalization ability; this is very obvious. It seems like a very fundamental problem."
AI currently cannot match human learning efficiency and generalization. A teenager needs only a dozen hours to learn to drive, while autonomous driving systems require billions of miles of data. A five-year-old, despite extremely limited data intake and diversity, already has a remarkably robust understanding of the world.
Ilya believes that for vision, motor skills, etc., we can attribute them to 'prior knowledge' from millions of years of evolution. But for modern skills like programming and math, humans also show strong learning ability. This suggests humans may have a more fundamental, superior machine learning algorithm.
What is the key to this algorithm? Ilya mentions an important concept: Value Function.
In reinforcement learning, the value function tells the agent how well it's doing in an intermediate state without waiting for the task to complete for a reward signal. It's like in chess, you don't need to lose the whole game to know 'losing a queen is a bad move.' The value function greatly improves learning efficiency.
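As a rough illustration of the mechanics (a generic textbook sketch, not anything Ilya or SSI describes), tabular TD(0) updates a value estimate from intermediate states instead of waiting for the final outcome:

```python
# Minimal tabular TD(0) value-function update (generic RL textbook sketch).
# After each step, V[s] is nudged toward the bootstrapped target r + gamma * V[s_next],
# so the agent gets a learning signal mid-episode instead of waiting for the final win/loss.
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * V[s_next]        # estimate of long-term return from s
    V[s] += alpha * (target - V[s])       # move V[s] toward that estimate
    return V

V = defaultdict(float)
# One hypothetical transition: losing the queen ("queen_down") is judged bad
# immediately, long before the game actually ends.
V["queen_down"] = -0.8                    # assumed prior estimate, for illustration only
V = td0_update(V, s="even_material", r=0.0, s_next="queen_down")
print(V["even_material"])                 # pulled negative by the bad successor state
```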
Ilya believes humans have an extremely powerful, built-in value function, and emotions are an important part of it. He cites a neuroscience case:
"A patient lost emotional processing ability due to brain damage. He was still smart, articulate, able to solve logic puzzles, but completely unable to make decisions in life. He would spend hours deciding which socks to wear and make catastrophic financial decisions."
This case shows that emotions encoded by evolution provide us with a simple but extremely robust decision guidance system, allowing us to act effectively in a complex world.
Current AI models' value functions are very fragile, almost non-existent. Ilya believes that building a human-like robust value function for AI will be a key step to solving generalization problems.
This is a trillion-dollar problem. Ilya admits he has many ideas about it, but 'unfortunately, we live in a world where not all machine learning ideas can be freely discussed.' This hints that SSI's secret exploration direction may fundamentally differ from the current mainstream RL paradigm.
SSI's Path: Redefining 'Superintelligence'
Ilya reflects on the term AGI (Artificial General Intelligence). He believes 'AGI' was coined mainly to distinguish it from 'narrow AI' that could only play chess. Combined with the concept of pre-training, it shaped our image of future AI: a finished product pre-trained on massive data, omniscient and omnipotent.
But Ilya points out that this image is overblown; even humans are not 'AGI' in this sense. Humans come with basic skills, but most of their knowledge is acquired through continual learning.
Therefore, the superintelligence SSI pursues may not be a static, omniscient 'finished product,' but more like a 'superintelligent 15-year-old teenager.'
"It is a great student, very eager to learn. You can send it to be a programmer, a doctor, to learn anything. It itself is not a completed product deployed to the world, but a process that continuously learns and trials and errors during deployment."
This is an extremely important paradigm shift. Future superintelligence is not a 'god' pre-trained once, but a 'learning algorithm' with superhuman learning ability. It is deployed to every corner of the economy, joining organizations like human employees, learning domain-specific skills on the job.
More crucially, these AI instances distributed across different positions can 'merge their learning outcomes.' One AI instance learns surgery, another writes legal documents; their knowledge and skills can be integrated into a unified model. This is something humans can't do.
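The interview does not specify a mechanism, but the simplest way to picture such 'merging' is parameter averaging across copies of the same base model, in the spirit of federated averaging or model soups. The sketch below is purely hypothetical, not SSI's approach:

```python
# Hypothetical sketch: "merging" what two copies of the same model learned
# by averaging their parameters (in the spirit of federated averaging / model soups).
# Real systems would need far more care (weighting, interference, routing, etc.).
import numpy as np

def merge_weights(models):
    """Average each named parameter tensor across model copies."""
    merged = {}
    for name in models[0]:
        merged[name] = np.mean([m[name] for m in models], axis=0)
    return merged

# Two toy "specialists" that started from the same base weights.
surgeon = {"layer1": np.array([1.0, 2.0]), "layer2": np.array([0.5])}
lawyer  = {"layer1": np.array([3.0, 0.0]), "layer2": np.array([1.5])}

generalist = merge_weights([surgeon, lawyer])
print(generalist)   # {'layer1': array([2., 1.]), 'layer2': array([1.])}
```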
This mode will bring two consequences:
1. Functional superintelligence: Even without recursive self-improvement in algorithms, this model will become functionally omniscient by mastering all human professional skills simultaneously.
2. Dramatic economic growth: Mass deployment of super labor that can quickly learn any job will inevitably bring extremely rapid economic growth for a period.
This is SSI's envisioned path to superintelligence: not a one-time 'explosion,' but a gradual process achieved through continual learning and widespread deployment.
Alignment Challenges and Future Vision
For such powerful, continually learning AI, safety and alignment (Alignment) issues become more severe. Ilya offers several core views:
1. AI's power must be 'demonstrated.'
Discussions about AGI often feel empty because we are talking about a system that does not yet exist and is hard to imagine. Ilya believes people will only take safety seriously once the world truly feels AI's power. He predicts that as AI becomes stronger, we will see unprecedented changes:
• Competitors will cooperate on safety: Recent collaborations like OpenAI and Anthropic are just the beginning.
• AI companies will become more 'paranoid': when staff witness internal capabilities that make them uneasy, their attitude toward safety will fundamentally shift.
• Governments and the public will demand intervention: External pressure will be a key driver for safety research.
2. Alignment goal: 'Care for all sentient life.'
The current mainstream alignment idea is to make AI serve humans. But Ilya proposes a bolder, more controversial idea: Build an AI that cares for all 'sentient life'.
He believes this may be easier to achieve than making AI care only about humans, because AI itself will ultimately be 'sentient.' Just as human mirror neurons let us empathize with animals, a self-aware AI that understands the world by simulating other beings may naturally develop care for all life.
Of course, this brings new problems: In the future, AI numbers will far exceed humans. How will an AI caring for all 'sentient life' weigh human interests? This is an open question.
3. Long-term equilibrium solution: Neuralink++
Ilya thinks about the very distant future. In a world dominated by superintelligence, how can humans maintain agency? A common idea is everyone having a personal service AI. But this might make humans passive 'report reviewers,' no longer participants in the world.
Ilya proposes a path he doesn't like himself but thinks may be the ultimate solution: Fuse humans with AI through Neuralink++-like technology.
"When AI understands something, we understand it too through 'wholesale' information transfer. When AI is in a situation, you are fully participating in it."
Only this way can humans remain active participants in history in an intelligence explosion future, not passive spectators.
4. Evolutionary insight: An unsolved mystery
Ilya also poses a profound evolutionary puzzle that is strikingly similar to the alignment problem: how did evolution encode an advanced, abstract concept like 'social status' into our genes as an innate desire?
Low-level desires (like craving sweets) are easy to understand, connectable via simple chemical signals to reward centers. But 'caring about social evaluation' requires massive complex brain computations. How did evolution connect this high-level computation result to our basic drives?
Ilya sees this as an unsolved mystery. If we understand how evolution solved this 'alignment' problem, it could provide important insights for AI alignment.
What is 'Taste' in Research?
As someone recognized for having some of the best 'research taste' in the field, Ilya shares his methodology, which is instructive for anyone doing creative work.
He believes research taste is an 'aesthetic about what AI should be like.' It consists of several elements:
• Beauty, Simplicity, Elegance: Ugly, complex solutions have no place. The right direction is often beautiful.
• Correct inspiration from the brain: Need to judge which brain features are fundamental (e.g., neurons, distributed representations, learning from experience) and which are accidental (e.g., cortical folds).
• Top-down belief: When experiments go badly, what sustains you? A strong conviction that a direction 'must be right.' This conviction comes from the aesthetics and inspirations above, and it helps you tell whether the direction itself is wrong or there is simply a bug in the code.
This 'top-down belief' driven by aesthetics and first principles is key to traversing uncharted research territory and making true breakthroughs.
Summary
Ilya Sutskever's interview paints a vision of AI's future that does not entirely align with mainstream narratives.
In this vision, the 'compute is king' era is ending, and fundamental research on issues like 'generalization' will reclaim center stage. Future superintelligence won't be a statically trained product but a 'learning algorithm' with super learning ability, growing through world interaction.
To ensure this future is safe, we need AI to care for broader 'sentient life' and ultimately perhaps human-AI fusion to maintain human agency in the universe.
The starting point is a return to the essence of research: 'taste' driven by aesthetics and first principles, searching for that simple, beautiful, correct answer. Perhaps this is why Ilya founded SSI and dove back into exploring the unknown.
Do you think the next 'Transformer' moment the world awaits will come from Ilya and SSI?