AI Visionary Fei-Fei Li's Extensive Article Ignites Silicon Valley: Large Language Models Are on the Wrong Path, Spatial Intelligence Is the Only Way to AGI

"They are like verbose wordsmiths in a dark room—eloquent but inexperienced, knowledgeable but detached from reality."

When "AI godmother" and leading scientist Fei-Fei Li used this phrase to describe today's large language models, she incisively pointed out a cruel reality: despite appearing omnipotent, these AIs actually "live in darkness."

They can write poetry, paint, and even create hyper-realistic videos, but they cannot understand what a cup looks like after rotating 90 degrees, nor can they make a virtual human truly obey the laws of physics.

In Li's view, the root of this predicament is that we have been going in the wrong direction all along. The next step for AI is not larger language models, but to imbue them with an ability that even infants possess—spatial intelligence. This, she argues, is the only path to true Artificial General Intelligence (AGI).


A Soul-Searching Question:

Why is AI still "Blind"?

The original article is long, so let's start with something intuitive.

Ask AI to write a poem, and it instantly transforms into Li Bai or Du Fu. But ask AI to solve a few simple physics problems, such as:

· "If I turn this cup 90 degrees, what will it look like?"

· "Where is the exit of this maze?"

· "Estimate the distance from the table to the door."

AI's answers are mostly guesses.

If you pay attention, even in the coolest AI-generated videos, there are always "flaws": a person's hand suddenly has an extra finger, or an object passes through a wall without warning.

Fei-Fei Li points out the crux of the issue: they don't understand the physical world.

They cannot truly comprehend distance, size, direction, and physical laws.

So, while our expectation for AI is an all-capable butler from science fiction movies, the reality is:

· We still don't have robots that can help us with household chores.

· AI's progress in areas requiring the understanding of 3D structures, such as drug discovery and new material development, is slow.

· AI cannot truly understand the "world" in the minds of architects, game designers, or film directors.


The Missing Piece of the Treasure Map:

Spatial Intelligence

What exactly is this "spatial intelligence" that AI lacks?

Fei-Fei Li says it is the "scaffolding" of human cognition.

Long before we learned to speak or write, we had already mastered this ability:

· Infants spend one to two years understanding the world by grasping, throwing, biting, and observing.

· When you parallel park, your brain rapidly calculates the distance between the bumper and the curb.

· When a friend throws you keys, you don't calculate the parabolic trajectory with pen and paper; you catch them instinctively.

· If you get up at night to get water, you can find the cup and pour water into it without turning on the lights.

Fei-Fei Li also gives examples of how even great human discoveries and civilizational advancements rely on this ability:

· Eratosthenes of ancient Greece calculated the Earth's circumference by observing different angles of shadows in two locations.

· Hargreaves invented the "Spinning Jenny" based on his observation and understanding of space.

· Watson and Crick assembled the DNA double helix structure by manually building 3D molecular models, "piecing together" the spatial arrangement of base pairs.

Fei-Fei Li believes that spatial intelligence is the foundation of human imagination, creativity, and interaction with the world. Unfortunately, current AI largely lacks this ability.


AI's Next Step:

From "Language Models" to "World Models"

So, how can we make AI "see" the world?

Fei-Fei Li offers her answer: the future of AI lies not in larger "language models" (LLMs), but in entirely new "World Models."

She believes that a true "World Model" must be "three-in-one":

· Generative: It must be able to create 3D worlds that conform to physical and geometric laws. For example, it "knows" gravity, knows objects fall, and knows water flows downhill.

· Multimodal: It must be able to process all inputs. For instance, it should not only understand your spoken words but also interpret images, videos, depth information, and even your gestures.

· Interactive: This is the most crucial of the three. When you tell it an "action," it must be able to predict what will happen next. For example, say "push the block," and it knows the block will fall.

Fei-Fei Li admits that this challenge is far more difficult than training language models.

She explains that language is a one-dimensional, sequential signal, while the world is four-dimensional (three-dimensional space + time), constrained by gravity, physical laws, and countless complex rules.

Fei-Fei Li revealed that she co-founded World Labs a year ago, and recently showcased Marble, its first world model, to a select group of users—all aimed at tackling this challenge.


How will this change our lives?

Once AI possesses spatial intelligence, that will be a true revolution.

First, "superpower-like" creativity. Fei-Fei Li's team at World Labs is developing the Marble model, which will allow filmmakers, game designers, and architects to quickly create and iterate 3D worlds using "prompts." In the future, these individuals will no longer need to learn complex 3D software; they will only need to describe with language to generate a 3D world they can enter and interact with. At that time, everyone can become a "creator."

Second, true "embodied AI." Robots will no longer be "clumsy" robotic arms. With the support of "world models," they will learn thousands of practical skills in simulated environments and then enter our homes and hospitals, becoming capable assistants and caregivers.

Furthermore, Fei-Fei Li specifically mentioned that such AI will become an "accelerator" for future science and education.

· Healthcare: AI can simulate molecular interactions in multiple dimensions, accelerating drug discovery, and also help doctors analyze images, providing continuous support for patients and caregivers.

· Education: Students will no longer just read books but can "walk into" the streets of ancient Rome or personally "explore" the inside of cells. Teachers can use interactive environments for personalized teaching, and professionals can practice and master complex skills in highly realistic simulation environments.

· Scientific Research: By simulating environments inaccessible to humans, such as the deep sea or outer space, we can expand the scope of scientific exploration; by combining multidimensional simulations with real-world data collection, we can extend the boundaries of laboratory observation and understanding.


Conclusion:

The ultimate goal of AI,

is "to empower humanity"

As one of the scientists who helped usher in the era of modern AI, Fei-Fei Li concludes her article by returning to her core humanistic concern. She emphasizes that AI's ultimate goal is never to replace humans, but "to empower humanity":

"Let AI be a force that enhances human expertise, accelerates human discovery, and amplifies human care—rather than replacing the human judgment, creativity, and empathy that belong to us."

She believes that AI is developed by humans, used by humans, and governed by humans, and must always respect human agency and dignity. Its magic lies in expanding our capabilities, making us more creative and efficient.

"Spatial intelligence" represents such a "deeper, richer, and more powerful vision for life." It promises to "build machines that are highly aligned with the real world, allowing them to become our true partners in addressing major challenges."

Perhaps, the true intelligence of machines will begin with this "revelation."


[Below is the full text of Fei-Fei Li's long article]

Title: From Words to Worlds: Spatial Intelligence is AI’s Next Frontier

In 1950, when computing was synonymous with automated arithmetic and simple logic, Alan Turing posed a question that still resonates today: Can machines think? It took extraordinary imagination to foresee what he did: that one day, intelligence might be engineered, not merely born. This insight later launched an unrelenting scientific quest called Artificial Intelligence (AI). In my own twenty-five years in AI, Turing's vision continues to inspire me. But how close are we to it? The answer is not straightforward.

Today, cutting-edge AI technologies, exemplified by Large Language Models (LLMs), have begun to transform how we access and process abstract knowledge. Yet they remain "articulate bookworms": brimming with knowledge, but out of touch with reality. Spatial intelligence, however, will transform how we create and interact with the real and virtual worlds—it will revolutionize storytelling, creativity, robotics, scientific discovery, and many other fields. This is AI's next frontier.

In this article, I will explain what spatial intelligence is, why it matters, and how we are building world models that will unlock this capability—with implications that will reshape creativity, embodied intelligence, and human progress.

Spatial Intelligence: The Scaffolding of Human Cognition

AI has never been more exciting than it is now. Generative AI models, such as large language models, have moved from research labs into daily life, becoming tools for billions of people to create, enhance productivity, and communicate. They have demonstrated abilities once thought impossible, effortlessly generating coherent text, mountains of code, photorealistic images, and even short video clips. Whether AI will change the world is no longer a question. By any reasonable definition, it already has.

However, too much remains out of reach. The vision of autonomous robots remains captivating but speculative, far from the everyday presence futurists have long promised. The dream of drastically accelerating research in areas like curing diseases, discovering new materials, and particle physics largely remains unrealized. And the promise of AI truly understanding and empowering human creators—whether it's helping students learn complex concepts in molecular chemistry, assisting architects with spatial visualization, helping filmmakers construct worlds, or supporting anyone seeking fully immersive virtual experiences—is still distant.

To understand why these capabilities remain elusive, we need to examine how spatial intelligence evolved and how it shaped our understanding of the world.

Vision has long been a cornerstone of human intelligence, but its power stems from something more fundamental. Long before animals learned to build nests, care for offspring, communicate with language, or establish civilizations, the simple act of "sensing" quietly began an evolutionary journey toward intelligence.

This seemingly isolated ability to collect information from the external world—whether perceiving a faint glimmer or touching a texture—built a bridge between perception and survival, a bridge that grew stronger and more intricate with each generation. Neurons layered upon this bridge, forming nervous systems capable of interpreting the world and coordinating an organism's interaction with its environment. Thus, many scientists speculate that perception and action became the core loop driving the evolution of intelligence, and the foundation upon which nature created our species—the ultimate embodiment of sensing, learning, thinking, and acting.

Spatial intelligence plays a crucial role in defining how we interact with the physical world. Every day, we rely on it for the most ordinary behaviors: parking by imagining the diminishing gap between the bumper and the curb; catching keys thrown from across the room; navigating a crowded sidewalk without collisions; getting up at night to pour water, finding the cup and pouring without turning on the light. In more extreme cases, firefighters navigating changing smoke in collapsed buildings make instantaneous judgments about structural stability and chances of survival, communicating through gestures, body language, and a shared professional instinct that no language can replace. And infants, months and even years before learning to speak, learn about the world entirely through playful interaction with their environment. All of this happens intuitively and naturally—a fluency that machines have not yet achieved.

Spatial intelligence is also the foundation of our imagination and creativity. Storytellers create extraordinarily rich worlds in their minds and use various visual mediums, from ancient cave paintings to modern films to immersive video games, to present these worlds to others. Whether children build sandcastles on the beach or play Minecraft on a computer, spatially-based imagination forms the basis of interactive experiences in real or virtual worlds. In many industrial applications, simulations of objects, scenes, and dynamic interactive environments power countless critical business use cases, from industrial design to Digital Twins to robot training.

History is replete with moments defining the course of civilization where spatial intelligence played a central role. In ancient Greece, Eratosthenes transformed shadows into geometry—measuring a 7-degree angle in Alexandria at the moment the sun was directly overhead in Syene—thereby calculating the Earth's circumference. Hargreaves' "Spinning Jenny" revolutionized textile manufacturing through a spatial insight: arranging multiple spindles side-by-side in a frame allowed one worker to spin multiple threads simultaneously, increasing efficiency eightfold. Watson and Crick discovered the structure of DNA by manually building three-dimensional molecular models, constantly fiddling with metal plates and wires, eventually "piecing together" the spatial arrangement of base pairs. In each case, when scientists and inventors needed to manipulate objects, visualize structures, and reason about physical space, spatial intelligence drove the progress of civilization—and none of this could be captured solely in words.
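The arithmetic behind Eratosthenes' feat is compact enough to state directly. The figures below (a 7.2-degree shadow angle, i.e. 1/50 of a full circle, and roughly 5,000 stadia between Syene and Alexandria) are the commonly cited historical values, used here purely for illustration:

```python
def circumference(shadow_angle_deg: float, arc_distance: float) -> float:
    """The shadow angle at Alexandria equals the angular separation of the
    two cities along Earth's surface, so the full circumference is the
    known arc distance scaled by 360 / angle."""
    return arc_distance * 360.0 / shadow_angle_deg

# With Eratosthenes' figures: 5,000 stadia x 50 = 250,000 stadia.
print(circumference(7.2, 5000))  # 250000.0
```

The spatial insight, not the division, was the hard part: realizing that two shadows cast at the same moment encode the planet's curvature.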

Spatial intelligence is the scaffolding upon which our cognition is built. It operates whether we are passively observing or actively creating. It drives our reasoning and planning, even for the most abstract topics. It is crucial to how we interact—verbally or physically, with peers or with the environment itself. While most of us don't uncover new truths like Eratosthenes every day, we generally think in the same way—understanding a complex world through sensory perception, then using an intuitive understanding to grasp how it functions physically and spatially.

Unfortunately, today's AI does not think this way.

The past few years have indeed seen tremendous progress. Multimodal Large Language Models (MLLMs), trained on vast amounts of multimedia data in addition to text, have introduced some basic spatial awareness. Today's AI can analyze images, answer related questions, and generate hyper-realistic images and short videos. Through breakthroughs in sensors and haptic technologies, our most advanced robots are also beginning to manipulate objects and tools in highly constrained environments.

However, the frank reality is that AI's spatial capabilities are still far from human-level, and their limitations quickly become apparent. In tasks such as estimating distance, direction, and size, or performing "mental rotation" by regenerating objects from new angles, the performance of the most advanced MLLMs rarely exceeds random guessing. They cannot navigate mazes, identify shortcuts, or predict basic physical phenomena. AI-generated videos—while nascent and indeed cool—often lose coherence after a few seconds.

While today's most advanced AI excels at reading, writing, research, and data pattern recognition, these same models have fundamental limitations when representing or interacting with the physical world. Our view of the world is holistic—it's not just what we are looking at, but also how everything relates spatially, its meaning, and why it matters. Understanding all of this through imagination, reasoning, creation, and interaction—not just description—is the power of spatial intelligence. Without it, AI is disconnected from the physical reality it attempts to understand. It cannot effectively drive our cars, guide robots in our homes and hospitals, create entirely new immersive and interactive experiences for learning and entertainment, or accelerate discoveries in materials science and medicine.

The philosopher Wittgenstein once wrote: "The limits of my language mean the limits of my world." I am not a philosopher. But I know that, at least for AI, the world is much more than words. Spatial intelligence represents the frontier beyond language—the ability that connects imagination, perception, and action, and opens possibilities for machines to truly enhance human life, from healthcare to creativity, from scientific discovery to daily assistance.

AI's Next Decade: Building Machines with True Spatial Intelligence

So, how do we build AI with spatial intelligence? How can we create models capable of visual reasoning like Eratosthenes, precision engineering like an industrial designer, imaginative creation like a storyteller, and fluid interaction with the environment like an emergency responder?

Building AI with spatial intelligence requires a grander goal than large language models: World Models. These are a new type of generative model whose ability to understand, reason, generate, and interact with virtual or real worlds—worlds that are semantically, physically, geometrically, and dynamically complex—far surpasses the scope of current large language models. This field is still in its infancy, with current approaches ranging from abstract reasoning models to video generation systems. World Labs was founded in early 2024 precisely on this belief. Because fundamental approaches are still being established, world modeling is the defining challenge of AI's next decade.

In this emerging field, the most important thing is to establish guiding principles. For spatial intelligence, I define world models through three fundamental capabilities:

· Generative: World models are capable of generating worlds that are perceptually, geometrically, and physically consistent.

To unlock spatial understanding and reasoning, world models must also be able to generate their own simulated worlds. They must be able to generate endless, diverse simulated worlds based on semantic or perceptual instructions—while maintaining geometric, physical, and dynamic consistency—whether these worlds represent real or virtual spaces. The research community is actively exploring whether these worlds should represent their inherent geometric structure implicitly or explicitly. Furthermore, I believe a general world model, in addition to powerful latent representations, must also be able to generate an explicit, observable world state for its output to adapt to various different use cases. In particular, its understanding of the present must be coherent with its past, and with the world states that led to the current state.

· Multimodal: World models are multimodal by design.

Just like animals and humans, world models should be able to process multiple forms of input—known as "prompts" in generative AI. Given partial information—whether images, videos, depth maps, text instructions, gestures, or actions—the world model should be able to predict or generate as complete a world state as possible. This requires it to process visual input with the fidelity of real vision, while interpreting semantic instructions with equal fluidity. This allows both agents and humans to communicate with the model about the world through multiple inputs, and in turn receive multiple outputs.

· Interactive: World models can output the next state based on input actions.

Finally, if an action and/or goal is part of the prompt given to the world model, then its output must include the world's next state, whether implicitly or explicitly represented. When given only an action as input, with or without a target state, the world model should produce an output consistent with the world's previous state, any expected target state, and its semantic meaning, physical laws, and dynamic behavior. As world models with spatial intelligence become increasingly powerful and robust in their reasoning and generative capabilities, it is conceivable that, given a goal, the world model itself will not only predict the next state of the world but also predict the next action based on the new state.
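The three capabilities above can be summarized as a minimal interface. Everything in this sketch (the class names, fields, and `step` method) is an illustrative assumption, not World Labs' actual API; real world models operate on learned latent representations, not simple records:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    description: str   # semantic content of the scene
    time_step: int     # position along the time dimension

@dataclass
class Action:
    name: str          # e.g. "push the block"

class WorldModel:
    """Sketch of the three capabilities: generate a world from a
    (here, text-only) prompt, and step it forward under an action."""

    def generate(self, prompt: str) -> WorldState:
        # Generative: create an initial, internally consistent world state.
        return WorldState(description=f"world from: {prompt}", time_step=0)

    def step(self, state: WorldState, action: Action) -> WorldState:
        # Interactive: predict the next world state given an action,
        # coherent with the state that led to it.
        return WorldState(
            description=f"{state.description} | after {action.name}",
            time_step=state.time_step + 1,
        )

model = WorldModel()
s0 = model.generate("a block on a table")
s1 = model.step(s0, Action("push the block"))
print(s1.time_step)  # 1
```

A genuinely multimodal model would accept images, video, depth maps, and gestures as prompts as well; the point of the sketch is only the shape of the contract: prompt in, world state out, and (state, action) in, next state out.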

The scope of this challenge exceeds any AI has faced before.

While language is purely a generative phenomenon of human cognition, the world follows far more complex rules. For example, on Earth, gravity governs motion, atomic structures determine how light produces color and brightness, and countless physical laws constrain every interaction. Even the most fantastical and creative worlds are composed of spatial objects and agents that follow the physical laws and dynamic behaviors that define them. Coordinating all of this—semantics, geometry, dynamics, and physics—requires entirely new approaches. Representing a world in multiple dimensions is far more complex than representing a one-dimensional sequential signal like language. Achieving world models capable of providing the general capabilities we humans enjoy requires overcoming several formidable technical obstacles. At World Labs, our research team is dedicated to making fundamental progress toward this goal.

Here are some examples of our current research topics:

A new, general training objective function: Defining a general objective function as elegant and concise as "predict the next token" in large language models has long been a core goal of world model research. The complexity of its input and output spaces inherently makes such a function more difficult to formalize. While much remains to be explored, this objective function and corresponding representations must reflect geometric and physical laws, respecting the fundamental nature of world models as a "grounded" representation of imagination and reality.
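For contrast, the "predict the next token" objective the passage calls elegant can be written in a few lines of plain Python: the loss for one step is just the negative log-probability the model assigned to the correct next token. The open problem is that no equally compact loss is known for multi-dimensional, physically constrained world states. This is a toy illustration, not a description of any particular model:

```python
import math

def next_token_loss(probs, target_index):
    """Cross-entropy for one prediction step: -log p(correct next token).
    `probs` is the model's distribution over a (tiny, toy) vocabulary."""
    return -math.log(probs[target_index])

# A model putting 90% mass on the right next token pays a small penalty...
print(round(next_token_loss([0.05, 0.9, 0.05], 1), 4))  # 0.1054
# ...while a uniform guesser pays log(vocab_size).
print(round(next_token_loss([1/3, 1/3, 1/3], 0), 4))    # 1.0986
```

The entire supervision signal fits in one scalar per token of a one-dimensional sequence; nothing comparably simple yet scores whether a generated world state respects geometry, physics, and its own past.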

Large-scale training data: Training world models requires data far more complex than text. The good news is: vast data sources already exist. Internet-scale image and video datasets represent rich, easily accessible training material—the challenge lies in developing algorithms that can extract deeper spatial information from these two-dimensional, image or video frame-based signals (i.e., RGB). A decade of research has shown the power of scaling laws between data volume and model size in language models; the key breakthrough for world models lies in building architectures capable of leveraging existing visual data at comparable scales. Furthermore, I would not underestimate the power of high-quality synthetic data and additional modalities like depth and haptic information. They supplement internet-scale data during critical steps of the training process. But the path forward relies on better sensor systems, more robust signal extraction algorithms, and far more powerful neural simulation methods.

New model architectures and representation learning: World model research will inevitably drive advancements in model architectures and learning algorithms, especially beyond current multimodal large language models and video diffusion paradigms. These two paradigms often "tokenize" data into one-dimensional or two-dimensional sequences, which makes simple spatial tasks unnecessarily difficult—such as counting the number of unique chairs in a short video, or remembering what a room looked like an hour ago. Alternative architectures may help, such as three-dimensional or four-dimensional perceptual approaches for "tokenization," context, and memory. For example, at World Labs, our recent work on a real-time generative, frame-based model called RTFM demonstrated this shift, using spatially-based frames as a form of spatial memory to achieve efficient real-time generation while maintaining the consistency of the generated world.

Clearly, we still face formidable challenges before we can fully unlock spatial intelligence through world modeling. This research is not merely a theoretical exercise; it is the core engine for creating a new class of creativity and productivity tools. And progress within World Labs has been encouraging. We recently shared a glimpse of Marble with a few users: the first world model that can generate and maintain consistent three-dimensional environments from multimodal input prompts, allowing users and storytellers to explore, interact, and build further within their creative workflows. We are working hard to make it publicly available as soon as possible!

Marble is just our first step toward creating a truly spatially intelligent world model. As progress accelerates, researchers, engineers, users, and business leaders are all beginning to recognize its extraordinary potential. The next generation of world models will enable machines to achieve spatial intelligence at an entirely new level—an accomplishment that will unlock critical capabilities still generally lacking in today's AI systems.

Building a Better World for People with World Models

The motivation for AI development is crucial. As one of the scientists who helped pioneer the modern AI era, my motivation has always been clear: AI must enhance human capabilities, not replace them. For years, I have strived to align AI's development, deployment, and governance with human needs. Today, extreme narratives of technological utopia and doomsday abound, but I continue to hold a more pragmatic view: AI is developed by humans, used by humans, and governed by humans. It must always respect human agency and dignity. Its magic lies in expanding our capabilities, making us more creative, more connected, more productive, and more fulfilled. Spatial intelligence represents this vision—AI empowering human creators, caregivers, scientists, and dreamers to achieve what was once impossible. It is this belief that drives my commitment to spatial intelligence as AI's next great frontier.

The applications of spatial intelligence span different timelines. Creative tools are emerging—World Labs' Marble has already put these capabilities into the hands of creators and storytellers. Robotics represents an ambitious medium-term goal as we continue to refine the perception-action loop. The most transformative scientific applications will take longer, but promise profound impacts on human flourishing.

Across all these timelines, several areas stand out for their potential to reshape human capabilities. This requires a monumental collective effort, far beyond what one team or one company can achieve. It requires the participation of the entire AI ecosystem—researchers, innovators, entrepreneurs, companies, and even policymakers—working together toward a common vision. But this vision is worth pursuing. Here is what that future might hold:

Creativity: Supercharging Storytelling and Immersive Experiences

"Creativity is intelligence having fun." This is a famous quote from my personal hero, Albert Einstein, and one of my favorites. Long before written language, humans told stories—painting them on cave walls, passing them down through generations, building entire cultures on shared narratives. Stories are how we understand the world, connect across time and space, explore the meaning of humanity, and most importantly, find meaning and love in our own lives. Today, spatial intelligence has the potential to transform how we create and experience narratives, in ways that both respect their fundamental importance and extend their impact from entertainment to education, from design to architecture.

World Labs' Marble platform will place unprecedented spatial capabilities and editable control into the hands of filmmakers, game designers, architects, and storytellers of all kinds, enabling them to quickly create and iterate fully explorable three-dimensional worlds without the heavy overhead of traditional 3D design software. The act of creation itself remains vibrant and human; AI tools simply amplify and accelerate what creators can achieve. This includes:

· New Dimensions of Narrative Experience: Filmmakers and game designers are using Marble to create entire worlds unconstrained by budget or geography, exploring scenarios and perspectives difficult to manage in traditional production processes. As the lines between different forms of media and entertainment increasingly blur, we are approaching a new kind of interactive experience that blends art, simulation, and gaming—personalized worlds where anyone, not just studios, can create and immerse themselves in their own stories. With the emergence of updated, faster ways to elevate concepts and storyboards into full experiences, narratives will no longer be confined to a single medium, and creators will be free to build worlds with common threads across countless interfaces and platforms.

· Spatial Storytelling through Design: Essentially, every object manufactured or space built must be designed in virtual three-dimensional space before its physical creation. This process is iterative and costly in both time and money. With spatially intelligent models, architects can quickly visualize structures without investing months in design; they can walk through spaces that do not yet exist—this is essentially telling stories about how we might live, work, and gather in the future. Industrial and fashion designers can instantly translate imagination into form, exploring how objects interact with the human body and space.

· New Immersive and Interactive Experiences: Experience itself is one of the most profound ways our species creates meaning. Throughout human history, there has been only one single three-dimensional world: the physical world we all share. Only in recent decades, through games and early virtual reality (VR), have we begun to glimpse what it means to share alternative worlds of our own creation. Now, spatial intelligence combined with new product forms, such as VR and Extended Reality (XR) headsets and immersive displays, enhances these experiences in unprecedented ways. We are moving toward a future where stepping into a fully realized multidimensional world will be as natural as opening a book. Spatial intelligence makes world-building no longer the exclusive domain of studios with professional production teams, but open to individual creators, educators, and anyone with a vision to share.

Robotics: The Practice of Embodied AI

From insects to humans, animals rely on spatial intelligence to understand, navigate, and interact with their world. Robots are no exception. Machines with spatial perception have been a dream since the field's inception, including my own work with my students and collaborators in my Stanford research lab. This is why I am so excited about the possibility of achieving this using the types of models World Labs is building.

· Scaling Robot Learning with World Models: Advances in robot learning depend on a scalable, viable training data solution. Given the immense state space of possibilities that robots need to learn to understand, reason, plan, and interact with, many speculate that a combination of internet data, synthetic simulations, and real-world human demonstration capture will be needed to truly create generalizable robots. But unlike language models, training data for today's robot research is very scarce. World models will play a decisive role here. As their perceptual fidelity and computational efficiency improve, the output of world models can rapidly bridge the gap between simulation and reality. This, in turn, will help train robots in simulations with countless states, interactions, and environments.

· Companions and Collaborators: Robots as human collaborators, whether assisting scientists in labs or aiding elderly individuals living alone, can extend the workforce in areas critically needing more labor and productivity. But this requires spatial intelligence for perception, reasoning, planning, and action, and—most importantly—empathetic alignment with human goals and behaviors. For example, a lab robot could handle instruments, freeing scientists to focus on tasks requiring dexterity or reasoning, while a home assistant could help seniors cook without diminishing their enjoyment or autonomy. Truly spatially intelligent world models, capable of predicting the next state and perhaps even actions consistent with that expectation, are essential for achieving this.

· Extending Embodied Forms: Humanoid robots play a role in the worlds we build for ourselves. But the full benefits of innovation will come from more diverse designs: nanobots delivering drugs, soft robots navigating confined spaces, and machines built for the deep sea or outer space. Regardless of their form, future spatial intelligence models must integrate the environments these robots inhabit as well as their own embodied perception and motion. A key challenge in developing such robots, however, is the lack of training data for these diverse embodied forms. World models will play a crucial role here, providing simulated data, training environments, and benchmark tasks.

A Longer Horizon: Science, Healthcare, and Education

Beyond creative and robotic applications, the profound impact of spatial intelligence will extend to areas where AI can enhance human capabilities in life-saving and discovery-accelerating ways. I highlight three application areas below with deep transformative potential, but the use cases for spatial intelligence undoubtedly extend across many more industries.

· In scientific research, spatially intelligent systems can simulate experiments, test hypotheses in parallel, and explore environments inaccessible to humans—from the deep sea to distant planets. This technology can transform computational modeling in fields such as climate science and materials research. By combining multidimensional simulations with real-world data collection, these tools can lower computational barriers and expand the scope of what each lab can observe and understand.

· In healthcare, spatial intelligence will reshape everything from the lab to the bedside. At Stanford, my students and collaborators have worked for years with hospitals, nursing homes, and homebound patients. This experience has convinced me of spatial intelligence's transformative potential here. AI can accelerate drug discovery through multidimensional simulation of molecular interactions, enhance diagnosis by helping radiologists find patterns in medical images, and enable environmental monitoring systems that support patients and caregivers without replacing the human connection vital for recovery. And that is not to mention the potential for robots to assist our healthcare workers and patients in many different scenarios.

· In education, spatial intelligence can enable immersive learning, making abstract or complex concepts tangible and creating iterative experiences crucial for how our brains and bodies learn. In the age of AI, the need for faster, more effective learning and skill retraining is especially important for school-aged children and adults. Students can explore cellular machinery in multidimensional spaces or walk through historical events. Teachers can gain tools for personalized instruction through interactive environments. Professionals, from surgeons to engineers, can safely practice complex skills in realistic simulations.

In all these areas, the possibilities are limitless, but the goal is always the same: to make AI a force that enhances human expertise, accelerates human discovery, and amplifies human care—rather than replacing the human judgment, creativity, and empathy that belong to us.

Conclusion

The past decade has seen AI become a global phenomenon and a turning point for technology, economy, and even geopolitics. But as a researcher, educator, and now entrepreneur, what motivates me most is still the spirit behind Turing's question 75 years ago. I still share his curiosity. It is this curiosity that energizes me every day for the challenge of spatial intelligence.

For the first time in history, we have the prospect of building machines so attuned to the physical world that we can consider them true partners in tackling our most daunting challenges. Whether it's accelerating how we understand disease in the lab, revolutionizing how we tell stories, or supporting us in our most vulnerable moments due to illness, injury, or old age, we are at the frontier of a technology that will elevate the aspects of life we care about most. This is a vision of a deeper, richer, and more powerful life.

Nearly half a billion years after nature unleashed the first glimmer of spatial intelligence in ancient animals, we are privileged to be the generation of technologists who can soon endow machines with the same capabilities—and to harness these capabilities for the benefit of people everywhere. Our dream of truly intelligent machines is incomplete without spatial intelligence.



