Fei-Fei Li's Latest Interview: World Models Are Coming

Click "Turing AI" above and select "Set as Star" to receive the latest AI insights directly.

The AI insights you want to know, delivered first.

Image

Copyright Notice

Reprinted from Digital Creation; copyright belongs to the original author. Shared here for academic purposes. If there is any infringement, please leave a message and it will be removed.


On June 4th, Fei-Fei Li, co-founder and CEO of World Labs, and Martin Casado, a16z General Partner and an early investor in the company, sat down for an interview hosted by a16z General Partner Erik Torenberg. Together they discussed the concept of "world models" and the urgent need to build them. The conversation examined the limitations of current AI, the fundamental principles behind "world models," and the path to realizing them.

01

The Origin of World Labs: Shared Vision and AI's Physical Foundations


02

Deconstructing the AI Path: The Inevitability of Language, Data, and the Physical World


03

Application Blueprint and Research Foundations of World Models

So, when the vision of "world models" is truly realized, how will it change our world, and what specific applications can it foster? Fei-Fei Li first pointed out: "Creativity is, to a large extent, visual." She listed a wide range of fields from design, film, and architecture to industrial design, all of which heavily rely on visual, perceptual, and spatial abilities. She then mentioned robotics, broadly defining it as "any physical machine that can interact with the environment," noting that these machines must in some way understand their three-dimensional space and collaborate with humans.

Furthermore, Fei-Fei Li envisioned a grander future: "With this technology—which is a combination of generation and reconstruction—we can suddenly create infinite universes. Some universes are designed for robots, some serve creativity, some are for social interaction, some for travel, and some for storytelling. This technology will enable us to live in a multiverse fashion."

Casado then made this seemingly abstract conversation concrete. He explained that these models can take one or more 2D views (such as a photo) and generate a complete, manipulable three-dimensional representation in the computer, even including parts outside the field of view, such as the back of a table. This capability means one can manipulate, move, measure, and stack objects, and even generate content that did not exist before, such as creating a 360-degree panorama from a single 2D image. Clearly, this will profoundly impact video games, creative design, art creation, and broader fields of physical simulation and interaction.
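To make the idea of a "manipulable three-dimensional representation" more tangible, here is a minimal sketch, not World Labs' actual method, showing how a single 2D view plus an estimated per-pixel depth map and known camera intrinsics can be lifted into a 3D point cloud. All parameter values and function names below are illustrative assumptions; generating unseen geometry such as the back of a table would additionally require learned models on top of this kind of explicit 3D representation.

```python
import numpy as np

def unproject_to_point_cloud(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map into a 3D point cloud in the camera frame.

    depth  : (H, W) array of metric depths along the camera Z-axis
    fx, fy : focal lengths in pixels
    cx, cy : principal point in pixels
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid: u = column, v = row
    z = depth
    x = (u - cx) * z / fx   # invert the pinhole projection u = fx * X / Z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a flat scene 2 metres from the camera.
depth_map = np.full((480, 640), 2.0)
points = unproject_to_point_cloud(depth_map, fx=525.0, fy=525.0, cx=320.0, cy=240.0)

# With explicit 3D coordinates, measuring and manipulating become simple vector math.
print(np.linalg.norm(points[0] - points[-1]))   # metric distance between two corner points
points[:, 0] += 0.5                             # "move" the whole scene 0.5 m along the X-axis
```

Once a scene lives in explicit 3D coordinates like this, the operations Casado describes, measuring, moving, stacking, become ordinary geometry rather than image editing.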

Behind these application prospects, there is a fundamental question: Why must the understanding and reconstruction of the world be three-dimensional?

Fei-Fei Li explained: "Physical laws operate in three-dimensional space, and interactive behaviors unfold in three-dimensional space. Navigating to the back of a table needs to be done in three-dimensional space. Building the world, whether physical or digital, must be done in three-dimensional space."

Casado added, from the perspective of computer programs, that for many spatial tasks robots or programs require explicit three-dimensional information for navigation and manipulation, because the critical depth information (the Z-axis) is missing from 2D images. The human brain can reconstruct a 3D scene from 2D video, but computer programs require direct 3D input.
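A tiny numerical example (an illustrative assumption, not taken from the interview) makes the point about the missing Z-axis concrete: under a standard pinhole camera model, every 3D point along the same viewing ray lands on the same pixel, so a single 2D image by itself cannot distinguish a near point from a far one.

```python
import numpy as np

def project(point, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Standard pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    x, y, z = point
    return np.array([fx * x / z + cx, fy * y / z + cy])

near = np.array([0.5, 0.2, 2.0])   # a point 2 m in front of the camera
far = near * 10.0                  # a point 20 m away, lying on the same viewing ray

print(project(near))   # [445. 290.]
print(project(far))    # [445. 290.]  -- identical pixel: the depth (Z) is lost in projection
```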

To illustrate this more vividly, Fei-Fei Li shared a personal experience. About five years ago, a corneal injury caused her to lose stereoscopic vision for a few months, which meant she was seeing the world with one eye. "I became very afraid to drive," she recalled. "Even just driving in my neighborhood, I realized it was very difficult for me to accurately judge the distance between my car and parked vehicles... I had to drive very, very slowly." This also indirectly confirms that if AI is to truly understand and master the world, 3D perception is indispensable.

Although the concept of "world models" sounds more cutting-edge than large language models, the research behind it did not start from scratch. Fei-Fei Li explained that the field of computer vision has long been accumulating scattered but important explorations in this direction. For example, a landmark innovation in 3D computer vision, Neural Radiance Fields (NeRF), was developed by World Labs co-founder Ben Mildenhall and his colleagues. The pioneering work of another co-founder, Christoph Lassner, helped drive the resurgence of Gaussian Splatting as an effective 3D scene representation. In addition, co-founder Justin Johnson, a former student of Fei-Fei Li, did extensive foundational work in image generation (such as Generative Adversarial Networks (GANs) and style transfer) before the advent of Transformers, and all of this now forms core components of the current research.
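To give a concrete sense of what a representation like NeRF computes, the original formulation by Mildenhall et al. renders the color C(r) of a camera ray r(t) = o + t d by integrating a learned volume density sigma and a view-dependent color c along that ray:

$$
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
$$

Here T(t) is the accumulated transmittance, the probability that light travels from the near bound t_n to t without being blocked; in practice the integral is approximated by sampling points along each ray.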

It is on the basis of these academic accumulations and technological breakthroughs that World Labs has been able to bring together top global talent in computer vision, diffusion models, computer graphics, optimization, AI, and data. "All these people form a close-knit team, working together to bring this technology to fruition and ultimately productize it," Fei-Fei Li emphasized.

Casado also evaluated the necessity and challenge of building such a team from an outsider's perspective: "I must say, from an outsider's perspective, to solve this complex problem, you need experts in AI, and you need experts in graphics. You need a very special team with this interdisciplinary capability to really crack this problem, and Fei-Fei has successfully assembled such a team."

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Recommended Classic New Book:

Decoding Turing to AI Source Code – Journey with Computing Pioneers, Define the New Era of Digital Intelligence! This book introduces 76 Turing Award recipients and their work, achievements, and contributions. Through their introductions, one can trace the development history of the different branches of computer science and experience this magnificent and turbulent history. You can scan the QR code to purchase.



Featured Articles:

1. Turing Award winner Yann LeCun: Chinese people don't need us; they can come up with very good ideas themselves.

2. The Birth of a Turing Award

3. Nobel Laureate and AI Godfather Hinton's Academic Lecture: Turing believed in another kind of AI, backpropagation is better than the human brain, and open-source models will bring fatal dangers to the world.

4. Turing Award winner LeCun slammed Silicon Valley's arrogance! Explosive long article in the industry: DeepSeek R1-Zero is more important than R1, becoming the key to breaking the AGI deadlock.

5. Turing Award winner, AI Godfather Bengio: OpenAI will not share superintelligence, but will use it to destroy others' economies.

6. AI Godfather, Turing and Nobel Laureate Hinton interviewed by CBS: AI is now a cute little tiger raised by humans; beware of it biting its master.

7. Turing Award winner Bengio predicts o1 cannot reach AGI! Nature's authoritative interpretation of AI's astonishing evolution, the ultimate boundary is in sight.

8. Should we quickly give up reinforcement learning?! Turing Award winner, Meta Chief AI Scientist Yann LeCun calls out: Current reasoning methods "cheat," and scaling large models is meaningless!

9. Turing Award winner Yann LeCun: Large language models lack understanding of the physical world and reasoning ability, unable to achieve human-level intelligence.

10. Turing Award winner Geoffrey Hinton: From small language to large language, how does artificial intelligence truly understand humans?

Main Tag: Artificial Intelligence

Sub Tags: World Models, 3D Reconstruction, Fei-Fei Li, Computer Vision

