Jeff Dean: AI Will Reach Junior-Engineer Level Within a Year. Netizens: "Altman Only Pitches; When Jeff Says It, It Matters"

Compiled by | Nucleus Cola, Tina

Recently, Google's legendary engineer Jeff Dean made a bold prediction in an interview: within a year, we will have AI systems capable of operating 24/7 with the capabilities of a "junior engineer".

Jeff Dean is a legendary figure in modern computing, having spearheaded many of Google's breakthroughs in large-scale distributed systems and artificial intelligence. He is a co-founder of the Google Brain project and led the creation of key systems such as MapReduce, Bigtable, Spanner, and TensorFlow. He headed Google AI from 2018, and after the 2023 merger of DeepMind and Google Brain he became Google's Chief Scientist. From contributing to the BERT paper and leading TPU development to driving the evolution of Google's foundational AI architecture, Dean has witnessed and taken part in almost every critical moment of AI development at Google.

As one of the most influential figures in the tech world, Jeff Dean quickly sparked widespread discussion in the industry with these remarks. Many insiders, including Sam Altman, have expressed similar views before, but Jeff Dean's words carry a different weight. As one netizen put it: compared with Sam Altman, who is always "selling" some concept, Jeff Dean is more like a down-to-earth computer scientist: every word he says is rigorous and considered, and worth listening to carefully.


Although the actual impact of this shift on the programmer job market is not yet visible, various signs suggest that the software development profession may undergo a profound transformation in the coming years. AI engineers can do more than "write code": they offer a continuity and scalability that human engineers cannot match, and this may be only the beginning.

To help everyone better understand Jeff Dean's judgment and views, we have translated the interview. The full text follows (the order of some questions has been adjusted slightly for readability):

The Evolution of AI and the Industry Landscape

Bill Coughran: Jeff, as Alphabet's Chief Scientist, let's start our conversation on this topic: many friends here are clearly interested in AI and have been following its development. Google has contributed much of the underlying foundation the industry rests on, especially the Transformer architecture. So, how do you view the direction of development inside Google and across the AI industry today?

Jeff Dean: I think the field of AI had been brewing for a long time; it just really entered public consciousness in the last three or four years. Starting around 2012 or 2013, people were already using what seemed at the time like massive neural networks to solve a variety of interesting problems, and the same algorithms applied to vision, speech, and language tasks. That was a remarkable achievement, and it allowed machine learning to gradually replace traditional hand-designed methods as the primary way to solve these problems.

And as early as 2012, we were already focusing on one question: how do you scale up and train extremely large neural networks? We trained a neural network 60 times larger than other models at the time, using 16,000 CPU cores, because that was the only hardware Google's data centers had then. We got very good results. This convinced us that scaling the approach really works. More and more evidence appeared afterwards, and hardware improvements further strengthened our ability to scale, allowing us to train larger models and process larger datasets.

We used to have a slogan: "Bigger models, more data, better results." For the past 12 to 15 years, this has largely held true. As for future development directions, I think current models can accomplish some very interesting tasks. Of course, they can't solve all problems, but they can solve more and more problems every year because the models themselves are constantly improving. We have better algorithmic improvement methods that allow us to train larger models with the same computational cost, gaining more powerful capabilities. Additionally, we've had breakthroughs in hardware, with computing power per unit of hardware continuously increasing. We also have reinforcement learning and post-training techniques to make models better and guide them to perform as we expect. All of this is very exciting. I think multimodality is also an important trend, where input formats can be audio, video, images, text, or code, and output can also cover these forms. In short, AI is becoming increasingly useful.

Bill Coughran: The entire industry is currently fascinated with "agents." Do you think these agents are really useful? Google just released an agent framework recently. This isn't aimed at Google specifically, but I always feel the current hype around agents is a bit ahead of reality. Sorry, I can be a bit blunt...

Jeff Dean: That's okay. I think the field of agents indeed has huge potential, because we see that through the right training process, agents can eventually accomplish many tasks in a virtual computer environment that require humans today. Of course, currently they can only complete some tasks, and there are many they cannot handle.

But the path to improving their capabilities is relatively clear: you can do more reinforcement learning to let agents learn from experience. In fact, many early products couldn't accomplish most tasks but were still very useful to users. I think similar progress will happen in the field of physical robot agents.

Today, we might be approaching a turning point where robots still can't adapt well to cluttered environments like this conference room, but we can see a clear path where, in the next few years, they should be able to perform dozens of actual tasks in such rooms. Initially, robot products capable of performing these tasks will certainly be expensive. But then, through experiential learning, their cost will be optimized, becoming a tenth of the original price, while also being able to perform thousands of tasks. This will further drive cost optimization and technological capability improvements. So, the development of agents is very exciting overall.

Bill Coughran: That's true; we just can't demand too much right now. Another question that often comes up is the current state of large model development. Obviously, Google has Gemini 2.5 Pro and the Deep Research project, and OpenAI and other companies are also in the race. The industry keeps debating how many open-source and closed-source large language models there will be and where they are headed. What are your thoughts? Google certainly has a strong position in this field and hopes to keep leading, but how do you see the overall landscape changing?

Jeff Dean: I think building the most advanced models requires a lot of investment. Therefore, there won't be dozens or hundreds of such models on the market; ultimately, only a few may remain. Once you have these powerful models, you can use techniques like knowledge distillation to generate lighter models for more scenarios.

I was a co-author on the paper that introduced that technique, but NeurIPS rejected it in 2014 on the grounds that it was unlikely to have an impact.

I've heard that DeepSeek may have benefited from this technique. In short, it is a very practical technique: once you have a stronger model, you can distill its capabilities into smaller, lighter models.
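
For readers who have not worked with distillation, the sketch below shows the core soft-target idea in a few lines of Python. The temperature value, the toy logits, and the loss scaling are illustrative choices on our part, not details of Google's implementation.

```python
# A minimal sketch of soft-target knowledge distillation. The teacher/student
# logits and the temperature are illustrative placeholders, not a real system.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures,
    # as suggested in the original distillation paper.
    return -temperature**2 * np.mean(
        np.sum(teacher_probs * student_log_probs, axis=-1)
    )

# Toy usage: a 2-example batch over 3 classes.
teacher_logits = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
student_logits = np.array([[2.0, 1.5, 0.1], [0.3, 2.0, 0.4]])
print(distillation_loss(student_logits, teacher_logits))
```

In practice this term is usually mixed with the ordinary hard-label loss, so the small model learns both from the data and from the stronger model's "dark knowledge."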

Bill Coughran: One quick question. Do you "vibe code"?

Jeff Dean: I actually tried it a little, and it worked surprisingly well.

We have quite a few demo chat rooms at work, and even the communication for the entire Gemini project is pretty much done in chat rooms. I'm in about 200 chat rooms, and every morning when I wake up and brush my teeth, I get about 9 notifications because my London colleagues are already busy.

We have a really cool demo: you can upload an educational YouTube video and then the prompt is "Please create an educational game based on this video that includes graphics and interactive elements." While it doesn't always succeed, there's about a 30% chance that it actually generates some interesting content, like a game about differential equations, traveling to Mars, or topics related to cells. This is a huge signal for education.

The tools we have now, and the tools we will have in the next few years, truly have the opportunity to change the world in a positive way. We should remember that this is what we are striving for.

Audience: I'm very curious about your thoughts on the future of search, especially given the popularity of Chrome. Chrome already holds payment credentials and web signature credentials, etc. Have you considered integrating Gemini directly into Chrome, turning Chrome applications into Gemini applications, rather than keeping them as separate applications? I'm saying this because I'm an official employee of Google, so please consider your answer carefully.

Jeff Dean: Yes, I think many interesting downstream applications can be derived from the core Gemini model or other models. One of them is to help you complete tasks by observing your operations in the browser or on your desktop computer, such as performing OCR on tabs or accessing the content of original tabs.

This seems very useful. We've had some initial results in this area and have released public demos in video form, such as Project Mariner, the AI browsing assistant. How far the specific results will go remains to be seen.

Audience: You previously mentioned that there might only be a few participants left in foundational models, mainly due to the high infrastructure costs and the scale of investment required to maintain cutting-edge technology. As this cutting-edge competition unfolds, where do you think things will ultimately go? Will it simply be whoever spends the most money and builds the largest cluster wins? Or will it be about better utilizing unified memory optimization and existing resources? Or will it ultimately depend on user experience? Where is this arms race heading? Is it whoever reaches Skynet level first wins?

Jeff Dean: I think the winner will be determined by both excellent algorithmic work and outstanding system hardware and infrastructure achievements. It's not simple to say that one is more important than the other, because in the generational evolution of our Gemini models, we've seen that the importance of algorithmic improvements is comparable to, or perhaps even higher than, the importance of hardware improvements or investing more computational resources.

But from a product perspective, this field is still in its early stages. I don't think we've found that killer product that billions of people will use every day yet. It might be an application in the education field, or it might be an information retrieval tool similar to a search engine, but one that fully leverages the advantages of large multimodal models. I think helping people complete tasks in their respective work environments is the most important thing. So, how will these ideas be translated into specific product forms? For example, how should I manage a team of 50 virtual agents? Most of the time they will execute tasks correctly, but occasionally they will need to consult my opinion. I need to provide them with some guidance. This is equivalent to thinking about how I should manage 50 virtual interns? This will be a complex problem.

Audience: I think you might be the most suitable person in the world to answer this question: How far do you think we are from having an AI that can work 24/7 and has the level of a junior engineer?

Jeff Dean: I think it's closer than people imagine.

Bill Coughran: Specifically? Six weeks, or six years?

Jeff Dean: I'll claim that's probably possible in the next year-ish.

Audience: Returning to the topic of having a junior engineer-level AI within a year. I'd like to know what breakthroughs we need to achieve this goal? Obviously, code generation capabilities will further improve, but what else do you think is needed? Is it tool usage ability? Or agent planning ability?

Jeff Dean: I think the capabilities a virtual engineer needs go far beyond just writing code in an IDE. It also needs to know how to run tests, debug performance issues, and so on. We know how human engineers do this: they have to learn to use various tools to accomplish tasks, gain wisdom from more experienced engineers, or read a lot of documentation. I think virtual junior engineers will be best at reading documentation and constantly trying things out in a virtual environment. This seems to be a way to improve their capabilities. As for how far they can go, I don't know, but I believe this is a very promising path.
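
To make the shape of such a virtual engineer concrete, here is a deliberately simplified tool-use loop. Everything in it (the call_model callback, the two tools, the step budget) is a hypothetical placeholder rather than a description of any real Google system.

```python
# A deliberately simplified sketch of the tool-use loop a "virtual junior
# engineer" might run: read docs, run tests, observe, repeat. Hypothetical.
import subprocess

def run_tests() -> str:
    """Run the project's test suite and return its output (illustrative)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

def read_docs(path: str) -> str:
    """Read a documentation file so the agent can learn from it."""
    with open(path, encoding="utf-8") as f:
        return f.read()

TOOLS = {"run_tests": run_tests, "read_docs": read_docs}

def agent_loop(task: str, call_model, max_steps: int = 20) -> str:
    """Feed observations to the model, execute the tool it picks, repeat.

    call_model is assumed to take the conversation history and return either
    (tool_name, argument) or ("done", summary).
    """
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        tool, arg = call_model("\n".join(history))
        if tool == "done":
            return arg
        observation = TOOLS[tool](arg) if arg else TOOLS[tool]()
        history.append(f"ACTION: {tool}({arg!r})\nOBSERVATION: {observation[:2000]}")
    return "step budget exhausted"
```

The interesting engineering, as Dean suggests, is less in this outer loop than in training the model to use the tools well and to learn from many such trial runs in a virtual environment.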

The Important Role of Hardware in AI

Bill Coughran: That makes sense. Another striking trend is the development of hardware. As far as I can tell, the major companies are all developing their own hardware. Google publicly announced its TPU plan very early, and Amazon has its own solution. It's rumored that Meta and OpenAI are developing their own chips. Yet right now, Nvidia seems to be the only name anyone hears in the industry, though that's certainly not the case inside Google's offices. What do you think about this? How important is specialized hardware for these tasks?

Jeff Dean: Clearly, hardware focused on the kind of computation machine learning needs is very important. I like to call these chips "lower-precision linear algebra accelerators." Each generation of hardware needs to become more powerful and be connected at massive scale through ultra-high-speed networks, so the model's computation can be spread across as many devices as possible. That is crucial. I remember helping launch the TPU project in 2013 because at that time we clearly needed a lot of inference compute; that was the first generation. The second generation (TPUv2) handled both inference and training, because we saw the demand for that. The versions we use now are no longer numbered because it got too confusing. We're currently introducing Ironwood, which takes over from the previous generation, Trillium.
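
As a toy illustration of what "lower-precision linear algebra" means in practice, the snippet below runs the same matrix multiply in float32 and bfloat16 using JAX. The shapes and the library choice are ours; this says nothing about how TPUs are implemented internally.

```python
# A toy illustration of "lower-precision linear algebra": the same matrix
# multiply in float32 vs. bfloat16. Shapes are arbitrary and purely illustrative.
import jax
import jax.numpy as jnp

a = jax.random.normal(jax.random.PRNGKey(0), (1024, 1024), dtype=jnp.float32)
b = jax.random.normal(jax.random.PRNGKey(1), (1024, 1024), dtype=jnp.float32)

full = jnp.dot(a, b)                                           # float32 matmul
low = jnp.dot(a.astype(jnp.bfloat16), b.astype(jnp.bfloat16))  # bfloat16 matmul

# bfloat16 halves the bytes moved per operand and maps onto accelerator matrix
# units, at the cost of some precision:
err = jnp.abs(full - low.astype(jnp.float32)).max()
print("max abs difference:", err)
```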

Bill Coughran: That name sounds like Intel chips; they didn't seem to do very well... Sorry, maybe that's off-topic, let's talk about something else. I have many physicist friends who were a bit surprised that Geoffrey Hinton and his colleagues won the Nobel Prize in Physics. What do you think about this? Some physicists I know were even unhappy that non-physicists won the Nobel Prize. How far do you think AI will ultimately go in various fields?

Jeff Dean: I think it will go very far. This year my colleagues Demis Hassabis and John Jumper also won a Nobel Prize. I think this shows that AI is influencing many scientific fields, because fundamentally, what matters in so many of them is the ability to learn from interesting data: discovering connections between things and understanding them. If AI can help with that, that's great. After all, in many scientific fields we run into extremely expensive computational simulations, such as weather forecasting, fluid dynamics, or quantum chemistry.

The current approach is to use these simulators to generate training data and train a neural network that approximates the simulator but runs 300,000 times faster. This has completely changed how we do scientific research. Suddenly I can screen tens of millions of molecules in the time it takes to eat a meal; previously, doing the same screening would have taken a year of enormous amounts of compute. This fundamentally changes the scientific research process and will accelerate the pace of scientific discovery.
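
The "learned surrogate" pattern Dean describes can be sketched in a few dozen lines: generate (input, output) pairs from an expensive simulator, fit a small network to them, then query the network instead. The "simulator" below is a trivial stand-in and the training setup is purely illustrative.

```python
# A minimal sketch of training a neural surrogate for an expensive simulator.
# expensive_simulator() is a trivial stand-in; real examples are weather,
# fluid dynamics, or quantum chemistry codes, as mentioned in the interview.
import jax
import jax.numpy as jnp

def expensive_simulator(x):
    # Stand-in for a costly physics code: some smooth nonlinear response.
    return jnp.sin(3.0 * x) + 0.5 * x**2

def init_params(key, hidden=64):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (1, hidden)) * 0.5, "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.5, "b2": jnp.zeros(1),
    }

def surrogate(params, x):
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def loss(params, x, y):
    return jnp.mean((surrogate(params, x) - y) ** 2)

# "Run the simulator" once to build a training set, then fit the cheap surrogate.
xs = jnp.linspace(-2.0, 2.0, 512).reshape(-1, 1)
ys = expensive_simulator(xs)

params = init_params(jax.random.PRNGKey(0))
grad_fn = jax.jit(jax.grad(loss))
for _ in range(2000):
    grads = grad_fn(params, xs, ys)
    params = jax.tree_util.tree_map(lambda p, g: p - 0.05 * g, params, grads)

print("surrogate MSE:", loss(params, xs, ys))  # queries are now just a forward pass
```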

Bill Coughran: I want to quickly follow up on Geoffrey Hinton's situation. He left Google because of his research on the differences between digital and analog computation in reasoning and learning. I want to know, will future inference hardware move towards an analog direction?

Jeff Dean: It's certainly possible. Analog computation has advantages in power efficiency. I also think there's still a lot of room for specialization in digital computation for inference. Generally speaking, digital systems are easier to operate. But I think the overall direction is: how to make inference hardware orders of magnitude more efficient than today's level - ten thousand, twenty thousand, or even fifty thousand times? If we are determined to do it, it's completely possible. In fact, I'm spending time researching this myself.

Audience: Hello, I'd like to ask about the relationship between developer experience and hardware. I think TPU hardware is excellent, but there's a view in the community that CUDA or other technologies are easier to use than TPUs. What do you think about this? Is this something you've been thinking about? Have you received many angry complaint emails?

Jeff Dean: I've thought about it. Although I rarely interact directly with Cloud TPU customers, there's no doubt that the experience has a lot of room for improvement.

In 2018, we started developing a system called Pathways, whose design goal was to let us use many kinds of computing devices behind a good abstraction layer. In this system, the mapping of virtual devices to physical devices is managed by the underlying runtime. We support both PyTorch and JAX.

We mainly use JAX internally, and you write a single JAX Python process that looks as though it has tens of thousands of devices attached. You can write code like an ML researcher and just run it. You can prototype with four, eight, sixteen, or sixty-four devices, then change a single constant to switch to the Pathways backend with thousands or tens of thousands of chips behind it and keep running. The experience is very good.

Our largest Gemini model is driven by a single Python process using tens of thousands of chips, and it works very well. That kind of developer experience is ideal.

What I want to say is that we hadn't opened this functionality to cloud customers before, but we just announced at Cloud Next that Pathways will be available to cloud customers. This way, everyone can enjoy the wonderful experience of controlling thousands of devices with a single Python process. I agree, this is much better than directly managing 256 chips on 64 processors.
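
In open-source JAX terms, the single-Python-process experience he describes looks roughly like the sketch below. This is plain JAX sharding over whatever devices happen to be visible, not the Pathways backend itself; the array shapes and the toy step function are our own.

```python
# A rough open-source analogue of the "one Python process, many devices"
# experience described above, using plain JAX sharding. Not Pathways itself;
# it just shows the same jitted code running on 1 or N visible devices.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = jax.devices()                       # 1 CPU locally, many chips on a pod
mesh = Mesh(mesh_utils.create_device_mesh((len(devices),)), axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))     # shard the leading (batch) axis

@jax.jit
def step(x):
    return jnp.tanh(x) * 2.0                  # placeholder for a real training step

batch = jnp.arange(float(8 * len(devices))).reshape(8 * len(devices), 1)
batch = jax.device_put(batch, sharding)       # data is split across all devices
out = step(batch)                             # jit runs where the data lives
print(out.sharding)
```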

Audience: I really like using the Gemini API. It would be even better if I could use a single API key instead of Google Cloud credential setup. Are you planning to unify the Google Cloud and Gemini stack with the Gemini project? Currently, the latter is more like a test version.

Jeff Dean: I think some simplification measures are being considered in this regard. This is a known issue, and I personally don't spend too much time on it, but I know Logan and other members of the developer team are aware of this friction point. We want to make it frictionless for users to use our tools.

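For reference, the API-key path the questioner prefers looks roughly like this with the google-generativeai Python package; the model name and prompt are illustrative, and the exact package and setup may differ from whatever unified stack Google eventually ships.

```python
# A minimal sketch of the single-API-key path mentioned above, assuming the
# google-generativeai package (pip install google-generativeai). No Google
# Cloud credential setup is needed for this flow.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")     # one key, no gcloud auth

model = genai.GenerativeModel("gemini-1.5-pro")    # model name is illustrative
response = model.generate_content("Summarize the trade-offs of knowledge distillation.")
print(response.text)
```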

Audience: This is an interesting time in computing. Moore's Law and Dennard scaling are no longer effective, while AI scaling continues to grow crazily. You are in a unique position to drive the development of these supercomputers and infrastructure. More importantly, you have a unique skill: understanding how to map workloads to these systems. So, what do you think the future of computing will look like? From a theoretical perspective, which direction will the computing infrastructure develop?

Jeff Dean: I think one clear point is that the type of computation we want to run on computers has changed significantly over the past five to ten years. Initially, it was just a small ripple, but now it has become a raging wave. We want to run ultra-large scale neural networks with extremely high performance and very low power consumption, and we want to do training in the same way.

Training and inference are two completely different workloads. Therefore, I think it makes sense to distinguish between them, and you may need different solutions or at least slightly different solutions for these two tasks. I think all computing platforms will adapt to this new reality, which is that their main role is to run extremely powerful models. Some of these applications will be performed in low-power environments, such as everyone's mobile phones.

We all hope our phones can run large parameter models at extremely fast speeds, so that when talking to the phone, it can respond quickly and help us complete various tasks. We will also run these models in robots and autonomous vehicles. We have achieved this to some extent currently, but better hardware will make it easier to build these systems and will also make real-world embodied agents more powerful. At the same time, we also hope to run these models at ultra-large scale in data centers. In addition, for some problems, we need to use a lot of inference computing resources, while for others, we do not.

In short, we need to find a balance: for some problems, you should invest tens of thousands of times the computational resources of ordinary problems, so that your model can be more powerful, give more accurate answers, or enable it to complete tasks that cannot be completed with only a small amount of computation. But at the same time, we should not invest so many resources in all problems. Therefore, how to make the system run well under resource constraints? I think this should be the result of the combined action of hardware, system software, models, and algorithmic techniques (such as knowledge distillation), all of which can help you achieve powerful models with limited computational resources.

Bill Coughran: One thing I've noticed is that traditional computer science, when studying algorithms and computational complexity, was based on operation counts. As people refocus on hardware and system design details, I've found a new trend: we must reconsider factors like network bandwidth, memory bandwidth, etc. Therefore, I think traditional algorithmic analysis needs to be completely rewritten because the actual computing patterns are completely different.

Jeff Dean: That's right. My graduate school roommate wrote his thesis on cache-aware algorithms, because big-O notation didn't account for the fact that some operations can be 100 times slower than others. In modern machine learning computation, we care deeply about tiny differences in data movement: moving an operand from SRAM into an accumulator costs only a fraction of a picojoule, yet that is already far more than the arithmetic itself. So understanding picojoules really matters today.
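
To make the picojoule framing concrete, here is a back-of-the-envelope comparison. The energy numbers are illustrative placeholders (roughly the right order of magnitude for older process nodes), not measurements of any specific chip.

```python
# Back-of-the-envelope energy accounting of the kind described above. The
# picojoule figures below are illustrative placeholders, not real measurements.
PJ_PER_MAC_8BIT = 0.2          # one low-precision multiply-accumulate
PJ_PER_SRAM_READ_32B = 5.0     # pulling an operand out of on-chip SRAM
PJ_PER_DRAM_READ_32B = 600.0   # pulling an operand from off-chip DRAM

def energy_per_mac(reads_from_dram: int, reads_from_sram: int) -> float:
    """Total energy (pJ) for one MAC including fetching its operands."""
    return (PJ_PER_MAC_8BIT
            + reads_from_sram * PJ_PER_SRAM_READ_32B
            + reads_from_dram * PJ_PER_DRAM_READ_32B)

# If both operands come from SRAM, data movement already dwarfs the arithmetic:
print(energy_per_mac(0, 2) / PJ_PER_MAC_8BIT)   # ~50x the compute energy
# If they come from DRAM, the ratio is far worse:
print(energy_per_mac(2, 0) / PJ_PER_MAC_8BIT)   # ~6000x the compute energy
```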

Audience: You talked about scaling pre-training and current reinforcement learning scaling. How do you view the future trajectory of these models? Will it continue to be a single model that occupies all computational resources, or will it be multiple small models working together distilled from large models? How do you view the future landscape of AI models?

Jeff Dean: I've always been optimistic about sparse models, which are structures with different expertise in different parts of the model. This draws from our rough understanding of the biological brain, and it's this structure of the human brain that allows us to accomplish many things with just 20 watts of power. When we're worried about hitting a garbage truck when backing up, the Shakespeare poetry module in our heads doesn't become active.

We did some early work on mixture-of-experts models, which used 2 to 48 experts, and found that this model can bring significant efficiency improvements. For example, with the same training FLOPs, model quality improved by 10 to 100 times. This is very important.
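
For readers who have not seen a mixture-of-experts layer, the sketch below implements top-1 routing: a gate scores each token, and only the chosen expert's weights are touched. The expert count, layer sizes, and gating scheme are illustrative, not the configuration used in the work Dean mentions.

```python
# A minimal sketch of top-1 mixture-of-experts routing: each token is sent to
# the single expert its gate scores highest, so only a fraction of the model's
# parameters are used per token. Sizes and gating are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, num_experts, num_tokens = 16, 32, 4, 8

# Each "expert" is a tiny two-layer MLP with its own weights.
experts = [
    (rng.normal(size=(d_model, d_hidden)) * 0.1,
     rng.normal(size=(d_hidden, d_model)) * 0.1)
    for _ in range(num_experts)
]
gate_w = rng.normal(size=(d_model, num_experts)) * 0.1  # a learned router in practice

def moe_layer(tokens):
    scores = tokens @ gate_w                     # (num_tokens, num_experts)
    chosen = scores.argmax(axis=-1)              # top-1 expert per token
    out = np.zeros_like(tokens)
    for e, (w_in, w_out) in enumerate(experts):
        mask = chosen == e
        if mask.any():                           # only the chosen expert runs
            h = np.maximum(tokens[mask] @ w_in, 0.0)
            out[mask] = h @ w_out
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(moe_layer(tokens).shape)                   # (8, 16): same shape, sparse compute
```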

But I think we haven't fully explored this area because the sparsity patterns currently used are too regular. Ideally, I'd like there to be certain paths in the model whose computational cost is hundreds or even thousands of times higher than other paths; at the same time, I'd like some parts of the model to have very little computation, while other parts are very large. Perhaps their structure should also be different.

I also want models to be able to expand dynamically, adding new parameters or new spatial segments; perhaps we can compress some parts through the distillation process, making them one-quarter of their original size. Then, the background can, like a garbage collection mechanism, release this part of the memory and allocate it to other more useful places. To me, this more organic, continuous learning system has more potential than the fixed models we have today. The only challenge is that our current methods are very effective, so it's difficult to completely change the existing methods to implement this new pattern. But I firmly believe that this pattern has huge advantages over our current rigid model structure.

Event Recommendation

AICon 2025 is coming in full force, with events in Shanghai in May and Beijing in June, a dual-city program showcasing the forefront of AI technology and industry adoption. The conference focuses on the deep integration of technology and applications, covering topics such as AI agents, multimodality, scenario-based applications, large model architecture innovation, intelligent data infrastructure, AI product design, and going-global strategies. Scan the QR code to buy tickets now and explore the boundaries of AI applications together!

