OpenAI Co-founder Reveals the Company's 'Pain and Suffering': We Are Heading Toward a World of Compute Scarcity! Internal GPU Allocation Is Like Playing Tetris, and Sora 2 Is Essentially a Neutered Base Model

Edited by | Ting Yu

“We are heading toward a world of extreme compute scarcity, and energy is going to be the next huge bottleneck.”

“In the future, all likeness licensing will turn into ‘cameo’ licensing.”

“We hope to build AI that can autonomously think for a year, or even ten years.”

These views come from a private, in-depth interview with OpenAI Co-founder and President Greg Brockman during DevDay a few days ago, shortly after the release of Sora 2.

In this interview, Greg was very genuine and candid, offering an extremely high density of information.

Greg did not shy away from the dilemmas facing OpenAI. He used "pain and suffering" to describe the internal process of deciding how to allocate compute, and recounted how OpenAI transformed from a pure software company into an infrastructure company that must think about building data centers and even developing its own energy facilities.

He stated frankly that the US energy supply will become the biggest bottleneck for AI development. Additionally, he shared a comparison between base models and post-trained models, and his reevaluation of the definition of AGI.

In addition to acknowledging that the current biggest bottleneck is compute and energy, Greg also systematically explained for the first time:

  • Why Sora 2 was developed from a technical model into a social product.
  • How AI agents will change the way the internet is monetized.
  • How they internally allocate extremely scarce GPU resources like playing 'Tetris'.
  • His latest perspective on the AGI timeline and the value of humans within it.

We have edited the entire conversation here. The information density is extremely high, so we recommend saving it for a detailed read.

Model Scaling and the Universality of the Transformer Architecture

Host:

Sora 2 was released last week. What was the experience like scaling a model like Sora, and how does it differ from text or image models?

Greg Brockman:

I like to think about it at the base level—it's still all deep learning, the mechanism is the same, and the underlying principles haven't changed. You need to scale up massive amounts of computation, performing forward passes and gradient calculations. At a more detailed level, it’s still a Transformer, which is quite surprising. You train it in different ways, employing different processes, involving concepts like diffusion. You consider how to inject compute power into these models, but fundamentally, what truly amazes me is that despite talking about text and video, which seem like completely different modalities, their underlying computation processes have huge overlaps. That’s really profound.
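To make that concrete, here is a minimal sketch (my illustration, not OpenAI's code) of the inner loop Greg describes: a forward pass and a gradient calculation through a small Transformer. Whether the sequence came from text tokens or video patches, the step is identical once the data is a sequence of embeddings; the dimensions and hyperparameters below are arbitrary.

```python
import torch
import torch.nn as nn

# A tiny Transformer encoder; text and video differ only in how the
# input sequence of embeddings is produced, not in this machinery.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(256, 1000)  # next-token logits over a toy vocabulary
opt = torch.optim.AdamW(list(model.parameters()) + list(head.parameters()), lr=3e-4)

def train_step(embeddings, next_token_ids):
    """One step of deep learning's universal recipe: forward pass,
    loss, gradient calculation, parameter update."""
    logits = head(model(embeddings))  # (batch, seq, vocab)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), next_token_ids.reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Text tokens or video patches both reduce to (batch, seq, d_model) tensors.
x = torch.randn(2, 16, 256)
y = torch.randint(0, 1000, (2, 16))
print(train_step(x, y))
```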

Host:

Do you think the Transformer architecture will carry us into the next phase, even toward true world models? Sora 2 clearly took a significant step in that direction.

Greg Brockman:

Yes, I think there are two points to make. First, there are many questions worth discussing, such as whether we have missed major creative ideas, or whether we need another innovation on the scale of the Transformer. I think the space for innovation is still vast, and so far algorithmic improvements have kept pace with scaling.

We have been doing research for many years, tracking the curve of model evolution, and I don't think this progress will stall. The scaling curve and data curve continue, and these are what drive this revolution. Every stage has its limiting factors, and you just need to keep adjusting. You will see significant improvements in model performance. So, I think we have a lot more to build. If AGI looks somewhat similar to current models, I wouldn't be surprised, but if it's exactly the same, I would be very shocked.

Host:

When you look at these different types of models, although they are all based on Transformers, how different are their costs? How do you measure the unit economics of different model types?

Greg Brockman:

Yes, there are indeed different performance characteristics. Sometimes we use different inference stacks, and the optimization methods vary. Some models might be better suited for different types of hardware, and there might be differences in the balance between memory and computation.

A lot of system work looks very different in the details. When you try to squeeze the maximum performance out of the hardware, it pushes you in very different directions. But ultimately, we always believe that the core driving force pushing all this innovation and bringing it to the world is still compute.

AMD Collaboration Progress and Chip Ecosystem Challenges

Host:

Recently, new progress was announced in the collaboration between OpenAI and AMD. Is building on AMD hardware fundamentally different from building on other hardware? Is it simply that we can now call upon an ever-expanding pool of resources, or does it require deep technical improvements?

Greg Brockman:

We have actually invested in AMD's software stack in multiple ways, because we build on Triton. Triton is a project we fund, and it targets most of the GPUs we use.

The big distinction for us is inference versus training: the fixed cost of standing up inference is already high, and for training it is even higher. We have been able to get AMD hardware running with very little effort and achieve good performance. That is thanks to our long-standing partnership with AMD and the large amount of feedback we've provided. From an inference perspective, we feel we've made good progress in scaling, and every hardware platform has its own niche and the innovations suited to it.

Host:

Have you considered emerging competitors like Cerebras or other similar companies that have taken different paths in chip architecture?

Greg Brockman:

Yes, we were very excited when we saw Cerebras in 2017 because it was a completely different paradigm. When you saw the numbers, you thought, "Wow, if we had a million of these devices, we could achieve AGI." It was clearly a very different, very special platform.

However, it turned out that the challenges of building non-GPU architectures were far greater than we anticipated. In 2017, we actively considered the entire ecosystem, trying to communicate with different chip companies, offering them advice on how workloads should be designed. Frankly, most companies didn't listen to our advice. That was back in 2017.

Host:

OpenAI was certainly very different then than it is now.

Greg Brockman:

You'd be surprised that some people still aren't listening to our advice even now. But I think, largely, it's not because they think we are wrong; it's that if you look at the problem from the perspective of someone in the chip industry, their mental model is fixed, which makes it hard to internalize the workload requirements. You try to say, "No, no, look at it from the other direction: models should be large, not small." If you don't accept that design philosophy, it's hard to shake your original worldview. So the companies that succeed are usually the ones that approach it from a deep learning perspective, or at least understand where the workload is heading.

The Biggest Bottleneck: Compute and Energy Scarcity

A situation of 'Pain and Suffering'

Host:

When you look at the entire pipeline, from building out compute to serving inference, where do you see the biggest bottleneck today?

Greg Brockman:

I think we are heading towards a world of extreme compute scarcity, and energy, especially in the US, is going to be a huge bottleneck. Also, many parts of the supply chain haven't adapted to the demand we foresee. Therefore, this is what we have been emphasizing repeatedly for years: we need to build more compute capacity.

Host:

There are many rumors about whether OpenAI is developing its own chips. Have you considered investing in your own energy systems or trying something new in that regard?

Greg Brockman:

If you had asked me ten years ago, the me of 2015, I would have told you we were going to build AGI, and that we saw it as a software problem.

But we gradually realized that compute capability is the fundamental material needed to build AGI. It is easier to scale than other resources. That's why we focus so much on compute capability.

You have to push it to the limit, and then you start to realize that you actually need to build massive physical infrastructure. So we are now moving into this domain, starting to build our own data centers like Stargate.

I believe our current bottleneck mainly depends on whether the market can respond in time to the demands we transmit. We have signaled loudly to the market—and this is not just from OpenAI but the entire industry. If the market wakes up and responds to these demands, then we can avoid having to develop energy infrastructure ourselves.

Host:

But we still have to get things done. So, with the limited GPU and compute resources available, you have many conflicting demands, including consumer products, enterprise products, developer APIs, and training. How do you decide how to allocate these compute resources, and how do you coordinate internally?

Greg Brockman:

Pain and suffering, that’s the most honest description. It's incredibly hard because you see all these amazing projects, and many people come to pitch their ideas, and you think, "This is fantastic!"

Host:

You are doing so many things; how do you choose what to focus on? Even for a smaller company like ours, decision-making is difficult. Can you describe how OpenAI handles these issues internally?

Greg Brockman:

Mechanistically, we now have a process. For example, Jakub Pachocki (OpenAI Chief Scientist) and Mark Chen (OpenAI Chief Research Officer) are responsible for deciding compute resource allocation. But more broadly, there are disagreements between the research and application departments, which Sam and I usually coordinate for the final decision.

On the research side, I just described how compute resources are allocated. At a practical level, I have people on my team dedicated to the grueling task of actually scheduling GPU resources. You know, it's a very interesting process. For example, Kevin Park is one of my team members. When you go to him and say, "We need more GPUs to support this new project," he will say: "Okay, five projects are nearing completion right now, and this new project needs to be done first." Then we can adjust the resources.

It's like playing a game of "Tetris", and it's genuinely remarkable to watch the whole process play out. Allocating compute is not a single decision; it's a complex coordination effort, partly solved by people and partly managed in spreadsheets. It's a fascinating process to witness, especially given how much people care about access to compute, which is an undeniable driver of team productivity.
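As a toy illustration of that Tetris-like packing (entirely hypothetical; OpenAI's real process mixes people and spreadsheets, as Greg says), here is a greedy scheduler in which a new project waits until enough in-flight projects finish and release their GPUs. All names and numbers are invented.

```python
import heapq

def schedule(projects, total_gpus):
    """projects: list of (name, gpus_needed, days). Start each project in
    order, waiting for running work to free enough GPUs first."""
    free = total_gpus
    running = []  # min-heap of (finish_day, gpus_released)
    day = 0
    for name, gpus, days in projects:
        while free < gpus:  # wait for the earliest-finishing project
            day, released = heapq.heappop(running)
            free += released
        free -= gpus
        heapq.heappush(running, (day + days, gpus))
        print(f"day {day:>3}: start {name} on {gpus} GPUs")

# Invented projects against an invented 1,000-GPU pool.
schedule([("video-run", 800, 30), ("pulse-eval", 400, 10), ("new-idea", 700, 20)], 1000)
```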

Host:

You announced a new initiative that brings apps into ChatGPT, and you showed the Zillow example. As applications shift toward a more native experience, how do you view this unbundling of the internet experience? As agents increasingly browse on our behalf, people seem to be spending less time personally browsing traditional websites. What do you think the next 18 months will look like?

Greg Brockman:

Actually, I want to add something before answering that question. I believe we are moving toward a world where compute drives the productivity of the entire economy. The small-scale ecosystem you see inside OpenAI will, I think, appear everywhere. So what I truly believe is that we need to build compute capacity to relieve the scarcity, and to handle the allocation problems better when we do face them.

Host:

What do you think the ratio of supply to demand is currently?

Greg Brockman:

Are we far from the goal? Oh, I think we are very far. I'm not sure exactly how big the gap is, but I can say, if our current compute capacity increased tenfold, would our revenue grow tenfold? I'm not sure, but maybe fivefold. Because we have many products waiting to be released that we cannot launch.

You can see some projects intuitively, like Pulse, which is currently Pro only. Pulse is a great project.

Host:

Yes, we will discuss that project later. It really requires intensive compute resources.

Greg Brockman:

We definitely need more compute resources.

Is the AI Agent Reshaping the Internet? New Monetization Methods May Emerge

Host:

Let’s talk about the decoupling of the internet. You find that the fundamental way we browse the internet is changing drastically, especially as agents start browsing the internet for us, and now bringing traditional websites into ChatGPT. What do you think of this change?

Greg Brockman:

I think ChatGPT really makes you realize how unnatural it is to go to a static website just to look up a piece of information.

You are looking for one fact, but most of the page is irrelevant to it. We have almost moved past this stage; we still encounter it occasionally, but it is no longer the mainstream, nor what people want to do. When you realize how much time you spend on this, you see that it adds no value; it's finding a needle in a haystack. Machines should be doing this for you.

I believe that with the development of dynamic applications like apps inside ChatGPT, in the future we will no longer need to go to websites and click a bunch of buttons to perform dynamic operations. That feels like a complete regression, something we should have outgrown long ago. So I think we are moving toward a world where people will value their time more, because there is no longer any excuse to waste it on things that don't generate value. Anything that isn't a human thinking, creating, or giving feedback should be the AI's job.

Host:

So how will this change internet monetization? You know, traditionally, the web profits from CPM advertising—users provide page views, and the site offers free content and ads. But when an agent browses on your behalf, especially when you bring sites like Zillow into ChatGPT, conflicts arise. For example, are they still showing ads? What would that model look like? How do you view the changes in the internet monetization layer as these shifts occur?

Greg Brockman:

The truth is, no one knows the exact answer right now. But I think we can see the trend: we have to explore our way to the right new monetization models and the right ways to scale them. Fundamentally, these technologies impose new requirements on how you provide value to the user.

If you look at ChatGPT, it's a subscription product now, right? We probably didn't predict this when we launched it three years ago, but people are willing to pay for it because it truly adds value—it helps both personal and professional life; this value is comprehensive. So, I am not saying that advertising has no place, but I think the current form of advertising, where you unconsciously scroll a page looking for a sentence you care about and accidentally click on an ad page, is no longer the main driver of value.

However, I do believe new revenue models will emerge, and there will be new ways to monetize. And, honestly, I think this is the most exciting time right now.

ChatGPT is Not 'Another App Store'

Host:

This is truly a golden age for building. If you look back ten years ago at publishers during the mobile internet transition, many companies became dependent on Apple's App Store after entering it. How would you explain to them why this time is different, and why ChatGPT might become the 'homepage' for your AI experience?

Greg Brockman:

I think this story hasn't been written yet. I have one observation: AI always seems to develop in a surprising way, completely unlike anything we've seen before.

It has elements that remind people of the past, but I don't think there's a clear analogy. For example, "This is a continuation of the internet," "This is a continuation of the mobile internet," or "This is like an App Store." I think it's something entirely new. So, how do you want to interact with AI? Is it mediated by a website that interacts with everything else? I'm not sure.

Because part of the point of AI is bringing machines closer to humans, rather than forcing yourself to think: "Oh, there's a URL, I have to go visit that website." The machine should follow your needs directly, or even proactively consider what you might want and do it for you. I think this paradigm shift may change our view of entry points and opportunities. So I believe there is huge room for development here, and I'm not sure interaction with everything can be funneled through a single portal.

From Passive Tool to Active Partner: The Future of AI Autonomy

Host:

I want to follow up on a question. How far away do you think we are from the day when AI can predict most of my needs? When ChatGPT was first released, it was a very passive tool. I gave it a prompt, and it returned the corresponding content. Now, features like Pulse are starting to become more proactive. How do you view the ratio change between reactive and proactive AI over the next 24 months?

Greg Brockman:

I see proactivity becoming more important. For example, you give AI a small task, and it might spend a day, a week, or a month thinking about it. Our goal is to build AI capable of autonomous thought for a year, or even ten years. That's like a human.

Host:

Does this mean there is absolutely no human intervention during that time?

Greg Brockman:

I think it's somewhat like the human process of solving Fermat's Last Theorem. For instance, Andrew Wiles spent ten years essentially solving the problem himself. While he wasn't completely isolated from human interaction, he spent most of the time thinking independently. That's also what we aim to achieve.

We hope AI can help us solve grand problems: doing productive work autonomously, without us micromanaging it constantly. Micromanagement is painful for humans, and it is painful for AI too. We want to build a world where micromanaging is a choice. If you constantly micromanage productive humans, they will likely become unhappy quickly. So I think this shift will completely change the way work is done, and you will genuinely get to choose what you spend your time on.

Host:

I've seen many discussions about how many hours AI can think independently. Typically, it can think autonomously for many hours. So, how do you view the trade-off between the duration AI can think autonomously and the tasks it can complete during that time? For example, if it takes 30 hours to calculate "1+1," that's clearly different from the complexity of solving cancer. How do you view the trade-off between compressing intelligence within a given time window versus extending the time window?

Greg Brockman:

Yes, I think that's a good question, and it's easy for seemingly meaningful metrics to actually mislead you. As you said, some problems require more thought, stronger computing power, and more compute resources. What you really want is an AI that can efficiently think for a day to solve these complex problems. But if we can solve it easily, that's great.

Host:

Right, like ten Saturns.

Greg Brockman:

If we can do that, of course, it's great. I think these problems are two different dimensions, and it's important that we continue to push on both.

Host:

Alright, considering this question, how long can Codex think completely autonomously? What's the current record?

Greg Brockman:

Actually, I don't know the specific record. I think we released related data once. I know some people reported that Codex has been able to think independently for about seven hours, but I'm not sure if that's the limit. You can find relevant information online. My point is, we are now able to put significant compute resources into some interesting problems.

Why Did Sora 2 Become a Social Product?

Host:

Let's talk about Sora 2. I think some members of my team might be addicted; it's really good to use. When developing this new model, moving from Sora 1 to Sora 2, why did you decide to make it a social experience, rather than releasing and using it in a more traditional way, like Sora 1?

Greg Brockman:

When we think about what to build, we look first at the model's capabilities; that is ultimately why we launched ChatGPT. I remember when we were running the early training for GPT-4 and building out the infrastructure for chat.

At the time, we were just doing instruction following, training on a dataset where the model receives a question and produces an answer. I remember trying something different: asking a follow-up question whose answer depended on the context of the previous one. The model had never been trained to use that context, yet it understood and used it anyway.

You think: "Wow, this model is smart! It can do this kind of reasoning." It clearly wanted to be a chat model, and the technology had evolved to the point where it deserved to be released as a chat system.

With Sora 2 there was a similar feeling, especially when thinking about the model's strengths and weaknesses, what it can do, and what is novel about it. There are many paths we could take, and many roads still untraveled. Personally, any interface, any post-trained model, feels a little regrettable to me, because you are narrowing the scope of the original model's capabilities. The raw base models are fascinating; they are difficult to use, but they contain endless possibilities.

Host:

I can understand that there must be many considerations behind your decisions.

Greg Brockman:

I think the outside world doesn't fully appreciate this point, which I find a bit of a shame, because we have released base models before. GPT-3, for example, was a pure base model, powerful but very difficult to use.

Did you use GPT-3? Back then, you had to provide six examples of a task before the model knew how to answer.

Host:

I see, so it was the model in its base stage, not that it had improved after multiple iterations.

Greg Brockman:

Yes, that’s how you should understand it. What we train these base models to do is next-token prediction; they are, in effect, observing human thought and behavior through all of this public data.

The model is essentially being asked: given this prefix, what comes next? And then what comes next? At inference time, it's as if you pull a document out of that public data, cut it off, and ask: "What comes next?"

Then, you need to consider how to format the query in a way that would naturally occur in the distribution. So you discovered a pattern: if I have a question and an answer, and then provide another question and answer, the model will know the next thing should be an answer. But if there is only a question, the next thing might be another question.

It’s like guiding the AI to roleplay, making it feel like it's within a reasonable document that conforms to the training data distribution.

However, a model used this way is very hard to work with: the user experience is poor, it doesn't make a good product, and you cannot control the behavior or values it expresses. It's somewhat like a person who has accumulated knowledge by observing the world and has some understanding of everything. Someone once drew the analogy that a base model is more like a trained human than a trained robot: it is all-encompassing, containing every value system and worldview.

So, when you ask it how to respond to a specific situation, it can basically provide any response a human might make. If you want the model to focus on a consistent set of values, then other steps are needed to guide it. This is the meaning of post-training. The purpose of post-training is to refine this "raw intelligence" to ultimately form a more consistent personality or behavior pattern.
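A minimal sketch of the prompting pattern Greg describes, using the question/answer framing from above. The API call uses the OpenAI Python SDK's completions endpoint; the model name is an assumption, and any completion-style (non-chat) model would illustrate the same point.

```python
from openai import OpenAI

examples = [
    ("Translate to French: cheese", "fromage"),
    ("Translate to French: bread", "pain"),
]
query = "Translate to French: apple"

# Q/A, Q/A, Q: by construction, the most likely next thing in this
# "document" is an answer, not another question.
prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples) + f"Q: {query}\nA:"

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # assumed; any base/completion model
    prompt=prompt,
    max_tokens=8,
    stop=["\n"],  # stop before the model invents a new question
)
print(completion.choices[0].text.strip())
```

Post-training removes the need for this scaffolding: a chat model accepts the bare question directly, which is exactly the "better communication" Greg refers to.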

Host:

Does this mean the decision to make it a more social product was made before post-training? Or did you find it had a particular talent for imitation?

Greg Brockman:

The process is actually an iterative loop: you first get the base model and see how it behaves. Then you try different prompts, and when you see certain reactions, you think: "Oh, that's interesting! How great would it be if it could reliably work on this task!" You don't need to do much extra work.

Base models are like the world’s best prototype engines, but they are not reliable. Because finding the right prompt to get the model to complete the task you want is extremely difficult. This is actually a communication problem, and subsequent post-training is intended to facilitate better communication.

Cameos Are Inevitable: The Future of AI Likeness Licensing

Host:

Is your cameo public?

Greg Brockman:

My cameo is not public right now.

Host:

I made mine public. I remember Sam Altman also mentioned that it's surprisingly comfortable letting others play with your likeness. What do you think?

Greg Brockman:

It is quite interesting. Honestly, I haven't thought too much about the status of my cameo, because I think within six months, regardless of what we do, other companies will release video models that let you make a cameo of anyone, without restrictions. So I think we are heading toward a world where all likeness licensing turns into "cameo" licensing.

I believe part of the meaning of being on this technological frontier is to let more people understand the future direction of this technology and try to release it in a beneficial way. You can see this in our choices, but we don't believe we can fully control this technology because we are not the only company building it.

The World Model Debate: Can Language Models Lead to AGI?

Host:

Sora 2 is a world model capable of simulating the world. Yann LeCun once said that language models are insufficient to achieve AGI because a world model cannot be built using language alone. Do you agree with this view? Why or why not? What role does the world model play in the development of AI and AGI?

Greg Brockman:

I like to draw lessons from the AI progress of the past five or ten years and look at what we have actually proven experimentally. The claim was that language models lack a world model: that although they can process information expressed in written language, they don't build a complete model of the world.

By the way, this is a long-standing debate, not just of the last decade but decades old. And yet we couldn't have predicted many of the things GPT-4 can do. You can ask it questions such as: "I put the water bottle on the table, then unscrewed the cap, and put the bottle under the table. Where is the cap?" Do you think it can answer that?

Host:

I used to run a test: "There's a marble in a cup. You turn the cup upside down on the table, then pick the cup up. Where is the marble?" If the model is smart, it should know the marble is still on the table. I remember GPT-3.5 couldn't answer it, but GPT-4 got it right, and GPT-4o and later models can all do it.

Greg Brockman:

Exactly. Even when it can't perfectly solve some complex task, the progress is impressive. GPT-4 performs well on some high-level tasks and keeps inching toward breakthroughs; the trend of its performance is clearly upward.

I think it's easy now to get bogged down in semantic debates: for example, what is "understanding"? Are these models truly "understanding" or just simulating understanding? What do these words actually mean? I'm not sure. But what I do know is that when you show me an evaluation that proves these tasks were once considered nearly impossible for models to complete, but now they succeed, that is the most convincing evidence.

Host:

It's like Sam Altman said before, intelligence is actually prediction, and prediction is intelligence. And this seems to support a similar view: large language models can actually achieve AGI.

Will Human Jobs Be Replaced by AI?

Host:

Honestly, I want to ask: is my job at risk? You know, MrBeast said AI threatens the livelihoods of content creators, and that's exactly what I do. Should I be worried? What do you think?

Greg Brockman:

AI will change many jobs. Many people currently working may find their jobs drastically changed in the future, either becoming completely unrecognizable or ceasing to exist altogether. But new job opportunities that we can't imagine now will also emerge.

What will these new jobs look like? What will their form be? How should we view these changes? I believe that during the AI revolution, we will fundamentally alter the social contract.

I think we will enter a world of "abundance." A world where you can enjoy a very high quality of life even if you are not engaged in economic work, because there is so much available. If you strive, compete, and seek status, this world will offer more opportunities, more things to build, and more valuable things. Frankly, my answer is: no one knows exactly what the other side of the AI event horizon will look like, but I know it will certainly be stranger and more pleasant than we can currently imagine.

Host:

I just started my job, so I hope I can maintain the status quo.

Greg Brockman:

I think that amid the changes AI brings, there are some fundamental elements of human connection that won't change easily. Human emotional connection, for one, is very hard for AI to substitute for. I also believe that skilled tradespeople like mechanics, plumbers, and electricians are already scarce, and those fields are very difficult for AI to replace, because they demand hands-on ability, where AI struggles to create real value.

OpenAI's Potential Platform Risk

Host:

Let's talk about Codex and the other products OpenAI has released. We are at a developer event right now, and the room is full of developers. You announced AgentKit. How should developers think about platform risk when building applications on the OpenAI platform? I imagine you've considered this internally too.

There's a popular saying that every time OpenAI holds a DevDay, a thousand startups die. While I don't believe that statement, I want to hear your thoughts on it.

Greg Brockman:

Yes, we are indeed often asked this question. We also think about it frequently. We ultimately want to help the world transition economically towards an AI-first approach, and this transition should benefit everyone.

But we cannot do this alone, absolutely not. We definitely need to collaborate with developers. We need people to build on our platform, exploring how to connect this technology to the real world.

We have to make choices, because we are a company. Although we have thousands of people now, which sounds like a lot, if you look at the scale of the entire economy, we are actually very small. We must consider the expertise in different fields and the difficulty of doing well in each one.

So we have to be very selective. What we really try to think about is which areas have synergy with our existing expertise, or where we can see ourselves adding value. For example, programming—that is an area we are very good at.

Furthermore, if we do well in programming, it also accelerates our own work. So while we think about how to maximize value for as many people as possible, we will also push to do better in the specific areas where we can go deep.

Host:

Do you think code is the language of AGI?

Greg Brockman:

That's an interesting question. I've always thought natural language will be the language of AGI. If AIs communicate with each other, it might be a slightly optimized version of "noisy English" or something similar. If you look at the proofs behind our gold-medal result at this year's International Mathematical Olympiad (IMO), they are actually very readable. They are terse, but they are essentially an interesting language the AI has explored.

The Future Role of Humanity: From 'Prompt Engineer' to Goal Setter

Host:

Will humans still have a place in this process? I see these models constantly improving, but currently, humans still provide the prompt at the beginning of the task and perform the final validation. I think the human role in this process might gradually shrink, but we still have a place now. How long do you think this situation will last? Will it go on forever? What are your thoughts on all this?

Greg Brockman:

I truly believe that the fundamental purpose of this technology is to benefit humanity—and actually, not just humans, but all sentient beings that can experience joy and pleasure; AI should enhance the well-being of everyone. So the question is, what does this mean?

I don't think we want to live in a world where humans have to expend energy designing prompts, writing code for context engineering, and managing those mechanical details. To me, these details look like legacy leftovers; they represent what computers looked like in the past, not what their future state should be.

What I want, and what I think the world should want, are AI tools that bring machines closer to humans, understand human goals, and help achieve them. I think that is the key. We must ensure that AI enhances the quality of human life. This is the core mission of OpenAI, and we are working hard to push the technology in this direction.

The Future of Software: AI Generates Everything, Humans Focus on Creativity and Aesthetics

Host:

Great. As someone who thinks about programming constantly, you've obviously spent a lot of time on natural-language programming. A few months ago I asked you this in person: do you think software will eventually be completely generated by AI, down to the operating-system level and every pixel on the screen, generated in real time, assuming we can solve the consistency problem?

Greg Brockman:

I think yes, and that would be very cool. Imagine what a completely generated user interface looks like; it's genuinely mind-bending. It would be a real-time, dynamic process: as you act, the system decides whether there should be a button at all, where it goes, and what the most natural interface looks like. You start to realize that many of the interfaces we build today are really built around the habits and conventions of existing operating systems.

But if you could reimagine it from scratch, removing all the legacy code, without concepts like folders or files, what would it look like? I don't know the complete answer, but I'm sure the result will be very surprising.

Host:

Let's imagine that future for a moment. Will there still be developers in that world? Will there still be apps?

Greg Brockman:

Take Sora as an example. Sora is very interesting to me because I remember watching a promotional video we made in which Bill was riding a snowmobile and took off his helmet. I thought, "Wow, Bill is really good at snowmobiling." Then I suddenly realized he hadn't actually done it. The way humans participate turns out to be very different: it's nothing like a film where Bill personally rides the snowmobile, yet he is still involved, because he shaped the creative process, and that shows in his role as the performer.

It's like he appears in the video this way, and when you create a Sora video featuring him as a performer and share it, you feel excited. And the fact that you feel excited excites me too. In fact, we learned this from our experience earlier this year. When our Image Gen technology became incredibly popular, people started generating portraits of themselves and their families.

We realized that if you just generate an image without any real context, like a dog turning into a cool anime style, no one cares; it's boring. It's not engaging. But once you add certain human elements, something you can relate to, people start getting interested.

I think when you see a generated image that looks like a photo of your child, the AI will process it in interesting ways, bringing it into different dimensions of creation, which connects with the audience. And I think this might also influence how software is developed. In the future, people will build applications this way. Imagine you have a dynamic system where the AI acts as the developer, you give it the task, and it writes perfect code for you or creates a completely generated user interface, which you then publish in the ChatGPT App Store.

Host:

This really sounds like the future will focus more on creating a high-quality human experience, and more importantly, the key to the future will no longer be the hard technology, but rather how to aesthetically design that experience, right?

Greg Brockman:

Yes, I think so too. I believe some mechanical skills will definitely be transformed, and we see with the progress of each generation of models, people who try to explore the potential of the model often get the most reliable results. But fundamentally, knowing what you want, having good judgment, and taste are the most critical things.

Agentic Commerce: Not a New Idea; the Key Is That the Models Finally Work

Host:

You were the CTO of Stripe, and recently you announced the Agentic Commerce Protocol. Was this an idea you'd had for a long time? Or was it only recently that you realized internally: wow, this is a cool idea that can do a lot of things, letting agents browse and make purchases for us?

Greg Brockman:

One thing about this field is that there are no new ideas. All these ideas have been thought of by others, and we have thought about them many times too. The truly fresh thing is that the model is now powerful enough to utilize these ideas effectively.

You can see this in the launch of plugins. We did plugins a few years ago, but the models weren't powerful enough then, and plugins didn't get much use; the models couldn't call them reliably. Today's models are far more reliable. You could say the fresh thing is not the idea itself, but that the idea is finally feasible.
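For a sense of why reliability is the unlock, here is a minimal sketch of the tool-calling pattern that plugins pioneered, written against the OpenAI Python SDK's chat completions API. The tool name, schema, and model name are hypothetical stand-ins, not part of the Agentic Commerce Protocol itself.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical purchasing tool; the model decides when to call it
# and fills in the arguments from the user's request.
tools = [{
    "type": "function",
    "function": {
        "name": "buy_item",
        "description": "Purchase a product on the user's behalf",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["product_id", "quantity"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": "Order two of product 42 for me."}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model may also just reply in plain text
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```

The pattern is the same one plugins used years ago; what changed is that current models emit well-formed, correctly targeted calls reliably enough to build commerce on.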

Host:

Do you shop through ChatGPT? I know Sam said he does.

Greg Brockman:

Funnily enough, I don't really shop much, so almost all my recent shopping has been done through ChatGPT.

AGI is a Continuous Process, Not a Destination

Host:

Can we talk about the future? Last year at DevDay, we saw GPT-4; now a year has passed, and you've released so many things. How do you view the development for next year (2026)? And what will DevDay 2030 look like?

Greg Brockman:

That's a difficult question to answer, but I do think we will have some incredible models next year. The milestone I am most looking forward to is that we will have models capable of solving hard problems. For example, like AlphaGo's breakthrough in Go in 2016. Move 37 in that game changed people's understanding of Go. Imagine the application of this in material science or medicine.

I think we will see such true breakthroughs, whether by AI itself or by AI solving problems with the help of top human experts. I think we will see this collaborative scenario. For developers, this breakthrough will bring immeasurable value.

For instance, in the financial sector, you could build state-of-the-art applications that help users solve their toughest financial problems. These may not be the very hardest problems in all of finance, but we will start knocking down extremely complex ones. Note that this will consume a lot of compute, so we must make sure these tasks carry enough economic value; otherwise no one will pay for the compute.

I feel we will continually think about how to push these technologies into deeper and farther domains. As for 2030, I think it's hard to predict, but I believe we will be much closer to AGI than we are now.

Host:

What about your AGI timeline? Has it been adjusted from before?

Greg Brockman:

I think AGI is more like a continuous process than a destination. Initially, I thought AGI was a goal, and completing it meant the mission was accomplished, but now I view it as an evolving process.

At some stage, AGI may be able to perform work equivalent to human economic value, which will be an important milestone, but it is definitely not the end.

I think people have started shifting the discussion from AGI to superintelligence, or simply rejecting all these terms. For me, it doesn't matter. What truly matters is whether we can achieve AI progress, enhance the entire economy, and genuinely benefit people.

I believe that AI will have a profound impact on all aspects of society, and as we drive the development of this technology, we must always ensure it is for the enhancement of human well-being. That is our mission at OpenAI.
