Programming Revolution Explodes! OpenAI's Most Powerful Agent Just Launched on ChatGPT

Xinzhiyuan Report

Editor: Editorial Department YXH

【Xinzhiyuan Guide】OpenAI's most powerful AI programming agent is really here! Codex is officially launched, powered by codex-1, an optimized version of o3. It handles multiple tasks in parallel, completing days of software engineering work in half an hour.

From today, the era of AI programming officially begins!

Just now, Greg Brockman led a six-person team from OpenAI in an online live stream to launch a cloud-based AI programming agent – Codex.

In Altman's words, the era where one person can create countless hit applications is here!


Codex is powered by the new model codex-1, a specially tuned version of o3 tailored for software engineering.

It can not only safely process multiple tasks in parallel in a cloud sandbox environment but also directly access your codebase through seamless integration with GitHub.

It is not just a tool, but a "10x engineer" capable of simultaneously:

Quickly building feature modules

Deeply answering codebase questions

Accurately fixing code vulnerabilities

Submitting PRs

Automatically executing test validations

In the past, these tasks might take developers hours or even days, but now Codex can complete them efficiently in at most 30 minutes.


From the ChatGPT sidebar, open Codex, enter a prompt, then click "Code" to assign a task or "Ask" to ask questions about the codebase.

Through reinforcement learning, Codex is trained on real-world coding tasks and diverse environments. The generated code not only aligns with human preferences but also seamlessly integrates into standard workflows.

Benchmark tests show that codex-1 scored a high 72.1% on SWE-bench, surpassing Claude 3.7 and o3-high.


Starting today, Codex will be officially available to ChatGPT Pro, Enterprise, and Team users worldwide, with Plus and Edu users getting access soon.


It can be said that the advent of the AI programming agent Codex may reshape the underlying logic of software development and completely ignite the spark of the programming revolution.


Codex's Multi-Task Parallelism: A Super Accelerator for AI Programming

As early as 2021, OpenAI released the original Codex model, ushering in the era of "vibe coding".

This programming method allows developers to collaborate with AI, making code production more intuitive and efficient.

A few weeks ago, OpenAI also released Codex CLI, an agent that runs in the local terminal.

But that's just the beginning!

OpenAI is launching a new Codex agent today, pushing software engineering to a new height.

Next, let's take a look at Codex's impressive coding performance.

After connecting his GitHub account, OpenAI researcher Thibault Sottiaux selected an open-source repository, the preparedness repo.


Then, he assigned it three tasks:

The first was a question: asking Codex to explain the codebase and describe its overall structure.

The second was a coding task: finding and fixing a bug somewhere in the codebase.

The third was an open-ended prompt: iterating through the codebase and proactively suggesting tasks it could take on.


In the following demo, Thibault gave Codex multiple tasks, such as spelling and grammar correction, intelligent task delegation, and multi-repository adaptation.

In terms of correction, he intentionally included spelling errors in the instructions. Codex not only understood the intent but also proactively found and fixed spelling and grammar issues in the codebase, with astonishing detail.


When Thibault proposed the goal of making the codebase "easy to maintain and bug-free", Codex iterated through the codebase and proactively discovered issues such as mutable default values and inconsistent timeout settings, and automatically generated fix tasks.
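The mutable-default-value issue Codex flagged is a classic Python pitfall. A minimal illustration (the function names here are hypothetical, not code from the actual repo):

```python
def add_task(task, queue=[]):  # BUG: the default list is created once and shared across calls
    queue.append(task)
    return queue

def add_task_fixed(task, queue=None):  # fix: use a None sentinel and allocate per call
    if queue is None:
        queue = []
    queue.append(task)
    return queue

print(add_task("a"))        # ['a']
print(add_task("b"))        # ['a', 'b']  <- surprising shared state
print(add_task_fixed("a"))  # ['a']
print(add_task_fixed("b"))  # ['b']
```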

This "self-delegation" capability is a standout feat for an agent.


It's worth noting that the Codex agent runs on OpenAI's own computing infrastructure, sharing the same well-tested systems used for reinforcement learning.

Each task runs in an independent virtual sandbox, equipped with its own file system, CPU, memory, and network policy, ensuring efficiency and security.
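Conceptually, per-task isolation of this kind can be sketched with the Python standard library: each task gets its own throwaway working directory and a wall-clock budget. This is a toy illustration only, not OpenAI's actual sandbox, which also isolates CPU, memory, and network:

```python
import subprocess
import sys
import tempfile

def run_isolated(cmd, timeout=30):
    """Run a command in a throwaway working directory with a time limit.

    Each call gets a fresh scratch filesystem that is deleted afterwards;
    subprocess.run enforces the wall-clock budget via `timeout`.
    """
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout
        )
    return result.returncode, result.stdout

rc, out = run_isolated([sys.executable, "-c", "print('hello from the sandbox')"])
```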


In addition to the preparedness repository, Codex also handled the Codex CLI repository seamlessly, demonstrating its ability to generalize across projects.

Whether it's an open-source project or an internal codebase, Codex handles it with ease.

Codex was then given a user-reported bug: a special character in a filename caused the diff command to error.


During the resolution process, it could not only reproduce the problem but also write test scripts, run linter checks, and generate a PR, with the entire process taking only a few minutes.

Thibault stated directly, "This would have originally taken me 30 minutes, or even several hours to complete."


Furthermore, OpenAI researcher Katy Shi emphasized in the demo that Codex's PR includes a detailed summary, clearly explaining the changes and referenced code, with test results at a glance.


After a series of demonstrations, Greg commented that Codex gave him a profound sense of AGI!


Aligning with Human Preferences: 4 Open-Source Libraries in Practice

A primary goal in training codex-1 was to ensure its output closely conforms to human coding preferences and standards.

Compared to OpenAI o3, codex-1 consistently generates more concise code modification patches that can be directly reviewed by humans and integrated into standard workflows.

To demonstrate the conciseness and efficiency of the code generated by Codex, OpenAI provided 4 practical examples comparing Codex and o3 on open-source libraries:


astropy

astropy is an open-source Python library for astronomy.


The first issue in the astropy/astropy repository was that the separability_matrix in the Modeling module could not correctly calculate the separability of nested CompoundModels.


As seen in the code version comparison before and after modification, using Codex resulted in very concise code changes.

In contrast, the code modified by o3 was somewhat verbose and even added some "unnecessary" comments to the source code.


matplotlib

Matplotlib is a comprehensive Python library for creating static, animated, and interactive visualizations.


This issue was a bug fix: incorrect window correction in mlab._spectral_helper.


Similarly, Codex's code modification process was more concise.


django

Django is a Python-based web framework. This issue fixed a problem where duration-only expressions did not work correctly on SQLite and MySQL.


Codex's fix remained clean and, unlike o3's, first added the missing dependency calls.


expensify

Expensify is open-source financial-collaboration software centered around chat.


The issue provided by OpenAI was "dd [HOLD for payment 2024-10-14] [$250] LHN - Member chat room name not updated in LHN after clearing cache".


Similarly, Codex's problem localization and fix were more accurate and effective; o3 even made an ineffective code change.


OpenAI Team Is Already Using It

OpenAI's technical team has started incorporating Codex as part of their daily toolkit.

OpenAI engineers most often use Codex to perform repetitive and well-defined tasks, such as refactoring, renaming, and writing tests, which would otherwise interrupt their focus.

It is also suitable for building new features, connecting components, fixing bugs, and drafting documentation.

Teams are building new habits around Codex: triaging on-call issues, planning tasks at the start of the day, and offloading background work to keep things moving.

By reducing context switching and reminding them of forgotten to-dos, Codex helps engineers deliver faster and focus on what matters most.

Before the official launch, OpenAI collaborated with a few external testers to evaluate Codex's actual performance in different codebases, development processes, and team environments:

Cisco, as an early design partner, explored Codex's potential to accelerate engineering team ideation and implementation, and provided feedback to OpenAI through evaluating real use cases, assisting model optimization.

Temporal leveraged Codex to accelerate feature development, issue debugging, test writing, and execution, and used it for refactoring large codebases. Codex also handled complex tasks in the background, helping engineers stay focused and iterate efficiently.

Superhuman used Codex to automate small repetitive tasks, such as improving test coverage and fixing integration failures; it also enabled product managers to make lightweight code changes without engineering intervention (except for code reviews), improving pairing efficiency.

Kodiak accelerated debugging tool development, test coverage, and code refactoring with Codex support, advancing the R&D of its autonomous driving system, Kodiak Driver. Codex also served as a reference tool, helping engineers understand unfamiliar code stacks, providing relevant context and historical changes.

Based on current usage experience, OpenAI suggests: assigning well-defined tasks to multiple agents simultaneously, and trying various task types and prompting methods to more comprehensively explore the model's capabilities.


Model System Message

Through the following system message, developers can understand codex-1's default behavior and adjust it for their workflow.

For example, the system message guides Codex to run all tests mentioned in the AGENTS.md file, but if time is tight, developers can ask Codex to skip these tests.

# Instructions
- The user will provide a task.
- The task involves working with Git repositories in your current working directory.
- Wait for all terminal commands to be completed (or terminate them) before finishing.

# Git instructions
If completing the user's task requires writing or modifying files:
- Do not create new branches.
- Use git to commit your changes.
- If pre-commit fails, fix issues and retry.
- Check git status to confirm your commit. You must leave your worktree in a clean state.
- Only committed code will be evaluated.
- Do not modify or amend existing commits.

# AGENTS.md spec
- Containers often contain AGENTS.md files. These files can appear anywhere in the container's filesystem. Typical locations include `/`, `~`, and in various places inside of Git repos.
- These files are a way for humans to give you (the agent) instructions or tips for working within the container.
- Some examples might be: coding conventions, info about how code is organized, or instructions for how to run or test code.
- AGENTS.md files may provide instructions about PR messages (messages attached to a GitHub Pull Request produced by the agent, describing the PR). These instructions should be respected.
- Instructions in AGENTS.md files:
  - The scope of an AGENTS.md file is the entire directory tree rooted at the folder that contains it.
  - For every file you touch in the final patch, you must obey instructions in any AGENTS.md file whose scope includes that file.
  - Instructions about code style, structure, naming, etc. apply only to code within the AGENTS.md file's scope, unless the file states otherwise.
  - More-deeply-nested AGENTS.md files take precedence in the case of conflicting instructions.
  - Direct system/developer/user instructions (as part of a prompt) take precedence over AGENTS.md instructions.
- AGENTS.md files need not live only in Git repos. For example, you may find one in your home directory.
- If the AGENTS.md includes programmatic checks to verify your work, you MUST run all of them and make a best effort to validate that the checks pass AFTER all code changes have been made.
  - This applies even for changes that appear simple, i.e. documentation. You still must run all of the programmatic checks.

# Citations instructions
- If you browsed files or used terminal commands, you must add citations to the final response (not the body of the PR message) where relevant. Citations reference file paths and terminal outputs with the following formats:
  1) `【F:<file_path>†L<line_start>(-L<line_end>)?】`
  - File path citations must start with `F:`. `file_path` is the exact file path of the file relative to the root of the repository that contains the relevant text.
  - `line_start` is the 1-indexed start line number of the relevant output within that file.
  2) `【<chunk_id>†L<line_start>(-L<line_end>)?】`
  - Where `chunk_id` is the chunk_id of the terminal output, `line_start` and `line_end` are the 1-indexed start and end line numbers of the relevant output within that chunk.
- Line ends are optional, and if not provided, line end is the same as line start, so only 1 line is cited.
- Ensure that the line numbers are correct, and that the cited file paths or terminal outputs are directly relevant to the word or clause before the citation.
- Do not cite completely empty lines inside the chunk, only cite lines that have content.
- Only cite from file paths and terminal outputs, DO NOT cite from previous pr diffs and comments, nor cite git hashes as chunk ids.
- Use file path citations that reference any code changes, documentation or files, and use terminal citations only for relevant terminal output.
- Prefer file citations over terminal citations unless the terminal output is directly relevant to the clauses before the citation, i.e. clauses on test results.
- For PR creation tasks, use file citations when referring to code changes in the summary section of your final response, and terminal citations in the testing section.
- For question-answering tasks, you should only use terminal citations if you need to programmatically verify an answer (i.e. counting lines of code). Otherwise, use file citations.
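For concreteness, a repository's AGENTS.md following the spec above might look like this. This is a hypothetical sketch; the conventions, commands, and paths are made up for illustration:

```markdown
# AGENTS.md (repo root — applies to the entire directory tree)

## Code conventions
- Python code follows black formatting; run `black --check .` before committing.
- New modules go under `src/`, mirrored by tests under `tests/`.

## Programmatic checks
- Run `pytest -q` after any code change, including documentation-only changes.

## PR messages
- Start the PR title with the affected component, e.g. `cli: fix diff crash`.
```

Because the file sits at the repo root, its instructions apply to every file Codex touches there, unless a more deeply nested AGENTS.md overrides them.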


Codex CLI Update

Last month, OpenAI launched a lightweight open-source tool – Codex CLI, allowing powerful models like o3 and o4-mini to run directly in the local terminal, helping developers complete tasks faster.


This time, OpenAI also released a smaller model optimized for Codex CLI: a version of codex-1 built on o4-mini.

It offers low latency with strong instruction-following and code-editing capabilities. It is now the default model for Codex CLI, is also available via the API (as codex-mini-latest), and will continue to be iterated and updated.

Additionally, the login method for Codex CLI has been simplified. Developers can now log in directly with their ChatGPT account, select the API organization, and the system will automatically generate and configure the API key.

To encourage usage, within 30 days starting today, users logging into Codex CLI with their ChatGPT account will receive free credits: Plus users get $5 in API usage credit; Pro users get $50.


Is Codex Expensive?

In the coming weeks, all users will be able to try Codex out extensively.

Subsequently, OpenAI will introduce throttling mechanisms and flexible pricing, supporting on-demand purchase of additional usage.

For developers, the codex-mini-latest model is available on the Responses API at the following prices:

Per million input tokens: $1.50

Per million output tokens: $6.00

And enjoys a 75% prompt caching discount
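At those rates, estimating the cost of a request is simple arithmetic. A small helper, purely illustrative: the function name is made up, and it assumes the 75% discount applies to the cached portion of input tokens:

```python
INPUT_PER_M = 1.50      # USD per 1M input tokens (codex-mini-latest)
OUTPUT_PER_M = 6.00     # USD per 1M output tokens
CACHE_DISCOUNT = 0.75   # assumed: 75% off cached input tokens

def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Rough USD cost estimate for one Responses API call."""
    uncached = input_tokens - cached_input_tokens
    cost = uncached / 1e6 * INPUT_PER_M
    cost += cached_input_tokens / 1e6 * INPUT_PER_M * (1 - CACHE_DISCOUNT)
    cost += output_tokens / 1e6 * OUTPUT_PER_M
    return round(cost, 6)

# e.g. 100k input tokens (40k of them cached) plus 20k output tokens
print(estimate_cost(100_000, 20_000, cached_input_tokens=40_000))  # 0.225
```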

Codex is currently still in the research preview stage and does not yet support frontend capabilities like image input, nor does it have the ability for real-time correction during task execution.

Additionally, the response time for delegating tasks to the Codex agent is relatively long, and users may need to adapt to this asynchronous collaboration workflow.

As the model's capabilities continue to improve, Codex will be able to handle more complex and persistent development tasks, gradually becoming more like a "remote development partner".


What's Next?

OpenAI's goal is for developers to focus on what they are good at, and delegate the remaining tasks to AI agents, thereby improving efficiency and productivity.

Codex will support real-time collaboration and asynchronous task delegation, and these two work modes will gradually merge.

Tools like Codex CLI have become standard for developers to accelerate coding, while the asynchronous, multi-agent collaboration workflow led by Codex in ChatGPT is expected to become a new paradigm for engineers to produce high-quality code efficiently.

In the future, developers will be able to collaborate with AI in IDEs and daily tools - asking questions, getting suggestions, delegating complex tasks, all integrated into a unified workflow.

OpenAI plans to further enhance interactivity and flexibility:

Support providing guidance during tasks

Collaborate with AI to implement strategies

Receive proactive progress updates

Deep integration with commonly used tools (such as GitHub, CLI, issue trackers, CI systems) for convenient task assignment


Software engineering is becoming one of the first industries to significantly improve efficiency due to AI, which will fully unleash the huge potential of individuals and small teams.

At the same time, OpenAI is also working with partners to study how the widespread application of agents will affect development processes, skill development, and global talent distribution.

References:

https://www.youtube.com/watch?v=hhdpnbfH6NU

https://openai.com/index/introducing-codex/
