XinZhiYuan Report
Editors: Peach, Yingzhi
【XinZhiYuan Guide】Recently, a key figure in GPT-4.1 revealed the progress of GPT-5, stating the challenge lies in balancing reasoning and chat capabilities. Meanwhile, OpenAI's Chief Research Officer discussed the key elements on the path to AGI in a new interview.
How far along is GPT-5?
Recently, Michelle Pokrass, a core researcher for GPT-4.1, revealed that the challenge in building GPT-5 is finding the right balance between reasoning and chatting.
She stated, "o3 thinks seriously but isn't suitable for casual chatting. GPT-4.1 improved coding capabilities by sacrificing some quality in small talk."
"Now, the goal is to train a model that knows when to think seriously and when to chat."
In a recent 50-minute conversation, Michelle shared, for the first time, more about the development process behind GPT-4.1 and the critical role RFT plays in the product.
Facing the ultimate goal of AGI, OpenAI's Chief Research Officer said, "AGI is more than just ChatGPT; it includes many things."
Currently, OpenAI faces not only technical challenges but also needs to find a balance in trust and ethics.
Behind the Development of GPT-4.1
Michelle Pokrass said the goal for GPT-4.1 was to make it enjoyable for developers to use.
Models are sometimes tuned to optimize benchmarks, and the results look good on paper, but issues arise in practice, such as the model not following instructions, producing odd formatting, or handling too little context.
The team put a lot of effort into talking with users, collecting their feedback, and converting it into signals that could genuinely be used in the research process.
Researchers look for recurring themes in that feedback, such as instruction-following ability.
OpenAI also uses these models internally, so they can feel where the models are not performing well.
Combining these factors, the team can determine which evaluation metrics are truly critical for customers.
OpenAI also has a deal with an email product: the product gets free inference for processing emails, and in exchange the company can use the data.
Michelle really likes seeing the various cool user interfaces people build.
The team quietly added an improvement in the final stage of model development, which was a significant boost to UI and coding capabilities.
She also saw that people liked using Nano, which is small, cheap, and fast.
The hypothesis behind Nano was whether cheap, fast models could greatly accelerate AI adoption. The answer is yes: people have demand at every point on the cost-latency curve.
In terms of improving model performance, GPT-4.1 focuses on long context and instruction following.
Long context processing capability is an important metric for evaluating model performance in complex tasks, but generating effective long context evaluation content is quite challenging.
OpenAI is committed to obtaining more real-world long context evaluation data to improve the model's performance in practical applications.
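To make the evaluation challenge concrete, below is a minimal sketch of one widely used public approach, a "needle in a haystack" probe: bury a known fact at varying depths in filler text and check whether the model can retrieve it. This illustrates the general technique, not OpenAI's internal methodology; the filler text, needle, and model choice are placeholders.

```python
# Minimal "needle in a haystack" long-context probe (illustrative only).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NEEDLE = "The secret launch code is 7-4-1."
QUESTION = "What is the secret launch code?"

def build_haystack(total_chars: int, needle_pos: float) -> str:
    """Bury the needle at a relative position inside filler text."""
    filler = "The quick brown fox jumps over the lazy dog. "
    text = filler * (total_chars // len(filler))
    cut = int(len(text) * needle_pos)
    return text[:cut] + "\n" + NEEDLE + "\n" + text[cut:]

def probe(total_chars: int, needle_pos: float) -> bool:
    """Ask the model to retrieve the buried fact; True if it succeeds."""
    context = build_haystack(total_chars, needle_pos)
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",  # placeholder model choice
        messages=[{"role": "user", "content": context + "\n\n" + QUESTION}],
    )
    return "7-4-1" in resp.choices[0].message.content

# Sweep context length and needle depth to map retrieval accuracy.
for chars in (10_000, 100_000):
    for pos in (0.1, 0.5, 0.9):
        print(chars, pos, probe(chars, pos))
```

Synthetic probes like this are easy to generate, but as the team notes, they are a weak substitute for evaluations drawn from real long-context workloads.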
In model applications, handling ambiguity is a major challenge.
Whether to ask the user for more information or to perform hypothetical reasoning based on existing information is a choice developers need to be able to adjust flexibly in the model's strategy.
GPT-4.1 made improvements in this regard, enhancing controllability and reducing issues caused by ambiguity.
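As a hypothetical illustration of that controllability, the sketch below steers the same ambiguous request two ways through the system prompt: ask a clarifying question first, or assume and state the assumption. The prompt wording is illustrative, not OpenAI's recommended phrasing.

```python
# Illustrative sketch: steering ambiguity handling via the system prompt.
from openai import OpenAI

client = OpenAI()

ASK_FIRST = (
    "If the user's request is ambiguous or missing key details, "
    "ask one concise clarifying question before answering."
)
ASSUME_AND_STATE = (
    "If the user's request is ambiguous, do not ask questions. "
    "Make the most reasonable assumption, state it explicitly, then answer."
)

def answer(policy: str, user_msg: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": policy},
            {"role": "user", "content": user_msg},
        ],
    )
    return resp.choices[0].message.content

# The same ambiguous request under the two policies.
print(answer(ASK_FIRST, "Write a regex to validate a phone number."))
print(answer(ASSUME_AND_STATE, "Write a regex to validate a phone number."))
```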
When API errors occur, the model might stall, hurting the user experience.
OpenAI improved its training algorithms and data processing methods to ensure the model keeps running stably when it encounters errors and anomalies.
GPT-4.1 significantly improved code-writing capability, performing excellently on local code modification tasks, but it still needs optimization for global context and complex code reasoning.
For example, when handling tasks involving complex technical details passed between files, the model's understanding and processing capabilities need to be strengthened.
In terms of front-end coding, the team not only requires functional correctness but also pays attention to aesthetics and standards, meeting the professional taste of engineers.
RFT New Breakthroughs
Fine-tuning technology plays an important role in GPT-4.1, and the emergence of RFT (Reinforcement Fine-Tuning) brings new possibilities for extending model capabilities.
Compared to traditional SFT, RFT shows strong advantages in specific domains.
In areas like chip design, biology, and drug R&D, the RFT fine-tuning process is highly data-efficient, achieving good results with only a few hundred samples.
In drug R&D, RFT can utilize unique and verifiable data to allow the model to simulate drug action mechanisms more accurately, accelerating the R&D process.
In the field of chip design, RFT helps the model better understand and process complex design rules, optimizing design solutions.
A common characteristic of these areas is that although continuous exploration is needed, experimental results are easily verifiable, which highly aligns with RFT's advantages.
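To see why easy verification matters, here is a toy, REINFORCE-style sketch of the idea behind RFT: sample outputs, score each with a programmatic grader, and shift probability mass toward verified answers. It is a conceptual illustration, not OpenAI's RFT API or training code; the five-way answer space and the grader are stand-ins for, say, a passing design-rule check or a confirmed assay result.

```python
# Toy sketch of the reinforcement fine-tuning idea (conceptual only).
import numpy as np

rng = np.random.default_rng(0)

logits = np.zeros(5)  # stand-in "model": a policy over 5 candidate answers
CORRECT = 3           # the answer a verifiable check would accept

def grader(answer: int) -> float:
    """Verifiable reward: 1.0 if the answer checks out, else 0.0."""
    return 1.0 if answer == CORRECT else 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

LR = 0.5
for step in range(200):
    probs = softmax(logits)
    a = rng.choice(5, p=probs)  # sample an output from the policy
    r = grader(a)               # verify it programmatically
    baseline = probs @ np.array([grader(i) for i in range(5)])
    # REINFORCE update: push mass toward outputs with above-baseline reward
    logits += LR * (np.eye(5)[a] - probs) * (r - baseline)

print(softmax(logits).round(3))  # mass concentrates on the verified answer
```

Because the grader is programmatic rather than human, every sample yields a trustworthy reward signal, which is why a few hundred verified examples can go a long way.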
OpenAI Chief Research Officer: The Path to AGI
In a recent article by the media outlet Tech in Asia, an interview with the person behind OpenAI's models once again showed how the company views the foreseeable path to AGI.
Mark Chen, a research scientist of Chinese descent, plays a pivotal role in internal model R&D.
During his seven years at OpenAI, he has progressively advanced from a research scientist to Chief Research Officer, responsible for model development and overall company research.
He has led several milestone projects: the o1 series of reasoning models, the text-to-image model DALL-E, and GPT-4, which incorporates visual perception.
From Finance to AI, an Unexpected Career Turn
Mark Chen's career was not set on AI from the start.
After earning dual degrees in Mathematics and Computer Science from MIT, his original plan was to pursue a PhD and become a professor.
However, a turning point came.
After the professor he had planned to work with founded a hedge fund, he changed direction and joined the financial industry.
In the world of high-frequency trading, Mark Chen spent six years.
He confessed, "That job was fulfilling in some ways, but very unfulfilling in others. You face the same competitors, everyone is chasing speed, but you don't feel like you're changing the world."
In 2016, Google's AlphaGo defeated Go grandmaster Lee Sedol in a historic match, a performance that surprised even AI experts.
Inspired by this, Mark Chen set out to replicate AlphaGo by implementing a deep Q-network. It was this attempt that left him completely fascinated with AI.
Although he did not have a PhD, he was able to enter the field through OpenAI's residency program.
Finding the Best Balance for AGI
When discussing AGI, Mark Chen stated, "We use a very broad definition; it's not just ChatGPT, it includes other things."
OpenAI has always seen AGI as the holy grail of AI and has developed a five-level framework to achieve this goal.
Now, they have reached the third level, agentic AI: systems capable of autonomously planning and executing complex tasks.
Mark Chen mentioned that OpenAI's recently launched AI agent products, Deep Research and Operator, are still in their early stages.
In the future, Operator could become faster and sustain longer trajectories. These products represent OpenAI's ambition for agentic AI.
He also emphasized that the core of his work is balancing short-term product releases against long-term research and allocating computing resources across OpenAI's entire project portfolio, ultimately ensuring the company strikes the right balance between commercialization and scientific exploration.
Mark Chen is confident in the optimization of OpenAI's internal algorithms.
He said their reasoning models are trained on far less data than pre-trained models, achieving strong performance through more computing resources instead.
Thus, OpenAI is no less efficient than competitors such as Google's Gemini 2.5.
Responding to Open Source
A few days ago, Altman said at a conference that he expects to release an open-source reasoning model this summer.
In the interview, Mark Chen also revealed that the company is planning to release its first open-source language model since GPT-2.
He believes the advantage of open-source models is that developers can optimize them and build on their reasoning capabilities, but they also carry misuse risks because they ship with fewer safety measures.
Facing the strong rise of AI models like DeepSeek, Chen appeared calm.
He stated that the biggest danger in the field of AI is overreacting. OpenAI believes in its roadmap and focuses on long-term goals rather than short-term market noise.
Finally, Mark Chen also offered advice to young people wanting to enter the field of AI: "Become deeply familiar with all the tools, and always stay curious."
The more tools you play with and the more curious you stay, the better you can understand the areas others are trying to push and the right direction for the future. You must stay ahead.
This is a rapidly changing field. Many things you see being explored are glimpses of the future.
References:
https://www.techinasia.com/man-models-openais-research-chief-road-agi
https://www.youtube.com/watch?v=NNGbaiN1L7Y
https://x.com/slow_developer/status/1921248876687999153
https://x.com/jacobeffron/status/1920849638166315104