312 Trajectories Boost Performance by 241%! SJTU and SII's Open-Source Computer Agent Surpasses Claude 3.7 Sonnet


After Anthropic fired the first shot for computer use agents by launching Claude Computer Use, OpenAI followed with Operator, which used reinforcement learning (RL) to push the capabilities of computer agents to new heights and drew widespread global attention.

It is widely believed that a breakthrough in computer agent performance requires massive amounts of trajectory data or complex reinforcement learning. In practice, this could mean extensive manual trajectory labeling and large-scale virtual machine environments in which the agent learns and optimizes.

However, the latest research from Shanghai Jiao Tong University and SII offers a contrarian answer: starting from only 312 human-annotated trajectories and using Claude 3.7 Sonnet to synthesize richer action decisions, the team boosted the model's performance by 241%. The resulting agent even surpasses Claude 3.7 Sonnet in extended thinking mode, setting a new state of the art (SOTA) for open-source computer agents on Windows.
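
The core idea of enriching a small set of human trajectories, keeping the human action at each step and adding alternative actions that a stronger model proposes for the same screen state, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Step` and `TrainingSample` types, `boost_trajectory`, and `fake_proposer` (a placeholder for a real call to Claude 3.7 Sonnet) are hypothetical names, and actions are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One decision point in a human trajectory."""
    screenshot: str  # hypothetical: a path or ID standing in for the screenshot
    action: str      # the action the human took on that screen


@dataclass
class TrainingSample:
    """A (state, action) pair the agent model is trained on."""
    task: str
    screenshot: str
    action: str
    source: str      # "human" or "synthesized"


def boost_trajectory(
    task: str,
    steps: List[Step],
    propose_alternatives: Callable[[str, str, str], List[str]],
) -> List[TrainingSample]:
    """Keep every human action and add synthesized alternative actions for the
    same screen state, so each step contributes several valid decisions."""
    samples: List[TrainingSample] = []
    for step in steps:
        samples.append(TrainingSample(task, step.screenshot, step.action, "human"))
        for alt in propose_alternatives(task, step.screenshot, step.action):
            if alt != step.action:  # skip proposals that duplicate the human action
                samples.append(TrainingSample(task, step.screenshot, alt, "synthesized"))
    return samples


def fake_proposer(task: str, screenshot: str, human_action: str) -> List[str]:
    """Hypothetical stand-in for asking a strong model for alternative actions."""
    return [human_action, f"alternative to ({human_action})"]


if __name__ == "__main__":
    demo_steps = [Step("s0.png", "click(120, 45)"), Step("s1.png", "type('report')")]
    for sample in boost_trajectory("Rename the file", demo_steps, fake_proposer):
        print(sample.source, sample.action)
```

Keeping the human action as the anchor means the synthesized alternatives only add diversity at states a real person actually reached; no extra environment rollouts are needed in this sketch.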


Paper Title: Efficient Agent Training for Computer Use

Paper URL: https://arxiv.org/abs/2505.13909

Code URL: https://github.com/GAIR-NLP/PC-Agent-E

Model URL: https://huggingface.co/henryhe0123/PC-Agent-E

Data URL: https://huggingface.co/datasets/henryhe0123/PC-Agent-E

This finding sends an important signal: current large models already possess the basic ability to complete tasks by operating a computer. Their main performance bottleneck lies in eliciting long-horizon planning capabilities, which can be significantly strengthened with a very small number of high-quality trajectories.

PC Agent-E: How Can a Powerful Computer Agent Be Trained with Minimal Trajectories?

Where Does the Data Come From? Humans Provide Raw Operation Trajectories

Unlike previous methods that relied on large-scale manual labeling or complex automated synthesis, the team's approach requires only 312 real human operation trajectories. These were collected with the team's PC Tracker tool: just two authors spent a single day operating their computers to gather all of the raw trajectory data. Each trajectory includes a task description, screenshots, and keyboard/mouse actions, and because every trajectory records real human operation, the data is correct by construction.
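
To make the contents of a raw trajectory concrete, here is a minimal sketch of how one record might be represented. The schema is an assumption for illustration only and is not PC Tracker's actual output format: the `Action` and `Trajectory` classes, the action kinds, and the field names are all hypothetical. The one property the sketch enforces is that each keyboard/mouse action is paired with the screenshot of the state it acted on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Action:
    kind: str                                   # hypothetical kinds: "click", "type", "hotkey"
    position: Optional[Tuple[int, int]] = None  # screen coordinates for mouse actions
    text: Optional[str] = None                  # typed text or key combination


@dataclass
class Trajectory:
    task: str                                              # natural-language task description
    screenshots: List[str] = field(default_factory=list)   # one screenshot per action
    actions: List[Action] = field(default_factory=list)    # keyboard/mouse actions in order

    def add_step(self, screenshot: str, action: Action) -> None:
        """Record the screen state and the action the human took on it."""
        self.screenshots.append(screenshot)
        self.actions.append(action)

    def is_aligned(self) -> bool:
        """Each action must have exactly one screenshot of the state it acted on."""
        return len(self.screenshots) == len(self.actions)

    def to_json(self) -> str:
        """Serialize the whole record; tuples become JSON arrays, None becomes null."""
        return json.dumps(asdict(self))


if __name__ == "__main__":
    traj = Trajectory("Open Notepad and type hello")
    traj.add_step("step0.png", Action("hotkey", text="win+r"))
    traj.add_step("step1.png", Action("type", text="notepad"))
    traj.add_step("step2.png", Action("click", position=(640, 360)))
    print(traj.is_aligned(), traj.to_json())
```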


Distribution of 312 trajectories across different software applications

Thought Completion: Giving
