The Autonomous Agent Approach Is Wrong! Chinese Scholars Propose LLM-HAS: Shifting from "Autonomous Capability" to "Collaborative Intelligence"


Building omnipotent, human-independent, fully autonomous AI agents is a popular research direction in the current large model industry.

The prevailing view is that higher autonomy represents a better system—reducing human intervention itself has inherent value, and complete independence should be the ultimate goal.

However, the team of Chinese scholars Philip S. Yu (Distinguished Professor at the University of Illinois Chicago, ACM Fellow, IEEE Fellow) and Dongyuan Li (Assistant Professor at the University of Tokyo) holds a different view:

The criterion for progress should shift from "autonomous intelligence" to "collaborative intelligence," which means developing LLM-HAS (LLM-based Human-Agent Systems) centered on human-machine collaboration.

Under this paradigm, AI is no longer an isolated "operator" but an active collaborative partner for humans: it augments human capabilities while preserving critical human judgment and oversight.

The related research paper, titled "A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy," has been posted on the preprint server arXiv.


Paper link:

https://arxiv.org/pdf/2506.09420

In their view, the progress of AI should not be measured by how independently a system can operate, but by how effectively it collaborates with humans; the most promising future for AI lies not in systems that replace human roles, but in systems that enhance human capabilities through meaningful cooperation.

They call on industry and academia to shift fundamentally from the current pursuit of fully autonomous agents to LLM-HAS, which is centered on human-machine collaboration.

Why are fully autonomous agents "not working"?

LLM-based autonomous agents are systems capable of operating independently in open, real-world environments, completing tasks through a "perceive-reason-act" loop without human intervention.

Unlike Human-in-the-loop systems, LLM-based autonomous agents can independently parse goals, plan actions, invoke tools, and adapt through language-based reasoning and memory.
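
To make that loop concrete, here is a minimal sketch, assuming hypothetical `llm` and `tools` callables rather than any API from the paper: the agent repeatedly perceives its accumulated context, reasons about the next step, and acts by invoking a tool, with no human in the loop.

```python
# Minimal sketch of an autonomous "perceive-reason-act" loop.
# `llm` and `tools` are hypothetical placeholders, not APIs from the paper.
from typing import Callable

def autonomous_agent(
    goal: str,
    llm: Callable[[str], str],               # any text-completion function
    tools: dict[str, Callable[[str], str]],  # e.g. {"search": ..., "book": ...}
    max_steps: int = 10,
) -> str:
    memory = [f"Goal: {goal}"]                       # language-based memory
    for _ in range(max_steps):
        context = "\n".join(memory)                  # perceive: current task state
        decision = llm(                              # reason: choose the next action
            f"{context}\nReply with 'tool_name: argument' or 'finish: answer'."
        )
        name, _, arg = decision.partition(":")
        name, arg = name.strip(), arg.strip()
        if name == "finish":                         # agent judges the goal complete
            return arg
        result = tools[name](arg)                    # act: invoke the chosen tool
        memory.append(f"{name}({arg}) -> {result}")  # remember the outcome
    return "stopped: step budget exhausted"
```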

For example, in software engineering, GitHub Copilot can autonomously generate, test, and refactor code with little developer intervention, accelerating conventional development workflows. In customer support, systems such as AutoGLM, Manus, and Genspark can complete complex itinerary planning, automated bookings, and service-issue resolution without human intervention, demonstrating strong perception-action loop capabilities in dynamic environments.

However, the current deployment of LLM-based autonomous agents in the real world still faces three challenges:

1. Lack of Reliability, Trust, and Safety

LLMs are prone to generating "hallucinated" content that appears credible but is factually false. The prevalence of hallucinations directly undermines trust in fully autonomous systems: if a system cannot consistently and reliably provide accurate information, it becomes extremely dangerous in high-stakes scenarios (e.g., medical diagnosis, financial decisions, or critical infrastructure control).

2. Insufficient Ability to Handle Complex and Ambiguous Tasks

These agents perform poorly on tasks that require deep reasoning, especially when the objectives themselves are ambiguous. Human instructions are often imprecise, and LLMs lacking common sense may misunderstand the task and take incorrect actions as a result. They are therefore unreliable in complex domains with open-ended, dynamically evolving goals, such as scientific research.

3. Regulatory and Legal Liability Issues

Although these systems possess the capacity to act, they are not legal subjects that can bear formal liability under the existing legal framework. This creates a large accountability and transparency gap: when the system causes harm or makes an incorrect decision, it is difficult to determine who should bear responsibility, the developer, the deployer, or the algorithm itself. As agent capabilities increase, this gap between capability and responsibility will only grow more severe.

LLM-HAS: Centered on Human-Machine Collaboration

Unlike fully autonomous LLM-based agents, LLM-HAS is a collaborative framework where humans and LLM-driven agents work together to accomplish tasks.

LLM-HAS keeps humans involved throughout its operation: they provide critical information and clarification, offer feedback by evaluating outputs and guiding adjustments, and take over control in high-risk or sensitive scenarios. This human participation improves the system's performance, reliability, and safety and keeps accountability clear, especially in areas where human judgment remains indispensable.
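
As a rough illustration of how these roles could be wired together (all names are hypothetical, not from the paper), the sketch below has the agent draft an action, routes high-risk steps directly to the human, and lets human feedback approve, revise, or take over the step.

```python
# Sketch of one LLM-HAS interaction step: the agent drafts an action, a human
# reviews it, and control passes to the human for high-risk or sensitive steps.
# All names (Draft, risk_score, human_review, ...) are illustrative, not from the paper.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Draft:
    action: str
    rationale: str

RISK_THRESHOLD = 0.7  # illustrative cutoff for "high-risk or sensitive"

def collaborative_step(
    task: str,
    agent_propose: Callable[[str], Draft],   # LLM proposes an action plus rationale
    risk_score: Callable[[Draft], float],    # estimates how sensitive the step is
    human_review: Callable[[Draft], str],    # "approve", "revise: ...", or "take_over"
    human_execute: Callable[[str], str],     # human performs the step directly
    agent_execute: Callable[[str], str],     # agent performs the approved step
    max_revisions: int = 3,
) -> str:
    context = task
    for _ in range(max_revisions):
        draft = agent_propose(context)
        if risk_score(draft) >= RISK_THRESHOLD:  # high-risk: hand control to the human
            return human_execute(task)
        verdict = human_review(draft)            # human evaluates the agent's output
        if verdict == "approve":
            return agent_execute(draft.action)   # agent carries out the approved action
        if verdict.startswith("revise:"):        # feedback guides the next attempt
            context = f"{context}\nHuman feedback: {verdict[len('revise:'):].strip()}"
            continue
        break                                    # "take_over" or anything unrecognized
    return human_execute(task)                   # fall back to direct human control
```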

The fundamental motivation for promoting LLM-HAS lies in its potential to address the key limitations and risks faced by autonomous agent systems.

1. Enhanced Trust and Reliability

The interactive nature of LLM-HAS allows humans to provide real-time feedback, correct potential hallucinatory outputs, verify information, and guide agents to produce more accurate and reliable results. This collaborative verification mechanism is crucial for building trust, especially in high-cost-of-error scenarios.

2. Better Handling of Complexity and Ambiguity

Compared to autonomous agents that easily get lost when facing ambiguous instructions, LLM-HAS excels with the aid of continuous human clarification. Humans provide critical context, domain knowledge, and can progressively refine goals—abilities indispensable for handling complex tasks. When objectives are unclear, the system can request clarification instead of proceeding under incorrect assumptions. This is particularly suitable for open-ended research or creative work where goals evolve dynamically.
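
One concrete pattern this suggests is "ask before assuming." The following sketch is purely illustrative, with `llm` and `ask_human` as placeholder callables: when the stated goal looks ambiguous, the agent poses a clarifying question and folds the answer back into the goal before acting.

```python
# Sketch of "ask before assuming": when the goal looks ambiguous, the agent requests
# clarification instead of acting. `llm` and `ask_human` are hypothetical placeholders.
from typing import Callable

def resolve_goal(
    goal: str,
    llm: Callable[[str], str],        # any text-completion function
    ask_human: Callable[[str], str],  # relays a question to the human, returns the answer
    max_rounds: int = 3,
) -> str:
    for _ in range(max_rounds):
        check = llm(
            f"Goal: {goal}\n"
            "If this goal is specific enough to act on, reply 'CLEAR'. "
            "Otherwise reply with one clarifying question."
        )
        if check.strip().upper() == "CLEAR":
            return goal                          # proceed with a well-specified goal
        answer = ask_human(check)                # human supplies the missing context
        goal = f"{goal}\nClarification: {check.strip()} -> {answer}"
    return goal                                  # best effort after several rounds
```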

3. Clearer Accountability

Since humans are continuously involved in the decision-making process, especially at oversight or intervention stages, it becomes easier to establish clear boundaries of responsibility. In this model, a specific human operator or supervisor can typically be designated as the responsible party, making responsibility far easier to explain in legal and regulatory terms than assigning it after a fully autonomous system makes an error.

The research team states that LLM-HAS's iterative communication mechanism helps agent behavior better align with human intent, thereby achieving more flexible, transparent, and efficient collaboration than traditional rule-based or end-to-end systems. This makes it widely applicable to various scenarios highly dependent on human input, contextual reasoning, and real-time interaction, including embodied AI, autonomous driving, software development, dialogue systems, as well as gaming, finance, and healthcare.

In these domains, LLM-HAS redefines human-AI interaction as a language-based collaborative process, shaped by feedback and driven by adaptive reasoning.

Five Major Challenges and Potential Solutions


1. Initial Setup: Still Focused on the Agent Itself

Most current research on LLM-HAS adopts an agent-centric perspective, where humans primarily evaluate the agent's output and provide corrective feedback. This unidirectional interaction dominates the existing paradigm, and there is significant potential to reshape this dynamic relationship.

Enabling agents to proactively monitor human performance, identify inefficiencies, and offer timely suggestions would make fuller use of agent intelligence and reduce human workload. When agents shift into a guiding role, proposing alternative strategies, pointing out potential risks, and reinforcing best practices in real time, both human and agent performance will improve. The research team believes that shifting to a more human-centered or balanced LLM-HAS design is key to achieving true human-agent collaboration.

2. Human Data: Variability in Human Feedback

Human feedback in LLM-HAS varies significantly in role, timing, and expression. Because feedback is subjective and shaped by personality and other factors, the same system may produce completely different results in the hands of different people.

Furthermore, many experiments use LLMs to simulate "pseudo-human" feedback. Such simulated data often fail to truly reflect human behavioral variability, leading to performance distortion and weakening the validity of comparisons.

The acquisition, processing, and use of high-quality human data are fundamental to building well-aligned, collaboratively efficient LLM-HAS. Human-generated data can help agents gain a more nuanced understanding, enhance their collaborative abilities, and ensure their behavior conforms to human preferences and values.

3. Model Engineering: Lack of Adaptability and Continuous Learning Capabilities

In the development of LLM-HAS, building truly "adaptive and continuously learning" AI collaborators remains a core challenge.

Current mainstream methods treat LLMs as static pre-trained tools, which leads to a failure to absorb human insights effectively, a lack of continuous learning and knowledge retention, and the absence of real-time optimization mechanisms.

To fully unleash the potential of LLM-HAS, these bottlenecks must be overcome by integrating human-feedback fusion, lifelong learning mechanisms, and dynamic optimization strategies.

4. Post-Deployment: Unresolved Security Vulnerabilities

Deployed LLM-HAS still faces challenges in security, robustness, and accountability. The industry often focuses more on performance metrics, yet issues like reliability, privacy, and security in human-computer interaction have not been sufficiently studied. Ensuring reliable human-machine collaboration requires continuous monitoring, strict supervision, and the integration of responsible AI practices.

5. Evaluation: Insufficient Evaluation Methods

Current evaluation practices for LLM-HAS have fundamental flaws: they typically overemphasize agent accuracy and static testing while largely ignoring the real burden borne by human collaborators.

Therefore, there is an urgent need for a new evaluation system that comprehensively quantifies the "contributions" and "costs" of humans and agents in collaboration across multiple dimensions: (1) task effectiveness and efficiency, (2) human-computer interaction quality, (3) trust, transparency, and explainability, (4) ethical alignment and safety, and (5) user experience and cognitive load, thereby truly achieving efficient, reliable, and responsible human-agent collaboration.
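
As a hypothetical illustration of what such a multi-dimensional report could look like (field names, value ranges, and the summary layout are illustrative, not the paper's metrics), one might record each collaboration episode along the five dimensions and surface human burden alongside task success:

```python
# Sketch of a multi-dimensional evaluation record following the five dimensions above.
# Field names, ranges, and the summary layout are illustrative, not the paper's metrics.
from dataclasses import dataclass

@dataclass
class CollaborationEval:
    task_success: float         # (1) task effectiveness, 0..1
    time_to_complete_s: float   # (1) efficiency, wall-clock seconds
    interaction_quality: float  # (2) human-agent interaction quality, 0..1
    trust_transparency: float   # (3) trust, transparency, and explainability, 0..1
    ethical_safety: float       # (4) ethical alignment and safety, 0..1
    cognitive_load: float       # (5) user experience / cognitive load, 0 (none) to 1 (overwhelming)

    def summary(self) -> dict[str, float]:
        # Report human burden alongside task success instead of hiding it
        # behind a single accuracy number.
        return {
            "effectiveness": self.task_success,
            "efficiency_s": self.time_to_complete_s,
            "interaction": self.interaction_quality,
            "trust": self.trust_transparency,
            "safety": self.ethical_safety,
            "human_burden": self.cognitive_load,
        }
```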

For more details, please refer to the paper.

Compiled by: Academic Jun

For reprinting or submission, please leave a message directly in the official account

Main Tag: Artificial Intelligence

Sub Tags: Large Language Models, Collaborative Intelligence, AI Agents, Human-AI Collaboration

