RMoA: Residual Extraction Mixture-of-Agents, Enabling Agents to Discover New Information and Adaptively Stop [ACL 2025]

The RMoA framework, proposed by a joint research team from East China Normal University, Meituan, Donghua University, and Tsinghua University, aims to extract as much value as possible from model responses while minimizing computational cost. The paper has been accepted to ACL 2025.

Paper address: https://arxiv.org/abs/2505.24442

Open-source code: https://github.com/mindhunter01/RMoA

Foreword: The Beauty and Reality of MoA

If you build Agent products, you have almost certainly heard of, or used, the Mixture-of-Agents (MoA) architecture. The framework lets multiple AI models collaborate on complex problems and, in theory, combines the strengths of many. In practice, though, it can be a love-hate relationship:

- The love: It genuinely improves answer quality.

- The hate: the painful API call costs, and answer quality that gradually drifts off track as the layers stack up.

The RMoA (Residual Mixture-of-Agents) framework, recently proposed by a research team from East China Normal University, Meituan, and other institutions, may change that for good.


Residual Learning Crosses Boundaries: From Image Recognition to Agent Collaboration

What is Residual Learning? A Simple Analogy

Imagine you are working with a friend to revise an important document. The traditional approach is for everyone to rewrite the entire document from scratch, and then you compare which version is better. But there's a problem with this: most of the content is actually repetitive, and what's truly valuable is the small part that each person newly adds or improves.

Residual learning is a smarter idea: instead of having everyone rewrite everything, let everyone focus on finding and improving the differences. This saves effort and ensures that every valuable modification suggestion is not overlooked.

Inspiration from Image Recognition to AI Collaboration

In 2015, a technique called ResNet caused a sensation in image recognition. It addressed a long-standing puzzle in AI: why do deeper neural networks sometimes perform worse? ResNet's answer was simple: don't make each layer relearn everything from scratch; let it focus on learning the residual, the "new improvement" on top of what it already has.

It's like a student doing math problems: instead of starting from basic addition and subtraction every time, it's better to check and improve potentially problematic steps based on a previous student's answer. This is both faster and more accurate.

RMoA's Clever Adaptation

RMoA researchers found that multiple AI models collaborating also face similar problems: each AI tries to provide a complete answer from scratch, leading to a lot of repetitive work and information waste. They had an epiphany: why not teach AIs to "just state the key points"?

Specifically, this means the subsequent AI should not repeat what the previous AI has already said, but instead focus on:

✓ Discovering information previously missed

✓ Correcting possible errors

✓ Supplementing new perspectives

In this way, each AI can contribute unique value instead of simply repeating work. It's like an efficient brainstorming meeting where everyone builds on previous ideas rather than repeating what others have already said.

Figure: Comparison of traditional MoA and RMoA architectures. RMoA introduces a residual mechanism and diversity selection.

Three Core Innovations: Making Agent Collaboration Smarter

🎯 Greedy Diversity Embedding Selection: Not All Answers Are Worth Considering

Traditional MoA feeds all model responses to the next layer, which sounds democratic, like everyone speaking in a meeting, but is actually inefficient.

RMoA introduces an ingenious filtering mechanism:

1. Vectorization: Convert all responses into vector representations.

2. Greedy Strategy Selection: Select the K most diverse responses.

3. Specific Algorithm:

- First, select the one with the lowest average similarity to all responses as the starting point.

- Then, progressively select responses least similar to the already selected set.

Core Value: Ensures diversity of viewpoints while significantly reducing computational load for subsequent processing.
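
To make the selection concrete, here is a minimal sketch of the greedy diversity selection, assuming response embeddings are L2-normalized and compared by cosine similarity; the function name and the use of NumPy are my own choices, not code from the RMoA repository.

```python
import numpy as np

def greedy_diverse_select(embeddings: np.ndarray, k: int = 3) -> list[int]:
    """Pick k mutually dissimilar responses given their embedding vectors.

    embeddings: (n, d) array of L2-normalized response embeddings.
    Returns the indices of the selected responses.
    """
    sim = embeddings @ embeddings.T                     # cosine similarity matrix
    # Start with the response whose average similarity to all responses is lowest.
    selected = [int(np.argmin(sim.mean(axis=1)))]
    while len(selected) < min(k, len(embeddings)):
        remaining = [i for i in range(len(embeddings)) if i not in selected]
        # Greedily add the response least similar to anything already selected.
        novelty = [sim[i, selected].max() for i in remaining]
        selected.append(remaining[int(np.argmin(novelty))])
    return selected
```

A smaller `k` means fewer responses for the next layer to process, which is where most of the computational savings come from.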

🔍 Residual Extraction Agent: Specifically Responsible for Discovering "New Things"

This is RMoA's central innovation. The research team designed a dedicated residual extraction agent:

Core Task:

- Compare responses from the previous round and the current round.

- Identify genuinely new information, corrected errors, and supplementary details.

Output Format:

- Structured report

- Clearly marked "Residuals Detected: Yes/No"

- Specific content of differences

Analogy: Just like a medical consultation, each expert doesn't repeat the entire diagnosis of the previous doctors, but rather highlights new problems and differing opinions they've discovered.
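
As a rough illustration of how such an agent could be driven, the sketch below pairs an extraction prompt with a parser for the structured report; the prompt wording and field names are my paraphrase of the format described above, not the exact prompt from the paper or repository.

```python
RESIDUAL_EXTRACTION_PROMPT = """\
Compare the previous round's answer with the current round's responses.
Report ONLY what is genuinely new: missing information, corrected errors,
or supplementary details. Do not restate content both rounds agree on.

Previous answer:
{previous}

Current responses:
{current}

Reply in exactly this format:
Residuals Detected: Yes/No
Residual Content: <the differences, or "None">
"""

def parse_residual_report(report: str) -> tuple[bool, str]:
    """Split the structured report into (has_residual, residual_text)."""
    has_residual = "residuals detected: yes" in report.lower()
    _, _, content = report.partition("Residual Content:")
    return has_residual, content.strip()
```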

🔧 Residual Aggregation Agent: Organically Integrating "New Things"

With residual information, another agent is needed for integration:

Workflow:

1. Receive responses from the previous round.

2. Receive residual information from the current layer.

3. Integrate them into a more complete and accurate answer.

Design Philosophy: Adheres to the Single Responsibility Principle in software engineering.

- Residual Extraction Agent: Specializes in finding differences.

- Residual Aggregation Agent: Specializes in integrating value.

Advantages: Clear division of labor, better results.
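
A correspondingly minimal aggregation step might look like the sketch below, where `llm` stands for any text-in, text-out completion callable; the prompt is illustrative rather than the paper's.

```python
from typing import Callable

RESIDUAL_AGGREGATION_PROMPT = """\
You are given last round's answer and a residual report describing what is
new, corrected, or supplementary this round. Merge them into a single answer
that is more complete and accurate than the previous one.

Previous answer:
{previous}

Residual report:
{residual}
"""

def aggregate_residual(previous: str, residual: str,
                       llm: Callable[[str], str]) -> str:
    """Fold the extracted residual back into the running answer."""
    return llm(RESIDUAL_AGGREGATION_PROMPT.format(previous=previous,
                                                  residual=residual))
```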

Figure: Complete RMoA architecture, showing the full pipeline of greedy diversity selection, residual extraction, residual aggregation, and adaptive termination.

Adaptive Termination: Letting the System Know When to Stop

Intelligent Marginal Benefit Judgment

RMoA also implements a particularly clever mechanism: adaptive termination. When the system detects no valuable residual information for several consecutive rounds, it stops iterating. This is like a skilled programmer knowing when the code is good enough and stopping before over-optimizing it. The mechanism not only saves computational resources but also avoids the hallucination problems that excessive iteration can introduce.
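
Putting the pieces together, the stopping rule can be sketched as a loop that counts consecutive rounds without residuals; `max_layers`, `patience`, and the callback signatures below are illustrative assumptions, not values or interfaces prescribed by the paper.

```python
def run_rmoa(question: str, answer_round, extract_residual, aggregate_residual,
             max_layers: int = 6, patience: int = 2) -> str:
    """Iterate RMoA layers, stopping early once residuals dry up.

    answer_round(question, current_answer) -> list of proposer responses
    extract_residual(previous_answer, responses) -> (has_residual, residual_text)
    aggregate_residual(previous_answer, residual_text) -> updated answer
    """
    answer = ""
    quiet_rounds = 0
    for _ in range(max_layers):
        responses = answer_round(question, answer)
        has_residual, residual = extract_residual(answer, responses)
        if not has_residual:
            quiet_rounds += 1
            if quiet_rounds >= patience:
                break                      # marginal benefit exhausted: stop
        else:
            quiet_rounds = 0
            answer = aggregate_residual(answer, residual)
    return answer
```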

Exquisite Details of Engineering Implementation

Selection and Optimization of Embedding Models

RMoA's open-source implementation uses BGE-M3, a multi-granularity, multi-functional embedding model, for vectorization. The implementation includes a number of practical optimizations: a batch size of 6, a maximum length of 2048, and GPU acceleration. These seemingly simple parameters reflect extensive experimentation and tuning.
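
For reference, producing the embeddings with the FlagEmbedding package might look roughly like the snippet below; the exact wrapper code in the RMoA repository may differ, so treat this as a sketch.

```python
from FlagEmbedding import BGEM3FlagModel

# Load BGE-M3; fp16 speeds up inference on GPU.
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

responses = ["answer from agent 1", "answer from agent 2", "answer from agent 3"]
# Parameters mirror the ones mentioned above: batch size 6, max length 2048.
dense_vecs = model.encode(responses, batch_size=6, max_length=2048)["dense_vecs"]
# dense_vecs is an (n, d) array that can feed the greedy selection sketch above.
```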

Cognitive Science Applications of Role-Playing

To maximize cognitive diversity among agents, RMoA designed specialized role prompts for different tasks. For example, in a math task, six agents play the roles of a theoretical mathematician, a competition coach, a computational scientist, an educational content creator, a PhD student, and an actuary, respectively. This design is not arbitrary but based on cognitive science research: different professional backgrounds bring different ways of thinking and problem-solving perspectives.
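
As an illustration only (the wording below is my paraphrase, not the repository's prompts), such a role configuration can be expressed as a simple mapping from role name to system prompt:

```python
MATH_TASK_ROLES = {
    "theoretical_mathematician": "Prize rigor; verify every step formally.",
    "competition_coach": "Look for elegant shortcuts and well-known tricks.",
    "computational_scientist": "Cross-check results numerically when possible.",
    "education_content_creator": "Explain each step so a student can follow it.",
    "phd_student": "Question assumptions and probe edge cases.",
    "actuary": "Sanity-check magnitudes and probabilistic reasoning.",
}
```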

Refined Cost Control Management

As a framework oriented towards industrial applications, RMoA places great importance on cost control. The system accurately records token consumption for each layer and each step, supports pricing models for different APIs, and provides detailed cost analysis reports. This meticulous cost management is precisely the function engineers need most in actual projects.
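
Below is a hedged sketch of what per-layer cost accounting could look like; the class, field names, and pricing scheme are placeholders of my own, not the framework's actual bookkeeping code.

```python
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    """Tracks token usage and cost per layer; prices are illustrative."""
    price_per_1k_prompt: float
    price_per_1k_completion: float
    records: list = field(default_factory=list)

    def record(self, layer: int, prompt_tokens: int, completion_tokens: int) -> None:
        cost = (prompt_tokens * self.price_per_1k_prompt
                + completion_tokens * self.price_per_1k_completion) / 1000
        self.records.append({"layer": layer, "prompt_tokens": prompt_tokens,
                             "completion_tokens": completion_tokens, "cost": cost})

    def report(self) -> dict:
        """Aggregate totals for a simple cost analysis report."""
        return {
            "total_tokens": sum(r["prompt_tokens"] + r["completion_tokens"]
                                for r in self.records),
            "total_cost": sum(r["cost"] for r in self.records),
            "per_layer": self.records,
        }
```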

Experimental Validation: Data Speaks

Comprehensive Validation on Four Benchmarks

The research team conducted comprehensive tests on AlpacaEval 2.0, MATH, CRUX, and MMLU-redux benchmarks. Results show that RMoA achieves better performance while significantly reducing computational costs. Especially in mathematical reasoning tasks, the accuracy of the Qwen2.5-7B-Instruct model improved by 2.26%, Gemma2-9B-Instruct by 13.8%, and even the powerful GPT-4o improved by 4.56%.

Significant Improvement in Cost-Effectiveness

Even more impressive is the cost control effect. On the MATH dataset, RMoA improved accuracy by 1.92% compared to traditional MoA, while using only 68.83% of the token cost. This dual advantage of performance improvement and cost reduction is the most valued indicator in industrial applications.

Figure: RMoA's performance on the four benchmarks, with significant improvements across all models.

Practical Validation in Enterprise Strategic Consulting

To validate RMoA's effectiveness in real-world business scenarios, I developed an enterprise strategic consulting system based on the core algorithms of the paper and simulated a digital transformation case for testing. This system integrates RMoA's three core innovations: greedy diversity selection, residual learning mechanism, and adaptive termination function.

The case involved developing a digital transformation strategy for a traditional textile and apparel enterprise (annual revenue 5 billion, 3000 employees). The system was configured with four professional roles: market analyst, financial advisor, operations expert, and technology strategist, using DeepSeek and Qwen models as underlying LLMs.
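
In spirit, the demo's configuration looked something like the snippet below; the role names follow the description above, while the model identifiers and numeric settings are placeholders rather than the exact values used.

```python
CONSULTING_CONFIG = {
    "roles": ["market_analyst", "financial_advisor",
              "operations_expert", "technology_strategist"],
    "proposer_models": ["deepseek-chat", "qwen-plus"],  # placeholder model names
    "k_diverse": 3,      # responses kept by greedy diversity selection
    "max_layers": 6,     # upper bound on collaboration rounds
    "patience": 2,       # residual-free rounds before adaptive termination
}
```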


From the actual operation results, several key characteristics can be observed:

Intelligent Multi-round Collaboration: The system automatically performed 4 rounds of analysis iteration. In each round, new information was identified and strategic recommendations were refined based on the previous round.

Precise Cost Control: A total of 40,804 tokens were consumed, averaging about 10,201 tokens per round, significantly lower than traditional MoA.

High-Quality Business Output: A complete strategy across 5 dimensions was generated, providing executable solutions from priority planning to risk control.

Robust Fault Tolerance: The system completed the task despite network instability and a handful of failed API calls, demonstrating solid engineering robustness.

This practical validation proves that RMoA not only performs excellently in academic benchmarks but also provides high-quality, low-cost intelligent services in real enterprise application scenarios. For complex decision-making scenarios requiring multi-disciplinary collaboration, RMoA demonstrates advantages unmatched by traditional single models and simple MoA.

Figure: Performance of different models across layers. RMoA keeps improving, while traditional MoA degrades as layers are added.

Figure: Cost-benefit comparison. RMoA achieves better performance at lower cost.

RMoA: Which Areas Benefit Most

Financial Risk Control: An Ideal Choice for Multi-Dimensional Risk Assessment

In financial risk control scenarios, RMoA's residual learning mechanism ensures that no important signals are overlooked during the risk assessment process. The diversity selection mechanism can filter out the most valuable risk perspectives from credit, market, operational, and compliance angles, avoiding blind spots caused by groupthink. The adaptive termination mechanism allows for timely cessation when risk assessment reaches a stable state, ensuring analysis quality while controlling costs.

Medical Diagnosis: AI-enabled Multi-disciplinary Consultation

Medical diagnosis is another ideal application scenario. RMoA can simulate the process of multi-disciplinary consultations, allowing AI assistants from different specialties to analyze cases from their respective perspectives, and the residual mechanism ensures that no diagnostic clues are lost during collaboration. This approach improves the comprehensiveness of diagnosis and avoids resource waste caused by repetitive examinations.

Code Review: Multi-Perspective Quality Assurance

In software development, RMoA can achieve more efficient code review. Architects focus on design patterns, security experts on vulnerability risks, performance experts on optimization space, and operations engineers on deployment issues. Residual learning ensures that each expert's unique insights are preserved and integrated, forming a more comprehensive code quality assessment.

💡 Practical Advice for Agent Developers

📈 Incremental Integration Strategy

If you are considering integrating RMoA into an existing Agent system, an incremental strategy is recommended:

Step 1: First try RMoA on non-critical paths.

Step 2: Familiarize yourself with its features and parameter tuning methods.

Step 3: Gradually expand to core business scenarios.

Important Tip: Pay special attention to the sensitivity of different task types to the K value (number of diversity selections); typically, K=3 is a good starting point.

💰 Importance of Cost Monitoring

When deploying RMoA, it is essential to establish a complete cost monitoring mechanism:

- Detailed statistics: Utilize the detailed token statistics provided by the framework.

- Layer-level analysis: Analyze the cost contribution of each layer.

- Optimization potential: Identify potential areas for optimization.

- Trade-off analysis: The cost of residual extraction and aggregation processes needs to be weighed against the resulting quality improvement.

🎭 Professionalization of Role Design

Invest time in designing high-quality role prompts, as this is crucial for RMoA's effectiveness:

- Professional division of labor: base roles on real-world specializations and avoid overlapping responsibilities between roles.

- Professionalism: make sure each role setting is professionally grounded; collaborate with domain experts where possible.

- Accuracy: make sure role descriptions are accurate; verify and refine them over multiple rounds.

Figure: Ablation results validating each RMoA component, with the residual agents contributing the most.

Concluding Remarks

RMoA is not just a new technical choice, but a new way of thinking: enabling AI systems to learn to focus on change, value differences, and stop at the right time. These qualities, which sound very much like human intelligence, might precisely be the correct direction for the development of general artificial intelligence.

The future is already here, and we are bound to walk into it together.



Author: Xiuxiu Mao
