Google Research Finds: Prompt Design is the Core of Multi-Agent Systems!

In multi-agent systems (MAS), designing effective prompts and topologies poses challenges, as individual agents can be sensitive to prompts, and manually designing topologies requires extensive experimentation.


To automate the entire design process, Google & Cambridge University first conducted an in-depth analysis of the design space to understand the factors contributing to effective MAS. They found that prompt design significantly impacts downstream performance, while effective topologies occupy only a small fraction of the entire search space.

On mathematical problems with Gemini 1.5 Pro, accuracy is plotted against total tokens for prompt-optimized agents versus agents scaled only with self-consistency (SC), self-refine (reflect), or multi-agent debate; error bars indicate one standard deviation. The comparison shows that spending additional compute on more effective prompts yields higher accuracy than spending it on these scaling strategies alone.


Performance of different topologies using Gemini 1.5 Pro compared to baseline agents, where each topology was optimized via APO (automatic prompt optimization), and "Sum." (Summarizer) and "Exe." (Executor) are task-specific topologies as shown in Figure 4. Notably, not all topologies have a positive impact on multi-agent system (MAS) design.


Based on these findings, Google & Cambridge University proposed the Mass framework, which optimizes MAS through three stages:

Block-level (local) prompt optimization: Optimizing prompts for agents within each topological block.

Workflow topology optimization: Optimizing workflow topologies in the pruned topological space.

Workflow-level (global) prompt optimization: Performing global prompt optimization on the best-found topology.
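The three stages above can be sketched as a simple search loop. This is a minimal illustration, not the paper's implementation: `propose_prompt` is a toy stand-in for an LLM-based prompt rewriter, and `eval_fn(prompts, topology)` stands in for validation accuracy of the assembled MAS.

```python
import random

def propose_prompt(prompt: str) -> str:
    """Toy stand-in for an LLM prompt mutator (hypothetical)."""
    return prompt + f" [hint-{random.randint(0, 99)}]"

def mass_search(blocks, topologies, eval_fn, budget=5):
    # Stage 1: block-level (local) prompt optimization, one block at a time.
    prompts = {}
    for name, seed in blocks.items():
        candidates = [seed] + [propose_prompt(seed) for _ in range(budget)]
        prompts[name] = max(candidates, key=lambda p: eval_fn({name: p}, None))

    # Stage 2: topology search in the pruned space, stage-1 prompts held fixed.
    topology = max(topologies, key=lambda t: eval_fn(prompts, t))

    # Stage 3: workflow-level (global) prompt refinement on the best topology.
    best_score = eval_fn(prompts, topology)
    for _ in range(budget):
        candidate = {k: propose_prompt(v) for k, v in prompts.items()}
        score = eval_fn(candidate, topology)
        if score > best_score:
            prompts, best_score = candidate, score
    return prompts, topology
```

The key design choice mirrored here is decomposition: prompts are first tuned locally so the topology search in stage 2 compares well-conditioned blocks rather than unoptimized ones.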


The proposed Multi-Agent System Search (Mass) framework discovers effective multi-agent system designs by interleaving prompt optimization and topology optimization in a customizable multi-agent design space (optimized topology and prompts shown on the right), with key components illustrated on the left.


Experiments used Gemini 1.5 Pro and Flash models and compared with various existing methods, including Chain-of-Thought (CoT), Self-Consistency (SC), Self-Refine, Multi-Agent Debate, ADAS, and AFlow.
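For reference, the self-consistency (SC) baseline among these is simple to state: sample several answers and take the majority vote. A hedged sketch, where `ask_model` is a hypothetical stand-in for a model call:

```python
from collections import Counter

def self_consistency(ask_model, question, n_samples=5):
    """Sample n answers and return the most common one (majority vote)."""
    answers = [ask_model(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```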


Performance Improvement: Mass significantly outperforms existing methods on multiple tasks, with an average performance increase of over 10%.

Importance of Optimization Stages: Through staged optimization, Mass achieved performance improvements at each stage, demonstrating the necessity of local-to-global optimization.

Co-optimization of Prompts and Topologies: Mass achieved better performance by simultaneously optimizing prompts and topologies than by optimizing them separately.

Cost-Effectiveness: Mass demonstrated stable and effective performance improvements during optimization, offering higher sample efficiency and cost-effectiveness compared to existing automatic design methods.


Main Tag: Artificial Intelligence Research

Sub Tags: Multi-Agent Systems, Machine Learning, Topology Optimization, Prompt Engineering

