HALO, a Hierarchical Dynamic Prompting Framework Based on MCTS, Enabling Agents to Always Find the Optimal Path | Latest

Introduction: The HALO framework reshapes multi-agent system (MAS) collaboration through three mechanisms:

• A hierarchical reasoning architecture relieves cognitive overload by letting each agent specialize.

• Dynamic role instantiation matches specialized agents to each task.

• An MCTS-based search engine automatically explores for the optimal reasoning path.

HALO turns vague user queries into professional prompts, decomposes complex tasks, and adjusts its execution plan on the fly (paper: https://arxiv.org/pdf/2505.13516). In experiments, HALO achieves an average improvement of 14.4% across code generation, general reasoning, and arithmetic reasoning tasks, and it performs especially well on highly specialized tasks. Like a satellite navigation system, the framework helps AI agent systems find the best route through complex problems.

HALO Framework Overview

Researchers from Nanjing University of Posts and Telecommunications and Chongqing University point out that current agent frameworks are often limited by predefined role designs and static communication structures. These limits make it hard for the frameworks to cope with complex interactive environments and expert-level tasks.

To address this, the researchers propose HALO (Hierarchical Autonomous Logic-Oriented Orchestration), a multi-agent collaboration framework built on a hierarchical reasoning architecture. It uses a three-stage paradigm that lets a multi-agent system organize and coordinate itself without human intervention. HALO dynamically instantiates agent roles and adaptively builds optimal communication workflows, offering a new approach to complex problem-solving.

HALO Framework Overview. HALO consists of three modules: (1) Adaptive Prompt Optimization, which refines user queries into high-quality, understandable prompts; (2) Hierarchical Reasoning Stack, responsible for task decomposition, role instantiation, and subtask execution; (3) Workflow Search Engine, which explores multi-agent collaboration and constructs optimal workflows. The green path represents the optimal reasoning trajectory, and the red path is pruned during the search process.

Reasons Why Agents Struggle with Complex Tasks

Traditional multi-agent systems run into two major difficulties on complex tasks:

1. Lack of Flexibility: They rely on a predefined space of agent role designs

2. Lack of Adaptability: Their communication structures are static and cannot adapt to dynamic task environments

As a result, existing systems perform poorly on highly specialized, expert-level tasks, such as complex mathematical problems or ethical analyses that require deep domain knowledge. A further problem is that most users lack prompt-engineering expertise and cannot guide agent systems effectively. This leads to inefficient task execution, and existing frameworks leave the problem largely unaddressed.

Three Core Components of the HALO Framework

The HALO framework addresses these challenges with three collaborating core components, which give multi-agent systems far greater flexibility and adaptability:

• Adaptive Prompt Optimization Module: Converts raw user queries into high-quality, structured prompts, solving the problem of users' insufficient prompt engineering capabilities

• Hierarchical Reasoning Stack: Composed of high-level planning agents, middle-level role design agents, and low-level reasoning agents, forming a complete task decomposition and execution chain

• Workflow Search Engine: Based on Monte Carlo Tree Search (MCTS) technology, it systematically explores the multi-agent collaboration space to construct the optimal reasoning trajectory

These components work together to enable the entire system to adaptively find the best path to solve problems.

Adaptive Prompt Optimization

The Adaptive Prompt Optimization module is the first line of defense for the HALO framework. It uses four collaborative agents to transform vague user queries into clear, structured prompts:

1. Task Analysis Agent: Analyzes the original query, extracts core intent, task type, and key details, forming a global semantic context

2. Prompt Template Agent: Constructs an initial prompt framework, including task description, reasoning goals, input conditions, and output format

3. Prompt Optimization Agent: Introduces slow-thinking prompt strategies and tool invocation instructions to further refine the prompt structure

4. Prompt Generation Agent: Integrates the optimized structure into the final prompt, paving the way for downstream reasoning

This process ensures that even non-expert users can receive expert-level prompt guidance.

System prompts used in the Adaptive Prompt Optimization module: The optimization process is carried out by four specialized agents: the Task Analysis Agent extracts task semantics from user queries; the Prompt Template Agent constructs structured prompt templates; the Prompt Optimization Agent enhances clarity and usability; the Prompt Generation Agent generates the final prompt.
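The article does not reproduce the prompt text itself. The following is a minimal sketch of the four-stage chain, assuming a hypothetical `llm(system_prompt, message)` callable. The stage instructions paraphrase the list above and are not taken from the paper, and `StageAgent`, `PIPELINE`, and `optimize_prompt` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface: (system_prompt, message) -> reply text.
LLM = Callable[[str, str], str]


@dataclass
class StageAgent:
    """One agent in the prompt-optimization chain (illustrative, not HALO's code)."""
    name: str
    system_prompt: str

    def run(self, llm: LLM, message: str) -> str:
        return llm(self.system_prompt, message)


# Instructions paraphrase the four roles described in the article.
PIPELINE: List[StageAgent] = [
    StageAgent("task_analysis", "Extract the core intent, task type, and key details."),
    StageAgent("prompt_template", "Draft a template: task description, reasoning goal, inputs, output format."),
    StageAgent("prompt_optimization", "Add slow-thinking (step-by-step) and tool-use instructions."),
    StageAgent("prompt_generation", "Merge everything into one final prompt."),
]


def optimize_prompt(llm: LLM, user_query: str) -> str:
    """Run the stages in order; the last stage's output is the final prompt."""
    previous = user_query
    for agent in PIPELINE:
        # Each stage sees the original query plus the previous stage's output,
        # so later stages keep the user's own wording in view.
        message = f"QUERY:\n{user_query}\n\nPREVIOUS STAGE:\n{previous}"
        previous = agent.run(llm, message)
    return previous
```

Passing the original query to every stage is a design choice made here so that no stage loses the user's intent. A simpler variant would forward only the previous stage's output.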

Hierarchical Reasoning Stack

The Hierarchical Reasoning Stack is the core engine of the HALO framework. It completes the entire process from task decomposition to execution through the collaboration of three layers of agents:

• High-level Planning Agent: Receives optimized prompts, decomposes the overall task into a series of subtasks, and iteratively updates the decomposition strategy based on the execution history of preceding subtasks

• Middle-level Role Design Agent: Dynamically instantiates specialized agents for each subtask, ensuring that each generated agent is highly matched to the subtask requirements

• Low-level Reasoning Agent: Responsible for executing specific subtasks and producing intermediate outputs through collaborative mechanisms
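The three layers can be sketched as a loop in which the planner proposes one subtask at a time from the execution history, the role designer creates a specialist for it, and the executor runs it. This is a hedged illustration, not HALO's code. The `Planner`, `RoleDesigner`, and `Executor` callables are hypothetical stand-ins for LLM-backed agents, and `max_steps` is an assumed safety limit.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Role:
    title: str
    instructions: str


@dataclass
class StepRecord:
    subtask: str
    role: Role
    output: str


# Hypothetical interfaces for the three layers; in practice each would wrap LLM calls.
Planner = Callable[[str, List[StepRecord]], Optional[str]]  # next subtask, or None when finished
RoleDesigner = Callable[[str], Role]
Executor = Callable[[Role, str], str]


def run_stack(prompt: str, plan: Planner, design_role: RoleDesigner,
              execute: Executor, max_steps: int = 10) -> List[StepRecord]:
    history: List[StepRecord] = []
    for _ in range(max_steps):
        # High level: choose the next subtask, re-planning from the history so far.
        subtask = plan(prompt, history)
        if subtask is None:
            break
        # Middle level: instantiate a specialist role for this subtask.
        role = design_role(subtask)
        # Low level: execute the subtask with that role and record the result.
        output = execute(role, subtask)
        history.append(StepRecord(subtask, role, output))
    return history
```

Because the planner receives the full history on every call, it can revise later subtasks in light of earlier outputs, which mirrors the iterative re-planning described above.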

The system also includes an early stopping mechanism: reasoning terminates once 66% (about two-thirds) of the completed subtasks produce consistent answers, which substantially improves efficiency.
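The following is a minimal sketch of the early-stopping check. It assumes that each completed subtask yields a short final answer string and that "consistent" means identical after trimming whitespace and lowercasing. `should_stop_early` and `min_completed` are hypothetical names. With `min_completed=1`, a single completed subtask trivially counts as 100% agreement, so in practice one might raise that value.

```python
from collections import Counter
from typing import Sequence


def should_stop_early(answers: Sequence[str], threshold: float = 0.66,
                      min_completed: int = 1) -> bool:
    """True when the most common answer covers at least `threshold`
    of the completed subtasks' answers."""
    if not answers or len(answers) < min_completed:
        return False
    # Light normalization so " 42" and "42" count as the same answer (an assumption).
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized) >= threshold
```

For example, answers `["42", "42", "41"]` give 2/3 ≈ 0.667 agreement, which clears the 0.66 threshold, so reasoning would stop.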

MCTS-based Optimal Path Explorer

The Workflow Search Engine is the most innovative component of the HALO framework. It reformulates subtask execution as a structured workflow search problem. Using Monte Carlo Tree Search (MCTS), the system systematically explores the agent action space and constructs optimal reasoning trajectories. In this search:

• Each node represents an agent-generated response or an intermediate reasoning step

• Edges represent possible transitions between reasoning states

MCTS guides the search through four standard stages:

1. Selection Stage: Uses the UCT (Upper Confidence bounds applied to Trees) formula to select the most promising agent

2. Expansion Stage: Instantiates new role-specific agents

3. Simulation Stage: Simulates a series of agent collaboration steps from the current state, evaluating quality through judging agents and scoring agents

4. Backpropagation Stage: Propagates the simulation results back along the search path, updating the evaluation scores of all relevant nodes

This design enables HALO to find the most effective path among a large number of possible multi-agent collaborations, making it particularly suitable for complex reasoning tasks.

How Monte Carlo Tree Search (MCTS) guides multi-agent reasoning through selection, expansion, simulation, and backpropagation stages. Each node represents an Agent, and edge transitions are guided by execution results and evaluation feedback.
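To make the four stages concrete, here is a minimal, self-contained MCTS sketch with UCT selection. It is a toy stand-in, not HALO's implementation:

• A state is the sequence of roles chosen so far.

• An edge appends one more role.

• A hypothetical `evaluate` function plays the part of the judging and scoring agents.

Selection uses the standard UCT score: mean reward plus c·sqrt(ln N / n), where N is the parent's visit count and n is the child's. The role names, the exploration constant `c`, and the iteration count are illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class Node:
    path: Tuple[str, ...]                  # roles chosen so far (the reasoning state)
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)  # edge label -> next state
    visits: int = 0
    total_reward: float = 0.0


def uct(child: Node, parent_visits: int, c: float) -> float:
    """Mean reward (exploitation) plus an exploration bonus for rarely tried children."""
    if child.visits == 0:
        return math.inf
    return (child.total_reward / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))


def mcts(roles: Sequence[str], depth: int,
         evaluate: Callable[[Tuple[str, ...]], float],
         iterations: int = 2000, c: float = 1.0, seed: int = 0) -> Tuple[str, ...]:
    rng = random.Random(seed)
    root = Node(path=())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT while the node is fully expanded and not terminal.
        while len(node.path) < depth and len(node.children) == len(roles):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct(ch, parent.visits, c))
        # 2. Expansion: add one untried role as a new child node.
        if len(node.path) < depth:
            untried = [r for r in roles if r not in node.children]
            role = rng.choice(untried)
            child = Node(path=node.path + (role,), parent=node)
            node.children[role] = child
            node = child
        # 3. Simulation: finish the workflow with random roles, then score it.
        rollout = list(node.path)
        while len(rollout) < depth:
            rollout.append(rng.choice(roles))
        reward = evaluate(tuple(rollout))
        # 4. Backpropagation: update visit counts and rewards up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Read out the most-visited path as the chosen workflow.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.path


if __name__ == "__main__":
    target = ("market analyst", "financial analyst", "strategy reviewer")

    def judge(workflow: Tuple[str, ...]) -> float:
        # Toy stand-in for judging/scoring agents: fraction of steps matching `target`.
        return sum(a == b for a, b in zip(workflow, target)) / len(target)

    roles = ["market analyst", "financial analyst", "strategy reviewer", "competitor analyst"]
    print(mcts(roles, depth=3, evaluate=judge))
```

With this toy judge, the search should concentrate its visits on the target sequence. In HALO, `evaluate` would instead call judging and scoring agents on real intermediate outputs, so each simulation is far more expensive.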

How Does MCTS Guide Multi-Agent Collaboration?

Monte Carlo Tree Search plays a core guiding role in the HALO framework, transforming complex multi-agent collaboration problems into structured search processes:

| MCTS Stage | Implementation in HALO |
|---|---|
| Selection Stage | Recursively selects the best agent using the UCT formula, balancing exploration and exploitation |
| Expansion Stage | Adds untried actions for selected agents, increasing the breadth of the search tree |
| Simulation Stage | Simulates a series of agent collaboration steps from the current state, evaluating quality through judging agents and scoring agents |
| Backpropagation Stage | Propagates the simulation results back along the search path, updating the evaluation scores of all relevant nodes |

HALO also adjusts the reward signal according to the judging agents' verdicts. Successful paths are reinforced and failed paths are penalized, which steers the search toward the best solutions it can find.
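The article does not give the exact reward-adjustment rule, so the following sketch assumes a simple one. The scoring agent's score gets a bonus when the judging agent accepts the result and a penalty when it rejects it, and the adjusted reward is then backed up along the path. `PathNode`, `adjusted_reward`, `bonus`, and `penalty` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathNode:
    name: str
    parent: Optional["PathNode"] = None
    visits: int = 0
    value: float = 0.0


def adjusted_reward(score: float, judged_correct: bool,
                    bonus: float = 0.5, penalty: float = 0.5) -> float:
    """Hypothetical shaping: reinforce paths the judge accepts, penalize rejected ones."""
    return score + bonus if judged_correct else score - penalty


def backpropagate(leaf: PathNode, reward: float) -> None:
    """Add the reward to every node from the leaf up to the root."""
    node: Optional[PathNode] = leaf
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

For example, a path the judge accepts with score 0.8 contributes 1.3 to each node on it, while a rejected path with the same score contributes only 0.3. Later UCT selections therefore favor the accepted branch.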

HALO Empowers Business Strategy Formulation

To show how HALO can work in a practical business scenario, I wrote a business-strategy example for the catering industry based on HALO. The example implements all three core components of the framework and turns a simple user query into a comprehensive, professional business strategy report.

Consider a question such as "I am the CEO of a medium-sized chain restaurant, mainly operating Chinese fast food. How can I increase turnover and profit margins?" The system automatically performs task decomposition, expert role matching, and optimal workflow construction. It produces analyses and suggestions far more in-depth than an ordinary single-prompt interaction.

In the implementation, we used:

• PromptAgent class to construct the adaptive prompt optimization module

• TaskDecompositionAgent class to implement the high-level planning agent

• RoleGenerationAgent class to perform middle-level role design

• MCTSWorkflowSearch class to implement MCTS-based workflow search

When the system runs, it first extracts three things from the user's query: the core problem type ("Competitive Strategy and Profit Improvement"), the goal ("increase turnover and profit margins"), and key details (catering-industry background, competitive environment, etc.). The high-level planning agent then decomposes the problem into subtasks such as "analyze customer group data", "analyze competitor strategies", and "evaluate menu profitability structure".

For each subtask, the role design agent selects the most suitable combination of expert roles, such as market analysts, competitor analysts, and financial analysts. Over multiple iterations, the MCTS workflow search engine explores execution paths built from different expert combinations and evaluates each path by its execution results. It finally settles on the optimal reasoning trajectory and generates high-quality strategic recommendations from it.

For more on MCTS, see the earlier article "MultiOn and Stanford Latest Release: Agent Q Uses POMDP and MCTS to Increase Real Booking Rate to 95.4%".
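The author's code for this example is not reproduced in the article. As a hedged illustration of how the four classes listed above could fit together, here is an interface-only skeleton. The class names come from the article, but every method name and signature is a hypothetical assumption, and in a real system each component would be backed by LLM calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

# Class names follow the article; method names and signatures are hypothetical.


class PromptAgent(Protocol):
    def optimize(self, query: str) -> str: ...


class TaskDecompositionAgent(Protocol):
    def decompose(self, prompt: str) -> List[str]: ...


class RoleGenerationAgent(Protocol):
    def roles_for(self, subtask: str) -> List[str]: ...


class MCTSWorkflowSearch(Protocol):
    def search(self, subtask: str, roles: List[str]) -> str: ...


@dataclass
class StrategyReport:
    prompt: str
    sections: Dict[str, str]  # subtask -> result of the best workflow found


def run_strategy_session(query: str, prompter: PromptAgent,
                         planner: TaskDecompositionAgent,
                         role_agent: RoleGenerationAgent,
                         searcher: MCTSWorkflowSearch) -> StrategyReport:
    prompt = prompter.optimize(query)             # adaptive prompt optimization
    sections: Dict[str, str] = {}
    for subtask in planner.decompose(prompt):     # high-level planning
        roles = role_agent.roles_for(subtask)     # middle-level role design
        sections[subtask] = searcher.search(subtask, roles)  # MCTS workflow search
    return StrategyReport(prompt=prompt, sections=sections)
```

Using `typing.Protocol` keeps the orchestration independent of any particular model client: any object with matching methods can be plugged in, including simple fakes for testing.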

Breakthrough Advantages of the HALO Framework

The HALO framework offers significant advantages over existing methods, with experimental results demonstrating its superior performance:

1. Overcoming Cognitive Overload: The hierarchical reasoning architecture distributes responsibilities such as planning, reasoning, and reflection to specialized agent layers, allowing each agent to focus on specific tasks

2. Finer-grained Task Execution: Adaptive agent instantiation and search-based workflow exploration let the system adapt to task requirements in real time

3. Excelling at Expert-level Tasks: HALO performs exceptionally well in handling highly complex and expert-level reasoning tasks, especially in areas requiring deep domain knowledge

These advantages make HALO a powerful tool for solving complex problems.

Excellent Performance in Three Benchmarks

The paper's authors evaluated HALO on three benchmarks, with strong results:

| Benchmark | HALO Score | Improvement | Highlight |
|---|---|---|---|
| Code Generation (HumanEval) | 95.2% (pass@1) | +12.8% | Significant improvement in the ability to generate correct code in a single attempt |
| General Reasoning (MMLU) | 81.6% (accuracy) | +8.8% | Improved by 13.3% on ethical scenarios |
| Arithmetic Reasoning (MATH) | 58.9% (accuracy) | +22.0% | Improved by 19.6% in the algebra subfield |

On average, HALO improved performance by 14.4% over existing methods, which points to its strength on highly specialized, expert-level tasks.

Performance comparison of three computationally intensive subfields on the MATH dataset. Metrics reported as average accuracy (%) over three runs.

Contribution Analysis of HALO Components

The authors tested the importance of HALO's components with ablation experiments. The results show that each ablated component contributes significantly to overall performance:

• Removal of Adaptive Prompt Optimization Module: System performance decreased by an average of 5.3%, with the MMLU test being most affected (from 81.6% to 75.4%)

• Removal of High-level Planning Agent: Performance decreased by an average of 11.3%, HumanEval dropped from 95.2% to 83.8%, and MATH from 58.9% to 44.7%

These results indicate that both ablated components matter and that HALO's components work together to improve the system's overall performance.

Impact of removing the Adaptive Prompt Optimization Module and High-level Planning Agent on GPT-4o's performance across three benchmarks.
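For readers checking the ablation numbers, the point drops implied by the scores quoted above can be computed directly. This assumes the scores are absolute accuracies or pass rates in percent. The article does not say whether its averages (5.3% and 11.3%) are percentage points or relative changes, so the sketch reports only per-benchmark point drops.

```python
# Scores (%) quoted in the article for full HALO and two ablated variants (GPT-4o backbone).
FULL = {"HumanEval": 95.2, "MMLU": 81.6, "MATH": 58.9}
NO_PROMPT_OPTIMIZATION = {"MMLU": 75.4}
NO_HIGH_LEVEL_PLANNER = {"HumanEval": 83.8, "MATH": 44.7}


def point_drops(full: dict, ablated: dict) -> dict:
    """Percentage-point drop (full minus ablated) for each benchmark in `ablated`."""
    return {name: round(full[name] - score, 1) for name, score in ablated.items()}


print(point_drops(FULL, NO_PROMPT_OPTIMIZATION))  # {'MMLU': 6.2}
print(point_drops(FULL, NO_HIGH_LEVEL_PLANNER))   # {'HumanEval': 11.4, 'MATH': 14.2}
```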

Solving Real-world Application Problems

The HALO framework can solve or significantly improve various real-world application problems, offering new ideas for AI Agent product development:

| Application Area | HALO Advantages | Typical Application Scenarios |
|---|---|---|
| Expert-level Complex Reasoning Tasks | Hierarchical decomposition of complex problems handled by different specialized agents | Advanced scientific research problems, complex legal case analysis, medical diagnosis |
| User Prompt Engineering Problems | Automatically converts raw queries into high-quality structured prompts | Education assistance systems, customer service robots, public information query systems |
| Dynamic Adaptation to Changing Environments | Dynamically instantiates specialized agents and adjusts collaboration strategies in real time | Requirements changes in software development, dynamic resource scheduling, real-time decision support systems |

Practical Recommendations for HALO Implementation

For developers and product managers interested in implementing the HALO framework, here are some practical recommendations to consider:

1. Start with Pain Points: Identify scenarios requiring high specialized knowledge, complex reasoning, or dynamic adaptability, as these are areas where HALO can bring significant improvements

2. Focus on Role Design: Although HALO can dynamically instantiate agents, initial role design remains crucial, requiring consideration of domain characteristics and task requirements

3. Reasonable Resource Allocation: Budget compute for the workflow search engine in particular, since MCTS has to explore many possible collaboration paths

4. Establish Evaluation Mechanisms: Continuously optimize system configurations by monitoring HALO's performance with specific metrics

These recommendations will help you fully leverage the potential of the HALO framework to provide users with excellent AI Agent products.

Concluding Remarks

The HALO framework represents a significant milestone in multi-agent collaboration systems. It addresses the core limitations of existing systems through a hierarchical reasoning architecture, adaptive prompt optimization, and MCTS-based workflow search. Experimental results demonstrate HALO's strong performance on code generation, general reasoning, and arithmetic reasoning, with especially clear advantages on highly specialized, expert-level tasks. For AI agent product developers, HALO offers a framework for building smarter, more flexible, and more efficient multi-agent systems.

The future is here: send "group" to the official account's backend.

May we walk together

Please contact me for reprinting

🎉Let's create more beauty together!🎉

👉WeChat ID: xiumaoprompt

Please state your purpose when adding!
