We have long treated LLMs as lone warriors, expected to tackle every challenge single-handedly, and this works well for many tasks.
However, once problems involve multi-step dependencies, branch exploration, and mid-process verification, sequential-thinking inference chains start to struggle and even collapse: the longer the chain, the slower and more fragile it becomes. The human-wave tactic of 'parallel thinking' has the model generate multiple independent thinking paths for the same problem and then picks the final answer by majority vote. But with no communication between paths, the whole run is often dragged down by the slowest one, and compute cost climbs steeply with every extra sample. Instead of trading off between 'longer single chains' and 'more parallel samples,' why not change the approach entirely? Can we make the model work like a company, or at least a small organization?
A paper from Microsoft Research provides a concrete method. They propose the concept of 'Agentic Organization' and an executable, text-level action protocol that embeds concurrency directly into the reasoning process. The same model can act as both 'organizer' and 'worker.' The organizer dispatches subtasks (Fork) when needed, letting multiple workers advance independently; at key points it retrieves and merges their intermediate conclusions (Join); and if necessary it keeps dispatching new directions until it produces the final answer (Answer). None of this requires stacking extra models or changing the network architecture; it relies entirely on standardized text tags to decompose, schedule, and synchronize reasoning. Empirical results show that this 'organized thinking' not only improves accuracy on math benchmarks but also cuts critical-path latency, the mandatory serial portion of the computation, by about 28%: better answers from a shorter unavoidably serial part.
If past approaches amount to 'blindly lengthening one chain' or 'everyone goes their own way and votes at the end,' this work charts a third path: teaching the model to plan, divide labor, synchronize, and merge. From here on, an LLM is no longer just an individual that reasons but a system that organizes reasoning.
AsyncThink Revealed: Equipping AI with 'Organizer-Worker' Dual-Core Drive
The core of the AsyncThink paradigm is an exquisitely designed 'Organizer-Worker' protocol. It completely overturns the traditional setup of AI as a single thinking entity, allowing the same language model to dynamically play two distinctly different roles when solving problems:
• Organizer: Like an experienced project manager or team brain, it handles global strategic planning, task decomposition, and process coordination. It doesn't dive into execution details but orchestrates via two key text instructions:
• <FORK-i> (Fork/Dispatch): When the 'organizer' identifies an independently handleable subtask, it immediately uses <FORK-i> to assign the task with a clear description to an idle 'worker.' Here, i is the unique task ID for tracking.
• <JOIN-i> (Join/Verify): When the organizer's main thinking line needs a subtask result as input, it issues <JOIN-i>. It pauses its thinking, waits for and receives the result from worker i, incorporates the new knowledge into its context, and continues.
• Worker: Like a focused, efficient engineer in the team, it receives specific subtasks from the organizer, deeply thinks and executes without distraction, then packages the final conclusion or key info and returns it via <RETURN> to the organizer.
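To make the tag protocol above concrete, here is a minimal sketch in Python of what an organizer trace might look like and how a runtime could pick out the Fork and Join markers. The trace text, the placement of the task description after <FORK-i>, and the line-based parsing are illustrative assumptions, not the paper's actual format.

```python
import re

# A hypothetical organizer trace using the article's <FORK-i> / <JOIN-i> tags.
# Where exactly the task description sits relative to the tag is an assumption.
organizer_trace = """\
The target is 24 using the numbers [3, 4, 6, 8].
<FORK-1> Look for expressions built mainly from multiplication and division.
Meanwhile, I will explore addition/subtraction combinations myself...
<JOIN-1>
Merge worker 1's returned expressions with my own candidates, then answer.
"""

# Minimal, line-based parsing of the markers so a runtime could dispatch and
# synchronize sub-tasks; a sketch, not the paper's actual implementation.
for line in organizer_trace.splitlines():
    if m := re.match(r"<FORK-(\d+)>\s*(.*)", line):
        print(f"dispatch worker {m.group(1)}: {m.group(2)}")
    elif m := re.match(r"<JOIN-(\d+)>", line):
        print(f"block until worker {m.group(1)} sends back its <RETURN> result")
```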
The true power of this protocol lies in its 'asynchronous (Asynchronous)' nature, mirroring the most efficient team management in the real world:
Imagine a project manager (organizer) planning a complex software project. He first Forks 'database design' to Engineer A. After dispatching, he doesn't wait in place but immediately turns to the next module, Forking 'frontend UI development' to Engineer B. Meanwhile, Engineers A and B work in parallel. The manager can continue thinking about the overall architecture or Fork a third task to Engineer C. Only when he needs the final table structure for backend API design does he perform Join to retrieve Engineer A's results.
This asynchronous, parallel collaboration mode is far more efficient and flexible than either 'sequential thinking' (the manager does everything himself) or 'parallel thinking' (three engineers each build the full product from scratch and then vote on the best version). It lets the AI dynamically build a concurrently executable 'thinking structure graph,' balancing breadth of exploration with depth of investigation.
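The asynchronous behavior itself can be sketched with ordinary async primitives. In the toy scheduler below, names like `call_model` and the sleep-based stand-in for a model call are purely illustrative; the point is the key property that a Fork never blocks the organizer, which keeps thinking until a Join actually needs a worker's result.

```python
import asyncio

# Toy scheduler illustrating the asynchronous Fork/Join flow described above.
# `call_model` is a stand-in for a real LLM call; the sleep just simulates work.
async def call_model(role: str, prompt: str) -> str:
    await asyncio.sleep(0.1)
    return f"[{role}] result for: {prompt}"

async def organizer(problem: str) -> str:
    # Fork: dispatch two sub-tasks; neither call blocks the organizer.
    task_a = asyncio.create_task(call_model("worker-1", "database design"))
    task_b = asyncio.create_task(call_model("worker-2", "frontend UI"))

    # The organizer keeps reasoning about the overall plan in the meantime.
    overall_plan = await call_model("organizer", f"architecture for {problem}")

    # Join: block only when the sub-results are actually needed.
    schema = await task_a
    ui = await task_b
    return f"{overall_plan} | merged: {schema} + {ui}"

print(asyncio.run(organizer("the project management app")))
```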
'Learning to Organize': How to Train an Ordinary AI into a Top Manager?
With the advanced 'organizer-worker' architecture, the next core question is: How to train an ordinary AI that only follows instructions into a top manager who seizes opportunities, delegates wisely, and plans efficiently? This is no easy task, as 'organizational ability' is highly abstract wisdom not definable by simple rules.
To this end, the paper designs a clever two-stage 'manager training program.'
Stage 1: Cold-Start Format Fine-Tuning (The Internship)
This stage's goal is to teach the model the 'company rules and jargon,' i.e., the syntax and basic usage of the Fork and Join protocol.
• Challenge: Vast internet data rarely contains complex management-style thinking traces with Fork-Join structures. The model has nothing to learn from.
• Solution: Researchers ingeniously use the stronger GPT-4o as a 'mentor' to synthetically generate high-quality training data. They show GPT-4o a few 'organizer-worker' collaboration examples, then have it generate complete thinking trajectories conforming to the protocol for specific problems.
• Results: After this 'pre-job training,' the model masters the formats for acting as organizer and worker, knowing how to issue and respond to instructions. But it is still a rote intern: it follows the process correctly yet cannot make good organizational decisions for the situation at hand. It 'knows how' but not 'why.'
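A minimal sketch of how such cold-start data might be assembled follows, assuming a placeholder `teacher_generate` in place of a real GPT-4o call and an illustrative few-shot guide; the actual prompts and filtering rules used in the paper are not reproduced here.

```python
# A stronger "mentor" model produces Fork/Join-formatted trajectories, which
# then become (prompt, target) pairs for supervised fine-tuning. The prompt
# wording, tag usage, and filtering rule below are illustrative assumptions.
FEW_SHOT_GUIDE = """You are the organizer. Decompose the problem with <FORK-i>,
collect results with <JOIN-i>, and then give your final answer.
Example: ... (a short, hand-written organizer/worker demonstration) ...
"""

def teacher_generate(prompt: str) -> str:
    """Stand-in for querying GPT-4o (or any other strong teacher model)."""
    return ("<FORK-1> Explore multiplication/division-based combinations.\n"
            "I will explore addition/subtraction-based ones in parallel...\n"
            "<JOIN-1>\n"
            "Combine both sets of solutions and give the final answer.")

def build_sft_example(problem: str) -> dict:
    trajectory = teacher_generate(f"{FEW_SHOT_GUIDE}\nProblem: {problem}")
    # Keep only trajectories that actually use the protocol (light filtering).
    if "<FORK-1>" in trajectory and "<JOIN-1>" in trajectory:
        return {"prompt": f"Problem: {problem}", "target": trajectory}
    return {}

print(build_sft_example("Reach 24 using 3, 4, 6 and 8."))
```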
Stage 2: Reinforcement Learning (The Real Job)
This is the key stage to forge the 'intern' into a 'top manager.' The model is pushed to real battlefields, learning 'management' art through constant trial-and-error and reflection. The core driver is a carefully designed reward and punishment mechanism (Reward System).
After each problem-solving attempt, the entire 'organized thinking' trajectory generated by the model is evaluated by the system and given a composite score. This score consists of three parts:
1. Accuracy Reward: The baseline goal: is the team's final answer correct? Solving the problem earns a high 'performance bonus.' This reward is outcome-oriented, ensuring the organizational behavior is ultimately effective.
2. Format Reward: Did the organizer commit any rule violations while issuing commands? For example, Forking a new task when every worker is already busy, causing 'overstaffing,' or Joining a task that does not exist. Such low-level mistakes incur a 'compliance fine.' This keeps basic order in how the organization operates.
3. Thinking Concurrency Reward: The finishing touch of the training design. The system measures the average 'busyness' of all workers over the course of the task: if the organizer schedules cleverly so that multiple workers are working in parallel most of the time, it earns a high 'efficiency bonus'; if its commands leave workers taking turns and idling most of the time, the reward stays low.
By maximizing the final composite reward, the model is forced into deep 'management reflection.' It gradually realizes that merely getting the right answer is not enough; it must also organize the team in the most efficient, reasonable way. Simple tasks may need no division of labor at all; complex ones require carefully designed parallel paths. Through repeated 'reviews,' the model's own organizational strategies evolve, transforming it from a rigid instruction issuer into a genuinely strategic core.
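As a rough illustration of how these three signals could be folded into one scalar for reinforcement learning, here is a sketch with made-up weights and a simplified notion of 'busyness'; the paper's exact reward definitions may differ.

```python
from dataclasses import dataclass

# Sketch of a composite reward combining accuracy, format compliance, and
# thinking concurrency. Weights and the busyness formula are assumptions.
@dataclass
class Trajectory:
    answer_correct: bool      # did the final answer solve the problem?
    protocol_violations: int  # e.g. Fork with no idle worker, Join on a missing id
    worker_busy_steps: int    # total steps in which some worker was actively working
    num_workers: int
    total_steps: int          # length of the whole episode

def composite_reward(t: Trajectory,
                     w_acc: float = 1.0,
                     w_fmt: float = 0.5,
                     w_conc: float = 0.3) -> float:
    accuracy = 1.0 if t.answer_correct else 0.0
    format_penalty = min(1.0, t.protocol_violations * 0.25)
    # Average "busyness": fraction of worker-steps actually spent working.
    concurrency = t.worker_busy_steps / max(1, t.num_workers * t.total_steps)
    return w_acc * accuracy - w_fmt * format_penalty + w_conc * concurrency

print(composite_reward(Trajectory(True, 0, 120, 2, 80)))
```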
Battlefield Roll Call: AsyncThink's Overwhelming Victories on Three Battlegrounds
An elegant theory still has to prove itself in practice. Researchers rigorously tested the fully trained AsyncThink model on three battlegrounds of varying difficulty.
Battleground 1: Multi-Solution Countdown
This is a task demanding extreme breadth of thinking. The model must combine a set of given numbers with addition, subtraction, multiplication, and division to find four different expressions that all evaluate to a target number.
• Situation: Traditional 'sequential thinking' models easily get stuck in local optima, finding one or two solutions and then stalling. 'Parallel thinking' finds more, but inefficiently. AsyncThink showed crushing superiority here.
• Tactical Review: AsyncThink's organizer learned a genuine 'divide-and-conquer' strategy. It first Forks a task telling a worker to 'focus on multiplication/division-based combinations' while the organizer itself works on addition and subtraction. Once the results return, it analyzes the solutions found so far and Forks new, targeted explorations such as 'try combining numbers X and Y.' This dynamic, iterative exploration greatly boosts both multi-solution coverage and efficiency, and AsyncThink ends up leading on every metric.
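For readers unfamiliar with the task, a small brute-force solver makes the setup concrete. In the sketch below, a worker handling a Forked sub-task such as 'focus on multiplication/division' would simply receive a restricted `ops` string; the function illustrates the task itself, not how the model reasons.

```python
from itertools import permutations, product
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Enumerate left-to-right expressions over the given numbers and keep those
# that hit the target; a Forked worker could be handed a restricted `ops`.
def countdown_solutions(numbers, target, ops="+-*/", limit=4):
    found = []
    for perm in permutations(numbers):
        for chosen in product(ops, repeat=len(numbers) - 1):
            value, expr = float(perm[0]), str(perm[0])
            try:
                for op, n in zip(chosen, perm[1:]):
                    value = OPS[op](value, n)        # strictly left-to-right
                    expr = f"({expr} {op} {n})"
            except ZeroDivisionError:
                continue
            if abs(value - target) < 1e-9 and expr not in found:
                found.append(expr)
                if len(found) >= limit:
                    return found
    return found

print(countdown_solutions([3, 4, 6, 8], 24))
```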
Battleground 2: Advanced Math Reasoning (AIME & AMC)
These Olympiad-level math competition problems demand extreme logical depth and rigor.
• Situation: On this hardcore battlefield, AsyncThink again delivered. Compared with 'parallel thinking' baselines given larger configurations and longer reasoning steps, AsyncThink not only came out ahead in accuracy but also cut 'critical path latency' (the mandatory serial portion of the total time) by a striking 28%.
• Tactical Review: This means AsyncThink achieves higher-quality reasoning with less compute and in less time. The paper's 'accuracy-latency frontier' plot clearly shows AsyncThink occupying the optimal 'low cost, high performance' region across all configurations, proving that a good 'organizational structure' yields far greater efficiency gains than simply piling on compute.
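A back-of-the-envelope example helps clarify what 'critical path latency' means here. The sketch below uses a deliberately simplified one-level fork-join and made-up token counts (the 28% figure above comes from the paper's measurements, not from this arithmetic): only the longest dependency chain counts toward latency.

```python
# Made-up token counts for illustration only.
organizer_before_fork = 200      # organizer tokens produced before the Join
worker_tokens = [600, 450, 500]  # three forked workers, generated in parallel
organizer_after_join = 150       # merging and the final answer

# Sequential thinking would generate everything one after another.
sequential = organizer_before_fork + sum(worker_tokens) + organizer_after_join
# With fork/join, parallel workers overlap; only the longest chain matters.
fork_join = organizer_before_fork + max(worker_tokens) + organizer_after_join

print(f"serial tokens: {sequential}, critical path with fork/join: {fork_join}")
print(f"latency reduction in this toy example: {1 - fork_join / sequential:.0%}")
```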
Battleground 3: Ultimate Test—Generalization to Unseen Domains (Sudoku)
This is the research's brightest and most striking highlight. It answers a key question: is what AsyncThink learns a set of task-specific 'routines,' or a general, transferable 'organizational wisdom'?
• Test Setup: The team ran a bold experiment. They took an AsyncThink model trained only on 'multi-solution countdown' and threw it into a completely new domain with different rules—4x4 Sudoku—with no extra Sudoku training.
• Stunning Result: A miracle happened. Facing the unfamiliar Sudoku board, the model spontaneously and proficiently applied the Fork-Join organizational skills it had learned on countdown tasks. Its organizer analyzed the board, then Forked tasks such as 'fill in row 1 and check validity' to workers, decomposing, parallelizing, and verifying the complex Sudoku problem. Its solving accuracy even surpassed that of traditional models trained specifically on Sudoku.
• Profound Insight: This convincingly shows that what AsyncThink learns is not a rigid 'problem-solving template' but an abstract, cross-domain meta-skill: how to organize and plan the solving of an unknown problem. Like a great general whose art of command works not just on the plains but in the mountains and in cities as well, this marks a solid step toward true 'general intelligence' for AI.
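To ground the Sudoku example, here is the kind of validity check a worker might perform for a Forked sub-task like 'fill row 1 and check validity' on a 4x4 board; the grid encoding and helper names are assumptions for illustration.

```python
# A 4x4 Sudoku validity check: grid values are 1-4, with 0 meaning empty.
# This illustrates the sub-task a worker might handle, not the model's steps.
def valid_4x4(grid):
    def ok(group):
        filled = [v for v in group if v != 0]
        return len(filled) == len(set(filled))   # no repeated digits

    rows = grid
    cols = [[grid[r][c] for r in range(4)] for c in range(4)]
    boxes = [[grid[r][c] for r in (br, br + 1) for c in (bc, bc + 1)]
             for br in (0, 2) for bc in (0, 2)]
    return all(ok(g) for g in rows + cols + boxes)

candidate = [
    [1, 2, 3, 4],   # a worker's proposed completion of row 1
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(valid_4x4(candidate))   # True: the partial grid has no conflicts
```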
Implications for the Future: Farewell 'Brute Force Miracles,' Embrace 'Organizational Emergent Wisdom'
This research lands like a thunderclap, opening an imaginative new dimension beyond the dominant 'bigger models, more data = better' paradigm. It offers profound insights to every AI practitioner and observer.
1. Redefining 'Model Capability': A model's strength lies not just in its 'individual knowledge' depth but in its 'organizational intelligence' level. Future AI evaluation may shift from parameter count to efficiency in task decomposition, parallel collaboration, and result integration.
2. New AI Development Ideas: For AI engineers, this means shifting focus from 'fine-tuning single models better' to 'designing and training efficient multi-agent collaborative systems.' AsyncThink offers a plug-and-play 'organizational framework' for app developers to build 'AI expert teams' solving complex domain problems.
3. A Stairway to More Robust, Trustworthy AI: An organized system is inherently more robust than a lone individual. Under AsyncThink, if a 'worker' errs or gets stuck in a loop, the 'organizer' can detect it, abort the task, or reassign it. This built-in tolerance for faults and capacity for correction is key to building truly reliable, trustworthy AI systems.
Epilogue: Intelligence's Next Chapter Begins with 'Organization'
AsyncThink lets us glimpse AI's magnificent future landscape. Here, AI evolves from isolated 'superbrains' to vast, efficient, dynamically evolving 'superorganisms.'
The paper's authors further envision exciting possibilities at the end:
• Recursive Organizational Structures: Any 'worker' facing a complex enough sub-task could itself be 'promoted' to a 'sub-organizer,' Forking its own team of 'sub-workers,' forming arbitrarily nestable, flexible hierarchies for ultra-complex systemic problems.
• Human-AI Hybrid Intelligent Organizations: Human experts could be woven directly into the loop. AI organizers Fork tasks requiring human common sense, intuition, or ethical judgment to people, while human managers Fork massive data-processing and compute-heavy work to legions of AI workers.
From imitation to understanding, from computation to reasoning, from individual to organization: AI's evolution is entering a new era. AsyncThink may be just the overture to this great change, but its melody of 'collaboration' and 'organization' will define the core chapter of next-generation AI. What we want may no longer be a smarter 'Einstein' but a 'super organizer' leading countless 'Einsteins' to work together. And that era is quietly dawning.
The future is already here; may we walk into it together!