Today, DeepMind officially released AlphaEvolve – a revolutionary evolutionary coding agent powered by LLMs. It's not just a code generation tool, but a powerful system capable of evolving entire codebases for general algorithm discovery and optimization.
LLMs are incredibly versatile. They can summarize documents, generate code, and even propose new ideas. Now, DeepMind is extending these capabilities to fundamental and highly complex problems in mathematics and modern computing.
Matej Balog, a researcher at Google DeepMind, said in an interview: "It can find incredibly complex algorithms — spanning hundreds of lines of code, with intricate logical structures, far beyond the scope of simple functions."
Terence Tao also stated on Mathstodon that he has been collaborating with Google DeepMind to explore AlphaEvolve's potential mathematical applications.
Most AI models hallucinate. Due to their probabilistic architecture, they sometimes confidently make things up. In fact, newer AI models like OpenAI's o3 are more prone to hallucination than their predecessors.
AlphaEvolve introduces an ingenious mechanism to reduce hallucination: an automated evaluation system. The system uses the model to generate and critique a pool of candidate answers to a problem, then automatically evaluates and scores each answer for accuracy.
AlphaEvolve also combines the creative problem-solving capabilities of the Gemini model with an evaluator that automatically verifies answers, and utilizes an evolutionary framework to continuously optimize the most promising solutions.
AlphaEvolve enhances the efficiency of Google's data centers, chip design, and AI training processes, including the training of the large language models that underpin AlphaEvolve itself. It has also helped design faster matrix multiplication algorithms and found new solutions to open mathematical problems, with huge potential for application in many fields.
Designing Better Algorithms with Large Language Models
Unlike many systems that only evolve single functions, AlphaEvolve is an agent that can go beyond single-function discovery to iteratively optimize and evolve entire codebases, developing far more complex algorithms.
This builds upon DeepMind's 2023 work, FunSearch, which first demonstrated that large language models could generate functions in computer code whose correctness can be automatically verified, helping to discover new knowledge on open scientific problems.
Table 1 shows a comparison of the capabilities and typical behavior of AlphaEvolve and previous agents.
Core Mechanism: Combining LLM Creativity with Automated Evaluation
So, how does AlphaEvolve achieve this powerful code evolution capability? Its core lies in cleverly integrating the creativity of large language models with the objective feedback of automated evaluation into an evolutionary framework.
This process can be summarized as a continuous "Generate - Evaluate - Evolve" cycle:
The diagram shows the system's workflow: The prompt sampler first constructs input prompts, driving the language model to generate new programs; these programs are scored by the evaluator and stored in the program database. The database continuously optimizes program selection through evolutionary algorithms, driving the system's continuous evolution.
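The cycle can be pictured as a small driver loop. Below is a minimal, self-contained Python sketch of that loop, not AlphaEvolve's implementation: the LLM step is replaced by a random string tweak and the evaluator by a toy metric purely so the example runs end to end; in the real system the proposals come from Gemini models and the scores from a user-supplied evaluation function.

```python
import random

class ProgramDatabase:
    """Toy stand-in for AlphaEvolve's evolutionary program database."""

    def __init__(self):
        self.entries = []  # (program_text, score) pairs

    def add(self, program, score):
        self.entries.append((program, score))

    def sample_parent(self):
        # Favour high-scoring programs while keeping some randomness for diversity.
        top = sorted(self.entries, key=lambda e: e[1], reverse=True)[:10]
        return random.choice(top)

def llm_propose(parent_program: str) -> str:
    """Stand-in for the LLM step (Gemini models in AlphaEvolve).

    A random single-character tweak replaces a model-proposed code edit,
    only so this sketch is executable.
    """
    i = random.randrange(len(parent_program))
    return parent_program[:i] + random.choice("0123456789") + parent_program[i + 1:]

def evaluate(program: str) -> float:
    """Stand-in for the user-supplied evaluation function h (toy metric)."""
    return float(program.count("7"))

def evolve(initial_program: str, generations: int = 200):
    db = ProgramDatabase()
    db.add(initial_program, evaluate(initial_program))   # seed the database
    for _ in range(generations):
        parent, _ = db.sample_parent()                   # 1. prompt sampling
        child = llm_propose(parent)                      # 2. creative generation
        db.add(child, evaluate(child))                   # 3. automated evaluation
    return max(db.entries, key=lambda e: e[1])           # best program found

print(evolve("0000000000"))
```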
Generation
AlphaEvolve combines several state-of-the-art large language models: Gemini Flash (DeepMind's fastest and most efficient model) expands the breadth of creative exploration, while Gemini Pro (DeepMind's most powerful model) provides the critical depth required for solutions with its profound insights.
The goal of this integrated strategy is to improve computational throughput while maintaining the quality of generated solutions. These models collaborate to generate computer programs that implement algorithmic solutions.
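One plausible way to schedule such an ensemble is sketched below; the 80/20 split and the model identifiers are illustrative assumptions, not published configuration details.

```python
import random

def pick_model() -> str:
    """Route most proposals to the fast model (breadth) and a smaller share to
    the strongest model (depth). The split ratio here is only an assumption."""
    return "gemini-flash" if random.random() < 0.8 else "gemini-pro"
```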
1. Prompt Sampling
Because AlphaEvolve relies on current state-of-the-art LLMs, its primary evolution prompt can carry long context and supports various forms of customization.
This prompt typically includes multiple existing solutions sampled from the program database, as well as system instructions on how to modify specific solutions. Users can further customize the prompt through explicit context, rendered evaluation results, and other means.
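A hedged sketch of how such a prompt might be assembled from the ingredients just described; the template wording is an assumption, since AlphaEvolve's actual prompts are not public.

```python
def render_prompt(parents, problem_context: str = "", extra_instructions: str = "") -> str:
    """Build an evolution prompt from (code, metrics) pairs sampled from the database."""
    parts = []
    if problem_context:  # optional explicit context supplied by the user
        parts.append(f"Problem context:\n{problem_context}")
    for rank, (code, metrics) in enumerate(parents, start=1):
        # Rendered evaluation results accompany each previously found program.
        parts.append(f"Previously found program #{rank} (scores: {metrics}):\n{code}")
    parts.append("Propose an improved version of the last program as SEARCH/REPLACE diff blocks.")
    if extra_instructions:  # further user customization
        parts.append(extra_instructions)
    return "\n\n".join(parts)
```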
2. Creative Generation
To drive the evolutionary process, AlphaEvolve leverages the capabilities of state-of-the-art LLMs. Their core role is to understand information about previously generated solutions and propose diverse new ways to improve them.
Although AlphaEvolve itself does not depend on a specific model (it is model-agnostic), ablation studies have shown that AlphaEvolve's performance continues to improve as the performance of the underlying LLM increases.
Output Format: When AlphaEvolve requests an LLM to modify existing code, especially in a larger codebase, it asks for a set of differential (diff) fragments in a specific format. The format is as follows:
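Illustrated with placeholder comments standing in for real code, a single diff fragment looks like this:

```
<<<<<<< SEARCH
# original code fragment, matched exactly in the current program
=======
# new code fragment that replaces the matched block
>>>>>>> REPLACE
```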
Here, the code between <<<<<<< SEARCH and ======= is the original fragment in the current program version, which must be matched exactly; the code between ======= and >>>>>>> REPLACE is the new fragment that replaces it.
In this way, modifications can be precisely applied to specified parts of the code. If the code being evolved is short, or when rewriting from scratch is more reasonable than local modification, AlphaEvolve can also be configured to ask the LLM to directly output the complete code block without using the differential format.
Evaluation
In this cycle, automated evaluation metrics play a crucial role. AlphaEvolve uses these metrics to verify, run, and score proposed programs. These metrics provide objective, quantifiable evaluations of the accuracy and quality of each solution.
1. Task Specification
Evaluation: Since AlphaEvolve deals with machine-gradeable problems, users must provide a mechanism for automatically evaluating generated solutions. This is typically in the form of a function h, which maps a solution to a set of scalar evaluation metrics (to be maximized), usually implemented as a Python function (evaluate).
API: To support the evolution of multiple components within a codebase, AlphaEvolve provides an input API that specifies which code blocks can be evolved by the system by adding special markers (e.g., # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END in comments) to the code. The user-provided code in these evolve blocks serves as the initial solution, and the remaining code forms the skeleton connecting these evolvable parts so that they can be called by the evaluate function.
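A minimal sketch of what such an annotated file might look like; only the # EVOLVE-BLOCK-START / # EVOLVE-BLOCK-END markers and the evaluate entry point come from the description above, while the problem and function bodies are invented for illustration.

```python
# EVOLVE-BLOCK-START
def build_solution(n: int) -> list[int]:
    """Initial user-provided solution; AlphaEvolve may rewrite anything in this block."""
    return list(range(n))
# EVOLVE-BLOCK-END

# Code outside the markers forms the fixed skeleton that connects the evolved
# parts to the evaluation entry point.
def evaluate() -> dict[str, float]:
    """Maps the current program to scalar metrics to be maximized."""
    solution = build_solution(100)
    return {"score": float(sum(solution)), "size": float(len(solution))}
```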
Flexibility in choosing the level of abstraction: AlphaEvolve can be applied to the same problem in very different ways, especially when the evolved program is not the final output but a means to discover the solution. For example, one can evolve solutions represented as raw strings, evolve a well-defined function to construct a solution from scratch, evolve a custom search algorithm to find a solution within a fixed computational budget, or even jointly evolve intermediate solutions and the search algorithm.
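For example, the same task could be handed to AlphaEvolve in either of the two hypothetical framings sketched below: evolve a constructor that outputs a solution directly, or keep a fixed search loop and evolve only the heuristic it relies on.

```python
import random

# Framing A: the evolved code constructs the solution directly.
def construct_solution() -> list[int]:
    return [0, 1, 2]  # evolved block

# Framing B: the search loop is fixed skeleton code; only the heuristic evolves.
def heuristic(candidate: list[int]) -> float:
    return -abs(sum(candidate) - 10)  # evolved block

def search(budget: int = 1000) -> list[int]:
    best, best_score = [], float("-inf")
    for _ in range(budget):
        candidate = [random.randint(0, 9) for _ in range(3)]
        score = heuristic(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```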
2. Evaluation Process and Optimization Mechanism
To track AlphaEvolve's progress and select ideas to propagate in future generations, each new solution proposed by the LLM is automatically evaluated. The basic process is to execute the user-provided evaluation function h. In practice, AlphaEvolve supports various mechanisms to make evaluation more flexible and efficient:
Evaluation Cascade: Use increasingly difficult test sets to filter out weak solutions quickly (a minimal sketch follows this list).
LLM-generated Feedback: Use LLMs to score attributes that are difficult to quantify.
Parallelized Evaluation: Distribute evaluation tasks across machines to improve efficiency.
Multi-metric Optimization: Optimize several metrics simultaneously, which can even improve results on a single target metric.
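A minimal sketch of the cascade idea from the first item above, assuming hypothetical stage functions and thresholds:

```python
def cascade_evaluate(program, stages) -> dict[str, float] | None:
    """Run increasingly demanding evaluation stages; discard weak programs early.

    `stages` is a list of (evaluate_fn, threshold) pairs ordered from cheap to
    expensive. A program must clear each threshold to reach the next stage.
    """
    metrics: dict[str, float] = {}
    for evaluate_fn, threshold in stages:
        stage_metrics = evaluate_fn(program)
        metrics.update(stage_metrics)
        if stage_metrics["score"] < threshold:
            return None  # filtered out before the costly stages run
    return metrics
```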
Evolution
In its evolutionary process, AlphaEvolve continuously generates solutions with evaluation results (scores and program outputs). These solutions are stored in an evolutionary database. The primary goal of this database is to optimally reuse previously explored ideas in future generations.
A key challenge in designing such a database is balancing exploration with exploitation: that is, while continuously improving the current best solution, maintaining diversity of solutions to encourage exploration of the entire search space.
In AlphaEvolve, this evolutionary database implements an algorithm inspired by a combination of MAP-Elites and island-based population models.
This makes AlphaEvolve particularly helpful in broad domains like mathematics and computer science where progress can be clearly and systematically measured.
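To make the exploration/exploitation balance concrete, here is a hedged sketch of a database that combines MAP-Elites-style cells (keep only the best program per feature cell) with several island populations; the feature descriptor and island count are illustrative choices, not AlphaEvolve's actual configuration.

```python
import random

class EvolutionaryDatabase:
    """Sketch of a program database mixing MAP-Elites-style cells with islands."""

    def __init__(self, num_islands: int = 4):
        # islands[i][cell] -> (program, score); each island evolves semi-independently
        self.islands = [dict() for _ in range(num_islands)]

    def _cell(self, program: str, metrics: dict) -> tuple:
        # Example feature descriptor: bucket programs by code length and score band.
        return (len(program) // 100, int(metrics["score"] * 10))

    def add(self, program: str, metrics: dict, island: int) -> None:
        cell = self._cell(program, metrics)
        best = self.islands[island].get(cell)
        if best is None or metrics["score"] > best[1]:
            self.islands[island][cell] = (program, metrics["score"])  # keep the elite

    def sample(self, island: int) -> str:
        # Sampling across cells preserves diversity (exploration) while each
        # cell itself only ever holds its best program (exploitation).
        program, _ = random.choice(list(self.islands[island].values()))
        return program
```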
Optimizing Google's Computing Ecosystem
Over the past year, DeepMind has deployed algorithms discovered by AlphaEvolve across Google's computing ecosystem, including its data centers, hardware, and software.
The impact of these improvements is magnified across Google's AI and computing infrastructure, building a more powerful and sustainable digital ecosystem for all of Google's users.
The diagram illustrates how AlphaEvolve helps Google provide a more efficient digital ecosystem, from data center scheduling and hardware design to AI model training.
Improving Data Center Scheduling
AlphaEvolve discovered a simple but highly effective heuristic that helps Borg coordinate Google's massive data centers more efficiently. This solution has been in production for over a year and continuously recovers an average of 0.7% of Google's worldwide computing resources. This sustained efficiency gain means that, at any given moment, more tasks can be completed on the same computing resources.
AlphaEvolve's solution not only achieved strong performance but also provided important operational advantages of human-readable code: interpretability, debuggability, predictability, and ease of deployment.
Assisting Hardware Design
AlphaEvolve proposed a Verilog rewrite that removed redundant bits from a key, highly optimized arithmetic circuit used for matrix multiplication. The proposal passed rigorous verification confirming that the modified circuit preserves functional correctness, and it has been integrated into an upcoming Tensor Processing Unit (TPU).
By proposing modifications in the standard language of chip designers (Verilog), AlphaEvolve facilitates collaboration between AI and hardware engineers to accelerate the design of future dedicated chips.
Boosting AI Training and Inference Efficiency
AlphaEvolve is significantly accelerating AI performance and research progress. By finding smarter ways to decompose large matrix multiplication operations, it increased the speed of this critical kernel in the Gemini architecture by 23%, which in turn reduced Gemini's training time by 1%.
In addition to performance gains, AlphaEvolve significantly reduced the engineering time required for kernel optimization, from weeks spent by experts to just a few days with automated experiments.
AlphaEvolve is also capable of optimizing low-level GPU instructions. In Transformer-based AI models, it achieved up to a 32.5% speedup for the FlashAttention kernel implementation. This optimization helps experts pinpoint performance bottlenecks and easily integrate improvements.
Advancing the Frontier of Mathematics and Algorithm Discovery
Faster Matrix Multiplication Algorithms
AlphaEvolve can also propose new approaches to complex mathematical problems such as matrix multiplication, a foundational problem in computer science. Using a gradient-based optimization procedure, AlphaEvolve discovered an algorithm that multiplies 4x4 complex-valued matrices with 48 scalar multiplications.
This finding improves upon Strassen's 1969 algorithm, marking the first known improvement in this setting in 56 years, and represents a significant advance over DeepMind's previous work, AlphaTensor.
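For context on these counts (64 and 49 are standard figures; 48 is the result reported above):

```python
naive = 4 ** 3       # schoolbook algorithm: 64 scalar multiplications for 4x4 matrices
strassen = 7 ** 2    # Strassen's 2x2 scheme applied recursively to 4x4: 49 multiplications
alphaevolve = 48     # AlphaEvolve's count for 4x4 complex-valued matrices (reported above)
print(naive, strassen, alphaevolve)  # 64 49 48
```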
Solving Open Problems
To explore AlphaEvolve's breadth, DeepMind applied the system to over 50 open problems in mathematical analysis, geometry, combinatorics, and number theory. The system's flexibility allowed most experiments to be set up within a few hours.
In approximately 75% of cases, it rediscovered state-of-the-art solutions; in 20% of cases, AlphaEvolve improved upon previously known best solutions, making progress on the corresponding open problems.
For example, on the Kissing number problem, which has intrigued mathematicians for over 300 years, AlphaEvolve found a configuration of 593 outer spheres, establishing a new lower bound in 11 dimensions.