DreamCoder: Growing Generalizable, Interpretable Knowledge with Wake-Sleep Bayesian Program Learning


Expert problem-solving relies on powerful languages of thought – systems for thinking about problems and their solutions. Acquiring expertise means learning these languages – both the conceptual systems and the skills for using them. We introduce DreamCoder, a learning system that solves problems by writing programs. It progressively builds expertise by creating programming languages for expressing domain concepts, combined with neural networks that guide the search for programs in these languages. A “wake-sleep” learning algorithm alternately expands the language, adding new symbolic abstractions, and trains the neural network on imagined and replayed problems. DreamCoder is capable of solving classic inductive programming tasks, as well as creative tasks like drawing images and building scenes. It rediscovers the fundamental principles of modern functional programming, vector algebra, and classical physics, including Newton’s laws and Coulomb’s law. Concepts are built compositionally from previously learned content, forming multi-layered, interpretable symbolic representations that transfer to new tasks while flexibly and efficiently scaling with experience.

A long-standing dream of artificial intelligence (AI) is to build machines that learn like children (1) – starting with little knowledge and gradually growing to the full knowledge possessed by human adults. This goal remains distant, however, because human intelligence depends on many learning capabilities not yet mastered by artificial systems. While current machines are typically designed for single categories of tasks, humans learn to solve a wide variety of problems, from cooking to calculus to graphic design. Furthermore, while machine learning often requires large amounts of data and generalizes poorly, humans often achieve strong generalization from sparse experience. Perhaps the most striking difference is that humans build expertise – we acquire knowledge that can be communicated and extended, constantly generating new concepts on top of existing ones, becoming better and faster learners once we master a domain.

This article introduces DreamCoder, a machine learning system designed to come closer to human capabilities – to efficiently discover interpretable, reusable, and generalizable knowledge across a wide range of domains. DreamCoder embodies an approach we call “wake-sleep Bayesian program induction.” The rest of this section explains the core ideas behind it: what it means to view learning as program induction; why viewing program induction as inference in a Bayesian model is valuable; and how a “wake-sleep” algorithm makes the model grow with experience to learn more efficiently, making this approach practical and scalable.

Our framing of learning as program induction dates back to the early days of AI (2): we view learning a new task as searching in a program space for a program that solves the task or has the desired behavior. Figure 1 shows examples of program induction tasks in eight different domains where DreamCoder has been applied (Figure 1A), and a detailed illustration of a task in the classic list processing domain: learning a program that sorts a list of numbers given only a few input-output examples (Figure 1B). Viewing learning as program induction offers certain advantages over purely statistical methods.

[Figure 1. Program induction tasks in eight domains (A) and the list-sorting task solved from a few input-output examples (B).]

Symbolic programs exhibit strong generalization – intuitively, they tend to extrapolate rather than merely interpolate. This also makes learning highly sample-efficient: often just a few examples suffice to pin down the function to be learned. By design, programs are human-interpretable: they are the standard modeling languages of science and engineering, and they expose knowledge that can be reused and composed to solve increasingly complex tasks. Finally, programs are universal: in principle, any Turing-complete language can express a solution to every computable problem.

Despite these advantages, and successful applications in several domains (3–9), program induction has had limited impact in AI. The Bayesian framework helps clarify the challenges, while also pointing to ways of addressing them. The programming language used for search defines both the hypothesis space and the learner’s prior: shorter programs in that language have higher prior probability. While any universal programming language supports program induction in principle, previous systems have typically needed to start with a carefully designed domain-specific language (DSL) that encodes strong, hand-tuned inductive biases or prior knowledge. Without a DSL, the programs to be discovered would be too long (and hence too improbable under the prior) to find within a reasonable time. Even with a carefully designed prior, finding the best program is almost always intractable for general-purpose search algorithms, owing to the combinatorial nature of the search space. Most practical program induction systems therefore require not only a hand-designed DSL but also a hand-designed search algorithm that exploits that DSL for rapid inference. These two requirements limit the scalability and broad applicability of program induction.
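To make the link between program length and prior probability concrete, here is a toy calculation of our own (not the paper’s formulation): if programs are generated by choosing uniformly among the library’s primitives, every extra token costs log2(|library|) bits of prior probability, so long programs quickly become too improbable to find by search. The token sequences below are purely illustrative.

```python
import math

# Toy illustration (not the paper's exact formulation): under a uniform
# probabilistic grammar, each primitive choice costs log2(|library|) bits,
# so a program's prior probability falls exponentially with its length.

def log2_prior(program_tokens, library_size):
    return -len(program_tokens) * math.log2(library_size)

# The same behavior written without and with a domain-specific primitive:
without_dsl = ["fold", "lambda", "lambda", "if", ">", "var0", "var1",
               "var0", "var1", "empty", "input"]
with_dsl = ["maximum", "input"]

print(log2_prior(without_dsl, 20))   # about -47.5 bits: hard to find by search
print(log2_prior(with_dsl, 21))      # about -8.8 bits: short programs are likely
```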

DreamCoder addresses these two bottlenecks by learning to compactly represent and efficiently induce programs within a given domain. The system achieves “learning to learn” – writing better programs and searching for them more efficiently – by jointly constructing two different types of domain expertise: (1) explicit declarative knowledge, manifested as a learned domain-specific language that captures conceptual abstractions common across tasks; and (2) implicit procedural knowledge, manifested as a neural network that guides how to use the learned language to solve new tasks, embodying a learned domain-specific search strategy. From a Bayesian perspective, the system simultaneously learns the prior distribution of programs and an inference algorithm parameterized by a neural network for efficiently approximating the posterior distribution of programs given observed task data.

DreamCoder learns both of these aspects in a self-supervised, self-guided manner, progressively building them over multiple encounters with a set of training tasks. This allows learning to scale to new domains and, given sufficiently diverse training tasks, to expand within a domain. Typically, a moderate number of tasks is sufficient to initiate learning in a new domain. For example, the list sorting function shown in Figure 1B represents one of 109 tasks processed by the system, which gradually built a library of about 20 basic operations for processing lists of numbers during learning, and these operations in turn became fundamental components for solving new tasks encountered in the future.

The language learned by DreamCoder takes the form of a multi-layered hierarchical abstraction structure (Figures 1B and 7A, B). These hierarchies are similar to internal representations in deep neural networks, but here, each layer is built upon symbolic code defined by previous code layers, making these representations naturally human-interpretable and explainable. The abstraction network grows over time, with each layer of concepts building upon previously acquired concepts, inspired by how humans build conceptual systems: we learn algebra before calculus, and arithmetic before algebra; we learn to draw simple shapes before drawing complex patterns. For instance, in the list processing example (Figure 1B), our model implements sorting a sequence of numbers by calling a four-layer deep library component – “take the nth largest element” – which in turn calls lower-level learned concepts: maximum and filter. Theoretically, equivalent programs could also be written using the initial language, but the ultimately learned language generates programs that are easier to understand and shorter. If expressed only with initial primitives, these programs would be too complex for a learner to find them in a reasonable amount of time. Most problems only become practically solvable after acquiring domain-specific expertise.
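As a concrete illustration of this layering, here is a minimal Python sketch of how such a library could compose. The function names and the exact decomposition are ours and only mirror the structure described above; DreamCoder’s actual learned concepts are λ-calculus expressions, not Python functions.

```python
# Illustrative Python analogue of the layered library in Figure 1B.

def my_filter(pred, xs):
    """Layer 1: keep the elements satisfying a predicate."""
    return [x for x in xs if pred(x)]

def maximum(xs):
    """Layer 2: largest element of a nonempty list."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

def nth_largest(n, xs):
    """Layer 3: n-th largest element, built on maximum and filter."""
    remaining = list(xs)
    for _ in range(n - 1):
        remaining = my_filter(lambda x: x != maximum(remaining), remaining)
    return maximum(remaining)

def sort_descending(xs):
    """Layer 4: sort (distinct elements) by repeatedly taking the n-th largest."""
    return [nth_largest(i + 1, xs) for i in range(len(xs))]

assert sort_descending([9, 2, 7, 1]) == [9, 7, 2, 1]
```

Each layer is an ordinary, readable definition built from the layers below it, which is what makes the learned representation inspectable.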


The name DreamCoder comes from its iterative growth of domain knowledge through a “wake-sleep” cycle, a mechanism broadly inspired by memory consolidation processes occurring during different stages of sleep (10, 11). In general, “wake-sleep” Bayesian learning (12) alternates between training a generative model that defines the learner’s prior and a neural recognition model that learns to invert this generative model given new data. In the “wake” phase, the generative model is used to explain new data, guided by the recognition model; in the “sleep” phase, the recognition model is trained offline using imaginary datasets sampled from the generative model (referred to as “dreams” or “fantasies”).

DreamCoder develops the “wake-sleep” approach for program learning: its learned language defines a generative model over programs and tasks, where each program solves a specific hypothetical task; its neural network learns to recognize patterns between tasks to best predict program components that might solve any given new task. In the “wake” phase, the system receives data from multiple tasks and attempts to synthesize programs that solve them, leveraging the neural recognition model to propose candidate programs. Learning occurs in two distinct but interleaved “sleep” phases: one extends the learned language (i.e., the generative model) by incorporating new abstractions found in programs from the “wake” phase; the other optimizes the neural network (i.e., the recognition model) by training on “fantasy” programs sampled from the generative model.

This “wake-sleep” architecture builds upon and further integrates two existing ideas: Bayesian multi-task program learning (5, 13, 14) and neuro-guided program synthesis (15, 16). These two directions have had significant impact in recent literature but are combined for the first time in our work, starting with the EC2 algorithm (17), and now extended on a larger scale in DreamCoder (see Section S3 for further discussion of prior work).

The resulting system has wide application potential. We describe its application in eight domains (Figure 1A): classic program synthesis challenges, more creative visual drawing and construction problems, and ultimately library learning for recursive programming, vector algebra, and basic physics languages. All our tasks involve inducing programs from very small amounts of data, such as five to ten examples of a new concept or function, or an image or scene depicting a new object. The learned languages cover both deterministic and probabilistic programs, as well as programs that can operate generatively (e.g., generate images or plans) and conditionally (e.g., map inputs to outputs).

In summary, we hope these applications illustrate the potential of program induction as a practical, general, and data-efficient method for building interpretable and reusable knowledge in AI systems.

Wake/Sleep Program Learning

We now describe the specifics of learning in DreamCoder, beginning with an overview of the algorithm and its mathematical formalization, then diving deeper into its three phases. Learning is iterative: each iteration (Equation 1, Figure 2) cycles through one “wake” phase, in which the system attempts to solve tasks, followed by two “sleep” phases, in which it learns how to solve new tasks better.

In the wake phase (Figure 2, top), the system samples a random minibatch of tasks from the training set (or, for smaller domains, uses the entire training set). For each task x, it then searches for solving programs by enumerating programs in decreasing order of probability under the recognition model Q(ρ|x), checking whether each candidate program ρ assigns positive probability to the task’s data (i.e., P[x|ρ] > 0).

Since the search may find many programs that solve a given task, we keep only a small beam (the k = 5 programs with the highest posterior probability P[ρ|x, L]) and marginalize over this set of candidates in the sleep-phase updates of Eq. 1.

We represent programs as polymorphically typed λ-calculus expressions, a highly expressive formal system containing conditionals, variables, higher-order functions, and the ability to define new functions.
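The sketch below shows the shape of this wake-phase search under stated assumptions: `enumerate_programs`, `task.log_likelihood`, and `library.log_prior` are hypothetical interfaces standing in for the real system’s typed λ-calculus enumerator, likelihood model, and learned prior. Only the control flow follows the description above.

```python
def wake_phase(tasks, library, recognition_model, beam_size=5, budget=10_000):
    """Hypothetical sketch of the wake phase: for each task, enumerate
    candidate programs in decreasing probability under Q(rho | x) and keep
    the top-k solutions by unnormalized posterior P[rho | x, L]."""
    frontiers = {}
    for task in tasks:
        solutions = []
        # enumerate_programs is assumed to yield (program, log_q) pairs
        # in decreasing order of log Q(rho | task), up to a search budget.
        for program, log_q in recognition_model.enumerate_programs(task, library, budget):
            log_likelihood = task.log_likelihood(program)    # log P[x | rho]
            if log_likelihood == float("-inf"):
                continue                                     # rho does not solve x
            log_prior = library.log_prior(program)           # log P[rho | L]
            solutions.append((log_likelihood + log_prior, program))
        # keep only the top-k programs by (unnormalized) posterior probability
        solutions.sort(key=lambda s: s[0], reverse=True)
        frontiers[task] = solutions[:beam_size]
    return frontiers
```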

Abstraction Phase

In the abstraction sleep phase, the model grows its knowledge by expanding its library of concepts, aiming to discover specialized abstractions that make solutions to the current tasks easier to express. This “ease of expression” translates into a preference for libraries that best compress the programs found during waking. The objective of abstraction sleep (Equation 1) is equivalent to minimizing the description length of the library (−log P[L]) plus the description lengths of the wake-phase programs after reconstructing them in terms of that library.

Intuitively, we maximize a Bayesian criterion by “compressing reused code”; but unlike compressing repeated syntactic structures, we reveal recurring semantic patterns by reconstructing programs.

Code can be reconstructed in infinitely many ways, so we limit the number of λ-calculus evaluation steps between a program and its reconstruction, resulting in a finite but often extremely large set of reconstructions. Figure 3 shows an example: starting from a set of general primitives (including recursion implemented via the Y combinator), the model gradually discovers one of the most fundamental building blocks of modern functional programming – the higher-order function map. In this example, there are approximately 10¹⁴ possible reconstructions – a number that grows exponentially with program size and the maximum allowed evaluation steps.

[Figure 3. Discovering the higher-order function map by reconstructing programs found during waking.]

To address this exponential explosion, we introduce a new data structure for representing and manipulating these sets of reconstructions, combining ideas from version space algebras and equivalence graphs, and derive a dynamic programming algorithm for its construction (see Appendix S4.5). This data structure grows polynomially with program size, benefiting from factored representations of shared subtrees; however, it remains exponential with respect to the bound on the maximum evaluation steps. We can control the scale of this exponential term by setting a small bound (e.g., 3) without affecting performance. The result is a significant efficiency improvement: a version space containing 10⁶ nodes can be computed in minutes, representing 10¹⁴ reconstructions from Figure 3 that would otherwise take centuries to explicitly enumerate and search.
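The reconstruction machinery itself is too involved for a short example, but the underlying “compress reused code” criterion can be caricatured in a few lines. The sketch below is our own simplification: it treats programs as nested tuples, counts only syntactically identical subtrees, and ignores variables and reconstruction entirely, which is precisely the limitation the version-space algorithm is designed to overcome.

```python
def size(expr):
    """Description-length proxy: number of leaves in an expression tree."""
    if isinstance(expr, tuple):
        return sum(size(e) for e in expr)
    return 1

def count_occurrences(expr, fragment):
    """Count top-down occurrences of a fragment as a subtree."""
    if expr == fragment:
        return 1
    if isinstance(expr, tuple):
        return sum(count_occurrences(e, fragment) for e in expr)
    return 0

def compression_gain(corpus, fragment):
    """How much the corpus shrinks if `fragment` becomes a one-symbol
    library primitive, minus the cost of storing the fragment itself."""
    uses = sum(count_occurrences(p, fragment) for p in corpus)
    saved_per_use = size(fragment) - 1        # each use collapses to one symbol
    return uses * saved_per_use - size(fragment)

# Two wake-phase programs that both double list elements.
p1 = ("map", ("lambda", ("+", "x", "x")), "input")
p2 = ("map", ("lambda", ("+", "x", "x")), ("reverse", "input"))
fragment = ("lambda", ("+", "x", "x"))        # candidate new library routine

print(compression_gain([p1, p2], fragment))   # positive => worth adding (here: 2)
```

A positive gain means adding the fragment shortens the corpus by more than the fragment costs to store; DreamCoder applies the same trade-off, but over the vastly larger space of reconstructed programs.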

Dream Phase

In the dream sleep phase, the system trains its recognition model to accelerate the program search process during subsequent wake phases for problem-solving. We implement the recognition model as a neural network and inject domain knowledge through its network architecture: for example, when inducing graphic programs from images, we use a convolutional network to bias it towards learning useful image features.

We train the recognition network on two types of self-supervised data sources: replayed data of programs discovered during the wake phase, and “fantasy” data, which are programs randomly sampled from library L. Replayed data ensures that the recognition model can be trained on tasks it actually needs to solve and does not forget how to solve them; “fantasies” provide a large and highly diverse training data source, which is crucial for data efficiency: becoming an expert in a domain is neither a few-shot learning problem nor a big data problem. We typically train DreamCoder on 100–200 tasks, which is too small a sample size for a high-capacity neural network. However, once the model learns a customized domain library, we can sample or generate “dreams” infinitely from it to train the recognition network.

Our dream phase differs from the dream phase in traditional “wake-sleep” methods. Classic “wake-sleep” methods sample a random program from the generative model, execute it to generate a task, and train the recognition network to predict the sampled program from that task. We, however, view “dreaming” as the process of generating a sequence of random problems and solving them during sleep in an active manner, using the same program search process as in the wake phase. We then train the recognition network to predict the discovered solutions based on these problems.

Specifically, we train the recognition model Q to perform Maximum A Posteriori (MAP) inference by maximizing the following expectation:

E_x[ log Q(ρ*(x) | x) ],  with ρ*(x) = argmax_ρ P[ρ | x, L] over programs ρ satisfying P[x|ρ] > 0

where the expectation is taken over all tasks. If this expectation is computed with respect to the empirical distribution of tasks, the training data is replayed data; if it is computed with respect to samples from the generative model, the training data is fantasy data. We train using a 50/50 mixture of replayed and fantasy data; for fantasy tasks that map inputs to outputs, we sample inputs from the training tasks. While theoretically Q could be trained to perform full posterior inference, our MAP objective has an advantage: it teaches the recognition network to find the simplest, canonical solution for each problem. More technically, our MAP objective breaks syntactic symmetry in the program space by forcing the network to concentrate all probability mass on a unique member within a set of semantically equivalent but syntactically different expressions. Hand-coded symmetry breaking mechanisms have proven crucial for many program synthesizers (22, 23); for theoretical and empirical analysis of DreamCoder’s learned symmetry-breaking mechanisms, see Appendix S4.6.
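The following PyTorch sketch is a hedged illustration of this training setup, not DreamCoder’s architecture: the real recognition model predicts parameters of a distribution over library productions conditioned on the task, and fantasies are produced by executing programs sampled from the generative model. Here, tasks are assumed to already be encoded as feature vectors, `replays` is an assumed list of (task_encoding, primitive_indices_of_MAP_program) pairs from the wake phase, and `sample_fantasy` is an assumed callable returning the same kind of pair for a dreamed task; only the 50/50 replay/fantasy mixture and the MAP-style target follow the text.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecognitionNet(nn.Module):
    """Maps a task encoding to logits over library primitives (a crude
    stand-in for DreamCoder's predicted production probabilities)."""
    def __init__(self, task_dim, n_primitives, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(task_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_primitives))

    def forward(self, task_encoding):
        return self.body(task_encoding)

def training_step(net, optimizer, replays, sample_fantasy):
    """One gradient step on a 50/50 mixture of replayed and fantasy examples."""
    if random.random() < 0.5:
        task, map_primitives = random.choice(replays)     # replayed wake-phase solution
    else:
        task, map_primitives = sample_fantasy()           # dreamed task and its solution
    logits = net(task)                                    # shape: (n_primitives,)
    # MAP-style target: push probability mass onto the primitives appearing
    # in the single best (MAP) program found for this task.
    targets = torch.tensor(map_primitives)                # indices into the library
    loss = F.cross_entropy(logits.expand(len(targets), -1), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch:
# net = RecognitionNet(task_dim=64, n_primitives=30)
# opt = torch.optim.Adam(net.parameters(), lr=1e-3)
```

Because the target is the single MAP solution rather than the full posterior, the network is pushed toward one canonical member of each set of semantically equivalent programs, which is the symmetry-breaking effect described above.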

Results

We first experimentally studied DreamCoder in two classic benchmark domains: list processing and text editing. In both cases, tasks were defined by conditional mappings (i.e., input/output examples) and training started from a general functional programming foundation, including basic routines such as map, fold, cons, car, cdr, etc.

Our list processing tasks comprised 218 problems from (17), divided 50/50 into test and training sets, with 15 input/output examples provided for each task. In solving these problems, DreamCoder synthesized about 20 new library routines (see Appendix S1.1) and rediscovered higher-order functions such as filter. Each abstraction loop built upon concepts discovered in previous sleep cycles – for instance, the model first learned filter, then used it to learn to extract the maximum element from a list, then used this routine to learn a new library routine for extracting the nth largest element from a list, and finally used this new routine to sort lists of numbers (see Figure 1B).

Synthesizing programs that can edit text is a classic problem in programming language and artificial intelligence literature (18), and algorithms for synthesizing text editing programs have been applied in Microsoft Excel (7). For example, such systems can observe mappings like “Alan Turing” → “A.T.” and infer a program that converts “Grace Hopper” to “G.H.”. Previous text editing program synthesizers relied on hand-designed libraries of basic operations and hand-designed search strategies. Here, we jointly learn these two elements and achieve performance comparable to current state-of-the-art general-domain program synthesizers.

We trained the system on 128 automatically generated text editing tasks and tested on 108 text editing problems from the 2017 SyGuS (24) program synthesis competition. Before learning, DreamCoder solved 3.7% of the problems within 10 minutes, with an average search time of 235 seconds; after learning, it solved 79.6% of them, and much faster, averaging only 40 seconds per solution. The best-performing synthesizer in that competition, CVC4, solved 82.4% of the problems – but under the competition’s more generous budget of 1 hour and 8 CPU cores per problem; given those same resources, DreamCoder solved 84.3% of the problems.

Furthermore, SyGuS supplied each text editing problem with its own hand-designed library of basic operations. Our system instead learned a single, unified library of text editing concepts shared across all editing tasks – a prerequisite for practical applications.

Next, we consider more creative tasks: generating images, plans, and text. Programmatic or generative visual concepts – from Bongard problems (25), handwritten characters (5, 26) to Raven’s Progressive Matrices (27) – are widely studied in AI and cognitive science because they bridge low-level perception and high-level reasoning.

Here, inspired by LOGO Turtle graphics (28), we let the model control a “pen” with imperative flow control capabilities and arithmetic operations on angles and distances. The task is to draw a corpus of 160 images (divided 50/50 into test and training sets; see Figure 4A). After training DreamCoder for 20 “wake-sleep” cycles, we examined the learned library (see Appendix S1.1) and found interpretable parameterized drawing routines corresponding to categories of visual objects in its training data, such as polygons, circles, and spirals (Figure 4B) – the system learned the basic object types in its visual world in an unsupervised manner.
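To give a flavor of what a parameterized drawing routine is, here is a small Python sketch of our own: a turtle-style pen that repeats “move, then turn” traces a polygon or a spiral depending on its parameters. The function and its parameters are illustrative and are not the code DreamCoder actually learned.

```python
import math

def pen_trace(n_steps, step_length, turn_angle, growth=0.0):
    """Return the (x, y) points visited by a pen that repeatedly moves
    forward and turns; nonzero growth turns polygons into spirals."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    d = step_length
    for _ in range(n_steps):
        x += d * math.cos(heading)
        y += d * math.sin(heading)
        points.append((x, y))
        heading += turn_angle
        d += growth
    return points

square = pen_trace(4, 1.0, math.pi / 2)           # a closed square
spiral = pen_trace(40, 0.1, math.pi / 3, 0.02)    # a hexagonal spiral
```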

[Figure 4. LOGO graphics tasks (A), learned parameterized drawing routines such as polygons, circles, and spirals (B), a learned radial-symmetry higher-order function (C), and dreams before (D) and after (E) learning.]

Additionally, it learned more abstract visual relationships, such as radial symmetry, and added them to the library as a new higher-order function (Figure 4C).

Visualizing the system’s “dreams” during its learning process can reveal how the generative model guides the recognition model’s training: as the library grows and becomes more adapted to the current domain, the neural network receives richer and more diverse training data. In the early stages of learning, random programs written using the library are simple and loosely structured (Figure 4D), offering limited training value for the recognition model; after learning, the system’s “dreams” become richly structured (Figure 4E), combining underlying building blocks and patterns obtained from training data in creative ways that were never present in its wake experience, yet are highly suitable for training a recognition model with broad generalization capabilities (29).

Inspired by the classic AI “copy demo” task – where an agent observes a tower built from blocks and then attempts to reconstruct it (30) – we next provided DreamCoder with 107 “copy tower” tasks (divided 50/50 into test and training sets, see Figure 5A), where the system needed to simultaneously observe an image of a tower and the position of each of its blocks, and write a program to plan how a simulated robotic arm would build this tower. The control flow primitives used by the system were the same as those for LOGO graphics. In its learned library, we found parameterized “options” (31) for building block towers, including concepts like arches, stairs, and bridges, which also appeared in the model’s dreams (Figures 5C-D).

[Figure 5. Tower “copy” tasks (A) and learned building concepts such as arches, staircases, and bridges appearing in the library and in the model’s dreams (C, D).]

Next, we consider the ability to learn probabilistic generative concepts from a few examples – a natural human ability, whether learning new rules in natural language (32), regularities in sequences of symbols and signs (5), or new motor programs for producing words (33). We first had DreamCoder infer probabilistic regular expressions (regexes; see the example in Figure 1A) from small sets of strings drawn from 256 CSV columns scraped from web pages (data from (34); tasks split 50/50 into training and test sets, with 5 example strings per concept). The system learned to infer regular expressions describing the structure of common text concepts, such as phone numbers, dates, times, and monetary amounts (see Figure S5). It can parse many real-world text patterns, treating them as probabilistic generative models from which to imagine new instances of each concept.

For example, although DreamCoder does not understand the concept of a dollar amount, it can infer an underlying abstract pattern from examples like $5.70, $2.80, $7.60, and generate other examples like $2.40 and $3.30. For patterns with exceptions, such as -4.26, -1.69, -1.622, ..., -1, it infers a probabilistic model that typically generates strings like -9.9, but occasionally generates anomalies like -2. It can also learn some more obscure concepts – concepts that humans might not be familiar with, but which it can quickly learn and generalize from just a few examples: for instance, given examples like -00:16:05.9, -00:19:52.9, -00:33:24.7, it infers a generative concept capable of producing -00:93:53.2, and reasonable approximations such as -00:23=43.3.
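The sketch below is a hand-written Python caricature of such a probabilistic generative concept for the dollar-amount example; the decomposition into sub-generators and the probabilities are made up for illustration and are not what DreamCoder inferred.

```python
import random

def digit():
    return str(random.randint(0, 9))

def dollar_amount():
    """Generative 'dollar amount' concept: '$', a leading digit, '.', two digits,
    with the final digit usually a 0 (as in the examples $5.70, $2.80, $7.60)."""
    trailing = "0" if random.random() < 0.9 else digit()
    return "$" + str(random.randint(1, 9)) + "." + digit() + trailing

print([dollar_amount() for _ in range(5)])   # e.g. ['$2.40', '$3.30', '$7.60', ...]
```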

Finally, we consider inferring real-valued parametric equations that generate smooth trajectories (see Appendix S2.1.6 and Figure 1A, “Symbolic Regression”). The goal in each task is to fit data generated by a specific curve – either a rational function or a polynomial of degree at most four. We initialized DreamCoder with addition, multiplication, and division, plus a primitive for arbitrary real-valued parameters, whose values were optimized by inner-loop gradient descent. We modeled each parametric program as probabilistically generating a family of curves, and penalized the use of continuous parameters via the Bayesian Information Criterion (BIC) (35). This Bayesian machinery learns to prefer programs that both explain the data and avoid redundant continuous parameters. For example, given real-valued data from 1.7x² − 2.1x + 1.8, it inferred a program with three continuous parameters; given data from (x − 2.2)/3.8, it inferred a program with only two.
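Here is a hedged numerical illustration of that trade-off, written by us: it scores polynomial templates of increasing degree with a standard Gaussian-noise BIC (n·ln(RSS/n) + k·ln(n)), using ordinary least squares in place of DreamCoder’s inner-loop gradient descent. The data and templates are synthetic.

```python
import numpy as np

def bic_score(x, y, degree):
    """BIC of a degree-`degree` polynomial fit; lower is better."""
    coeffs = np.polyfit(x, y, degree)             # fits k = degree + 1 parameters
    residuals = y - np.polyval(coeffs, x)
    n, k = len(x), degree + 1
    rss = float(np.sum(residuals ** 2)) + 1e-12   # avoid log(0) on exact fits
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.7 * x**2 - 2.1 * x + 1.8 + rng.normal(scale=0.05, size=x.shape)

# Extra parameters are penalized by k * log(n), so the quadratic wins.
print(min(range(5), key=lambda d: bic_score(x, y, d)))   # expected: 2
```

The cubic and quartic templates reduce the residual only slightly, and the k·ln(n) penalty outweighs that gain – the same pressure that leads DreamCoder to prefer the three-parameter program for the quadratic example above.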

Quantitative Analysis of DreamCoder Across Domains

To better understand how DreamCoder learns, we experimentally compared the full system with several variants lacking key components: either the neural recognition model (i.e., the “dream” sleep phase) or the ability to form new library routines (i.e., the “abstraction” sleep phase). We also compared it to the following baseline methods:

- Exploration-Compression (13): alternately searches for programs and compresses reused components to form a learned library, but without using our reconstruction algorithm;

- Neural Program Synthesis: trains a RobustFill (16) model on samples from the initial library;

- Enumeration: performs 24 hours of type-directed enumeration (23) for each task, generating and testing up to 400 million programs.

To isolate the role of the compression mechanism in learning good libraries, we also constructed two “Memorize” baselines. These variants extend the library by incorporating task solutions whole as new primitives; they do not attempt to compress, but simply memorize the solutions found in the wake phase for reuse on new problems (see (36)). We evaluated “Memorize” variants with and without the neural recognition model.

Across multiple domains, our model consistently solved the most held-out test tasks (Figure 6A; see Figure S13 for memorize baselines), and often completed them in the shortest time (mean 54.1 seconds; median 15.0 seconds; Figure S11). These results indicate that DreamCoder’s core components – library learning through reconstruction and compression in the “sleep-abstraction” phase, and recognition model learning in the “sleep-dream” phase – both make substantial contributions to its overall performance.

[Figure 6. Held-out test-task performance for DreamCoder, ablations, and baselines across domains (A), and the relationship between library depth and performance with and without the recognition model (B).]

The synergy between these components is particularly evident in more creative and generative structured building domains (such as LOGO graphics and tower construction): in these domains, no alternative model could solve more than 60% of the held-out tasks, while DreamCoder learned to solve nearly 100% of them. The time required for DreamCoder to train to convergence, as shown in Figure 6A, varied by domain but typically took about one day with moderate computational resources (20–100 CPUs).

By observing the growth of the learned library over time (whether or not it included a learned recognition model), we found significant functional differences in its depth and scale. Across domains, deeper libraries correlated strongly with higher task solution rates (correlation coefficient r = 0.79), and systems with a learned recognition model performed better at all depths. Furthermore, the recognition model also led to a deeper final learned library, achieving higher asymptotic performance levels (Figure 6B, Figure S1). A similar, but weaker, relationship also existed between learned library size and performance. Thus, the recognition model appears to guide the learning of “better” libraries, where “better” relates to both the depth and breadth of the learned symbolic representations.

To understand how DreamCoder’s recognition model guides library learning, we analyzed how these two representations come to embed the similarity structure of the problems being solved. DreamCoder first encodes a task in the activations of its recognition network, and then re-represents it as the symbolic program that solves it. As learning progresses, the implicit initial representations increasingly align with the explicit structure of the final program solutions: the correlation between problem similarity in the recognition network’s activation space and the similarity of the code components used to solve those problems grows over training (see Figure S4; χ² test, p < 10⁻⁴ before and after learning). Visualizing these learned task similarities with t-SNE embeddings shows that, as the model acquires a richer conceptual vocabulary, its representations increasingly group together tasks that share abstract structure (Figure S3) – perhaps analogous to how human domain experts learn to classify problems by underlying principles rather than surface similarities (37, 38).

From Learning Libraries to Learning Languages

Our experiments so far have investigated how DreamCoder grows from a “beginner” state, initially given some basic domain-specific procedures such that only the simplest problems have short, direct solutions; as the system gradually grows into an “expert” state, it masters various concepts that allow even the hardest problems to be solved with short, meaningful programs. Now we pose a question: can learning proceed from an even leaner starting point, without any foundational domain knowledge at all? Can DreamCoder develop a domain-specific language that includes both basic and advanced domain concepts, capable of solving all problems within a given domain, starting only from highly generic programming and arithmetic primitives?

Inspired by research in classical physics on inferring physical laws from experimental data (39–41), we first had DreamCoder learn to describe 60 different physical laws and mathematical identities from AP and MCAT physics exam “cheat sheets,” learning based on numerical example data corresponding to each formula. The complete dataset included many well-known laws from mechanics and electromagnetism, which are naturally expressed using concepts such as vectors, forces, and ratios. But instead of directly providing these mathematical abstractions, we only initialized DreamCoder with a more general foundation – a few recursive sequence operation primitives (e.g., map and fold) and basic arithmetic operations – and tested whether it could self-learn a mathematical language applicable to physics.

Indeed, after 8 “wake-sleep” cycles, DreamCoder successfully learned 93% of the laws and identities in the dataset. It first learned the basic building blocks of vector algebra, such as dot product, vector sum, and norm (Figure 7A). Then, it used this mathematical vocabulary to construct concepts underlying multiple physical laws, such as the pattern of “inverse square law,” enabling it to learn Newton’s law of universal gravitation and Coulomb’s law of electrostatic force, thus effectively performing a “basis transformation” from its initial recursive sequence processing language to a physics-like language.
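The sketch below is our own Python rendering of that progression: vector operations written with generic sequence primitives, then composed into the shared inverse-square pattern. It mirrors the structure described above but is not a transcript of the library DreamCoder learned.

```python
from functools import reduce

def fold(f, init, xs):
    return reduce(f, xs, init)

def vsum(xs):                                   # sum of a sequence, via fold
    return fold(lambda a, b: a + b, 0.0, xs)

def dot(u, v):                                  # dot product, via map + fold
    return vsum([a * b for a, b in zip(u, v)])

def norm(u):                                    # Euclidean norm, via dot
    return dot(u, u) ** 0.5

def inverse_square(k, q1, q2, r1, r2):
    """Shared pattern behind Newton's gravitation and Coulomb's law:
    k * q1 * q2 / |r1 - r2|^2."""
    displacement = [a - b for a, b in zip(r1, r2)]
    return k * q1 * q2 / dot(displacement, displacement)

# Coulomb's law: k = 1/(4*pi*eps0) with charges q1, q2.
# Newton's gravitation: k = G with masses m1, m2.
```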

So, could DreamCoder also learn this recursive sequence-processing language itself? We initialized the system with only a minimal set of 1959 Lisp primitives (car, cdr, cons, ...) and asked it to solve 20 basic programming tasks, similar to common exercises in introductory computer science courses. Crucially, the initial language included a primitive recursion mechanism (the Y combinator), in principle allowing the learner to express arbitrary recursive functions, but it provided no ready-made recursive functions; in the earlier settings, recursion had been packaged inside higher-order functions (map, fold, etc.) given to the learner as primitives. With sufficient compute (about five days on 64 CPUs), DreamCoder learned to solve all 20 problems, and in the process built a library equivalent to the idioms of modern functional programming, including map, fold, zip, and length, as well as routines such as building the list of natural numbers in an interval (see Figure 7B).

All of these library functions can be expressed in terms of the higher-order function fold and its dual, unfold, which are in a formal sense the two most basic operations over recursive data types – an observation known in functional programming as “origami programming” (42). DreamCoder retraced this discovery: it first reinvented fold, then unfold, and then defined every other recursive function as a combination of the two.
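A small Python rendering of that observation, written by us: with fold and unfold in hand, the other routines mentioned above reduce to short definitions on top of them. The exact definitions are illustrative; DreamCoder’s learned versions are λ-calculus expressions.

```python
def fold(f, init, xs):
    """Consume a list: combine elements left to right into an accumulator."""
    acc = init
    for x in xs:
        acc = f(acc, x)
    return acc

def unfold(seed, stop, head, step):
    """Produce a list from a seed: emit head(seed), advance with step(seed)."""
    out = []
    while not stop(seed):
        out.append(head(seed))
        seed = step(seed)
    return out

def length(xs):
    return fold(lambda n, _: n + 1, 0, xs)

def my_map(f, xs):
    return fold(lambda acc, x: acc + [f(x)], [], xs)

def my_zip(xs, ys):
    return unfold((xs, ys),
                  stop=lambda s: not s[0] or not s[1],
                  head=lambda s: (s[0][0], s[1][0]),
                  step=lambda s: (s[0][1:], s[1][1:]))

def index_range(lo, hi):                        # natural numbers in [lo, hi)
    return unfold(lo, stop=lambda n: n >= hi,
                  head=lambda n: n, step=lambda n: n + 1)

assert my_map(lambda x: x * 2, index_range(0, 4)) == [0, 2, 4, 6]
assert my_zip([1, 2, 3], "ab") == [(1, 'a'), (2, 'b')]
assert length("hello") == 5
```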

Discussion

Our research shows that building a general program induction system is possible and feasible: it can learn the necessary expertise to represent and solve new learning tasks in many fundamentally different domains, and continuously improve its capabilities with experience. Optimal expertise in DreamCoder relies on the co-learning of explicit declarative knowledge and implicit procedural skills.

More broadly, DreamCoder’s ability to learn deep explicit representations of domain conceptual structures demonstrates the power of combining symbolic, probabilistic, and neural network learning methods: hierarchical representation learning algorithms can create human-understandable knowledge, unlike traditional deep neural networks; and the symbolic expertise representations it produces are flexible, adapting and expanding with experience, unlike traditional AI expert systems.

In the problems we focus on here, the solution space can be accurately captured through clear symbolic forms, even in domains with other complexities, such as image pixel inputs, exceptions and irregularities in generative text patterns, or the continuous parameters we used in the symbolic regression example. However, much real-world data is far messier than these. A key challenge for future program induction is how to better handle pervasive noise and uncertainty, which will require more reliance on probabilistic and neural network AI methods (5, 43, 44).

Recent research has explored various hybrid neural-symbolic representations for program induction (45–49), and combining these methods with DreamCoder’s library learning and self-guided capabilities could be an important direction for future development.

Extending program induction to the entire field of AI – such as common sense reasoning, natural language understanding, or causal inference – will require significant innovation, but it also holds immense potential. As a learning vehicle, programs have unique advantages: they combine universal expressiveness, data-efficient generalization, and the potential for interpretable, composable reuse.

Today, we can learn not only individual programs but also entire domain-specific languages, which makes another property of programs particularly important: programs express knowledge in a way that both humans and machines can understand.

Recognizing that every AI system is essentially a joint product of human and machine intelligence, we believe the tools demonstrated here lay the foundation for building a truly collaborative human-AI development path.

In the following discussion, we will further explore the broader implications of our work for building better human learning models, and forms of machine learning that are more human-like.

Interface with Biological Learning

DreamCoder’s “wake-sleep” mechanism is inspired by the Helmholtz machine, which itself drew loose inspiration from learning during human sleep. DreamCoder introduces a pair of interleaved sleep phases, and, interestingly, biological sleep likewise unfolds in multiple stages. Rapid eye movement (REM) sleep, or “dream” sleep, is associated with the learning of implicit procedural skills (11) and involves episodic replay and dreaming, much like the “dream” sleep phase of our model.

Slow-wave sleep, on the other hand, is associated with the formation and consolidation of new declarative abstractions (10), roughly corresponding to the “abstraction” sleep phase in our model. While neither DreamCoder nor the Helmholtz machine were designed to model biological processes, we speculate that our methods may bring “wake-sleep” learning algorithms closer to the learning processes that actually occur during human sleep.

DreamCoder’s knowledge grows progressively, and the dynamics of this growth are related to, but distinct from, earlier developmental ideas of “curriculum learning” (50) and “starting small” (51). Rather than having a human teacher order tasks by difficulty, DreamCoder’s learning is closer in spirit to self-guided exploration: it attempts randomly sampled tasks, probing the boundary of its abilities during waking, and pushes that boundary outward in subsequent sleep cycles by using concepts learned from easy tasks to guide the solution of harder ones.

Human learners, however, are more active still: they choose which tasks to attempt, and even pose tasks for themselves – whether as stepping stones toward harder problems, or out of curiosity and aesthetic motivation. Building agents that can generate their own problems in this human-like way is an important next step.

Our division of domain expertise into explicit declarative knowledge and implicit procedural skills is broadly inspired by “dual-process models” in cognitive science (52, 53) and studies of human experts (37, 38). Human experts not only learn declarative domain concepts that can be expressed in language – for example, artists learn principles of curves, symmetry, and perspective; physicists learn dot products, vector fields, and inverse square laws – but also master procedural (and often implicit) skills for quickly applying these concepts to solve new problems.

These two types of knowledge collectively enable experts to more accurately categorize problems based on their “deep structure” (37, 38) and intuitively judge which concepts might be useful before even beginning to search for a solution. We believe that both types of expertise are essential components for both biological and artificial learning systems, with neural networks and symbolic methods playing complementary roles here.

What Should Be Built In, What Should Be Learned

The goal of learning like humans – especially like children – is often equated by researchers with the goal of learning “from scratch.” They assume, as Turing proposed, that children begin almost as “blank slates”: “like a notebook bought from the stationer’s: very little mechanism, and lots of blank sheets.” (1)

Program induction, as an approach to general AI, also has its roots in this vision, driven by early research showing that, in principle, given a minimal Turing-complete language, it is possible to induce programs capable of solving any computable problem (2, 54–56). DreamCoder’s ability to discover the fundamental vocabularies of functional programming, vector algebra, and physics from first principles can be seen as another step towards this goal.

So, can this method be further extended, not just learning one domain at a time, but developing expertise across multiple different types of problems simultaneously from a minimal foundation? Such progress might rely on cross-domain meta-learning – building a cross-domain library, or “language of thought” (57, 58), much like humans collectively built through biological and cultural evolution, which can then differentiate into infinitely many representations for new problem domains.

While these directions are fascinating, learning so much from so minimal a foundation may not be the best path to artificial intelligence – especially today, when we can stand on the shoulders of many giants. Even if learning “from scratch” is theoretically possible, such approaches tend either to be notoriously data-hungry (as with neural networks) or, when not data-hungry, to demand enormous computation: DreamCoder consumed roughly one CPU-year just to build up “origami-style” functional programming.

Instead, we draw inspiration from “sketching” approaches to program synthesis (22). Sketching methods typically address single synthesis problems independently and expect human engineers to outline the solution’s skeleton. Similarly, we pre-endow our system with fundamental elements that we believe are crucial for solving problems across multiple diverse domains – a relatively concise but universally powerful set of control flow operators, higher-order functions, and a type system. We then leverage learning mechanisms to build specialized languages on top of these foundations.

The future of learning in program synthesis may lie in systems initialized with richer but still widely applicable resources, such as simulation engines or those embodied in modern programming language standard libraries.

This vision also shapes how we view program induction’s optimal contribution to building more human-like AI: not by pursuing “blank slate” learning, but by learning on top of a rich body of built-in knowledge. Before learning the various domains we discuss here, human children are innately equipped with “core knowledge”: conceptual systems for representing and reasoning about objects, agents, space, and other common-sense concepts (59–61).

We strongly support an AI approach that starts from existing human conceptual resources and builds a human-understandable knowledge system. This may be our best path towards an AI system that exists within the human world and co-evolves with human intelligence.

Original link: https://arxiv.org/pdf/2006.08381
