Active Inference as a Unified Theory of Sentient Behavior
In general we are least aware of what our minds do best.
—Marvin Minsky
This chapter summarizes the active inference framework and compares it with a range of related theories.
10.1 Introduction
10.2 Chapter Summaries
10.3 Connecting the Dots: A Unified Perspective of Active Inference
10.4 Predictive Brain, Predictive Mind, and Predictive Processing
10.4.1 Predictive Processing
10.5 Perception
10.5.1 Bayesian Brain Hypothesis
10.6 Action Control
10.6.1 Ideomotor Theory
10.6.2 Control Theory
10.6.3 Optimal Control Theory
10.7 Utility and Decision
10.7.1 Bayesian Decision Theory
10.7.2 Reinforcement Learning
10.7.3 Planning as Inference
10.8 Behavior and Bounded Rationality
10.8.1 Free Energy Theory of Bounded Rationality
10.9 Valence, Emotion, and Motivation
10.10 Homeostasis, Allostasis, and Interoceptive Processing
10.11 Attention, Salience, and Epistemic Dynamics
10.12 Rule Learning, Causal Inference, and Fast Generalization
10.13 Active Inference and Other Fields: Open Directions
10.13.1 Social and Cultural Dynamics
10.13.2 Machine Learning and Robotics
10.14 Summary
10.1 Introduction
In this chapter, we summarize the main theoretical points of active inference (from the first part of this book) and their practical implementations (from the second part). We then connect these points: we abstract from the specific active inference models discussed in previous chapters and focus on the integrative aspects of the framework. One benefit of active inference is that it provides a complete solution to the adaptive problems sentient organisms must solve. It therefore offers a unified perspective on problems such as perception, action selection, attention, and emotion regulation, which are often treated separately in psychology and neuroscience and solved with different computational methods in artificial intelligence. We discuss these issues (and more) in relation to established theories such as control theory, the ideomotor theory of action, reinforcement learning, and optimal control. Finally, we briefly discuss how the scope of active inference can be extended to cover other biological, social, and technological topics not treated in depth in this book.
10.2 Chapter Summaries
This book systematically introduces the theoretical foundations and practical implementations of active inference. Here, we briefly summarize the discussions of the previous nine chapters. This provides an opportunity to review the key constructs of active inference, which will play a role in the remainder of this chapter.
In **Chapter 1**, we introduced active inference as a normative approach to understanding sentient beings that engage in action-perception cycles with their environment (Fuster 2004). We explained that a normative approach starts from first principles and then derives and tests empirical predictions about the phenomena of interest: here, the ways organisms persist in their environment while engaging in adaptive action-perception cycles.
In **Chapter 2**, we elaborated the low road to active inference. This path begins with the idea that the brain is a prediction machine equipped with a generative model: a probabilistic representation of how hidden causes in the world generate sensations (e.g., how light reflected from an apple stimulates the retina). By inverting this model, the brain can infer the causes of its sensations (e.g., given that my retina is stimulated in this way, am I seeing an apple?). This view of perception as inference has historical roots in Helmholtz's notion of unconscious inference and, more recently, in the Bayesian brain hypothesis. Active inference extends this view by bringing action control and planning into the realm of inference (as in control as inference and planning as inference). Crucially, it shows that perception and action are not separate processes but serve the same goal. We first described this goal informally as the minimization of the discrepancy between the model and the world (often cast as minimizing surprise or prediction error). In short, one can minimize the discrepancy between one's model and the world in two ways: changing one's mind to fit the world (perception) or changing the world to fit one's model (action). Both can be described in terms of Bayesian inference. However, exact inference is often intractable, so active inference uses (variational) approximations (note that exact inference can be seen as a special case of approximate inference). This leads to a second, more formal description of the common goal of perception and action: the minimization of variational free energy. This is the core quantity in active inference, and it can be decomposed in several ways (e.g., into energy and entropy, complexity and accuracy, or surprise and divergence). Finally, we introduced a second kind of free energy: expected free energy. This is especially important for planning, as it provides a way to score alternative policies in terms of the future outcomes they are expected to generate. It, too, can be decomposed in several ways (e.g., into information gain and pragmatic value, or risk and ambiguity).
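In symbols (a standard formulation, with o for outcomes, s for hidden states, and q for the approximate posterior), these decompositions of variational free energy read:

```latex
F = \mathbb{E}_{q(s)}\left[\ln q(s) - \ln p(o,s)\right]
  = \underbrace{\mathbb{E}_{q(s)}\left[-\ln p(o,s)\right]}_{\text{energy}} - \underbrace{H\left[q(s)\right]}_{\text{entropy}}
  = \underbrace{D_{\mathrm{KL}}\left[q(s)\,\|\,p(s)\right]}_{\text{complexity}} - \underbrace{\mathbb{E}_{q(s)}\left[\ln p(o \mid s)\right]}_{\text{accuracy}}
  = \underbrace{-\ln p(o)}_{\text{surprise}} + \underbrace{D_{\mathrm{KL}}\left[q(s)\,\|\,p(s \mid o)\right]}_{\text{divergence}}
```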
In **Chapter 3**, we elaborated the high road to active inference. This alternative path begins with the imperative for biological organisms to maintain their integrity and avoid dissipation, which can be described as avoiding surprising states. We then introduced the concept of a Markov blanket: a formalization of the statistical separation between the internal states of an organism and the external states of the world. Crucially, internal and external states can only influence each other via intermediate (active and sensory) variables, called blanket states. This statistical separation, mediated by the Markov blanket, is essential for endowing organisms with a degree of autonomy from the external world. To understand why this is a useful perspective, consider the following three consequences.
**First**, organisms endowed with a Markov blanket appear to model the external environment in a Bayesian sense: their internal states correspond, on average, to approximate posterior beliefs about the external states of the world. **Second**, autonomy is guaranteed by the fact that the organism's model (its internal states) is not unbiased but prescribes certain existential preconditions (or prior preferences) that must be maintained, such as, for a fish, being in water. **Third**, with this formalism, optimal behavior (relative to prior preferences) can be described as the maximization of (Bayesian) model evidence through perception and action. By maximizing model evidence (i.e., self-evidencing), the organism ensures that it realizes its prior preferences (e.g., a fish staying in water) and avoids surprising states. In turn, the maximization of model evidence is (approximately) mathematically equivalent to the minimization of variational free energy. Thus, by another route, we arrive at the same central construct of active inference discussed in Chapter 2. Finally, we detailed the relationship between minimizing surprise and Hamilton's principle of least action, which establishes the formal relationship between active inference and first principles of statistical physics.
In **Chapter 4**, we outlined the formal aspects of active inference. We focused on the move from exact Bayesian inference to tractable approximations, namely variational inference, and on the organism's overarching objective of minimizing variational free energy through perception and action. This foregrounds the importance of the generative (world) model that organisms use to make sense of their world. We introduced two kinds of generative model, which use discrete or continuous variables to express beliefs about how data are generated. We explained that both support the same kind of active inference but suit situations that are naturally formulated in discrete time (e.g., partially observed Markov decision processes) or continuous time (e.g., stochastic differential equations).
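To make the discrete-time case concrete, here is a minimal sketch of such a generative model in Python (the matrix names A, B, C, D follow the conventions used in this book for discrete-time models; the numerical values are toy assumptions):

```python
import numpy as np

# A minimal discrete-time generative model in POMDP form (toy values):
# A: p(o|s) likelihood; B[u]: p(s'|s,u) transitions for each action u;
# C: prior preferences over outcomes; D: prior over initial states.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])              # 2 outcomes x 2 hidden states
B = np.stack([np.eye(2),                # action 0: stay
              np.array([[0.0, 1.0],
                        [1.0, 0.0]])])  # action 1: switch states
C = np.array([0.8, 0.2])                # preferred outcome probabilities
D = np.array([0.5, 0.5])                # flat prior over initial states

# Perception as Bayesian model inversion (exact, for this tiny model):
o = 0                                   # index of the observed outcome
posterior = A[o] * D / (A[o] @ D)       # p(s|o) via Bayes' rule
print(posterior)                        # -> [0.9 0.1]
```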
In **Chapter 5**, we commented on the distinction between the normative principle of free energy minimization and process theories of how the brain might implement this principle, explaining that the latter yield testable predictions. We then outlined various aspects of the process theories that accompany active inference, spanning neuronal message passing, neuroanatomical circuits (e.g., corticothalamic loops), and neuromodulation. For example, at the anatomical level, message passing maps well onto canonical cortical microcircuits, with predictions originating in the deep layers of one area and targeting the superficial layers of the area below (Bastos et al. 2012). At a more systemic level, we discussed how inference, learning, and precision weighting correspond to neuronal dynamics, synaptic plasticity, and neuromodulation, respectively, and how the top-down and bottom-up message passing of predictive coding maps onto slower (e.g., alpha or beta) and faster (e.g., gamma) brain rhythms. These and other examples show that, having designed a specific active inference model, one can derive neurobiological implications from the form of its generative model.
In **Chapter 6**, we provided a recipe for designing active inference models. We saw that while all organisms minimize their variational free energy, they behave differently, sometimes even in contradictory ways, because they are endowed with different generative models. What distinguishes different (e.g., simpler and more complex) organisms, then, is just their generative models. There is a rich variety of possible generative models, which correspond to different biological (e.g., neuronal) implementations and produce different adaptive or maladaptive behaviors in different environments and niches. This makes active inference equally apt for characterizing simple organisms, such as bacteria sensing and climbing nutrient gradients, and complex organisms like us, who pursue sophisticated goals and engage in rich cultural practices, and even different individuals, to the extent that one appropriately characterizes their respective generative models. Evolution appears to have discovered increasingly complex brain and body designs that enable organisms to deal with (and shape) rich niches. Modelers can reverse-engineer this process and, based on the kind of niche the organism of interest occupies, specify the design of its brain and body in the form of a generative model. This corresponds to a series of design choices (e.g., models with discrete or continuous variables, shallow or hierarchical models), which we unpacked in this chapter.
In **Chapters 7 and 8**, we provided numerous examples of active inference models in discrete and continuous time, addressing problems such as perceptual inference, goal-directed navigation, model learning, and action control. These examples illustrate the variety of behaviors that emerge under such models and detail how the models are specified in practice.
In **Chapter 9**, we discussed how active inference can be used for model-based data analysis and to recover parameters of individual generative models, thereby better explaining the behavior of subjects in a task. This computational phenotyping analysis uses the same form of Bayesian inference discussed in the rest of the book but in a different way: it helps design and evaluate (objective) models of others' (subjective) models.
10.3 Connecting the Dots: A Unified Perspective of Active Inference
Decades ago, the philosopher Daniel Dennett lamented that cognitive scientists devoted too much effort to modeling isolated subsystems (e.g., perception, language comprehension) whose boundaries are often arbitrary. He suggested instead trying to model "the whole iguana": a complete cognitive organism (perhaps a simple one) together with the environmental niche it has to cope with (Dennett 1978).
One benefit of active inference is that it provides first principles for how organisms solve their adaptive problems. The normative approach pursued in this book assumes that one can start from the principle of variational free energy minimization and derive implications for specific cognitive processes, such as perception, action selection, attention, and emotion regulation, and for their neural underpinnings.
Imagine a simple organism that must solve problems such as finding food or shelter. Formulated as active inference, the organism's problem can be described in positive terms: acting to elicit preferred sensations (e.g., food-related sensations). To some extent, these preferred sensations are encapsulated in its generative model (as prior beliefs), and the organism is effectively gathering evidence for its model, or more allegorically, evidence for its existence (i.e., maximizing model evidence, or self-evidencing). This simple principle has implications for psychological functions traditionally considered in isolation, such as perception, action control, memory, attention, intention, emotion, and many more. For example, both perception and action are self-evidencing, in the sense that an organism can reconcile its expectations (given its generative model) with what it perceives either by changing its beliefs (about the presence of food) or by changing the world (by seeking out food-related sensations). Memory and attention can likewise be considered optimizations of the same objective. Long-term memory develops by learning the parameters of the generative model. Working memory is belief updating when beliefs concern past and future external states. Attention is the optimization of beliefs about the precision of sensory input. Planning (and intention) can be conceptualized by appealing to the ability of (some) organisms to select among alternative futures, which in turn requires generative models with temporal depth. These models predict the outcomes that a sequence of actions will produce and are optimistic about those outcomes, in the sense that the organism believes a priori that its actions will bring about preferred outcomes. Deep temporal models can also help us understand sophisticated forms of prospection (where current beliefs are used to derive beliefs about the future) and retrospection (where current beliefs are used to update beliefs about the past). Interoceptive regulation and emotion can be conceptualized by appealing to a generative model of internal physiology that predicts the allostatic consequences of future events.
As these examples illustrate, studying cognition and behavior from the perspective of a normative theory of adaptive behavior has an important consequence. Such a theory does not start by assembling separate cognitive functions, such as perception, decision-making, and planning. Rather, it first provides a complete solution to the problems an organism must solve and then analyzes that solution to derive implications for cognitive functions. For example, what mechanisms allow an organism or an artificial agent (e.g., a robot) to perceive the world, remember it, or plan (Verschure et al. 2003, 2014; Verschure 2012; Pezzulo, Barsalou et al. 2013; Krakauer et al. 2017)? This matters because the taxonomy of cognitive functions used in psychology and neuroscience textbooks is largely inherited from earlier philosophical and psychological theories (sometimes called Jamesian categories). Despite their immense heuristic value, these categories may be rather arbitrary and may not correspond to distinct cognitive and neural processes (Pezzulo and Cisek 2016; Buzsaki 2019; Cisek 2019). Indeed, these Jamesian categories may be candidates for how our generative models explain our engagement with the sensorium, as opposed to descriptions of that engagement. For example, the solipsistic assumption that "I am perceiving" is just my explanation of the current state of affairs, including my belief updates.
Adopting a normative perspective may also help identify formal analogies between cognitive phenomena studied in different domains.
One example is the exploration-exploitation trade-off, which arises in various guises (Hills et al. 2015). This trade-off is often studied in foraging, when an organism must choose between exploiting previously successful plans and exploring new (potentially better) ones. The same trade-off also arises in memory search and deliberation, when an organism must choose between exploiting the currently best plan and investing more time and cognitive effort to explore other possibilities under limited resources (e.g., time or search effort). Characterizing these seemingly unrelated phenomena in terms of free energy may reveal deep similarities (Friston, Rigoli et al. 2015; Pezzulo, Cartoni et al. 2016; Gottwald and Braun 2020). Finally, beyond a unified perspective on psychological phenomena, active inference also provides a principled approach to understanding the corresponding neural computations. In other words, it provides a process theory that links cognitive processing to (expected) neuronal dynamics. Active inference hypothesizes that all facets of brain, mind, and behavior can be described in terms of variational free energy minimization; in turn, this minimization has specific neural signatures that can be validated empirically (e.g., in terms of message passing or brain anatomy). In the remainder of this chapter, we explore some implications of active inference for mental functions, as if we were sketching the outline of a psychology textbook. For each function, we also highlight points of contact (or divergence) between active inference and other popular theories in the literature.
10.4 Predictive Brain, Predictive Mind, and Predictive Processing
I have this picture of pure joy
it’s of a child with a gun
he’s aiming straight in front of himself,
shooting at something that isn’t there.
—Afterhours, “Quello che non c’è” (Something that isn’t there)
Traditional theories of brain and cognition emphasize a feedforward transformation from external stimuli to internal representations and then to motor actions. This is sometimes called the "sandwich model," because everything between stimulus and response is labeled "cognition" (Hurley 2008). From this perspective, the primary function of the brain is to convert incoming stimuli into contextually appropriate responses. Active inference departs significantly from this view, emphasizing the predictive and goal-directed aspects of brain and cognition. In psychological terms, active inference organisms (or their brains) are probabilistic inference machines that continuously generate predictions from their generative models. Self-evidencing organisms use their predictions in two basic ways. First, they compare predictions with incoming data to test their hypotheses (predictive coding) and, on slower timescales, to revise their models (learning). Second, they form predictions that guide how they gather data (active inference). In doing so, active inference organisms satisfy two imperatives: epistemic (e.g., directing visual exploration to salient locations where information can resolve uncertainty about hypotheses or models) and pragmatic (e.g., moving to locations where preferred observations, such as rewards, can be secured). Epistemic imperatives drive perceptual and learning processes, while pragmatic imperatives render behavior goal-directed.
10.4.1 Predictive Processing
This predictive, goal-centered view of brain and cognition is closely related to (and has provided inspiration for) predictive processing (PP): an emerging framework in philosophy of mind and epistemology that places prediction at the core of brain and cognition and appeals to notions such as the "predictive brain" or "predictive mind" (Clark 2013, 2015; Hohwy 2013). PP theories sometimes appeal to specific constructs of active inference, such as generative models, predictive coding, free energy, precision control, and Markov blankets, but sometimes to other constructs, such as paired forward and inverse models, that are not part of active inference. The term "predictive processing" therefore has a broader (and less constrained) meaning than active inference.
Predictive processing theories have attracted widespread attention in philosophy because of their unifying potential in several senses: across multiple cognitive domains, including perception, action, learning, and psychopathology; from lower-level (e.g., sensorimotor) to higher-level cognitive processing (e.g., mental constructs); and from simple biological organisms to brains, individuals, and social and cultural structures. Another appeal of PP theories is that they use conceptual terms such as belief and surprise, which engage levels of psychological analysis familiar to philosophers (with the caveat that these terms sometimes have technical meanings that differ from common usage). However, as interest in PP has grown, it has become increasingly apparent that philosophers hold different views about its theoretical and epistemological implications. For example, it has been interpreted in internalist (Hohwy 2013), embodied or action-based (Clark 2015), and enactivist, non-representational (Bruineberg et al. 2016; Ramstead et al. 2019) terms. The debates surrounding these conceptual interpretations are beyond the scope of this book.
10.5 Perception
You can’t depend on your eyes when your imagination is out of focus.
—Mark Twain
Active inference views perception as an inferential process based on a generative model of how sensory observations are produced. Bayes' rule effectively inverts the model to compute beliefs about the hidden states of the environment from observations. This idea of perception as inference dates back to Helmholtz (1866) and has often been revived in psychology, computational neuroscience, and machine learning (e.g., as analysis-by-synthesis) when facing challenging perceptual problems, such as breaking text-based CAPTCHAs (George et al. 2017).
10.5.1 Bayesian Brain Hypothesis
The most prominent contemporary expression of this idea is the Bayesian brain hypothesis, which has been applied in multiple domains, such as decision-making, sensory processing, and learning (Doya 2007). Active inference provides a normative basis for these inferential ideas by deriving them from the imperative of variational free energy minimization. Because the same imperative extends to action, active inference naturally models active perception and the ways organisms actively sample observations to test their hypotheses (Gregory 1980). By contrast, under the Bayesian brain agenda, perception and action are modeled under different imperatives (with action requiring Bayesian decision theory; see section 10.7.1). More broadly, the Bayesian brain hypothesis refers to a range of approaches that are not necessarily integrated and that often make different empirical predictions. These include computational-level proposals that the brain performs Bayes-optimal sensorimotor and multisensory integration (Kording and Wolpert 2006); algorithmic-level proposals that the brain implements specific approximations to Bayesian inference, such as decision-by-sampling (Stewart et al. 2006); and neural-level proposals about the specific ways neural populations perform probabilistic computations or encode probability distributions, for example as samples or probabilistic population codes (Fiser et al. 2010; Pouget et al. 2013). At each level of explanation, there are competing theories. For example, deviations from optimal behavior are often explained by appealing to approximations to exact Bayesian inference, but different studies consider different (and not always compatible) approximations, such as different sampling methods. More broadly, the relations between proposals at different levels are not always direct, because Bayesian computations can be implemented (or approximated) in multiple algorithmic ways, even without explicitly representing probability distributions (Aitchison and Lengyel 2017).
Active inference offers a more integrated perspective that links normative principles and process theories. At the normative level, its core assumption is that all processes minimize variational free energy. The corresponding process theory casts inference as a gradient descent on free energy, which has clear neurophysiological implications, as explored in chapter 5 (Friston, FitzGerald et al. 2016). More broadly, one can derive implications for brain architecture from the principle of free energy minimization. For example, the normative process model of perceptual inference (in continuous time) is predictive coding. Predictive coding was originally proposed by Rao and Ballard (1999) as a theory of hierarchical perceptual processing to explain a range of documented top-down effects that are difficult to reconcile with feedforward architectures, along with known physiological facts (e.g., the existence of both forward or bottom-up and backward or top-down connections in sensory hierarchies). However, under certain assumptions, such as the Laplace approximation (Friston 2005), predictive coding can be derived from the principle of free energy minimization. Furthermore, active inference in continuous time can be constructed as an extension of predictive coding into the domain of action, by endowing predictive coding agents with motor reflexes (Shipp et al. 2013). This brings us to the next point.
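To illustrate, here is a minimal single-level predictive coding scheme as a gradient descent on free energy (a sketch under the Laplace approximation with a linear generative mapping; all numerical values are toy assumptions):

```python
import numpy as np

# Single-level predictive coding as gradient descent on free energy
# (Laplace approximation, linear generative mapping o = g*mu + noise;
# all numerical values are toy assumptions).
g = 2.0                  # likelihood mapping from hidden state to data
mu_p, pi_p = 0.5, 1.0    # prior mean and precision of the hidden state
pi_o = 4.0               # sensory precision
o = 3.0                  # observation
mu = mu_p                # posterior mean, initialized at the prior
lr = 0.05                # integration step
for _ in range(200):
    eps_o = o - g * mu   # sensory prediction error
    eps_p = mu - mu_p    # prior prediction error
    dF = -g * pi_o * eps_o + pi_p * eps_p   # gradient of free energy
    mu -= lr * dF        # descend the free energy gradient
print(mu)  # converges to the precision-weighted posterior mean (~1.44)
```

Note how the fixed point balances the two precision-weighted prediction errors: raising the sensory precision pulls the posterior toward the data, raising the prior precision pulls it toward the prior.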
10.6 Action Control
If you can’t fly then run, if you can’t run then walk, if you can’t walk then crawl, but whatever you do you have to keep moving forward.
—Martin Luther King
In active inference, action processing is analogous to perceptual processing, as both are guided by predictions (exteroceptive and proprioceptive, respectively). It is the (proprioceptive) prediction "my hand is grasping the cup" that triggers the grasping action. The equivalence between action and perception also holds at the neurobiological level: the motor cortex is organized like the sensory cortex, as a predictive coding architecture, with the differences that it can drive motor reflexes in the brainstem and spinal cord (Shipp et al. 2013) and that it receives relatively little ascending input. Motor reflexes allow movement trajectories to be controlled by setting "equilibrium points" along the desired trajectory, in line with the equilibrium point hypothesis (Feldman 2009). Importantly, initiating an action (e.g., grasping a cup) requires appropriately tuning the precisions (inverse variances) of prior beliefs and sensory input, because their relative values determine how the organism resolves the conflict between its prior belief (it is holding the cup) and its sensory input (indicating that it is not holding the cup). Given contradictory sensory evidence, an imprecise prior belief about grasping the cup is easily revised, leading to a change of mind without any action being taken. Conversely, when the prior belief dominates (i.e., has higher precision), it is maintained in the face of conflicting sensory evidence, and the grasping action that resolves the conflict is triggered. To this end, action initiation entails a transient sensory attenuation (a reduced weighting of sensory prediction errors). Failures of sensory attenuation can have maladaptive consequences, such as an inability to initiate or control movements (Brown et al. 2013).
10.6.1 Ideomotor Theory
In active inference, actions stem from (proprioceptive) predictions rather than motor commands (Adams, Shipp, and Friston 2013). This idea links active inference to the ideomotor theory of action: a framework for understanding action control that dates back to William James (1890) and, later, to theories of "event coding" and "anticipatory action control" (Hommel et al. 2001; Hoffmann 2003). Ideomotor theory holds that action-effect associations (akin to forward models) are key mechanisms of the cognitive architecture. Importantly, these links can be used in both directions. Used in the action-to-effect direction, they generate sensory predictions; used in the effect-to-action direction, they select the actions that achieve desired perceptual outcomes. This means that actions are selected and controlled on the basis of their predicted outcomes (hence "ideo + motor"). This anticipatory view of action control is supported by a large body of literature documenting the influence of (expected) action consequences on action selection and execution (Kunde et al. 2004). Active inference provides a mathematical characterization of this idea that also incorporates other mechanisms, such as precision control and sensory attenuation, that have not yet been fully explored in ideomotor theory (but are compatible with it).
10.6.2 Control Theory
Active inference is closely related to cybernetic ideas concerning the purposive, goal-directed nature of behavior and the importance of (feedback-based) agent-environment interactions, as exemplified by TOTE (Test, Operate, Test, Exit) and related models (Miller et al. 1960; Pezzulo, Baldassarre et al. 2006). In both TOTE and active inference, action selection is driven by the difference between preferred (goal) states and current states. These approaches differ from the simple stimulus-response mappings often assumed in behaviorist theories and in computational frameworks such as reinforcement learning (Sutton and Barto 1998).
The concept of action control in active inference is especially analogous to perceptual control theory (Powers 1973). A central tenet of perceptual control theory is that what is controlled are perceptual states, not motor outputs or actions. When driving, what we control, and keep stable in the face of disturbances, is our reference or desired speed (e.g., 90 mph) as indicated by the speedometer, while the actions we select to this end (e.g., accelerating or decelerating) are more variable and context-dependent. Depending on disturbances (e.g., wind, steep roads, or other cars), we accelerate or decelerate to maintain the reference speed. This view realizes William James's (1890) observation that humans achieve stable goals through flexible means.
While in both active inference and perceptual control theory it is perception (specifically proprioception) that controls action, the two theories differ in how control operates. In active inference (unlike perceptual control theory), action control has an anticipatory or feedforward aspect grounded in a generative model. Perceptual control theory instead assumes that feedback mechanisms largely suffice to control behavior, and it deems attempts to predict disturbances or to impose feedforward (open-loop) control futile. However, this objection mainly concerns the limitations of solving inverse control problems with forward models (see the next section). Under active inference, generative (forward) models are not used to predict disturbances but to predict the future (desired) states and trajectories to be achieved through action, and to infer the causes of perceptual events.
Finally, another important point of contact between active inference and perceptual control theory is how they conceptualize control hierarchies. Perceptual control theory proposes that higher levels control lower levels by setting their reference points or set-points (i.e., the goals they must achieve), leaving them free to select the means to achieve those goals, rather than by setting or biasing the actions (i.e., the how) that lower levels must perform. This contrasts sharply with most hierarchical, top-down theories of control, in which higher levels either directly select plans (Botvinick 2008) or bias the selection of actions or motor commands at lower levels (Miller and Cohen 2001). As in perceptual control theory, hierarchical control in active inference can be decomposed into a (top-down) cascade of goals and subgoals that are achieved autonomously at the appropriate (lower) levels. Furthermore, in active inference, the contributions of goals represented at different levels of the control hierarchy can be modulated by motivational processes (precision weighting), thereby prioritizing the most salient or urgent goals (Pezzulo, Rigoli, and Friston 2015, 2018).
10.6.3 Optimal Control Theory
The way active inference explains action control differs significantly from other control models in neuroscience, such as optimal control theory (Todorov 2004; Shadmehr et al. 2010). Optimal control assumes that the brain selects actions using control policies that map stimuli (or estimated states) to responses, whereas active inference assumes that the motor cortex conveys predictions, not commands. Furthermore, while both optimal control theory and active inference appeal to internal models, they characterize them differently (Friston 2011). In optimal control, two internal models are distinguished: inverse models encode stimulus-response contingencies and select motor commands (according to some cost function), while forward models encode action-outcome contingencies and provide the inverse model with simulated outcomes, as a substitute for noisy or delayed feedback, thereby going beyond purely feedback-based control schemes. Inverse and forward models can also operate in loops decoupled from overt action and perception (i.e., when inputs and outputs are suppressed) to support internal "what if" simulations of action sequences. This internal simulation of action has been related to various cognitive functions, such as planning, action perception, and imitation in the social domain (Jeannerod 2001; Wolpert et al. 2003), as well as to various motor disorders and psychopathologies (Frith et al. 2000).
In contrast to such forward-inverse modeling schemes, in active inference, the forward (generative) model does the heavy lifting of action control, while the inverse model is very simple, often reducing to reflexes implemented at the periphery (i.e., in the brainstem or spinal cord). Action begins when there is a discrepancy between an expected and an observed state (e.g., the desired versus current arm position), that is, a sensory prediction error. This means that motor commands are replaced by predictions generated by the forward model, rather than computed by an inverse model as in optimal control. Sensory (more precisely, proprioceptive) prediction errors are then resolved by acting (i.e., moving the arm). The gap that action has to fill is assumed to be small enough that it requires no sophisticated inverse model, only much simpler motor reflexes (Adams, Shipp, and Friston 2013). Motor reflexes are simpler than inverse models because they do not encode a mapping from inferred states of the world to actions but a simpler mapping between actions and their sensory consequences; see Friston, Daunizeau et al. (2010) for further discussion. Another key difference between optimal motor control and active inference is that the former uses the notion of a cost or value function to motivate action, whereas the latter replaces it with the notion of Bayesian priors (or prior preferences, implicit in expected free energy), as we discuss in the next section.
10.7 Utility and Decision
Action expresses priorities.
—Mahatma Gandhi
The notion of a cost or value function over states is central to many fields, such as optimal motor control, economic theories of utility maximization, and reinforcement learning. For example, in optimal control theory, the optimal control policy for a task is often defined as the policy that minimizes a specific cost function (e.g., producing smooth movements with minimal jerk). In reinforcement learning problems, such as navigating a maze containing one or more rewards, the optimal policy is the one that maximizes (discounted) reward while minimizing movement costs. These problems are often solved using the Bellman equation (or the Hamilton-Jacobi-Bellman equation in continuous time). The general idea is that a decision problem can be decomposed into two parts: the immediate reward and the value of the remaining decision problem. This decomposition underwrites the iterative procedures of dynamic programming, which are central to control theory and reinforcement learning (RL) (Bellman 1954).
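For reference, the Bellman optimality equation expresses this recursive decomposition (standard notation: R for reward, P for transition probabilities, \gamma for the discount factor):

```latex
V^{*}(s) = \max_{a}\Big[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \,\Big]
```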
Active inference departs from these approaches in two main ways. **First**, active inference does not only consider utility maximization but the broader objective of minimizing (expected) free energy, which also includes epistemic imperatives, such as disambiguating the current state and seeking novelty (see figure 2.5). Such additional objectives are sometimes bolted onto classical reward functions, for example as "novelty bonuses" (Kakade and Dayan 2002) or "intrinsic rewards" (Schmidhuber 1991; Oudeyer et al. 2007; Baldassarre and Mirolli 2013; Gottlieb et al. 2013), but they emerge automatically in active inference, allowing it to address the exploration-exploitation dilemma. The reason is that free energy is a function of beliefs; this places us in the realm of belief optimization rather than external reward functions, which is crucial for epistemic problems whose success depends on resolving as much uncertainty as possible.
**Second**, in active inference, the notion of cost is absorbed into priors. Priors (or prior preferences) specify the objectives of control, such as a trajectory to follow or an endpoint to reach. Using priors to encode preferred observations (or sequences thereof) can be more expressive than using utilities (Friston, Daunizeau, and Kiebel 2009). On this approach, finding an optimal policy is recast as an inference problem: inferring the sequence of control states that realizes preferred trajectories. This requires no value functions or Bellman equations, although one can appeal to analogous recursive schemes (Friston, Da Costa et al. 2020). There are at least two important differences between the way priors are typically used in active inference and the way value functions are used in reinforcement learning. **First**, reinforcement learning methods use value functions over states or state-action pairs, whereas active inference uses priors over observations. **Second**, value functions are defined in terms of the expected return of following a particular policy, that is, the sum of future (discounted) rewards obtained by starting in a state and executing the policy thereafter. By contrast, priors in active inference do not usually sum future rewards, nor do they discount them. Instead, the closest analogue of an expected return in active inference is the expected free energy. Even then, expected free energy differs in that it is a functional of beliefs about states, not a function of states. That said, it is possible to construct priors that resemble RL value functions over states, for example by caching expected free energy computations at those states (Friston, FitzGerald et al. 2016; Maisto, Friston, and Pezzulo 2019).
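In symbols, the expected free energy of a policy (in the risk plus ambiguity decomposition introduced in chapter 2) and the prior over policies it induces can be written:

```latex
G(\pi) = \sum_{\tau} \Big( \underbrace{D_{\mathrm{KL}}\left[q(o_\tau \mid \pi)\,\|\,p(o_\tau)\right]}_{\text{risk}} + \underbrace{\mathbb{E}_{q(s_\tau \mid \pi)}\left[H\left[p(o_\tau \mid s_\tau)\right]\right]}_{\text{ambiguity}} \Big),
\qquad q(\pi) = \sigma\left(-G(\pi)\right)
```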
Furthermore, absorbing the notion of utility into priors has an important theoretical consequence: priors act as goals and bias the generative model, or render it optimistic, in the sense that the organism believes it will encounter preferred outcomes. It is this optimism that, in active inference, allows inferred plans to realize expected outcomes; failures of this optimism may correspond to apathy (Hezemans et al. 2020). This contrasts sharply with other formal approaches to decision-making, such as Bayesian decision theory, which separate the probabilities of events from their utilities. That said, the distinction is somewhat superficial, as utility functions can always be rewritten as prior beliefs, consistent with the fact that behavior that maximizes a utility function is a priori (by design) more probable. From a deflationary perspective, this simply is the definition of utility.
10.7.1 Bayesian Decision Theory
Bayesian decision theory is a mathematical framework that extends the ideas of the Bayesian brain (described above) to the domains of decision-making, sensorimotor control, and learning (Kording and Wolpert 2006; Shadmehr et al. 2010; Wolpert and Landy 2012). Bayesian decision theory describes decision-making in terms of two distinct processes. The first uses Bayesian computations to predict the probabilities of future (action- or policy-dependent) outcomes; the second uses a (fixed or learned) utility or cost function to assign preferences to those outcomes. A final decision (or action selection) process integrates the two streams, choosing (with higher probability) the plans more likely to yield higher rewards. This contrasts with active inference, where prior distributions directly specify what is valuable for the organism (or what has been valuable over its evolutionary history). Nevertheless, the two streams of Bayesian decision theory can be placed in analogy with the optimization of variational free energy and expected free energy, respectively. Under active inference, minimizing variational free energy furnishes accurate (and simple) beliefs about the state of the world and its likely evolution, while preferences are encapsulated in the prior belief that policy selection minimizes expected free energy.
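Schematically, the two streams combine as in classical expected utility maximization:

```latex
a^{*} = \arg\max_{a}\; \mathbb{E}_{p(o \mid a)}\left[U(o)\right] = \arg\max_{a} \sum_{o} p(o \mid a)\, U(o)
```

Here the probability model p(o | a) and the utility function U(o) are specified separately, in contrast to the single prior preference distribution of active inference.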
In some circles, there are concerns about the status of Bayesian decision theory. These arise from the complete class theorem (Wald 1947; Brown 1981), which states that for any given pair of decisions and cost functions, there exist some prior beliefs that render the Bayesian decision optimal. This implies an implicit duality, or degeneracy, in treating prior beliefs and cost functions separately. In a sense, active inference resolves this degeneracy by absorbing the utility or cost function into prior beliefs, in the form of preferences.
10.7.2 Reinforcement Learning
Reinforcement learning (RL) is a popular approach to solving Markov decision problems in artificial intelligence and cognitive science (Sutton and Barto 1998). It focuses on how an agent learns a policy by trial and error: by trying actions (e.g., moving left) and receiving positive or negative reinforcement depending on whether the action succeeds (e.g., a pole stays balanced) or fails (e.g., the pole falls).
Active inference and reinforcement learning address a range of overlapping problems but differ mathematically and conceptually in many ways. As discussed above, active inference dispenses with the notions of reward, value functions, and Bellman optimality, which are central to reinforcement learning. Furthermore, the notion of policy is used differently in the two frameworks. In reinforcement learning, a policy is a set of stimulus-response mappings that must be learned; in active inference, a policy is part of the generative model: a sequence of control states that must be inferred.
There are many reinforcement learning methods, but they can be broadly grouped into three classes. The first two attempt to learn good (state or state-action) value functions, albeit in different ways.
Model-free RL methods learn value functions directly from experience: they perform actions, collect rewards, update their value functions, and use them to update their policies. They are called model-free because they do not use (transition) models that permit the prediction of future states, of the kind used in active inference; instead, they implicitly appeal to simpler models (e.g., state-action mappings). Learning value functions in model-free reinforcement learning typically involves computing reward prediction errors, as in the popular temporal-difference learning rules. While active inference also appeals to prediction errors, these are state prediction errors (there being no notion of reward in active inference).
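For concreteness, here is a minimal sketch of temporal-difference value learning on a toy three-state chain (the environment and all parameter values are illustrative assumptions):

```python
import numpy as np

# TD(0) on a toy 3-state chain: the agent walks from state 0 to the
# terminal state 2 and receives a reward of 1 on arrival. The update is
# driven by a *reward* prediction error, in contrast to the state
# prediction errors of active inference.
n_states, alpha, gamma = 3, 0.1, 0.9
V = np.zeros(n_states)                      # state value function
for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                      # deterministic toy transition
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # reward prediction error
        V[s] += alpha * delta               # TD(0) update
        s = s_next
print(V)  # approaches [gamma, 1.0, 0.0]
```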
Model-based RL methods do not learn value functions or policies directly from experience. Instead, they learn a model of the task from experience and use that model to plan (i.e., to simulate possible experiences), updating value functions and policies on the basis of these simulated experiences. While both active inference and model-based reinforcement learning accommodate model-based planning, they use it differently. In active inference, planning is a means of evaluating the expected free energy of each policy, not of updating a value function. Arguably, if expected free energy is viewed as a kind of value function, then the inferences drawn using the generative model can be said to update that function, providing a point of analogy between the two methods.
The third family comprises policy gradient methods, which optimize policies directly, without the intermediary of the value functions central to model-based and model-free RL. These methods start with a parameterized policy capable of generating (say) motor trajectories and optimize it by adjusting its parameters to increase (decrease) the likelihood of the policy when its trajectories lead to high (low) reward. Their dispensing with value functions connects policy gradient methods to active inference, which does the same (Millidge 2019). However, the overall objective of policy gradients (maximizing long-run cumulative reward) differs from that of active inference.
Beyond these formal differences between active inference and reinforcement learning, there are important conceptual differences. One concerns how the two approaches explain goal-directed and habitual behavior. In the animal learning literature, goal-directed choice is mediated by (prospective) knowledge of the contingency between an action and its outcome (Dickinson and Balleine 1990), whereas habitual choice is not prospective but relies on simpler (e.g., stimulus-response) mechanisms. A popular view in reinforcement learning is that goal-directed and habitual choice correspond to model-based and model-free RL, respectively, and that the two are acquired in parallel and continuously compete for the control of behavior (Daw et al. 2005).
Active inference, by contrast, maps goal-directed and habitual choice onto different mechanisms. In active inference (in discrete time), policy selection is inherently model-based and thus conforms to the definition of goal-directed, deliberative choice. This resembles model-based reinforcement learning, with one difference: in model-based RL, actions are selected prospectively (using a model) but controlled reactively (using stimulus-response policies), whereas in active inference, actions are also controlled prospectively, by fulfilling proprioceptive predictions (on action control, see section 10.6).
In active inference, habits can be acquired by executing goal-directed policies and then caching information about which policies succeeded in which contexts. The cached information can be incorporated as a prior over policies (Friston, FitzGerald et al. 2016; Maisto, Friston, and Pezzulo 2019). This mechanism allows policies with high prior probability (in a given context) to be executed without deliberation. It can be thought of simply as observing "what I did" through repeated engagement with a task and learning "I am the kind of creature that tends to do this." Unlike in model-free reinforcement learning, where habits are acquired independently of goal-directed policy selection, in active inference habits are acquired by repeatedly pursuing goal-directed policies (e.g., by caching their outcomes), as the sketch below illustrates.
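A minimal sketch of this caching idea (the three-policy setup and all numbers are toy assumptions; combining a log habit prior with expected free energy follows the form of the policy prior discussed above):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Habits as a learned prior over policies: Dirichlet-style habit counts e
# are combined with the deliberative expected free energy G of each policy.
G = np.array([1.0, 0.5, 2.0])    # expected free energy per policy (toy)
e = np.ones(3)                   # habit counts, initially flat
rng = np.random.default_rng(0)
for trial in range(100):
    q_pi = softmax(np.log(e) - G)   # habit prior combined with EFE
    pi = rng.choice(3, p=q_pi)      # select a policy
    e[pi] += 1                      # cache "what I did" as a habit
print(softmax(np.log(e) - G))       # the habitual policy now dominates
```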
In active inference, goal-directed and habitual mechanisms can therefore cooperate rather than merely compete, because prior beliefs about policies depend on both a habitual term (the prior probability of each policy) and a deliberative term (the expected free energy). Hierarchical elaborations of active inference further suggest that reactive and goal-directed mechanisms may be organized hierarchically rather than as parallel pathways (Pezzulo, Rigoli, and Friston 2015).
Finally, it is worth noting that active inference and reinforcement learning take subtly different views of behavior and its causes. Reinforcement learning, which derives from behaviorist theories, views behavior as the result of trial-and-error learning mediated by reinforcement. Active inference instead regards behavior as the result of inference. This brings us to the next point.
10.7.3 Planning as Inference
Just as perceptual problems can be cast as inference problems, so can control problems be cast as (approximate) Bayesian inference (Todorov 2008). Consistent with this, in active inference, planning is treated as an inferential process: inferring the sequence of control states of a generative model. This idea is closely related to other approaches, including control as inference (Rawlik et al. 2013; Levine 2018), planning as inference (Attias 2003; Botvinick and Toussaint 2012), and risk-sensitive and KL control (Kappen et al. 2012). Planning proceeds by inferring a posterior distribution over actions or action sequences, using a dynamic generative model that encodes probabilistic contingencies between states, actions, and future (expected) states. Optimal actions or plans can be inferred by conditioning the generative model on the receipt of future reward (Pezzulo and Rigoli 2011; Solway and Botvinick 2012) or on optimal future trajectories (Levine 2018). For example, expected future states of the model can be clamped (i.e., fixed to their desired values), and one can then infer the action sequences most likely to bridge the gap between the current state and the expected future states.
Active inference, planning as inference, and related schemes use a form of prospective control that starts from an explicit representation of the future states to be observed, rather than from a set of stimulus-response rules or policies, as is more common in optimal control theory and reinforcement learning. However, specific implementations of control and planning as inference differ along at least three dimensions: the form of inference used (e.g., sampling or variational inference), what is inferred (e.g., posterior distributions over actions or over action sequences), and the objective of inference (e.g., maximizing the marginal likelihood of optimality or the probability of obtaining reward).
Active inference takes a distinctive position on each of these dimensions. **First**, it uses a scalable approximation scheme (variational inference) to address the challenging computational problems that arise in planning as inference. **Second**, it provides model-based planning: posterior inference over control states corresponding to action sequences or policies, rather than individual actions. **Third**, to infer action sequences, active inference appeals to the expected free energy functional, which mathematically subsumes other widely used planning-as-inference objectives (e.g., KL control) and can handle ambiguous situations (Friston, Rigoli et al. 2015).
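A minimal sketch of this policy scoring by expected free energy, in a toy two-state, two-observation problem (all matrices are illustrative assumptions; the risk plus ambiguity decomposition is the one given above in section 10.7):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy planning as inference: score each one-step policy u by its expected
# free energy G(u) = risk + ambiguity, then form a posterior over policies.
A = np.array([[0.9, 0.2],          # p(o|s): observation likelihood
              [0.1, 0.8]])
B = [np.array([[1.0, 1.0],         # B[u]: p(s'|s,u) for each control u
               [0.0, 0.0]]),
     np.array([[0.0, 0.0],
               [1.0, 1.0]])]
C = softmax(np.array([0.0, 3.0]))  # prior preferences over observations
qs = np.array([0.5, 0.5])          # current belief over hidden states
G = np.zeros(2)
for u in range(2):
    qs_next = B[u] @ qs                        # predicted next state
    qo = A @ qs_next                           # predicted observation
    risk = np.sum(qo * (np.log(qo) - np.log(C)))
    H_A = -np.sum(A * np.log(A), axis=0)       # outcome entropy per state
    ambiguity = H_A @ qs_next
    G[u] = risk + ambiguity
print(softmax(-G))  # the policy expected to yield preferred outcomes wins
```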
10.8 Behavior and Bounded Rationality
The wise are instructed by reason, average minds by experience, the stupid by necessity and the brute by instinct.
—Marcus Tullius Cicero
In active inference, behavior automatically combines deliberative, persistent, and habitual components (Parr 2020). Imagine a person walking toward a store near her home. If she can anticipate the consequences of her actions (e.g., turning left or right), she can compose a good plan to reach the store. This deliberative aspect of behavior is furnished by expected free energy, which is minimized when one acts so as to elicit preferred observations (e.g., being at the store). Note that expected free energy also includes a drive to reduce uncertainty, which can manifest during deliberation. For example, if the person is unsure of the best direction, she can move to a suitable vantage point from which the way to the store is easy to find, even if this means taking a longer route. In short, her plan acquires an epistemic affordance.
If the person is less able to deliberate (e.g., because she is distracted), she may continue walking after reaching the store. This persistence of behavior is furnished by variational free energy, which is minimized when one gathers observations consistent with one's current beliefs (including beliefs about one's ongoing behavior). The sensory and proprioceptive observations the person gathers provide evidence for "walking," so the behavior can persist without deliberation.
Finally, when less able to deliberate, the person can also fall back on her usual plan for going home, without needing to think it through. This habitual component of behavior is furnished by the prior probability of the policy. Having pursued this plan many times in the past, she assigns it a high prior probability, and it can become dominant without further deliberation.
Note that the deliberative, persistent, and habitual aspects of behavior coexist and combine in active inference: we can infer that, in a given situation, a habit is the most likely course of action. This differs from "dual system" theories, which assume we are driven by two independent systems, one rational and one intuitive (Kahneman 2017). The mix of deliberative, persistent, and habitual behavior appears to depend on contextual conditions, such as the amount of experience one has and the cognitive resources one can invest in deliberative processes, which may carry high complexity costs.
The impact of cognitive resources on decision-making has been studied extensively under the rubric of bounded rationality (Simon 1990). The core idea is that while an ideally rational agent would always consider the consequences of its actions in full, a bounded rational agent must balance the costs, effort, and timeliness of computation, for example the information-processing costs of deliberating about the optimal plan (Todorov 2009; Gershman et al. 2015).
10.8.1 Free Energy Theory of Bounded Rationality
Bounded rationality can be expressed in terms of the minimization of Helmholtz free energy: a thermodynamic construct closely related to the variational free energy used in active inference (for details, see Gottwald and Braun 2020). The "free energy theory of bounded rationality" formalizes the trade-off between the value of a choice and limited information-processing capability in terms of the two components of free energy: energy and entropy. The former corresponds to the expected value of the choice (an accuracy term), the latter to the cost of deliberation (a complexity term). What is costly in deliberation is reducing the entropy of one's beliefs before choosing, so as to make the choice more precise (Ortega and Braun 2013; Zénon et al. 2019). Intuitively, choices based on more precise posterior beliefs are more accurate (and potentially yield higher utility), but because increasing the precision of beliefs is costly, bounded decision-makers must find a compromise by minimizing free energy. The same trade-off appears in active inference, which thereby realizes a form of bounded rationality. The notion of bounded rationality also resonates with the use of a variational bound on (log) model evidence (or marginal likelihood), a defining feature of active inference. In sum, active inference provides a model of (bounded) rationality and optimality in which the best solution to a given problem arises from a compromise between complementary objectives, accuracy and complexity, which follow from the normative imperative of free energy minimization and are richer than the objectives classically considered in economic theory (e.g., utility maximization).
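In symbols, this trade-off takes the form (following Ortega and Braun, with \beta an inverse temperature that prices information processing and p a prior or default policy):

```latex
q^{*} = \arg\max_{q}\; \mathbb{E}_{q(a)}\left[U(a)\right] - \frac{1}{\beta}\, D_{\mathrm{KL}}\left[q(a)\,\|\,p(a)\right]
\quad\Longrightarrow\quad q^{*}(a) \propto p(a)\, e^{\beta U(a)}
```

As \beta grows (cheap computation), the choice distribution sharpens toward the utility maximum; as \beta shrinks (costly computation), it stays close to the default.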
10.9 Valence, Emotion, and Motivation
Consider your origins: you were not made to live as brutes, but to follow virtue and knowledge.
—Dante Alighieri
Active inference treats (negative) free energy as a measure of adaptiveness and of an organism's ability to achieve its goals. While active inference proposes that organisms act to minimize their free energy, **this does not mean they must explicitly compute it: in general, following the gradient of free energy is sufficient. By analogy, we do not need to know our altitude to find the top of a mountain; we only need to walk uphill**. However, some have suggested that organisms may additionally model how their free energy changes over time. Proponents of this hypothesis argue that it could allow for the representation of phenomena such as **valence, emotion, and motivation**.
According to this view, it has been proposed that **emotional valence, or the positive or negative character of emotions**, can be **regarded as the rate of change of free energy over time (first derivative)** (Joffily and Coricelli 2013).
Specifically, when an organism's free energy increases over time, it may assign a negative valence to the situation; conversely, when its free energy decreases over time, it may assign it a positive valence. **Extending this line of thought to the longer-term dynamics of free energy (and to its second derivatives), more complex emotional states can be described**: for example, relief as a transition from low to high valence, or disappointment as a transition from high to low valence. **Monitoring free energy dynamics (and the emotional states they elicit) might allow for adjusting behavioral strategies or learning rates based on long-term environmental statistics**.
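As a toy illustration (a sketch with an invented free energy trajectory, not Joffily and Coricelli's actual model), valence can be read off the first temporal difference of free energy, and its transitions off the second:

```python
import numpy as np

# Invented free energy trajectory: surprise accumulates, then is resolved.
F = np.array([1.0, 1.4, 2.0, 1.6, 1.0, 0.6])

valence = -np.diff(F)        # first derivative: falling F = positive valence
d2F = np.diff(F, n=2)        # second derivative: transitions between valence states

print(valence)   # [-0.4 -0.6  0.4  0.6  0.4]: negative, then positive valence
print(d2F)       # the sign change around the peak marks a relief-like transition
```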
**The assumption that a second generative model is used to monitor the free energy of the first may seem something of a leap. However, these ideas can also be cast in another way. An interesting formalization starts from considering what causes rapid changes in free energy. Since free energy is a function of beliefs, rapid changes in free energy must be due to rapid belief updates. A key determinant of this speed is precision**, which acts as a time constant in the dynamics of predictive coding. Interestingly, **this connects to higher-order derivatives of free energy, as precision corresponds to the curvature of a free energy landscape (i.e., its second derivative)**. However, this raises a **question: why should we associate precision with valence? The answer comes from noting that precision is inversely related to ambiguity**. **The more precise something is, the less ambiguous its interpretation. Choosing an action plan that minimizes expected free energy also means minimizing ambiguity, and thereby maximizing precision**. Here, we see a **direct association between higher-order derivatives of free energy, its rate of change, and motivated behavior**.
**Expectations that free energy will increase or decrease may also play a motivating role and incentivize behavior**. In active inference, an agent's **expectations about changes in free energy are encoded by the precision of its beliefs about policies, which again highlights the importance of second-order statistics**. For example, **highly precise policy beliefs indicate that one has found a good policy, that is, one that can confidently be expected to minimize free energy**. Interestingly, **the precision of policy beliefs has been related to dopamine signaling** (FitzGerald, Dolan, and Friston 2015). From this perspective, **stimuli that increase the precision of policy beliefs trigger dopamine bursts, which may signal their motivational salience** (Berridge 2007). This view may help **elucidate the neurophysiological mechanisms linking expectations of goal or reward achievement to increased attention** (Anderson et al. 2011) **and motivation** (Berridge and Kringelbach 2011).
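A crude illustration of this point (our own sketch; actual process theories update policy precision with dedicated gradient dynamics rather than the shortcut below) treats confidence in the policy posterior as a stand-in for policy precision and shows how a disambiguating cue boosts it:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def confidence(q):
    """Negative entropy of the policy posterior: high when one policy dominates."""
    return float(np.sum(q * np.log(q)))

# Before a cue: expected free energies are ambiguous across policies.
q_before = softmax(-np.array([1.0, 1.1, 0.9]))
# After a cue: one policy is clearly expected to minimize free energy.
q_after = softmax(-np.array([2.0, 2.2, 0.3]))

print(confidence(q_before), confidence(q_after))   # confidence jumps after the cue
# This jump is a stand-in for the increase in policy precision that has been
# linked to phasic dopamine responses (FitzGerald, Dolan, and Friston 2015).
```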
10.10 Homeostasis, Allostasis, and Interoceptive Processing
There is more wisdom in your body than in your deepest philosophy.
—Friedrich Nietzsche
**Biological generative models are not only about the external world but also, perhaps more importantly, about the internal environment. The generative model of the body's interior (or interoceptive schema) has a dual role: explaining how interoceptive (bodily) sensations are produced, and ensuring the proper regulation of physiological parameters** (Iodice et al. 2019), such as body temperature or blood sugar levels. Cybernetic theories (mentioned in Section 10.6.2) hypothesize that the central goal of an organism is to maintain homeostasis (Cannon 1929), that is, to keep physiological parameters within viable ranges (e.g., body temperature never gets too high), and **homeostasis can only be achieved through successful control of the environment** (Ashby 1952). **This form of homeostatic regulation can be implemented in active inference by specifying the viable ranges of physiological parameters as priors over interoceptive observations. Interestingly, homeostatic regulation can be achieved in multiple, nested ways**. The simplest regulatory loop is the **engagement of autonomic reflexes** when certain parameters are (or are expected to be) out of range (e.g., when body temperature is too high). **This autonomic control can be construed as interoceptive inference: an active inference process running on interoceptive rather than proprioceptive streams**, as in the case of externally directed actions (Seth et al. 2012, Seth and Friston 2016, Allen et al. 2019). For this, the brain can use a generative model to predict interoceptive and physiological streams and trigger autonomic reflexes to correct interoceptive prediction errors (e.g., a surprisingly high body temperature), analogous to the way motor reflexes are engaged to correct proprioceptive prediction errors and guide externally directed actions.
**Active inference goes beyond simple autonomic loops: it can correct the same interoceptive prediction error (high body temperature) in increasingly sophisticated ways** (Pezzulo, Rigoli, and Friston 2015). It **can use predictive, allostatic strategies** (Sterling 2012, Barrett and Simmons 2015, Corcoran et al. 2020), **going beyond homeostasis to proactively control physiology before any interoceptive prediction error is triggered, for example, seeking shade before overheating**. Another predictive strategy involves **mobilizing resources in anticipation of deviations from physiological set-points, for example, increasing cardiac output before a long run in anticipation of increased oxygen demand. This requires dynamically modifying the priors over interoceptive observations**, going beyond homeostasis (Tschantz et al. 2021). Ultimately, the predictive brain can formulate **complex goal-directed strategies, such as making sure to bring cold water to the beach, to meet the same requirement (controlling body temperature) in richer and more effective ways**.
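The difference between reactive and anticipatory regulation can be caricatured in a few lines (a deliberately simplistic sketch; the setpoint, gain, threshold, and the "seek shade" rule are all invented for illustration):

```python
SETPOINT = 37.0   # homeostatic prior over body temperature (degrees C)
GAIN = 0.5        # reflex gain: how strongly autonomic reflexes correct errors

def autonomic_reflex(temp):
    """Homeostasis: react to an interoceptive prediction error (e.g., sweating)."""
    prediction_error = temp - SETPOINT
    return temp - GAIN * prediction_error

def allostatic_policy(temp, expected_sun_exposure):
    """Allostasis: act on a *predicted* error, before it occurs."""
    predicted_temp = temp + 2.0 * expected_sun_exposure   # crude forward model
    return "seek shade" if predicted_temp > SETPOINT + 1.0 else "carry on"

print(autonomic_reflex(38.5))                              # 37.75: pulled toward setpoint
print(allostatic_policy(37.0, expected_sun_exposure=1.0))  # acts before any error arises
```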
Biological and interoceptive regulation may be crucial for **affect and emotional processing** (Barrett 2017). As we interact with the world, **the brain's generative models constantly predict not only what will happen next but also its interoceptive and allostatic consequences. Interoceptive streams triggered during the perception of external objects and events imbue them with an affective dimension, indicating how good or bad they are for the organism's allostasis and survival, thereby making them "meaningful"**. **If this view is correct, then impairments of such interoceptive and allostatic processing may lead to alexithymia and various psychopathological conditions** (Pezzulo 2013; Barrett et al. 2016; Barca et al. 2019; Pezzulo, Maisto et al. 2019).
**Interoceptive inference has an emerging counterpart: affective inference**. In this application of active inference, **emotions are considered part of the generative model: they are simply another construct, or hypothesis, that the brain uses to deploy precision in deep generative models**. From the perspective of belief updating, this means that **anxiety is simply a commitment to the Bayesian belief "I am anxious," which best explains the prevailing sensory and interoceptive cues**. From the perspective of action, the ensuing (interoceptive) predictions augment or attenuate various precisions (i.e., covert action) or enslave autonomic responses (i.e., overt action).
The result might look a lot like arousal, confirming the hypothesis "I am anxious." Typically, affective inference entails domain-general belief updating that absorbs information from both interoceptive and exteroceptive sensory streams; hence the **intimate relationship between emotion, interoception, and attention in health and disease** (Seth and Friston 2016; Smith, Lane et al. 2016).
10.11 Attention, Salience, and Epistemic Dynamics
True ignorance is not the absence of knowledge, but the refusal to acquire it.
—Karl Popper
**Given that we have mentioned precision and expected free energy multiple times in this chapter alone, it would be remiss not to dedicate some space to attention and salience. These concepts recur throughout psychology and have been redefined and reclassified multiple times**. Sometimes, these terms refer to **synaptic gain control mechanisms** (Hillyard et al. 1998), **which prioritize certain sensory modalities or subsets of channels within a modality**. Sometimes they refer to how we orient ourselves through overt or covert actions to **gain more information about the world** (Rizzolatti et al. 1987; Sheliga et al. 1994, 1995).
Although **the multiple meanings of attention speak to the appeal of this research area, there is also value in resolving the resulting ambiguity**. One thing a formal psychology offers is that we need not worry about this ambiguity: we can **operationally define attention as the precision associated with certain sensory input. This maps neatly onto the concept of gain control, because sensations inferred to be more precise have a greater impact on belief updating than sensations inferred to be imprecise. The construct validity of this definition has been demonstrated through psychophysical paradigms, including the famous Posner paradigm** (Feldman and Friston 2010). Specifically, responses to stimuli at locations in visual space afforded a higher precision are faster than responses to stimuli at other locations.
This leaves the term "salience" in need of a similar formal definition. Typically, in active inference, we **associate salience with expected information gain (or epistemic value): a component of expected free energy**. **Intuitively, something is more salient when we expect it to yield more information. Note, however, that this defines the salience of actions or policies**, whereas **attention is an attribute of beliefs about sensory input**. This aligns with salience as a matter of overt or covert orienting. In **Chapter 7, we saw that expected information gain can be further subdivided into salience and novelty: the former is the potential for inference, the latter the potential for learning**. An analogy for the difference between attention and salience (or novelty) is the design and analysis of scientific experiments. **Attention is the process of selecting the highest quality data from what we have already measured and using those data to inform our hypothesis testing**. **Salience is the design of the next experiment to ensure the highest quality data**.
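Both definitions can be made concrete in a few lines (a toy sketch; the numbers are invented, and for brevity we collapse the expectation over outcomes in the information gain into a single predicted posterior per action):

```python
import numpy as np

# Attention as precision: precision-weighted Gaussian belief updating.
mu, pi_prior, y = 0.0, 1.0, 2.0            # prior mean/precision; sensory sample
for pi_sensory in (0.1, 4.0):              # unattended vs. attended input
    posterior_mu = (pi_prior * mu + pi_sensory * y) / (pi_prior + pi_sensory)
    print(f"sensory precision {pi_sensory}: posterior mean {posterior_mu:.2f}")
# Precise (attended) input moves the belief much further: synaptic gain control.

# Salience as expected information gain: which fixation reduces more uncertainty?
def entropy(p):
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

prior = [0.5, 0.5]                          # two hypotheses about a peripheral object
predicted_posterior = {"look left": [0.9, 0.1],     # expected to disambiguate
                       "look right": [0.55, 0.45]}  # barely informative
for action, q in predicted_posterior.items():
    print(f"{action}: information gain {entropy(prior) - entropy(q):.2f} nats")
```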
We have not detailed this issue simply to add another reclassification of attention to the literature, but to **emphasize an important advantage of committing to a formal psychology. Under active inference, it does not matter if others define attention (or any other construct) differently, because we can simply refer to the mathematical construct in question and preclude any confusion**. A final point worth considering is that these **definitions offer a simple explanation for why attention and salience are so often conflated: highly precise data are rarely ambiguous. This means they should be attended to, and actions that acquire such data are highly salient** (Parr and Friston 2019a).
10.12 Rule Learning, Causal Inference, and Fast Generalization
Yesterday I was clever, so I wanted to change the world. Today I am wise, so I am changing myself.
—Rumi
Humans and other animals excel at making complex causal inferences, learning causal relationships between abstract concepts and objects, and generalizing from limited experience, unlike current machines that require vast numbers of examples to achieve similar performance. This disparity suggests that current machine learning methods, primarily based on complex pattern recognition, may not fully capture how humans learn and think (Lake et al. 2017).
The active inference learning paradigm is based on the development of generative models that capture causal relationships between actions, events, and observations. In this book, we considered relatively simple tasks (e.g., the T-maze example in Chapter 7) that require uncomplicated generative models. In contrast, understanding and inferring complex situations requires deep generative models to capture the underlying structure of the environment, such as hidden rules that allow for generalization across many apparently different situations (Tervo et al. 2016; Friston, Lin et al. 2017).
A simple example of hidden rules governing complex social interactions is a traffic intersection. Imagine a naive person observing a busy intersection who has to predict (or explain) when pedestrians or cars will cross. One could accumulate statistics about co-occurring events (e.g., a red car stops, a tall man crosses; an old woman stops, a large truck passes), but most would ultimately be useless. One might eventually discover some recurring statistical patterns, such as pedestrians crossing shortly after all cars have stopped at a certain point in the road. If the task is simply to predict when a pedestrian is about to cross, such a regularity would suffice in a machine learning setting, but it affords no understanding of the situation. Indeed, it could even invite a false conclusion: that the stopping of cars causes the pedestrians to move. Such errors are common in machine learning applications that do not appeal to (causal) models and cannot distinguish whether rain explains wet grass or wet grass explains rain (Pearl and Mackenzie 2018).
On the other hand, inferring the correct hidden (e.g., traffic light) rules provides a deeper understanding of the situation's causal structure (e.g., the traffic light causes cars to stop and pedestrians to walk). Hidden rules not only offer better predictive power but also make inference more parsimonious, as they can abstract away most sensory details (e.g., the color of the car). In turn, this allows for generalization to other situations, such as different intersections or cities, where most sensory details differ significantly (with the caveat that negotiating an intersection in a city like Rome may require more than just looking at the traffic lights). Finally, understanding traffic light rules also enables more efficient learning in new situations, or the development of what is called a "learning set" in psychology or **learning to learn** in machine learning (Harlow 1949). When faced with an intersection where the traffic lights are off, one cannot use the learned rule, but one might expect another, similar hidden rule to be at play, which can help one understand what a traffic police officer is doing.
As this simple example illustrates, **learning generative models of the rich underlying structure of the environment (aka structure learning) can support complex forms of causal inference and generalization. Extending generative models to address these complex situations is an ongoing goal of computational modeling and cognitive science** (Tenenbaum et al. 2006, Kemp and Tenenbaum 2008). Interestingly, there is a tension between current machine learning trends (the general idea being "bigger is better") and the statistical approach of active inference, which **emphasizes the importance of balancing a model's accuracy against its complexity**, tending towards simpler models. Model reduction (and the pruning of unnecessary parameters) is not merely a way to avoid wasted resources; it is also an effective method for learning hidden rules, including during offline periods such as sleep (Friston, Lin et al. 2017), perhaps manifesting in resting-state activity (Pezzulo, Zorzi, and Corbetta 2020).
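A toy example of how the accuracy-complexity trade-off adjudicates between a simple rule and a more flexible model (our own coin-flip illustration, using exact Bayesian model comparison rather than the book's variational machinery):

```python
from math import log, lgamma

# Compare two "structures" for a sequence of n coin flips with k heads:
# "fixed": the simple rule p(heads) = 0.5 (no free parameters);
# "free":  an unknown bias with a flat prior (one extra parameter to maintain).

def log_evidence_fixed(k, n):
    return n * log(0.5)

def log_evidence_free(k, n):
    # Bernoulli likelihood integrated under a uniform prior on the bias:
    # p(data) = k! (n-k)! / (n+1)!  for a specific sequence.
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

for k in (5, 9):
    lf, lv = log_evidence_fixed(k, 10), log_evidence_free(k, 10)
    winner = "fixed (simpler)" if lf > lv else "free (more complex)"
    print(f"{k}/10 heads: fixed={lf:.2f}, free={lv:.2f} -> prefer {winner}")
```

With unremarkable data (5/10 heads), the extra parameter is pure complexity and the simpler structure wins; only strongly biased data (9/10 heads) earn the more complex model its keep.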
10.13 Active Inference and Other Fields: Open Directions
It has to start somewhere, it has to start sometime,
what better place than here? What better time than now?
—Rage Against the Machine, “Guerrilla Radio”
In this book, we **primarily focused on active inference models that address biological problems of survival and adaptation. However, active inference can be applied to many other areas**. In this final section, we briefly discuss two such areas: **social and cultural dynamics** and **machine learning and robotics**.
Addressing the former requires considering **how multiple active inference agents interact and the emergent effects of such interactions**. Addressing the latter requires scaling active inference up to more complex problems, in ways that remain compatible with the basic assumptions of the theory. Both are interesting open research directions.
10.13.1 Social and Cultural Dynamics
Many interesting aspects of our (human) cognition are related to social and cultural dynamics rather than to individualistic perception, decision, and action (Veissière et al. 2020). By definition, social dynamics involve **multiple active inference organisms engaging in physical interactions** (e.g., joint actions, such as playing team sports) or more abstract interactions (e.g., elections or social networks). Simple simulations of multiple, identical active inference organisms have yielded interesting emergent phenomena, such as the self-organization of simple life forms that resist dispersion, the possibility of engaging in morphogenetic processes to acquire and restore bodily form, and mutually coordinated prediction and turn-taking (Friston 2013; Friston and Frith 2015a; Friston, Levin et al. 2015). Other simulation studies have explored how organisms can extend their cognition to material artifacts and shape their cognitive niches (Bruineberg et al. 2018).
These simulations capture only a fraction of our social and cultural dynamics, but they illustrate the potential of active inference to extend from individual science to social science—and how cognition extends beyond our heads (Nave et al. 2020).
10.13.2 Machine Learning and Robotics
The generative modeling and variational inference methods discussed in this book are widely applied in machine learning and robotics. In those fields, the focus is often on how to learn (connectionist) generative models rather than on how to use them for active inference, the focus of this book. This is interesting because machine learning methods may help scale up the complexity of the generative models and problems considered in this book, **with the caveat that they may require very different active inference process theories**.
While it is impossible to review the vast literature on machine learning generative models here, we briefly mention some of the most popular models, from which many variants have been developed. Two early connectionist generative models, the Helmholtz machine and the Boltzmann machine (Ackley et al. 1985, Dayan et al. 1995), provided paradigms for **learning the internal representations of neural networks in an unsupervised manner**. The Helmholtz machine is particularly relevant to the variational approach of active inference because it uses separate recognition and generative networks to infer distributions over hidden variables and to sample from them to produce fictive data. The practical success of these early methods was limited, but the subsequent possibility of stacking multiple (restricted) Boltzmann machines, thereby learning multilayered internal representations, was one of the early successes of unsupervised deep neural networks (Hinton 2007).
**Two more recent examples of connectionist generative models, variational autoencoders (VAEs)** (Kingma and Welling 2014) **and generative adversarial networks (GANs)** (Goodfellow et al. 2014), are widely used in machine learning applications such as recognizing or generating pictures and videos. VAEs embody an elegant application of variational methods to learning generative networks. Their learning objective, the evidence lower bound (ELBO), is mathematically equivalent to (negative) variational free energy. This objective encourages learning accurate descriptions of the data (i.e., maximizing accuracy) while favoring internal representations that do not stray too far from the prior (i.e., minimizing complexity). The latter term acts as a so-called regularizer, aiding generalization and preventing overfitting.
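To make the equivalence concrete, here is a minimal numerical sketch of the negative ELBO for a one-dimensional Gaussian model (our own toy; a real VAE amortizes `mu_q` and `logvar_q` with an encoder network and estimates the accuracy term by sampling):

```python
import numpy as np

def negative_elbo(x, mu_q, logvar_q, decoder_var=1.0):
    """Variational free energy = complexity - accuracy for a toy Gaussian VAE."""
    var_q = np.exp(logvar_q)
    # Complexity: KL[ q(z|x) || N(0,1) ], penalizing posteriors far from the prior.
    kl = 0.5 * (var_q + mu_q**2 - 1.0 - logvar_q)
    # Accuracy: E_q[ log N(x; z, decoder_var) ] with an identity "decoder" x_hat = z
    # (closed form, since z ~ N(mu_q, var_q)).
    accuracy = -0.5 * (np.log(2 * np.pi * decoder_var)
                       + ((x - mu_q) ** 2 + var_q) / decoder_var)
    return kl - accuracy

print(negative_elbo(x=1.0, mu_q=0.9, logvar_q=np.log(0.1)))  # good fit: low F
print(negative_elbo(x=1.0, mu_q=3.0, logvar_q=np.log(0.1)))  # poor fit: higher F
```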
GANs follow a different approach: they couple two networks, a generator and a discriminator, which compete throughout learning. The discriminator learns to distinguish whether example data are real or were produced by the generator. The generator tries to produce fictive data that deceive the discriminator (i.e., that are misclassified as real). The competition between the two networks forces the generator to improve its generative capabilities and produce high-fidelity fictive data, a capability that has been widely used to generate realistic images, among other things.
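The two adversarial objectives can be written as plain functions (a sketch; real GANs parameterize the generator and discriminator as neural networks and alternate gradient steps on these losses):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """The discriminator wants high probability on real data, low on fakes."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The generator wants its fakes (mis)classified as real."""
    return -np.log(d_fake)

# d_real / d_fake are the discriminator's "this is real" probabilities.
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))  # D winning, G losing
print(discriminator_loss(0.6, 0.7), generator_loss(0.7))  # G fooling D
```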
The generative models mentioned above (and others) can be used for control tasks. For example, Ha and Eck (2017) used a (sequence-to-sequence) VAE to learn to predict pencil strokes; by sampling from the VAE's internal representations, the model can construct novel stroke-based drawings. Generative modeling methods have also been used to control robot movements. Some of these methods use active inference (Pio-Lopez et al. 2016, Sankrithi et al. 2020, Syria et al. 2021) or closely related ideas in a connectionist setting (Ahmadi and Tani 2019, Tani and White 2020).
One of the main challenges in this field is that **robot movements are high-dimensional and require (learning) complex generative models. An interesting aspect of active inference and related methods is that the most important thing to learn is the forward mapping from actions to the sensory (e.g., visual and proprioceptive) feedback expected at the next time step**. This forward mapping can be learned in various ways: through autonomous exploration, through demonstration, or even through direct interaction with a human, for example, a teacher (experimenter) guiding the robot's hand along a trajectory to a target, thereby enabling the effective acquisition of goal-directed actions (Yamashita and Tani 2008). The possibility of learning generative models in various ways greatly expands the range of skills robots can ultimately acquire. In turn, the possibility of developing more advanced (neuro)robots using active inference is important not only technically but also theoretically. Indeed, some key aspects of active inference, such as adaptive agent-environment interaction, the integration of cognitive functions, and the importance of embodiment, are naturally addressed in robotic settings.
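A one-dimensional sketch of this idea (our own toy linear system standing in for high-dimensional robot dynamics; the exploration data and least-squares fit are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE = 0.9, 0.5                     # unknown dynamics: s' = A*s + B*a

# 1. Collect interaction data (autonomous exploration or guided demonstration).
s = rng.normal(size=200)
a = rng.normal(size=200)
s_next = A_TRUE * s + B_TRUE * a + 0.01 * rng.normal(size=200)

# 2. Learn the forward mapping (state, action) -> next sensory state.
X = np.stack([s, a], axis=1)
(A_hat, B_hat), *_ = np.linalg.lstsq(X, s_next, rcond=None)

# 3. Goal-directed control: pick the action whose predicted sensory outcome
#    best matches the preferred (prior) observation.
s_now, s_goal = 0.0, 1.0
candidates = np.linspace(-2.0, 2.0, 41)
predicted = A_hat * s_now + B_hat * candidates
best = candidates[np.argmin((predicted - s_goal) ** 2)]
print(f"learned A={A_hat:.2f}, B={B_hat:.2f}; chosen action {best:.2f}")
```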
10.14 Summary
Home is behind, the world ahead,
and there are many paths to tread
through shadows to the edge of night,
until the stars are all alight.
—J. R. R. Tolkien, The Lord of the Rings
We **began this book with a question: is it possible to understand the brain and behavior from first principles?** We then introduced active inference as a candidate theory to address this challenge. We hope the reader is convinced that the answer is yes. In this chapter, we considered the **unified perspective that active inference provides on sentient behavior, and the implications of this theory for familiar psychological constructs (e.g., perception, action selection, and emotion)**. This allowed us to revisit concepts introduced throughout the book and to remind ourselves of interesting questions still open for future research. We hope this book provides a useful complement to related treatments of active inference, from philosophy (Hohwy 2013, Clark 2015) on the one hand to physics (Friston 2019a) on the other.
We have now reached the end of our journey. Our goal was to provide an introduction for those interested in using these methods, whether conceptually or formally. However, it **is important to emphasize that active inference is not something that can be learned purely theoretically. We encourage anyone who enjoyed this book to consider pursuing it in practice. An important rite of passage in theoretical neurobiology is to try writing down generative models, to experience the frustration of simulating maladaptive behavior, and to learn from simulations that violate one's prior beliefs**. Whether or not you choose to practice at this computational level, we hope you will notice active inference in your daily life. This might **manifest as moving your eyes to resolve uncertainty about something in your peripheral vision; choosing to eat at a favorite restaurant to satisfy prior (taste) preferences; or turning down the heat when a shower is too hot, to ensure the temperature conforms to your model of how the world should be**. One way or another, **we believe you will continue to engage in active inference in some form**.