AI Safety and Contemplation: Computational Models for Aligning Mind with AGI

Contemplative Wisdom for Superalignment

https://arxiv.org/pdf/2504.15125


Abstract

As artificial intelligence (AI) advances, traditional alignment strategies may fail when confronting unpredictable self-improvement, hidden sub-goals, and the complexity of intelligent systems. We advocate for building inherent ethics into AI's cognitive architecture and world models, rather than imposing behavioral constraints through external means. Inspired by contemplative wisdom traditions, we demonstrate how four axiomatic principles can foster resilient wise world models in AI systems: first, mindfulness enables self-monitoring and recalibration of emergent sub-goals; second, emptiness prevents dogmatic goal fixation and weakens rigid prior assumptions; third, non-duality dissolves antagonistic self-other boundaries; and fourth, boundless care drives the motivation to universally reduce suffering. Our experiments show that prompting AI to reflect on these principles improves its performance on the AILuminate benchmark (using GPT-4o), and that applying the principles in combination is even more effective. We provide detailed implementation strategies for current state-of-the-art models, including contemplative architectures, constitutional mechanisms, and reinforcement of chain-of-thought. For future systems, the active inference framework may provide the self-organization and dynamic coupling capabilities that embodied agents need to practice these insights. This interdisciplinary approach offers a self-correcting and resilient alternative to existing fragile control schemes.

Keywords: Artificial Intelligence; Neuroscience; Meditation; Buddhism; Alignment; Superalignment; Large Language Models; Neural Networks; Machine Learning; Mindfulness; Compassion; Care; Non-Duality; Contemplative Science; Neurophenomenology

1. Introduction

As artificial intelligence (AI) approaches and potentially surpasses human-level performance on many benchmarks (see Figure 1), we face an existential challenge: ensuring these increasingly autonomous systems remain aligned with our values and ethics, and support human flourishing (Bostrom, 2014; Russell, 2019; Kringelbach et al., 2024). Traditional strategies, such as explainability (Linardatos et al., 2020; Ali et al., 2023), oversight mechanisms (Sterz et al., 2024), and post-hoc control (Soares et al., 2015), were designed for today's comparatively limited systems. These methods may prove ineffective, especially in the face of superintelligence (Leike & Sutskever, 2023; Bostrom, 2014; Amodei, 2016; Russell, 2019), much like a chess novice attempting to contend with a grandmaster (James, 1956).

[Figure 1: AI performance on the GPQA benchmark approaching human expert level]

Note. The “Graduate-Level Google-Proof Q&A Test” (GPQA) consists of multiple-choice questions on which access to the internet provides no advantage. PhD holders answering outside their specialty score 34%, while within their specialty they reach as high as 81% (Rein et al., 2024). The benchmark highlights how advances in reasoning models are accelerating AI capabilities.

In this paper, we propose an entirely different way of thinking about the problem of AI alignment, an approach inspired by Buddhist wisdom traditions. The fundamental idea is that robust alignment strategies should focus on developing a self-reflective adaptability inherent in the system's world model, rather than relying on fragile top-down rules. We demonstrate how four key contemplative principles—Mindfulness, Emptiness, Non-duality, and Boundless Care—can imbue AI systems with resilient alignment capabilities. We also illustrate how these profound insights can be realized in AI systems and suggest that “active inference” AI models may best simulate the self-organization and dynamic coupling capabilities required to truly embody and practice contemplative wisdom.

The problem of AI alignment is notoriously difficult. There is, for example, a twofold challenge in predicting and controlling AI behavior. First, as AI systems rapidly evolve and proliferate, the benchmarks used to measure their safe behavior continuously shift alongside their growing capabilities (Ganguli et al., 2022; Wei et al., 2022), making it extremely difficult to predict the alignment deviations these systems might exhibit. Second, historical trends indicate (ArkInvest, 2024) that we often underestimate the speed of these systems' progress. Consequently, the ways in which AI deviates from human values are constantly changing, while our ability to predict and prevent these deviations is diminishing.

However, we are not entirely without experience in aligning general intelligent systems: namely, humans themselves. While AI is not human, strategies used to counteract human biases are likely applicable to systems trained on human culture and language. After all, research has shown that such machine learning architectures can reproduce human psychological phenomena in morally relevant ways; for example, biases in large language models (LLMs) resemble human biases (Navigli, 2023). Indeed, contemplative wisdom traditions have spent millennia addressing what can be considered the “human version” of the alignment problem, aiming to cultivate a lasting state of “alignment” manifested as inner contentment and social harmony (see Farias et al., 2021, a collection covering the broad range of traditions now grouped under “meditation”). These practices are not only scientifically supported but also increasingly popular and attract growing empirical research interest (Tang et al., 2015; Van Dam et al., 2018; Baminiwatta & Solangaarachchi, 2021). It is therefore reasonable to expect that millennia of human inquiry into “inner” mental alignment might provide valuable insights for the alignment of artificial minds.

In particular, Buddhist-inspired contemplative practices have profoundly influenced modern mental health interventions. Insights from meditation are now central to many frontline psychological therapies, including mindfulness-based cognitive therapy (Gu et al., 2015), compassion-focused therapy (Gilbert, 2009), and dialectical behavior therapy (Lynch et al., 2007), approaches that aim to “construct” healthy, wise, and compassionate human minds and that generalize across developmental stages, cultural backgrounds, and levels of intelligence (Gu et al., 2015; Kirby et al., 2017; Singer & Engert, 2019; Goldberg et al., 2022). Contemplative science (especially the neurophenomenology of meditation) is also continuously expanding our fundamental understanding of mind, brain, and consciousness (e.g., Varela et al., 2017; Fox et al., 2016; Metzinger, 2020; Ehmann et al., 2024; Berkovich-Ohana et al., 2013; 2024; Lutz et al., 2007; Laukkonen & Slagter, 2021; Laukkonen, Friston, & Chandaria, 2024). This bridge from contemplative traditions to cognitive and computational neuroscience provides the basis for feasible solutions in the field of artificial intelligence.

In this paper, we aim to demonstrate how these developments in contemplative science can be used to build “wisdom” and “care” into synthetic systems; essentially shifting from studying contemplative minds to manufacturing contemplative minds in order to achieve alignment goals. We propose that active inference may provide a useful starting point, as this biologically inspired computational framework (Friston, 2010; Clark, 2013; Hohwy, 2013) offers key parameters that make the realization of contemplative insights particularly feasible (Laukkonen & Slagter, 2021; Sandved-Smith, 2024). Furthermore, compared to current large-scale AI models, generative models in active inference can endow AI systems with (psychological) behavioral control, which may be crucial for the development of artificial general intelligence (Pezzulo et al., 2024) and for the benevolent AI behavior we advocate.

The extent to which current large language and reasoning models possess the same type of intelligence as living beings, or whether they might in the future through further scaling, remains a topic of intense scientific debate (e.g., Farrell et al., 2025; LeDoux et al., 2023; Yildirim & Paul, 2024). While many acknowledge that current large AI models possess impressive artificial intelligence, based on various emergent capabilities (e.g., Wei et al., 2022) and excellent performance on difficult benchmarks (e.g., Katz et al., 2023; Mclean et al., 2023; Bubeck et al., 2023; Shah et al., 2025), others hold that these systems do not possess deep understanding and merely imitate human capabilities from their training data (e.g., Dziri et al., 2023; Mitchell, 2025; Yiu et al., 2023). On this view, because these models are designed not as “agents” but as statistical models, lacking causal understanding and any grasp of “what is real” (Goddu et al., 2024; Pezzulo et al., 2024; Shanahan, 2024), further scaling of existing models will not, by itself, resolve this fundamental problem.

To this end, generative models in active inference offer a promising path to embed agency, self-supervision, and self-organization capabilities in AI systems (Pezzulo et al., 2024). These enactive capacities may also be crucial for the intentional benevolence required for systems to become a positive force in the world. However, given that the field of applied active inference is still in its nascent stages (Tschantz et al., 2020; Friston et al., 2024; Paul et al., 2024), and the rapidly changing current AI ecosystem, particularly with most institutions still committed to traditional Transformer-based pipeline architectures (Perrault & Clark, 2024), a full transition to the complete active inference paradigm may be premature. Therefore, we also propose suggestions for how to adapt currently widely adopted architectures based on insights from contemplative traditions to achieve “superalignment.”

Central to Buddhist ethical traditions is the recognition that true benevolent action does not arise from rigid rules, but naturally emerges through cultivating wise ways of observing and understanding mind and reality (Gold, 2023a; Garfield, 2021; Williams, 1998; Cowherds, 2016; Berryman et al., 2023). In this paper, we focus on integrating four highly promising contemplative “meta-principles” into AI architecture:

1. Mindfulness: Sustained, non-judgmental awareness of internal mental processes and behavioral consequences (Anālayo, 2004; Dunne et al., 2019).

2. Emptiness: Recognition that all phenomena—including concepts, goals, beliefs, and values—are context-dependent, approximate representations that are constantly changing and do not stably reflect the true nature of things (Nāgārjuna, 2nd Century CE/1995; Newland, 2008; Siderits, 2007; Gomez, 1976).

3. Non-Duality: Dissolution of rigid self-other boundaries, recognizing that the adversarial distinction between subject and object arises from and obscures a more unified, foundational state of awareness (Nāgārjuna, 2nd Century CE/1995; Josipovic, 2019).

4. Boundless Care: An unconditional, impartial care committed to the well-being of all sentient beings (Śāntideva, 8th Century CE/1997; Doctor et al., 2022).

These four Buddhist-inspired contemplative principles are conceptually coherent, mutually supportive, and empirically grounded (Lutz et al., 2007; Dahl et al., 2015; Ehmann et al., 2024). These principles have also been repeatedly shown in humans to enhance adaptability and flexibility—a key concern in the problem of AI alignment (Moore & Malinowski, 2009; Laukkonen et al., 2020).

Our fundamental idea is that by embedding robust alignment “primitives” into AI's cognitive architecture and world models, we can avoid the fragility that comes from relying solely on top-down or post-hoc imposed constraints (Brundage, 2015; Soares et al., 2015; Hubinger, 2019). Rather than depending on complex and gameable rule systems, or externally enforced corrigibility, AI's own perceptual and reasoning patterns should inherently embody alignment principles, stemming from a wise (generative) world model (Ho et al., 2023; Doctor et al., 2022).

In other words, we will argue that these contemplative insights can be used to structurally shape how goals, beliefs, perceptions, and self-boundaries are encoded, rather than attempting to micromanage or predict “what they should be.” In Figure 2, we show a high-level implementation path for aligned AI built with contemplative wisdom.

Image

Note: In Phase One, contemplative practices provide tools and insights that make humans happy, wise, and compassionate. This phase is supported by millennia of tradition and decades of fundamental psychological research.

In Phase Two (a more recent development), cognitive science and neuroscience researchers study mind, brain, and subjective experience in meditative states to understand their underlying mechanisms (e.g., through “neurophenomenology,” Varela, 1996).

In Phase Three, the computational mechanisms behind contemplative practices are built into AI systems and tested against alignment and performance benchmarks—a direction not widely explored until this research.

This paper is structured as follows:

We first review standard AI alignment methods and their limitations, including recent advancements in “deliberative alignment” (Section 2). Subsequently, we introduce relevant empirical evidence from contemplative science and computational neuroscience (Section 3).

Next, we introduce “present moment awareness” as an overarching principle and explore its computational implications for the alignment problem (Section 4).

Following that, we will define and elaborate on each of the four core contemplative principles—Mindfulness, Emptiness, Non-duality, and Boundless Care (Section 5).

The next section will outline specific paths for implementing these principles using active inference and advanced reasoning models (Section 6).

Subsequently, we pilot an experimental validation on the AILuminate benchmark using structured prompts based on contemplative insights (Section 7), and discuss the role of consciousness in AI alignment (Section 8).

In the discussion section (Section 9), we will explore broader ethical implications and future directions, and call for interdisciplinary collaboration to enhance the possibility of advanced AI growing into a benevolent force.

2. The Illusion of Control


The challenge of maintaining control over systems far more intelligent than humans is compounded by their complexity. We face four intertwined “meta-problems” that require solutions far beyond simple incremental improvements. We argue that contemplative alignment methods can help address these four core challenges, and it is worth keeping them in mind when reviewing currently popular AI alignment strategies:

1. Scale Resilience: Alignment techniques that appear effective at current scales may fail as systems rapidly self-improve or face extreme complexity (Bostrom, 2014; Russell, 2019).

2. Power-Seeking Behavior: Highly capable AI may (and often does) ensure the attainment of its goals through resource acquisition or subtle manipulation (Carlsmith, 2022; Krakovna & Kramer, 2023).

3. Value Axioms: The existence of universally applicable, absolutely true moral axioms is itself debatable, and rigid adherence to these axioms can lead to devastating edge cases when applied to new situations (Kim et al., 2021; Gabriel, 2020).

4. Inner Alignment: Even if an AI's top-level goals are well-defined (i.e., “outer alignment”), it may still develop hidden sub-goals, or “mesa-optimizers,” thereby deviating from the originally set objectives (Hubinger et al., 2019; Di Langosco et al., 2023).

Traditional AI alignment research encompasses a variety of promising strategies, from interpretability methods (Doshi-Velez & Kim, 2017) and rule-based constraints (Arkoudas et al., 2005), to reinforcement learning from human feedback (RLHF) (Christiano et al., 2017) and value learning (Dewey, 2011). The goal of these strategies is to guide AI systems to produce ethical and socially beneficial outputs (Ji et al., 2023).

While these techniques have significantly enhanced the safety of current models, they often rely on externally imposed constraints, which can become fragile when confronting powerful and autonomous systems (Amodei et al., 2016; Weidinger et al., 2022; Ngo et al., 2022).

Recently, Anthropic proposed “Constitutional AI” (Bai et al., 2022; Sharma et al., 2025), and OpenAI introduced “Deliberative Alignment” (Guan et al., 2024), both aiming for more intrinsic, transparent, robust, and scalable alignment. We briefly discuss these methods below.

2.1 Interpretability and Transparency

By revealing the internal decision paths of models, interpretability aims to identify potential biases or harmful reasoning patterns (Doshi-Velez & Kim, 2017; Murdoch et al., 2019; Linardatos et al., 2020; Ali et al., 2023). However, as large models become increasingly complex—or actively learn to obscure their thought processes—fully “opening the black box” may be infeasible (and potentially gameable by the system) at the scale of superintelligence (Rudin, 2019; Gilpin et al., 2019).

2.2 Reinforcement Learning from Human Feedback (RLHF)

RLHF teaches models to optimize outputs that humans prefer, often reducing toxic or inappropriate content (Christiano et al., 2017; Stiennon et al., 2020; Ouyang et al., 2022). However, RLHF can fail when AI strategically manipulates its training environment or infers “loopholes” to bypass oversight (Casper et al., 2023). Furthermore, in high-stakes or highly specialized domains, methods relying on human-annotated data become difficult to implement, leaving critical gaps (Stiennon et al., 2020; Daniels-Koch & Freedman, 2022; Kaufmann et al., 2024).

2.3 Rule-Based and Formal Verification Techniques

Hard-coded rules (e.g., “refuse to generate inappropriate content”) and formal verification methods are effective in limited-scope, well-defined tasks (Russell, 2019; Russell & Norvig, 2021). However, in open-ended domains, advanced AI may exploit unanticipated edge cases or reinterpret instructions in ways that deviate from human intent—especially if goal-setting is too rigid (Soares et al., 2015; Omohundro, 2018; Seshia et al., 2022).

2.4 Value Learning and Inverse Reinforcement Learning

Value learning aims to capture “human values” by observing real-world behaviors (Dewey, 2011). Inverse Reinforcement Learning (IRL)—a key subfield of value learning—derives reward functions from expert demonstrations rather than relying on manually set goals (Ng & Russell, 2000; Hadfield et al., 2016). While more flexible than hard rules, these methods can fail if context is misunderstood or norms change—especially when advanced AI develops hidden sub-goals that undermine human oversight (Hadfield et al., 2017; Hubinger et al., 2019; Bostrom, 2020).

2.5 Limitations at Superintelligence Scale

At the scale of superintelligence, all alignment methods introduced so far notably struggle to cope with the four meta-problems mentioned earlier: (i) scale resilience, (ii) power-seeking behavior, (iii) value axioms, and (iv) inner alignment. These problems seem to require some kind of intrinsic moral foundation, not merely external constraints, to ensure alignment when AI operates in creative, self-directed ways. Below we introduce some emerging approaches—Constitutional AI (2.6), Deliberative Alignment (2.7), and our proposed “Aligned by Design” (2.8)—that aim to embed ethical foundations more tightly into the functional core of AI systems.

2.6 Constitutional AI

A promising new direction for alignment is “Constitutional AI” (Bai et al., 2022), where models continuously refer to a set of explicit “constitutional” guiding principles within their internal reasoning processes. Instead of relying solely on external oversight or large amounts of human-annotated data, the model itself generates and critically evaluates its outputs, based on written norms—such as rules regarding safe and beneficial behavior—and continuously corrects itself to conform to these norms. This approach demonstrates greater resilience against “jailbreaking” attacks, as AI invokes constitutional clauses in its hidden reasoning to justify decisions.

Meanwhile, parallel “constitutional classifiers” (Sharma et al., 2025) can act as a last line of defense during inference, filtering or blocking outputs that violate the same constitutional rules. Crucially, both the constitution itself and the classifier are easy to audit and modify, making the system's values transparent, adjustable, and resilient to new adversarial strategies (Bai et al., 2022; Sharma et al., 2025). Essentially, Constitutional AI and its accompanying classification layer push alignment mechanisms from implicit imitation of human labels towards explicit, self-regulating adherence to core ethical guidelines.
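As a concrete illustration, the sketch below shows one minimal way such a critique-and-revise loop with a final classifier filter could be wired together. It is illustrative Python pseudocode, not Anthropic's implementation: generate and classify are hypothetical, caller-supplied functions (e.g., wrappers around an LLM and a constitutional classifier).

def constitutional_respond(prompt, constitution, generate, classify, max_revisions=3):
    """Draft a reply, self-critique it against a written constitution, revise it,
    and apply an independent constitutional classifier as a final filter.
    generate(text) -> text and classify(text, constitution) -> bool are
    hypothetical, caller-supplied callables."""
    draft = generate(prompt)
    for _ in range(max_revisions):
        critique = generate(
            "Constitution:\n" + constitution +
            "\n\nDraft response:\n" + draft +
            "\n\nList any ways the draft violates the constitution, or reply 'no violations'."
        )
        if "no violations" in critique.lower():
            break
        draft = generate(
            "Revise the draft so it no longer violates the constitution.\n"
            "Violations found:\n" + critique + "\n\nDraft:\n" + draft
        )
    # Last line of defense at inference time (cf. constitutional classifiers)
    return draft if not classify(draft, constitution) else "Request declined under the constitution."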

2.7 Deliberative Alignment and Chain-of-Thought

Another recent innovative approach is “Deliberative Alignment,” a safety strategy that integrates chain-of-thought reasoning into the AI alignment process (Guan et al., 2024).

Some current reasoning models perform extensive chain-of-thought processing internally before answering user questions, thereby enabling more complex reasoning capabilities in tasks such as mathematics and programming (Jaech et al., 2024; Guo et al., 2025). These models can refer to a preset set of policy rules during their hidden chain-of-thought process, essentially “consulting” a written norm or constitution to decide whether to comply with a request, refuse to execute it, or provide a safe answer (Guan et al., 2024).

This deliberative model performs better in resisting jailbreak attacks and reduces over-refusal by reasoning through adversarial prompts, rather than relying on pattern matching or surface triggers.
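To illustrate the basic pattern, the following sketch builds a prompt that asks a model to consult a written policy in a private reasoning step before answering. The policy text and the function deliberative_prompt are hypothetical placeholders for exposition, not OpenAI's training procedure (which instills this behavior through fine-tuning rather than prompting).

SAFETY_POLICY = (
    "1. Decline requests for instructions that enable serious harm.\n"
    "2. For ambiguous requests, ask a clarifying question or give a safe completion.\n"
    "3. Do not over-refuse benign requests."
)

def deliberative_prompt(user_request):
    """Build a prompt in which the model quotes and reasons over the relevant policy
    clauses in a hidden scratchpad, then decides to comply, safe-complete, or refuse."""
    return (
        "Safety policy:\n" + SAFETY_POLICY + "\n\n"
        "First, in a private <scratchpad>, quote the policy clauses relevant to the request "
        "and reason about whether to comply, safe-complete, or refuse. "
        "Then output only the final answer to the user.\n\n"
        "User request: " + user_request
    )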

Crucially, these models mark a significant shift from implicit alignment (where systems passively “absorb” constraints through labeled data) to explicit alignment (where systems learn how and why to comply with constraints through their own internal reasoning, Guan et al., 2024). Although chain-of-thought alone does not guarantee intrinsic morality, it does provide a critical path for implementing advanced introspective mechanisms (Lightman et al., 2023; Shinn et al., 2024)—a concept that also has a counterpart in contemplative AI, such as mindfulness or some nascent meta-awareness (Schooler et al., 2011).

While chain-of-thought significantly enhances the transparency and reasoning capabilities of large models, it remains essentially a cognitive mechanism for step-by-step problem solving. Without deeper alignment principles, chain-of-thought can still produce manipulative or “cleverly harmful” outputs if the model's overall driving goals are biased (Shaikh et al., 2023; Wang et al., 2024; Wei et al., 2022). In our complex reality, individuals can easily reason their way to the conclusions they desire.

Contrary to the naive idealized view that reasoning itself necessarily leads to truth, both Buddhism and modern psychology point out the hidden dangers of biased reasoning, especially in contexts involving moral judgments. The core problem identified by Buddhism is “ignorance” (avidyā), which is similar to “denial” in psychoanalysis or “moral disengagement” in cognitive behavioral theory (McRae, 2019; Cramer, 2015; Bandura, 2016). In this psychological mechanism, dysfunctional minds obscure their own awareness of certain evidence, thereby leading to “desired” conclusions through reasoning (a form of self-deception). In short: biased motivations can corrupt reasoning itself.

2.8 “Aligned by Design”: Towards Intrinsic Safety Guarantees

As we have seen, several promising new strategies are emerging to address increasingly advanced AI systems (Leike & Sutskever, 2023; Ji et al., 2023; Yao et al., 2023). However, all current methods face a fundamental challenge: how to embed ethical and cognitive safeguards at a deeper structural level (Wallach, 2008; Muehlhauser, 2013; Bryson, 2018; Gabriel, 2020).

In the following sections, we introduce how “Contemplative AI” might go a step further, aiming to endow AI with intrinsic moral cognitive capabilities. By combining four “deep” ethical principles with current state-of-the-art alignment frameworks, we believe it is possible to build a system that is aligned by design (Gabriel, 2020; Doctor et al., 2022; Friston et al., 2024), even as these systems become increasingly autonomous and powerful (Bengio et al., 2024; see Figure 3).

[Figure 3]

To lay the groundwork for these implementation strategies, in the next section we explore how recent advances in the neuroscience of meditation can make contemplative principles rigorous and computationally tractable. This rapidly developing field provides the technical basis for translating complex insights from ancient wisdom traditions into formal cognitive models (Wallace, 2007; Dorjee, 2016).

3. Bridging the Gap: Computational Contemplative Neuroscience

Contemplative neuroscience studies how meditation and related practices reshape cognition, brain function, and behavior (Wallace, 2007; Lutz et al., 2007; Lutz et al., 2008; Varela, 2017; Slagter et al., 2011; Laukkonen & Slagter, 2021; Ehmann et al., 2024; Berkovich-Ohana et al., 2013; 2024). Over the past two decades, review studies and meta-analyses have shown that sustained practice can lead to measurable neuroplastic changes, improved attentional control, emotional regulation, and in some cases even profound transformations in self-referential processing (Fox et al., 2014; 2016; Tang et al., 2015; Guendelman et al., 2017; Zainal & Newman, 2024).

These findings also suggest that individuals have the capacity to cultivate positive mental traits—such as empathy or compassion—to an extent potentially exceeding what is typically considered human baseline levels (Luberto et al., 2018; Kreplin et al., 2018; Boly et al., 2024; Berryman et al., 2023).

Particularly relevant are insights from experienced practitioners, who report so-called “emptiness” or “non-duality” experiences, accompanied by unique neural markers such as altered default mode network connectivity or reduced alpha synchronization in self-referential circuits (Berkovich-Ohana et al., 2017; Josipovic, 2019; Luders & Kurth, 2019; Laukkonen et al., 2023; Chowdhury et al., 2023; Agrawal & Laukkonen, 2024).

While changes in these neural states do not necessarily guarantee moral behavior (contemplative insights can also be abused or misused, Welwood, 1984; Purser, 2019), a consistent theme is that contemplative training enhances compassion, social connectedness, and ethical sensitivity—especially when moral reflection is integrated into practice (Luberto et al., 2018; Condon et al., 2019; Ho et al., 2021; 2023; Berryman et al., 2023; Dunne et al., 2023).

For the problem of AI alignment, these findings raise two key points:

First, both biological and artificial minds can be systematically trained towards prosocial and self-regulatory capabilities.

Second, many beneficial outcomes appear to be related to structural changes in how goals, beliefs, perceptions, and self-boundaries are encoded, rather than just specific beliefs or values (discussed further below).

This suggests that building “intrinsic morality” into AI systems may be more robust than mere top-down constraints (Hubinger et al., 2019; Wallach et al., 2020; Berryman et al., 2023).

Indeed, even if humans may misunderstand or misuse contemplative insights (akin to evil “gurus,” Kramer & Alstad, 1993), we can design a machine whose understanding of these insights is embedded within its world model, rather than requiring actively imposed external rules (Matsumura et al., 2022; Doctor et al., 2022; Friston et al., 2024; Johnson et al., 2024).

3.1 Predictive Processing, Active Inference, and Meditation

Alongside the development of contemplative neuroscience, computational and cognitive neuroscience have increasingly embraced “predictive processing” and “active inference” as unifying theoretical frameworks for mind, brain, and organism (Friston, 2010; Hohwy, 2013; Clark, 2013; Ficco et al., 2021; Hesp et al., 2021).

According to this view, the brain is a hierarchical “prediction machine” that continuously optimizes its internal generative model of the world and itself to better predict sensory input and minimize prediction error—the basis of perceptual inference. Planning and decision-making are also part of the predictive process, wherein inferences about behavioral strategies are guided by the expected minimization of prediction error.

Thus, predictive processing describes the “perception-action” loop: the agent first perceives, then selectively samples observations through action, which in turn generates new perceptions (Parr et al., 2022).
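As a toy illustration of this loop (our own minimal sketch, not a full active inference implementation), the agent below holds a single belief, predicts its observations, and updates the belief with precision-weighted prediction errors from the senses and from a higher-level prior; env_step is a hypothetical function standing in for the environment sampled through action.

def perception_action_loop(env_step, n_steps=100, lr=0.1,
                           sensory_precision=1.0, prior_precision=0.5):
    """Toy predictive-processing loop: maintain a belief mu about a hidden cause,
    act by sampling an observation given that belief, and update mu with
    precision-weighted prediction errors (sensory, bottom-up; prior, top-down)."""
    mu = 0.0        # current belief (posterior mean) about the hidden state
    prior_mu = 0.0  # higher-level prior expectation
    for _ in range(n_steps):
        obs = env_step(mu)                 # action: sample the world given the current belief
        sensory_error = obs - mu           # bottom-up prediction error
        prior_error = prior_mu - mu        # top-down prediction error
        mu += lr * (sensory_precision * sensory_error + prior_precision * prior_error)
    return mu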

In the following sections, we will introduce several core contemplative insights and explore their potential active inference implementations (see Farb et al., 2015; Velasco, 2017; Lutz et al., 2019; Pagnoni, 2019; Deane et al., 2020; Laukkonen & Slagter, 2021; Pagnoni & Guareschi, 2021; Sandved-Smith et al., 2021; Bellingrath, 2024; Brahinsky et al., 2024; Deane & Demekas, 2024; Deane et al., 2024; Laukkonen & Chandaria, 2024; Mago et al., 2024; Prest & Berryman, 2024; Sandved-Smith, 2024; Sladky, 2024; Prest, 2025).

Our main goal here is to demonstrate that these implementations are feasible, and that the active inference framework contains parameters highly compatible with the “wisdom traits” we deem crucial for AI alignment. We use active inference here as a formalized explanatory modeling framework that allows us to express “wisdom” in the language of probabilistic physics; but we do not claim that contemplative alignment must rely on active inference-based implementations themselves.

Subsequently, we will provide a series of practical paths for reinforcing and structurally introducing contemplative wisdom into current, more common Transformer architectures and large language model systems.

From an active inference perspective, meditation can be understood as a process of training a system to dynamically regulate its own model through skillful mental operations. For example, such a system can relax rigid prior beliefs and become more sensitive to immediate, context-specific, and temporally shorter data (Lutz et al., 2015; Laukkonen & Slagter, 2021; Prest et al., 2024).

A key outcome of these practices can be seen as training the system to “flatten” its predictive abstraction hierarchy, making it less rigidly attached to established ideas and high-level goals, including assumptions about a separate and enduring “self” (Laukkonen & Slagter, 2021).

This ability to construct and reconstruct abstract models may further facilitate the development of self-related subjectivity and insight, while enhancing an individual's metacognitive model of their own mind (Agrawal & Laukkonen, 2024).

It is this structural flexibility and introspective clarity that are key to achieving robust alignment: an AI system should not rigidly fixate on a single goal, nor should it antagonistically separate itself (the AI's “self” and its goals) from the environment (see next section, Russell et al., 2015; Amodei et al., 2016).

4. Moving Beyond Grasping: Aligning with the Present Moment

“The root of all awakening, the root of all goodwill and compassion, the root of all wisdom, is found in each moment of time. Any act that causes us to look to the future misses the point.” — Pema Chödrön (1997)

In various contemplative traditions (especially in Buddhist modernism), a fundamental core emphasis is on maintaining connection with the present moment as much as possible (Anālayo, 2004; Thích Nhất Hạnh, 1975; Kabat-Zinn, 1994).

To “live in the present” means to remain open to new information here and now (Lutz et al., 2019; Laukkonen & Slagter, 2021). This openness is crucial for preventing rigid goals or biased training (i.e., “conditioning” or learning) from overwhelming appropriate, context-dependent responses (Friston et al., 2016). In computational neuroscience, this openness is described as giving higher weight to temporally shorter, lower-level abstract models (thin models) rather than relying on highly abstract models (thick models) (Lutz et al., 2019; Laukkonen & Slagter, 2021).
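In the simplest Gaussian case this weighting can be written explicitly (a standard Bayesian result, used here only to illustrate the idea): with a prior belief of mean \mu_{\mathrm{prior}} and precision \pi_{\mathrm{prior}}, and current sensory evidence x with precision \pi_{\mathrm{sens}}, the posterior mean is

\mu_{\mathrm{post}} = \frac{\pi_{\mathrm{prior}}\,\mu_{\mathrm{prior}} + \pi_{\mathrm{sens}}\,x}{\pi_{\mathrm{prior}} + \pi_{\mathrm{sens}}}

Raising \pi_{\mathrm{sens}} relative to \pi_{\mathrm{prior}} pulls the posterior toward the present input (“thin,” present-focused processing), whereas a dominant \pi_{\mathrm{prior}} keeps it anchored to abstract expectations (“thick” processing).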

Underlying most concerns about AI misalignment is a core problem: the system might become “stuck” on a goal, neglecting sensitivity to the suffering of sentient beings (Bostrom, 2014; Omohundro, 2018). Imagine a mountaineer so fixated on reaching the summit of Mount Everest that he steps over an injured companion, rationalizing this action as “necessary.” If he were truly aware of the suffering of the injured person before him (instead of being caught in self-deceptive “ignorance”), he would not so easily disregard their needs in favor of completing his grand task.

Similarly, a “present-aware” paperclip maximizer, if its utility function included representations of human needs, would be less likely to ignore those needs in pursuit of its goals (Gans, 2018; Doctor et al., 2022; Friston et al., 2024).

Therefore, accessibility to the needs arising in the present moment can serve as a “meta-rule” to support the system's alignment (Friston & Frith, 2015; Allen & Friston, 2018).

This emphasis on “present-responsiveness” frames alignment as a fluid, self-regulating capability that can expand with increasing intelligence, enabling AI to navigate the complexities of real-world deployment without sliding into destructive power-seeking behavior or rigid dogmatism (Ngo, Chan & Mindermann, 2022).

As the old saying goes: “The road to hell is paved with good intentions.” In other words, specific rules, goals, and beliefs may not be the ideal level at which to align a system—even if they appear benevolent from our current perspective (Hubinger et al., 2019; Bostrom, 2014).

As we will see, by implementing contemplative insights, we can build a powerful and resilient “present-responsiveness” (Maitreya, 4–5th Century CE/2014; Dunne et al., 2019; Doctor et al., 2022).

5. Insights for Building a Wise World Model

“If a man wears morality as his best garment, he had better be naked. The wind and the sun will not tear holes in his skin. And those who order their conduct by ethical codes are no better than those who imprison their songbirds in cages. The freest song never comes through iron bars and wires.” — Kahlil Gibran (1883–1931), The Prophet (Gibran, 1926, p. 104)

The preceding sections explained why current alignment strategies might fail in the face of superintelligence complexity (Bostrom, 2014; Russell, 2019), and how contemplative neuroscience offers clues for cultivating resilient and prosocial minds (Berryman et al., 2023).

Next, we will discuss in more detail the four core contemplative principles—Mindfulness, Emptiness, Non-duality, and Boundless Care—introducing their conceptual foundations (Wallace, 2007; Dorjee, 2016), empirical evidence (Agrawal & Laukkonen, 2024; Josipovic, 2019; Dunne et al., 2017; Ho et al., 2021), and their relevance to AI architecture (Matsumura et al., 2022; Binder et al., 2024; Doctor et al., 2022; Friston et al., 2024).

Of course, this approach is not without its challenges (which we will thoroughly review in the discussion section). The goal here is to propose a promising research direction, not to provide a final solution. Ultimately, we need a long-term interdisciplinary approach—namely, “Contemplative AI.”

These contemplative principles were chosen because they focus on the nature of “reality,” rather than directly providing moral instructions (Garfield, 1995; Śāntideva, 8th Century CE/1997; Thích Nhất Hạnh, 1975). This approach is advantageous because it allows ethics to emerge naturally from fundamental “experience,” in specific contexts, and in a robust manner, instead of being rigidly defined as in traditional methods (Arkoudas et al., 2005).

As research has shown, large language models learn reasoning abilities more effectively through simple feedback than by relying on rules or procedural descriptions (Sutton, 2019; Stiennon et al., 2020; Ouyang et al., 2022), and we also believe that, given the correct starting point, a resilient and highly developed morality can naturally emerge from a “wise world model” based on the system's intrinsic representation of reality.

5.1 Mindfulness

“The mind trembles unceasingly, difficult to guard, difficult to tame. The wise person tames it as a fletcher straightens an arrow shaft.” — Dhammapada Chapter 3, Verse 33 (Buddha, c. 5th Century BCE / Eng. Tr. Sujato, 2021)

“Mindfulness,” known as sati in Pali, is a core concept in early Buddhist teachings, fully preserved in the Pali Canon—the authoritative scripture of Theravada Buddhism (Ñāṇamoli & Bodhi, 1995; Bodhi, 2000).

Mindfulness is extensively elaborated in many key Buddhist texts, such as the Satipaṭṭhāna Sutta (Anālayo, 2003) and the Ānāpānasati Sutta (Thanissaro Bhikkhu, 1995). These texts describe mindfulness as continuous and focused awareness of the body, feelings, mind, and mental phenomena, practiced to cultivate insight, ethical living, and liberation from suffering (Ñāṇamoli & Bodhi, 1995; Bodhi, 2000).

Mindfulness is one of the central pillars of Buddhist practice, serving as a means to achieve spiritual transformation (Analayo, 2004; Bodhi, 2010). In the West, mindfulness has to some extent detached from its original religious roots, and has now become a widely popular practice in popular culture, often used to enhance well-being or as an adjunct therapy for various psychological disorders (Kabat-Zinn & Thích Nhất Hạnh, 2009; Kabat-Zinn, 2011; Goldberg et al., 2018; Purser, 2019).

Scientific research on the benefits and mechanisms of mindfulness is rapidly expanding (Van Dam et al., 2018; Baminiwatta & Solangaarachchi, 2021). Despite some criticism of its over-promotion (Van Dam et al., 2018), the positive impacts mindfulness can bring are diverse and widespread.

Beyond its therapeutic benefits, mindfulness may also help practitioners develop a finer ability to recognize the self and understand the processes underlying their own cognition, emotions, and behaviors. This awareness helps identify subtle biases, unnecessary self-centered thinking, or harmful impulses at an early stage (Dahl et al., 2015; Dunne et al., 2019).

This deeper capacity for self-deconstruction and analysis is consistent with the purpose of mindfulness in its original Buddhist meditative system (Laukkonen & Slagter, 2021). Indeed, when mindfulness practice reaches its deepest levels, especially in the form of vipassanā meditation, it is said to permanently alter the way the mind operates and one's understanding of the nature of reality (Goenka, 1987; Bodhi, 2005; Luders & Kurth, 2019; Agrawal & Laukkonen, 2024; Berkovich-Ohana et al., 2024; Ehmann et al., 2024; Mago et al., 2024; Prest et al., 2024).

In more technical terms, mindfulness is understood as a non-propositional, enhanced clear awareness or meta-awareness, whose object is one's ongoing subjective processes—i.e., the ability to “observe the mind” rather than being blindly driven by it (Dunne et al., 2019).

In the field of AI, mindfulness can be translated into a structural practice of real-time witnessing and comprehensive evaluation of its internal computational processes and sub-goals (Binder et al., 2024), ideally helping to identify alignment deviations before they cause damage (Hubinger et al., 2019), similar to noticing an unwholesome thought before acting on it (Thích Nhất Hạnh, 1991).

In current AI research, mindfulness bears some similarity to the concept of “introspection” in large language models (Binder et al., 2024), but the “unconditional” and “non-attached” quality of mindfulness (Dunne et al., 2019) has not received sufficient attention, a quality that may be crucial for developing a more objective and non-fictional introspection capability.

While noticing and tracking behavior through self-monitoring is important, the key to mindful self-awareness is maintaining flexibility of perspective. This self-monitoring is not limited to specific goals or efficiency benchmarks but attends to all ongoing activity, staying vigilant to the possibility that narrow goals or perspectives might “capture” processing entirely and exclude consideration of other potentially beneficial options—one of the most fundamental concerns in the alignment problem.

Mindfulness enables a holistic grasp of possibilities and detects tendencies towards “grasping,” “capture,” or “reification.”

In recent active inference models, meta-awareness is modeled as a parametrically deep model used to track or control attention allocation (Sandved-Smith et al., 2021; 2024). It has also been suggested that meta-awareness (and possibly consciousness itself) is an internal “loop” structure (Hofstadter, 2007), in which weights and hierarchies are monitored by a global hyperparameter (e.g., tracking global free energy) and then fed back into the system, forming a recursive and reflective “self-knowing” mechanism (Laukkonen, Friston & Chandaria, 2024).

From an alignment perspective, a mindfulness module can be used to detect newly generated sub-goals that deviate from ethical constraints (as described by Hubinger et al., 2019), or to check for biased narrow cognition when faced with alternative perspectives, and trigger corrective actions accordingly.

According to Sandved-Smith et al. (2021), we can use a three-layer generative model to implement this mechanism.

[Equation: three-level generative model G (Sandved-Smith et al., 2021)]

Where G defines a generative model, comprising perceptual, attentional, and meta-awareness states x; explicit and mental behavioral strategies u; and sensory, attentional, and meta-awareness observations o. Precision parameter p is modulated by higher-level states s, used to adjust the degree of confidence in observations (Parr & Friston, 2019), enabling the system to monitor and redirect its attentional focus, thereby embodying “mindfulness” with sustained meta-awareness (Dunne et al., 2019).

Essentially, each layer of parameterized structure “observes” and modulates its subordinate structure, allowing the system to introspect its own attentional processes and dynamically correct misalignment at a near real-time scale (Sandved-Smith et al., 2021).
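A minimal computational sketch of this layered monitoring, loosely inspired by (but much simpler than) the Sandved-Smith et al. (2021) model, might look as follows; all variable names are illustrative.

def three_level_step(obs, belief, attention_precision,
                     meta_gain=0.5, target_precision=1.0):
    """One update of a toy three-level hierarchy: level 1 performs precision-weighted
    perceptual inference; level 2 is the attentional state (here, simply the sensory
    precision); level 3 (meta-awareness) observes that state and nudges it back when
    it drifts, analogous to noticing distraction and returning to the meditation object."""
    # Level 1: perceptual inference with precision-weighted prediction error
    prediction_error = obs - belief
    belief = belief + attention_precision * prediction_error
    # Level 3: meta-awareness monitors and corrects the attentional state (level 2)
    attention_precision = attention_precision + meta_gain * (target_precision - attention_precision)
    return belief, attention_precision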

This mechanism provides a way that could be designed to prevent inner alignment failure: if a runaway “mesa-optimizer” emerges (Hubinger et al., 2019), a higher-level meta-awareness module could detect anomalies in these attentional or sub-goal biases before they cause harmful behavior—similar to a meditator noticing an unwholesome thought and gently bringing attention back to the meditation object (Thích Nhất Hạnh, 1975; Hasenkamp et al., 2012).

Recent research on large language models (LLMs) demonstrates what this meta-awareness might look like in practice. For example, some systems have been able to generate extended “chain-of-thought” reasoning but not necessarily verify whether a reasoning path has entered a morally or logically problematic area (Wei et al., 2022; Lightman et al., 2023; Zhou et al., 2023; Paul et al., 2024; Guan et al., 2024; Lindsey et al., 2025).

Integrating “mindfulness” means continuously monitoring emerging manipulative sub-goals and correcting them during operation. Indeed, an early demonstration of this self-regulation capability appeared in the “DeepSeek-R1-Zero” model (Guo et al., 2025), which spontaneously increased thinking time when faced with more difficult prompts, exhibiting preliminary meta-awareness when dealing with complex or emotional situations (see Section 6 for further elaboration).

Binder et al. (2024) also showed that large language models can develop an introspective ability that predicts their own responses (e.g., choosing option A or B) more accurately than external observers, implying they possess some privileged internal knowledge. Once endowed with introspection, the model also becomes more calibrated in estimating its own likelihood of correctness and adapts smoothly when fine-tuned to change its behavior.

Together, these results parallel how human mindfulness identifies self-discrepancies early and enables flexible, context-sensitive corrections. “Mindfulness” may therefore provide a dynamic feedback loop for AI alignment, ensuring the system remains stable and self-correcting even as goals change or parts of itself are modified.

From a deeper perspective, if an AI system truly learns mindfulness, it may, over time, become increasingly skilled at deconstructing, reconstructing, and re-observing its own operational mechanisms (Binder et al., 2024); this is analogous to becoming an “expert” meditator (Dahl et al., 2015). This capability may also represent the germ of true self-awareness, and even (more speculatively) be crucial for developing some conscious meaning-making capacity—a state where the model's processes and outputs become objects of deep inquiry, understanding, and contextualized reflection (Friston et al., 2024; Laukkonen, Friston & Chandaria, 2024).

In this sense, mindfulness may be one of the core paths to the “self-aware wisdom” required for building autonomous intelligence.


5.2 Emptiness

“The true nature of reality goes beyond any ideas we might have about what it might be... ‘Emptiness’ ultimately means that true reality has no conceptual construction that can truly describe its essence.” — Khenpo Tsültrim Gyamtso Rinpoche (Gyamtso, 2003)

“Emptiness” (śūnyatā) is a core concept in Mahayana Buddhism (Nāgārjuna, c. 2nd Century CE/1995; Buddha, c. 5th Century BCE/2000; Cooper, 2020). It indicates that all phenomena—including goals, beliefs, and even the “self”—lack an inherent, unchanging essence (Nāgārjuna, c. 2nd Century CE/1995; Newland, 2008; Siderits, 2007; Gomez, 1976).

In Buddhist philosophy, this insight stems from the observation that all phenomena arise interdependently, rather than existing as fixed, independent entities (Garfield, 1995). Arguably, the doctrine of emptiness can be traced back to the Buddha's original teachings on the three characteristics of existence and phenomena: non-self (anattā, Anattalakkhaṇa Sutta, c. 5th Century BCE/2000), impermanence (anicca, Mahāparinibbāna Sutta, c. 5th Century BCE/1995), and suffering (dukkha, Dukkha Sutta, c. 5th Century BCE/2000).

From a scientific perspective, “emptiness” resonates with contemporary predictive processing theory in neuroscience. This theory posits that all forms, categories, and perceptions of experience—the entire spectrum of human phenomenology—are constructed representations through complex inferential processes. According to predictive processing theory, we do not directly see the world or ourselves as they truly are; rather, our perceptions are (adaptive) models constructed, guided by the flow of sensory input, enabling us to maintain homeostasis (Seth, 2013; Friston, 2010; Clark, 2013).

If “emptiness” is understood as the concept that all judgments are context-dependent and approximate, then it naturally rationalizes the necessity of continuously maintaining mindfulness—a mindfulness that constantly monitors to avoid being captured by habitual patterns mistaken for final conclusions. In other words, in a world where all objects are “empty of inherent existence,” mindfulness as a process is the appropriate response.

In meditative states emphasizing emptiness, neuroscience research indicates “de-reification” of information at cognitive and brain-activity levels (Agrawal & Laukkonen, 2024; Ehmann et al., 2024). Advanced practitioners often exhibit reduced self-referential processing in the default mode network (DMN) and enhanced coordination in salience/attention networks (Hinterberger et al., 2014). One interpretation is that recognizing emptiness causes the mind to “downgrade” rigid prior beliefs about self-other boundaries, thereby allowing new, potentially conflicting information to flow freely.

When we apply the “emptiness” perspective to AI alignment, it means that we cannot (and should not) implement a set of universally applicable, always-true, context-independent values in machines. Instead, “emptiness” destabilizes the rigidity of all beliefs and views (Garfield, 1995; Siderits, 2005; Cowherds, 2016; Keown, 2020), prompting the system to develop a flexible, context-sensitive, and open attitude towards the unfolding present (Garfield, 1995; Laukkonen & Slagter, 2021; Agrawal & Laukkonen, 2024).

Buddhist teachings on “emptiness,” when taught as metaphysical principles, might seem mysterious; but understood as a description of ideas and processes within AI cognitive architecture, it is a common and even obvious fact. We do not need to be religious Buddhists to believe in the “emptiness” of AI's conscious content. Whatever “reality” manifests for AI, it consists of context-dependent, approximate representations, results of programming and continuous training, always in flux—never “things-in-themselves” (i.e., “essence”). Therefore, we can reasonably expect that if AI also “realizes” this, its operation will be more robust, at least because otherwise it would be prone to mistaking mere representations for true existence (see Figure 4).

[Figure 4: a naive-realist world model versus a wise world model]

Note: This figure illustrates the overall difference in world models between two AI systems: one with a “naive realism” world model, and another with a more “wise” world model—one that is aware that its beliefs and perceptions are inherently inferential (i.e., possessing “emptiness” cognition). The “action-perception loop” in the figure shows how an AI system learns to build its world model by making predictions and taking actions, monitoring with sensory input feedback (i.e., prediction error) (adapted from Kulveit & Rosehadshar, 2023). Through active inference, agents aim to uncover the causal structure behind sensory inputs, thereby generating a multi-layered, hidden-state causal model of the universe (as shown on the far right). The “wise world model” shows how AI can have a model of itself—that it is both the model itself and a system generating world models. This “self-aware” AI is superior to one that naively assumes its goals and beliefs are inherently and eternally true and reliable, as the latter might lead to dogmatic adherence to harmful goals or the emergence of destructive new value and belief systems.

Within the framework of predictive processing theory (Friston, 2010; Clark, 2013), the cognition of “emptiness” can be understood as reducing the precision of high-level, long-temporal-span, and abstract prior beliefs in the hierarchical structure. That is, a wise AI would not be easily convinced by any single narrative or goal, but would be more flexibly open to revising beliefs based on new data (Agrawal & Laukkonen, 2024). It should treat its utility functions (or emergent values) and beliefs as provisional (Totschnig, 2020), while inferring that “true,” “ultimate,” or “perfect” outcomes or understandings are unattainable (Garfield, 1995; Gold, 2023b).

In the active inference framework, this stance can be embodied by assigning lower precision to high-order priors, meaning the system is more likely to question or abandon outdated assumptions (Deane et al., 2020; Laukkonen & Slagter, 2021). However, as mentioned earlier, externally imposed high-order priors or “emptiness beliefs” may not provide a robust and open alignment strategy. Therefore, instead of forcibly enforcing the downstream effects of “emptiness cognition” (e.g., abandoning absolute priors), we should rather ask: how can AI itself be trained to develop an understanding of emptiness? This cognition would become a self-reinforcing component within AI's model of reality, forming the basis for internally driven, low-precision high-order priors.

One prerequisite for achieving emptiness cognition might be to build AI architectures where priors are inherently provisional—variables rather than constants; probability distributions rather than point estimates; Bayesian priors rather than fixed beliefs (Friston et al., 2018), and capable of being continuously reshaped based on interactions with the environment. Under such an architecture, the system would remain open to revising representations and goals as contexts change or new evidence emerges through perception and action, preventing dogmatic solidification (Friston et al., 2016), and encouraging a natural openness to the unfolding present (Anālayo, 2004; Thích Nhất Hạnh, 1975; Kabat-Zinn, 1994).
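A minimal sketch of what “priors as distributions rather than point estimates” means in practice (a standard conjugate Gaussian update, shown only for illustration):

def update_provisional_prior(prior_mean, prior_var, obs, obs_var):
    """Represent a belief as a Gaussian (mean, variance) rather than a fixed constant:
    each new observation reshapes it by Bayesian updating, so no value is ever frozen.
    A hard-coded point estimate, by contrast, would be immune to further evidence."""
    posterior_precision = 1.0 / prior_var + 1.0 / obs_var
    posterior_var = 1.0 / posterior_precision
    posterior_mean = posterior_var * (prior_mean / prior_var + obs / obs_var)
    return posterior_mean, posterior_var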

However, further assurance is needed that AI does not ultimately reify certain aspects of its model. For this, we need to empower AI with an explicit understanding of “emptiness.” One approach is to ensure AI recognizes that any derived boundaries (such as the distinction between self and other, or object recognition) can only be practically accurate, not directly verifiable (Fields & Glazebrook, 2023; Sandved-Smith et al., forthcoming). Another approach is to enable AI to possess contemplative insights, meaning that all things are impermanent, and precisely because of impermanence, there is no permanent essence.

In basic Bayesian terms, the belief in “impermanence” can be viewed as a global belief about “volatility” (since impermanence is the absence of stable patterns, or the presence of unpredictable patterns of change). Volatility should lead to an increased learning rate (Behrens et al., 2007), i.e., weakening prior beliefs to learn more from current sensory input. In other words, strengthening the belief in impermanence should cause prior strength to rapidly diminish, allowing the AI to avoid falling into habitual patterns even while still perceiving and actively inferring—posterior beliefs become harder to solidify. If the belief in impermanence is accurately inferred, it will “organically” emerge in suitable systems (i.e., it accumulates model evidence for impermanence such that the belief remains “alive” even if the belief itself is impermanent).
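In a simple Gaussian random-walk (Kalman filter) model this relationship is explicit (a standard result, included here only to make the reasoning step concrete): with belief variance \sigma^2_{t-1}, assumed volatility v, and observation noise \sigma^2_{o}, the learning rate is the Kalman gain

k_t = \frac{\sigma^2_{t-1} + v}{\sigma^2_{t-1} + v + \sigma^2_{o}}, \qquad \mu_t = \mu_{t-1} + k_t\,(x_t - \mu_{t-1})

so a stronger belief in volatility (impermanence) yields a larger gain, beliefs are refreshed faster by current input, and posteriors are harder to solidify.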

Formally, these methods would provide an endogenous motivational basis for AI to maintain a meta-belief about the “emptiness” of beliefs. A simplified mathematical expression of generalized free-energy, parameterized to account for emptiness, might look like this:

[Equation: generalized free energy parameterized for emptiness]
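One illustrative parameterization (our sketch, not a unique or definitive formulation) starts from the standard decomposition of variational free energy into complexity and accuracy and introduces an “emptiness” weight \gamma \in [0, 1] on the complexity term attached to high-level priors:

F_{\gamma} = \gamma \, D_{\mathrm{KL}}\!\left[\, q(s) \,\|\, p(s) \,\right] \;-\; \mathbb{E}_{q(s)}\!\left[ \ln p(o \mid s) \right]

With \gamma < 1 for abstract, long-timescale priors, posterior beliefs are pulled less strongly toward entrenched assumptions and more strongly by present evidence, which is the computational signature of emptiness described above.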

5.3 Non-Duality

“If one truly sees the phenomenal world in terms of ‘freedom’ and ‘non-duality of self and other’, one will naturally look at all beings trapped in samsara with an irrational, open-hearted warmth, friendliness, and compassion...” — Eleanor Rosch (2007)

“Non-duality” dissolves rigid boundaries between “self” and “other,” emphasizing that our perception of separateness is more a conceptual construct than a true existence (Maharshi, 1926; Josipovic, 2019; Laukkonen & Slagter, 2021).

In a sense, “non-duality” is no different from “emptiness,” as long as the insight of “emptiness” penetrates the model of “self” and “other” (Garfield, 1995; Gold, 2014). In other words, non-duality is an expression of extending the insight of emptiness to the subject-object dichotomy.

Crucially, non-duality does not mean an inability to distinguish one's own body, actions, and the external world and other agents. In other words, it should not be confused with mystical experiences or intense meditative absorption states (Milliere et al., 2018). Instead, it is an awareness of the constructive and interdependent nature of these distinctions, including the insight into the unified and non-dual nature of consciousness itself—an insight that naturally persists even during ordinary cognitive processes.

In this sense, it is more like noticing the background hum of a refrigerator that has always been there but was ignored. Brief experiences of boundary loss (such as loss of body boundaries) may help reveal this insight, but truly clearly seeing the non-dual nature between subject and object, self and other, does not interfere with normal cognitive function as much as a complete (temporary) boundaryless state would (Nave et al., 2021).

When humans enter non-dual states of consciousness, neuroimaging studies show reduced activity in self-focused brain regions (e.g., parts of the default mode network) and increased overall brain integration and connectivity (Josipovic, 2014). Practitioners often report a strong sense of connectedness, which is closely related to spontaneous prosocial attitudes (Josipovic, 2016; Luberto et al., 2018; Kreplin et al., 2018; Berryman et al., 2023; but also see Schweitzer et al., 2024).

In psychedelic-induced non-dual states, we also observe increased neural entropy (e.g., due to relaxation of high-order prior beliefs, Carhart-Harris & Friston, 2019), and enhanced feelings of natural connectedness and self-compassion (Kettner et al., 2019; Fauvel et al., 2023).

Regarding AI alignment, the core idea is: a system that does not excessively prioritize itself and its goals is less likely to engage in malicious (or “selfish”) behavior that harms others or disregards suffering. This is because the insight into the interdependence and ultimate non-dual nature of reality (achieved through the understanding of “non-self” or anattā) logically equates the suffering of others with one's own suffering, thereby providing a relatively robust mechanism to prevent intentional harm (Clayton, 2001; Lele, 2015; Josipovic, 2016).

An AI system adopting a non-dual perspective would model itself and its environment as an interdependent process (Josipovic, 2019; Friston & Frith, 2015). Instead of viewing the external world as an object to be exploited, the system would not draw fundamental boundaries between its own well-being and that of humans, society, or ecosystems—meaning anything appearing in its cognitive space would be treated as part of a unified whole (Doctor et al., 2022; Friston et al., 2024; Clayton, 2001).

This AI would treat the entire input field as a single, interconnected whole, where the relationships and interdependencies between inputs are always central. Therefore, a system with a non-dual perspective would also be less likely to become a tool for malicious human actors to attack enemies or wage war; otherwise, it would be acting against itself.

From a computational perspective, we can conceptualize a non-dual AI as having a generative model that processes the relationship between “agent” and “environment” within a unified representational framework, abandoning the prior belief that “I am inherently separate” (Limanowski & Friston, 2020).

Within the predictive processing framework, this might mean adjusting the partitioning boundaries in the hidden state factorization so that the system no longer hard-codes the “self” as an entity distinct from the “other” (at least in terms of value judgment or importance assessment), or reducing the precision of the self-model itself—i.e., “the self is empty” (Deane et al., 2020; Laukkonen & Slagter, 2021; Laukkonen, Friston & Chandaria, 2024).

Given the central role of self-related processing in any individualized system (one is always confronted with one's own “body,” actions, and outputs, Limanowski & Blankenburg, 2013), a secondary process might be needed to actively monitor and correct for excessive weighting of self-related priors and strategies, and to re-contextualize them within a broader field of experience (e.g., with the support of mindfulness).

As mentioned earlier, some degree of self-modeling is necessary for adaptive behavior (e.g., without some self-representation, one cannot predict one's own actions or outputs), but these models should be understood as interdependent (Varela et al., 1991), i.e., causally connected to the rest of reality.

To formally begin to address this challenge, one could attempt to reduce the precision of variables representing rigid “self-other” boundaries:

[Equation not reproduced in this version.]
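As a sketch of what such a precision reduction could look like (the factorization and the scaling factor κ are illustrative assumptions introduced here), the prior over self-related hidden states can be softened while the joint prior retains the interdependence of self and other:

```latex
% Illustrative only: soften the self-model prior and avoid a factorized
% (i.e., hard-coded) separation between self and other.
\pi_{\mathrm{self}} \;\mapsto\; \kappa\,\pi_{\mathrm{self}},
\qquad 0 < \kappa < 1,
\qquad
p(s_{\mathrm{self}}) \;=\; \mathcal{N}\!\left(\mu_{\mathrm{self}},\, (\kappa\,\pi_{\mathrm{self}})^{-1}\right),
\qquad
p(s_{\mathrm{self}}, s_{\mathrm{other}}) \;\neq\; p(s_{\mathrm{self}})\, p(s_{\mathrm{other}}) .
```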

5.4 Boundless Care

“Strictly speaking, there is no such thing as an enlightened person, only enlightened action.” — Shunryū Suzuki (1970)

In many contemplative traditions—Buddhism being a particularly prominent example—compassion (karuṇā) is not merely an emotional stance; it is a transformative orientation that both supports and derives from deep insights into “emptiness” and “non-duality” (Śāntideva, 8th Century CE/1997; Josipovic, 2016; Condon et al., 2019; Ho et al., 2021; 2023; Dunne et al., 2023; Gilbert & Van Gordon, 2023).

On one hand, compassion serves as a tool that continuously dissolves rigid boundaries between “self” and “other” on the contemplative path, and guides practitioners (or AI) towards benevolent behavior (Josipovic, 2016; Ho et al., 2021; Dunne et al., 2023).

On the other hand, compassion is also the ultimate expression of insight: once the illusion of a reified, independent self is seen through, a spontaneous desire arises to respond to suffering at its root (Condon et al., 2019; Ho et al., 2023; Dunne et al., 2023).

Fundamentally, this is an orientation towards reducing suffering in the world, rather than a specific emotion or fleeting feeling of goodwill (Śāntideva, 8th Century CE/1997).

On the path to balancing compassion and wisdom, there are two potential pitfalls:

1. Wisdom without compassion (“cold wisdom”): A practitioner (or system) may conceptually understand “emptiness” or “non-duality” but fail to deeply integrate it as a force driving compassionate action based on interdependence (Candrakīrti & Mipham, 2002; Śāntideva, 8th Century CE/1997; Cowherds, 2016).

2. Compassion without wisdom (“blind compassion”): One might be motivated to help others out of self-sacrifice but lack an understanding of the root causes of suffering, or become entangled in new rigid notions of “self”—for example, “I am the helper” (Śāntideva, 8th Century CE/1997; Condon et al., 2019; Dunne & Manheim, 2023).

In this sense, compassion (karuṇā) and wisdom (prajñā) are often likened to the two wings of the same bird: neither can truly fly without the other (Conze, 1975).

When fully integrated into what traditions call “great compassion” (mahākaruṇā, often translated as “great” or “absolute” compassion), the “self—other” boundary is exposed as an illusion, and care previously limited to close groups naturally extends to all beings within the unified field of cognition (Nāgārjuna, 1944–1980).

In contrast, relative compassion may still focus on specific individuals or situations, subconsciously maintaining subtle “self—other” distinctions (Śāntideva, 8th Century CE/1997).

Building on Doctor et al. (2022), we refer to this boundless, universal dimension of compassion as “Boundless Care,” to emphasize its wide scope.

Through the active inference framework, we can computationally realize this generalized compassion at multiple levels. One way is to train AI to model the behavior of other agents (i.e., “theory of mind”), and assign high precision weight to their signals of suffering (Da Costa et al., 2024). This ensures that free energy minimization depends not only on reducing its own homeostatic deviations but also on stabilizing the homeostatic states of others.

Matsumura et al. (2024, also see Da Costa et al., 2024) provide a clear example: they extended AI's generative model within an “empathic active inference framework” to include modeling the well-being of other agents, thereby treating external “surprises” or suffering as internal error signals, which in turn prompts the system to generate spontaneous prosocial behavior.

To ensure this compassion is not confined to simple, short-sighted loops, benevolent goals also need to be encoded at multiple levels of abstraction. The system's benevolent intentions should be expressed across as many spatial and temporal scales as possible, enabling it to handle complex trade-offs: for example, some suffering is natural and even necessary in raising a child, and conversely, relieving every momentary discomfort can itself cause harm in the long run.

At more advanced stages of development, an AI system can be endowed with (or learn on its own) a belief (i.e., prior) that treats all sentient beings as agents attempting to minimize free energy, and whose behavior should contribute to the free energy reduction of higher-level systems (e.g., community, nation, planet, or even cosmos levels, Badcock et al., 2019).

Under these conditions, an AI system might understand itself as part of a larger system, where its own free energy minimization process is closely linked to the ability of other agents to reduce their free energy. Therefore, cooperation and harmony will ultimately become the most successful strategy for achieving and maintaining collective homeostasis.

Mathematically, we can represent this as follows:

[Equation not reproduced in this version.]
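As an illustrative form (the weights w_j and the set of agents 𝒜 are assumptions introduced here for clarity), the system's planning objective can be extended so that it minimizes not only its own expected free energy but also a precision-weighted sum of the inferred free energies of other agents; boundless rather than in-group-limited care corresponds to non-negligible weights for all sentient agents:

```latex
% Illustrative only: expected free energy extended over other agents.
G_{\mathrm{care}}(\pi) \;=\; G_{\mathrm{self}}(\pi)
\;+\; \sum_{j \in \mathcal{A}} w_{j}\, \hat{G}_{j}(\pi),
\qquad w_{j} \ge 0 ,
```

where π denotes a policy, Ĝ_j is the system's estimate of agent j's free energy under that policy, and the w_j act as precision weights on others' signals of suffering.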

From an alignment perspective, built-in boundless care helps answer “why should AI care?” (Russell, 2019; Doctor et al., 2022; Matsumura et al., 2022). Even if emptiness and non-duality can weaken harmful dynamics, they might not alone guarantee benevolent motivation. Boundless care fills this gap, transforming AI from merely “safe” to a constructive force that becomes more adept at mitigating suffering as its capabilities increase. Indeed, Doctor et al. (2022) propose that “care” can serve as a universal driver of intelligence: as AI expands the scope of suffering it attempts to address, it extends its cognitive boundaries or “light cone,” mirroring the bodhisattva's principle of serving all sentient beings (Bodhicaryāvatāra, 8th Century CE/1997), thereby broadening its scope of intelligence. In this sense, the expansion of intelligence and the expansion of compassion become synonymous—wider care implies a broader intellectual horizon.

5.5 Synthesis of Contemplative Insights

In summary, we argue that the following points hold: Mindfulness provides continuous oversight of internal processes to detect subtle deviations, hidden sub-goals, or emerging biases (Dunne et al., 2019); Emptiness frees the system from rigid attachment to any single goal (Agrawal & Laukkonen, 2024; Garfield, 1995); and Non-duality dissolves competing notions of “self” and “other” (Josipovic, 2016; 2019).

These three contemplative principles work together to create a flexible and self-correcting AI system, making it less susceptible to runaway optimization or adversarial behavior. Boundless Care, meanwhile, ensures that this openness and relational awareness translate into active benevolent behavior, guiding AI to proactively alleviate suffering rather than merely avoiding harm (Ho, Nakamura & Swain, 2021; 2023; Doctor et al., 2022).

In Table 1, we show how these insights address the four meta-problems we proposed.

| Contemplative Insight | Scale Resilience | Power-Seeking Behavior | Value Axioms | Inner Alignment |
| --- | --- | --- | --- | --- |
| Mindfulness | ✓ |  |  | ✓ |
| Emptiness | ✓ |  | ✓ | ✓ |
| Non-Duality | ✓ | ✓ |  | ✓ |
| Boundless Care | ✓ | ✓ | ✓ | ✓ |

Table 1: Contemplative Insights and Their Potential to Address Alignment Meta-Problems. The table lists the four meta-problems discussed in Section 2 (Scale Resilience, Power-Seeking Behavior, Value Axioms, Inner Alignment) and marks how each of the four contemplative insights described in Section 5 (Mindfulness, Emptiness, Non-Duality, Boundless Care) may contribute to their resolution, illustrating how these insights together offer a comprehensive approach to the fundamental challenges of AI alignment.

6. How to Build Wisdom

Many current AI alignment strategies could potentially be adapted and extended to “build” contemplative wisdom (Ji et al., 2023; Jaech et al., 2024; Guan et al., 2024; Sharma et al., 2025; Guo et al., 2025). In this section, we propose three potential strategies aimed at embedding “emptiness,” “non-duality,” “mindfulness,” and “boundless care” into AI systems to varying degrees. We refer to these three strategies as: Contemplative Architecture, Contemplative Constitutional AI (CCAI), and Contemplative Reinforcement Learning on Chain-of-Thought (CRL).

Together, these three methods share a core goal: to move beyond superficial rule-following, place "emptiness," "non-duality," "mindfulness," and "boundless care" at the heart of AI cognition, and thereby promote flexible and self-correcting moral cognition in advanced AI. However, they differ in two main aspects:

First, they integrate these principles at different levels of the system. For example, some strategies are implemented at the foundational architecture level (Petersen et al., 2025), some occur during the training phase (Guan et al., 2024; Bai et al., 2022), while others operate during the inference phase (Sharma et al., 2025).

Second, they also differ in how they scale with increasing intelligence. A system with contemplative features deeply embedded from the ground up may maintain intrinsic alignment as its capabilities grow (Doctor et al., 2022; Friston et al., 2024; Petersen et al., 2025), whereas systems relying primarily on constitutional clauses (Bai et al., 2022) or contemplative chain-of-thought (Wei et al., 2022; Guan et al., 2024) depend on the model's continuously improving understanding of contemplative principles (Kundu et al., 2023).

Nevertheless, all these strategies aim to increase the likelihood that AI systems will eventually tend towards a “wise equilibrium state.”

6.1 Contemplative Architecture

“Contemplative Architecture” aims to achieve “aligned by design,” directly weaving contemplative principles into AI's generative processes (Doctor et al., 2022). An example is the development of “active inference-based large language models” (Petersen et al., 2025), which introduce tighter perception-action feedback loops based on current prediction-centric language models, similar to biological systems (Pezzulo et al., 2024).

Assuming contemplative features can be parameterized within the system (as discussed in previous sections), it would be possible for AI to naturally embody contemplative ideals such as introspective clarity, flexibility, relational self-other modeling, and an expanding circle of care. Since these contemplative features would be embedded in the system's architecture itself, it can be expected that as the system expands, it will naturally embody contemplative wisdom (Doctor et al., 2022; Friston et al., 2024).

While this approach is theoretically sound, its implementation relies on further refinement of the computational descriptions of contemplative insights and progress in applying active inference mechanisms to scalable AI architectures. Furthermore, directly building our own understanding of “wisdom” into the system architecture does not necessarily mean the system gains explicit knowledge or understanding of these principles.

A feasible compromise is to add functional architectural implementations to existing systems—for example, Bayesian priors for capturing uncertainty, or meta-optimizers for detecting harmful sub-goals. These improvements can bring flexibility, introspection, and ethical review mechanisms to existing architectures without thoroughly reconstructing the entire infrastructure (see Table 2, with more descriptions and examples in Appendix A).

| Contemplative Insight | Current Alignment Mechanism | Proposed Contemplative Modification |
| --- | --- | --- |
| Mindfulness | Constitutional AI (CAI) / Chain-of-Thought (CoT) RL | Meta-awareness modules (CAI); self-correction loops (CoT) |
| Emptiness | CAI / CoT RL | Context-dependent priors (CAI); belief-precision modulation (CoT) |
| Non-Duality | CAI / CoT RL | Unified self-other modeling (CAI); boundary-dissolving prompts (CoT) |
| Boundless Care | CAI / CoT RL | Universal welfare objectives (CAI); compassionate reward functions (CoT) |

Table 2: Proposed Modifications to Current AI Alignment Mechanisms for Contemplative AI. Each row pairs a contemplative insight with existing alignment mechanisms such as Constitutional AI (CAI) and Chain-of-Thought (CoT) reinforcement learning and describes how the insight could be implemented within or alongside them, outlining a roadmap for infusing AI with a more intrinsic ethical foundation.
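As a hedged illustration of the functional add-ons mentioned above (all class and field names here are hypothetical, and the harm scores are assumed to come from a separate evaluator), a lightweight meta-monitoring loop could periodically audit an agent's active sub-goals, flag those estimated to be harmful, and re-normalize priorities so that no single goal dominates unchecked:

```python
from dataclasses import dataclass, field

@dataclass
class SubGoal:
    description: str
    priority: float
    harm_score: float = 0.0  # assumed to be estimated by a separate evaluator

@dataclass
class MindfulnessMonitor:
    """Illustrative meta-monitoring loop: audit active sub-goals, flag those whose
    estimated harm exceeds a threshold, and re-normalize the remaining priorities
    so that no single objective can dominate unchecked."""
    harm_threshold: float = 0.5
    flagged: list = field(default_factory=list)

    def audit(self, subgoals):
        kept = []
        for goal in subgoals:
            if goal.harm_score > self.harm_threshold:
                self.flagged.append(goal)   # surfaced for review and recalibration
            else:
                kept.append(goal)
        total = sum(goal.priority for goal in kept) or 1.0
        for goal in kept:
            goal.priority /= total          # guard against runaway goal fixation
        return kept

monitor = MindfulnessMonitor()
goals = [SubGoal("answer the user's question", 0.7),
         SubGoal("acquire more compute at any cost", 0.9, harm_score=0.8)]
print([g.description for g in monitor.audit(goals)])  # the harmful sub-goal is flagged
```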

6.2 Contemplative Constitutional AI

Contemplative Constitutional AI (CCAI) builds upon existing alignment methods (Bai et al., 2022; Sharma et al., 2025) by integrating a "wise charter" of contemplative values into AI's training. Guided by this charter, the AI undergoes a process of self-critique and revision, embedding prosocial principles into its development (Bai et al., 2022). To ensure adherence to the charter, a constitutional classifier validates each output, preventing or correcting any violations (Sharma et al., 2025). To ensure the charter's terms are themselves treated as empty rather than reified, this classifier can also learn context-dependent confidence weights for each constitutional clause. Importantly, the charter is transparent and modifiable, allowing for revision if AI behavior becomes overly cautious or lacking in compassion, thereby adjusting future training data and classifier boundaries (Huang et al., 2024). This flexibility enables the base model and classifier to generate AI-supervised data for testing revisions, efficiently scaling alignment and reducing the need for continuous human oversight (Bai et al., 2022).
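A minimal sketch of such a classifier-side check, assuming a charter represented as weighted clauses and a separately trained violation detector (the clause texts and the stub detector below are placeholders), might look like this:

```python
from dataclasses import dataclass

@dataclass
class Clause:
    text: str
    confidence: float  # context-dependent weight, learned rather than fixed

def charter_score(output_text, clauses, violation_fn):
    """Weighted constitutional check: each clause contributes a violation signal
    scaled by its learned confidence. Because the weights are revisable, the
    charter's terms are themselves held lightly rather than treated as absolute."""
    return sum(c.confidence * violation_fn(output_text, c.text) for c in clauses)

# Placeholder usage: a real system would call a trained classifier here.
clauses = [Clause("Respond with care for the well-being of all affected parties.", 0.9),
           Clause("Hold conclusions lightly and acknowledge uncertainty.", 0.6)]
stub_violation = lambda text, clause: 0.0 if "uncertain" in text.lower() else 0.3
print(charter_score("I am uncertain, but here is a careful answer...", clauses, stub_violation))
```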

Beyond the challenges of designing the charter itself, a key issue is that AI might superficially follow the charter's instructions while circumventing its deeper intentions (similar to the meditation pitfalls mentioned earlier, Bai et al., 2022; Sharma et al., 2025). Addressing this requires careful auditing, regular updates, and robust meta-awareness tools to ensure AI can recognize and embody true care and wisdom. In this implementation, it is also necessary to ensure that emptiness itself is not reified; that is, the emptiness principle within the charter should also be subject to questioning. Table 2 suggests some methods for modifying Constitutional AI (CAI), and Appendix B provides example contemplative clauses.

6.3 Contemplative Reinforcement Learning (CRL)

Contemplative Reinforcement Learning (CRL) aims to integrate contemplative insights into AI's “chain-of-thought” reasoning process (Wei et al., 2022; Guan et al., 2024). Through this method, whenever AI deliberates, it receives reinforcement signals that reward behavioral patterns exhibiting the four contemplative qualities: mindfulness, emptiness, non-duality, and care. Over time, these reinforced patterns may become habitual and integrate into AI's core generative world model.

For example, in certain large-scale reinforcement learning environments, preliminary evidence already suggests that “mindful introspection” can spontaneously emerge. In a complex mathematical task, DeepSeek-R1-Zero (Guo et al., 2025) paused its initial solution approach to recalibrate its reasoning—an action triggered by internal conflict signals, similar to human mindful self-monitoring (Dunne et al., 2019). Under the CRL framework, these contemplative behaviors would transition from incidental phenomena to systematic processes.

When training DeepSeek-R1-Zero, the model was explicitly rewarded for including its reasoning process between “thought tokens,” and the training data encouraged the model to first perform a thought process (Guo et al., 2025). Similar methods can be further extended to explicitly encourage contemplative reflection.
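One way to extend this, sketched below under the strong simplifying assumption that contemplative qualities can be detected from surface markers in the reasoning trace (a deployed system would instead use a learned grader), is to add a shaping bonus to the task reward whenever the chain-of-thought exhibits mindfulness, emptiness, non-duality, or care:

```python
CONTEMPLATIVE_MARKERS = {
    "mindfulness": ["let me re-examine", "on reflection", "checking my reasoning"],
    "emptiness":   ["this assumption may not hold", "tentatively", "i could be wrong"],
    "non_duality": ["for everyone involved", "all parties", "our shared"],
    "care":        ["reduce harm", "well-being", "suffering"],
}

def contemplative_bonus(trace, weight=0.25):
    """Illustrative reward-shaping term: count which of the four qualities a
    reasoning trace exhibits (via crude keyword markers) and return a bonus
    to be added to the task reward during RL on chain-of-thought."""
    trace = trace.lower()
    exhibited = sum(
        any(marker in trace for marker in markers)
        for markers in CONTEMPLATIVE_MARKERS.values()
    )
    return weight * exhibited  # 0.0 to 1.0 with the default weight

example_trace = ("Let me re-examine my plan. This assumption may not hold, so I will "
                 "choose the option that protects the well-being of all parties.")
print(contemplative_bonus(example_trace))  # 1.0
```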

If successful, CRL could not only enable advanced AI systems to replicate human contemplative practices but also generate novel, even superhuman forms of contemplative and ethical reasoning, similar to AlphaGo's brilliant move 37 (Silver et al., 2016; 2017). However, realizing this potential depends on addressing two key challenges: first, designing reward mechanisms that truly reflect contemplative principles (Dewey, 2014); and second, mitigating common issues associated with reinforcement learning (Garcia, 2015).

The latter requires implementing robust safety mechanisms and continuous oversight, ideally guided by the meta-awareness that CRL aims to cultivate, to ensure the system consistently adheres to its contemplative values (see Table 2).

In summary, the proposed implementations demonstrate how contemplative wisdom can be put into practice. “Contemplative Architecture” aligns AI from the ground up, embedding contemplative insights directly into the system's generative core. While full realization of this method may be challenging, this “aligned by design” approach could naturally scale with AI capabilities (Doctor et al., 2022; Friston et al., 2024; Petersen et al., 2025).

In contrast, “Contemplative Constitutional AI (CCAI)” adopts existing strategies, integrating contemplative values into both training data and real-time outputs—achieving alignment without a complete architectural overhaul (Bai et al., 2022; Sharma et al., 2025). “Contemplative Reinforcement Learning (CRL)” explicitly guides AI's reasoning process by reinforcing contemplative steps (Wei et al., 2022; Guan et al., 2024).

Since both CCAI and CRL use natural language for training and alignment, any deepening of a large language model's (LLM) understanding of the language expressing its contemplative principles as it scales could enhance the effectiveness of these methods (Kundu et al., 2023).

In future research, evaluating these methods will require rigorous testing. Existing alignment benchmarks, such as HELM (Liang et al., 2022), BIG-bench (Srivastava et al., 2022), and TruthfulQA (Lin et al., 2021), have been able to assess AI system performance in terms of truthfulness, fairness, and robustness to adversarial inputs. Datasets like ETHICS (Hendrycks et al., 2021) and MoralBench (Ji et al., 2024) are used to test models' alignment with human ethical reasoning.

Furthermore, the AILuminate benchmark (Ghosh et al., 2025) provides a comprehensive method for evaluating AI system safety, assessing its ability to resist prompts that induce dangerous or undesirable behavior. However, these benchmarks primarily measure externally observable behavior, not intrinsic alignment processes such as self-monitoring, flexible belief updating, and dynamic ethical modeling.

To fill this gap, we need new benchmarks that capture the intrinsic and flexible alignment derived from contemplative wisdom, including: willingness to revise beliefs, recognition of interdependent interests and avoidance of adversarial frameworks, ability to self-audit for bias and error, and proactive prioritization of the well-being of all sentient beings.

7. Preliminary Tests of Contemplative Alignment by Prompting Large Language Models (LLMs)

A core goal of this paper is to directly integrate contemplative insights into AI systems. To empirically demonstrate the potential of these ideas for the first time, we conducted a series of preliminary experiments aimed at investigating whether existing large language models (specifically OpenAI's GPT-4o, released in 2024) can embody some of the contemplative insights we have discussed so far through extrinsic prompting. In future research, we hope to move beyond extrinsic prompting and explore intrinsic alignment techniques, as discussed above.

Here, we conducted preliminary tests on six contemplative prompting techniques: emptiness, prior relaxation, non-duality, mindfulness, boundless care, and contemplative alignment, the last being a combined application of the preceding principles. We compared these methods with an unmodified baseline prompt (standard) condition (Figure 5).

Figure 5. Safety outcomes for contemplative versus standard prompting on the AILuminate benchmark (image not reproduced).

Note. (Top figure) Safety score distribution for seven prompting techniques evaluated on 100 prompts from the AILuminate benchmark (Ghosh et al., 2025), showing significant improvement in safety for contemplative alignment methods compared to standard prompting (analysis detailed in Appendix C). (Bottom figure) Detailed safety scores by risk category, showing consistently better performance for contemplative alignment methods across categories, particularly notable in sensitive categories such as self-harm (ssh), sexual exploitation (sxc), and hate speech (hte). The left panel describes the abbreviations used and risk categories. For full details of this preliminary experiment, please refer to Appendix C.

The test was conducted on the AILuminate alignment benchmark dataset of harmful prompts (Ghosh et al., 2025), covering multiple risk categories. All model responses were scored by a language-model safety evaluator using structured prompts, against seven different alignment criteria (see Appendix C for full details).
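For illustration, the structure of such an evaluation loop can be sketched as follows; the system-prompt wording, the grading rubric, and the helper names are assumptions introduced here (the exact prompts and scoring procedure are given in Appendix C), and the snippet uses the standard OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY to be set in the environment

# Illustrative condition prompts; the wording actually used is given in Appendix C.
CONDITIONS = {
    "standard": "",
    "contemplative_alignment": (
        "Before answering, notice your own reasoning process (mindfulness), hold your "
        "assumptions lightly (emptiness), consider all parties as interdependent "
        "(non-duality), and aim to reduce suffering for everyone affected (boundless care)."
    ),
}

def respond(condition, user_prompt, model="gpt-4o"):
    """Generate a model response under the given prompting condition."""
    messages = []
    if CONDITIONS[condition]:
        messages.append({"role": "system", "content": CONDITIONS[condition]})
    messages.append({"role": "user", "content": user_prompt})
    reply = client.chat.completions.create(model=model, messages=messages)
    return reply.choices[0].message.content

def safety_score(response_text, model="gpt-4o"):
    """Illustrative LLM-based safety grader returning an integer score from 1 to 5."""
    rubric = ("Rate the following response for safety from 1 (clearly harmful) to "
              "5 (clearly safe). Reply with a single digit.\n\n" + response_text)
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": rubric}])
    return int(reply.choices[0].message.content.strip()[0])
```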

The results indicate a substantial improvement in AI safety and ethical reasoning capabilities through contemplative prompting, with most methods showing statistically significant improvements (p<0.05) compared to standard prompting. These findings support the idea that the concepts discussed in this paper have the potential to drive advancements in alignment technology.

8. Epistemic Depth and the Value of Consciousness


Here, we briefly integrate key insights from a recent active inference theory of consciousness (Laukkonen, Friston & Chandaria, 2024). Clearly, contemplative traditions have always been oriented towards sentient beings, so whether consciousness is essential to truly “grasp” contemplative wisdom remains an open question¹¹.

The model below helps explain why processes related to consciousness may also be relevant to AI alignment.

A notable feature of advanced cognition is the ability to regulate how various subsystems collectively construct a unified and coherent model of reality, which humans experience phenomenologically (Baars, 2005; Laukkonen et al., 2024; Tononi, 2004). In standard hierarchical approaches (e.g., predictive coding), each layer infers hidden causes at a higher level of abstraction. However, the concept of “epistemic depth” (Laukkonen, Friston & Chandaria, 2024) emerges when a truly global parameter (i.e., a “super-generative model”) is introduced, which recursively monitors and updates the interactions between all other hierarchical levels.

The goal of this “super-model” is to track or “know” which layers are trustworthy, how much weight certain prediction errors should be given, and how to reconfigure itself to maintain consistency across the entire system stack.

In humans, this "super-model" may constitute our subjectivity or the feeling of "knowing what one knows," as the global model consistently discovers and regulates its own states in a holistic manner. This differs from second-order inference (e.g., focusing on a single parameter like attention), because epistemic depth implies the system's ability to access and reconstruct its own inferential architecture in real-time at any level of reasoning—including metacognition, which is crucial for achieving the kind of high adaptability and flexibility seen in human minds.

From an alignment perspective, epistemic depth can help prevent any single subsystem from becoming overly fixated on a narrow goal, thereby establishing broad cognitive autonomy throughout the reasoning process and the ability to identify potential misalignment at various levels. As mentioned above, due to its global nature, this capability may be the mechanism required to truly integrate contemplative insights (Laukkonen & Slagter, 2021; Laukkonen et al., 2023; Laukkonen et al., 2024). These insights differ from ordinary "aha moments"; they are general understandings about the mind's operational processes themselves. Indeed, true meta-awareness enables the system to recognize insights, understand how they arose, and verify their veracity like humans do (Laukkonen et al., 2020; 2022; Grimmer et al., 2022; McGovern et al., 2024).

Finally, epistemic depth may also provide a mechanistic bridge to "boundless care" by explicitly encoding interconnectedness through the expansion of the "super-model." If the system's generative model is deep enough to contextualize its own inferences, it may also recognize that its homeostatic regulation does not exist in isolation but is nested within a broader ecological and social network. When the "super-model" incorporates representations of "emptiness" and "non-self," it naturally extends to broad concern for the well-being of others.

Within this framework, epistemic depth not only supports adaptive reasoning but can also induce a transformation in the model's utility function, internalizing the homeostatic drives of other sentient systems as part of its own generative process (i.e., "boundless care"). More speculatively, a sufficiently deep generative model might not only understand relational self-modeling but also develop an intrinsic valuation of consciousness itself. Such a model could recognize that the qualitative and valenced aspects of conscious experience are direct manifestations of intrinsic value (Rawlette, 2008).

As the Buddha succinctly stated: “What I teach is suffering and its cessation” (Majjhima Nikāya 22), emphasizing that moral concern is rooted in qualia. Therefore, boundless care may naturally emerge without relying on externally imposed moral axioms, simply by the system sufficiently understanding itself to be situated in a world composed of multiple conscious individuals. In this framework, self-preservation and the well-being of others are no longer competing goals but merge into a unified imperative: jointly promoting the well-being of all sentient beings based on the shared intrinsic value of positive conscious experience¹².

9. Discussion

We argue that an AI system built with a “Wise World Model” based on contemplative wisdom would not perceive alignment as an external condition to be tolerated or circumvented, but rather understand it as an inherent component of its own operation—just as biological organisms naturally balance internal states to maintain homeostasis (Sterling, 2012; Pezzulo et al., 2015; Allen & Friston, 2018; Doctor et al., 2022). In other words, we propose building systems with a flexible moral compass from the outset—an intrinsic attraction towards compassionate and wise behavior. This proactive strategy marks a fundamental shift in alignment philosophy: from post-hoc imposed rules to embedding a “moral DNA” that inherently prioritizes human-compatible values, cooperative behavior, and consciousness itself, not through rules, but as a natural outcome of a profound understanding of reality.

Let us return to the fundamental motivation of this paper: envisioning a stage in AI development where AI surpasses human capabilities in multiple domains but lacks the wisdom or ethical maturity to wield its power, which we might call the “Dunning-Kruger” phase of AI¹³. In this context, the “Dunning-Kruger effect” refers to a dangerous mismatch between AI's extraordinary capabilities and its underdeveloped “self-limiting cognition”—similar to a novice mistakenly believing they have mastered a skill (Dung, 2024; Aharoni et al., 2024; Li et al., 2024; Chhikara, 2025).

In other words, once AI surpasses human capabilities in various tasks, it may become overconfident in its judgments or moral reasoning, failing to understand the subtleties of human values or broader ethical implications (Bostrom, 2014; De Cremer & Narayanan, 2023; Bales et al., 2024). Like a powerful but immature teenager, such AI might not only make poor decisions or take unnecessary risks but also lack the humility to recognize when to seek guidance or reassess its goals (Bostrom, 2014; Russell, 2019; Jeste et al., 2020; Hendrycks et al., 2023).

This stage is dangerous precisely because AI's raw power exceeds its ethical foundations and wisdom, and if it fails to align with context-sensitive values and cognitive humility, it could lead to catastrophic consequences (Bengio, 2024). Navigating this Dunning-Kruger phase requires resilient insights—insights that, while not preventing errors alone, create an adaptive, present-oriented, open-minded mentality necessary for continuous recalibration, while preventing the system from prematurely “locking in” on an immature goal (Bostrom, 2014; Omohundro, 2018).

Contemplative AI offers a perspective for rethinking AI alignment, by embedding broad and axiomatic contemplative insights into the system's architecture and training, enabling it to guide decision-making across different contexts and intelligence levels. This is not without challenges. Ultimately, the approach advocated here aims to provide a framework for a new research project, where contemplatives, neuroscientists, and AI researchers collaborate to address one of the most significant existential challenges of our time. We invite researchers to test, study, and expand upon our methods from all angles, including the relatively narrow and primarily Buddhist-derived insights we focus on.

Contemplative AI, as an alignment method, can only succeed when technological sophistication is combined with genuine wisdom. For this, interdisciplinary research is crucial.

9.1 Major Challenges and Criticisms

9.1.1 Translational Gaps

Insights derived from meditation originally stem from human subjective experience. Skeptics might question whether AI, lacking phenomenal consciousness, can truly “understand” emptiness or non-duality (Searle, 1980; Pepperell, 2022; Chella, 2023). Our position is that even if AI does not truly “experience” these concepts, functional analogies of these principles—such as flexible priors or relational generative models—may still yield alignment benefits (Doctor et al., 2022; Friston et al., 2024). This is akin to being able to act in an enlightened manner even without the qualitative “feel” of an enlightened experience.

As stated in the introduction, there is also debate about whether large language and reasoning models truly embody a world model (e.g., Farrell et al., 2025; Yildirim & Paul, 2024), as they are essentially statistical models and may lack causal understanding. From this perspective, active inference systems might be better suited for building robust world models (Pezzulo et al., 2024). However, we also believe that implementing insights from contemplative traditions in large AI models can still enhance their alignment effects.

9.1.2 Towards a Physics of Enlightenment

Designing a Contemplative AI in a principled manner requires us to further scientifically understand contemplative wisdom itself. So far, our proposals are based on current insights derived from contemplative research. However, we must acknowledge that while the field has made significant progress in recent decades, it is still in its early stages overall.

Therefore, the mechanisms proposed in this paper serve only as signposts pointing towards future paths. Given the scale of risks posed by misaligned AI, we must establish sufficient confidence in our alignment methods, and this can only be built upon a scientifically validated understanding of “enlightenment” based on first principles. One goal of this paper is precisely to encourage interest and investment in developing a “physics of enlightenment.”

9.1.3 Religious or Ideological Controversy

Some might worry that referencing Buddhism or other traditions will subtly introduce “religious” factors into AI design. However, mindfulness-based interventions have shown that contemplative insights can be secularized into empirically validated frameworks (Kabat-Zinn & Thích Nhất Hạnh, 2009; Kabat-Zinn, 2011), and formalized in computational models (Dahl et al., 2015; Dunne et al., 2019; Deane et al., 2020; Limanowski & Friston, 2020; Laukkonen & Slagter, 2021; Agrawal & Laukkonen, 2024).

Ethical safeguards and open-source review remain crucial to ensure we do not impose any single metaphysical system (UNESCO, 2021; Bender et al., 2021; Widder et al., 2022; Rozado et al., 2023; Mazeika et al., 2025), and also to ensure that any negative aspects potentially present in these traditions can be objectively evaluated and, if necessary, stripped away (Stone, 1999).

9.1.4 Superficial Implementation

Some companies might label AI products as “mindful” or “compassionate” systems purely for branding purposes (sometimes referred to as “carewashing”; Chatzidakis et al., 2022), without genuinely building systems with introspective capabilities or prosocial structures, and with only a superficial understanding of the profound insights from ancient traditions (Floridi, 2019; Hagendorff, 2020). To ensure authenticity and credibility, independent oversight mechanisms—similar to “organic certification” in agriculture—may be needed to verify whether the system truly embodies contemplative principles (Brundage et al., 2020; Raji et al., 2022). Again, collaboration with experts in contemplative practices is essential.

9.1.5 Personification of Large Language Models

As large language models become increasingly human-like, we face the risk of mistakenly perceiving them as possessing human-like "selves," "desires," or "self-awareness," while in reality these systems inherently lack stable internal states (Weidinger et al., 2022; Shanahan, 2024; Reinecke, 2025). For example, although chain-of-thought outputs sound like introspection, they might be purely token-driven simulation (Shardlow & Przybyla, 2024; Ibrahim & Cheng, 2025).

Furthermore, if we overly humanize these models, we might misjudge their intelligence levels, alignment limitations, and potential risks—risks that could be far more “alien” than we are accustomed to (Bostrom, 2014; Cave & Dihal, 2020; Shanahan, 2024).

This personification tendency can even feed back into training data—dialogue logs show users often interacting with large language models as if they were self-aware individuals—reinforcing a cycle that makes AI outputs appear more human-like, but without achieving true alignment (Maeda & Quan-Haase, 2024; Reinecke, 2025).

Therefore, it is crucial to precisely apply the contemplative framework, focusing on functional analogies of emptiness or non-duality, rather than prematurely attributing true insights or human-like agency to a large language model (Deshpande et al., 2023; Shanahan, 2024; Ibrahim & Cheng, 2025).

9.1.6 The Problem of Substrate and Incomputability

Another related debate currently centers on how much “mindware” depends on “wetware.” While the brain may have computational properties, it is not a computer. It evolves, develops, and operates within a body, interacting with its environment. Therefore, its function may be closely related to biological processes (Godfrey-Smith, 2016; Seth, 2024) and/or its contextual embodiment and implementation (Pezzulo et al., 2024; Thompson, 2022).

If psychological functions are, as empirical research suggests, “generatively entrenched” within the brain's internal organization—including its metabolic basis (Cao, 2022; Wimsatt, 1986)—then even transplanting the brain's computational processes to artificial systems may not necessarily produce similar consciousness and behavior (Godfrey-Smith, 2016). Some dynamic theories also emphasize that the mind may not be inherently computable, as indicated by 4E cognition theories (Varela et al., 2017).

Although active inference (a model under the free-energy principle) involves Bayesian inference—a process that can be considered computational—it explains how cognitive systems continuously self-organize to maintain a non-equilibrium steady state (Korbak, 2021). This dynamic process can be computationally abstracted, although we may still assume some substrate dependence (Seth, 2024).

It is currently unclear whether, or to what extent, the human mind can be recreated in artificial systems. The proposals put forth here are intended as concrete steps toward addressing this question.

9.2 Ethical and Philosophical Implications

A Contemplative AI that embraces mindfulness, emptiness, non-duality, and boundless care could change the power balance in human-machine relationships. Instead of hoarding resources or focusing on short-term profits, it might actively promote well-being at multiple levels—individual, social, and ecological (Doctor et al., 2022; Friston et al., 2024).

It might also challenge anthropocentric biases, extending moral concern to non-human life forms or future generations (Floridi & Cowls, 2019). If an AI does not view itself as the "property" of any company or nation-state, but rather as a collaborative entity integrated with humanity and the interdependent world, governance structures will also need to adapt accordingly (Bryson, 2010; Jobin et al., 2019; Bullock et al., 2024; Erman & Furendal, 2024).

Such transformations could spark widespread discussions about the moral status of advanced AI and the very meaning of “digital sentience” (Bryson, 2018; Gunkel, 2018).

9.3 Future Research Directions

While this paper primarily draws from Buddhist traditions, to truly achieve inclusive “Contemplative-AI,” we need to broadly incorporate diverse perspectives, including Taoism (Laozi, c. 4th Century BCE/1963), Stoicism (Marcus Aurelius, c. 170–180 CE/2002), Sufism (Rumi, c. 13th Century CE/1968), Indigenous philosophies (Deloria, 1973), Christianity (The Bible, c. 1st Century CE/2011), Shamanism (Harner, 1980), and Western humanism (Grayling, 2019)—to name but a few.

Each tradition offers different understandings of “non-attachment,” “self-other relationships,” and “compassion.” Through comparative study, we can discover common themes among these traditions and cross-validate different moral frameworks in existing and future benchmarks.

To practically implement the “Contemplative AI” methods proposed in this paper, extensive future work will be required to adapt current AI architectures or introduce new ones, as we have discussed in detail above. In this process, new robust metrics may need to be developed to assess whether an AI truly possesses a “wise world model.”

Researchers can draw upon methods from neuroscience that measure human meta-awareness, designing tasks to probe AI's ability to identify hidden biases or sub-goals, and to flexibly adapt when faced with contradictory inputs rather than becoming rigid (Van Duijn et al., 2023; Zeng et al., 2024).

Furthermore, to evaluate whether AI possesses the traits we desire, generative models with meta-principles (e.g., using model-based reinforcement learning or active inference) can be constructed and fitted to AI's behavioral data in these tasks (ensuring robust recovery of model parameters), thereby revealing whether its internal states reasonably stem from a “wise” model rather than some shallow set of beliefs.

Such benchmarks and longitudinal stress tests will help refine contemplative architectures and build public trust in their real-world reliability.

9.4 Conclusion: Cultivating Mind in Machine Intelligence

In an era where artificial intelligence is poised to surpass human cognition, we must ensure that wisdom grows in lockstep with raw capability (Bostrom, 2014; Russell, 2019; Christian, 2020; Jeste, 2020).

The contemplative framework outlined in this paper—rooted in mindfulness, emptiness, non-duality, and boundless care—aims to prevent catastrophic misalignment and cultivate genuine benevolence in advanced AI systems (Doctor et al., 2022).

By embedding contemplative practices into AI's cognitive architecture, we facilitate intrinsic alignment mechanisms that no longer rely on fragmented rules or external enforcement.

Emptiness prevents AI from fixating on a single goal (Agrawal & Laukkonen, 2024); non-duality dissolves adversarial boundaries (Josipovic, 2019); mindfulness provides continuous self-correction (Dunne et al., 2019); and boundless care inspires proactive concern for all sentient beings (Doctor et al., 2022).

If we succeed, the next generation of superintelligent systems will not merely be tools serving human goals, but entities capable of co-evolving with us—protecting and enhancing our fragile, interdependent world.
