Continuous Thought Machines Are Here! A Startup Founded by One of the 'Transformer Eight' Launches a Model That Stops AI from Making 'Single-Step' Snap Decisions

Synced Report

Editors: Du Wei, Dan Jiang

Launching a new paradigm of "step-by-step thinking."

It is a consensus in the scientific community that even the most complex modern artificial intelligence struggles to match the performance and efficiency of the human brain.

Researchers often look to nature for inspiration on how to make progress in artificial intelligence, such as using evolution to merge models, evolving more efficient memory for language models, or exploring the space of artificial life. While artificial neural networks have achieved extraordinary feats in AI in recent years, they remain simplified representations of their biological counterparts. So, can we elevate the capability and efficiency of artificial intelligence to a new level by incorporating features from biological brains?

One group of researchers decided to rethink a crucial feature at the core of cognition: time.

Just now, Sakana AI, co-founded by Llion Jones, one of the original Transformer authors, released the "Continuous Thought Machine" (CTM), an AI model that uses the synchronization of neural activity as its core reasoning mechanism. It can also be seen as a new type of artificial neural network that utilizes the synchronization between neuronal dynamics to perform tasks.


Blog: https://sakana.ai/ctm/

Technical Report: https://pub.sakana.ai/ctm/paper/index.html

Code: https://github.com/SakanaAI/continuous-thought-machines/

Unlike traditional artificial neural networks, CTM uses timing information at the neuronal level, enabling more complex neural behavior and decision-making processes. This innovation allows the model to "think" about problems step-by-step, making its reasoning process interpretable and human-like.

Experiments show that the model's problem-solving ability and efficiency improve across a variety of tasks.

Sakana AI states that CTM is a significant step towards bridging the gap between artificial and biological neural networks, potentially unlocking new frontiers in AI capabilities.

Visualization of CTM solving a maze and "thinking" about real photos (Image Credit: Alon Cassidy). Notably, although CTM was not explicitly designed to do so, the maze solutions it learns are highly interpretable and human-like: it can be seen tracing a path through the maze as it "thinks" about the solution. For real images, although there is no explicit incentive for it to look around, it scans the scene in an intuitive way.

Research Innovation

While the advent of deep learning in 2012 brought about a significant leap in artificial intelligence capabilities, the basic model of artificial neurons used in AI models has largely remained unchanged since the 1980s. Researchers still primarily use a single output from a neuron, which represents the neuron's firing status but ignores the precise timing of the neuron's firing relative to other neurons.

However, strong evidence suggests that this timing information is crucial in biological brains, for example, in spike-timing-dependent plasticity, which is fundamental to biological brain function.

In the new model, Sakana AI represents this information by allowing neurons to access their own history of behavior and learn how to use this information to compute their next output, rather than just knowing their current state. This allows neurons to change their behavior based on information from different times in the past. Furthermore, the main behavior of the new model is based on the synchronization between these neurons, meaning they must learn to use this timing information to coordinate tasks. The researchers believe that compared to what is observed in contemporary models, this will lead to a richer dynamic space and different task-solving behaviors.
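To make this concrete, here is a minimal PyTorch sketch of the idea. All names and shapes are our own illustrative assumptions, not Sakana AI's released code: each neuron gets its own tiny model that reads a rolling window of its past pre-activations and computes its next output.

```python
import torch
import torch.nn as nn

class NeuronLevelModels(nn.Module):
    """Each neuron gets its own tiny MLP over its last M pre-activations.

    A simplified sketch: D neurons, history window M. Per-neuron weights are
    stored as batched tensors so all D private models run in one einsum.
    """
    def __init__(self, num_neurons: int, history_len: int, hidden: int = 16):
        super().__init__()
        D, M, H = num_neurons, history_len, hidden
        self.w1 = nn.Parameter(torch.randn(D, M, H) * 0.1)  # per-neuron first layer
        self.b1 = nn.Parameter(torch.zeros(D, H))
        self.w2 = nn.Parameter(torch.randn(D, H) * 0.1)     # per-neuron output layer
        self.b2 = nn.Parameter(torch.zeros(D))

    def forward(self, pre_history: torch.Tensor) -> torch.Tensor:
        # pre_history: (batch, D, M) - each neuron's last M pre-activations
        h = torch.einsum("bdm,dmh->bdh", pre_history, self.w1) + self.b1
        h = torch.relu(h)
        # post-activation: one scalar per neuron, computed from its own history
        return torch.einsum("bdh,dh->bd", h, self.w2) + self.b2

# Usage: 512 neurons, each looking at its last 10 pre-activations
nlm = NeuronLevelModels(num_neurons=512, history_len=10)
post = nlm(torch.randn(4, 512, 10))  # -> (4, 512) post-activations
```

Giving every neuron private weights over its own history is what allows different neurons to learn qualitatively different temporal behaviors, rather than all applying the same pointwise nonlinearity.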

After adding this timing information, Sakana AI observed a range of unusual behaviors across many tasks. The behaviors they saw were highly interpretable: when observing an image, the CTM would carefully move its gaze across the scene, choosing to focus on the most salient features, and its performance improved on some tasks. The diversity of behavior in the neuronal activity dynamics surprised the researchers.


A sample of neuronal dynamics in CTM, showing how neurons respond to different inputs; CTM has clearly learned multiple distinct neuronal behaviors. Each neuron (shown in a random color) synchronizes with the other neurons to a varying degree; the researchers measure this synchronization and use it as CTM's representation.

The behavior of the new model is based on a new representation: the synchronization between neurons over time. Researchers believe this is more reminiscent of biological brains but not a strict simulation. They call the resulting AI model the "Continuous Thought Machine," which can use this new time dimension, rich neuronal dynamics, and synchronization information to "think" about tasks and formulate plans before providing an answer.
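As a rough illustration of how such a representation could be computed, the synchronization between two neurons can be scored as a normalized inner product of their post-activation traces over internal ticks. This cosine-style score and all shapes below are our assumptions; the technical report's exact weighting may differ.

```python
import torch

def synchronization_matrix(post_history: torch.Tensor) -> torch.Tensor:
    """post_history: (batch, D, T) post-activations of D neurons over T ticks.

    Returns a (batch, D, D) matrix whose (i, j) entry measures how strongly
    neurons i and j fired together over the thought process so far.
    """
    z = post_history - post_history.mean(dim=-1, keepdim=True)  # center each trace
    sync = torch.einsum("bit,bjt->bij", z, z)                   # pairwise inner products
    norm = z.pow(2).sum(-1).clamp_min(1e-8).sqrt()              # per-neuron trace norms
    return sync / (norm.unsqueeze(-1) * norm.unsqueeze(-2))     # cosine-style normalization

# Sample a subset of (i, j) pairs as the latent representation used for outputs
post_hist = torch.randn(4, 512, 30)
S = synchronization_matrix(post_hist)            # (4, 512, 512)
i, j = torch.randint(0, 512, (2, 256))           # 256 random neuron pairs
latent = S[:, i, j]                              # (4, 256) representation
```

Because this matrix is built from activity across internal ticks rather than from a single hidden state, the representation itself carries timing information.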

The word "continuous" in the name is used because CTM operates entirely within an internal "thought dimension" during inference. It is asynchronous to the data it consumes: it can reason about static data (like images) or sequential data in the same way. The researchers tested this new model on a wide range of tasks and found that it could solve various problems, often in a very interpretable way.

The neuronal dynamics observed by the researchers were somewhat more like dynamics measured in real brains than those of more traditional artificial neural networks, which exhibit much less behavioral diversity (see the comparison with the classic LSTM model below). CTM's neurons oscillate at a range of frequencies and amplitudes; some individual neurons shift between frequencies over time, while others only become active when completing a task. It is important to emphasize that all of these behaviors were entirely emergent: they were not designed into the model but appeared as a side effect of adding timing information and learning to solve different tasks.


The complete CTM architecture works as follows. ① The synaptic model (weights shown as blue lines) models cross-neuron interaction to produce pre-activations. For each neuron, ② a history of pre-activations is kept, the most recent of which is used by ③ the neuron-level model (weights shown as red lines) to produce ④ post-activations. ⑤ A history of post-activations is also kept and used to ⑥ compute the synchronization matrix. ⑦ Pairs of neurons are selected from the synchronization matrix, yielding ⑧ the latent representations that CTM uses ⑨ to produce outputs and modulate the data via a cross-attention mechanism. The modulated data (e.g., the attention output) is concatenated with the post-activations ⑩ for the next internal clock cycle.
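Assembled into a single internal clock cycle, the flow above might look roughly like the following sketch, which reuses the synchronization_matrix helper from earlier. The module names, shapes, and wiring are illustrative assumptions on our part; the official repository contains the real implementation.

```python
import torch

def ctm_tick(pre_hist, post_hist, z, synapse, nlm, attn_q, attn, out_head,
             data_kv, pair_i, pair_j):
    """One internal clock cycle of a CTM-style loop (all shapes illustrative).

    pre_hist:  (B, D, M) rolling pre-activation history      (step 2)
    post_hist: (B, D, T) post-activation history so far      (step 5)
    z:         (B, E)    attention output from the last tick (step 10)
    synapse:   e.g. nn.Linear(D + E, D); nlm: per-neuron models as sketched above
    attn:      e.g. nn.MultiheadAttention(E, 4, batch_first=True)
    """
    # 1: synapse model mixes the latest post-activations with attended data
    pre = synapse(torch.cat([post_hist[..., -1], z], dim=-1))            # (B, D)
    # 2: roll the pre-activation history forward by one tick
    pre_hist = torch.cat([pre_hist[..., 1:], pre.unsqueeze(-1)], dim=-1)
    # 3-4: neuron-level models turn each private history into a post-activation
    post = nlm(pre_hist)                                                 # (B, D)
    # 5: extend the post-activation history
    post_hist = torch.cat([post_hist, post.unsqueeze(-1)], dim=-1)
    # 6-8: synchronization matrix -> sampled neuron pairs -> latent representation
    S = synchronization_matrix(post_hist)                                # (B, D, D)
    latent = S[:, pair_i, pair_j]                                        # (B, P)
    # 9: this tick's prediction, plus a query into the input data
    logits = out_head(latent)                                            # (B, classes)
    q = attn_q(latent).unsqueeze(1)                                      # (B, 1, E)
    # 10: cross-attend over the (static or sequential) data for the next tick
    z, _ = attn(q, data_kv, data_kv)
    return pre_hist, post_hist, z.squeeze(1), logits
```

Note that the loop only ever touches the data through cross-attention, which is what lets the same machinery "think" over a static image or a sequence alike.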


Test Results

Due to the addition of the time dimension, a major advantage of CTM is that you can observe and intuitively see how it solves problems over time. Traditional AI systems might classify an image in a single pass through a neural network, while CTM can take multiple steps to "think" about how to solve the task.

Below are two tasks demonstrated: maze solving and object classification in photos.

First, let's look at the maze-solving task. Here, CTM is presented with a top-down 2D maze and asked to output the steps required to reach the exit. This format is particularly challenging because the model must understand the maze structure and plan a solution, rather than simply outputting a visual representation of the path.

CTM's internal, continuous "thought steps" allow it to formulate a plan, and it is intuitive to see which parts of the maze it focuses on during each step. Notably, CTM learned a method for solving mazes that is very similar to how humans do it: moving its attention pattern along the path through the maze.


The behavior pattern of CTM is particularly impressive because it emerges naturally from the model architecture. The researchers did not specifically design CTM to track paths in the maze; it developed this method through learning. They also found that when allowed more thought steps, CTM would continue along the learned path, indicating that it had indeed learned a general method for solving this problem.

Next is the image recognition task. Traditional image recognition systems make classification decisions in a single step, while CTM requires multiple steps to examine different parts of the image before making a decision. This step-by-step approach not only makes the AI's behavior more interpretable but also improves accuracy: the longer it "thinks," the more accurate the answer.

Researchers also found that this method allows CTM to reduce the time spent thinking about simple images, thus saving computational resources. For example, when recognizing a gorilla, CTM's attention shifts from the eyes to the nose, and then to the mouth, which is very similar to human visual attention patterns.
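One way to picture this compute-saving behavior is an early-exit loop over internal ticks: keep thinking until the prediction is confident enough. The halting criterion below is an assumption for illustration; the technical report describes how CTM is actually trained and evaluated across ticks.

```python
import torch

@torch.no_grad()
def think_until_certain(step_fn, state, max_ticks: int = 50, threshold: float = 0.9):
    """Run internal thought ticks until the model is confident enough.

    step_fn(state) -> (state, logits). Certainty here is the top-1 softmax
    probability; this exit rule is our illustrative assumption.
    """
    for tick in range(max_ticks):
        state, logits = step_fn(state)
        certainty = torch.softmax(logits, dim=-1).max(dim=-1).values
        if bool((certainty > threshold).all()):
            break  # confident on every example in the batch: stop thinking early
    return state, logits, tick + 1
```

Under a scheme like this, an easy image can exit after a handful of ticks while an ambiguous one uses the full budget.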

图片

The attention patterns described above provide a window into the model's reasoning process, showing which features it considers most relevant to the classification target. This interpretability not only helps in understanding the model's decisions but may also help identify and address biases or failure modes.

Conclusion

Although modern artificial intelligence is built upon the concept of "artificial neural networks" inspired by the brain, even today, the overlap between AI research and neuroscience remains surprisingly small. AI researchers choose to continue using the minimalist models developed in the 1980s, and thanks to attributes like simplicity and efficient training, these models continue to achieve success in driving AI development.

Neuroscience, on the other hand, can build more faithful models of the brain, but its main purpose is to understand the brain, not to create more capable intelligence, though the two goals are of course related. And although these neuroscience models are more complex, their performance typically still falls short of current state-of-the-art AI models, so they hold little appeal for further research in applied artificial intelligence.

Nevertheless, the researchers believe that if modern artificial intelligence does not keep moving closer, in at least some respects, to the way the brain works, we will miss opportunities to build more powerful and efficient models. After all, the "deep learning revolution" of 2012 emerged from brain-inspired neural network models and produced a leap in AI capabilities.

To continue this progress, should we continue to be inspired by the brain? CTM is the researchers' first attempt to bridge the gap between these two fields. It shows some initial signs of behavior more akin to the brain while still being a practical artificial intelligence model capable of solving important problems.

The researchers hope to continue pushing models in this naturally inspired direction and explore potential new functionalities. For CTM's behavior on different tasks, please refer to the original technical report.
