Breaking the Chain-of-Thought Reasoning Bottleneck! "Soft Thinking" Enables LLMs to Learn Human Abstract Abilities, with Reduced Token Usage

Instead of "generating word by word" as in CoT (Chain-of-Thought), "Soft Thinking" lets large models think abstractly, more like humans do.

Researchers from SimularAI and Microsoft DeepSpeed jointly proposed Soft Thinking, enabling models to perform "soft reasoning" in a continuous concept space, rather than being limited to discrete linguistic symbols, thereby breaking the reasoning bottleneck based on discrete tokens.

Compared to standard CoT, Soft Thinking improves average Pass@1 accuracy by up to 2.48 percentage points and reduces token usage by up to 22.4%.

Furthermore, Soft Thinking is a plug-and-play reasoning strategy that can be applied to existing models (such as Llama, Qwen) without additional training.

A key problem with current mainstream language model reasoning methods is that they can only generate discrete linguistic symbols (such as words or sub-words), one token at a time.

This is like thinking by uttering one word at a time. It limits the model's ability to express abstract concepts and also makes it prone to errors on complex problems, because it must commit to a single path at every step.

The human brain does not rely on explicit linguistic symbols when thinking but rather on the flexible integration of abstract concepts for reasoning.

Soft Thinking is inspired by this, extending the reasoning of language models from a "discrete symbol space" to a "continuous concept space."

This allows the model to capture concepts that lie between subtly different meanings, so it can explore multiple solution paths more flexibly while remaining efficient and interpretable.

Some commenters noted that this method addresses the "greedy" next-token selection problem of autoregressive decoding.

How to enable models to think abstractly like humans

Reasoning Process: "Soft Reasoning" in a Continuous Concept Space

Soft Thinking only modifies the intermediate reasoning stage of traditional CoT, retaining the discrete generation of the final answer (e.g., numerical answers for math problems or specific code statements).

In theoretical terms, Soft Thinking replaces path enumeration with a linear approximation.

When solving complex problems, the number of reasoning paths in traditional CoT grows exponentially with the number of steps (e.g., with 1000 candidate tokens per step, 3 steps yield 1000^3 = 10^9 paths), making explicit enumeration impossible.

Soft Thinking uses linear approximation to collapse the sum over exponentially many paths into a weighted computation over concept tokens.

It uses probability weighting to replace discrete sampling, implicitly aggregating information from multiple paths through linear transformation in a continuous concept space, avoiding computational explosion from explicit enumeration.
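
The idea of replacing enumeration with a weighted mixture can be sketched with a toy example in plain Python. Everything here is an assumption made for illustration: the 2-dimensional embeddings, the probabilities, and the hypothetical `linear_step` function standing in for a model step. For a truly linear step the two results match exactly; real transformer layers are nonlinear, so there the mixture is only an approximation.

```python
# Toy illustration: for a linear map, averaging over branches equals
# mapping the averaged input. All numbers below are made up.

# Hypothetical 2-d embeddings for three candidate tokens.
embeddings = {
    "30": [1.0, 0.0],
    "multiply": [0.0, 1.0],
    "decompose": [1.0, 1.0],
}
# Hypothetical next-token probabilities (they sum to 1).
probs = {"30": 0.5, "multiply": 0.3, "decompose": 0.2}


def linear_step(vec):
    """A stand-in linear 'model step': a fixed 2x2 matrix times the input."""
    w = [[2.0, -1.0], [0.5, 3.0]]
    return [sum(w[i][j] * vec[j] for j in range(2)) for i in range(2)]


# Explicit enumeration: run the step once per token, then weight the outputs.
enumerated = [0.0, 0.0]
for tok, p in probs.items():
    out = linear_step(embeddings[tok])
    enumerated = [acc + p * o for acc, o in zip(enumerated, out)]

# Soft alternative: mix the embeddings first, then run the step once.
mixed_input = [
    sum(probs[tok] * embeddings[tok][d] for tok in probs) for d in range(2)
]
soft = linear_step(mixed_input)

# Number of explicit paths with 1000 candidates per step over 3 steps.
paths = 1000 ** 3
```

In this sketch `enumerated` and `soft` agree up to floating-point rounding, while the soft version calls `linear_step` once instead of once per candidate token.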

Concept Tokens: Using Probability Distributions Instead of Single Symbols

Traditional methods commit to a single token at each step (e.g., "30" or "plus"), while Soft Thinking keeps the whole probability distribution (e.g., 40% probability for "30," 30% for "multiply," 20% for "decompose," etc.). This distribution is called a "concept token."

Each concept token is equivalent to a "mixture" of multiple possible symbols, allowing the model to simultaneously retain multiple reasoning possibilities.

As shown in the example below, when calculating "43×34," the model might simultaneously consider the probabilities of two paths: "decompose 34 into 30+4" and "direct multiplication," instead of choosing only one.

Image
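
A minimal sketch of the difference between a discrete token and a concept token, in plain Python. The logits are invented for illustration, and the top-k truncation with renormalization in the hypothetical `concept_token` helper is an assumption of this sketch, not a detail confirmed by the article.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def concept_token(logits, top_k=3):
    """Keep the top_k most likely tokens and renormalize (assumed design)."""
    probs = softmax(logits)
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


# Hypothetical logits for the next reasoning step of "43 x 34".
logits = {"30": 2.0, "multiply": 1.7, "decompose": 1.3, "banana": -3.0}

discrete_choice = max(logits, key=logits.get)  # standard CoT: one token
soft_choice = concept_token(logits, top_k=3)   # Soft Thinking: a distribution
```

Here `discrete_choice` discards everything except the single best token, whereas `soft_choice` keeps "30," "multiply," and "decompose" alive with their relative weights.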

Continuous Concept Space: Reasoning in a "Fuzzy" Semantic Space

Weighting the model's token embeddings (word vectors) by the concept token's probability distribution yields a point in a continuous concept space.

Here, "continuous" means that the model can smoothly transition between different concepts, for example, naturally moving from "number decomposition" to "multiplication operation" without needing explicit linguistic symbols to separate steps.
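
The weighting step can be sketched as a probability-weighted sum of embedding vectors. The 3-dimensional embedding table and the probabilities below are made up, and `mix_embeddings` is a hypothetical helper name; a real model would use its own embedding matrix with thousands of dimensions.

```python
def mix_embeddings(distribution, embedding_table):
    """Return the probability-weighted sum of token embeddings."""
    dim = len(next(iter(embedding_table.values())))
    mixed = [0.0] * dim
    for token, prob in distribution.items():
        for d, value in enumerate(embedding_table[token]):
            mixed[d] += prob * value
    return mixed


# Hypothetical embedding table: one axis per token, for readability.
embedding_table = {
    "30": [1.0, 0.0, 0.0],
    "multiply": [0.0, 1.0, 0.0],
    "decompose": [0.0, 0.0, 1.0],
}

# A concept token (made-up probabilities) and the vector fed to the next step.
concept = {"30": 0.5, "multiply": 0.3, "decompose": 0.2}
next_input = mix_embeddings(concept, embedding_table)

# A one-hot distribution recovers the ordinary discrete embedding.
one_hot = mix_embeddings({"multiply": 1.0}, embedding_table)
```

Because the weights can vary smoothly, `next_input` can sit anywhere between the individual token embeddings, which is what "continuous" refers to here. Standard discrete decoding is the special case where one token has probability 1.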

Cold Stop Mechanism: Avoiding Invalid Loops

Since the model has not encountered concept tokens during training (they are "out-of-distribution" inputs), prolonged reasoning might lead to repetition or confusion (similar to "mental blocking" in humans).

Soft Thinking introduces a "Cold Stop" mechanism: by monitoring the entropy of the probability distribution, it assesses the model's "confidence."

When the entropy remains consistently low (indicating the model is very confident about the current reasoning path), it terminates the intermediate steps early and directly generates the answer, avoiding wasted computational resources.
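
The entropy check described above might look roughly like the following sketch. The threshold of 0.1 nats, the `patience` of 3 consecutive steps, and the function name `cold_stop_step` are illustrative assumptions, not values taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cold_stop_step(distributions, threshold=0.1, patience=3):
    """Return the index of the step at which thinking should stop, or None.

    Stops once entropy has stayed below `threshold` for `patience`
    consecutive steps; any higher-entropy step resets the counter.
    """
    streak = 0
    for step, probs in enumerate(distributions):
        if entropy(probs) < threshold:
            streak += 1
            if streak >= patience:
                return step
        else:
            streak = 0
    return None


# Made-up per-step distributions: one confident, one uncertain.
confident = [0.99, 0.005, 0.005]
uncertain = [0.4, 0.3, 0.3]
trace = [uncertain, confident, confident, uncertain,
         confident, confident, confident]
stop_at = cold_stop_step(trace)
```

In this trace the first two confident steps do not trigger a stop because an uncertain step interrupts them; the stop fires only after three confident steps in a row. Requiring a streak rather than a single low-entropy step is one way to avoid stopping on a momentary spike in confidence.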

Test Results and Comparisons

In benchmark tests, the average Pass@1 accuracy of the QwQ-32B model rose from 83.84% with standard CoT to 86.32%, a gain of 2.48 percentage points (the largest reported), including a 6.45-point improvement on the AIME 2024 dataset.

In terms of inference efficiency, DeepSeek-R1-Distill-Qwen-32B reduced token usage by 22.4% in mathematical tasks.

Comparison with other methods

COCONUT-TF (no training): Directly using hidden states as input completely failed, generating maximum length output without correct solutions.

Average Embedding Strategy: Taking only the plain mean of the top-5 token embeddings, without probability weighting, resulted in low accuracy and long generation length (e.g., only 6.66% correct on AIME 2024).
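
To make the contrast with the average embedding baseline concrete, here is a small sketch comparing a plain mean with a probability-weighted mix. The two-token table and probabilities are invented, and renormalizing over the kept tokens is an assumption of this sketch.

```python
def top_k(distribution, k):
    """Return the k most probable (token, probability) pairs."""
    return sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)[:k]


def uniform_average(distribution, table, k=5):
    """Baseline: plain mean of the top-k embeddings, ignoring probabilities."""
    kept = top_k(distribution, k)
    dim = len(table[kept[0][0]])
    return [sum(table[tok][d] for tok, _ in kept) / len(kept)
            for d in range(dim)]


def probability_weighted(distribution, table, k=5):
    """Top-k embeddings weighted by renormalized probability (assumed)."""
    kept = top_k(distribution, k)
    mass = sum(p for _, p in kept)
    dim = len(table[kept[0][0]])
    return [sum(p / mass * table[tok][d] for tok, p in kept)
            for d in range(dim)]


# Made-up example: the model strongly prefers token "a".
table = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
dist = {"a": 0.9, "b": 0.1}

avg = uniform_average(dist, table)
weighted = probability_weighted(dist, table)
```

The plain mean lands halfway between the two tokens even though the model assigns 90% to "a", while the weighted mix stays close to "a". This loss of confidence information is one plausible reason the baseline performs poorly, though the article itself does not state the cause.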

Soft Thinking intelligently balances efficiency and accuracy through continuous concept space reasoning and the Cold Stop mechanism, providing new insights for large model optimization.

Interested readers can visit the official website for more details.

Official website: https://soft-thinking.github.io/

Paper address: https://arxiv.org/abs/2505.15778

Code address: https://github.com/eric-ai-lab/Soft-Thinking

Reference link: https://x.com/xwang_lk/status/1925399783503798692
