The reasoning ability of LLMs relies heavily on Chain-of-Thought (CoT), in which the model generates intermediate reasoning steps before its final answer. However, conventional CoT generates these steps in a discrete token space, which leads to two major problems:
Information Loss: At each step the model must commit to a single token, so complex reasoning can be oversimplified;
Insufficient Diversity: Repeated sampling may produce identical or near-identical reasoning paths, so the space of possible solutions is not fully explored.
For example, when asked to solve a math problem, a model might repeatedly follow the same incorrect approach and arrive at a wrong answer.
Paper: SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
Link: https://arxiv.org/pdf/2505.11484
Comparison between traditional CoT and SoftCoT++: The former generates steps in discrete space, while the latter generates "soft thoughts" in continuous space
Recent work such as Coconut and SoftCoT encodes the reasoning process in a continuous latent space (loosely analogous to the brain's "fuzzy thinking"), but this raises a new problem: how can a model explore multiple reasoning paths in continuous space?
How SoftCoT++ Overcomes These Limitations with "Soft Thoughts"
Core Idea of SoftCoT++:
Separate "Thinking" and "Reasoning":
Thinking Stage: A small auxiliary model generates "soft thoughts" in continuous space (akin to vague inspiration);
Reasoning Stage: The large model generates concrete reasoning steps conditioned on these "inspirations".
Simulate Multi-Path Exploration: In discrete space, different paths can only be obtained through random token sampling. SoftCoT++ instead perturbs the initial conditions of the thinking stage (i.e., provides different "thinking starting points"), so that the soft thoughts naturally diverge into diverse paths in continuous space.
For example, for the same problem, one starting point might lead the model toward "setting up equations" and another toward "drawing a diagram", yielding different solution methods.
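The two-stage split above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: random NumPy matrices stand in for the frozen assistant model and the LLM's embedding table, the linear map (`proj_W`, `proj_b`) plays the role of a projection module from the assistant's hidden space into the LLM's input space, and every name and size (`think`, `project`, `build_llm_input`, `D_ASSIST`, `D_LLM`, the token ids) is hypothetical. Placing the soft thoughts right after the question embeddings is also an assumption made for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 1000    # hypothetical vocabulary size
D_ASSIST = 64   # hypothetical hidden size of the small assistant model
D_LLM = 128     # hypothetical embedding size of the large LLM

# Random stand-ins for frozen pretrained weights (hypothetical).
assist_embed = rng.normal(size=(VOCAB, D_ASSIST))
assist_mix = rng.normal(size=(D_ASSIST, D_ASSIST)) / np.sqrt(D_ASSIST)
llm_embed = rng.normal(size=(VOCAB, D_LLM))

# Projection module: maps assistant hidden states into the LLM input space.
proj_W = rng.normal(size=(D_ASSIST, D_LLM)) / np.sqrt(D_ASSIST)
proj_b = np.zeros(D_LLM)


def think(question_ids, init_token_ids):
    """Thinking stage (hypothetical assistant model): one continuous hidden
    vector per initial token, conditioned on a crude question summary."""
    q = assist_embed[question_ids].mean(axis=0)
    return np.tanh((assist_embed[init_token_ids] + q) @ assist_mix)


def project(h):
    """Map assistant hidden states (n, D_ASSIST) to soft thoughts (n, D_LLM)."""
    return h @ proj_W + proj_b


def build_llm_input(question_ids, soft_thoughts):
    """Reasoning-stage input: question embeddings followed by soft thoughts.
    The LLM would then decode discrete reasoning steps from this sequence."""
    return np.concatenate([llm_embed[question_ids], soft_thoughts], axis=0)


question = [12, 45, 7, 300]      # toy token ids
placeholders = [999, 999, 999]   # one fixed placeholder id, repeated
soft = project(think(question, placeholders))
llm_input = build_llm_input(question, soft)
print(llm_input.shape)  # (7, 128)
```

Because the same placeholder id is repeated, all three soft thoughts come out identical: a single fixed starting point yields no diversity, which is the problem SoftCoT++ targets.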
Technical Details: Diverse Initial Tokens and Contrastive Learning
Two Key Technologies:
Specialized Initial Tokens ([TNT] token)
Earlier approaches use a fixed placeholder (e.g., [UNK]) to trigger thinking, whereas SoftCoT++ uses multiple distinct [TNT] tokens, each corresponding to a different initial thinking direction.
In effect, this gives the model several different "thinking fuses", each triggering a different set of soft thoughts.
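The effect of distinct initial tokens can be illustrated with a toy sketch. A fixed random matrix `W` is a hypothetical stand-in for the frozen assistant model; one setting reuses a single shared placeholder embedding for every path, the other gives each path its own set of initial-token embeddings (the [TNT]-style setting). Sizes, names, and the random (rather than learned) embeddings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32       # hypothetical hidden size
K = 4        # hypothetical number of parallel thinking paths
N_INIT = 3   # hypothetical number of initial tokens per path

W = rng.normal(size=(D, D)) / np.sqrt(D)  # stand-in for the frozen assistant model


def think(question_vec, init_embs):
    """Hypothetical assistant: one soft thought per initial-token embedding,
    conditioned on the question vector."""
    return np.tanh((init_embs + question_vec) @ W)


question_vec = rng.normal(size=D)

# One shared placeholder embedding: every path starts from the same point.
shared = np.tile(rng.normal(size=D), (N_INIT, 1))
same_paths = [think(question_vec, shared) for _ in range(K)]

# K distinct sets of initial-token embeddings: one starting point per path.
tnt = rng.normal(size=(K, N_INIT, D))
diverse_paths = [think(question_vec, tnt[k]) for k in range(K)]


def mean_pairwise_distance(paths):
    """Average Euclidean distance between every pair of flattened paths."""
    flat = [p.ravel() for p in paths]
    dists = [np.linalg.norm(a - b) for i, a in enumerate(flat) for b in flat[i + 1:]]
    return float(np.mean(dists))


print(mean_pairwise_distance(same_paths))     # 0.0: K copies of one path
print(mean_pairwise_distance(diverse_paths))  # > 0: paths diverge
```

Because the thinking stage is deterministic, reusing one starting point produces K copies of the same path; distinct initial tokens are what make the paths diverge.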
Contrastive Learning
Goal: Make the soft thoughts from different paths as "different" as possible.
Method: A contrastive loss function maximizes the differences between soft thoughts from different paths.
(Simply put: soft thoughts from the same path are pulled closer together, while those from different paths are pushed apart.)
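The loss formula itself is not reproduced here, so the following is only an illustrative InfoNCE-style objective that matches the description above, not the paper's exact loss. Each soft thought is scored against the centroid of every path; its own path's centroid is treated as the positive and the other centroids as negatives. The name `path_contrastive_loss`, the centroid construction, and the temperature value are assumptions. The sketch only evaluates the loss; training would minimize it, via an autodiff framework, with respect to whatever parameters produce the soft thoughts (for example, the initial-token embeddings).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def path_contrastive_loss(thoughts, temperature=0.1):
    """Hypothetical InfoNCE-style diversity loss (not the paper's exact formula).

    thoughts: array of shape (K, N, D), i.e. K thinking paths with N soft
    thoughts each. Every thought is pulled toward its own path's centroid
    (positive) and pushed away from the other paths' centroids (negatives).
    """
    K = thoughts.shape[0]
    z = l2_normalize(thoughts)                    # (K, N, D) unit vectors
    centroids = l2_normalize(z.mean(axis=1))      # (K, D) one centroid per path
    # Cosine similarity of every thought to every path centroid: (K, N, K)
    logits = np.einsum("knd,jd->knj", z, centroids) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Log-probability that thought n of path k is assigned to its own path k
    own = np.stack([log_probs[k, :, k] for k in range(K)])  # (K, N)
    return float(-own.mean())


rng = np.random.default_rng(0)
K, N, D = 4, 3, 16
# Every path produces exactly the same soft thoughts (no diversity).
collapsed = np.tile(rng.normal(size=(1, N, D)), (K, 1, 1))
# Each path clusters around its own basis direction (high diversity).
separated = np.eye(D)[:K, None, :] + 0.05 * rng.normal(size=(K, N, D))

print(path_contrastive_loss(collapsed))  # equals log(4), about 1.386
print(path_contrastive_loss(separated))  # close to 0
```

When all paths collapse onto the same thoughts, the softmax over paths is uniform and the loss equals log K; well-separated paths drive it toward zero.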
Comparison experiments: Adding noise alone (SoftCoT-P) has a limited effect, while combining specialized initial tokens with contrastive learning (SoftCoT++) improves performance significantly.
Experiments: Consistently Outperforming Traditional Methods
Across five benchmarks covering mathematical, commonsense, and symbolic reasoning, SoftCoT++ performed strongly:
Mathematical Reasoning: GSM8K accuracy improved by 1-2 percentage points, with the Qwen3 model reaching 93.65%;
Commonsense Reasoning: A consistent lead on the StrategyQA task;
Compatibility: Performance improved further when combined with Self-Consistency.
Comparison with mainstream methods: SoftCoT++ outperforms traditional CoT and Coconut
More importantly, without modifying any model parameters, simply allocating more computation at inference time (e.g., generating 10 thinking paths) immediately improves results.
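More thinking paths only help if their answers are aggregated. The sketch below shows self-consistency-style majority voting over per-path final answers. `sample_path_answer` is a hypothetical stand-in for one soft-thought path plus LLM decoding; it assumes each path is independently correct with probability 0.6 and otherwise lands on one of several wrong answers. These numbers are illustrative assumptions, not results from the paper.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Self-consistency aggregation: return the most frequent final answer.
    Counter.most_common orders ties by first occurrence."""
    return Counter(answers).most_common(1)[0][0]


def sample_path_answer(rng, correct=42, p_correct=0.6):
    """Hypothetical stand-in for one thinking path plus LLM decoding.
    Returns the correct answer with probability p_correct, otherwise one of
    several distinct wrong answers (errors are assumed independent)."""
    if rng.random() < p_correct:
        return correct
    return rng.choice([40, 41, 43, 44])


def accuracy(num_paths, trials=2000, seed=0, correct=42):
    """Fraction of simulated problems where majority voting over
    `num_paths` independent paths recovers the correct answer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        answers = [sample_path_answer(rng, correct) for _ in range(num_paths)]
        hits += majority_vote(answers) == correct
    return hits / trials


print(accuracy(1))   # roughly 0.6: a single path
print(accuracy(10))  # substantially higher with 10 voted paths
```

The gain relies on paths making different mistakes; if every path repeated the same wrong answer, voting could not help, which is why diversifying the paths matters.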