AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale

1. Introduction: A New Milestone in AI Evolution

Do you remember the scene at the end of last year, when major vendors raced to launch ever-larger AI models? OpenAI's o1, Google's Gemini 2.5, Anthropic's Claude 3.7... These models often had hundreds of billions of parameters, which was astounding. But have you ever wondered: do you really need such a massive model to achieve excellent reasoning capabilities?


Recently, researchers released a model called "AM-Thinking-v1," which, with only a 32B-parameter dense architecture, achieved surprising results on high-difficulty tasks such as mathematical reasoning and code generation, surpassing far larger mixture-of-experts models like DeepSeek-R1 (671B parameters) and approaching Qwen3-235B-A22B. What is the significance of this achievement, and how was it realized? Let's take a look.

2. Unveiling: Medium-Scale Models Can Also Have Super Reasoning Capabilities

If the development of large language models is a marathon, most companies are sprinting in the direction of "bigger is better," while AM-Thinking-v1 has chosen a different path: striving for excellence, not blind expansion.

This model achieved high scores of 85.3 and 74.4 on the math competition-level AIME 2024 and AIME 2025 tests, respectively, and a score of 70.3 on the LiveCodeBench code benchmark. What does this mean? Simply put, its ability to solve complex mathematical problems and write high-quality code has surpassed many large models with 10 or even 20 times the parameters!

Even more astonishing is that the research team built this result entirely based on the open-source Qwen2.5-32B base model and publicly available training data. This is like creating a product far exceeding expectations with the same raw materials through exquisite craftsmanship.

3. Technical Breakdown: How a Carefully Designed Post-Training Process Changes the Rules of the Game

The success of AM-Thinking-v1 is not accidental; it stems from the researchers' carefully designed post-training process. This process primarily includes two key stages, and it is these stages that enabled an ordinary base model to gain super reasoning capabilities.

(1) Data Processing: Quality Over Quantity

The research team did not blindly pursue massive amounts of data but strictly screened and processed all training data:

1) Strict deduplication: Remove duplicate query samples

2) Quality filtering: Exclude data with URLs or referenced images

3) Data validation: Especially for mathematical data, they built a complete processing pipeline, including query filtering and answer validation

For mathematical data, researchers even used DeepSeek-R1 to generate multiple answers and compared them against the original reference answers. When inconsistencies were found, they additionally consulted the o4-mini model to obtain an alternative answer. This meticulous validation ensures that the model does not learn from erroneous labels, greatly improving the training effect.
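The validation step described above can be sketched as a simple voting routine. This is a minimal illustration, not the authors' actual pipeline: the function names and the `fallback_judge` callable are assumptions standing in for "sample several answers from DeepSeek-R1, compare with the reference, and consult o4-mini on disagreement."

```python
# Hedged sketch of the answer-validation step: keep a math sample only if
# the majority of sampled answers agrees with the reference; otherwise
# defer to a fallback judge (e.g., a stronger model), or drop the sample.
from collections import Counter

def validate_answer(original_answer, candidate_answers, fallback_judge=None):
    """Return the validated answer for a math sample, or None to drop it."""
    majority, _votes = Counter(candidate_answers).most_common(1)[0]
    if majority == original_answer:
        return original_answer       # reference answer confirmed by consensus
    if fallback_judge is not None:
        return fallback_judge()      # consult the fallback model instead
    return None                      # inconsistent and unresolvable: discard

# Usage: two of three sampled answers agree with the reference, so it is kept.
kept = validate_answer("42", ["42", "42", "17"])
```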

(2) Two-Stage Training: The Powerful Combination of SFT + RL

The training process adopted a two-stage design:

Stage 1: Supervised Fine-Tuning (SFT)

1) Used approximately 2.84 million samples, covering five major categories: mathematics, programming, science, instruction following, and general conversation

2) Used a relatively high learning rate (8e-5) and a large batch size (64)

3) For multi-turn dialogue data, only the final answer containing the reasoning process was used as the training target
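The SFT settings above can be collected into a single configuration sketch. The field names here are illustrative assumptions, not the authors' actual training script; only the quoted numbers (learning rate 8e-5, batch size 64, ~2.84M samples, five categories) come from the paper summary.

```python
# Illustrative SFT configuration mirroring the hyperparameters quoted above.
# Field names are assumptions; the numeric values are from the text.
sft_config = {
    "base_model": "Qwen2.5-32B",
    "num_samples": 2_840_000,        # approximately 2.84 million samples
    "categories": [                  # the five major data categories
        "math",
        "code",
        "science",
        "instruction_following",
        "general_chat",
    ],
    "learning_rate": 8e-5,           # relatively high for fine-tuning
    "batch_size": 64,                # large batch size
    # Per the multi-turn rule: only the final answer containing the
    # reasoning process contributes to the training loss.
    "loss_target": "final_answer_with_reasoning_only",
}
```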

Stage 2: Reinforcement Learning (RL)

1) Adopted difficulty-aware query selection, filtering out samples with a pass rate of 0 or 1 to ensure that the training data was sufficiently challenging

2) Used the Grouped Relative Policy Optimization (GRPO) algorithm, without KL constraints

3) Two-stage generation and learning rate scheduling: The first stage limited the maximum response length to 24K, with a learning rate of 4e-6; the second stage increased the maximum response length to 32K and reduced the learning rate to 1e-6
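Two of the RL ingredients above are easy to sketch: difficulty-aware query selection (drop anything the policy always or never solves) and GRPO's group-relative advantage, which normalizes each sampled response's reward by its group's mean and standard deviation with no KL penalty term. This is a minimal sketch under those stated assumptions, not the authors' implementation.

```python
# Hedged sketch of two RL components described above.
from statistics import mean, pstdev

def select_queries(queries, pass_rates):
    """Difficulty-aware selection: keep only queries with pass rate
    strictly between 0 and 1, so every retained query is challenging
    but solvable for the current policy."""
    return [q for q, r in zip(queries, pass_rates) if 0.0 < r < 1.0]

def grpo_advantages(rewards):
    """Group-relative advantages in the GRPO style: normalize each
    sampled response's reward by the group mean and std. No KL term
    appears anywhere, matching the 'without KL constraints' setup."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Usage: only the query with an intermediate pass rate survives filtering.
hard_enough = select_queries(["q1", "q2", "q3"], [0.0, 0.5, 1.0])
```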

Researchers found that using a larger learning rate in the early stages of training can make the model converge faster, significantly reducing the overall training cost. This proves that a carefully designed training strategy can compensate for the lack of parameter scale.


4. Conclusion

The success of AM-Thinking-v1 has multiple implications:

(1) Cost-effectiveness: Compared to MoE models with hundreds of billions of parameters, the inference and deployment costs of 32B dense models are much lower, meaning more institutions and developers can afford high-level AI capabilities

(2) Practicality advantage: Medium-scale models are easier to deploy and fine-tune, suitable for a wider range of application scenarios

(3) Open-source innovation: Proves that the open-source community can also build high-performance models comparable to proprietary systems, promoting the democratization of AI technology

(4) Shift in research direction: Indicates that progress in the field of AI does not solely depend on increasing parameter scale; meticulous post-training design is equally important

Although AM-Thinking-v1 has achieved impressive results, it still has some limitations: lack of support for structured function calling and tool use, no multimodal input capabilities, and safety alignment is still in the preliminary stage.

However, this research undoubtedly provides a new direction for the future development of AI: through a carefully designed training process, medium-scale models can achieve or even surpass the performance of super-scale models on specific tasks.

This paradigm shift may influence the development direction of the entire AI industry, leading more researchers and developers to consider: can AI capabilities be enhanced through smarter methods, rather than simply stacking parameters?

With the continuous emergence of models like AM-Thinking-v1, we have reason to believe that the future of AI does not only belong to tech giants with massive computing resources but also to innovators who can skillfully utilize limited resources to create extraordinary value.

Paper Title: AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale

Paper Link: https://arxiv.org/abs/2505.08311
