Breaking News! DeepSeek Officially Releases 2 Models

Breaking!

On the third anniversary of ChatGPT's release, DeepSeek suddenly launched two models:

• DeepSeek-V3.2

• DeepSeek-V3.2-Speciale

In public reasoning benchmarks, DeepSeek-V3.2 reaches GPT-5 levels, slightly below Gemini-3.0-Pro; compared to Kimi-K2-Thinking, V3.2 significantly reduces output length, cutting computational overhead and user wait times.

The former focuses on balanced practicality and is suited to daily Q&A, general Agent tasks, and tool calling in real-world scenarios, with reasoning at GPT-5 level, slightly below Gemini-3.0-Pro.

The latter pushes reasoning to the extreme, with benchmark performance matching Gemini-3.0-Pro.

It also secured gold medals in IMO 2025, CMO 2025, ICPC World Finals 2025, and IOI 2025.

Key point: it placed second among human competitors at the ICPC World Finals and tenth at the IOI.


Specifically, DeepSeek-V3.2 emphasizes balancing reasoning capability and output length to reduce computational costs.

DeepSeek's official tweet states, "DeepSeek-V3.2 achieves the highest level among current open-source models in Agent evaluations."

Other details of the model:

• Reasoning capability on par with GPT-5;

• Significantly shorter output length than Kimi-K2-Thinking, reducing user wait times;

• DeepSeek's first model with "thinking integrated into tool calling," supporting thinking/non-thinking dual-mode tool calls;

• Trained on large-scale Agent data spanning 1,800+ environments and 85,000+ complex instructions, giving it strong generalization.

The chart below shows how DeepSeek-V3.2's scores compare with other models on various Agent tool-calling evaluation sets. Notably, DeepSeek-V3.2 was not specially trained on the tools from these test sets.

[Chart: DeepSeek-V3.2 vs. other models on Agent tool-calling evaluation sets]

DeepSeek-V3.2-Speciale is a long-thinking enhanced version of DeepSeek-V3.2, incorporating theorem-proving capabilities from DeepSeek-Math-V2.

It excels in instruction following, mathematical proofs, and logical verification, recommended for highly complex math reasoning, programming contests, and academic research tasks.

Note! This version is not optimized for daily conversation or writing.

It is for research use only, without tool calling support.

On highly complex tasks, the Speciale model greatly outperforms the standard version, but it consumes significantly more tokens and incurs higher costs.


Currently, DeepSeek's app and web versions have been updated to the official DeepSeek-V3.2; Speciale is available only through a temporary API.

The technical report was released alongside the models.

The paper reveals hardcore technical details:

A new sparse attention mechanism, DSA, that drastically reduces computational complexity; RL training compute exceeding 10% of pre-training; plus a new large-scale Agent task synthesis pipeline...

Let's dive in.

Introducing the DSA Efficient Sparse Attention Mechanism: Long Context Is No Longer a Burden

DeepSeek-V3.2's biggest architectural innovation is the DSA (DeepSeek Sparse Attention) mechanism.

Traditional attention has O(L²) complexity for long sequences, severely limiting deployment efficiency and training scalability.

DSA reduces this to O(L·k), where k << L is the number of tokens each query actually attends to.

Meanwhile, DSA significantly speeds up inference on long-context tasks without noticeable performance loss.

It supports FP8 precision, is compatible with the MLA (Multi-head Latent Attention) architecture, and is training-friendly.


How is it achieved?

DSA mainly includes two components: lightning indexer and fine-grained token selection mechanism.

The lightning indexer quickly computes relevance scores between query tokens and historical tokens, then selects only the top-k most relevant for attention computation.

The team specifically used ReLU activation to boost throughput.
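To make this concrete, here is a minimal, single-head sketch of the idea (not DeepSeek's implementation; the projection sizes, tensor layout, and default top-k are illustrative assumptions):

```python
# Minimal sketch of DSA-style sparse attention: a cheap "lightning indexer"
# scores history tokens for each query, and full attention is computed only
# over the top-k selected tokens. Single head, no batching, illustrative only.
import torch
import torch.nn.functional as F

def dsa_sparse_attention(q, k, v, q_idx, k_idx, top_k=2048):
    """q, k, v: [L, d] main-attention tensors; q_idx, k_idx: [L, d_idx] indexer tensors."""
    L, d = q.shape
    # 1) Lightning indexer: cheap relevance scores with a ReLU activation.
    scores = F.relu(q_idx @ k_idx.T)                          # [L, L]
    causal = torch.tril(torch.ones(L, L, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    # 2) Fine-grained token selection: keep the top-k history tokens per query.
    k_eff = min(top_k, L)
    top_idx = scores.topk(k_eff, dim=-1).indices              # [L, k_eff]
    # 3) Main attention restricted to the selected tokens: O(L*k) instead of O(L^2).
    k_sel, v_sel = k[top_idx], v[top_idx]                     # [L, k_eff, d]
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / d ** 0.5
    valid = top_idx <= torch.arange(L).unsqueeze(1)           # re-apply causality
    attn = attn.masked_fill(~valid, float("-inf"))
    return torch.einsum("lk,lkd->ld", attn.softmax(dim=-1), v_sel)
```

In the real model, DSA sits on top of MLA and runs in FP8; the sketch only shows where the O(L·k) saving comes from.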

When continuing training from DeepSeek-V3.1-Terminus, the team used a two-stage strategy.

Stage 1: Dense Warm-up, keeping dense attention, training only the lightning indexer to align with main attention distribution.

This stage used only 1000 steps, processing 2.1 billion tokens.

Stage 2 introduces sparsity, selecting the top 2,048 key-value pairs per query token; this stage ran for 15,000 steps, totaling 943.7 billion tokens.
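The schedule can be summarized as follows (the field names are invented for illustration; the step and token counts come from the report):

```python
# Illustrative summary of the two-stage continued-training schedule; keys are
# hypothetical, the numbers are those reported above.
DSA_CONTINUED_TRAINING = {
    "init_from": "DeepSeek-V3.1-Terminus",
    "stage_1_dense_warmup": {
        "attention": "dense",                   # main attention unchanged
        "trainable": ["lightning_indexer"],     # indexer aligns with dense attention
        "steps": 1_000,
        "tokens": 2.1e9,
    },
    "stage_2_sparse": {
        "attention": "sparse",
        "top_k": 2_048,                         # key-value pairs kept per query token
        "steps": 15_000,
        "tokens": 943.7e9,
    },
}
```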

Real-world results are impressive—

On 128k sequences, DeepSeek-V3.2's inference cost is several times lower than V3.1-Terminus.

H800 cluster tests show that at 128K length, prefill cost per million tokens drops from $0.7 to roughly $0.2, and decode cost drops from $2.4 to $0.8.


Post-Training Compute Exceeds 10% of Pre-Training

Notably, the DeepSeek team invested heavily in reinforcement learning this time.

The paper states that the RL training compute budget exceeded 10% of pre-training compute, which is rare among open-source models.


DeepSeek notes that open-source models typically invest too little compute in post-training, which limits their performance on hard tasks.

Thus, the team developed a stable, scalable RL protocol, pushing post-training compute above 10% of pre-training and unlocking more advanced capabilities.

In detail—

To stably scale RL compute, the team improved the GRPO (Group Relative Policy Optimization) algorithm.

First, unbiased KL estimation: the original K3 estimator is corrected to eliminate systematic errors.

The original estimator could produce unbounded gradient weights, causing instability.
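For reference, here is a sketch of the standard K3-style estimator commonly used in GRPO (evaluated on tokens sampled from the current policy); the ratio inside it is what can blow up. DeepSeek's corrected, unbiased form is described in the report and is not reproduced here.

```python
# Standard K3 KL estimator (Schulman-style), per token, for KL(pi_theta || pi_ref).
# The ratio r = pi_ref / pi_theta is unbounded when pi_theta is tiny, which is
# the gradient-weight instability the corrected estimator addresses.
import torch

def k3_kl_estimate(logp_theta, logp_ref):
    """Inputs are log-probs of tokens sampled from the current policy pi_theta."""
    log_r = logp_ref - logp_theta
    r = torch.exp(log_r)            # pi_ref / pi_theta
    return r - log_r - 1.0          # always >= 0; low variance, but problematic gradients
```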

Second, an offline sequence masking strategy.

In practice, a large batch of rollout data is generated and then split into mini-batches for multiple updates, which introduces off-policy behavior.

By computing the KL divergence between the data-sampling policy and the current policy, negative samples that have drifted too far are masked out to avoid interference.
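A minimal sketch of that rule, assuming per-token log-probs are stored for both the sampling policy and the current policy (the threshold is a hypothetical hyperparameter):

```python
# Offline sequence masking, sketched: estimate how far each rollout's sampling
# policy has drifted from the current policy, and drop negative-advantage
# sequences that are too off-policy before the GRPO update.
import torch

def offline_sequence_mask(logp_current, logp_sampler, advantages, kl_threshold=0.5):
    """logp_*: [batch, seq_len] per-token log-probs; advantages: [batch]."""
    # Per-sequence KL(sampler || current), estimated from the sampler's own tokens.
    seq_kl = (logp_sampler - logp_current).mean(dim=-1)       # [batch]
    off_policy = seq_kl > kl_threshold
    keep = ~(off_policy & (advantages < 0))                   # mask distant negatives only
    return keep  # boolean mask applied to the mini-batch before the policy update
```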

The team also designed Keep Routing for MoE models.

Differences between the inference and training frameworks can activate different experts for the same input, causing abrupt jumps in which parameters are used. Saving the inference-time routing paths and enforcing them during training ensures consistency.
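A minimal sketch of the idea for a single MoE layer (function and field names are assumptions, not DeepSeek's API): record the experts chosen during rollout and replay them in the training-time forward pass.

```python
# Keep Routing, sketched: during rollout the router picks experts normally and
# the choice is returned so it can be saved; during training the saved choice
# is passed back in and enforced, so both passes use identical experts.
import torch

def moe_forward(x, router, experts, top_k=8, forced_topk=None):
    """x: [tokens, d]; experts: list of expert modules; forced_topk: [tokens, top_k] or None."""
    logits = router(x)                                          # [tokens, n_experts]
    if forced_topk is None:
        topk_idx = logits.topk(top_k, dim=-1).indices           # normal routing (rollout)
    else:
        topk_idx = forced_topk                                  # replay saved routing (training)
    gates = torch.softmax(logits.gather(-1, topk_idx), dim=-1)  # simplified gating
    out = torch.zeros_like(x)
    for slot in range(top_k):
        idx = topk_idx[:, slot]
        for e in idx.unique():
            sel = idx == e
            out[sel] += gates[sel, slot:slot + 1] * experts[int(e)](x[sel])
    return out, topk_idx  # rollout saves topk_idx; training feeds it back as forced_topk
```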

For training, an expert distillation strategy was used.

First, specialized models are trained per domain: math, coding, general logical reasoning, general Agent tasks, Agent coding, and Agent search, six domains in total, each with thinking and non-thinking modes.

These specialists then generate domain-specific data for training the final model.


Breakthrough in Agent Capabilities

Additionally, breakthroughs in Agent tasks are eye-catching.

The team found a way to enable simultaneous reasoning and tool use.


For thinking context management, the team found DeepSeek-R1's strategy of discarding prior reasoning at every new turn too token-wasteful in agentic settings.

So a new mechanism was adopted: historical reasoning is discarded only when a new user message arrives, and retained across tool messages. Even when reasoning traces are deleted, the tool call history and results remain in context.
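A minimal sketch of that rule (the message schema is an assumption):

```python
# Context-management rule, sketched: drop historical reasoning only when a new
# user message arrives; keep it across tool messages, and never drop tool
# calls or tool results.
def manage_context(history, new_message):
    """history: list of dicts like {"role": ..., "content": ..., "reasoning": ...}."""
    if new_message["role"] == "user":
        # New user turn: strip reasoning traces but keep all other fields,
        # including past tool calls and their results.
        history = [{k: v for k, v in m.items() if k != "reasoning"} for m in history]
    # Tool messages leave prior reasoning untouched, so the model can keep
    # thinking across a chain of tool calls without re-deriving its plan.
    return history + [new_message]
```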

For the cold start, the team relied on careful prompt design.

System prompts teach the model to insert tool calls naturally during its reasoning.

For example, for coding contests, the prompt requires the model to think first and to mark its solution paths with dedicated tags.

The hardcore part: an automatic environment synthesis pipeline generated 1,827 task-oriented environments and 85,000 complex prompts.

For example, travel planning: produce a 3-day itinerary under constraints such as no repeated cities and restaurant/attraction budgets that adjust with the hotel price, which involves complex interlocking logic.


Finding a constraint-satisfying plan in a vast search space is hard, but verifying one is easy; this "hard-to-solve, easy-to-verify" property is a good fit for RL.
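To see why verification is cheap, here is a hypothetical checker in the spirit of the travel-planning example (the constraints and field names are invented for illustration, not DeepSeek's actual environment):

```python
# A toy itinerary verifier: searching for a plan that satisfies all constraints
# is hard, but checking a candidate plan takes a few lines.
def verify_itinerary(plan, max_budget=1000.0):
    """plan: list of 3 day-dicts with "city", "hotel", "meals", "attractions" prices."""
    if len(plan) != 3:
        return False
    cities = [day["city"] for day in plan]
    if len(set(cities)) != len(cities):                # no repeated cities
        return False
    total = 0.0
    for day in plan:
        # Example coupled constraint: a pricier hotel forces a smaller
        # restaurant/attraction budget for that day.
        if day["meals"] + day["attractions"] > 0.5 * day["hotel"]:
            return False
        total += day["hotel"] + day["meals"] + day["attractions"]
    return total <= max_budget                         # overall budget cap
```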

For code Agents, the team mined millions of GitHub issue-PR pairs, rigorously filtered them, and automatically built environments for tens of thousands of executable software-fix tasks covering Python, Java, JavaScript, and more.

Search Agents use a multi-Agent pipeline: long-tail entities are sampled from a web corpus, then questions are built, answers generated, and both verified to produce high-quality data.

The results: a 73.1% solve rate on SWE-bench Verified and 46.4% accuracy on Terminal-Bench 2.0, far surpassing the open-source SOTA.

On MCP-Universe and Tool-Decathlon, its performance approaches that of closed-source models.

These results show that its reasoning generalizes to unseen Agent scenarios.


One More Thing

The report candidly notes limitations.

Due to lower total training FLOPs, its breadth of world knowledge still lags behind leading closed-source models.

Token efficiency is another challenge: the model needs longer trajectories to reach Gemini-3.0-Pro quality.

The team says these are directions for future improvement.

However—

DeepSeek, when will our longed-for R2 come?!!!!

Main Tag: DeepSeek Model Release

Sub Tags: Reasoning Capabilities, Reinforcement Learning, DSA Mechanism, Agent Tasks

