4B Qwen3 Overtakes 671B DeepSeek! Is ByteDance's DAPO Fine-tuning Method That Powerful?

What is the limit for 4B small models?

A new 4B model, Jan-nano, has sparked heated discussion: it outperformed the 671B DeepSeek-V3 0528 on agent tasks, scoring 80.7 on the SimpleQA benchmark.


Let's look at its actual performance on two tasks:

Research one company's ongoing expansion, which threatens a competitor's market share, and write an MBA-level report that could feed into financial due diligence.

Summarize today's breaking financial news, focusing on the most striking headlines.

In summary, Jan-nano's capabilities include:

Conducting in-depth research with the correct prompts

Effectively retrieving relevant information from search results

Being optimized for the MCP protocol, allowing seamless integration with tools exposed by various MCP servers
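To illustrate what MCP-style tool integration looks like in practice, here is a minimal, hypothetical tool-dispatch loop. The tool name `search_web` and the JSON call shape are illustrative assumptions, not Jan-nano's actual interface; real MCP servers expose tools over JSON-RPC rather than a local registry.

```python
import json

# Hypothetical registry of MCP-style tools the model can call.
# A real MCP client would discover these from a server; this is a local stand-in.
TOOLS = {}

def tool(name):
    """Register a function under a tool name."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("search_web")
def search_web(query: str) -> str:
    # Stub: a real implementation would forward the query to a search MCP server.
    return f"results for: {query}"

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call such as
    {"tool": "search_web", "arguments": {"query": "..."}}
    and invoke the matching registered tool."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["tool"]]
    return fn(**call["arguments"])

print(dispatch('{"tool": "search_web", "arguments": {"query": "Jan-nano"}}'))
# prints: results for: Jan-nano
```

The point of the protocol is exactly this decoupling: the model only emits structured tool calls, and any server that speaks MCP can service them.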

Let's also look at the official evaluation results. Its competitors are either closed-source solutions or large 671B MoE models like DeepSeek-V3.


Currently, Jan-nano holds the highest score at 80.7%, and the authors say the target for the next version is 85%.


However, the Menlo Research team specifically reminded everyone that Jan-nano beats DeepSeek-671B only on this single metric, and that the testing used an MCP-based method.

We fully understand that 4B models have their limitations, but it's always interesting to see how far they can go.

Specifically, Jan-nano applies DAPO, the reinforcement learning fine-tuning method open-sourced by ByteDance and Tsinghua, to Qwen3-4B.
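For context, DAPO extends GRPO-style policy optimization with a decoupled (asymmetric) clipping range and dynamic sampling. A sketch of the objective, paraphrased from memory of the DAPO paper (notation may differ slightly from the original):

```latex
J_{\mathrm{DAPO}}(\theta) =
\mathbb{E}_{(q,a)\sim\mathcal{D},\; \{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}(\cdot\mid q)}
\left[
\frac{1}{\sum_{i=1}^{G}|o_i|}
\sum_{i=1}^{G}\sum_{t=1}^{|o_i|}
\min\!\Big(
r_{i,t}(\theta)\,\hat{A}_{i,t},\;
\mathrm{clip}\big(r_{i,t}(\theta),\, 1-\varepsilon_{\mathrm{low}},\, 1+\varepsilon_{\mathrm{high}}\big)\,\hat{A}_{i,t}
\Big)
\right]
```

where $r_{i,t}(\theta)$ is the per-token probability ratio between the new and old policies, and the group-normalized advantage is $\hat{A}_{i,t} = \big(R_i - \mathrm{mean}(\{R_j\})\big) / \mathrm{std}(\{R_j\})$. Setting $\varepsilon_{\mathrm{high}} > \varepsilon_{\mathrm{low}}$ ("clip-higher") leaves more room to raise the probability of low-likelihood tokens, and dynamic sampling discards prompt groups whose rollouts are all correct or all wrong, since those yield zero advantage signal.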


The team stated that a detailed technical report will be released soon, so please stay tuned.

Who is Menlo Research?

Menlo Research is an open R&D lab focused on AI and robotics technology, with its primary goal being to build the "brain" for robots.

Its founders are a husband-and-wife team, Daniel Ong and Nicole Zhu. Nicole Zhu dropped out of her Human-Computer Interaction master's program at Stanford to start the company, having previously worked as a Senior Engineer at Google.


Menlo Research adheres to the user ownership principle, with all products being open source and designed for offline operation or self-hosting.


Previously, Menlo Research's core product was Jan, an open-source AI assistant application that could run 100% offline.

Jan was positioned as an alternative to ChatGPT and achieved over a million downloads within months of its launch, without venture capital support.


Jan's long-term vision is to become a "self-driving computer," shifting from users operating the computer to the computer operating itself. Specifically, planned capabilities include:

Converting user instructions into direct actions

Working across applications without manual switching

Learning specific user workflows

Autonomously completing repetitive tasks

Menlo Research also showcased a humanoid robot at the Echelon exhibition in Singapore.


Jan-nano model download: https://huggingface.co/Menlo/Jan-nano

Menlo Research: https://menlo.ai

Reference link: [1] https://www.reddit.com/r/LocalLLaMA/comments/1lbrnod/jannano_a_4b_model_that_can_outperform_671b_on_mcp/

Main Tag: Artificial Intelligence

Sub Tags: Large Language Models, Open Source AI, Fine-tuning, Small Models

