What are the limits of a small 4B model?
A newly released model, Jan-nano, has sparked heated discussion: it outperformed the 671B DeepSeek-V3 0528 on an agentic task, scoring 80.7% on the SimpleQA benchmark.
Let's look at how it performs in practice on two tasks:
Research a company's ongoing expansion that threatens a competitor's market share, and write an MBA-level report that could inform financial due diligence.
Summarize today's breaking financial news, focusing on the most striking headlines.
In summary, Jan-nano's capabilities include:
Conducting in-depth research when given the right prompts
Effectively retrieving relevant information from search results
Optimized for the Model Context Protocol (MCP), allowing seamless integration with the tools exposed by various MCP servers
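To make the MCP point concrete, here is a minimal sketch of the kind of message an MCP client sends when a model decides to use a tool. MCP is built on JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `web_search` and its `query` argument are hypothetical placeholders; the article does not say which MCP servers or tools were used with Jan-nano.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)

def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool and argument names, for illustration only.
request = make_tool_call("web_search", {"query": "breaking financial news today"})
print(json.dumps(request, indent=2))
```

In an agent loop, the model would emit the tool name and arguments, the client would send a request like this to the server, and the server's result would be fed back to the model as context for the next step.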
Let's also look at the official evaluation results. Its competitors are either closed-source solutions or large 671B MoE models such as DeepSeek-V3.
Currently, Jan-nano has achieved the highest score of 80.7%, and the authors revealed that the goal for the next version is 85%.
However, the Menlo Research team specifically pointed out that Jan-nano beats the 671B DeepSeek model only on this single metric, and that the evaluation was run with an MCP-based setup.
We fully understand that 4B models have their limitations, but it's always interesting to see how far they can go.
On the technical side, Jan-nano is Qwen3-4B fine-tuned with DAPO, the open-source reinforcement learning method from ByteDance and Tsinghua.
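For readers unfamiliar with DAPO, the sketch below illustrates its core ideas as described in the DAPO paper, not Jan-nano's actual training code, which has not been published. It shows group-normalized advantages, an asymmetric clipping range, token-level averaging, and skipping groups that carry no learning signal. The function names, the use of population standard deviation, and the clipping values are assumptions for illustration. Skipping groups with identical rewards is a simplification of the paper's dynamic sampling, and other parts of the method, such as overlong-response reward shaping, are omitted.

```python
import math
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize each reward against its sampling group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

def dapo_objective(logp_new, logp_old, rewards, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate objective for one group of sampled responses.

    logp_new / logp_old: one list of per-token log-probs per response,
    under the current and the sampling policy respectively.
    rewards: one scalar reward per response.
    Returns None when all rewards are identical (group is skipped).
    """
    if len(set(rewards)) == 1:
        return None  # no learning signal in this group
    advantages = group_advantages(rewards)
    total, n_tokens = 0.0, 0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        for ln, lo in zip(lp_new, lp_old):
            ratio = math.exp(ln - lo)
            # Asymmetric clip: the upper bound is looser than the lower one.
            clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Average over all tokens in the group, not per response.
    return total / n_tokens

# Toy group of two responses with one token each (made-up numbers).
value = dapo_objective([[math.log(2.0)], [0.0]], [[0.0], [0.0]], [1.0, 0.0])
print(value)
```

During training this quantity would be maximized with respect to the policy parameters. Here it is computed on plain floats only to show how the pieces fit together.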
The team stated that a detailed technical report will be released soon, so please stay tuned.
Who is Menlo Research?
Menlo Research is an open R&D lab focused on AI and robotics technology, with its primary goal being to build the "brain" for robots.
Its founders are a husband-and-wife team, Daniel Ong and Nicole Zhu. Nicole Zhu dropped out of her Human-Computer Interaction master's program at Stanford to start the company, having previously worked as a Senior Engineer at Google.
Menlo Research adheres to the user ownership principle, with all products being open source and designed for offline operation or self-hosting.
Previously, Menlo Research's core product was Jan, an open-source AI assistant application that could run 100% offline.
Jan was positioned as an alternative to ChatGPT and achieved over a million downloads within months of its launch, without venture capital support.
Jan's long-term vision is to become a "self-driving computer," moving from computers that users operate by hand to computers that operate autonomously. Planned capabilities include:
Converting user instructions into direct actions
Working across applications without manual switching
Learning specific user workflows
Autonomously completing repetitive tasks
Menlo Research also showcased a humanoid robot at the Echelon exhibition in Singapore.
Jan-nano model download: https://huggingface.co/Menlo/Jan-nano
Menlo Research: https://menlo.ai
Reference link: [1] https://www.reddit.com/r/LocalLLaMA/comments/1lbrnod/jannano_a_4b_model_that_can_outperform_671b_on_mcp/