Category: Small Models

4B Qwen3 Overtakes 671B DeepSeek! Is ByteDance's DAPO Fine-tuning Method That Powerful?
ZTE Research: LLM-Based Adaptive Question-Difficulty Grading for Distillation Gives Small Models Long Chain-of-Thought Reasoning