Category: Computational Efficiency
- Chinese Team Trains a "Spiking Large Model," Boosting Inference Speed 100-Fold
- NeurIPS'25! AutoPrune: A Plug-and-Play Adaptive Pruning Framework for Large Models
- Achieving Lossless Mathematical Reasoning with Only 10% of the KV Cache: An Open-Source Method That Resolves "Memory Overload" in Large Reasoning Models
- RMoA: Residual Extraction Mixture-of-Agents, Enabling Agents to Discover New Information and Adaptively Stop [ACL 2025]
- ICML 2025 | Fast and Powerful Liger! Transformer Instantly Becomes a Linear RNN with Only 20M Tokens of Fine-Tuning