Category: Computational Efficiency
- Chinese Team Trains a "Spiking Large Model," Boosting Inference Speed 100-Fold
- NeurIPS'25! AutoPrune: A Plug-and-Play Adaptive Pruning Framework for Large Models
- Achieving Lossless Mathematical Reasoning with Only 10% of the KV Cache: An Open-Source Method That Resolves "Memory Overload" in Large Reasoning Models
- RMoA: Residual Extraction Mixture-of-Agents, Enabling Agents to Discover New Information and Adaptively Stop [ACL 2025]
- ICML 2025 | Fast and Powerful Liger! Transformer Instantly Becomes a Linear RNN with Only 20M Tokens of Fine-Tuning