Category: Inference Optimization
- Meta Introduces Deep Think with Confidence: Boosting Reasoning Accuracy and Efficiency with Minimal Changes
- Transformer Killer? Google DeepMind Unveils the New MoR Architecture, a Contender for the Next-Generation Crown
- Say 'Wait' Less, Do More: NoWait Reshapes Large-Model Reasoning Paths
- Lossless Mathematical Reasoning with Only 10% of the KV Cache: An Open-Source Method That Resolves 'Memory Overload' in Large Reasoning Models
- Andrej Karpathy Praises Stanford Team's New Work: Achieving Millisecond-Level Inference with Llama-1B
- ICML 2025 | Training-Free, Instant Preference Alignment for Large Models
- Qwen Breakthrough: 'Parallel Computing' Instead of 'Stacking Parameters', New Method Cuts Memory 22x and Latency 6x