Category: Inference Optimization
- Meta Introduces Deep Think with Confidence: Boosting Reasoning Accuracy and Efficiency with Minimal Changes
- Transformer Killer? Google DeepMind Unveils the New MoR Architecture, a Contender for the Next-Generation Crown
- Say 'Wait' Less, Do More: NoWait Reshapes Large-Model Reasoning Paths
- Lossless Mathematical Reasoning with Only 10% of the KV Cache: An Open-Source Method That Resolves 'Memory Overload' in Large Reasoning Models
- Andrej Karpathy Praises Stanford Team's New Work: Achieving Millisecond-Level Inference with Llama-1B
- ICML 2025 | Training-Free, Instant Preference Alignment for Large Models
- Qwen Breakthrough: 'Parallel Computing' Instead of 'Stacking Parameters', New Method Cuts Memory 22x and Latency 6x