Category: Transformer Models
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
- ByteDance Seed's New Work DeltaFormer: An Attempt at a Next-Generation Model Architecture
- Microsoft and Others Propose a New 'Chain-of-Model' Paradigm, Matching Transformer Performance with Better Scalability and Flexibility