TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activate per token, so the model's total parameter count far exceeds the parameters activated for any one token.
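A minimal PyTorch sketch of that idea, assuming a simple linear router and top-K gating; the class name, sizes, and gating details here are illustrative, not taken from any particular MoE implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Top-K Mixture-of-Experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # N independent expert FFNs, each shaped like the dense FFN it replaces.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Learned router: one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                                # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # top-K experts per token
        weights = F.softmax(weights, dim=-1)                   # renormalize over the chosen K
        out = torch.zeros_like(x)
        # Dispatch: each expert processes only the tokens routed to it,
        # so per-token compute stays near that of a single dense FFN.
        for e, expert in enumerate(self.experts):
            token_idx, slot = (idx == e).nonzero(as_tuple=True)
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out

# Example: 10 tokens of width 64 through 8 experts, 2 active per token.
layer = MoELayer(d_model=64, d_ff=256, n_experts=8, top_k=2)
y = layer(torch.randn(10, 64))  # y.shape == (10, 64)
```

Note that this sketch normalizes the softmax over only the K selected logits; whether gating weights are normalized before or after the top-K selection is a design choice that varies across MoE implementations.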