mixture of experts
1 article

Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute
TL;DR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activate…
27 min read
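The top-K routing in the summary above can be sketched for a single token in plain Python. This is a minimal, hypothetical illustration, not the article's implementation. The tiny ReLU experts, the names `make_ffn` and `moe_forward`, and the choice to apply softmax only over the K selected router logits are all assumptions. That last choice is one common variant; other designs apply softmax over all N experts and then keep the top K. The key point is that only K of the N expert FFNs are evaluated per token.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def make_ffn(d_model, d_hidden, rng):
    # Hypothetical tiny expert FFN: Linear -> ReLU -> Linear, no biases.
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    def ffn(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return ffn

def moe_forward(x, router_w, experts, k):
    # Router: one linear score (logit) per expert for this token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_w]
    # Keep the K highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Assumed variant: renormalize gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only these K experts are ever evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top, gates

# Usage: 4 experts, route each token to 2 of them.
rng = random.Random(0)
d_model, d_hidden, n_experts, k = 8, 16, 4, 2
experts = [make_ffn(d_model, d_hidden, rng) for _ in range(n_experts)]
router_w = [[rng.gauss(0, 1.0) for _ in range(d_model)] for _ in range(n_experts)]
x = [rng.gauss(0, 1.0) for _ in range(d_model)]
y, chosen, gates = moe_forward(x, router_w, experts, k)
print(chosen, [round(g, 3) for g in gates])
```

With N = 4 and K = 2, each token pays for two expert FFNs rather than four, while the layer still holds the parameters of all four. That is the "more with less compute" trade-off the title refers to.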

