CS336 Notes: Lecture 4 - Mixture of Experts
Mixture of Experts (MoE): adding capacity without proportional compute, routing, load balancing, and what makes MoE stable.
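As a concrete anchor for the routing and load-balancing ideas named above, here is a minimal sketch of a top-k MoE layer with a Switch-Transformer-style auxiliary balance loss. The names (`MoELayer`, `n_experts`, `top_k`) and the tiny FFN experts are illustrative assumptions for this sketch, not the lecture's reference implementation.

```python
# Minimal top-k MoE routing sketch (illustrative, not the course's reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.n_experts = n_experts
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small two-layer FFN to keep the sketch short.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (n_tokens, d_model) — flatten batch and sequence dims before calling.
        logits = self.router(x)                          # (n_tokens, n_experts)
        probs = logits.softmax(dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        # Renormalize the selected gate values so they sum to 1 per token.
        gates = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which (token, slot) pairs routed to expert e? Only those tokens
            # pay the compute for this expert — capacity without proportional FLOPs.
            token_ids, slot_ids = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += gates[token_ids, slot_ids].unsqueeze(-1) * expert(x[token_ids])

        # Load-balancing auxiliary loss: push (fraction of assignments per expert)
        # times (mean router probability per expert) toward uniform usage.
        frac_tokens = F.one_hot(topk_idx, self.n_experts).float().mean(dim=(0, 1))
        mean_probs = probs.mean(dim=0)
        aux_loss = self.n_experts * (frac_tokens * mean_probs).sum()
        return out, aux_loss


if __name__ == "__main__":
    layer = MoELayer(d_model=64, d_ff=256)
    tokens = torch.randn(32, 64)   # 32 tokens
    y, aux = layer(tokens)
    print(y.shape, aux.item())
```

The auxiliary loss is what keeps routing stable in practice: without it, the router tends to collapse onto a few experts, which is one of the failure modes the notes return to.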