CS336 Notes: Lecture 7 - Parallelism 1
Distributed training fundamentals: data parallelism, ZeRO/FSDP for memory efficiency, tensor and pipeline parallelism, and how to combine strategies for frontier-scale models.
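To ground the data-parallel side of these notes before the details, here is a minimal sketch, assuming a standard PyTorch job launched with torchrun; the `wrap_model` helper and the `use_fsdp` flag are hypothetical names for illustration, not something defined in the lecture.

```python
# Minimal sketch of the two data-parallel regimes covered here, assuming a
# multi-GPU PyTorch job launched with torchrun (helper names are illustrative).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def wrap_model(model: torch.nn.Module, use_fsdp: bool) -> torch.nn.Module:
    # Set up the process group and pin this rank to one GPU.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    model = model.to(local_rank)

    if use_fsdp:
        # ZeRO-3-style sharding: parameters, gradients, and optimizer state are
        # partitioned across ranks and gathered only around each forward/backward.
        return FSDP(model)

    # Plain data parallelism: every rank holds a full replica and gradients
    # are all-reduced after backward.
    return DDP(model, device_ids=[local_rank])
```

Tensor and pipeline parallelism split the model itself rather than the batch, and in practice frontier-scale runs combine them with one of the two wrappers above.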