Blog

Machine-learning

Tutorials · 6 min read

CS336 Notes: Lecture 6 - Kernels and Triton

Writing efficient GPU kernels with Triton: profiling, benchmarking, kernel fusion, and when to hand-optimize versus rely on torch.compile.

Tutorials · 11 min read

CS336 Notes: Lecture 5 - GPUs

GPU fundamentals for LLM training: memory hierarchy, arithmetic intensity, kernel optimization, FlashAttention, and bandwidth limits.
