Tutorials · January 12, 2026 · 8 min read

CS336 Notes: Lecture 10 - Inference

LLM inference optimization: understanding the prefill vs decode split, KV cache management, speculative decoding, and why inference is fundamentally memory-bound.

Tags: machine-learning, inference, stanford-cs336, deep-learning