Tutorials·January 13, 2026·8 min readCS336 Notes: Lecture 11 - Scaling Laws 2Practical scaling: muP for hyperparameter transfer, WSD learning rate schedules, case studies from Cerebras-GPT, MiniCPM, and DeepSeek on compute-optimal training.machine-learningscaling-lawsstanford-cs336deep-learningRead