Tutorials · January 9, 2026 · 9 min read

CS336 Notes: Lecture 7 - Parallelism 1

Distributed training fundamentals: data parallelism, ZeRO/FSDP for memory efficiency, tensor and pipeline parallelism, and how to combine strategies for frontier-scale models.

Tags: machine-learning, distributed-training, stanford-cs336, deep-learning