Tutorials · January 15, 2026 · 13 min read
CS336 Notes: Lecture 13 - Data 1
Training data for LLMs: Common Crawl processing, quality filtering, the evolution of data pipelines from BERT to modern models, and the critical role of copyright and licensing.
machine-learning · data · stanford-cs336 · deep-learning