

Tutorials · 10 min read

CS336 Notes: Lecture 12 - Evaluation

LLM evaluation beyond accuracy: perplexity, knowledge benchmarks, instruction-following, agent tasks, safety, and why evaluation design shapes what models become.
