Evaluation

Tutorials · January 14, 2026 · 10 min read

CS336 Notes: Lecture 12 - Evaluation

LLM evaluation beyond accuracy: perplexity, knowledge benchmarks, instruction-following, agent tasks, safety, and why evaluation design shapes what models become.

machine-learning evaluation stanford-cs336 benchmarks