Stack Money for the Wrong Reasons.
The uncomfortable truth about wealth, revenge, and why dark fuel burns just as bright.
Articles
How to negotiate from strength: knowing when to speak, when to stay silent, and how to leave room for both sides to win.
A letter from my future self on becoming undefinable, building a personal monopoly, and why the paradox is the edge.
RL foundations for LLMs: policy gradients, baselines for variance reduction, GRPO implementation details, and practical training considerations for reasoning models.
Advanced RL for alignment: PPO implementation details, GRPO as a simpler alternative, overoptimization risks, and case studies from DeepSeek R1, Kimi K1.5, and Qwen 3.