underscored — Underscored

@underscored


Tag: sample-efficiency

dwarkesh.com

Imagine if it took a couple decades worth of courses with hundreds of concurrent professors and millions of practice tasks for you to learn how to polish a word file. Even the task count difference understates the gap - the models have to grind their far more numerous tasks each far harder. Whereas a human student might practice a textbook problem once or twice, GRPO has the model generate hundreds to thousands of rollouts per task.
— Dwarkesh Patel

1mo ago
