underscored — Underscored

@underscored

Tag: algorithmic-efficiency

dwarkesh.com

Eric Jang – Building AlphaGo from scratch

naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo's MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem.
— Dwarkesh Patel

2mo ago
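
To make the contrast in the quote concrete, here is a minimal sketch in plain Python. It is not taken from the episode or from any AlphaGo code: the function names, the toy two-action logits, and the hand-written "search" distributions are all assumptions for illustration. It compares the gradient signal from a REINFORCE-style update, where one scalar reward is shared by every step of the trajectory, with a per-step cross-entropy update toward a search-improved target policy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grads(logits_per_step, actions, final_reward):
    """Ascent direction on the logits for a REINFORCE-style objective.

    Every step is scaled by the same trajectory-level reward, so the
    update cannot tell which individual step earned that reward.
    """
    grads = []
    for logits, action in zip(logits_per_step, actions):
        probs = softmax(logits)
        # Gradient of log p[action] w.r.t. the logits is onehot(action) - p.
        grads.append([
            final_reward * ((1.0 if i == action else 0.0) - probs[i])
            for i in range(len(probs))
        ])
    return grads


def search_target_grads(logits_per_step, search_policies):
    """Ascent direction for cross-entropy toward a per-step target policy.

    Each step gets its own target (here a hypothetical search-derived
    distribution), so the signal is specific to that step.
    """
    grads = []
    for logits, target in zip(logits_per_step, search_policies):
        probs = softmax(logits)
        # Gradient of sum_i target[i] * log p[i] w.r.t. the logits is target - p.
        grads.append([target[i] - probs[i] for i in range(len(probs))])
    return grads


# Toy two-step trajectory with two actions per step (assumed values).
logits = [[0.0, 0.0], [0.0, 0.0]]
taken = [0, 1]

# A losing trajectory with reward 0 yields no learning signal at any step.
no_signal = reinforce_grads(logits, taken, final_reward=0.0)

# Hypothetical search output: a distinct improved distribution per step.
targets = [[0.9, 0.1], [0.2, 0.8]]
per_step = search_target_grads(logits, targets)
```

In this toy setup the reward-weighted update is identical in scale at every step and vanishes when the reward is zero, while the target-based update gives each step its own direction regardless of the final outcome. Whether a real search always produces a better target is the quote's claim, not something this sketch demonstrates.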
