underscored

@underscored

3 clips · 1 follower

Follow
Tag:hardwareClear

Three years ago we were still in the ChatGPT era of AI, and I was very excited about the possibility of local inference. Then came the reasoning era, blowing up KV cache (which increases the need for more memory) and emphasizing the importance of decode (to generate that many more tokens). Now we're in the agentic era, where CPU performance is incredibly important. To that end, the ideal setup for a local agent is strong local CPU performance and calling out to the cloud for inference.

Ben Thompson
1w ago

Underscored — save the words that stop you in your tracks.

Start saving quotes →