Three years ago we were still in the ChatGPT era of AI, and I was very excited about the possibility of local inference. Then came the reasoning era, blowing up KV cache (which increases the need for more memory) and emphasizing the importance of decode (to generate that many more tokens). Now we're in the agentic era, where CPU performance is incredibly important. To that end, the ideal setup for a local agent is strong local CPU performance and calling out to the cloud for inference.