Three years ago we were still in the ChatGPT era of AI, and I was very excited about the possibility of local inference. Then came the reasoning era, blowing up KV cache (which increases the need for more memory) and emphasizing the importance of decode (to generate that many more tokens). Now we're in the agentic era, where CPU performance is incredibly important. To that end, the ideal setup for a local agent is strong local CPU performance and calling out to the cloud for inference.
You might also like
There's a lot of pressure to bring this new energy online very quickly because states are currently competing for the opportunity to attract these data centers, even though the financial benefits of them for the state has not actually yet borne fruit.
The New York Times
While uh the US spends 12 times more on compute this from the private sector versus the Chinese private sector uh China is spending 42% more on the robotic sector uh than the United States and apparently this gap is going to widen so this is two very different uh I would say modalities but I think that the robotic race, especially the AI component of it, uh, is a really exciting one and it's where China actually has distinct advantages, not just in the hardware, but potentially the software, too.
China Decode
Chip companies in the future might stop obsessing so much as they do right now about how small the transistors in a semiconductor should be and they might focus more instead on how fast the data moves through a semiconductor.
China Decode