In other words, many of the biggest flaws of the original ChatGPT have been substantially mitigated, at least for verifiable use cases like coding: LLMs are much more likely to be right the first time, they reason over their own output to improve the odds of a correct answer, and agents now actively verify the results without a human needing to be in the loop.
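To make the "agents verify the results" part concrete, here is a minimal sketch of a generate-and-verify loop. It is only an illustration under stated assumptions: `fake_model`, `verify`, and `agent_loop` are hypothetical names, the "model" is a hard-coded stand-in rather than a real LLM call, and the single `add(2, 3) == 5` check stands in for a real test suite.

```python
from typing import Callable, Optional, Tuple


def verify(source: str) -> Optional[str]:
    """Run a candidate against a fixed check; return an error message, or None on success."""
    namespace: dict = {}
    try:
        # exec is acceptable here only because the candidates are hard-coded below;
        # a real agent would run untrusted code in a sandbox instead.
        exec(source, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None if result == 5 else f"add(2, 3) returned {result!r}, expected 5"


def agent_loop(
    generate: Callable[[Optional[str]], str], max_attempts: int = 3
) -> Tuple[Optional[str], int]:
    """Ask for a candidate, verify it, and feed any failure back until it passes."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, attempt  # verified without a human checking it
    return None, max_attempts


def fake_model(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong on the first try, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


solution, attempts = agent_loop(fake_model)
print(f"verified after {attempts} attempt(s)")
```

The key design point is that the failure message from `verify` is passed back into the next generation step, so the loop can only end in a candidate that passed the check or in an explicit give-up after `max_attempts`. This only works when the task is verifiable in the first place, which is why the caveat about coding-like use cases matters.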