Hardware is massively parallel. You've got tens of billions, even hundreds of billions, of transistors on your chip, and it takes maybe 100 clock cycles for a signal to get from one side of the chip to the other. So you can't efficiently do a sequential computation that involves transistors on both sides of the chip, because every dependent step would pay that crossing delay. The hardware is just fundamentally parallel, and you have to take advantage of that.
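To put rough numbers on that, here is a back-of-the-envelope sketch. It is a toy model, not a description of any real chip: the 100-cycle crossing figure is the "maybe 100" from above, while the one-cycle cost of a local step, the function names, and the idea of splitting work into independent "regions" are all assumptions made for illustration.

```python
# Toy cost model (all names and the 1-cycle local step are assumptions).
CROSSING_CYCLES = 100  # rough cost of sending a signal across the chip
LOCAL_CYCLES = 1       # assumed cost of a step that stays in one region


def sequential_cycles(steps: int) -> int:
    """Cycles for a chain of dependent steps that each cross the chip."""
    return steps * CROSSING_CYCLES


def parallel_cycles(steps: int, regions: int) -> int:
    """Cycles when steps are split evenly across independent regions.

    Each region works locally, then one cross-chip hop combines results.
    """
    steps_per_region = -(-steps // regions)  # ceiling division
    return steps_per_region * LOCAL_CYCLES + CROSSING_CYCLES


if __name__ == "__main__":
    print(sequential_cycles(1000))    # 1000 steps, each paying the crossing
    print(parallel_cycles(1000, 10))  # 10 regions, one crossing at the end
```

Under these assumptions, the chain that keeps crossing the chip costs 100,000 cycles, while the version that keeps work local and crosses once costs 200. The exact ratio depends entirely on the assumed numbers. What carries over is the shape of the comparison: crossing cost multiplies with sequential dependence and is paid only once when the work is local.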