Yes, it's the other way around. Instead of the compiler making unoptimized builds slow, it does really hard work on optimized builds to make them fast. Doing that hard work takes time and memory.
Here's an example of calculating new_position = position + speed * time with a Vector2 type.
Compare the unoptimized build:
https://godbolt.org/g/zXfDjL
vs optimized build:
https://godbolt.org/g/fq6njm
(it's clang rather than MSVC, but the idea is the same).
Look at how the compiler optimized the code - only three instructions plus a ret.
But the unoptimized build has many more. The compiler didn't create those instructions out of nowhere; it's the other way around. For any code, the compiler starts with a list of unoptimized instructions produced by directly translating the C code (usually these are high-level pseudo-assembly instructions before they get converted to real x86 instructions, but the idea is the same). It treats each float as an independent variable, and each variable goes on the stack exactly as it is specified in the C code. After generating the unoptimized instructions, the compiler goes over them and tries to figure out what can be simplified, what is redundant, and what can be removed. That takes time and effort, which is why optimized builds are slower to compile (sometimes significantly). But as a result you get everything nice and compact.
When Casey says "code is more or less aligned to how the hardware works", he means the optimized assembly code (and the data structures, but that's a different story). C code by itself never maps directly to the hardware. You need to know and think about what the compiler does and what the output code will look like.