```cpp
int32_t a = 0;
uint8_t b8 = true;
if (b8) {
    a = 1;
}
uint32_t b32 = true;
if (b32) {
    a = 2;
}
bool b = true;
if (b) {
    a = 3;
}
```
Compiled without optimization, MSVC generates this assembly:
```
int a = 0;
000007FEF9A92D2F  mov         dword ptr [a],0
uint8_t b8 = true;
000007FEF9A92D37  mov         byte ptr [b8],1
if (b8) {
000007FEF9A92D3C  movzx       eax,byte ptr [b8]
000007FEF9A92D41  test        eax,eax
000007FEF9A92D43  je          updateGame+45Dh (07FEF9A92D4Dh)
    a = 1;
000007FEF9A92D45  mov         dword ptr [a],1
}
uint32_t b32 = true;
000007FEF9A92D4D  mov         dword ptr [b32],1
if (b32) {
000007FEF9A92D55  cmp         dword ptr [b32],0
000007FEF9A92D5A  je          updateGame+474h (07FEF9A92D64h)
    a = 2;
000007FEF9A92D5C  mov         dword ptr [a],2
}
bool b = true;
000007FEF9A92D64  mov         byte ptr [b],1
if (b) {
000007FEF9A92D69  movzx       eax,byte ptr [b]
000007FEF9A92D6E  test        eax,eax
000007FEF9A92D70  je          updateGame+48Ah (07FEF9A92D7Ah)
    a = 3;
000007FEF9A92D72  mov         dword ptr [a],3
}
```
First observation: bool and uint8_t behave exactly the same (no surprises there). The difference between bool/uint8_t and uint32_t is in the if-condition:
```
// bool / uint8_t
if (b) {
000007FEFA092D6D  movzx       eax,byte ptr [b]
000007FEFA092D72  test        eax,eax

// uint32_t
if (b32) {
000007FEFA092D57  cmp         dword ptr [b32],0
```
So the 8-bit types need two instructions, compared to a single instruction for the 32-bit type. Agner Fog's instruction tables for Haswell list the following for the instructions involved:
[table]
[tr]
[td]Instruction[/td]
[td]Micro operations (fused domain)[/td]
[td]Reciprocal throughput[/td]
[/tr]
[tr]
[td]movzx r,m[/td]
[td]1[/td]
[td]0.5[/td]
[/tr]
[tr]
[td]test r,r[/td]
[td]1[/td]
[td]0.25[/td]
[/tr]
[tr]
[td]cmp m,i[/td]
[td]1[/td]
[td]0.5[/td]
[/tr]
[/table]
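If I interpret those numbers right (and I might very well not, so correct me if I'm wrong), the 8-bit versions take 2 micro operations and roughly 0.5 + 0.25 = 0.75 clock cycles, while the 32-bit version takes 1 micro operation and 0.5 clock cycles to check the condition. Keep in mind that reciprocal throughput describes how often an instruction can be issued in a stream of independent instructions, not how long a single one takes, so summing the values gives only a rough estimate.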
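One way to sanity-check the estimate is to time the condition check directly. The following is only a rough sketch: `countTakenBranches` and `nanosecondsPerCheck` are hypothetical helpers, not part of any library, and the `volatile` flag is an assumption made to force a real memory load on every check. The loop overhead and a perfectly predicted branch will likely dominate the result, so any difference between the types may disappear in the noise.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: checks a flag of type Flag `iterations` times and
// counts how often the branch was taken. `volatile` forces a load from
// memory on every check, so the compiler cannot hoist or drop the test.
template <typename Flag>
std::uint64_t countTakenBranches(std::uint64_t iterations) {
    volatile Flag flag = true;
    std::uint64_t taken = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        if (flag) {
            ++taken;
        }
    }
    return taken;
}

// Hypothetical helper: average wall-clock time per check in nanoseconds.
// This measures the whole loop (load, test, branch, increment), not just
// the condition check itself.
template <typename Flag>
double nanosecondsPerCheck(std::uint64_t iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    volatile std::uint64_t sink = countTakenBranches<Flag>(iterations);
    const auto stop = Clock::now();
    (void)sink; // keep the result alive so the call is not optimized away
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return iterations ? elapsed.count() / static_cast<double>(iterations) : 0.0;
}
```

Calling `nanosecondsPerCheck<bool>`, `nanosecondsPerCheck<std::uint8_t>` and `nanosecondsPerCheck<std::uint32_t>` with a large iteration count (say, a few hundred million) and comparing the results over several runs would show whether the extra instruction is measurable at all on a given machine.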
It seems like bool32 ekes out a victory, performance-wise. That may change in situations where using a bool8 instead of a bool32 in a struct brings the struct's size below the size of a cache line (or causes significantly more of them to fit into one cache line), but I don't know how to test this (yet).
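The struct-size side of that trade-off can at least be made concrete. This is a minimal sketch with made-up types: `EntityBool8`, `EntityBool32`, `kCacheLineSize` and `perCacheLine` are hypothetical names, and the 64-byte cache line is an assumption (typical for current x86 CPUs). It does not measure performance; it only shows how the flag width changes how many structs fit into one cache line.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical entity with four flags stored as 8-bit values.
struct EntityBool8 {
    float x, y, z;           // 3 x 4 bytes
    std::uint8_t visible;    // 4 x 1 byte
    std::uint8_t active;
    std::uint8_t grounded;
    std::uint8_t dirty;
};

// The same entity with the flags stored as 32-bit values.
struct EntityBool32 {
    float x, y, z;           // 3 x 4 bytes
    std::uint32_t visible;   // 4 x 4 bytes
    std::uint32_t active;
    std::uint32_t grounded;
    std::uint32_t dirty;
};

// Assumed cache line size in bytes.
constexpr std::size_t kCacheLineSize = 64;

// Number of whole structs of the given size that fit into one cache line.
constexpr std::size_t perCacheLine(std::size_t structSize) {
    return kCacheLineSize / structSize;
}
```

With 4-byte floats and 4-byte alignment, `sizeof(EntityBool8)` is 16 and `sizeof(EntityBool32)` is 28, so `perCacheLine` gives 4 versus 2 entities per cache line. Whether that outweighs the extra instruction per check depends on how the data is accessed, which is exactly what would have to be measured.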