OK, so struct size optimization is one reason to use a fixed-size bool. I guess the next question is: why bool32, when a bool8 would do an even better job at keeping data structures small? I remember Casey saying something about processors being "happier" with 32-bit data types, so I did a quick test:
| int32_t a = 0;
uint8_t b8 = true;
if (b8) {
    a = 1;
}
uint32_t b32 = true;
if (b32) {
    a = 2;
}
bool b = true;
if (b) {
    a = 3;
}
|
Compiled without optimization, MSVC generates this assembly:
| int a = 0;
000007FEF9A92D2F mov dword ptr [a],0
uint8_t b8 = true;
000007FEF9A92D37 mov byte ptr [b8],1
if (b8) {
000007FEF9A92D3C movzx eax,byte ptr [b8]
000007FEF9A92D41 test eax,eax
000007FEF9A92D43 je updateGame+45Dh (07FEF9A92D4Dh)
a = 1;
000007FEF9A92D45 mov dword ptr [a],1
}
uint32_t b32 = true;
000007FEF9A92D4D mov dword ptr [b32],1
if (b32) {
000007FEF9A92D55 cmp dword ptr [b32],0
000007FEF9A92D5A je updateGame+474h (07FEF9A92D64h)
a = 2;
000007FEF9A92D5C mov dword ptr [a],2
}
bool b = true;
000007FEF9A92D64 mov byte ptr [b],1
if (b) {
000007FEF9A92D69 movzx eax,byte ptr [b]
000007FEF9A92D6E test eax,eax
000007FEF9A92D70 je updateGame+48Ah (07FEF9A92D7Ah)
a = 3;
000007FEF9A92D72 mov dword ptr [a],3
}
|
First observation: bool and uint8_t behave exactly the same (no surprises there). The difference between bool/uint8_t and uint32_t is in the if-condition:
| // bool / uint8_t
if (b) {
000007FEFA092D6D movzx eax,byte ptr [b]
000007FEFA092D72 test eax,eax
// uint32_t
if (b32) {
000007FEFA092D57 cmp dword ptr [b32],0
|
So the 8-bit types need two instructions, compared to a single instruction for the 32-bit type. The Agner Fog tables for Haswell have the following to say about the instructions involved:
[table]
[tr]
[td]Instruction[/td]
[td]Micro operations (fused domain)[/td]
[td]Reciprocal throughput[/td]
[/tr]
[tr]
[td]movzx r,m[/td]
[td]1[/td]
[td]0.5[/td]
[/tr]
[tr]
[td]test r,r[/td]
[td]1[/td]
[td]0.25[/td]
[/tr]
[tr]
[td]cmp m,i[/td]
[td]1[/td]
[td]0.5[/td]
[/tr]
[/table]
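If I interpret those numbers right (and I might very well not, so correct me if I'm wrong), the 8-bit versions take 2 micro-operations and, naively adding up the reciprocal throughputs, 0.5 + 0.25 = 0.75 clock cycles, while the 32-bit version takes 1 micro-operation and 0.5 clock cycles to check the condition. Keep in mind that reciprocal throughput is the average cost per instruction when many independent instructions of that kind run back to back, not a latency, so these sums are only a rough estimate.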
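For anyone who wants to look at the three checks in isolation, here is a minimal sketch. The function names are made up for illustration; the return values just mirror the `a = 1/2/3` assignments from the test above. Each flag is read through a pointer because, in an optimized build, a compiler would most likely fold away a condition on a local that was just set to `true`.

```cpp
#include <cstdint>

// Hypothetical helpers, one per flag type from the test above.
// Reading the flag through a pointer keeps its value unknown at compile
// time when the function is compiled on its own, so the compiler has to
// emit a real load and check instead of folding the branch away.

int32_t checkBool8(const uint8_t* flag) {
    return *flag ? 1 : 0;   // 8-bit load
}

int32_t checkBool32(const uint32_t* flag) {
    return *flag ? 2 : 0;   // 32-bit load
}

int32_t checkBool(const bool* flag) {
    return *flag ? 3 : 0;   // bool is 1 byte with MSVC, so also an 8-bit load
}
```

Looking at the disassembly of these three functions (or an assembly listing from the compiler) shows what each check turns into at a given optimization level. An optimizing compiler may well pick different instructions than the `movzx`/`test` and `cmp` pairs from the debug build, so this is a way to check, not a prediction of the result.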
It seems like bool32 ekes out a victory, performance-wise. That may change in situations where using a bool8 instead of a bool32 in a struct brings the struct's size below the size of a cache line (or causes significantly more of them to fit into one cache line), but I don't know how to test this (yet).
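As a starting point for the cache line question, here is a rough sketch of the size side of the trade-off. Everything in it is an assumption for illustration: the `bool8`/`bool32` typedefs, the two entity structs, the 64-byte cache line, and the `countAlive` loop, which only stands in for whatever hot loop you would actually time.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed definitions of the fixed-size bools discussed in this thread.
typedef uint8_t  bool8;
typedef uint32_t bool32;

// Assumed cache line size; 64 bytes is typical for current x86-64 CPUs.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical entity with four flags stored as 32-bit bools.
struct EntityBool32 {
    float x, y;
    bool32 alive, visible, grounded, selected;
};

// Same fields, but the flags are stored as 8-bit bools.
struct EntityBool8 {
    float x, y;
    bool8 alive, visible, grounded, selected;
};

// How many whole structs of type T fit into one cache line.
template <typename T>
constexpr std::size_t perCacheLine() {
    return kCacheLineSize / sizeof(T);
}

// Walks an array and counts the entities whose alive flag is set.
// This is the kind of loop one would time for both struct layouts.
template <typename T>
std::size_t countAlive(const T* entities, std::size_t count) {
    std::size_t alive = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (entities[i].alive) {
            ++alive;
        }
    }
    return alive;
}
```

With the usual x86-64 compilers (4-byte `float`, 4-byte alignment), `sizeof(EntityBool32)` is 24 and `sizeof(EntityBool8)` is 12, so `perCacheLine` gives 2 versus 5 entities per 64-byte line. Whether that outweighs the extra instruction per check would have to be measured, for example by timing `countAlive` over an array much larger than the cache for each layout.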