In my own projects, I've been trying to write code where I request all the memory for my application up front and work within the memory block I allocated at start. My current usage of the memory block is basically a linear allocation scheme, just like Casey has been doing. This has gotten me surprisingly far in my own work, and it's really awesome to see how much memory allocation overhead I'm avoiding.
However, I ran into a problem with alignment in some of my code, and I was wondering how people handle this issue. Basically, my linear allocator works identically to Casey's PushStruct() macro and the function it calls. I know the size of my memory block and I simply push onto the block however many bytes are requested, so long as the request does not take the block past its total capacity.
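For anyone who hasn't seen the scheme, here is a minimal sketch of the kind of allocator I mean. The names (Arena, arena_init, arena_push, PUSH_STRUCT) are made up for this post and are not Casey's actual code; the point is only that nothing here looks at alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration, not the Handmade Hero source. */
typedef struct {
    uint8_t *base;  /* start of the block requested up front */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes handed out so far */
} Arena;

static void arena_init(Arena *arena, void *memory, size_t size) {
    arena->base = (uint8_t *)memory;
    arena->size = size;
    arena->used = 0;
}

/* Hands out the next `bytes` bytes, packed tightly against the previous
   allocation. Returns NULL if the request does not fit in what is left. */
static void *arena_push(Arena *arena, size_t bytes) {
    if (bytes > arena->size - arena->used) {
        return NULL;
    }
    void *result = arena->base + arena->used;
    arena->used += bytes;
    return result;
}

#define PUSH_STRUCT(arena, type) ((type *)arena_push((arena), sizeof(type)))
```

Push 3 bytes and then a struct, and the struct starts at offset 3, whatever its type would normally require.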
All is good in the land. Except for when I enable high optimizations on x86 and the compiler (gcc/g++ in my case) decides to auto-vectorize my code and spits out aligned memory loads. Awesome.
So the core of my issue is that my own pushing code (which is, for all intents and purposes, identical to Casey's) has no understanding of alignment and just packs things into my memory block as tightly as possible (including the padding the compiler puts in my structs). If my understanding is correct, for most x86 code this is OK since x86 is very tolerant of unaligned accesses. But this might not be true for another architecture...
Is it worth trying to make all my allocations conform to some strict alignment requirement? My inclination is to say no, since my code will mostly run on x86. However, I still run into the problem of the compiler being potentially very aggressive and spitting out instruction sequences that expect alignment, so it feels like I don't have any choice but to make sure everything is aligned properly.
Any tips?
Currently, I work around this by implementing a function which will align the next allocation, and I've patched up each location where the compiler emits aligned loads. But this is *extremely* error-prone and depends on me catching every single location where the compiler would do such a thing, which I can't be sure of.
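Roughly, the workaround looks like the sketch below. Again the names are invented for this post (arena_align_next is hypothetical), and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration. */
typedef struct {
    uint8_t *base;  /* start of the block */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes handed out so far */
} Arena;

/* Skips just enough bytes that the NEXT push starts on an
   `alignment`-byte boundary. `alignment` must be a power of two.
   Returns 1 on success, 0 if the padding does not fit. */
static int arena_align_next(Arena *arena, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Work from the real address, not just the offset, so this holds
       even when `base` itself is not aligned. */
    uintptr_t next = (uintptr_t)(arena->base + arena->used);
    size_t misalignment = (size_t)(next & (alignment - 1));
    size_t padding = misalignment ? alignment - misalignment : 0;

    if (padding > arena->size - arena->used) {
        return 0;
    }
    arena->used += padding;
    return 1;
}
```

The catch is that I have to remember to call it before every push that ends up feeding vectorized code, which is exactly the part I can't verify.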
Ah, yes, that pretty much discusses my problem very thoroughly.
Although, it seems a bit iffy to me to have to align each thing individually in the presence of a compiler that doesn't really tell me when it's expecting aligned memory addresses...
At the limit you probably want to optimize for arrays and structs that are 16-byte friendly all the time anyway, so I would recommend just always aligning to 16 bytes when you Push. That's probably what I will do once we get to the part in Handmade Hero where we care :)
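As a rough sketch of what "always align to 16 bytes when you Push" could look like (hypothetical names, not the code that will end up in Handmade Hero), the padding is folded into the push itself so there is no separate call to forget:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration. */
#define ARENA_ALIGNMENT ((uintptr_t)16)

typedef struct {
    uint8_t *base;  /* start of the block */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes consumed so far, padding included */
} Arena;

/* Every returned pointer is 16-byte aligned. Returns NULL if the
   padding plus the request does not fit in what is left. */
static void *arena_push16(Arena *arena, size_t bytes) {
    uintptr_t next = (uintptr_t)(arena->base + arena->used);
    size_t padding =
        (size_t)((ARENA_ALIGNMENT - (next & (ARENA_ALIGNMENT - 1))) &
                 (ARENA_ALIGNMENT - 1));

    size_t remaining = arena->size - arena->used;
    if (padding > remaining || bytes > remaining - padding) {
        return NULL;
    }

    void *result = arena->base + arena->used + padding;
    arena->used += padding + bytes;
    return result;
}
```

The cost is up to 15 wasted bytes per allocation. Sixteen bytes covers SSE aligned loads; if the build enables AVX, the aligned 256-bit loads want 32 bytes, so the constant would presumably need to go up to match.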