On day 126 Casey talks about aligning the memory for the tiled renderer. As I understand it he sets up three goals:
- Tile widths should be a multiple of 4 pixels
- Each framebuffer row should have a little padding on the right
- The framebuffer address should be 16-byte aligned
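The first two I understand. With tile widths that are a multiple of 4 pixels, each 4-pixel (16-byte) SSE write stays inside one tile, so two threads won't mess with the same data. The extra row padding is for the right-most tile in cases where the actual screen width is not a multiple of 4 pixels, i.e. doesn't match up with the SSE size of 16 bytes.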
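For concreteness, here is how I picture the row padding. This is a minimal sketch in C, not Casey's actual code: `align_up`, `padded_pitch`, and the two constants are names I made up, and I'm assuming 4 bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_PIXEL 4   /* assumption: 32-bit pixels */
#define SIMD_BYTES      16  /* width of one SSE register */

/* Round value up to the next multiple of alignment (must be a power of two). */
static size_t align_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Bytes per framebuffer row, including right padding, so that a 16-byte
   write covering the last visible pixels never runs into the next row. */
static size_t padded_pitch(size_t width_in_pixels)
{
    return align_up(width_in_pixels * BYTES_PER_PIXEL, SIMD_BYTES);
}
```

With these assumptions a 1366-pixel-wide buffer needs 5464 bytes of visible pixels per row, and `padded_pitch(1366)` rounds that up to 5472, leaving 8 bytes (2 pixels) of padding on the right. A 1920-pixel row is already a multiple of 16 bytes and gets no padding.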
But why is it necessary to make the framebuffer *address* 16 byte aligned?
https://youtu.be/blcNbU70I9o?t=1790
I can see how 16-byte address alignment might help with CPU cache lines, and it also lets us use `_mm_store_si128` instead of `_mm_storeu_si128`. But I don't hear Casey mention either of these when discussing the need for 16-byte alignment, so I suspect there is some other reason that I just don't understand.
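To show what I mean by the store difference, here is a rough sketch (again my own illustration, not code from the stream). `is_aligned_16` and `fill_row` are hypothetical helpers; the only real API calls are the SSE2 intrinsics `_mm_set1_epi32`, `_mm_store_si128`, and `_mm_storeu_si128`. The `HAVE_SSE2` guard and scalar fallback are only there so the snippet also compiles on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Returns nonzero when ptr sits on a 16-byte boundary. */
static int is_aligned_16(const void *ptr)
{
    return ((uintptr_t)ptr & 15u) == 0;
}

/* Fill pixel_count pixels (must be a multiple of 4) with one color.
   The aligned store is only legal when every write address is 16-byte
   aligned; otherwise the unaligned store has to be used. */
static void fill_row(uint32_t *row, size_t pixel_count, uint32_t color)
{
    assert(pixel_count % 4 == 0);
#if HAVE_SSE2
    __m128i wide = _mm_set1_epi32((int)color);
    if (is_aligned_16(row)) {
        for (size_t i = 0; i < pixel_count; i += 4) {
            _mm_store_si128((__m128i *)(row + i), wide);   /* requires alignment */
        }
    } else {
        for (size_t i = 0; i < pixel_count; i += 4) {
            _mm_storeu_si128((__m128i *)(row + i), wide);  /* works at any address */
        }
    }
#else
    /* Scalar fallback for targets without SSE2. */
    for (size_t i = 0; i < pixel_count; i++) {
        row[i] = color;
    }
#endif
}
```

If the framebuffer base is 16-byte aligned, the row pitch is a multiple of 16 bytes, and tiles start at x positions that are multiples of 4, then every 4-pixel group in every row lands on a 16-byte boundary and the aligned branch is always the one taken.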
Thank you Casey for a great show and thank you everyone else for this great community :)