Nowadays we all hear about the importance of data locality when trying to write a performant program. Of course, this is due to the CPU's use of cache memory. What I'm curious about is whether there are other major CPU mechanisms the programmer should be aware of besides the cache. I know CPUs also have plenty of other architectural designs to help increase instruction completion rate; I just wasn't sure which of those mechanisms can also be significantly influenced by program design and are not already heavily optimized for by the compiler.
Thank you for the link! It definitely has some interesting ideas. Not gonna lie, though, it's hard to digest everything he talks about. I'll have to read it over a few times to really grasp the concepts.