Simple question: why do we do this?

> No need for the size of the array to be a power of two anymore.
Once upon a time, people used to use hash tables of prime-number size. The reason (beyond some solid theory behind it) was that a prime size let you get away with a cheap-but-reasonable hash function: because the table size was prime, the bucket index was unlikely to be correlated with patterns in your keys, so the table size itself papered over weaknesses in the hash function.
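To make that concrete, here is a small sketch (the identity hash and the stride-8 key pattern are mine, chosen to mimic 8-byte-aligned pointers, not anything from a real library):

```cpp
#include <cstddef>
#include <set>

// A deliberately weak hash: the identity function. Keys with a common
// stride (e.g. pointers aligned to 8 bytes) then share low-order bits.
std::size_t weak_hash(std::size_t key) { return key; }

// Count how many distinct buckets the keys 0, 8, 16, ... land in
// for a table of the given size.
std::size_t buckets_used(std::size_t table_size, std::size_t n_keys) {
    std::set<std::size_t> buckets;
    for (std::size_t i = 0; i < n_keys; ++i)
        buckets.insert(weak_hash(i * 8) % table_size);
    return buckets.size();
}
```

With a power-of-two table of 16 buckets, those stride-8 keys pile into just 2 buckets; with a prime-sized table of 17, the same keys spread across all 17, because the stride shares no factor with the table size.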
GCC's libstdc++ still uses prime-sized hash tables. I had one program whose inner loop required a fast "set membership" test. I started with std::unordered_set, and instruction-level profiling revealed that 15% of the run time was spent in the division instruction. That's because...
> On the other hand, the wraparound operation is now clearly slower (I think by about a factor of 10 or so) since it involves a division.
...it's more like a factor of 30-100. On Haswell, integer division takes up to 30 cycles for a 32-bit division, and 100 cycles for a 64-bit division.
(Incidentally, that raises another interesting point. A 64-bit integer addition or logical operation is rarely more expensive than its 32-bit equivalent, but 64-bit division is much more expensive. So if you're doing divisions, sticking to 32-bit arithmetic may be worth it.)
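As a sketch of that trick (the helper name `bucket_index` is hypothetical, not from any library): when the table size fits in 32 bits, narrowing the dividend first lets the compiler emit the cheaper 32-bit divide.

```cpp
#include <cstdint>

// Hypothetical helper: if table_size fits in 32 bits, truncating the
// hash to 32 bits first means the compiler can emit a 32-bit DIV
// rather than the much slower 64-bit one. For a well-mixed hash, the
// low 32 bits are as good as the full 64-bit value.
std::uint32_t bucket_index(std::uint64_t hash, std::uint32_t table_size) {
    return static_cast<std::uint32_t>(hash) % table_size;
}
```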
For comparison, an L1 cache hit is around 4 cycles, L2 cache hit is around 10 cycles, and L3 cache hit is around 50 cycles.
That's why in recent years a lot of research has gone into designing better hash functions: it's worth paying for a better, more expensive hash function if it lets you avoid the modulo operation entirely.
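One common form of that trade (a sketch of the general technique, not of what any particular library does) is Fibonacci, i.e. multiplicative, hashing: one multiply mixes the key, after which a power-of-two table needs only a shift, with no division anywhere.

```cpp
#include <cstdint>

// Fibonacci hashing: multiply by round(2^64 / phi) and keep the top
// bits. One multiply plus one shift replaces the expensive modulo, at
// the cost of a stronger mixing step than "key % prime" relied on.
std::uint64_t fib_index(std::uint64_t key, unsigned table_bits) {
    const std::uint64_t k = 11400714819323198485ull; // ~2^64 / golden ratio
    return (key * k) >> (64 - table_bits);           // index < 2^table_bits
}
```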
One other recent trend is that thanks to SIMD instructions, you can compute two or four hash functions on the same key simultaneously, for very little extra cost. This has meant that multiple-hash-function data structures (e.g. double hashing, Bloom filters) have made a comeback, and new techniques such as Cuckoo hashing have appeared to take advantage of the hardware.
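A sketch of the multiple-hash idea (scalar here, but the per-key work is exactly what SIMD batches; the FNV-1a constants are standard, the rest is illustrative): split one 64-bit hash into two halves and derive the k probe positions as h1 + i*h2, the double-hashing scheme Kirsch and Mitzenmacher showed works well for Bloom filters.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Tiny Bloom filter sketch. One 64-bit FNV-1a hash is split into two
// halves h1 and h2, and the K probe positions are derived as h1 + i*h2.
struct TinyBloom {
    static constexpr std::size_t BITS = 1024;  // power of two: mask, no modulo
    static constexpr int K = 4;                // probes per key
    std::bitset<BITS> bits;

    static std::uint64_t fnv1a(const std::string& s) {
        std::uint64_t h = 1469598103934665603ull;
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
        return h;
    }
    void add(const std::string& s) {
        std::uint64_t h = fnv1a(s), h1 = h >> 32, h2 = (h & 0xFFFFFFFFu) | 1;
        for (int i = 0; i < K; ++i)
            bits.set((h1 + i * h2) & (BITS - 1));
    }
    bool maybe_contains(const std::string& s) const {
        std::uint64_t h = fnv1a(s), h1 = h >> 32, h2 = (h & 0xFFFFFFFFu) | 1;
        for (int i = 0; i < K; ++i)
            if (!bits.test((h1 + i * h2) & (BITS - 1))) return false;
        return true;  // "maybe": false positives possible, no false negatives
    }
};
```

Note that a query never gives a false negative; that property is what the assertions below rely on, since false positives are probabilistic.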