Mārtiņš Možeiko
2559 posts / 2 projects
_mm_sqrt_ps optimization for Day 119
How about replacing the expensive _mm_sqrt_ps with an approximation built on the inverse square root?

Basically replace:
_mm_sqrt_ps(a)


with:
_mm_mul_ps(a, _mm_rsqrt_ps(a))


For me, on Haswell, this reduces the cost from 40 cy/h to 38 cy/h, so it is not a big improvement. I have no older CPUs to test this on, but I would expect it to make a bigger difference where sqrt is more expensive.
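Two caveats apply to this swap. First, _mm_rsqrt_ps is only an approximation: Intel documents its maximum relative error as below 1.5*2^-12. Second, for a == 0 it returns +infinity, so a * rsqrt(a) becomes 0 * inf = NaN instead of 0. Here is a minimal sketch of the replacement, plus a zero-safe variant. The helper names (approx_sqrt_ps, approx_sqrt_zero_safe_ps, max_rel_error) are made up for illustration and are not from the Handmade Hero code.

```c
#include <math.h>
#include <xmmintrin.h>

/* Hypothetical helper: approximate sqrt(a) as a * rsqrt(a).
   Lanes where a == 0 produce NaN, because 0 * inf is NaN. */
static __m128 approx_sqrt_ps(__m128 a)
{
    return _mm_mul_ps(a, _mm_rsqrt_ps(a));
}

/* Hypothetical variant: same approximation, but lanes where a == 0
   are masked to 0 so they match what _mm_sqrt_ps would return. */
static __m128 approx_sqrt_zero_safe_ps(__m128 a)
{
    /* All-ones mask in lanes where a != 0, all-zeros where a == 0. */
    __m128 nonzero = _mm_cmpneq_ps(a, _mm_setzero_ps());
    return _mm_and_ps(nonzero, approx_sqrt_ps(a));
}

/* Hypothetical check: largest relative error of the approximation
   against _mm_sqrt_ps across the four lanes. All lanes of a must be
   positive, since the error is divided by the exact result. */
static float max_rel_error(__m128 a)
{
    float exact[4];
    float approx[4];
    float worst = 0.0f;
    int i;

    _mm_storeu_ps(exact, _mm_sqrt_ps(a));
    _mm_storeu_ps(approx, approx_sqrt_ps(a));

    for (i = 0; i < 4; ++i) {
        float err = fabsf((approx[i] - exact[i]) / exact[i]);
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Whether the extra compare and mask are worth it depends on whether the input can actually be zero at that call site. If it cannot, the plain version is enough. Calling max_rel_error on a few representative inputs is a quick way to see whether the reduced precision is acceptable for your use.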
Roderic Bos
70 posts
Fabian Giesen posted this Twitter thread, which could also be helpful:

https://twitter.com/rygorous/status/598795742145224704