Alexey
They meant that mulps/mulss and addps/addss have the same latency on many x64 chips, but that doesn't mean that one of them is "faster" than the other. It's like asking which is faster: an apple or an orange? Depends on how you use it, I guess ;)

Just to rephrase my question: I was curious how many cycles it takes to use TLS or atomics. I know they're different from each other. If you don't like the word "faster", I won't use it; I'm just interested in the cycle count. When he said "it is often exactly the same", I didn't know what "it" referred to.

Also, mulps and addps are SIMD instructions. What about normal instructions?
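
To make the question concrete, here is a rough sketch of how one could measure it rather than guess. It is only an illustration under assumptions: the names `tls_counter`, `atomic_counter`, `ns_per_op`, `time_tls` and `time_atomic` are made up for this example, the loop runs on one thread with no contention, and the TLS variable is marked `volatile` only so the compiler cannot fold the loop into a single add. It reports nanoseconds per operation, not cycles.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical counters, for illustration only.
// volatile forces a real load and store on every iteration.
thread_local volatile std::uint64_t tls_counter = 0;
std::atomic<std::uint64_t> atomic_counter{0};

// Runs `body` `iters` times and returns the average nanoseconds per call.
template <typename F>
double ns_per_op(std::uint64_t iters, F body) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        body();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}

// Plain (non-atomic) increment of a thread-local variable.
double time_tls(std::uint64_t iters) {
    return ns_per_op(iters, [] { tls_counter = tls_counter + 1; });
}

// Uncontended atomic read-modify-write with relaxed ordering.
double time_atomic(std::uint64_t iters) {
    return ns_per_op(iters, [] {
        atomic_counter.fetch_add(1, std::memory_order_relaxed);
    });
}
```

Calling `time_tls(100000000)` and `time_atomic(100000000)` from `main` and printing the results gives a per-operation time; multiplying by the core clock in GHz gives an approximate cycle count. This measures throughput of a tight loop on an uncontended cache line, so the numbers will differ by chip, by compiler flags, and once other threads touch the same atomic.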

Mārtiņš Možeiko
