What are intrinsics?

Todd

#14588

March 18, 2018

Can someone explain to me what intrinsics are and what they are for?

MSDN says that they are "inline code rather than a function call."

So are they essentially implemented as a label with no stack setup/teardown? What is the purpose for this and how are they used? Thanks.

Edited by Todd on March 18, 2018, 8:27am Reason: Initial post

Mārtiņš Možeiko

#14589

March 18, 2018

Typically a language abstracts away the CPU it is compiling for. So you can write very complex operations (array[index]->member->x = 0) without worrying about what kind of machine instructions will be generated. Sometimes this is good. But sometimes it is bad. Sometimes you simply cannot express in C (or whatever language) the thing you want the compiler to generate. One good example is the rotate instruction. Most modern architectures (x86, arm) can rotate 32 or 64 bits in a register by a constant amount. But how can you express this in C? For shifting it is easy, you simply write "x >> 1". But there is no C operator for rotate.

You could write (x >> n) | (x << (32-n)) for a 32-bit unsigned x and 0 < n < 32, but then you would depend on the compiler's optimizer to recognize the pattern and turn it into a single rotate instruction. And in a debug build it would still generate several operations (two shifts and an or), because the optimizer is off.
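To make the shift-and-or idea concrete, here is a minimal sketch. The name rotr32 is made up for this example, and the masking of the count is my addition so that a count of 0 or 32 does not turn into an undefined 32-bit shift.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits using only shifts and OR.
   Masking keeps both shift counts in the range 0..31, so n == 0
   or n >= 32 never produces an undefined shift by 32. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}
```

Whether this ends up as one rotate instruction or as two shifts plus an or depends entirely on the compiler and the optimization level, which is exactly the problem described above.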

One way to fix this problem is to use inline assembler. You can explicitly write the instruction that does what you want, figure out how to tell the inline assembler to use your variables, and you're good. Unfortunately there are some issues with this. Sometimes the compiler will have a hard time optimizing around your inline assembly fragment, because it treats your assembly block as a "black box" - it cannot optimize it, for example, by rearranging instructions.
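As a rough sketch of what the inline assembler route looks like, here is a version using GCC/Clang extended asm on x86. The function name rotr32_asm is hypothetical, the syntax assumes the default AT&T dialect, and the #else branch is only a plain C fallback so the snippet still compiles on other compilers or architectures.

```c
#include <stdint.h>

/* Rotate right with an explicit "ror" instruction (GCC/Clang, x86 only). */
static inline uint32_t rotr32_asm(uint32_t x, uint8_t n)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    /* "+r": x is read and written in a general register.
       "cI": the count is either in CL or an immediate in 0..31. */
    __asm__("rorl %1, %0" : "+r"(x) : "cI"(n));
    return x;
#else
    /* Fallback for other targets: shifts and OR. */
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
#endif
}
```

The constraint strings are the "telling the assembler about your variables" part. The compiler only knows what goes in and what comes out of the block, not what the instruction actually computes.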

That's where intrinsics come in. They look like regular functions, but they don't generate a call to a function. The compiler recognizes these "functions" as your way of telling it - "please generate instruction X at this point". It knows which instruction to generate from the name of the intrinsic. But it can optimize much better, because it understands what the intrinsic does and it sees all the arguments, how they are passed, created, etc...

Back to the rotate example - on x86/x86_64 MSVC provides the _rotl and _rotr intrinsics (in the intrin.h header). If you use them like this:

unsigned int x = 123;
unsigned int y = _rotr(x, 5);

Then the compiler will know that you want a "ror" instruction with rotate count 5, and it will do the appropriate register allocation so that variable "x" is used as the input to the ror instruction and its output is used as "y".

Again, what's good about this is that the compiler has full visibility of what's going on. In my example above, with optimizations on, the compiler will actually reduce the code to this:

unsigned int y = 0xd8000003;

Because it knows you are rotating the constant 123 by 5, there's no point in doing this at runtime. That is something it cannot do if you use inline assembly.

The same thing applies to SSE/AVX intrinsics, or to math.h functions like fabs that compilers treat as intrinsics.
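For a taste of the SSE case, here is a small sketch that adds four floats at once. The function name add4 and the HAVE_SSE macro are made up for this example. The _mm_* calls come from xmmintrin.h, and the scalar loop is only a fallback for targets without SSE.

```c
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* out[i] = a[i] + b[i] for i = 0..3 */
static void add4(const float *a, const float *b, float *out)
{
#ifdef HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one packed add, then store */
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Each _mm_* call reads like a function call, but the compiler is free to keep the values in registers and schedule the instructions as it sees fit.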

There are other advantages to intrinsics. Most of them are portable between compilers - you can use the same ones with GCC and MSVC, which again you cannot do with inline assembly. Some intrinsics are also portable between architectures - like GCC's __builtin_bswap32, which swaps the bytes in a 32-bit integer.
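A byte swap wrapper might look roughly like this. The name swap_bytes32 is made up for the example. __builtin_bswap32 is the GCC/Clang builtin mentioned above, _byteswap_ulong is the MSVC counterpart from stdlib.h, and the last branch is a plain C fallback for anything else.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong */
#endif

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t swap_bytes32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

For example, swap_bytes32(0x11223344) gives 0x44332211 on any architecture. Which instruction (if any) does the work is up to the compiler for that target.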

Edited by Mārtiņš Možeiko on March 18, 2018, 8:47am

Todd

#14601

March 18, 2018

Thanks Martins! That was a beautiful explanation. I hadn't yet touched this space because I've either written C or asm but never needed to optimize yet. This makes a lot of sense.