Christian
10 posts
How does intrinsics work?
As I understand it, an intrinsic is a CPU instruction that you can run instead of several other (simpler) ones. But Casey checks whether they are available by checking which compiler is running! Won't that break if the game is moved to a different machine with a different CPU? Programming against the hardware is new to me :-)

Thanks
Mārtiņš Možeiko
2374 posts / 2 projects
When you write C code like "a = b * c", the compiler converts that to a multiply instruction. There can of course be additional instructions to move values from the stack to registers and back, but the main point is that the * symbol means "use a multiply instruction".

But a CPU has many, many more instructions: http://ref.x86asm.net/geek-abc.html
There simply isn't a way to write them in C using special symbols (like + for addition, or * for multiply). So compiler and CPU vendors came up with the idea of intrinsics. They look like function calls, but the compiler recognizes these fake function calls, and instead of generating a real CALL instruction it generates the specific CPU instruction for each intrinsic. Put simply, this is a way to access the whole complex CPU instruction set from C code without writing assembly code.
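
To make the "fake function call" idea concrete, here is a minimal sketch, assuming an x86 target with SSE enabled (the default for x86-64 compilers). `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` are SSE intrinsics from `<xmmintrin.h>`; `AddFour` is a hypothetical helper name used only for illustration.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: adds four pairs of floats at once. */
static void AddFour(const float *A, const float *B, float *Out)
{
    __m128 VA  = _mm_loadu_ps(A);      /* load 4 floats (no alignment required) */
    __m128 VB  = _mm_loadu_ps(B);
    __m128 Sum = _mm_add_ps(VA, VB);   /* looks like a call, maps to the ADDPS instruction */
    _mm_storeu_ps(Out, Sum);           /* store 4 floats */
}
```

With optimizations on, a compiler will typically emit an ADDPS for `_mm_add_ps` rather than a CALL, and it chooses the registers itself.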

There are a bunch of advantages over writing assembly directly:

1) The compiler can allocate registers for you. You don't need to worry about what should be in register 0 and what should be in register 1.

2) The compiler can rearrange instructions so they are better scheduled for the target CPU. If you write assembly code, the compiler won't change it one bit. In C code it is able to rearrange calls to intrinsics as long as that doesn't change the semantics. As a result you get additional optimizations without needing to worry about instruction scheduling.

3) Portability - the same C code can be compiled for a 32-bit or 64-bit CPU. If you wrote asm, you would need to write one version for each architecture (or create some pseudo-assembly macros - basically a new assembly pseudo-language). This also applies to portability between compilers and OSes. ASM syntax can differ between Microsoft Macro Assembler for Windows and GNU assembler for Linux/OSX.

Not sure what you mean by Casey checking which compiler is running to decide whether they are available. What do you mean by that? Currently he is using MSVC to compile, and MSVC for Intel architecture supports the SSE and AVX intrinsics. He said he will require the user to have an SSE2-capable CPU, so he freely uses any SSE or SSE2 intrinsic.
Andrew Bromage
183 posts / 1 project
Research engineer, resident maths nerd (Erdős number 3).
mmozeiko
He said he will require user to have SSE2 capable CPU, so he freely uses any SSE or SSE2 intrinsic.

Just so you know, SSE2 as a minimum hardware requirement wasn't chosen at random. All x86_64 CPUs support SSE2, and it's also a hardware requirement for Windows 8.
Mārtiņš Možeiko
2374 posts / 2 projects
I don't think that was the reason to choose SSE2. I would guess Casey will also ship a 32-bit binary and will support Windows versions older than 8 :)
It's just that nowadays every CPU supports at least SSE2, because nobody makes 32-bit-only Intel chips.
Christian
10 posts

#if COMPILER_MSVC
    uint32 Result = _rotl(Value, Amount);
#else

_rotl() is an intrinsic, right?

Okay so as long as the CPU knows the instruction set you don't need to compile a different version of the game? Does AMD also have SSE2 or is that an Intel thing? Also, in the final game I guess that you need to check if the CPU knows the instructions?
Mārtiņš Možeiko
2374 posts / 2 projects
Yes, _rotl is an intrinsic, but it is an MSVC-only intrinsic. GCC and clang don't provide it. For them you either use inline assembler, or simply use two shifts and an or - the optimizer is good enough to understand that two shifts + or can be turned into one rotate instruction.
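
The "two shifts and an or" version can be sketched like this. `RotateLeft` is a hypothetical name, and `uint32_t` from `<stdint.h>` stands in for the `uint32` typedef in the snippet above. Masking the shift counts is an extra precaution, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Portable 32-bit rotate left built from two shifts and an or.
   Optimizing compilers commonly recognize this pattern and emit a
   single rotate instruction, but that is up to the optimizer. */
static uint32_t RotateLeft(uint32_t Value, int Amount)
{
    Amount &= 31;  /* keep the count in 0..31 */
    return (Value << Amount) | (Value >> ((32 - Amount) & 31));
}
```

For example, `RotateLeft(0x12345678, 8)` moves the top byte around to the bottom and yields `0x34567812`.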

Yes, AMD has SSE2 instructions. The Wikipedia pages on this are pretty OK: https://en.wikipedia.org/wiki/SSE2
Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

Casey said he will require CPUs to have SSE2 to run HH, so he won't need to check whether the CPU supports it. Either you have it and can play HH, or you don't have it and can't play HH. But if you wanted to write a different code path, let's say for the AVX instruction set, then you would need to check whether the CPU supports it and then call one function or the other.
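
A rough sketch of such a runtime check, not anything taken from HH itself. All function and macro names here (`SumScalar`, `SumAVX`, `CpuHasAVX`, `ChooseSum`, `TARGET_X86`, `AVX_FUNCTION`) are made up for illustration. It assumes GCC/clang's `__builtin_cpu_supports` or MSVC's `__cpuid` and `_xgetbv`, and falls back to the scalar path on non-x86 targets.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define TARGET_X86 1
#include <immintrin.h>   /* AVX intrinsics; also _xgetbv on MSVC */
#if defined(_MSC_VER)
#include <intrin.h>      /* __cpuid */
#endif
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FUNCTION __attribute__((target("avx")))  /* enable AVX for one function only */
#else
#define AVX_FUNCTION     /* MSVC accepts AVX intrinsics without a flag */
#endif
#else
#define TARGET_X86 0
#endif

/* Baseline path: runs on any CPU. */
static float SumScalar(const float *Values, size_t Count)
{
    float Total = 0.0f;
    for (size_t i = 0; i < Count; ++i) Total += Values[i];
    return Total;
}

#if TARGET_X86
/* AVX path: accumulates eight floats per add. */
AVX_FUNCTION
static float SumAVX(const float *Values, size_t Count)
{
    __m256 Acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= Count; i += 8)
        Acc = _mm256_add_ps(Acc, _mm256_loadu_ps(Values + i));

    float Lanes[8];
    _mm256_storeu_ps(Lanes, Acc);
    float Total = 0.0f;
    for (int k = 0; k < 8; ++k) Total += Lanes[k];  /* combine the 8 lanes */
    for (; i < Count; ++i) Total += Values[i];      /* leftover elements */
    return Total;
}

/* Nonzero if both the CPU and the OS support AVX. */
static int CpuHasAVX(void)
{
#if defined(_MSC_VER)
    int Info[4];
    __cpuid(Info, 1);
    int OsXsave = (Info[2] >> 27) & 1;  /* OS uses XSAVE/XRSTOR */
    int Avx     = (Info[2] >> 28) & 1;  /* CPU implements AVX */
    /* XCR0 bits 1 and 2: OS preserves XMM and YMM state */
    return OsXsave && Avx && ((_xgetbv(0) & 6) == 6);
#else
    return __builtin_cpu_supports("avx");
#endif
}
#endif

typedef float (*sum_function)(const float *, size_t);

/* Check once, then call through the returned pointer. */
static sum_function ChooseSum(void)
{
#if TARGET_X86
    if (CpuHasAVX()) return SumAVX;
#endif
    return SumScalar;
}
```

A program would typically call `ChooseSum()` once at startup and keep the pointer around, so the CPU check is not repeated on every call.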
Andrew Bromage
183 posts / 1 project
Research engineer, resident maths nerd (Erdős number 3).
mmozeiko
I would guess Casey will ship also 32-bit binary and will support Windows version less than 8 :)

Sure. My point is that SSE2 is a de facto standard measure of "reasonably modern CPU".

It's just that nowadays every CPU supports at least SSE2, because nobody makes 32-bit-only Intel chips.
I'm pretty sure that AMD will sell you a shiny new 32-bit Geode if that's what you want, and I think that VIA still makes C7s. But yes, it's a niche product these days.
Christian
10 posts
Thanks for the explanations :-)
milo
1 posts
Its work :)
ola
Karan Joisher
10 posts
Are the intrinsics agreed upon by all x86 and x86-64 vendors, or is there a possibility that each of these vendors can have specialized intrinsics for their own hardware? After reading other replies it seems that each compiler can support different subsets of intrinsics. So if support for intrinsics changes between vendors and compilers, would we have to define different paths for each vendor and compiler?
Mārtiņš Možeiko
2374 posts / 2 projects
Yes, intrinsics can be and are different between different vendors' compilers, or even between different versions of a compiler from the same vendor.

In general most SSE/AVX intrinsics are the same. Some others that map to x86 instructions are a bit different (cpuid, ror/rol, 64x64 -> 128 bit mul, etc...)

Usually you solve this with the preprocessor - ifdef msvc do one thing, elif gcc do another, etc...
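
As a sketch of that preprocessor approach, here is the 64x64 -> 128 bit multiply from the list above. `Mul64x64` is a hypothetical wrapper name. It assumes MSVC's `_umul128` intrinsic on x64 and the `unsigned __int128` extension on GCC/clang, with a plain C fallback for everything else.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>

/* MSVC x64: _umul128 returns the low half and writes the high half. */
static uint64_t Mul64x64(uint64_t A, uint64_t B, uint64_t *High)
{
    unsigned __int64 Hi;
    unsigned __int64 Lo = _umul128(A, B, &Hi);
    *High = Hi;
    return Lo;
}

#elif defined(__SIZEOF_INT128__)

/* GCC/clang on 64-bit targets: use the 128-bit integer extension. */
static uint64_t Mul64x64(uint64_t A, uint64_t B, uint64_t *High)
{
    unsigned __int128 Product = (unsigned __int128)A * B;
    *High = (uint64_t)(Product >> 64);
    return (uint64_t)Product;
}

#else

/* Fallback: build the product from four 32x32 -> 64 bit multiplies. */
static uint64_t Mul64x64(uint64_t A, uint64_t B, uint64_t *High)
{
    uint64_t ALo = (uint32_t)A, AHi = A >> 32;
    uint64_t BLo = (uint32_t)B, BHi = B >> 32;
    uint64_t P0 = ALo * BLo;
    uint64_t P1 = ALo * BHi;
    uint64_t P2 = AHi * BLo;
    uint64_t P3 = AHi * BHi;
    /* Sum of the middle 32-bit column, including the carry out of P0. */
    uint64_t Mid = (P0 >> 32) + (uint32_t)P1 + (uint32_t)P2;
    *High = P3 + (P1 >> 32) + (P2 >> 32) + (Mid >> 32);
    return (Mid << 32) | (uint32_t)P0;
}

#endif
```

The calling code only ever sees `Mul64x64`, so the compiler differences stay in one place.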