Supporting multiple instruction sets

I was thinking about a way to build an executable that, at runtime, uses the best instruction set supported by the processor or falls back to a less performant implementation. I came up with these two "solutions":

1. Branching inside the function: maybe the cost of the memory access and the branch pays off when the processor supports the best instructions, but it makes the worst case slower.
real32 Sin(real32 Angle)
{
    if (ProcessorHasSSE4)
    {
        // Compute sine with SSE4
    }
    else
    {
        // Compute sine with SSE2
    }
}


2. Switching the function at startup: this prevents the function from being inlined.
Sin = ProcessorHasSSE4 ? SinSSE4 : SinSSE2;


Does anyone know another way to do this? Maybe modifying instructions at runtime... :ohmy:

P.S.: Sorry for the bad English, it's not my native language.
I think the first solution is OK to use as long as there are very few condition tests (so you are not going through 10 different ifs to decide which instruction set to use). I think the CPU will predict this branch, because it will always take the same jump.
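A rough sketch of that first approach, assuming the feature flag is written once at startup and only read afterwards. DetectSSE4 and InitCpuFeatures are hypothetical names, the detection is a stub rather than a real CPUID query, and the two sine bodies are placeholders instead of real SSE code:

```c
#include <stdbool.h>

typedef float real32;

/* Written once at startup, only read afterwards, so the branch in Sin()
   always goes the same way for the lifetime of the program. */
static bool ProcessorHasSSE4 = false;

/* Hypothetical stand-in for a real CPUID query; always reports "no SSE4". */
static bool DetectSSE4(void)
{
    return false;
}

/* Placeholder bodies: a two-term Taylor approximation that is only
   reasonable near zero. Real code would use SSE4 / SSE2 intrinsics here. */
static real32 SinSSE4(real32 Angle)
{
    return Angle - (Angle * Angle * Angle) / 6.0f;
}

static real32 SinSSE2(real32 Angle)
{
    return Angle - (Angle * Angle * Angle) / 6.0f;
}

/* Call once before the first use of Sin(). */
void InitCpuFeatures(void)
{
    ProcessorHasSSE4 = DetectSSE4();
}

real32 Sin(real32 Angle)
{
    if (ProcessorHasSSE4)
    {
        return SinSSE4(Angle);
    }
    return SinSSE2(Angle);
}
```

With only one flag and one test per call, the extra cost is a load and a well-predicted branch.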

But note that for clang/gcc you will need to implement the sin function for SSE4 and SSE2 in different translation units. MSVC allows you to use any intrinsics (SSE4/SSE2) in the same file, but for gcc/clang you need to compile the file with a special command-line argument ("-msse4" or "-msse2" in this case); otherwise the compiler won't let you use the SSEx intrinsics. However, compiling the whole file with "-msse4" also allows the compiler to use SSE4 instructions in other places, which you definitely don't want, because the program will crash if the CPU doesn't support SSE4. So you are pretty much left with solution 2, unless you want to call two separate functions inside the if test of solution 1.
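Here is a rough sketch of how solution 2 could be laid out. The file names in the comments, the sin_function typedef, DetectSSE4 and InitDispatch are all hypothetical, the detection is a stub rather than a real CPUID query, and the sine bodies are placeholders so that the sketch compiles anywhere:

```c
#include <stdbool.h>

typedef float real32;
typedef real32 sin_function(real32 Angle);

/* With gcc/clang these two functions would live in separate files, e.g.
     sin_sse4.c compiled with -msse4
     sin_sse2.c compiled with -msse2
   so that SSE4 instructions can only appear inside SinSSE4.
   Placeholder bodies: a two-term Taylor approximation, only reasonable
   near zero. Real code would use the matching intrinsics. */
real32 SinSSE4(real32 Angle)
{
    return Angle - (Angle * Angle * Angle) / 6.0f;
}

real32 SinSSE2(real32 Angle)
{
    return Angle - (Angle * Angle * Angle) / 6.0f;
}

/* Hypothetical stand-in for a real CPUID query; always reports "no SSE4". */
static bool DetectSSE4(void)
{
    return false;
}

/* Points at the baseline version by default, so a call made before
   InitDispatch() still runs code every supported CPU can execute. */
sin_function *Sin = SinSSE2;

/* Call once at startup to pick the implementation. */
void InitDispatch(void)
{
    Sin = DetectSSE4() ? SinSSE4 : SinSSE2;
}
```

After InitDispatch() the rest of the program just calls Sin(Angle) through the pointer. The call is indirect, so it cannot be inlined, which is the trade-off mentioned in the question.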

Edited by Mārtiņš Možeiko on