I've been thinking about a way to build an executable that uses the best instruction set supported by the processor and falls back to a slower implementation at runtime when it isn't available. I came up with these two "solutions":
1. Branching inside the function: the cost of the memory access and the branch may pay off when the processor supports the best instructions, but it makes the worst case slower.
real32 Sin(real32 Angle)
{
    if (ProcessorHasSSE4)
    {
        // Compute sine with SSE4
    }
    else
    {
        // Compute sine with SSE2
    }
}
2. Selecting the function at startup: this prevents the function from being inlined.
Sin = ProcessorHasSSE4 ? SinSSE4 : SinSSE2;
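To make the two options concrete, here is a minimal sketch of both in C. A few things are my own assumptions and not part of any real code base: the helper names `DetectSSE4`, `SinBranching`, `InitDispatch` and `SinPlaceholder`, the choice of "sse4.1" as the feature to test, and the use of the GCC/Clang builtin `__builtin_cpu_supports` for detection (on other compilers the sketch simply reports "no SSE4"). `SinSSE2` and `SinSSE4` are placeholders that share one short polynomial; real versions would use intrinsics.

```c
#include <stdbool.h>

typedef float real32;

/* Placeholder math: a 5th-order Taylor polynomial, only reasonable for
   small angles. It stands in for the real SSE2/SSE4 code paths. */
static real32 SinPlaceholder(real32 Angle)
{
    real32 X2 = Angle * Angle;
    return Angle * (1.0f - X2 / 6.0f + (X2 * X2) / 120.0f);
}

/* Placeholders: real versions would use SSE2 / SSE4.1 intrinsics. */
static real32 SinSSE2(real32 Angle) { return SinPlaceholder(Angle); }
static real32 SinSSE4(real32 Angle) { return SinPlaceholder(Angle); }

/* Hypothetical helper: asks the compiler runtime whether the CPU has
   SSE4.1. Assumes GCC or Clang on x86; otherwise reports false. */
static bool DetectSSE4(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;
#endif
}

static bool ProcessorHasSSE4;      /* cached once, read on every call in option 1 */
static real32 (*Sin)(real32);      /* option 2: pointer chosen once at startup */

/* Option 1: branch inside the function on the cached flag. */
static real32 SinBranching(real32 Angle)
{
    if (ProcessorHasSSE4)
        return SinSSE4(Angle);
    else
        return SinSSE2(Angle);
}

/* Run once at startup, before any call through Sin or SinBranching. */
static void InitDispatch(void)
{
    ProcessorHasSSE4 = DetectSSE4();
    Sin = ProcessorHasSSE4 ? SinSSE4 : SinSSE2;
}
```

After calling `InitDispatch()` once, `Sin(Angle)` goes through the pointer (an indirect call, so no inlining), while `SinBranching(Angle)` pays a load and a well-predicted branch on each call. Which one is cheaper in practice probably depends on how much work the function does per call, so it seems worth measuring.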
Does anyone know another way to do this? Maybe modifying the instructions at runtime... :ohmy:
PS: Sorry for the bad English, it's not my native language.