If the operations you perform on vX types are done in a loop, then yes, it makes sense to optimize them with SSE. But for simple stuff, if it's just one or two individual vX operations, it doesn't make much sense.
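Here is a minimal sketch of what "doing it in a loop" looks like with the SSE intrinsics from `<xmmintrin.h>`. `AddArraysSSE` is a hypothetical name and is not from the Handmade Hero code. A modern compiler may well auto-vectorize a loop this simple on its own, so check its output first (see the next paragraph).

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Adds two float arrays element-wise: Out[i] = A[i] + B[i].
// The main loop handles 4 floats per iteration with one addps;
// a scalar tail handles the remaining 0-3 elements.
void AddArraysSSE(const float *A, const float *B, float *Out, int Count)
{
    assert(Count >= 0);
    int i = 0;
    for (; i + 4 <= Count; i += 4)
    {
        __m128 a = _mm_loadu_ps(A + i); // unaligned load of 4 floats
        __m128 b = _mm_loadu_ps(B + i);
        _mm_storeu_ps(Out + i, _mm_add_ps(a, b));
    }
    for (; i < Count; ++i) // leftover elements, one at a time
    {
        Out[i] = A[i] + B[i];
    }
}
```

The unaligned loads and stores (`loadu`/`storeu`) keep the sketch safe for arbitrary pointers. If your buffers are known to be 16-byte aligned, the aligned variants are an option.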
Also, you should check whether the compiler you are using already performs these optimizations. Since Casey said he'll target SSE as the minimum requirement, you can turn on SSE code generation (x64 compilers already do this by default, because SSE2 is part of the x86-64 baseline) and see whether the code gets vectorized.
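One way to check at compile time whether the compiler is allowed to emit SSE2 at all is to test its predefined macros. This is a minimal sketch. `CompiledWithSSE2` is a hypothetical helper name. The macros it tests are `__SSE2__` for GCC/Clang and `_M_X64` / `_M_IX86_FP` for MSVC. It only tells you that SSE2 is enabled. It does not tell you whether a given function was actually vectorized, so you still have to read the assembly for that.

```cpp
// True if this translation unit was compiled with SSE2 code generation enabled.
constexpr bool CompiledWithSSE2()
{
#if defined(__SSE2__)                          // GCC / Clang (on by default for x86-64)
    return true;
#elif defined(_M_X64)                          // MSVC x64: SSE2 is always available
    return true;
#elif defined(_M_IX86_FP) && _M_IX86_FP >= 2   // MSVC 32-bit with /arch:SSE2 or higher
    return true;
#else
    return false;
#endif
}
```

To turn the requirement into a build error, you could add `static_assert(CompiledWithSSE2(), "SSE2 required");` somewhere in the build. Non-SSE2 configurations would then fail to compile instead of silently falling back to x87 code.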
For example, let's write a simple function "test_function" in a file "test.cpp":
```cpp
#define HANDMADE_INTERNAL 1
#include "handmade_platform.h"
#include "handmade_intrinsics.h"
#include "handmade_math.h"

v4 test_function(v4 Input)
{
    v4 Result;
    v4 SomeValue = { 1,2,3,4 };
    Result = Input + SomeValue;
    return Result;
}
```
Then I used GCC (4.9.2) to produce optimized x64 assembly and checked the output file "test.s" for "test_function":
```asm
_Z13test_function2v4:
    subq    $24, %rsp
    movups  (%rdx), %xmm0
    movq    %rcx, %rax
    addps   .LC0(%rip), %xmm0
    movups  %xmm0, (%rcx)
    addq    $24, %rsp
    ret
```
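As you can see, it uses a single SSE instruction (addps, packed single-precision add) to add all 4 floats. The movups instructions only load Input and store Result through pointers, because on this target (the registers match the Windows x64 calling convention) a 16-byte v4 is passed and returned in memory. So hand-converting individual vector operations like this one to SSE won't gain anything, because the compiler is already doing it.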
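For comparison, here is a sketch of the same operation written by hand with intrinsics. It uses a plain `__m128` instead of `v4`, so it does not depend on the Handmade Hero headers. `test_function_sse` and `Lane` are hypothetical names. The core of the function is the same single `addps` the compiler already emitted above.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE: __m128, _mm_setr_ps, _mm_add_ps, _mm_storeu_ps

// Hand-written SSE version of test_function: Input + {1, 2, 3, 4}.
__m128 test_function_sse(__m128 Input)
{
    // _mm_setr_ps fills lanes in memory order: lane 0 = 1.0f, ..., lane 3 = 4.0f
    const __m128 SomeValue = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    return _mm_add_ps(Input, SomeValue); // four float adds in one instruction
}

// Helper for inspecting results: returns lane Index (0..3) of V.
float Lane(__m128 V, int Index)
{
    assert(Index >= 0 && Index < 4);
    float Values[4];
    _mm_storeu_ps(Values, V);
    return Values[Index];
}
```

Compiling this with the same settings should give essentially the same addps-based code as the v4 version. That is the point: for a single operation, the intrinsics add nothing the compiler didn't already do.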
But, as I said above, converting more complex loops that perform specific operations, such as blitting or blending with sRGB conversion, to SSE by hand will definitely help. Automatically vectorizing those is not an easy job for compilers, and you as the developer can usually do better.
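Here is a sketch of the kind of loop meant here: blending float color channels with SSE, four channels at a time. It makes two assumptions. First, channel values are floats in [0, 1]. Second, sRGB is approximated with gamma 2 (square to go to linear, square root to go back), which is a simplification and not the exact sRGB transfer curve. `BlendGamma2SSE` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>       // std::sqrt for the scalar tail
#include <xmmintrin.h> // SSE: _mm_set1_ps, _mm_mul_ps, _mm_add_ps, _mm_sqrt_ps, ...

// Blends Src into Dest in place for Count channel values in [0, 1]:
//   linear = (1 - Alpha) * Dest^2 + Alpha * Src^2;  Dest = sqrt(linear)
// Squaring / square-rooting approximates sRGB <-> linear with gamma 2.
void BlendGamma2SSE(float *Dest, const float *Src, float Alpha, int Count)
{
    assert(Count >= 0 && Alpha >= 0.0f && Alpha <= 1.0f);
    const __m128 A = _mm_set1_ps(Alpha);
    const __m128 InvA = _mm_set1_ps(1.0f - Alpha);
    int i = 0;
    for (; i + 4 <= Count; i += 4)
    {
        __m128 d = _mm_loadu_ps(Dest + i);
        __m128 s = _mm_loadu_ps(Src + i);
        d = _mm_mul_ps(d, d); // approx. sRGB -> linear
        s = _mm_mul_ps(s, s);
        __m128 Linear = _mm_add_ps(_mm_mul_ps(InvA, d), _mm_mul_ps(A, s));
        _mm_storeu_ps(Dest + i, _mm_sqrt_ps(Linear)); // approx. linear -> sRGB
    }
    for (; i < Count; ++i) // scalar tail for the last 0-3 values
    {
        float d = Dest[i] * Dest[i];
        float s = Src[i] * Src[i];
        Dest[i] = std::sqrt((1.0f - Alpha) * d + Alpha * s);
    }
}
```

A real blitter would typically also unpack 8-bit pixels to floats and pack them back, and would handle alignment and edge masks. That extra work is where hand-written SIMD tends to beat what the compiler finds on its own.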