Hey mmozeiko,
First off, that is a pretty cool site there.
To clarify a bit more, here is the scenario. I personally would prefer to have something like
| #ifdef NO_SIMD
typedef struct v2{
float x;
float y;
}v2;
#elif defined(NEON)
typedef struct v2{
float32x4_t v;
}v2;
#else // SSE
typedef struct v2{
__m128 v;
}v2;
#endif
// same for v3/v4, matrices...
|
The functions would all follow the same pattern, e.g.
| v2 add(v2 a, v2 b){
#ifdef NO_SIMD
    v2 r = {a.x + b.x, a.y + b.y};
    return r;
#elif defined(NEON)
    v2 r = {vaddq_f32(a.v, b.v)};
    return r;
#else // SSE
    v2 r = {_mm_add_ps(a.v, b.v)};
    return r;
#endif
}
|
Very similar to DirectXMath, if you have looked at that library at all. With this setup there is of course no reason not to support scalar types. However, this setup has some things I don't like about it. Functions like add, mul, div, etc., where the number of components is irrelevant for SIMD, end up with duplicated code for no reason: v2Add, v3Add, and v4Add all have exactly the same body, as in the sketch below.
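To make that concrete, here is a minimal sketch of the duplication, assuming the SSE path and hypothetical v3/v4 types laid out exactly like v2 (one __m128):
| // hypothetical v3/v4, same single-__m128 layout as v2 above
typedef struct v3{ __m128 v; }v3;
typedef struct v4{ __m128 v; }v4;

// the three add functions end up byte-for-byte identical
v2 v2Add(v2 a, v2 b){ v2 r = {_mm_add_ps(a.v, b.v)}; return r; }
v3 v3Add(v3 a, v3 b){ v3 r = {_mm_add_ps(a.v, b.v)}; return r; }
v4 v4Add(v4 a, v4 b){ v4 r = {_mm_add_ps(a.v, b.v)}; return r; }
|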
While I don't love that, I feel like I would still prefer to have the distinct v2, v3, v4 types and live with the duplicated code. What is really bothering me, though, is that in debug builds, wrapping the primitive type in a struct performs horribly compared to a simple typedef:
| typedef struct v4{
__m128 v;
}v4;
// versus
typedef __m128 v4;
|
In release builds there is no difference, but in debug builds the struct version slows things down so much that the user would have to do something like:
| #pragma optimize("g", on)
#include "math_lib.h"
#pragma optimize("", off)
|
Not the worst thing ever, but still inconvenient.
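For contrast, dropping the struct entirely would look roughly like this; just a sketch, assuming only the SSE path and a hypothetical single vec type standing in for v2/v3/v4:
| // one bare typedef: only one vector type, one add, and no scalar fallback
typedef __m128 vec;
vec vecAdd(vec a, vec b){ return _mm_add_ps(a, b); }
|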
So my two arguments against the structs are duplicated code and much slower debug code. If you drop the structs and just use typedefs, as sketched above, then I feel like you can only have one vector type instead of v2, v3, v4, and you also have to drop scalar support. Which leads to my initial question: is there any argument for supporting scalar types? Thanks for your feedback.