An affine transform in 3D can be stored in a 4x3 matrix, but that costs you 48 bytes (12 floats) rather than the 40 bytes (10 floats) your current layout requires. So as far as memory is concerned, no, you don't win anything from using a matrix.
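To make the byte counts concrete, here is a sketch with two hypothetical layouts (the struct names are made up for illustration, and a 4-byte `float` is assumed): position + quaternion + per-axis scale is 10 floats, while a 4x3 matrix is 12 floats because the constant (0, 0, 0, 1) row of the full 4x4 is implied rather than stored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: position (3) + rotation quaternion (4) + scale (3) = 10 floats.
struct TransformPQS {
    float px, py, pz;      // position
    float qx, qy, qz, qw;  // rotation as a unit quaternion
    float sx, sy, sz;      // per-axis scale
};

// Hypothetical layout: affine 4x3 matrix = 12 floats.
// The last row/column (0, 0, 0, 1) of the full 4x4 matrix is not stored.
struct Mat4x3 {
    float m[4][3];
};

// All-float structs have no padding, so the sizes are exactly the float counts.
static_assert(sizeof(TransformPQS) == 10 * sizeof(float), "10 floats");
static_assert(sizeof(Mat4x3) == 12 * sizeof(float), "12 floats");
```

With 4-byte floats that is 40 versus 48 bytes, so the matrix form is strictly larger.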
My personal preference is to keep each part of the transform (position, rotation, scale) separate. This is easier to work with in the game code: a translation is just `position += delta`, and so on. (Composing two quaternions, at 16 multiplies and 12 adds, is also cheaper than multiplying two 4x4 matrices, at 64 multiplies and 48 adds.) It's easy enough to convert everything to matrices en masse at the end of the frame, which means you can write that pass in SIMD, get the speedup, and only do the conversion for the matrices you *actually need* (usually just anything that gets rendered).
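A minimal sketch of the separate-components approach. The names `Transform`, `Translate`, `Rotate` and `QuatMul` are hypothetical, and the quaternion convention (Hamilton product, `w` as the scalar part) is an assumption, not taken from any particular engine.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // w is the scalar part

// Hamilton product a*b: rotating by the result applies b first, then a.
// 16 multiplies and 12 adds.
inline Quat QuatMul(Quat a, Quat b) {
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
    };
}

// Each part of the transform is kept separately and edited directly.
struct Transform {
    Vec3 position;
    Quat rotation;  // unit quaternion
    Vec3 scale;
};

// A translation really is just position += delta.
inline void Translate(Transform& t, Vec3 delta) {
    t.position.x += delta.x;
    t.position.y += delta.y;
    t.position.z += delta.z;
}

// Apply an extra rotation after the current one.
inline void Rotate(Transform& t, Quat delta) {
    t.rotation = QuatMul(delta, t.rotation);
}
```

Nothing here ever builds a matrix; that only happens in the bulk conversion pass for the transforms that are actually rendered.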
The slight win you get from a smaller transform struct (with 64-byte cache lines, 32 bytes fits two per line, versus 1.6 at 40 bytes) isn't really worth it unless you're working with a lot of them at once, and in that case you don't want them in AoS (array-of-structures) form anyway.
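As an illustration of why bulk work prefers structure-of-arrays, here is a hypothetical 3D SoA container (`TransformsSoA` and `TranslateAll` are made-up names). A pass that only moves things reads and writes just the three position arrays, so rotation and scale never get pulled into cache, and each 64-byte line holds 16 consecutive values of one component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA storage: one array per component, all the same length.
struct TransformsSoA {
    std::vector<float> px, py, pz;      // position
    std::vector<float> qx, qy, qz, qw;  // rotation quaternion
    std::vector<float> sx, sy, sz;      // scale
};

// Move every transform by the same delta, touching only the position arrays.
inline void TranslateAll(TransformsSoA& t, float dx, float dy, float dz) {
    assert(t.py.size() == t.px.size() && t.pz.size() == t.px.size());
    for (std::size_t i = 0; i < t.px.size(); ++i) {
        t.px[i] += dx;
        t.py[i] += dy;
        t.pz[i] += dz;
    }
}
```

A loop like this is also trivially vectorizable, which is much harder to arrange when each element is a 32- or 40-byte struct.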
Incidentally, a bit of trivia: if you decide you only want a single uniform scale, you can fold it into the quaternion and save 4 bytes. Rotating a vector v with a non-unit quaternion q as q v q* scales it by the squared norm |q|², so storing √s times the unit rotation gives you the rotation and a uniform scale of s at the same time. (Don't actually do this unless you want to confuse the hell out of anyone else working with your code.)
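A small sketch of the trick, using hypothetical helper names. It computes the full sandwich product q * (v, 0) * conj(q) explicitly, because the usual optimized rotate-by-quaternion formulas assume a unit quaternion and would not produce the scale.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // w is the scalar part

// Hamilton product a*b.
inline Quat Mul(Quat a, Quat b) {
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
    };
}

inline Quat Conjugate(Quat q) { return {-q.x, -q.y, -q.z, q.w}; }

// Full sandwich product q * (v, 0) * conj(q). For q = k*u with u unit,
// this rotates v by u and scales it by k*k (not by k).
inline Vec3 RotateScale(Quat q, Vec3 v) {
    Quat r = Mul(Mul(q, Quat{v.x, v.y, v.z, 0.0f}), Conjugate(q));
    return {r.x, r.y, r.z};
}

// Fold a uniform scale s into a unit rotation u by storing sqrt(s) * u.
inline Quat PackRotationScale(Quat u, float s) {
    assert(s > 0.0f);
    const float k = std::sqrt(s);
    return {u.x * k, u.y * k, u.z * k, u.w * k};
}

// The scale comes back out as the squared norm of q.
inline float UniformScale(Quat q) {
    return q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
}
```

For example, packing a 90° rotation about z with s = 4 and applying it to (1, 0, 0) gives approximately (0, 4, 0).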
EDIT:
An example of what I mean, from my current engine (in 2D, not 3D):
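```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_mul_ps, ...

typedef std::uint32_t u32;

// Structure-of-arrays storage. Every array must be 16-byte aligned (required by
// _mm_load_ps/_mm_store_ps) and padded to a multiple of 4 floats, because the
// loop below always processes 4 elements at a time.
struct SOA_Transform {
    u32 capacity;
    u32 count;
    float * px;  // position
    float * py;
    float * sx;  // scale
    float * sy;
    float * rx;  // rotation as a unit vector: (cos, sin) of the angle
    float * ry;
};
struct SOA_Mat2x3 {
    u32 capacity;
    u32 count;
    float * a0;  // row 0: [a0 a1 a2]
    float * a1;
    float * a2;
    float * b0;  // row 1: [b0 b1 b2]
    float * b1;
    float * b2;
};

// Converts every transform to a 2x3 matrix M = T * R * S, four at a time.
// `out` is taken by reference so the updated count reaches the caller.
static void
Mat2x3FromTransform(const SOA_Transform & in, SOA_Mat2x3 & out) {
    assert(in.count <= out.capacity);
    __m128 ZERO = _mm_set1_ps(0.0f);
    for (u32 i = 0; i < in.count; i += 4) {
        __m128 px = _mm_load_ps(in.px + i);
        __m128 py = _mm_load_ps(in.py + i);
        __m128 sx = _mm_load_ps(in.sx + i);
        __m128 sy = _mm_load_ps(in.sy + i);
        __m128 rx = _mm_load_ps(in.rx + i);
        __m128 ry = _mm_load_ps(in.ry + i);
        // Rotation scaled per axis:
        // [a0 a1]   [rx*sx  -ry*sy]
        // [b0 b1] = [ry*sx   rx*sy]
        __m128 a0 = _mm_mul_ps(rx, sx);
        __m128 b0 = _mm_mul_ps(ry, sx);
        __m128 a1 = _mm_mul_ps(_mm_sub_ps(ZERO, ry), sy);
        __m128 b1 = _mm_mul_ps(rx, sy);
        // Translation goes straight into the last column.
        _mm_store_ps(out.a0 + i, a0);
        _mm_store_ps(out.a1 + i, a1);
        _mm_store_ps(out.a2 + i, px);
        _mm_store_ps(out.b0 + i, b0);
        _mm_store_ps(out.b1 + i, b1);
        _mm_store_ps(out.b2 + i, py);
    }
    out.count = in.count;
}
```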
Each drawn entity then references its transform by index: element i of the SOA_Transform arrays becomes element i of the SOA_Mat2x3 arrays, so the same index is valid before and after the conversion.
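A sketch of what that lookup might look like on the drawing side. `Mat2x3Arrays`, `Sprite` and `SpriteToWorld` are hypothetical names; `Mat2x3Arrays` just mirrors the six matrix arrays of SOA_Mat2x3 so the snippet compiles on its own. The entity stores nothing but the index, and the point is mapped as x' = a0·x + a1·y + a2, y' = b0·x + b1·y + b2.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t u32;

// Read-only view over the six matrix arrays (mirrors SOA_Mat2x3's fields).
struct Mat2x3Arrays {
    const float *a0, *a1, *a2;
    const float *b0, *b1, *b2;
};

// Hypothetical drawable: only an index into the transform/matrix arrays,
// plus a point in its local space.
struct Sprite {
    u32 transformIndex;
    float localX, localY;
};

// Map the sprite's local point to world space using row i of the matrices.
inline void SpriteToWorld(const Mat2x3Arrays& m, const Sprite& s, float& wx, float& wy) {
    const u32 i = s.transformIndex;
    wx = m.a0[i] * s.localX + m.a1[i] * s.localY + m.a2[i];
    wy = m.b0[i] * s.localX + m.b1[i] * s.localY + m.b2[i];
}
```

For instance, a transform with position (1, 0), scale 2 and a 90° rotation (rx = 0, ry = 1) converts to a0 = 0, a1 = -2, a2 = 1, b0 = 2, b1 = 0, b2 = 0, and maps the local point (1, 0) to (1, 2).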