Entity Transform Type

At the moment, in my game I have a type called Transform.

struct Transform
{
	Vector3    position;
	Quaternion orientation;
	Vector3    scale;
};

I use this to represent the transformation of an entity. I have been testing a single float for the scale instead of three, so that sizeof(Transform) = (3+4+1)*sizeof(float) = 32 bytes, but that loses non-uniform (per-axis) 3D scaling.

struct Transform
{
	Vector3    position;
	Quaternion orientation;
	f32        scale;
};
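
As a quick sanity check on the sizes quoted above, here is a minimal sketch. It assumes Vector3 and Quaternion are plain structs of 3 and 4 floats with no padding, and that a cache line is 64 bytes. The names Transform_Nonuniform, Transform_Uniform, CACHE_LINE_BYTES and whole_per_cache_line are made up here so both layouts can sit side by side; they are not from the engine.

```cpp
#include <cassert>
#include <cstddef>

typedef float f32;

// Assumed stand-ins for the engine's math types: tightly packed floats.
struct Vector3    { f32 x, y, z; };
struct Quaternion { f32 x, y, z, w; };

// Layout with per-axis scale: 3 + 4 + 3 = 10 floats.
struct Transform_Nonuniform {
    Vector3    position;
    Quaternion orientation;
    Vector3    scale;
};

// Layout with a single uniform scale: 3 + 4 + 1 = 8 floats.
struct Transform_Uniform {
    Vector3    position;
    Quaternion orientation;
    f32        scale;
};

static_assert(sizeof(Transform_Nonuniform) == 10 * sizeof(f32), "expected 10 packed floats");
static_assert(sizeof(Transform_Uniform)    ==  8 * sizeof(f32), "expected 8 packed floats");

// Assumption: 64-byte cache lines, which is typical but not universal.
constexpr std::size_t CACHE_LINE_BYTES = 64;

// Number of whole structs of the given size that fit in one cache line.
constexpr std::size_t whole_per_cache_line(std::size_t bytes) {
    return CACHE_LINE_BYTES / bytes;
}
```

With 4-byte floats this gives 40 and 32 bytes, so one whole non-uniform transform or exactly two uniform ones per 64-byte line.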


All the transforms are stored in a scene graph, e.g.:

struct Scene_Graph
{
	struct Instance_Data
	{
		u32   count;
		u32   capacity;
		void* buffer;

		Entity_Id* entity_id;
		Transform* local_transform;
		Transform* global_transform;
		Scene_Node_Id* parent;
		Scene_Node_Id* first_child;
		Scene_Node_Id* prev_sibling;
		Scene_Node_Id* next_sibling;
	};

	Allocator* allocator;
	Instance_Data data;
	Hash_Table<Scene_Node_Id> map;
};
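
For illustration, here is a rough sketch of how a node's global_transform could be derived from its parent's global transform and its own local_transform when the scale is a single float. The Vector3/Quaternion definitions and the helper names (cross, rotate, mul, combine) are assumptions made up for this example, not the engine's actual math library, and orientations are assumed to be unit quaternions.

```cpp
#include <cassert>
#include <cmath>

typedef float f32;

// Assumed stand-ins for the engine's math types.
struct Vector3    { f32 x, y, z; };
struct Quaternion { f32 x, y, z, w; };
struct Transform  { Vector3 position; Quaternion orientation; f32 scale; };

static Vector3 cross(Vector3 a, Vector3 b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Rotate v by the unit quaternion q: v' = v + w*t + cross(u, t), with t = 2*cross(u, v).
static Vector3 rotate(Quaternion q, Vector3 v) {
    const Vector3 u = { q.x, q.y, q.z };
    const Vector3 c = cross(u, v);
    const Vector3 t = { 2.0f*c.x, 2.0f*c.y, 2.0f*c.z };
    const Vector3 k = cross(u, t);
    return { v.x + q.w*t.x + k.x, v.y + q.w*t.y + k.y, v.z + q.w*t.z + k.z };
}

// Hamilton product a*b (applies b's rotation first, then a's).
static Quaternion mul(Quaternion a, Quaternion b) {
    return {
        a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
        a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
        a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w,
        a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z
    };
}

// global = parent_global * local, for uniform scale only.
static Transform combine(const Transform& parent, const Transform& local) {
    // Scale the child's offset by the parent's scale, then rotate it into the parent's frame.
    const Vector3 scaled = { local.position.x * parent.scale,
                             local.position.y * parent.scale,
                             local.position.z * parent.scale };
    const Vector3 offset = rotate(parent.orientation, scaled);

    Transform out;
    out.position    = { parent.position.x + offset.x,
                        parent.position.y + offset.y,
                        parent.position.z + offset.z };
    out.orientation = mul(parent.orientation, local.orientation);
    out.scale       = parent.scale * local.scale;
    return out;
}
```

This composition is only exact because the scale is uniform. With per-axis scale, a rotated child under a non-uniformly scaled parent can pick up shear, which a position/orientation/scale triple cannot represent.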


I was wondering: is it better to store just a Matrix instead of one of these Transforms, to store both, or to store just the Transform? What do other people use for their games, and what is the performance like for them? (N.B. Entities are just Ids in this game/engine and are managed elsewhere.)

An affine transform in 3D can be stored in a 4x3 matrix, but that costs 48 bytes (12 floats) rather than the 40 bytes (10 floats) your current layout requires. So as far as memory is concerned, no, you don't win anything by using a matrix.

My personal preference is to keep each part of the transform (position, rotation, scale) separate. This is easier to work with in the game code -- a translation is just 'position += delta', and so on. (Composing two quaternions is also cheaper than multiplying two 4x4 matrices.) It's easy enough to pack everything down at the end of the frame and convert to matrices en masse, which means you can write the conversion in SIMD, get the speedup, and only have to do it for the matrices you *actually need* (usually just anything that gets rendered).
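
To make the "convert to matrices at the end of the frame" step concrete, here is a scalar sketch of turning one separate position/orientation/scale transform into a 4x3 affine matrix. Mat3x4 and to_matrix are hypothetical names, the matrix is laid out as 3 rows of 4 floats for column vectors, and the orientation is assumed to be a unit quaternion. A real batch version would run over many transforms at once.

```cpp
#include <cassert>
#include <cmath>

typedef float f32;

// Assumed stand-ins for the engine's math types.
struct Vector3    { f32 x, y, z; };
struct Quaternion { f32 x, y, z, w; };
struct Transform  { Vector3 position; Quaternion orientation; Vector3 scale; };

// Hypothetical affine matrix: 3 rows x 4 columns = 12 floats = 48 bytes.
// Convention assumed here: p' = M * (p.x, p.y, p.z, 1).
struct Mat3x4 { f32 m[3][4]; };

// Builds M = Translate * Rotate * Scale.
static Mat3x4 to_matrix(const Transform& t) {
    const Quaternion q = t.orientation; // assumed normalized
    const f32 xx = q.x*q.x, yy = q.y*q.y, zz = q.z*q.z;
    const f32 xy = q.x*q.y, xz = q.x*q.z, yz = q.y*q.z;
    const f32 wx = q.w*q.x, wy = q.w*q.y, wz = q.w*q.z;

    Mat3x4 r;
    // Each rotation column is multiplied by the scale of the matching axis.
    r.m[0][0] = (1.0f - 2.0f*(yy + zz)) * t.scale.x;
    r.m[1][0] = (2.0f*(xy + wz))        * t.scale.x;
    r.m[2][0] = (2.0f*(xz - wy))        * t.scale.x;

    r.m[0][1] = (2.0f*(xy - wz))        * t.scale.y;
    r.m[1][1] = (1.0f - 2.0f*(xx + zz)) * t.scale.y;
    r.m[2][1] = (2.0f*(yz + wx))        * t.scale.y;

    r.m[0][2] = (2.0f*(xz + wy))        * t.scale.z;
    r.m[1][2] = (2.0f*(yz - wx))        * t.scale.z;
    r.m[2][2] = (1.0f - 2.0f*(xx + yy)) * t.scale.z;

    // The last column is the translation.
    r.m[0][3] = t.position.x;
    r.m[1][3] = t.position.y;
    r.m[2][3] = t.position.z;
    return r;
}
```

Game code keeps mutating the three parts directly, and only transforms that are actually drawn need to go through to_matrix.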

The slight win you get from having a smaller transform struct (at 32 bytes, exactly two fit in a 64-byte cache line, rather than one and a bit at 40 bytes) isn't really worth it unless you're working with a lot of them at once -- and in that case you don't want them to be in AoS (array-of-structures) form anyway.

Incidentally, a bit of trivia: if you do decide you only want a single uniform scale, you can fold it into the quaternion and save 4 bytes -- when a vector is rotated as q*v*conj(q), a non-normalized quaternion also scales it, by the quaternion's squared length. (Don't actually do this unless you want to confuse the hell out of anyone else working with your code.)
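
Purely to illustrate the trivia, here is a small sketch of the scaled-quaternion trick. It assumes rotation is done with the sandwich product q*v*conj(q) (not q*v*inverse(q), which would cancel the scale). The names sandwich and with_uniform_scale are invented for this example.

```cpp
#include <cassert>
#include <cmath>

typedef float f32;

// Assumed stand-ins for the engine's math types.
struct Vector3    { f32 x, y, z; };
struct Quaternion { f32 x, y, z, w; };

// Computes q * (v, 0) * conj(q) without requiring q to be normalized.
// With u = (q.x, q.y, q.z):
//   v' = (w*w - dot(u,u)) * v + 2*dot(u,v) * u + 2*w * cross(u,v)
// For a unit q this is a pure rotation; otherwise the result is also
// scaled by the squared length of q.
static Vector3 sandwich(Quaternion q, Vector3 v) {
    const f32 uu = q.x*q.x + q.y*q.y + q.z*q.z;
    const f32 uv = q.x*v.x + q.y*v.y + q.z*v.z;
    const f32 a  = q.w*q.w - uu;
    const f32 b  = 2.0f * uv;
    const f32 c  = 2.0f * q.w;
    const Vector3 k = { q.y*v.z - q.z*v.y,   // cross(u, v)
                        q.z*v.x - q.x*v.z,
                        q.x*v.y - q.y*v.x };
    return { a*v.x + b*q.x + c*k.x,
             a*v.y + b*q.y + c*k.y,
             a*v.z + b*q.z + c*k.z };
}

// Folds a positive uniform scale into a unit quaternion by multiplying it
// by sqrt(scale), so that sandwich() scales vectors by exactly `scale`.
static Quaternion with_uniform_scale(Quaternion unit_q, f32 scale) {
    assert(scale > 0.0f);
    const f32 k = std::sqrt(scale);
    return { unit_q.x*k, unit_q.y*k, unit_q.z*k, unit_q.w*k };
}
```

For example, a 90 degree rotation about z combined with a scale of 3 maps (1, 0, 0) to roughly (0, 3, 0). Composing two such quaternions multiplies their scales as well, which is part of why this is easy to get confused by.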

EDIT:

An example of what I mean, from my current engine (in 2D, not 3D):
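#include <assert.h>
#include <xmmintrin.h> // SSE intrinsics: __m128, _mm_load_ps, _mm_store_ps, ...

// u32 is the engine's 32-bit unsigned integer typedef.
// Every array below must be 16-byte aligned and padded to a multiple of
// 4 floats, because the conversion handles four transforms per iteration.

struct SOA_Transform {
    u32 capacity;
    u32 count;
    float * px; // position
    float * py;
    float * sx; // scale
    float * sy;
    float * rx; // rotation as a 2D unit vector, i.e. (cos, sin) of the angle
    float * ry;
};

// One 2x3 affine matrix per index, with rows [a0 a1 a2] and [b0 b1 b2].
struct SOA_Mat2x3 {
    u32 capacity;
    u32 count;
    float * a0;
    float * a1;
    float * a2;
    float * b0;
    float * b1;
    float * b2;
};

// out is taken by pointer so that the caller sees the updated count.
static void
Mat2x3FromTransform(SOA_Transform in, SOA_Mat2x3 * out) {
    // The loop rounds count up to a multiple of 4, so the output needs room for that.
    assert(((in.count + 3) & ~3u) <= out->capacity);
    __m128 ZERO = _mm_set1_ps(0.0f);
    for (u32 i = 0; i < in.count; i += 4) {
        __m128 px = _mm_load_ps(in.px + i);
        __m128 py = _mm_load_ps(in.py + i);
        __m128 sx = _mm_load_ps(in.sx + i);
        __m128 sy = _mm_load_ps(in.sy + i);
        __m128 rx = _mm_load_ps(in.rx + i);
        __m128 ry = _mm_load_ps(in.ry + i);

        // Rotation * scale: [rx -ry; ry rx] * diag(sx, sy).
        __m128 a0 = _mm_mul_ps(rx, sx);
        __m128 b0 = _mm_mul_ps(ry, sx);
        __m128 a1 = _mm_mul_ps(_mm_sub_ps(ZERO, ry), sy);
        __m128 b1 = _mm_mul_ps(rx, sy);

        // The translation goes straight into the last column.
        _mm_store_ps(out->a0 + i, a0);
        _mm_store_ps(out->a1 + i, a1);
        _mm_store_ps(out->a2 + i, px);
        _mm_store_ps(out->b0 + i, b0);
        _mm_store_ps(out->b1 + i, b1);
        _mm_store_ps(out->b2 + i, py);
    }
    out->count = in.count;
}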

Each drawn entity then references its transform by index, which is the same before and after the conversion.
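
As a sketch of what "references its transform by index" might look like on the consuming side, here is a scalar helper that applies the matrix at a given index to a point. It assumes the rows of the matrix are [a0 a1 a2] and [b0 b1 b2], which is how the conversion above fills them. Vec2 and TransformPoint are hypothetical names, and u32 is typedef'd here only so the snippet stands alone.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t u32; // stand-in for the engine's own typedef

struct SOA_Mat2x3 {
    u32 capacity;
    u32 count;
    float * a0;
    float * a1;
    float * a2;
    float * b0;
    float * b1;
    float * b2;
};

struct Vec2 { float x, y; }; // hypothetical 2D point type

// Applies the 2x3 affine matrix stored at `index` to the point p:
//   x' = a0*x + a1*y + a2
//   y' = b0*x + b1*y + b2
static Vec2 TransformPoint(const SOA_Mat2x3& m, u32 index, Vec2 p) {
    assert(index < m.count);
    return { m.a0[index]*p.x + m.a1[index]*p.y + m.a2[index],
             m.b0[index]*p.x + m.b1[index]*p.y + m.b2[index] };
}
```

Because the index is the same in SOA_Transform and SOA_Mat2x3, a drawn entity only has to remember one number.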

Edited by Bryan Taylor on
Thanks for the reply.

I think I will go for the smaller Transform struct to get two in a cache line and go from there. I don't think anything will really need a non-uniform scale, but if it does, I will probably hardcode it into the mesh or create a specialized bit for it (e.g. a custom transform/matrix).

At the moment I am not SIMDing anything, but it wouldn't be hard to change that if I needed to, as the layout is pretty flexible already.