14 posts
Edited by HawYeah on Reason: Initial post
Hey guys, I've been working on my tower defense game for a bit, and as I have been adding features to my game, the struct size for my monstars has been increasing a bunch recently (currently at 400 bytes). Now this size is still small, I can have over 1000 of these and be under a megabyte of memory usage but I do a lot of querying in my game and I noticed that its starting to take up more time than I'd like. Since my game is on mobile, I know this is a problem I'm eventually going to have to tackle and for querying, I only need to store position information for my entities, as well as some kind of id that lets me reference the entity that this position is coming from. So watching data oriented design vids, I should be trying to store the positions of my monstars in some kind of SOA format outside the monstar that can be quickly run through with SIMD. I came up with some ways of achieving this:

1) Every frame I can run through all the monstars in my world, assemble their positions into a SOA array, and use this array for all queries during that frame. Then after the logic is done for this frame, write all the positions back into the monstar structs. This has the benefit that I don't need to do much book keeping since the array is created once every frame brand new. It does use up more memory tho (since technically, the same data gets stored twice), and I run through the cache misses of building and writing back this array twice per frame.

2) I can have a linked list where every element of the linked list can be 64 positions in a SOA format. This list would store the positions (the positions would not be stored in the monstars anymore), and although its not completely linear, the amount of cache misses on every jump in the linked list is small because we pack 64 positions into every element of the linked list. It has the benefit that adding new monstars becomes easy, if we run out of space, we just add an element to our linked list. It does suck for deleting monstars since once monstars die, they can die in a sparse way so we can end up with lots of elements in the linked list being arrays of 64 positions, but only 12 of those positions correspond to alive monstars, the rest are unused. So unless all 64 monstars that belong to a element of this list die, the element cannot be deleted. Of course, we can make the size smaller, like 16 but the smaller we make it, the less linear our array becomes so the simd optimizations become less effective. We also have to keep track of which parts of our 64 position array correspond to unused memory so that we can place data there if a new monstar gets created.

3) I can have a hashtable to store the position data in a SOA way. So if my position is 2d (x,y), then the id that my hash function spits our for a monstar would be the index in a x array for the x position data and a index in the y array for the y array position data. So using the hash table, I still get SOA benefits but the main problem instead comes from the fixed size nature of hash tables. Hashtables can't be resized so I'd have to make the hash table larger than the max number of monstars I expect to have in a level. This uses up extra memory but the worst part is that if the max amount of monstars in a level is 1000 but currently there are 60 monstars, then I lose all benefits of SOA and simd since there are so many unused positons stored sparsly in the hash table, that I'm just wasting most of the cache with unused memory.

The properties that 1) has are really useful since we have very little constant book keeping that we need to do to store our positions linearly, but I feel like 2), 3) or some other algorithm should be able to get more of the benefits of 1) with a little less book keeping. I know the kind of data I'm working with doesn't need some kind of crazy efficient system to solve my above problem, like 1) will probably provide enough of a speed up in my situation that it doesn't matter. I can also just make a grid data structure to not be looping through every monstar in my game for every query, but I'm interested to know how games go about solving this problem. I feel like there is a better way to do this, and even if its over engineering for my circumstance, I wanna know how other people solve this kind of problem. Thanks in advance.
498 posts
I think most games just don't have a lot of independent entities with complex logic. At most it will be particles that number more than a few thousand but those are very simple to process.

Also consider reusing the graves of old monstars, no need to let that memory go to waste.
14 posts
Ahh okay, I probs should have been more clear but I am reusing in the above examples. I think i kind of figured out how to fix a bunch of the problems with the stuff i proposed above by having a layer of indirection. So instead of storing a pointer to the monstars position, I'd have a handle that is a index in a array, and this array stores the pointers. This way I can more easily shuffle around how I store the positions of monstars in memory without worrying about something pointing to that piece of memory.
Mikael Johansson
99 posts / 1 project

maxNumberOfEntitys = SOME_VALUE

struct entity
{
//Stuff ALL entitys use. Like pos.
};

struct special_entity1
{
//Special stuff
entity* e;
};

entity e[maxNumberOfEntitys];

special_entity1 special;
special.e = &e[some_free_e];

Building on this should save memory, get rid of cache misses, and still be able to easily parse through all entitys when needed.
101 posts
Edited by pragmatic_hero on
That's good!

Indices over pointers though:
1. Takes less space, can be 16bits (65536 entities covers most cases), 32bits, etc.
2. Stuff can be serialized/deserialized/written to disk as-is.
3. Entity array and entities can move in memory.

3rd is important when doing code-hotswapping with ability to edit structs (add,remove fields = change struct size) at runtime.

I find that indices/handles are almost always better than raw pointers.
That includes function pointers (hot swapping!)

Mikael Johansson
99 posts / 1 project
Agreed. Only issue I have with indices is that they produce alot more code. When you make intereesting stuff structs tend to get fairly nested, which makes stuff like:

myObject.data[index].data[index].data..... and so on. Pointers helps against things like that. Can ofcource be solved with making a pointer in the functions starts, or with some #defines.

Just something to concider, but other then that, indices is often better yes.
14 posts
Telash

maxNumberOfEntitys = SOME_VALUE

struct entity
{
//Stuff ALL entitys use. Like pos.
};

struct special_entity1
{
//Special stuff
entity* e;
};

entity e[maxNumberOfEntitys];

special_entity1 special;
special.e = &e[some_free_e];

Building on this should save memory, get rid of cache misses, and still be able to easily parse through all entitys when needed.

If I understand this correctly, you have (in my case) the positions stored in special entity and whenever i want to do a update, i loop over these special entities, update the positions, and then write it out to the entities (in your case, by a ptr access). Your method looks like an optimization over 1) where instead of gathering position data into an array, updating, then writing the data back to the entities, you just maintain the array but you still write back by using the entity* pointers. It looks pretty interesting, I hadn't considered it before. So I have one confusion with this system:

If a bunch of monstars start to die, you have to keep track of every index in your array thats currently unused so that when a monstar gets created, it fills up that spot. This is what I think is the hardest thing for me to solve. Its keeping track of which parts of the array have vacant spots that need to be filled by new entities.

pragmatic_hero
That's good!

Indices over pointers though:
1. Takes less space, can be 16bits (65536 entities covers most cases), 32bits, etc.
2. Stuff can be serialized/deserialized/written to disk as-is.
3. Entity array and entities can move in memory.

3rd is important when doing code-hotswapping with ability to edit structs (add,remove fields = change struct size) at runtime.

I find that indices/handles are almost always better than raw pointers.
That includes function pointers (hot swapping!)

Yeah I guess I hadn't really considered it before because the extra level of indirection always left really slow to me. I don't use a ECS in my game, i just have a struct for a tower, monstar, projectile, etc. as needed and functions like move entity just take in the floats that are needed to move a entity. If for example a monstar dies, it sticks around for an extra frame but with a flag that says its marked to remove (meaning its gone the next frame). So anything that points to it just checks for that flag, and if its there, it handles that case as needed. This way I dont have dangling pointers or stuff like that. But yeah, you brought up some good points for what indexes provide, so ill see how things go.

Telash
Agreed. Only issue I have with indices is that they produce alot more code. When you make intereesting stuff structs tend to get fairly nested, which makes stuff like:

myObject.data[index].data[index].data..... and so on. Pointers helps against things like that. Can ofcource be solved with making a pointer in the functions starts, or with some #defines.

Just something to concider, but other then that, indices is often better yes.

Yeah, code like this sucks a lot D:. Having lots of nested structs can also cause this and its a pain to write out, especially since I try to name my variables with clear names so sometimes i get really long chains or xxx.xxx.xxx[some_id].xx = ...;
Mikael Johansson
99 posts / 1 project
HawYeah
If I understand this correctly, you have (in my case) the positions stored in special entity and whenever i want to do a update, i loop over these special entities, update the positions, and then write it out to the entities

When you update positions, I guess you update ALL livinig and moveable entitys positions. For that, lets make a moveable entity.

struct entity
{
//Stuff ALL entitys use. Like pos.
};

struct moveable
{
entity* e;
//Things like speed, acceleration, turning speed, breaks
};

struct special_entity_that_can_move_and_other_stuff
{
//Special stuff
moveable* move;
};

entity e[maxNumberOfEntitys];

moveable* moveAbleUnits = malloc/realloc/calloc(AS_MANY_AS_NEEDED);

special_entity_that_can_move_and_other_stuff special;
special.move = &moveAvleUnits[some_free_e];

Now my unit has: special->move and special->move->e

With indices you just change pointers to ints. Like this:

struct entity
{
//Stuff ALL entitys use. Like pos.
};

struct moveable
{
int myEntity;
//Things like speed, acceleration, turning speed, breaks
};

struct special_entity_that_can_move_and_other_stuff
{
//Special stuff
int myMoveable;
};

entity e[maxNumberOfEntitys];

moveable* moveAbleUnits = malloc/realloc/calloc(AS_MANY_AS_NEEDED);

special_entity_that_can_move_and_other_stuff special;
special.myMoveable = some_free_positon_in_moveable_units;
moveable[special.myMoveable].entity = some_free_entity;

Personally I often find this system to get a bit more complex, i like my pointers :D But if you want ALOT of entitys and memory is an issue, then maybe you should concider this.

About dead entitys. Just have the system check for a specific value, you dont even need a dead-flag. Like, if positionX is -1 (impossible position) then the unit is dead. Then ignore it in for example the moveAllStuff function. When its time to make a new unit, then loop and check for a unit with -1 position, and use that.

Structure the data in ways that you use it. You probobly have a function to move ALL moveable entitys, then send in "moveable" to that, while the render function probobly need "e" instead.

Im sorry if I write confusing, this is kinda complex stuff :)

Mārtiņš Možeiko
2237 posts / 1 project
Please use [ code ] bbtags for code fragments to keep text readable and have good formatting.
101 posts
Edited by pragmatic_hero on
HawYeah

Yeah I guess I hadn't really considered it before because the extra level of indirection always left really slow to me.

 1 2 3 4 5 Dude * dude = dudes[dude_idx]; // A) gun_types[dude->gun_idx].damage // B) dude->gun_type->damage 

A) Is barely extra level of indirection
[base + index * size + offset] vs [address + offset]

The the address of gun_types has to be loaded, but the overhead here seems so very negligible.
And is probably evened out by having slightly smaller entity sizes due to using 16/32 bit indices vs 64bit.

If anyone here is clued in with instruction timings on latest archs, can tell use exactly what this overhead is - if there really is any of note.

Agreed. Only issue I have with indices is that they produce alot more code.
There's certainly some *ugliness* and little bit of extra typing (not "alot more code" surely), but for the downsides of having raw pointers in game state structs - I feel its totally worth it.
498 posts
actually a larger problem is that the value of dude->gun_idx has to be loaded before damage starting to be loaded. That is 2 loads worth of latency compared to B which only has a single load before you can get the value.
Mārtiņš Možeiko
2237 posts / 1 project
Edited by Mārtiņš Možeiko on
pragmatic_hero
A) Is barely extra level of indirection
[base + index * size + offset] vs [address + offset]

Yes, there is an indirection. First you load gun_idx from memory by using dude_idx index, then you use gun_idx as index to another memory location.

in B case you just load directly from location using dude_idx as index.

[base + index * size + offset]
Also this works fine only when size of element is 1, 2, 4 or 8 bytes large. Otherwise compiler will need to do explicit multiply or shift instruction.
101 posts
Edited by pragmatic_hero on
There's was a small typo, should have been
 1 dude->gun_type->damage //instead of dude->gun_type.damage 

So in case A) it would be, load 1 on gun_types base addr, load 2 on dude->gun_idx, load 3 on *(gun_types + gun_idx).damage
And in case B) load 1 on dude->gun_type, load 2 on dude->gun_type->damage

Namely the extra load in case of A) is on reading gun_types *variable*. Which i'd imagine would almost always be in cache, or red outside of the loop once.

Should really get disassembly out of both cases out of curiosity (lets say struct is 20 bytes), since I feel that his is hairsplitting and very unlikely to ever cause a bottleneck or significant overhead.
498 posts
gun_types[dude->gun_idx].damage

and

dude->gun_type->damage

are basically the same with regards to dependent loads and the only difference is that the first needs another load which can be done in parallel with the first load.
101 posts
Edited by pragmatic_hero on
I played around with it a bit, and my asm is a bit rusty, but the difference is roughly:
A)
 1 2 3 4 // rax = dude addr, rdi = gun_types mov edx, DWORD PTR [rax] // load1, edx = gun_idx lea rdx, [rdx+rdx*4] // address calc, rdx = rdx * 5 add esi, DWORD PTR [rdi+8+rdx*4] // load2, adding damage to esi, [gun_types + fieldoffset + rdx * 4] 

B)
 1 2 mov rdx, QWORD PTR [rax] //load1, gun_type add esi, DWORD PTR [rdx+8] //load2, adding damage to esi 

In case of A) at some point in the function "gun_types" variable has to be loaded into rdi or some other register. So there is *potentially* an extra read. *potentially*
Which is why I said this is *barely* an indirection.

But this is hard to benchmark in any reasonable sense since this is gameplay code, not some super-tight data-heavy loops which can be singled out.

But overall +1 instruction (lea) and more complicated addressing mode = more instruction bytes.
Plus potentially gun_types array address has to be loaded from a variable. If it's an array in data segment, then doesn't even need a memory load.

Optimizing instructions in gameplay code like this is quite insane truly. Ratholing ^ 2.