I played around with it a bit, and my asm is a bit rusty, but the difference is roughly:
// rax = dude addr, rdi = gun_types
mov edx, DWORD PTR [rax] // load1, edx = gun_idx
lea rdx, [rdx+rdx*4] // address calc, rdx = rdx * 5
add esi, DWORD PTR [rdi+8+rdx*4] // load2, adding damage to esi, [gun_types + fieldoffset + rdx * 4]
mov rdx, QWORD PTR [rax] //load1, gun_type
add esi, DWORD PTR [rdx+8] //load2, adding damage to esi
In case of A) at some point in the function "gun_types" variable has to be loaded into rdi or some other register. So there is *potentially* an extra read. *potentially*
Which is why I said this is *barely* an indirection.
But this is hard to benchmark in any reasonable sense since this is gameplay code, not some super-tight data-heavy loops which can be singled out.
But overall +1 instruction (lea) and more complicated addressing mode = more instruction bytes.
Plus potentially gun_types array address has to be loaded from a variable. If it's an array in data segment, then doesn't even need a memory load.
Optimizing instructions in gameplay code like this is quite insane truly. Ratholing ^ 2.
Thats probably fair to say that its not important enough to optimize but I figured that whatever optimization you apply here can be generalized to other systems (physics, rendering, etc). On the other side, if the optimizations that can be applied are not too complex, we may get large performance increases in gameplay code which will let us have crazier mechanics that take advantage of that (for example, having way more entities or entities that have much more complex ai). And lastly, figuring out how to store the positions in a data oriented way isn't really optimizing gameplay code, its a lot more general. Position is important for physics and rendering as well so in many ways, this is optimizing all of those sub systems. Again, this might not actually matter but if it did, I'm interested to explore and figure out how to architect a fix cause I think the fix would generalize to other parts of programming.
My concern is not about the performance, it never was.
The good thing with pointers is that they are very useful when the structs gets really nested.
Like you have a tank "struct", that contains a "vehicle" struct, that contains a "movable" struct, that contains a "entity" struct.
That could become very complex code to get to that specific tanks entity. So you give the tank a pointer to its entity.
Workflow is (for me) extremely important.
I try to approach it differently. I think that we should figure out what the code that gives us really good performance looks like, and then figure out how we can write that code as easily as possible. To some degree, this is up to the language to provide but we can also just structure the API we expose to also be cleanly organized.
As for your code on the first page, you solve the problem of overwriting dead monstars positions by looping and checking for impossible position values. I guess thats a fair way to do it and creating monstars is definitly not as common so it shouldn't be a performance issue, but I think I have a better system.
//NOTE: Handles are indexes into the pos id table
// NOTE: PosId's are indexes into our soa table where we store the x, y
// NOTE: The soa data
// Rest of the entity data
inline u32 CreatePosHash(soa_pos_data* PosData, f32 NewX, f32 NewY)
u32 PosHash = GenHashValue(PosData);
u32 PosId = PosData->CurrNumPos++;
PosData->PosIdArray[PosHash] = PosId;
PosData->PosX[PosId] = NewX;
PosData->PosY[PosId] = NewY;
inline void DeletePosHash(soa_pos_data* PosData, u32 DelPosHash)
// NOTE: We don't want to leave any holes in our x,y arrays so we swap the
// deleted x,y with the last x,y and update the pos id table
u32 LastPosId = PosData->CurrNumPos--;
u32 LastPosHash = FindPosHashVal(PosData, LastPosId);
u32 DelPosId = PosData->PosIdArray[DelPosHash];
PosData->PosIdArray[LastPosHash] = DelPosId;
PosData->PosX[DelPosId] = PosData->PosX[LastPosId];
PosData->PosY[DelPosId] = PosData->PosY[LastPosId];
inline void UpdateSingleEntity(soa_pos_data* PosData)
// NOTE: Get Pos
u32 PosId = PosData->PosIdArray[Entity.PosHash];
f32 EntityX = PosData->PosX[PosId];
f32 EntityY = PosData->PosY[PosId];
// ... do stuff with the pos
// NOTE: If we modified the pos, write back
PosData->PosX[PosId] = NewEntityX;
PosData->PosY[PosId] = NewEntityY;
So I think I implemented this right, I'm gunna try it in my game but this is the pseudo code. So I have a array of indexes PosIdArray that is my layer of indirection. Every entity stores a index into this hashtable called PosHash instead of a position. When I want to retrieve a position, I use the PosHash as a array index into my PosIdArray and then get out a u32. This u32 is the index in my actual posx, posy array's and I can use this index to retrieve/modify them.
Now when I create a new position, I want to append the position to my posx, posy array's assuming that those array's are filled already (they have no holes in them). So I get a PosHash by calling GenHashValue and I make the stored value at PosHash be the index to one past the current last position in the posx, posy arrays.
When I delete a PosHash, I want to make sure I don't leave a empty position in the middle of my posx, posy arrays. So to prevent me from leaving the DelPosId index empty in my array's, I overwrite it with the x,y values stored at the end of my posx, posy arrays. I then find the PosHash that corresponds to the last x,y values and I make it now point to DelPosId (I make it point to where I moved the x,y values to). This way, my array of positions is always filled.
For my hashtable functions, I use create hash, delete hash, and find hash. Hashtables are somewhere between O(1) and O(logn) depending on the hash function but generally, these operations shouldn't be too taxing. The thing that bothers me is that, although my posx, posy array can be made to be arbitrarily long by converting it to a linked list with elements being an array of 64 positions, the hashtable needs to know upfront how many elements there can be in my game. I guess its really hard to have fast access times without putting a limit on how many positions we can store in our game. I'm not super knowledgable on data structures so is there a different way to do this? It seems like this method has a lot of benefits except for putting a upfront limit on how many monstars we can have.