Where to allocate memory from ?

rivten

#9716

December 9, 2016

Hello everyone !

I am going through Handmade Hero a little bit late (I am around ep. 130 right now) and have learned so much from it that there are even a few things I have a hard time digesting.

One of the first topics that Handmade Hero tackled is memory, and rightly so, since it is one of the most important topics in programming. It is obviously a huge area, which is why I have such a hard time grasping all of it.

From what I understand, there are two ways to get memory:

The first one is the static one: declaring it on the stack. For example, if I want a buffer, I simply declare it:

u32 Buffer[600];

The second one is the dynamic one. It goes on the heap, and is done with malloc, for example.

Casey does this a little bit differently though, he uses the concept of an Arena to pick memory from, and he has basically two : the Transient memory Arena and the main game Arena.

Now, assuming that this is really what is going on, I have two questions about that :

am I right when I say that the Transient Arena is used only for computations that do not need to be stored between frames ? If so, what is the point of storing them in an arena instead of putting them directly on the stack so that they are discarded at the end of the scope ?

with Casey's memory framework in mind, I don't really know why but I tend to allocate more and more on the stack and less and less on the heap (using the Arena system or another one even). This can make me hit stack overflow... Is there a particular rule of thumb when allocating memory that says : I should allocate this on the stack or on the heap ? Basically is it only if I care about memory being kept out of the scope or not ?

Thanks a lot and happy programming everyone.

thebeast33

#9722

December 9, 2016

am I right when I say that the Tr...carded at the end of the scope ?

I believe that is the way Casey uses it. But in general, it's memory that you don't expect to be around for a long period of time. In my engine, I double buffer the transient memory in case I have some data I want to persist to the next frame. The reason this is done from heap memory instead of on the stack is that the stack has a limited amount of space, and the transient memory might be too big or, if it does all fit on the stack, leave very little room for other stack variables or function calls.
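To make the transient arena idea concrete, here is a minimal sketch of a bump allocator that is reset once per frame. It is not Casey's actual code: the names `Arena`, `arena_init`, `arena_push`, `arena_reset` and `transient_block` are hypothetical, the 8-byte alignment and the 64 KiB size are assumptions, and the backing block is a file-scope array here only to keep the example self-contained (a real engine would more likely get it from the OS once at startup).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal arena: a bump allocator over one pre-allocated block. */
typedef struct {
    uint8_t *base;   /* start of the backing block */
    size_t   size;   /* total bytes available */
    size_t   used;   /* bytes handed out so far */
} Arena;

void arena_init(Arena *arena, void *memory, size_t size)
{
    arena->base = (uint8_t *)memory;
    arena->size = size;
    arena->used = 0;
}

/* Returns NULL when the arena is full. There is no per-allocation free. */
void *arena_push(Arena *arena, size_t bytes)
{
    /* Round the request up to 8 bytes so every returned pointer stays aligned. */
    size_t aligned = (bytes + 7u) & ~(size_t)7u;
    if (aligned < bytes || aligned > arena->size - arena->used) {
        return NULL;
    }
    void *result = arena->base + arena->used;
    arena->used += aligned;
    return result;
}

/* Releasing everything is a single store; call this at the end of each frame. */
void arena_reset(Arena *arena)
{
    arena->used = 0;
}

/* Backing store for the transient arena (assumed size: 64 KiB, 8-byte aligned). */
uint64_t transient_block[8 * 1024];
```

Compared with the stack, the block can be as large as you like, and allocations made in one function stay valid after it returns, up to the next `arena_reset`.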

with Casey's memory framework in ...g kept out of the scope or not ?

I'm not sure about general rules of thumb, but I usually only allocate on the stack for locally scoped variables and fixed-size arrays. If an array is variable size, I usually heap allocate it. The only exception I might make is when I know the array will always be small but I am cautious about a heap allocation: in that case you can use alloca to dynamically allocate memory from the stack for your array (this comes with the normal issues of stack-based allocations).

I have also seen code that allocated a variable on the stack, called some function to process it, and then cached the results. It looks like this:
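A related pattern, sketched here as an alternative to alloca (which is not part of standard C), is to keep a fixed-size buffer on the stack for the common small case and fall back to the heap only when the input is larger. The helper name `sum_with_scratch` and the `SMALL_COUNT` threshold are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_COUNT 64   /* assumed threshold; tune it for your own stack budget */

/* Sums `count` values after copying them into scratch space.
   Returns 0 if the heap fallback fails. Hypothetical helper for illustration. */
uint64_t sum_with_scratch(const uint32_t *values, size_t count)
{
    uint32_t  small[SMALL_COUNT];   /* fixed-size stack buffer for the common case */
    uint32_t *scratch = small;

    if (count > SMALL_COUNT) {
        /* Large case: the stack buffer is too small, so use the heap instead. */
        scratch = (uint32_t *)malloc(count * sizeof *scratch);
        if (!scratch) {
            return 0;
        }
    }

    memcpy(scratch, values, count * sizeof *scratch);

    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += scratch[i];
    }

    if (scratch != small) {
        free(scratch);   /* only the heap path needs an explicit release */
    }
    return total;
}
```

The stack usage is bounded by `SMALL_COUNT` no matter what the caller passes in, which is the property alloca does not give you.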

void func()
{
    Var someStackVar;
    performCalculations(&someStackVar);
    cacheResults(&someStackVar);
}

The few places I have seen this done were cases where 'Var' was relatively large (so you would question whether to stack allocate it), but speed was important and heap allocations were relatively expensive.

Mārtiņš Možeiko

#9726

December 9, 2016

rivten
From what I understand, there are two ways to get memory:

There is a third way. You can put "u32 Buffer[600];" in global scope - this will allocate memory in process space at load time of exe/dll. As a bonus it will be automatically 0 initialized.
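A small sketch of the global-scope case: objects with static storage duration are zero-initialized before the program starts, so no explicit clearing is needed. The `u32` typedef and the `count_nonzero` helper are assumptions added for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* File-scope array: static storage duration, zero-initialized before main runs. */
u32 Buffer[600];

/* Counts the non-zero entries in an array (hypothetical helper for this demo). */
size_t count_nonzero(const u32 *values, size_t count)
{
    size_t nonzero = 0;
    for (size_t i = 0; i < count; ++i) {
        if (values[i] != 0) {
            ++nonzero;
        }
    }
    return nonzero;
}
```

Calling `count_nonzero(Buffer, 600)` before anything writes to `Buffer` returns 0. The same declaration inside a function, without `static`, would leave the contents indeterminate.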

Edited by Mārtiņš Možeiko on December 9, 2016, 8:40pm

thebeast33

#9728

December 9, 2016

There is a downside to allocating memory in the global scope. Because it's allocated into the exe/dll, it is more likely to cause a cache miss when accessed, because it is separated from the stack and heap memory.

Mārtiņš Možeiko

#9729

December 9, 2016

Umm.. that entirely depends on your code. If all the memory I "allocate" lives in global data, and I don't use heap allocations at all, there will be no more cache misses than if I used only heap memory.

ratchetfreak

#9731

December 9, 2016

thebeast33
There is a downside to allocating memory in the global scope. Because it's allocated into the exe/dll, it is more likely to cause a cache miss when accessed, because it is separated from the stack and heap memory.

Cache performance depends on how you access that area of memory; it has nothing to do with how it's allocated.

thebeast33

#9736

December 10, 2016

mmozeiko
Umm.. that entirely depends on your code. If all the memory I "allocate" lives in global data, and I don't use heap allocations at all, there will be no more cache misses than if I used only heap memory.

I agree, but you usually don't have lots of global data as this will increase the size of your executable.

ratchetfreak
Cache performance depends on how you access that area of memory; it has nothing to do with how it's allocated.

I agree with this as well. But what I was suggesting is that global data is more likely to cause cache misses because you will probably be accessing it alongside stack and/or heap allocated memory in the same code. If you have a function that only accesses global data, then the number of cache misses is completely determined by how much global data you have and how you access it.

Mārtiņš Možeiko

#9737

December 10, 2016

thebeast33
I agree, but you usually don't have lots of global data as this will increase the size of your executable.

Uninitialized data doesn't take any space in the executable. The PE file format stores only one number, the size required for uninitialized global data, and Windows uses this number to allocate memory (VirtualAlloc) when the binary is loaded.

Edited by Mārtiņš Možeiko on December 10, 2016, 5:33am

@Mattias_G

#9740

December 10, 2016

Keep in mind that a cache line is typically something like 64 bytes. If you touch one byte of a cache line, the whole line will be pulled in. So the important thing when optimising for the cache, is to make sure that all the data that is pulled into the cache is used, to minimize waste. So I'd say it has more to do with data layout and access pattern, and nothing to do with how it is allocated. And the stack is nothing special - sure, the current 64 bytes of the stack are likely to be in cache most of the time, but beyond that, it again comes down to access pattern of the code.
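As a rough illustration of why layout matters more than allocation source, the sketch below counts how many cache lines an array covers. The 64-byte line size, the `Entity` layout and the `lines_spanned` helper are assumptions for the example, and the count assumes the array starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumed; check your target CPU */

/* Hypothetical entity: a loop that only reads `health` still pulls in
   the whole 64-byte line that each entity occupies. */
typedef struct {
    float position[3];
    float velocity[3];
    float health;
    float other[9];
} Entity;   /* 16 floats, 64 bytes when float is 4 bytes */

/* Cache lines covered by `count` items of `stride` bytes each,
   assuming the array starts on a cache line boundary. */
size_t lines_spanned(size_t count, size_t stride)
{
    return (count * stride + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Reading `health` from 1000 `Entity` values spans `lines_spanned(1000, sizeof(Entity))` lines, one per entity, while the same 1000 health values stored in their own `float` array span `lines_spanned(1000, sizeof(float))` lines. Whether either array lives in static, stack or heap memory does not change these numbers.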

thebeast33

#9744

December 10, 2016

mmozeiko
thebeast33
I agree, but you usually don't have lots of global data as this will increase the size of your executable.

Uninitialized data doesn't take any space in the executable. The PE file format stores only one number, the size required for uninitialized global data, and Windows uses this number to allocate memory (VirtualAlloc) when the binary is loaded.

Ah okay. I don't deal with globally scoped data a lot, let alone ones that are uninitialized. Makes sense.

Edited by thebeast33 on December 10, 2016, 7:44pm

Bryan Taylor

#9780

December 12, 2016

There are three sources of memory:
- Static
- Stack
- Heap

Static memory is allocated for you by the program loader -- all of your global variables and static data live here. Stack memory is allocated when a function is called, and then freed when that function returns. Finally, heap memory is allocated by asking the OS to give you access to additional memory.

All of these are *just memory* -- the performance of accessing memory is dictated by whether it's in the cache, not by where the memory was allocated.

Now, as to your questions:

1. Stack allocations are only valid within that function and the functions it calls. The next function call can (and will) overwrite anything left on the stack from a function that has returned. When you return a value from a function, that value gets copied onto the stack of the calling function. If it's a large value, that's a large copy. If I pass that directly as a parameter to another function, there's another copy.

EX:

// Copy result into foo
BigStruct foo = GetFoo();
// Copy into parameter list for ProcessFoo
ProcessFoo(foo);

The compiler can optimize this away if it can reason about it, but it can be surprisingly easy to confuse a compiler.

If instead we allocate the struct on the heap, all that needs to be copied around is a pointer:

BigStruct *foo = GetFoo(arena);
ProcessFoo(foo);

There's no magic here, by the way, the stack works the same whether the return value is a pointer or a struct, but by allocating the struct on the heap the copies are much smaller.
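The difference between the two calling styles can be sketched as follows. The `BigStruct` layout here (256 floats) and the two `sum_*` functions are hypothetical stand-ins, not the definitions from the snippets above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large value type standing in for BigStruct. */
typedef struct {
    float samples[256];
} BigStruct;

/* By value: the caller's whole struct is copied into the parameter,
   unless the compiler manages to optimize the copy away. */
float sum_by_value(BigStruct foo)
{
    float total = 0.0f;
    for (size_t i = 0; i < 256; ++i) {
        total += foo.samples[i];
    }
    return total;
}

/* By pointer: only the address is passed, so the copy is pointer-sized. */
float sum_by_pointer(const BigStruct *foo)
{
    float total = 0.0f;
    for (size_t i = 0; i < 256; ++i) {
        total += foo->samples[i];
    }
    return total;
}
```

Both functions compute the same result. What differs is how many bytes are moved at the call site: `sizeof(BigStruct)` for the first, `sizeof(BigStruct *)` for the second, at the price of a dereference that may miss the cache.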

2. This is a very context-sensitive question. There are two reasons to avoid stack allocations: stack overflows, and return value / parameter copies.

To avoid stack overflow, consider how deep your callstack is going to get. Is this function recursive (directly or indirectly)? Is it mostly called from functions high up the callstack or those lower down (you can't always determine this)?

For optimizing copies, consider the tradeoff between copying the value around versus a potentially cold pointer dereference. How big is your value? Does it fit in one or two registers? Test it. What code does the compiler produce for particular struct sizes?

Most egregious errors here are common sense. Allocating a kilobyte buffer on the stack in a recursive function is probably a bad idea. Passing 80 byte structures by value is probably not good. Single integers and floats fit just fine on the stack, etc.