I have seen Casey talking about this on both the Handmade Hero forums and on the stream. When designing a library, let's say you need some sort of memory allocation for doing caching in the background. I saw Casey recommending this:
u32 Size = GetSizeOfThing(parameters Params);
type *Result = AllocateMemory(Size);
InitializeThing(Result);
I also use this approach whenever I know the size of the thing beforehand. But this generally only works for things whose size you can tell in advance.
For example, if I want to build a tree for doing caching behind the user's back, I might need to allocate some memory in the middle of a function. Let's assume (just as an example) that we will simulate some stuff on a world:
world *World = ...; // initialization stuff
// Pushing stuff to the world or other work
LibraryProcessData(World);
What if during LibraryProcessData I need to allocate a tree node for caching? I can only think of 3 ways of doing this.
I can design the function such that it returns some sort of state like:
enum
{
    LibraryProcessData_Finished,
    LibraryProcessData_AllocateMemory,
};
I then call it again and again until it returns LibraryProcessData_Finished, allocating more memory each time it returns LibraryProcessData_AllocateMemory. But that is a very dirty implementation, and it becomes harder when there is state to keep track of inside LibraryProcessData().
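For illustration, a minimal sketch of that loop from the user's side (the struct fields and the resumable LibraryProcessData signature are made up here, not an existing API):

typedef struct
{
    void *NodeMemory; // memory the user hands in after an AllocateMemory result
    u32 NodeSize;     // how many bytes the library asked for
    // ... plus whatever internal state has to survive between calls
} library_process_state;

// Hypothetical resumable entry point: it returns LibraryProcessData_AllocateMemory
// whenever it needs NodeSize more bytes placed into NodeMemory before the next call.
int LibraryProcessData(world *World, library_process_state *State);

// The user drives the loop and stays in control of every allocation:
library_process_state State = {0};
int Status;
while((Status = LibraryProcessData(World, &State)) != LibraryProcessData_Finished)
{
    if(Status == LibraryProcessData_AllocateMemory)
    {
        State.NodeMemory = malloc(State.NodeSize); // or any allocator the user prefers
    }
}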
The second way of doing it is via a callback. But doing so yields a code flow like so:
MyCode -> Library -> MyCode
And that breaks the code flow entirely. If I had a very fine-grained allocator I might need more than this:
void *(*allocator_fn)(int Size)
Which is what malloc looks like. Sure, I can add an extra parameter to take a void pointer, but that still breaks the code flow and makes stuff just more glued together.
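For reference, the shape I mean is something like this (just a sketch; the names and the extra user-data parameter are illustrative, not a real API):

// A callback plus a user-data pointer, which is the usual way to generalize
// the bare malloc-shaped function pointer above.
typedef void *(*allocator_fn)(void *UserData, size_t Size);

void LibraryProcessData(world *World, allocator_fn Allocate, void *UserData);

// Inside the library, every internal allocation becomes a call back into
// user code, e.g.:
//     tree_node *Node = (tree_node *)Allocate(UserData, sizeof(tree_node));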
The third way is to abuse the virtual memory system: reserve an address range so large that it won't ever overflow, and pass that to the library.
The problem is that this is completely OS dependent, and virtual memory doesn't work the same on every system. On Windows I still need some way (possibly a callback) of committing memory incrementally, and even if I ignore all those issues, I can't keep reserving a couple of terabytes of address space for every library call. And I can't create a separate API for systems that don't have the same virtual memory mechanism.
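For clarity, this is the Windows-specific mechanism I mean (sizes are arbitrary; the incremental commit is exactly the part that wants some kind of callback or check):

#include <windows.h>

void Example(void)
{
    // Reserve a huge range up front, commit pages only as they are needed.
    unsigned char *Base = (unsigned char *)VirtualAlloc(0, 64ULL*1024*1024*1024,
                                                        MEM_RESERVE, PAGE_NOACCESS);
    size_t Committed = 0;

    size_t Needed = 1024*1024; // whatever the library ends up needing
    if(Needed > Committed)
    {
        VirtualAlloc(Base + Committed, Needed - Committed, MEM_COMMIT, PAGE_READWRITE);
        Committed = Needed;
    }
}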
I don't have any more ideas on how I can achieve this. All the ways seem to have some sort of problem: either the code flow, being OS dependent, or being too dirty. I have had this issue come up multiple times now, and so far I have gone with the first solution. But even at a small scope it becomes hard to manage, and it breaks the code state inside the API call, requiring saving and restoring state.
There isn't a single solution that works everywhere.
From my limited experience you want to have a default allocator that you, the lib author, choose (can be malloc, VirtualAlloc or whatever you want), and give the user the possibility to change it only if they want. But it should be something like: a way to allocate memory, some way to free memory, and some way to access some state (e.g. a void* that points to whatever the user allocator needs to work, like you said).
void* memory_alloc( void* allocator, size_t size /*, u32 alignment ? */);
void memory_free( void* allocator, void* pointer /*, size_t size ? */);
If the user is compiling the lib, those could be macros that the user defines. If the user only links against the lib or loads the dll, those could be function pointers (callbacks) passed to the function that initializes the library (or in a context/allocator struct...).
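Just to sketch that idea (everything here is illustrative, including the LibraryInit name; the point is library-chosen defaults that the user can override at init time):

#include <stdlib.h>

typedef struct memory_allocator
{
    void *(*alloc)(void *allocator, size_t size);
    void  (*free)(void *allocator, void *pointer, size_t size);
    void   *state; // whatever the user's allocator needs
} memory_allocator;

// Defaults chosen by the lib, used when the user doesn't care.
static void *default_alloc(void *allocator, size_t size)
{
    (void)allocator;
    return malloc(size);
}

static void default_free(void *allocator, void *pointer, size_t size)
{
    (void)allocator; (void)size;
    free(pointer);
}

static memory_allocator default_allocator = { default_alloc, default_free, 0 };

// Hypothetical init entry point: pass 0 to keep the defaults.
void LibraryInit(memory_allocator *allocator);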
I think that no matter what, if you want to let the user control memory allocation you'll need something like that.
Your first method is OK too if you don't return to the user for every allocation (you want to allocate things in big chunks). I believe stb_rect_pack does something like that, where they say "we couldn't pack all the rectangles in the memory provided, please allocate a bigger chunk". How do you free the memory with this method? If there were several calls, does the user need to remember what to free?
And that breaks the code flow entirely
What is the issue here exactly with the code flow?
Independently of how the user provides allocators, you can do allocation internally how you want. Even if the user provides "malloc" and "free" from the CRT, you can allocate a big chunk of memory and do an Arena allocator. You don't have to call "malloc" and "free" for each allocation.
Instead of reserving terabytes of address space and committing pages as you go, you can allocate blocks (1 MiB, 10 MiB or 100 MiB depending on your use case) and chain them. HMH switched to that type of allocation at some point.
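Something along these lines, as a rough sketch (alignment and freeing are ignored here; the alloc callback stands in for whatever the user provided):

#include <stddef.h>

// A growing arena made of chained blocks: the user-provided allocator is
// only called once per block, not once per node.
typedef struct memory_block
{
    struct memory_block *prev;
    size_t used;
    size_t size;
    unsigned char data[1]; // payload follows the header
} memory_block;

typedef struct
{
    memory_block *current;
    size_t block_size;                            // e.g. 1 MiB per block
    void *(*alloc)(void *allocator, size_t size); // user-provided allocator
    void *state;
} arena;

static void *arena_push(arena *a, size_t size)
{
    memory_block *block = a->current;
    if(!block || block->used + size > block->size)
    {
        size_t block_size = (size > a->block_size) ? size : a->block_size;
        memory_block *fresh = (memory_block *)a->alloc(a->state, sizeof(memory_block) + block_size);
        fresh->prev = block;
        fresh->used = 0;
        fresh->size = block_size;
        a->current = fresh;
        block = fresh;
    }
    void *result = block->data + block->used;
    block->used += size;
    return result;
}

Freeing is then just walking the prev chain and handing each block back to the user's free function.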
You can also use the GetSizeOfThing method and say "given the input parameter, this is the worst case scenario" and allocate for that (it's not always possible).
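For example, something like this (the one-node-per-item bound and the types are assumptions, purely for illustration):

typedef unsigned int u32;
typedef struct tree_node { struct tree_node *child[2]; u32 key; } tree_node; // placeholder

// Assumed worst case: at most one cache node per input item.
u32 GetWorstCaseCacheSize(u32 item_count)
{
    return item_count*(u32)sizeof(tree_node);
}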
Memory allocation will always be system dependent. If you call malloc, you have no guarantee that it'll work the same way, or with the same perf, on different OSes or versions of an OS. You also can't do something that will work on every system in exactly the same way, and tomorrow there might be a new system/OS that works differently. That's why Handmade Hero has a platform layer, so that you can write a minimum amount of code to be able to run on a new platform. You need to know what your target platforms are, and make it work for those. If your targets are desktop OSes (Windows, MacOS, Linux) you know that all of those have virtual memory and a huge address space. If you target mobile devices they might have that too (I never checked but I'm assuming they do).
You might have more (and better) answers if you ask on the handmade discord.
Thanks for the answer.
From my limited experience you want to have a default allocator that you, the lib author, choose (can be malloc, VirtualAlloc or whatever you want), and give the user the possibility to change it only if they want. But it should be something like: a way to allocate memory, some way to free memory, and some way to access some state (e.g. a void* that points to whatever the user allocator needs to work, like you said).
That assumes that the user has a generic allocator, something that can replace malloc(). That's not always the case. Not everyone wants to implement a malloc()/free() alternative in their codebase; I certainly don't, for example. And I don't want to link with the libc either (that is, if I am even writing something in userspace. If I'm in a freestanding environment I don't even have that choice.)
Not only that, but it also assumes a certain usage scheme for the allocator. Your allocator might look like:
void* memory_alloc( void* allocator, size_t size /*, u32 alignment ? */);
void memory_free( void* allocator, void* pointer /*, size_t size ? */);
Like the example you gave, but that gives no clue about how it's being used. If I'm on an embedded system, I might not want to call that at a high IRQL. Or I might have certain allocation requirements which force me to do allocations only after certain conditions are met, so I might need to defer the work until that time.
Memory allocation will always be system dependent. If you call malloc, you have no guarantee that it'll work the same way, or with the same perf, on different OSes or versions of an OS.
The problem is, with GetSizeOfThing() you didn't have this problem, because the code flow stayed with the user. They would be the ones responsible for handling this, so you wouldn't have the problem. But with the callback way of doing things, you have now taken that problem and made it your library's problem.
If the code flow stayed with the user, like with GetSizeOfThing(), they could very simply call the function to see whether they have a chunk available, and defer the actual call until the conditions are met.
The problem is that GetSizeOfThing() can only return a fixed-size result and cannot be adjusted dynamically. Furthermore, even if we guarantee the worst-case scenario for allocation, that number might be insanely high.
Instead of reserving terabytes of address space and committing pages as you go, you can allocate blocks (1 MiB, 10 MiB or 100 MiB depending on your use case) and chain them. HMH switched to that type of allocation at some point.
That's not the problem. You can allocate in any way you want. The problem is how you handle the code flow/API design once it gets to that point.
Even if we assume we are only writing the library for userspace code on a single OS, that still doesn't make callbacks a proper choice, because I don't want the library to dictate the code flow; I want the user to be able to dictate that.
Independently of how the user provides allocators, you can do allocation internally how you want. Even if the user provides "malloc" and "free" from the CRT, you can allocate a big chunk of memory and do an Arena allocator. You don't have to call "malloc" and "free" for each allocation.
That completely disrespects the user's allocator. The reason they want to provide a custom allocator might be that they think the default allocator is slow, and they provided you with an allocator that is fast. But you chunk-allocated from them and then called your own allocator, which makes things as slow as before. And the only time it becomes fast is the rare time you call their allocator to allocate another chunk, which barely makes any difference.
If I understand correctly, you want your lib to be able to signal the user that it needs some amount of memory (possibly that it needs to free too), be able to return to the user, and wait for the user to do the allocation and call the library again?
If that's correct it seems pretty specific (I wouldn't expect most libraries to want to return flow control to the user for each allocation). A different way to do that than your first idea would be a message loop maybe? But that seems more complicated than your original idea.
Someone on the discord server will probably help you more than I can.
If I understand correctly, you want your lib to be able to signal the user that it needs some amount of memory (possibly that it needs to free too),
Signaling is the wrong term. I basically need some way to allocate memory while the code flow is inside my library (like UserCode -> Library), without calling back into the user's code, so that I don't dictate the code flow of the whole program (like UserCode -> Library -> UserCode).
Even if dictating it were not a problem, I would have to assume the user can allocate memory whenever the library needs it, however much it needs, without any issues inside the callback they give me, and that the callback plays well with the user's memory allocation patterns.
I can write a function to calculate the size of memory before each operation, but that requires a pre-pass before the actual call. It might not be a performance bottleneck depending on what you do, but it's not something you can apply if the scale of the thing you need to pre-pass is big.
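To be explicit about what I mean by a pre-pass (the measure/process function names are made up, just to show the shape):

// A first pass over the same input computes the exact memory needed,
// the user allocates once, then the second pass does the real work.
size_t Needed = LibraryMeasureProcessData(World);   // pre-pass, allocates nothing
void *Scratch = malloc(Needed);                     // or any allocator the user prefers
LibraryProcessDataInto(World, Scratch, Needed);     // real pass, uses only Scratch
free(Scratch);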
be able to return to the user, and wait for the user to do the allocation and call the library again?
That works in simple cases, but as the operations the library needs to perform get more complicated, it becomes a harder problem to solve. And even if it didn't, that doesn't change the fact that it adds more complexity than necessary, just to allocate memory.
(I wouldn't expect most libraries to want to return flow control to the user for each allocation).
Indeed, that's why I ask this question here.
A different way to do that than your first idea would be a message loop maybe? But that seems more complicated than your original idea.
Yeah in no universe or timeline am I ever doing that.
Thanks for the answer though.
I will try my chances on the discord channel if I don't get a satisfying answer :)