Memory Management (or lack thereof)

I've been catching up on streams, so I'm pretty far behind, but I've noticed quite a few times people asked questions about how memory is being managed and how you can be sure there aren't any leaks.

I'm not going to try to explain here why no leaks are occurring, since Casey has already explained it, but I suspect many people are still left with questions.

I think a blog post by Noel Llopis will give some insight to people wondering how some games manage the block of memory internally once they have it: http://gamesfromwithin.com/start-pre-allocating-and-stop-worrying

One of the key atrocities indoctrinated into me during college was the idea of allocating memory only at the moment of need and only for the size you need. For a long time I thought this was the one true way of requesting memory from the system.

However, I'm slowly coming around to the reality that requesting memory in this way places huge burdens on me, the programmer, and is in fact worse in a huge number of cases. My guess is Casey is going to do something similar to what Noel is suggesting in the blog post. I've read quite a bit that game developers especially tend to use very custom-tailored, simple allocators to solve the problems they actually have, rather than a crazy general allocator that does nothing well.
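
To give a concrete picture of what a "custom-tailored, simple allocator" can look like, here's a minimal sketch of the kind of linear/arena allocator Noel describes. The names (Arena, ArenaPush, and so on) and the malloc-backed block are placeholders I made up for illustration, not anything from the stream:

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena: one block requested up front, then handed out by
   moving a "used" offset forward. Nothing is freed individually; the
   whole arena is reset or released at once. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} Arena;

static Arena ArenaCreate(size_t size)
{
    Arena a = { malloc(size), size, 0 };   /* could be VirtualAlloc/mmap instead */
    return a;
}

static void *ArenaPush(Arena *a, size_t size)
{
    if (a->used + size > a->size) return NULL;   /* out of preallocated space */
    void *result = a->base + a->used;
    a->used += size;
    return result;
}

static void ArenaReset(Arena *a)
{
    a->used = 0;   /* "free" everything in one move */
}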

I feel like Noel covers the bases pretty well, but anybody have anything to add or know of any other resources for how people deal with memory? Jason Gregory (Naughty Dog) wrote a book, Game Engine Architecture, which is pretty solid and has a decent section talking about memory. Seems like it's pretty popular (at least in the console development world) to follow this kind of memory allocation strategy. Desktop PC software, however, seems to do quite poorly with memory these days...
"A blog post by Noel Llopis I think will give some insight to people wondering about how some games manage the block of memory internally once they have it: gamesfromwithin.com/start-pre-allocating-and-stop-worrying"

This is a decent article, but it is incomplete, imo.

There are problems with both methods that are left out. The biggest problem with the "preallocation" form is that you can exhaust the address space, even when there is memory available. On x64 this isn't a problem, but on other platforms it is a real problem that must be solved, unless you know exactly what your bounds are.

If you know your bounds exactly, then the way Casey does it is the very best.

But chances are that you don't.

Another thing the article leaves out is paging. Paging is WAY more costly than the allocator itself, and so it should have been mentioned. Paging will hurt you no matter which way you design your memory allocations, and it is (or can be) about as costly as a copy.

The problems with a generic memory allocator are mentioned in the article, except for bugs. Small bugs, which will happen in complex code, can corrupt the memory manager. And when that happens, all bets are off. And it will happen.

Preallocation will not allow for unknown quantities of objects of an unknown size. If it is possible for the worst case to happen, then it will happen, hehe, and you end up dumping virtual memory space on covering the worst case, while hurting the common case.

I think the best approach is to combine the two strategies: create a memory allocator which is fully dynamic, and then preallocate objects in it. The allocator should have properties that allow you to avoid all searching.

The allocator should have the debugger built into its structures. That requires a considerable amount of overhead, but this overhead can, in all cases, be divided down by preallocating objects. And it has the advantage of speed.

If you preallocate 1000 objects, then you can cut the overhead and the cost of allocation by a factor of 1000. I am not saying you SHOULD always do this, but only when you need to. This last point is important: there is no way to ensure performance if you do not want to get into the specifics.
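
To sketch what I mean (the names, the 1000-object batch and the 64-byte object size are just things I picked for the example, not anything from my actual allocator), a fixed-size pool that grows its free list in batches gives you dynamic allocation and freeing with no searching at all:

#include <stdlib.h>

/* Hypothetical fixed-size pool: it grows in batches of BATCH objects and
   hands them out through a free list, so allocation and freeing are both
   O(1) with no searching. */
#define BATCH 1000

typedef union Node {
    union Node *next;     /* used while the slot is free      */
    char payload[64];     /* used while the slot is allocated */
} Node;

typedef struct {
    Node *free_list;
} Pool;

static void PoolGrow(Pool *p)
{
    Node *batch = malloc(BATCH * sizeof(Node));   /* one system call per 1000 objects */
    if (!batch) return;
    for (int i = 0; i < BATCH; ++i) {             /* thread the new slots onto the free list */
        batch[i].next = p->free_list;
        p->free_list = &batch[i];
    }
}

static void *PoolAlloc(Pool *p)
{
    if (!p->free_list) PoolGrow(p);
    if (!p->free_list) return NULL;
    Node *n = p->free_list;
    p->free_list = n->next;
    return n;
}

static void PoolFree(Pool *p, void *ptr)
{
    Node *n = ptr;
    n->next = p->free_list;
    p->free_list = n;
}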

Before you write an allocator, I would advise you to THOROUGHLY test VirtualAlloc or whatever you use for system allocation. This means you need to do a lot more than just make one or even a thousand allocations with it and call it a day. You need to stress-test it. You need to uncover its weaknesses so that you can design around them. This is almost impossible to do for the long term, since they can change it at any time, which is why no performance-oriented person wants to call into a library in the first place. But today, we don't really have a choice.
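
For instance, a crude stress-test sketch might look something like this. The sizes and iteration counts are arbitrary, and it only times allocate/touch/free pairs; the point is that you measure on your own machine instead of trusting anyone's claims (including mine):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    static const SIZE_T sizes[] = { 4096, 64*1024, 1024*1024, 64*1024*1024 };
    for (int s = 0; s < 4; ++s)
    {
        QueryPerformanceCounter(&t0);
        for (int i = 0; i < 1000; ++i)
        {
            void *p = VirtualAlloc(0, sizes[s], MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
            if (p)
            {
                *(volatile char *)p = 1;   /* touch it so a page actually gets faulted in */
                VirtualFree(p, 0, MEM_RELEASE);
            }
        }
        QueryPerformanceCounter(&t1);
        printf("%10llu bytes: %.3f ms for 1000 alloc/touch/free cycles\n",
               (unsigned long long)sizes[s],
               1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);
    }
    return 0;
}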

I wouldn't assume that what anyone says about it is true. I have really not found a single person who truly knows. A problem is that since it can change at any time, anything you say about it can be wrong tomorrow. So I will not do that. You need to find out yourself.

If you write your allocator as an object, you can create more than one that can be called in a similar way. That way, when you run into problems in a large codebase, or in multithreaded code, you can put suspicious objects into their own allocator. This also avoids the need to lock those allocations. It creates even more overhead, but it will be worth it.

NB! Memory allocated by an allocator is just as much GLOBAL memory as memory allocated as global static memory.
Interesting read because I just finished episode 14 and he does exactly this. I want to try this way in the future.

I do wonder, would this work in an MMORPG? I remember Allods had a big memory leak that caused a lot of problems.

This can work for any game, including MMORPG. Even for non-game applications, if done properly.
mmozeiko
This can work for any game, including MMORPG. Even for non-game applications, if done properly.


Can you go into what "done properly" means then?
Well, I could imagine that for simple apps, like a few buttons and a dialog or something, it would not be much different from what is currently implemented in Handmade Hero. Just one allocation at startup, and that's it.

But for apps that load files with an unknown amount of content, like text editors or audio editing apps, or even a browser, this approach wouldn't work the way it is currently implemented. You cannot allocate, let's say, 4GB at text editor startup to support any file up to 4GB, because you don't want your desktop app to always take 4GB even when you load small files. So you could have multiple regions: one like the current implementation for stuff that is always there (cursor position, selected font index and so on), and then a separate region allocated when loading a new file. When the file is closed, release that region back to the OS. For web-browser type apps, you could for example allocate and free regions per tab, though realistically it would probably require a bit more sophisticated region allocation - like one region for content loaded from the network, one for the JavaScript heap and so on. But it would still be significantly fewer allocations than apps are making today with a ton of calls to malloc/new. You just need to figure out who is using memory and how much; then memory management becomes trivial.
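
A rough sketch of the per-file region idea, in C (the struct, the function names and the "2x plus headroom" sizing policy are all made up for illustration):

#include <windows.h>
#include <stddef.h>

/* Hypothetical per-document region for a text editor: each open file gets its
   own block sized from the file plus headroom for edits, and closing the file
   returns the whole region to the OS in a single call. */
typedef struct {
    char  *memory;
    size_t size;
    size_t used;
} DocumentRegion;

static DocumentRegion OpenDocumentRegion(size_t file_size)
{
    DocumentRegion r = {0};
    r.size   = file_size * 2 + 4096;   /* arbitrary headroom policy for edits */
    r.memory = VirtualAlloc(0, r.size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    return r;
}

static void CloseDocumentRegion(DocumentRegion *r)
{
    VirtualFree(r->memory, 0, MEM_RELEASE);   /* one call gives everything back */
    r->memory = 0;
    r->size = r->used = 0;
}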

I suspect that an application such as Photoshop or Maya would be a little more difficult to write with this allocation strategy (or at least, just very different). I'm not sure; I haven't tried to write those kinds of heavily GUI- and user-event-driven apps before. Maybe games simply have an inherent advantage in that we tend to know a lot more about our data and memory lifetimes and can handle them in very specific ways?

For example, I've been trying to work on some side projects where I set up a big memory buffer up front and try to do everything within that buffer for the entire lifetime of the program. At the start, I basically set up some regions in the buffer, like a "heap" region at one end, which I use for the rare case of memory that needs to persist over a few frames, but I'm not sure how many frames. Then I have a "level" region at the other end of the buffer, which is data that lives throughout the entire run of a particular level and gets reset once the level finishes. Then right next to the "level" region I set up a stack area which is strictly for per-frame allocations and gets reset at the end of every frame.

Something like this:
|------HEAP-----|------UNALLOCATED------|------PER FRAME STACK------|------LEVEL-----|
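
In code, that layout comes out to something like this (just a sketch: the names are placeholders, the backing buffer could come from anywhere, and error handling is reduced to asserts):

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One backing buffer: a "heap" cursor grows up from the bottom, a fixed-size
   level region is pinned to the top, and a per-frame stack grows down from
   the level region and is reset to empty every frame. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   heap_used;    /* [0, heap_used) : long-lived "heap" data       */
    size_t   frame_top;    /* [frame_top, level_base) : this frame's data   */
    size_t   level_base;   /* [level_base, size) : data for the whole level */
} GameMemory;

static GameMemory MemoryCreate(size_t total, size_t level_size)
{
    GameMemory m = {0};
    m.base       = malloc(total);
    m.size       = total;
    m.level_base = total - level_size;
    m.frame_top  = m.level_base;
    return m;
}

/* Rare, longer-lived allocations. */
static void *HeapPush(GameMemory *m, size_t size)
{
    assert(m->heap_used + size <= m->frame_top);   /* don't collide with frame data */
    void *result = m->base + m->heap_used;
    m->heap_used += size;
    return result;
}

/* Per-frame allocations: a pointer moving down, thrown away wholesale. */
static void *FramePush(GameMemory *m, size_t size)
{
    assert(m->frame_top - m->heap_used >= size);
    m->frame_top -= size;
    return m->base + m->frame_top;
}

static void FrameReset(GameMemory *m)
{
    m->frame_top = m->level_base;   /* reset at the end of every frame */
}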


I've noticed in my main project that the number of instances of memory that truly needs to live in the heap region is actually quite small. Like, vanishingly small. Most of the memory I use in the main loop of my game actually falls into the per frame stack area. There are some instances, of course, where I would like to retain some state of some dynamic event and would like to keep track of it over time, but this is not exactly the norm (at least in my game).

With this knowledge in hand, it really does seem like "memory management" here is quite simple. If most of the time you can get away with using the per frame stack, then it's just doing simple pointer movements and resetting every frame. The real issue is if you need to do a lot of "heap" allocations and you don't know exactly what the lifetimes are, but even there, I think you can do a lot to use simple allocation strategies instead of a full blown general allocator.
The way I see it, you both suggest a combination of the two strategies, as you both see their limitations. Mmozeiko implements it differently, that's about it. DaleKim's implementation seems reasonable too, but only when the scope is known.

We agree that if you know the scope of your thing, and you can slowly evolve it and adapt it, then an allocator isn't needed.

But I would still stress, to beginners, that you need to have the option. And the argument that you win anything by preallocating stuff already at design time is simply not true.

There are two things that cause delays: uncached memory and paging (and in the worst of cases, sleeping devices). You cannot avoid either of these, except for very specific tasks where you took care to avoid them.

400 cycles is on the order of a hundred nanoseconds on a modern CPU, a small fraction of a millionth of a second. The only way this can hurt you is if your implementation is braindead, meaning it takes the hit time and time again through intensive loops.

In most cases, for every first touch of a data structure, you will need to take that hit. So worrying about a call table or not is laughable. For complex applications, you will either way need to think long and hard to optimize for that, and the deciding factor then will have nothing to do with the use of OOP, or an allocator, or anything like that.

It ONLY has to do with you knowing what your structures look like in memory. That is: you being intimate with your program and the way it uses memory, and being able to change it. It is not OOP or an allocator that prevents this, but more or less willful ignorance of these things. Once you know, you'll make sure to optimize the allocations so that they are in the order you want.

I will argue that using a high-level language to implement those things encourages this kind of ignorance. But if you do this slowly, in a low-level language, you become aware of such things much faster.

The OO vs. data-oriented debate is really pointless, since it completely avoids talking about the real problem, which is always to understand what you are *specifically* trying to do. In EACH SINGLE CASE.

The reason anyone would use C or C++, or any other very high-level language, in the first place is because you really don't want to understand. And you're not willing to admit to yourself that, in the end, you will be forced to.

I will repeat that. Preallocation, by itself, will not buy you anything in terms of speed.
In my own experience, preallocation often bought substantial and measurable speed improvements. Maybe some of the cases were initially more pathological than they could have been, but I've found it often buys me speed with a lot of other (more important) benefits.

I'm a little puzzled at how you can even claim that preallocation doesn't buy you speed. The fact alone that I'm avoiding the system's allocator during runtime is enough to buy immense speed improvements for allocation-heavy code. And if I do preallocate, I only pay the cost up front and pay a very small cost during runtime for using it.

Also, I'm not sure why you're mentioning paging like it's such a huge deal. Yes, paging is slow and terrible, but I don't see why it will even be an issue if you size your memory allocation properly and the system has enough memory to even run the game in the first place? If you're ever in a situation where you'd be paging, you're already dead. Most consoles (I don't know about current ones) don't even have a virtual memory system to back main memory with the hard drive, so you don't even get to page; you just crash and burn.

I just don't feel like ad-hoc, on-demand heap allocation is a *better* alternative to preallocating your memory for the application, at least for games. Maybe it's more convenient for the uninitiated, but I've already noticed so many wins from just allocating one area of memory ahead of time and handling it myself in my own code.
DaleKim,

Sorry for the long post again.

TL;DR: Programming and performance are about solving specific cases, not about whether or not you use an allocator.

I am not saying preallocation is bad. I am saying it should be done via a memory allocator, because then you can preallocate for every specific case, and nothing else.

The cost of preallocating everything is wasted memory space. But the more serious problem with it is that it can mean your project can no longer scale well, because all of a sudden it needs an allocator.

An allocator can also preallocate memory, you know, and become a cache. This is how I do it. My allocator preallocates memory and reuses it, but the interface is a generic allocator. It may be that the one in C++ isn't any good, I don't know. But mine is extremely fast.

But if I need only one of something, and an unknown number of something else, and this differs between each run, I can have that. In most cases a text editor needs to load only a few pages of text, but there are cases where you would like to have more. Both of those cases fit well in an allocator.

You need to plan for it. But the same is true of preallocation. You need to weigh things against each other, and this becomes a nightmare when you have, say, 100 or 1000 different types of structures, which could easily happen, even in a game. And then you are either fucked, or you need to implement an allocator. You can wait until you need it, but I am saying it would be better to learn how to use the one you've got, efficiently.

An often-ignored fact of computing is that even relatively small amounts of complexity can easily cause problems for you. On the drawing board, dividing your space into 3 parts like you do seems easy enough, but it's already way too complicated for when you run into the problems I am talking about.

If you preallocate everything, you are eating up your address space, and it happens very quickly unless you have extremely tight bounds. If you know what you are doing, then go ahead. But for beginners, this is a different case.

Most of the programs I am making wouldn't be possible without an allocator.

And now, as I am making my first game, I end up trying to guess how big every array should be. I started implementing it with "preallocation", not using the allocator. But there are already so many cases where you would want one.

You cannot write a game engine of finite scope unless you are willing to tell your users that they can't have this and they can't have that. You know, I have seen even geniuses fail completely at this. They fail because they fail to recognize their own limitations.

An allocator is an extremely important tool. And if it makes your program slow, it's because you didn't understand how it worked. If you need unknown amounts, but you know there are cases which need many, you preallocate them, 1000 at a time, IN the allocator. If you preallocate everything, you need to preallocate all of that memory, all the time, to make sure you've covered the worst case.

Just yesterday, I was implementing something I call "drones" in my game. They are entities that follow the player and defend him. They float around him like a cloud. But initially they didn't have much spread: they all moved to the same spot behind the player when they came to rest, and they all attacked the same target, using the same search algorithm for locating an enemy.

I was surprised to see it could easily handle 300.

300 is way more than I intend to use, but I was just testing. It seemed to work very well. In GDI. There was absolutely no problem with it. Then I implemented a "formation" for them. Now they are spread across a large area around the player, instead of all moving almost uniformly. This simple step dropped the framerate by 10 frames when they attacked! The way I have written it, I know exactly what's going on, and the only thing new was the spread. The only thing I can imagine having caused the slowdown is that the cache is now thrashed, over and over, as the sprites are written to many more, random locations in memory, jumping back and forth.

No amount of preallocation of my internal structures is going to help here. I don't know what will help. I don't even know if it's worth solving. Maybe I could sort them by memory address before painting? Or maybe just give up on GDI?

My point is that the speed of running software has nothing to do with using an allocator or not. It's about how you use your resources in a SPECIFIC case. It's not about this philosophy versus that philosophy.

Yes, memory preallocation eats up your address space. Why is that different from non-preallocation? When a game has allocated 1GB of memory, how is that different in the preallocation versus the non-preallocation case? You are still eating 1GB of your process's address space. It's the same in both cases. You are actually eating more address space in the non-preallocation case, because the memory allocator needs to store somewhere which regions are used and which are not, and change those data structures during allocation.

I don't understand your point about why preallocation is bad. All you talk about is something about "unknown things" that might happen, or that you need to think about how you use memory. Of course you do. Every time you code. There is no difference in the non-preallocation case - you still need to think about how you allocate.
I didn't say it was "bad". I said it is good. What I am saying is: not doing it through a dynamic allocator is "bad".

If you wrote your allocator to make use of your 1GB preallocation, to dynamically subdivide it and reuse it, then you are using an allocator...

But if you are guessing the scope of your structures, then you're in for a lot of unneeded pain (unless you have a limited scope).

I guess you're not really reading my posts, or my English is worse than I thought, so this is my last word in this thread.
I think we're getting mixed up on some terminology and which level of the problem we're talking about.

Specifically, when I refer to an allocator, I mean the system's general heap allocator that can be interfaced (usually) via malloc() or operator new.

The problem I am talking about specifically is how your application gets memory from the system and uses it.

While I was in university, there was *zero* mention of a memory management strategy like what Noel Llopis proposes and what Casey appears to be doing for Handmade Hero. The thought was that you call malloc()/operator new every time you need some memory and then free it when you no longer do. In a C world, this is incredibly tedious. In a C++ world, it's incredibly tedious and starts to snag you into the swamp of C++'s object model.

On top of that, they seemed to completely ignore the realities of software which is that you don't have unlimited memory and there are lots of software engineering issues that aren't helped out by calling malloc()/operator new at every site you need memory.

It started to make more sense to me why game developers often allocate one big block at program start which is intended to be the entire memory footprint for the application. They're not saying you need to have compile-time fixed-size arrays for everything in your game. They're saying that you need to absolutely own your memory, and that the easiest way to do that appears to be requesting one big block of memory from the system at the start, and then handing out bytes from that block as you need them.
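
As a minimal sketch of that idea (the sizes and the permanent/transient split here are placeholders I picked, loosely in the spirit of what the stream's Win32 layer does, not a copy of it):

#include <windows.h>
#include <stdint.h>

int main(void)
{
    /* Decide the whole footprint up front; these numbers are illustrative. */
    SIZE_T permanent_size = 64  * 1024 * 1024;   /* long-lived game state     */
    SIZE_T transient_size = 512 * 1024 * 1024;   /* per-frame / scratch space */

    void *block = VirtualAlloc(0, permanent_size + transient_size,
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!block) return 1;   /* the one place where "out of memory" is handled */

    uint8_t *permanent = (uint8_t *)block;
    uint8_t *transient = permanent + permanent_size;
    (void)transient;

    /* ... run the game, handing out bytes from these two ranges;
       the system heap allocator never appears on the hot path ... */

    VirtualFree(block, 0, MEM_RELEASE);   /* optional: the OS reclaims it at exit anyway */
    return 0;
}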

They might have their own little dynamic allocation function that works within the block, but they never go back to the system heap allocator. And my current experience has shown me that the supposed necessity of the generalized heap allocator provided by the system is in fact false for the kinds of software that I'm writing. Maybe it is absolutely necessary in other kinds of code, I'm not sure.

But in my personal side projects where I have gone with this memory allocation strategy, this has yielded some really big wins. I often find huge performance wins, but this isn't even the primary advantage; it really is just a nice benefit on the side. The big wins are that I now know exactly how much memory is being used and which portions of the memory range are being used by which parts of the program, and debugging memory issues is a heck of a lot easier since I have a lot of custom control over the state of memory.

In my projects that do this, oftentimes I make ONE heap allocation in the entire run of the program. If I make more, it's usually due to some library or system call that ends up making those allocations as a result of calling into them, but I make pretty much zero allocations in my main loop.

Yes, sometimes I'm "wasting" memory, but I'm sizing my allocation to be as big as my program needs to actually run. It's nicer to know that my program is guaranteed to have all the memory it ever needs to run than for me to go to the system allocator all the time and then realize halfway through execution that I actually didn't have enough memory and now I have the awesome choice of hard crashing my program or trying to back up and use less memory...

The hard part for me about this way of dealing with memory is that I still spend incredible amounts of time thinking about how to not waste it. By this, I mean that I write a lot of malloc()/free()-like pairs in all of my code and it's just pretty time consuming and laborious. I suspect that I'm spending a little too much time focused on individual "objects" rather than entire batches of data, but this is my biggest disadvantage so far; it's still very manual and very time consuming.
I agree with most of the things you say, DaleKim. I see nothing wrong with your thinking, and I see you have all the important points pinned down. But then again, you're hardly a beginner.

For the many malloc/free pairs, using a caching memory allocator you write yourself will nullify that problem in most cases. And then preallocating a reasonable amount, where needed, will nullify the rest.

But if you ask for 2 MB at a time, 10 MB or 100 MB, or the entire 1 GB, from the system, it will not matter much for performance. (I can access 3.8 GB in Win32.)

You still need to optimize your code if you want it to rock. And even if you have allocated 1 GB, that doesn't mean the OS has paged it in yet. That's what I mean. Paging is ORDERS of magnitude slower than the allocator itself, unless the allocator is an extremely poor one that searches inside for free blocks. (My first one did this, so I know :D)

The problem in OOP code is the misuse of those features, out of habit. It's because we are not aware of how the beast works. And this is natural, because if we could avoid having to think about it, then of course nothing would be better. We have better things to do. If only all memory were as fast as CPU caches, and we had an infinite amount of it.

btw.
I want to add that after sorting my "drones" by X and Y coordinates, I can now have 3000 drones and my framerate is back to mostly 30 frames. I haven't tested more, but it seems like it now scales perfectly, because no matter how many drones I add, they are effectively reduced to just one buffer clear at most. The next limit will most likely be the sorting.
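
Roughly, the idea is just to order the drones by Y, then X, before drawing, so successive sprite writes land near each other in the framebuffer instead of jumping around. A sketch in C (the Drone struct and the row-then-column order are assumptions for the example, not my actual code):

#include <stdlib.h>

typedef struct {
    int x, y;
    /* ... sprite data ... */
} Drone;

static int CompareDrones(const void *a, const void *b)
{
    const Drone *da = a, *db = b;
    if (da->y != db->y) return da->y - db->y;   /* rows first: matches the framebuffer layout */
    return da->x - db->x;                       /* then left to right within a row */
}

static void SortDronesForDrawing(Drone *drones, size_t count)
{
    qsort(drones, count, sizeof(Drone), CompareDrones);
}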

Programming games is even more fun than playing them. I can almost not believe I didn't do this before. It feels like coming home. Thanks Casey! I have 3000 fucking amazing, now black and shiny "drones" that aggressively attack any hostile units, and it's awesome. It looks like a rebirth of "Swarm"! And I have never made performant graphics before. It looks like I almost have a game already.
This is a very interesting topic. I have never done or even heard of this method, and I want to try it out because the way I did it for my last game was so horrible. It does the whole "pause for a few minutes before closing" thing that Casey mentioned in one of the episodes. The worst part is that it crashes on Vista, I think, but works on other Windows machines and Linux.
I am currently finishing another game and wonder if I should preallocate memory or just leave it to the game engine, Cocos2d, to clean it up. The game engine is doing a terrible job to begin with.