What's so Bad about Garbage Collection

Henry Kloren

#16304

September 12, 2018

Hey guys! Previously, I only learned Java in school, so this is absolutely not a argument; just a question.

why is GC considered so bad in games? After researching I saw 2 primary reasons:

The first is that it clearly has some overhead in that it needs to check for how many references each object has. I get this point.

The second is that GC is non-deterministic and can happen at any time, causing potential stalls when rendering a frame. How is Manual Memory Management better for this? I mean at the end of the day you need to free the memory eventually, right? Is the idea that you do memory management during Pause/Load Screens, etc? I mean when a Object has no more references, it seems like you would want to free it, so I don't really see the problem.

BTW the reason I ask is because my teacher mentioned that while GC used to be slow, it is "as fast as manual memory management nowadays." However, based on the Streams I've watched so far, that does not seem to be true. So, I'm very confused.

Edited by Henry Kloren on September 12, 2018, 5:59am

Mārtiņš Možeiko

#16305

September 12, 2018

The second is that GC is non-deterministic and can happen at any time, causing potential stalls when rendering a frame. How is Manual Memory Management better for this?

Because you have control when and what to do. Even if you need to deallocate something, you control where to do that. With GC you don't have such control and VM runtime decides this for you.

I mean at the end of the day you need to free the memory eventually, right?

Yes, but that can be as easy as one sub operation. For example - allocate one "huge" block of 1GB for frame temporary memory. And at the end of frame you do one sub operation to move pointer back 1GB. That's it. This is what HH does for most of allocations. You allocate big block of memory - and then subdivide it in smaller "allocations" which are very cheap, just one add operation. Basically dellocation with this kind of scheme is super cheap - it is constant time, regardless of how many "objects" you have allocated. It could be million objects, but still deallocating all of them costs you ~one assembly instruction.

But with GC you'll need to walk all over the hierarchy to figure which objects are alive and which not. And this worse - there's no way to do that in controlled way once per frame. Maybe VM will decide to buffer up this process. So it will do that when some critical threshold of allocated memory happens. And then you'll pay x amount of times instead a bit in every frame. Let's say your GC takes ~4 msec to collect all the "garbage" objects that were allocated during the frame. That's reasonable and manageable, even for 60Hz rendering where you have 16msec time to render one frame. But what if VM decides to do that only ever 4 or 5 frames? Then suddenly you have 16 or 20 msec of delay where everything stops and only GC runs. That is one whole frame lost. You cannot do 60Hz rendering anymore. And its even worse for VR where 90Hz is required for smooth rendering.

Yeah, sure there are more modern GC algorithms which run in parallel and in background threads. That helps a bit, but still does not solve delay issue 100%. All algorithms eventually still need to stop all the threads in process to properly clean up "garbage". And you don't want such delay.

I mean when a Object has no more references, it seems like you would want to free it, so I don't really see the problem.

The problem is what does "free it" means. With GC there is no explicit "free" operation. You just "forget" about the object - think about it like assigning null to object. GC system will take care of actually "freeeing" it only later which you don't control. That's the problem.

Edited by Mārtiņš Možeiko on September 12, 2018, 7:24am

Henry Kloren

#16306

September 12, 2018

so in a GC language, is there really no way to do a Memory Arena type of set up, where you allocate memory up front? is it really the case that it has to allocate one object at a time, every time? if so, why isn't this possible in GC, because it seems like it should be... i mean shouldn't it still be able to reference count a 1GB block allocated up front, why would it have to scan every object in that block?

Edited by Henry Kloren on September 12, 2018, 7:28am

Mārtiņš Možeiko

#16307

September 12, 2018

In languages like C# or Java there is no way. If language is designed with this type of allocation in mind, then it could work with GC, but I am not aware of such language.

Btw you can think of Memory Arena type of allocations as one kind of garbage collection. Any objects you allocate from this arena are "garbage collected" once you reset arena pointer. This is actual garbage collection, but in more controlled way :)

is it really the case that it has to allocate one object at a time, every time?

In C# you can get around this by using array of struct. Struct is value type so it does not require separate allocation. It will allocate whole array with one allocation.

Again - this is not about allocation itself, but its about control when to run GC. Even when you could allocate objects in specific place (arena), language now would need to do extra checks on every assignment that you are not storing pointers to these objects outside of arena, which costs performance. Or it would require different type of assignment operator. Current GC languages such as C# or Java are more about "safety" of code. It guarantees that you cannot unintentionally corrupt memory like in C or C++. So it requires you to work with restrictions of GC type of allocations. Otherwise it would not be possible to guarantee this "safety" feature.

i mean shouldn't it still be able to reference count a 1GB block ...

What do you mean by reference count here? C# and Java objects are not reference counted. Their GC algorithm is mark-and-sweep or some of its modification/improvement.

Edited by Mārtiņš Možeiko on September 12, 2018, 7:33am

Henry Kloren

#16308

September 12, 2018

i see, so just to summarize and see if i fully understand.

GC itself has 3 primary issues.

1. it is obviously slow to have to check whether something has any remaining references via something like Mark & Sweep.

2. the GC runs whenever it feels like, so even if it was possible to maintain 60 frames, if it ran every frame, if it chose to run every 5 frames, the GC may take the entire frame or more, which results in a frame dip.

3. it is much faster to allocate a 1GB block, then deallocate that block in 1 alloc and 1 free, than it is to free individual objects (although this is potentially just because the common GC languages don't support this feature, and because it may require extra checks on assignment).

Is that about right?

Mārtiņš Možeiko

#16310

September 12, 2018

Almost right. Third one is not really an issue. I was just explaining how memory allocation can be faster because it does not need "GC collect" operation at all.

In some cases GC allocation is a bit slower than simple "add operation", but its more or less the same. And there is no deallocation in GC for individual objects at all. Deallocation costs exactly 0. In GC languages allocation and deallocation often is faster than C++ new/delete operation. GC languages allocate larger blocks from OS (just like malloc) and then reserve space for objects in allocated block. Once GC algorithm figures out that all objects in particular block are not reachable anymore (this is what costs performance), then it deallocates whole block by calling free/VirtualFree or similar runtime/OS function.

One more issue with GC is that you don't control memory layout which could be important for your algorithm due to effects on CPU cache memory. Objects are moved in memory during GC operation. Often GC languages have multiple heaps/levels for objects. All objects initially are allocated in first level, which is super fast, typically just same "one add op" like in memory arena. Then if object survives 1 or 2 GC collections it is assume "long lived" and copied over to second heap. Sometimes there are more levels for specific cases (large allocation, etc..). This costs some performance. All this moving means that you cannot keep persistent pointers. If you need stable address, for example for passing to GL/D3D/Vulkan APIs, you need to "pin" objects to fixed location which complicates GC algorithm (again performance). For objects that can be moved around it needs to update all the pointers inside other objects which point to moved object. I hope you can see that all this is about control - which you don't have if you use GC. And it goes back to your first point - slowness for GC algorithm due to checking and updating references to live objects during GC compact operation. You are paying in performance for a lot of hidden cost that could be simple "sub" operation for memory arena :)

Edited by Mārtiņš Možeiko on September 12, 2018, 8:21am Reason: typos

Henry Kloren

#16311

September 12, 2018

understood. thanks very much for your help.

ratchetfreak

#16313

September 12, 2018

A general GC does not make any assumptions about your allocations which means it must be backed by a general allocator. This is the first big issue. You (in the majority of times) cannot specify whether an object is short lived or long lived, which objects should be allocated close together etc.

A custom allocation scheme (like HMH's arena) Puts a few restrictions on your allocations but this downside is much less than the gain you get from simplifying the operations needed to allocate and deallocate.

A GC also needs meta information about what you are allocating so it knows where the pointers are. If you are using dynamic typing (inheritance, tagged unions,...) then it also needs to tie into that to figure out which concrete class each object is. You can do without that if you assume that every pointed-aligned & pointer-sized value is a pointer, but this leads to phantom strong references that result in leaks.

GC can also leak object if you forget to null a reference to an object. Most common way this explodes is through a cache or map that doesn't get cleared.

A GC also needs a way to access all objects independently from the application's object graph. Running a full GC will also touch all memory the application allocated which will pollute the cache. The concurrent mark implementations need the executing thread to do some extra work when they read/write pointers to avoid race conditions leading to false negatives.

Neo Ar

#16321

September 13, 2018

HFKloren
The second is that GC is non-deterministic and can happen at any time, causing potential stalls when rendering a frame.

It should be noted that GCs do not have to be non-deterministic in this way, in D there are well defined points at which the garbage collector may be triggered. If you write your code with this in mind and the GC works in a well defined manner like D, you can avoid collections at critical times - the real problem is just how slow it is, in a game your entire window is the length of a frame and there is no "good time" to trigger a collection if the collection is slow, that time is needed for updating the game state and rendering the frame