Preallocted memory and dynamic arrays

Farhod Miralimov

#12918

August 23, 2017

Hi all,

I'm having a little trouble understanding the concept of preallocating a block of memory, and only ever using that block for the game. It makes a lot of sense to me why you would do such a thing, but there are some details that I don't understand.

I'm fairly new to C, so here's my understanding of this method:
1. We allocate a block of memory (basically no layout info, so void* pointer of some size)
2. We pass a pointer to the block to the game loop, and the game loop casts the block to the layout/struct it expects: that way it can divide the memory into useful chunks as defined by the struct.
3. The platform code only knows the info as a block of memory, so I can freely change the game struct, and it won't break anything.

Here's the confusion:
When I want to add some info to the memory like player position or something, I can simply add:

struct game_memory {
    float player_x;
    float player_y;
}

But how do I add dynamic stuff like number of vertices that are loaded from an obj file? Only way I can see right now is defining a big array, and use only the parts I need:

struct transient_memory {
    vertex vertices[1000];
    int vertices_count;
}

and that way I can use the vertices_count as a "watermark", and clear it to zero if I don't need the vertices later.

Other resources out there solve the problem by using std::vector, which doesn't seem like a good idea to me.

So is there a better way to do this? Or is it fine as I described it?

ratchetfreak

#12919

August 23, 2017

when uploading to opengl/directX/... you can allocate a buffer and map that into memory.

Then you can build the vertex data in place and only need a little bit of actual data outside of that.

It helps if your asset format isn't hopelessly broken for that purpose like obj is. You need the count of vertexes that will end up in the buffer and the format of the vertexes before you allocate. So put that in the header of the asset file.

once you have read the header you allocate the buffer, map it and then dump the vertices directly into there.

For the times where you actually need the dynamic array then you can use a chunked linked list.

struct entity_chunk {
    entity entities[1000];
    entity_chunk* next;
}

Then you can allocate chunks as needed without needing to move them. if you empty out a chunk you can put it in a free list and reuse them as needed.

Simon Anciaux

#12922

August 23, 2017

sneiderlein
2. We pass a pointer to the block to the game loop, and the game loop casts the block to the layout/struct it expects: that way it can divide the memory into useful chunks as defined by the struct.

You don't cast the block to the layout/struct. What we want is to store the game_state struct at the beginning of the block and use the rest of the block to store other things.
Since we want it to be the first thing in the block, we can cast the address of the block to the game_state* type to access it. After that you can use the rest of the memory to store what you want.

|-----------------------------------| <- Memory we get from the win32 layer.
|game_state|other things------------| <- we want to treat it like this.
game_state* state = (game_state*) memory; // memory is void*. This is not the actual code from handmade hero. 

// Those next two lines are the values you would use to create a memory arena for the rest of the memory.
void* rest_of_the_memory = memory + sizeof( game_state );
u64 rest_of_the_memory_size = memory_size - sizeof( game_state );

The rest of the memory can be used to store any data you want. In handmade hero we use "push" functions with a memory arena to add the data at the first available byte in that memory and never release it. So its life time is the entire duration of the program.
If you need temporary memory (for example only for the duration of a function call) you can save the state of the memory arena (saving how much byte are used), allocate your data and process them, and finally restore the state of the memory arena (BeginTemporaryMemory / EndTemporaryMemory in handmade hero).
All these function allocate at the current position in the memory arena so there are no gap in the block.

|game_state|------------------------| push(...)
|game_state|x|----------------------| push(...)
|game_state|x|x|--------------------| BeginTemporary(...)
|game_state|x|x|--------------------| push(...)
|game_state|x|x|y|------------------| push(...)
|game_state|x|x|y|y|----------------| EndTemporary(...)
|game_state|x|x|--------------------| push(...)
|game_state|x|x|x|------------------|

In the case of a dynamic array:
- if the lifetime is just the duration of the function call use Begin/End temporary memory;
- otherwise you need to evaluate the maximum size and allocate (pushArray ...) that. You can also create a "sub arena" in the current arena and just push/pop things on that.

|game_state|x|x|--------------------| createSubArena(...)
|game_state|x|x|------|-------------| push(sub, ...)
|game_state|x|x|y|----|-------------| push(sub, ...)
|game_state|x|x|y|y|--|-------------| push(main, ...)
|game_state|x|x|y|y|--|x|-----------|

If you really have no idea of the size and want a dynamic array, you probably want to use a different memory allocator.

pragmatic_hero

#12929

August 23, 2017

What If I need something like N frame allocators?

Something like N vertex buffers where I don't know beforehand how many vertices there are going to be in each one at the end of the frame?
(sort of in a immediate mode rendering API).

And It'd be good if vertices were in continuous memory.

Right now i just use big fixed arrays (hooray!)

I was thinking about reserving N big chunks of virtual address space - each one for each "frame allocator" and then just committing pages as necessary. And then freeing most of them at the end of the frame. Is that A-okay to do that?

Edited by pragmatic_hero on August 23, 2017, 6:56pm

Farhod Miralimov

#12951

August 28, 2017

Hi again,

Thank you for help, it's much clearer now!
(to learn better, I like trying to explain stuff I understood in my own words, please correct me if I'm wrong)

1. If we preallocate a block of memory, and write our game_state struct to its lower part, then:

1	p_to_the_rest = pointer_to_the_beginning + sizeof(game_state);

2. We can create a function that adds/appends any data starting from p_to_the_rest and returns a pointer to it, which can act as an 'allocator'. Something like:

static void* allocate(int size, void* start){
    
    // check that it doesn't overflow
    assert((game_state->last_pointer + size) <= game_state->memory_size);

    // I would dereference the pointer to write to it, right?
    *game_state->last_pointer = *start;

    // update the position of the last_pointer
    game_state->last_pointer += size;
    
    // return pointer to the newly allocated data
    return game_state->last_pointer - size;
}

3. We could create dynamic array implementations using that allocator, like a linked list, and all of our data will always be allocated on that block.

4. Like @pragmatic_hero, I wanted to start by creating a simple 'immediate' 2D renderer, idea for which I got from Jonathan Blows streams, something like:

static void push_vertex(Vertex vertex){
    vertices[last_vertex_index++] = vertex;
}
static void render(){
   glBufferData(...push all the vertices to GPU...);
   //            mode       first      count
   glDrawArrays(GL_TRIANGLES, 0, last_vertex_index);
}

It wouldn't be very performant, but I wanted to start out with something very simple, and to have some 2D rendering for debugging (maybe some imgui style editor widgets).
And so a linked list wouldn't be a good idea here, because opengl wants continuous memory layout.
So I could simply allocate some space for the vertices (using the above method), and use a part of it, and clear it after every render call?

5. Or maybe a better Idea would be to do it modern opengl way (how I understand it), where several primitive shapes are created and stored on GPU on initialization, and we could bind a correct VAO and send a transform matrix as a uniform before each draw to change shapes size, rotation and position. Could someone please give an example for this? I'm a little lost on the specifics:

a. Do you need to use some projection matrix like orthographic?
b. Do you need to manage the Z positions of vertices? Or you could somehow rely on drawing order and discard the Depth Buffer completely?
c. Maybe a math library recommendation? GLM is the most mentioned, but I would appreciate some generic free, and more c-style API. So far I found gb_math, and there is even a library called HandmadeMath.

I'm having trouble including these libraries though, I get a lot of undefined identifier errors.

Appreciate your help!

ratchetfreak

#12952

August 28, 2017

sneiderlein

5. Or maybe a better Idea would be to do it modern opengl way (how I understand it), where several primitive shapes are created and stored on GPU on initialization, and we could bind a correct VAO and send a transform matrix as a uniform before each draw to change shapes size, rotation and position. Could someone please give an example for this? I'm a little lost on the specifics:

a. Do you need to use some projection matrix like orthographic?
b. Do you need to manage the Z positions of vertices? Or you could somehow rely on drawing order and discard the Depth Buffer completely?
c. Maybe a math library recommendation? GLM is the most mentioned, but I would appreciate some generic free, and more c-style API. So far I found gb_math, and there is even a library called HandmadeMath.

I'm having trouble including these libraries though, I get a lot of undefined identifier errors.

Appreciate your help!

5:

foreach(object in draw_queue){
glBindVertexArray(object.vao);
glUniformMatrix4fv(modelMatrix, 1, false, &object.modelMatrix);
glDrawArrays(GL_TRIANGLES, 0, object .last_vertex_index);
}

with opengl 4.3 (or the extension ARB_separate_attrib_format) you can use separate vertex attributes replace the call to glBindVertexArray with a call to glBindVertexBuffer(s) and keept he same vao bound which may be faster for modern drivers.

A: you don't need a perspective matrix. The vertex shader's output is expected to be in clip space where the geometry to be within the boundaries defined by -w < x, y, z < w. The convention is to use the model matrix to put the vertex data int a world space and then use the view-perspective matrix to transform that into clipspace.

B: opengl guarantees that the order of draws is respected by default. This allows you to implement the painters algorithm, or if using the alpha channel for it the reverse painters algorithm.

Marc Costa

#12955

August 28, 2017

ratchetfreak

A: you don't need a perspective matrix. The vertex shader's output is expected to be in clip space where the geometry to be within the boundaries defined by -w < x, y, z < w. The convention is to use the model matrix to put the vertex data int a world space and then use the view-perspective matrix to transform that into clipspace.

Just a small comment/correction on that: OpenGL's clip space is defined as a cube expanding from (-1, -1, -1) to (1, 1, 1) and centered at (0, 0, 0), so -w < x, y, z < w is exactly -1 <= x, y, z <= 1.

I find it very useful to output vertex coordinates between -1 and 1 when testing screen space rendering before going all the way in with perspective matrices and pixel perfect drawing :)

Simon Anciaux

#12956

August 28, 2017

Your function for n°2 isn't correct.

static void* allocate(int size, void* start){
    
    // Here you compare a pointer and a size.
    // You need to either compare 2 pointers or 2 size.
    // - replace game_state->memory_size with game_state->last_memory_address
    // - or replace game_state->last_pointer with game_state->used_memory;
    assert((game_state->last_pointer + size) <= game_state->memory_size);

    // You don't want to de-reference pointers here.
    // If you de-reference them you modify the content (what the address is pointing to).
    // Here we want to modify or store the address of the content.
    *game_state->last_pointer = *start;

    // You didn't de-reference here which is good.
    game_state->last_pointer += size;
    
    // return pointer to the newly allocated data
    return game_state->last_pointer - size;
}

// Here is your function "fixed"

typedef struct GameState {
    ...
    void* last_pointer;
    void* last_memory_address;
    ...
} GameState;

...
game_state->last_pointer = memory + sizeof( GameState );
game_state->last_memory_address = game_state->last_pointer + ( memory_size - sizeof( GameState ) );
...

void* allocate( int size ) {
    void* result;
    assert( game_state->last_pointer + size <= game_state->last_memory_address );
    result = game_state->last_pointer;
    game_state->last_pointer += size;
    return result;
}

// But this seems less intuitive to me than what Casey is doing in handmade hero.
// Memory arena.
typedef struct Data {
    void* base; // The first address of "user" memory. It never changes.
    size_t reserved; // The amount of "user" memory. It never changes.
    size_t used; // The amount of "user" memory used.
} Data;

...
data->base = memory + sizeof( GameState );
data->reserved = memory_size - sizeof( GameState );
data->used = 0;
...

void* allocate( Data* memory, size_t size ) {

    void* result = 0;
    assert( memory->used + size <= memory->reserved );

    if ( memory->used + size <= memory->reserved ) {
        result = memory->base + memory->used;
        memory->used += size;
    }

    return result;
}

3. A dynamic array is an array that will reallocate memory when needed. It generally means that it will allocate a new chunk of memory bigger than the previous one, copy the old content to the new location and then free the previous memory. The type of allocator we use here doesn't do that.

If you use the method of sub arena I talked earlier and you free the memory, it will leave an empty chunk of memory that you will not be able to use in the future (moving all the memory to fill the hole would not help because your pointers would not point to the right addresses anymore).

If you just push things at the end, it would be ok if you never have another function that need to push something.

You could use a linked list but that will probably add some overhead (node pointers, random memory access...). Or use what ratchetfreak said in his first post.

You can generally find a maximum memory size for the data set you are working with, refine it during development so its the smaller possible when you release your application.
For example, in your renderer: how many vertex do you think you will have at a maximum ? 100 ? 1000 ? 100000 ? What is the size of your vertex (position, uv, normals, color...) ? With that information you can infer a size. During development asserts will fire if you run out of memory and you can then re-evaluate the size and find why you have more then predicted (which may be a bug).

Don't be afraid to allocate too much during development, on modern computer you virtually have infinite memory.

You can have several memory arenas: one for the game logic, one for the renderer, one for the sound...

4. You can do that. What I do is allocate enough space for the number of vertex I want at the beginning of the program and fill that each frame and send it to the graphics card. Also see ratchetfreak's post (about mapping the memory). This can be helpfull to avoid performance problems.

5.
a. You don't need a projection matrix. But the projection matrix basically helps you use what coordinate system you want. If you are in 2d it could allow you to use pixels coordinate instead of the normalize cube, and chose if 0,0 is the center of the screen, or bottom left...
b. You chose (and test) if you need depth buffering or not. It depends of what you want to do. Handmade hero started without depth buffer and using a ordering algorithm but then switched to use a depth buffer.
c. Both handmade math and gb_math use the same method: you include the file once in each translation units. In ONE (and only one) of the translation unit you need to define a symbol: #define GB_MATH_IMPLEMENTATION or #define HANDMADE_MATH_IMPLEMENTATION before the include of the library. If you use the same system as handmade hero, you have 2 translation unit: the win32 layer and the game layer (that compiles to a dll). So you would have

1 2	#define HANDMADE_MATH_IMPLEMENTATION #include "HandmadeMath.h"

at the top of win32_handmade.cpp and handmade.cpp

pragmatic_hero

#12958

August 28, 2017

mrmixer

For example, in your renderer: how many vertex do you think you will have at a maximum ? 100 ? 1000 ? 100000 ?

That only works if you have one vertex buffer.
In an immediate style graphics API - the sort which doesn't make a drawcalls every step of the way - it's going to be something like MAX_VERTS * N.

Where N = 2 ^ (bits_in_render_key).

And renderkey could have 4 bits for textures (16textures), 2 bits for blendmode and 2 bits for (layer or shader) or something like that. You could easily have more than 8 bytes in render key.

And you probably don't wanna manually size every one of 256 vertex buffs.

So there, 256 vertex buffs * 10000 verts * 32 bytes per vertex = 81.92 megs
The moment its max 100.000 verts per buffer its whooping 819.2 megs.
And if you need 16bit - or more bit renderkeys - then well, its 256 times more!

Sure not every render key needs vertex buffers with 32bytes per vertex, but you get the point.

Don't be afraid to allocate too much during development, on modern computer you virtually have infinite memory.

The memory is totally finite.
It's the virtual memory address space which feels more infinite (was it 2^43 or 2^44?)
That's why i'm curious how slow/fast is VirtualAlloc commiting and VirtualFree decommit operations (doing it every frame I suppose)

Edited by pragmatic_hero on August 28, 2017, 6:54pm

Farhod Miralimov

#13005

September 3, 2017

Thanks to all!
I really appreciate your help, your answers really cleared a lot for me.
Thanks again,
Farhod.