In your example, is it correct to think that Casey creates a stack and pushes a bunch of draw commands every frame?
Yes, but it's not just draw calls. Basically, he allocates an area of memory in the platform layer that is, as you suggested, treated like a stack in the game code, where he pushes whatever his version of a render command is. So he could have a render command like "DrawRect", which requires the user to pass in the pos, color, width, and height of the rect, and this information then gets pushed onto the render command stack (in the form of a struct like RenderEntry_DrawRect). So the function might look like:
void GPUCmd_DrawRect(RenderCmdBuffer* cmdBuffer, v2 pos, Color color, f32 width, f32 height)
{
    //Here's where the entry is actually added to the command stack
    RenderEntry_DrawRect* rectEntry = RenderCmdBuf_Push(cmdBuffer, RenderEntry_DrawRect);
    //Fill in the values
    rectEntry->header.type = EntryType_DrawRect;
    rectEntry->color = color;
    rectEntry->pos = pos;
    rectEntry->width = width;
    rectEntry->height = height;
}
but you can also add, say, a "ClearColorBuffer" command where the user passes in what color to clear the window to, or a "LoadTextureBuffer" command, or anything else you would want to add. It's important to note that, with these commands and the way Casey has it set up in his code, no actual OpenGL calls are made in this API layer. It is only for setting up a render command buffer. This command buffer/stack (whatever you wanna call it) is then eventually passed to an OpenGL layer of the code where the commands are actually processed. In his OpenGL layer, he just has a switch statement which checks what command is next on the stack and performs the correct operation for it. E.g.:
//Inside opengl.h within the processing function
u8* currentRenderBufferEntry = bufferToRender.baseAddress;
for (s32 entryNumber = 0; entryNumber < bufferToRender.entryCount; ++entryNumber)
{
    RenderEntry_Header* entryHeader = (RenderEntry_Header*)currentRenderBufferEntry;
    switch (entryHeader->type)
    {
        //Other render entry cases....
        case EntryType_DrawRect:
        {
            RenderEntry_DrawRect rectEntry = *(RenderEntry_DrawRect*)currentRenderBufferEntry;
            //Where you actually perform whatever opengl calls you need.
            glBegin(GL_QUADS);
            glColor3f(rectEntry.color.r, rectEntry.color.g, rectEntry.color.b);
            glVertex2f(rectEntry.pos.x, rectEntry.pos.y);
            glVertex2f(rectEntry.pos.x + rectEntry.width, rectEntry.pos.y);
            glVertex2f(rectEntry.pos.x + rectEntry.width, rectEntry.pos.y + rectEntry.height);
            glVertex2f(rectEntry.pos.x, rectEntry.pos.y + rectEntry.height);
            glEnd();
            currentRenderBufferEntry += sizeof(RenderEntry_DrawRect);
        } break;
    }
}
And yes, this building and processing of the command buffer is done every frame.
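To make the push/process split a bit more concrete, here is a minimal self-contained sketch in plain C. It is not Casey's actual code: PushEntry, PushClear, PushDrawRect, ProcessCommands, ResetCmdBuffer and RenderStats are all names I made up for illustration, and the "backend" just tallies what it sees instead of calling OpenGL, so you can compile it without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { EntryType_Clear, EntryType_DrawRect } EntryType;

// Every entry starts with a header so the backend can tell what follows.
typedef struct { EntryType type; } RenderEntry_Header;
typedef struct { RenderEntry_Header header; float r, g, b; } RenderEntry_Clear;
typedef struct { RenderEntry_Header header; float x, y, width, height; } RenderEntry_DrawRect;

typedef struct {
    uint8_t *baseAddress; // memory handed over by the platform layer
    size_t   capacity;    // total bytes available
    size_t   used;        // bytes pushed so far this frame
    int      entryCount;
} RenderCmdBuffer;

// Hypothetical helper: bump-allocate `size` bytes and tag them with `type`.
static void *PushEntry(RenderCmdBuffer *buf, size_t size, EntryType type)
{
    assert(buf->used + size <= buf->capacity);
    RenderEntry_Header *header = (RenderEntry_Header *)(buf->baseAddress + buf->used);
    header->type = type;
    buf->used += size;
    buf->entryCount++;
    return header;
}

static void PushClear(RenderCmdBuffer *buf, float r, float g, float b)
{
    RenderEntry_Clear *e = PushEntry(buf, sizeof *e, EntryType_Clear);
    e->r = r; e->g = g; e->b = b;
}

static void PushDrawRect(RenderCmdBuffer *buf, float x, float y, float w, float h)
{
    RenderEntry_DrawRect *e = PushEntry(buf, sizeof *e, EntryType_DrawRect);
    e->x = x; e->y = y; e->width = w; e->height = h;
}

// Stand-in for the OpenGL layer: walks the entries in order and counts them.
typedef struct { int clears; int rects; float totalArea; } RenderStats;

static RenderStats ProcessCommands(const RenderCmdBuffer *buf)
{
    RenderStats stats = {0};
    const uint8_t *cursor = buf->baseAddress;
    for (int i = 0; i < buf->entryCount; ++i) {
        const RenderEntry_Header *header = (const RenderEntry_Header *)cursor;
        switch (header->type) {
        case EntryType_Clear:
            stats.clears++; // a real backend would clear the framebuffer here
            cursor += sizeof(RenderEntry_Clear);
            break;
        case EntryType_DrawRect: {
            const RenderEntry_DrawRect *rect = (const RenderEntry_DrawRect *)cursor;
            stats.rects++;  // a real backend would submit the quad here
            stats.totalArea += rect->width * rect->height;
            cursor += sizeof(RenderEntry_DrawRect);
        } break;
        }
    }
    return stats;
}

// Called once the frame has been processed, so the next frame starts empty.
static void ResetCmdBuffer(RenderCmdBuffer *buf)
{
    buf->used = 0;
    buf->entryCount = 0;
}
```

Per frame you would push whatever the game wants drawn, hand the buffer to ProcessCommands, then call ResetCmdBuffer. The memory itself is allocated once (e.g. with malloc at startup) and reused, so building the buffer costs no allocations.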
Casey's renderer did start out as 2d only but has since evolved into a 2d/3d hybrid. Basically he still passes quads to the renderer but they also have z positions and the actual world is rendered as a 3d world.
So most of the time, optimizing the renderer just boils down to reducing the amount of data and the number of draw calls that the GPU needs to handle, right?
Broadly speaking, yes. Generally your goal is to batch as much data as you can and feed it to the GPU. If your game is slow, it's more than likely because it's CPU bound and the GPU is sitting idle (GPUs are incredibly powerful nowadays). For Casey, I know at one point he started passing just one giant buffer for textures and one for vertex data (I think), and you would then just index into that data when drawing (which beats having to constantly buffer data individually). Though I'm really not very specialized in rendering techniques and don't have much experience with modern high-performance rendering.
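As a rough idea of what "one giant buffer that you index into" can look like on the CPU side, here is a small C sketch. This is my own illustration and not how Casey's renderer is laid out: QuadBatch, Vertex, Batch_PushQuad and MAX_QUADS are hypothetical names. Each rect appends four vertices and six indices to shared arrays, so the whole frame can be handed to the GPU in one go rather than one draw per rect.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUADS 1024 // hypothetical per-frame limit

typedef struct { float x, y; float r, g, b; } Vertex;

typedef struct {
    Vertex   vertices[MAX_QUADS * 4]; // 4 corners per quad
    uint16_t indices[MAX_QUADS * 6];  // 2 triangles per quad
    int      vertexCount;
    int      indexCount;
} QuadBatch;

// Append one colored rect to the shared vertex/index arrays.
static void Batch_PushQuad(QuadBatch *batch, float x, float y, float w, float h,
                           float r, float g, float b)
{
    assert(batch->vertexCount + 4 <= MAX_QUADS * 4);

    uint16_t base = (uint16_t)batch->vertexCount;
    Vertex *v = &batch->vertices[batch->vertexCount];
    v[0] = (Vertex){ x,     y,     r, g, b };
    v[1] = (Vertex){ x + w, y,     r, g, b };
    v[2] = (Vertex){ x + w, y + h, r, g, b };
    v[3] = (Vertex){ x,     y + h, r, g, b };
    batch->vertexCount += 4;

    // Two triangles sharing the diagonal: (0,1,2) and (0,2,3).
    uint16_t *idx = &batch->indices[batch->indexCount];
    idx[0] = base;
    idx[1] = (uint16_t)(base + 1);
    idx[2] = (uint16_t)(base + 2);
    idx[3] = base;
    idx[4] = (uint16_t)(base + 2);
    idx[5] = (uint16_t)(base + 3);
    batch->indexCount += 6;
}
```

With OpenGL, the idea would be to upload both arrays once per frame and issue a single indexed draw (something like glDrawElements with indexCount indices) instead of a separate draw per rect. Resetting vertexCount and indexCount to zero starts the next frame.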