Z-buffer and transparency in 2D rendering

I am currently working on a 2D GUI framework built directly on top of Direct3D 11 that uses Z-buffering and early-discard to minimize fill load (and to avoid off-screen intermediate compositing while keeping the number of draw calls to a minimum).

When the layout engine traverses the visual tree, it converts elements to quad data for the GPU along the way. It uses a couple of intermediate buffers for this: one for opaque surfaces and one for [semi-]transparent surfaces. Quads are written as instance data into the appropriate buffer based on their originating element's opacity (modulated by its parent elements' opacity, etc.).

The tree is organized so that the traversal is effectively "front to back" (in terms of Z order; Z-values are assigned to the quads during this phase). Opaque quads are written to their buffer sequentially "left to right" (and will thus be rendered "front to back", with decreasing Z-values) while transparent quads are written backwards ("right to left") to their buffer (i.e. "back to front" in the eyes of the GPU, with increasing Z-values).
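
As a rough illustration (all names here are hypothetical, not the framework's actual code), the emission step during traversal might look something like this:

// Hypothetical sketch of the dual-buffer emission described above.
// Opaque quads are appended left to right; transparent quads are written
// right to left, so reading their buffer start to end yields back-to-front order.
struct Quad { float x, y, w, h, z; /* color, texcoords, opacity... */ };

#define MAX_QUADS 4096
static Quad opaque[MAX_QUADS];
static Quad transparent[MAX_QUADS];
static int  opaqueCount;
static int  transparentCount;

// Called during tree traversal; z has already been assigned (front to back)
// and opacity is the element's opacity modulated by its parents'.
void EmitQuad(const Quad &q, float opacity)
{
    if (opacity >= 1.0f)
        opaque[opaqueCount++] = q;                        // "left to right"
    else
        transparent[MAX_QUADS - ++transparentCount] = q;  // "right to left"
}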

The buffers are then memcpied into a single instance buffer that is mapped to the GPU (this has to be done every frame, since the 2D surfaces are constantly changing in number, dimensions and properties).
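
Continuing the hypothetical sketch above, the per-frame upload could be done with a map-discard so the driver hands back fresh memory instead of stalling on a buffer still in flight ('context' and 'instanceBuffer' are assumed names):

// Hypothetical upload step: concatenate both CPU-side buffers into one
// dynamic (D3D11_USAGE_DYNAMIC) instance buffer.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(instanceBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);

Quad *dst = (Quad *)mapped.pData;
memcpy(dst, opaque, opaqueCount * sizeof(Quad));                   // part 1
memcpy(dst + opaqueCount,                                          // part 2
       transparent + (MAX_QUADS - transparentCount),
       transparentCount * sizeof(Quad));

context->Unmap(instanceBuffer, 0);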

The whole screen composition is rendered in just two draw calls:

No blend state is set before the first draw call is issued (i.e. no alpha blending; it is not needed for opaque surfaces, and rendering without it is faster). The call points to the first part of the instance buffer (containing the front-to-back ordered opaque surfaces).

Then the blend state is set to alpha blending and the second draw call is issued, this time specifying instance offsets so that it renders the second part of the (same) buffer: the back-to-front ordered transparent surfaces.
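
In D3D11 terms, the pair of calls might look roughly like this (a sketch with assumed names; the blend state object would be created once at startup, and each instance is drawn as a 4-vertex strip):

// Hypothetical sketch of the two draw calls; not the actual code.
float blendFactor[4] = {};

// Call 1: opaque quads, front to back, no blending, depth test + write on.
context->OMSetBlendState(nullptr, blendFactor, 0xffffffff);
context->DrawInstanced(4, opaqueCount, 0, 0);

// Call 2: transparent quads, back to front, alpha blending on; the start
// instance offset points at the second part of the same buffer.
context->OMSetBlendState(alphaBlendState, blendFactor, 0xffffffff);
context->DrawInstanced(4, transparentCount, 0, opaqueCount);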

This is very cheap and benefits both from using the Z-buffer to early-discard occluded areas of the opaque surfaces and from having the transparent surfaces render and blend correctly back to front (without any manual blending).

EDIT: In a game like Handmade Hero, it would be more difficult to leverage early-discard depth buffer techniques due to the irregularly shaped sprites it typically has. Rendering quad-based sprites front to back with depth buffering will in effect "block" the whole rectangular area covered by each quad and prevent subsequent rasterization from reaching the pixels "behind" its transparent areas (i.e. the parts not covered by the sprite's actual graphics).

I don't mean to be rude or offensive, but how's that relevant to the Handmade Hero code discussion? I see no question; are you advertising your UI framework? :)
Casey was discussing this exact point at some length in the day 60 stream (58:24), and my post was in reference to that:

https://www.youtube.com/watch?v=0_xzS8zxuq4&feature=youtu.be&t=58m24s

Also, this forum is not just for "questions", but for "code discussion".

(BTW how would I be "advertising" something that is not even mentioned by name? I was merely providing some context for the approach I was describing).

Interesting. In your system, do you use the same shader for decoration and text?

I'm working a bit on 2D GUI rendering too, and in the simple case it's quite easy to batch, but as soon as I mix transparency and custom shaders it gets a bit ugly.

My least unsatisfying idea for now is to build some kind of "overlap map" and allow batching until two overlapping elements with different shaders force me to start a new batch, but maybe I'm missing something obvious.

Maybe the right answer is simply to avoid fancy stuff and just draw everything with the same shader.
On the subject of GUIs, I happened to catch a bit of the pre-stream chat this week (it is very rare that I'm able to watch live), and Casey mentioned an excellent example of the IMGUI approach here.

I thought I would mention it since the pre-stream chat is not recorded and I think a lot of people are interested in seeing what the IMGUI approach looks like (well, at least I'm interested :) ).
@khahem

First let me put things in perspective: this framework is for internal use, embedded in a broader platform architecture, and is not an external / standalone general-purpose GUI 'library' meant for others to use in their own applications. Knowing exactly what the UI is designed to do / will ever need allows for some specialized simplifications and optimizations:

In essence everything is a rectangle (or at least defined by four corners), including lines. It uses a single custom designed bitmap font (http://imgur.com/RcGe1kz) throughout the whole UI, so text is drawn as textured quads.

The key is to make everything representable as the same basic thing - in this case a [textured] quad - precisely to allow a single shader to render everything.

Although the whole UI could, in principle, be rendered in a single draw call, that wouldn't be optimal (with regard to both overdraw and other things*). In actuality, the system uses 1-5 draw calls to render a complete screen update:

The system has 3 built-in texture atlases with varying update frequencies: one rarely-updated atlas for 'system' textures (font / glyphs, baked text phrases and digit groups, line patterns, symbols / icons, etc.), one for user-loaded images and one for secondary render targets (to allow subsystems to render custom content directly into GUI-accessible textures on the GPU** without having to rebuild an atlas). A bin packer converts lists of 'texture reservations' into a corresponding Texture2DArray on demand (using an algorithm similar to this: http://www.blackpawn.com/texts/lightmaps/default.html).
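
For reference, the node-split insertion from that article looks roughly like this (my own compressed C++ sketch of the linked algorithm, not the framework's code):

#include <stdlib.h>

// Each node is either free, occupied, or split into two children that
// cover the leftover space around a placed rectangle.
struct Node { Node *child[2]; int x, y, w, h; bool used; };

static Node *NewNode(int x, int y, int w, int h)
{
    Node *n = (Node *)calloc(1, sizeof(Node));
    n->x = x; n->y = y; n->w = w; n->h = h;
    return n;
}

// Returns the node where a (rw x rh) reservation was placed, or 0 if no fit.
static Node *Insert(Node *n, int rw, int rh)
{
    if (n->child[0])  // interior node: try both children
    {
        Node *r = Insert(n->child[0], rw, rh);
        return r ? r : Insert(n->child[1], rw, rh);
    }
    if (n->used || rw > n->w || rh > n->h) return 0;             // no room
    if (rw == n->w && rh == n->h) { n->used = true; return n; }  // exact fit

    // Split the leftover space along its longer axis, then recurse left.
    if (n->w - rw > n->h - rh)
    {
        n->child[0] = NewNode(n->x, n->y, rw, n->h);
        n->child[1] = NewNode(n->x + rw, n->y, n->w - rw, n->h);
    }
    else
    {
        n->child[0] = NewNode(n->x, n->y, n->w, rh);
        n->child[1] = NewNode(n->x, n->y + rh, n->w, n->h - rh);
    }
    return Insert(n->child[0], rw, rh);
}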

As the system traverses the tree of elements and builds the corresponding list of quads to be rendered, it puts each quad into one of 5 buffers (4 for opaque elements - solid color, atlas 1, atlas 2, atlas 3 - and 1 for transparent ones, similar to the simplified dual-buffer version described in the original post).
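
The routing decision itself is trivial; a hypothetical sketch (buffer indices assumed):

// 0 = solid color, 1-3 = one per atlas (all opaque), 4 = transparent.
int BufferIndexFor(float opacity, int atlas)
{
    if (opacity < 1.0f) return 4;  // all transparent quads share one buffer
    return atlas;                  // opaque quads grouped by required shader
}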

All the buffers are then concatenated into a single mapped / unmapped instance buffer.

The instances differ only by their designated pixel shader, so in the current version one vertex shader and three different pixel shaders are used ([solid color], [single texture sampling] and [solid color + triple texture sampling]).

The first draw call uses a solid color pixel shader:

float4 main(ps_in input) : SV_TARGET
{
    return input.color;
}

The second to fourth draw calls use a 'texel shader' that samples a single texture (one of the three atlases), since *having one shader with a switch statement to choose between several textures can end up sampling all of them for each fragment (when the branch is flattened) and slow the shader down:

Texture2DArray source;   // the atlas bound for this draw call (declaration assumed)
SamplerState sState;

float4 main(ps_in input) : SV_TARGET
{
    return input.color * source.Sample(sState, float3(input.texcoord, input.slice));
}

The fifth draw call renders all the transparent quads - and (unfortunately) that shader has to use a switch statement, since it must draw quads of any type (solid color + texture atlases 1-3) in one go:

Texture2DArray system, images, render;   // the three atlases (declarations assumed)
SamplerState sState;

float4 main(ps_in input) : SV_TARGET
{
    [forcecase] switch (input.atlas)
    {
    case 1:
        return input.color * system.Sample(sState, float3(input.texcoord, input.slice));
    case 2:
        return input.color * images.Sample(sState, float3(input.texcoord, input.slice));
    case 3:
        return input.color * render.Sample(sState, float3(input.texcoord, input.slice));
    default:
        return input.color;
    }
}

I haven't yet found a way around the latter compromise without resorting to batching (which would force me to interleave smaller batches of quads by type, like you mention), but I think the overall gain is well worth it. Most quads are opaque.

All opaque quads are rendered front to back without alpha blending. The transparent ones are rendered back to front, obviously with alpha blending.

The only work needed between draw calls is to point the shader resources at the right texture. The pixel shader is switched between calls 1 & 2 and between calls 4 & 5, and for the last call the blend state is also set.
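
Put together, a full frame might issue its calls roughly like this (a hypothetical sketch with assumed names; draw calls for empty buffers would simply be skipped, hence "1-5"):

// Hypothetical sketch of the 5-call sequence; not the actual code.
UINT offset = 0;
float blendFactor[4] = {};
context->OMSetBlendState(nullptr, blendFactor, 0xffffffff);

// Call 1: opaque solid color quads.
context->PSSetShader(psSolid, nullptr, 0);
context->DrawInstanced(4, count[0], 0, offset);  offset += count[0];

// Calls 2-4: opaque textured quads; same 'texel shader', different atlas.
context->PSSetShader(psTexel, nullptr, 0);
for (int atlas = 0; atlas < 3; ++atlas)
{
    context->PSSetShaderResources(0, 1, &atlasSRV[atlas]);
    context->DrawInstanced(4, count[1 + atlas], 0, offset);
    offset += count[1 + atlas];
}

// Call 5: all transparent quads, back to front, alpha blending on,
// all three atlases bound for the switch-based shader.
context->PSSetShader(psTransparent, nullptr, 0);
context->PSSetShaderResources(0, 3, atlasSRV);
context->OMSetBlendState(alphaBlendState, blendFactor, 0xffffffff);
context->DrawInstanced(4, count[4], 0, offset);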

**Should you need to render anything that the default shader(s) can't handle, you either provide it as a pre-rendered texture or you reserve and render dynamically into a secondary render target texture (bin packed and accessible via atlas 3) - using whatever shaders and logic you want. That way you don't have to split up the main batch just to interleave something "fancy".

@rathersleepy

Speaking of IMGUI.. Here's a video of (a bearded) Casey explaining his take on the concept a few years ago :)

http://mollyrocket.com/861