@khahem
First, let me put things in perspective: this framework is for internal use, embedded in a broader platform architecture, and is not an external, standalone, general-purpose GUI 'library' meant for others to use in their own applications. Knowing exactly what the UI is designed to do, and everything it will ever need, allows some specialized simplifications and optimizations:
In essence everything is a rectangle (or at least defined by four corners), including lines. It uses a single custom-designed bitmap font (http://imgur.com/RcGe1kz) throughout the whole UI, so text is drawn as textured quads.
The key is to make everything representable as the same basic thing - in this case a [textured] quad - precisely to allow a single shader to render everything.
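To illustrate the "everything is a quad" idea, here is a minimal C++ sketch of how a line segment with a thickness could be expanded into four corners. The names `Vec2`, `Quad` and `lineToQuad` are hypothetical and not taken from the framework; the actual corner layout it uses is not described in the post.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Corner order: a+n, a-n, b-n, b+n, where n is half the thickness along
// the segment's normal. This walks consistently around the quad.
using Quad = std::array<Vec2, 4>;

// Expand the segment a->b into a quad by offsetting both end points
// along the segment's unit normal (hypothetical helper, for illustration).
Quad lineToQuad(Vec2 a, Vec2 b, float thickness) {
    float dx = b.x - a.x, dy = b.y - a.y;
    float len = std::sqrt(dx * dx + dy * dy);
    assert(len > 0.0f && thickness > 0.0f);
    float nx = -dy / len * (thickness * 0.5f);
    float ny =  dx / len * (thickness * 0.5f);
    return {{ {a.x + nx, a.y + ny}, {a.x - nx, a.y - ny},
              {b.x - nx, b.y - ny}, {b.x + nx, b.y + ny} }};
}
```

A horizontal line from (0, 0) to (10, 0) with thickness 2 becomes the rectangle spanning y = -1 to y = 1, which can then go through the same quad path as any other element.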
Although the whole UI could, in principle, be rendered in a single draw call, it isn't optimal (both with regard to overdraw and other things*). In actuality, the system uses 1-5 draw calls to render a complete screen update:
The system has 3 built-in texture atlases subject to varying update frequencies: one rarely-updated atlas for 'system' textures (font / glyphs, baked text phrases and digit groups, line patterns, symbols / icons etc), one for user-loaded images and one for secondary render targets (to allow subsystems to render custom content directly into GUI-accessible textures on the GPU** without having to rebuild an atlas). A bin packer converts lists of 'texture reservations' into a corresponding Texture2DArray on demand (using an algorithm similar to this: http://www.blackpawn.com/texts/lightmaps/default.html).
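For readers unfamiliar with the linked technique, the following is a rough C++ sketch of that kind of binary-tree rectangle packer. It is not the framework's packer: `Rect`, `Node` and `insert` are names made up for this example, and it packs a single slice only, whereas the real system fills a Texture2DArray.

```cpp
#include <cassert>
#include <memory>
#include <optional>

struct Rect { int x, y, w, h; };

// One node of the packing tree: either a leaf (free or occupied) or an
// interior node split into two children that together cover its rectangle.
struct Node {
    Rect rect;
    bool occupied = false;
    std::unique_ptr<Node> child[2];

    explicit Node(Rect r) : rect(r) {}

    // Returns the placed rectangle, or nothing if the reservation does not fit.
    std::optional<Rect> insert(int w, int h) {
        assert(w > 0 && h > 0);
        if (child[0]) {                       // interior node: try both halves
            if (auto r = child[0]->insert(w, h)) return r;
            return child[1]->insert(w, h);
        }
        if (occupied || w > rect.w || h > rect.h) return std::nullopt;
        if (w == rect.w && h == rect.h) {     // perfect fit: claim this leaf
            occupied = true;
            return rect;
        }
        // Split along the axis with more leftover space; the first child
        // then matches the request exactly along that axis.
        int dw = rect.w - w, dh = rect.h - h;
        if (dw > dh) {
            child[0] = std::make_unique<Node>(Rect{rect.x, rect.y, w, rect.h});
            child[1] = std::make_unique<Node>(Rect{rect.x + w, rect.y, dw, rect.h});
        } else {
            child[0] = std::make_unique<Node>(Rect{rect.x, rect.y, rect.w, h});
            child[1] = std::make_unique<Node>(Rect{rect.x, rect.y + h, rect.w, dh});
        }
        return child[0]->insert(w, h);
    }
};
```

A failed `insert` would be the natural point to start a new array slice, though how the framework handles that case is not stated.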
As the system is traversing the tree of elements and building the corresponding list of quads to be rendered, it puts each quad in one of 5 buffers (4 for opaque elements - solid color, atlas 1, atlas 2, atlas 3 - and 1 for transparent ones, similar to the simplified dual-buffer version described in the original post).
All the buffers are then concatenated into a single mapped / unmapped instance buffer.
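The bucketing and concatenation step could look roughly like the C++ sketch below. The `QuadInstance` fields, the `Bucket` enum and `buildBatches` are assumptions made for illustration; the post does not show the real instance layout. The point is that each bucket ends up as one contiguous range of the single instance buffer, and each non-empty range maps to one draw call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record (the real layout is not given).
struct QuadInstance {
    int  atlas;        // 0 = solid color, 1..3 = texture atlas
    bool transparent;
};

struct DrawRange { std::size_t first, count; };

enum Bucket { Solid = 0, Atlas1, Atlas2, Atlas3, Transparent, BucketCount };

struct Batches {
    std::vector<QuadInstance> instances;        // data for the one instance buffer
    std::array<DrawRange, BucketCount> ranges;  // one range per potential draw call
};

Batches buildBatches(const std::vector<QuadInstance>& quads) {
    std::array<std::vector<QuadInstance>, BucketCount> buckets;
    for (const QuadInstance& q : quads) {
        assert(q.atlas >= 0 && q.atlas <= 3);
        // Transparent quads share one bucket regardless of atlas.
        int b = q.transparent ? Transparent : q.atlas;
        buckets[b].push_back(q);
    }
    Batches out;
    for (int b = 0; b < BucketCount; ++b) {
        out.ranges[b] = {out.instances.size(), buckets[b].size()};
        out.instances.insert(out.instances.end(),
                             buckets[b].begin(), buckets[b].end());
    }
    return out;
}
```

Ranges with a count of zero can simply be skipped, which is one way to arrive at "1-5 draw calls" per update.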
The instances differ only by their designated pixel shader, so in the current version one vertex shader and three different pixel shaders are used ([solid color], [single texture sampling] and [solid color + triple texture sampling]).
First draw call uses a solid color pixel shader:
    float4 main(ps_in input) : SV_TARGET
    {
        return input.color;
    }
The second to fourth draw calls use a 'texel shader' that samples a single texture (one of the three atlases), since *one shader with a switch statement that samples several textures will sample all of them for each fragment every time and slow the shader down:
    float4 main(ps_in input) : SV_TARGET
    {
        return input.color * source.Sample(sState, float3(input.texcoord, input.slice));
    }
Fifth draw call renders all transparent quads - and (unfortunately) that shader has to use a switch statement to draw quads of any type (solid colored + texture atlas 1-3) in one go:
    float4 main(ps_in input) : SV_TARGET
    {
        [forcecase] switch (input.atlas)
        {
            case 1:
                return input.color * system.Sample(sState, float3(input.texcoord, input.slice));
            case 2:
                return input.color * images.Sample(sState, float3(input.texcoord, input.slice));
            case 3:
                return input.color * render.Sample(sState, float3(input.texcoord, input.slice));
            default:
                return input.color;
        }
    }
I haven't yet found a way around the latter compromise without having to do batching (that would force me to interleave smaller batches of quads by type like you mention), but I think the overall gain is well worth it. Most quads are opaque.
All opaque quads are rendered front to back without alpha blending. The transparent ones are rendered back to front, obviously with alpha blending.
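As a small illustration of the two orderings, here is a C++ sketch that sorts each group by depth. `DepthQuad` and both function names are hypothetical, and the convention that a larger depth value means "further away" is an assumption. The framework may well get these orders directly from its tree traversal rather than from an explicit sort.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record: larger depth = further from the viewer (assumed).
struct DepthQuad { int id; float depth; };

// Opaque quads: nearest first. Combined with a depth test (presumably what
// the framework relies on), hidden pixels of later quads can be rejected
// early, which reduces overdraw.
void sortFrontToBack(std::vector<DepthQuad>& opaque) {
    std::stable_sort(opaque.begin(), opaque.end(),
        [](const DepthQuad& a, const DepthQuad& b) { return a.depth < b.depth; });
}

// Transparent quads: furthest first, so that alpha blending composites
// each quad over whatever lies behind it.
void sortBackToFront(std::vector<DepthQuad>& transparent) {
    std::stable_sort(transparent.begin(), transparent.end(),
        [](const DepthQuad& a, const DepthQuad& b) { return a.depth > b.depth; });
}
```

`std::stable_sort` keeps quads at equal depth in their original (traversal) order, which seems the safer default for a UI.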
The only work needed between draw calls is to point the shader resources to the right texture buffer index. The pixel shader is switched between calls 1 and 2 and between calls 4 and 5. For the last call the blend state is also set.
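The per-frame state changes can be summarized with a small C++ sketch that builds the list of draw steps from the five instance counts. `Shader`, `DrawStep`, `planFrame` and `shaderSwitches` are illustrative names only; no real graphics API calls are shown, and skipping empty steps is an assumption about how the 1-5 call range comes about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Shader { SolidColor, SingleTexture, Combined };

struct DrawStep {
    Shader      shader;
    int         atlas;   // atlas selected for this call; 0 = none selected
                         // (solid color, or the combined shader with all bound)
    bool        blend;   // alpha blending enabled
    std::size_t count;   // number of quad instances
};

// counts[0] = opaque solid, counts[1..3] = opaque per atlas,
// counts[4] = transparent. Empty groups produce no draw call.
std::vector<DrawStep> planFrame(const std::size_t (&counts)[5]) {
    std::vector<DrawStep> steps;
    if (counts[0]) steps.push_back({Shader::SolidColor, 0, false, counts[0]});
    for (int a = 1; a <= 3; ++a)
        if (counts[a]) steps.push_back({Shader::SingleTexture, a, false, counts[a]});
    if (counts[4]) steps.push_back({Shader::Combined, 0, true, counts[4]});
    return steps;
}

// Number of pixel shader changes between consecutive draw calls.
int shaderSwitches(const std::vector<DrawStep>& steps) {
    int n = 0;
    for (std::size_t i = 1; i < steps.size(); ++i)
        if (steps[i].shader != steps[i - 1].shader) ++n;
    return n;
}
```

With all five groups populated this yields five steps and two shader switches, matching the description above; the atlas calls in between only change which texture is selected.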
**Should you need to render anything that the default shader(s) can't handle, you either provide it as a pre-rendered texture or you reserve and render dynamically into a secondary render target texture (bin packed and accessible via atlas 3) - using whatever shaders and logic you want. That way you don't have to split up the main batch just to interleave something "fancy".