I am currently working on a 2D GUI framework built directly on top of Direct3D 11 that uses Z-buffering and early-discard to minimize fill load (as well as to avoid off-screen intermediate compositing and keep the number of draw calls to a minimum).
As the layout engine traverses the visual tree, it converts elements to quad data for the GPU along the way. It uses two intermediate buffers for this: one for opaque surfaces and one for [semi-]transparent surfaces. Quads are written as instance data into the appropriate buffer based on their originating element's effective opacity (its own opacity modulated by its parent element's opacity, and so on up the tree).
The tree is organized so that the traversal is effectively "front to back" (in terms of Z order; Z-values are assigned to the quads during this phase). Opaque quads are written to their buffer sequentially "left to right" (and will thus be rendered "front to back", with decreasing Z-values) while transparent quads are written backwards ("right to left") to their buffer (i.e. "back to front" in the eyes of the GPU, with increasing Z-values).
Then the buffers are copied (memcpy) into a single instance buffer which is mapped for the GPU (this has to be done every frame because the 2D surfaces are constantly changing in number, dimensions and properties).
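To make the buffer handling above concrete, here is a minimal CPU-side sketch. The names QuadInstance, QuadStaging and packInstances are hypothetical, the instance layout is made up for illustration, and the fixed-capacity transparent buffer is just one simple way to write "right to left"; the actual framework may do this differently. It follows the Z convention described above (larger Z is nearer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-instance layout; the real quad format is not shown in the post.
struct QuadInstance {
    float x, y, w, h;  // screen-space rectangle
    float z;           // depth assigned during traversal (larger = nearer here)
    float opacity;     // effective opacity (own opacity modulated by ancestors)
};

struct QuadStaging {
    std::vector<QuadInstance> opaque;       // filled left to right
    std::vector<QuadInstance> transparent;  // fixed capacity, filled right to left
    std::size_t transparentHead;            // index of the first used transparent slot

    explicit QuadStaging(std::size_t maxTransparent)
        : transparent(maxTransparent), transparentHead(maxTransparent) {}

    // Called in traversal order, i.e. front to back.
    void push(const QuadInstance& q) {
        if (q.opacity >= 1.0f) {
            opaque.push_back(q);  // stays front to back
        } else {
            assert(transparentHead > 0 && "transparent staging buffer is full");
            transparent[--transparentHead] = q;  // ends up back to front
        }
    }

    std::size_t transparentCount() const {
        return transparent.size() - transparentHead;
    }
};

// Copies both staging buffers into one contiguous block: opaque quads first,
// then transparent quads. In a real renderer `dst` would be the pointer
// obtained by mapping the GPU instance buffer. Returns the instance count.
std::size_t packInstances(const QuadStaging& s, QuadInstance* dst) {
    const std::size_t nOpaque = s.opaque.size();
    const std::size_t nTransparent = s.transparentCount();
    if (nOpaque > 0) {
        std::memcpy(dst, s.opaque.data(), nOpaque * sizeof(QuadInstance));
    }
    if (nTransparent > 0) {
        std::memcpy(dst + nOpaque,
                    s.transparent.data() + s.transparentHead,
                    nTransparent * sizeof(QuadInstance));
    }
    return nOpaque + nTransparent;
}
```

Because the transparent quads are written from the end of their buffer toward the start, no sorting or reversing pass is needed before the copy.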
The whole screen composition is rendered in just two draw calls:
The GPU blend state is initially not set (i.e. no alpha blending, which opaque objects do not need and which is faster to skip) before the first draw call is issued. That call points to the first part of the instance buffer (containing the front-to-back ordered non-transparent surfaces).
Then the blend state is set with alpha blending and the second draw call is issued, this time specifying buffer offsets to render the instances in the second part of the (same) buffer; the back-to-front ordered transparent surfaces.
This is very cheap and benefits both from using the z-buffer to early-discard occluded areas on the opaque surfaces and from having transparent surfaces render and blend correctly back to front (without having to do any manual blending).
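The two draw calls can be sketched as a small frame plan. DrawRange, FramePlan and planFrame are hypothetical names, and the code only computes the instance ranges rather than talking to the GPU. The comments show how I would expect the ranges to map onto ID3D11DeviceContext::OMSetBlendState and DrawInstanced, assuming a 4-vertex triangle strip per quad and an application-created blend state; treat that mapping as an assumption, not as the framework's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one instanced draw over the shared instance buffer.
struct DrawRange {
    std::uint32_t startInstance;  // offset of the first instance to draw
    std::uint32_t instanceCount;  // number of quads in this pass
    bool alphaBlend;              // whether alpha blending is enabled for the pass
};

struct FramePlan {
    DrawRange opaquePass;       // front to back, no blending
    DrawRange transparentPass;  // back to front, alpha blending
};

// Opaque instances occupy [0, opaqueCount); transparent instances follow them.
FramePlan planFrame(std::uint32_t opaqueCount, std::uint32_t transparentCount) {
    assert(opaqueCount <= UINT32_MAX - transparentCount);
    return FramePlan{
        DrawRange{0u, opaqueCount, false},
        DrawRange{opaqueCount, transparentCount, true}};
}

// Assumed mapping to Direct3D 11 (not compiled here; `ctx` and `alphaBlendState`
// are placeholders for objects the application would own):
//
//   ctx->OMSetBlendState(nullptr, nullptr, 0xffffffff);   // default state: blending off
//   ctx->DrawInstanced(4, plan.opaquePass.instanceCount,
//                      0, plan.opaquePass.startInstance);
//
//   ctx->OMSetBlendState(alphaBlendState, nullptr, 0xffffffff);
//   ctx->DrawInstanced(4, plan.transparentPass.instanceCount,
//                      0, plan.transparentPass.startInstance);
```

For example, planFrame(120, 30) would draw instances 0 to 119 without blending and then instances 120 to 149 with alpha blending.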
EDIT: In a game like Handmade Hero, it would be more difficult to leverage early-discard depth buffer techniques due to the irregularly shaped sprites that it typically has. Rendering quad based sprites front-to-back with depth buffering will in effect "block" the whole rectangular area covered by the quad and prevent subsequent rasterizing from reaching the pixels "behind" its transparent areas (i.e. the parts not covered by the sprite's actual graphics).
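The "blocking" effect can be shown with a toy one-row depth buffer. This is only an illustrative simulation with hypothetical names (SpriteRow, Row, drawQuad), using the same larger-Z-is-nearer convention as above; it is not how a GPU is programmed, but it mimics a quad that writes depth for every pixel it covers, including the fully transparent ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scanline of a sprite quad (hypothetical type, for illustration only).
struct SpriteRow {
    std::size_t x0;            // leftmost pixel covered by the quad
    std::vector<float> alpha;  // per-pixel alpha; 0 = outside the sprite's graphics
    float z;                   // larger = nearer
    int id;                    // stand-in for the sprite's color
};

// One scanline of the render target plus its depth buffer.
struct Row {
    std::vector<float> depth;  // cleared to 0 (farthest)
    std::vector<int> color;    // id of the visible sprite, 0 = background
    explicit Row(std::size_t width) : depth(width, 0.0f), color(width, 0) {}
};

// Draws the quad with a depth test and an unconditional depth write.
void drawQuad(Row& row, const SpriteRow& s) {
    for (std::size_t i = 0; i < s.alpha.size(); ++i) {
        const std::size_t x = s.x0 + i;
        if (x >= row.depth.size()) continue;  // clipped
        if (s.z <= row.depth[x]) continue;    // depth test fails: occluded
        row.depth[x] = s.z;                   // depth written for the whole quad
        if (s.alpha[i] > 0.0f) {
            row.color[x] = s.id;              // color only where the sprite has graphics
        }
    }
}
```

Drawing a near sprite with alpha {0, 1, 1, 0} first and a fully covering far sprite second leaves the two outer pixels showing the background, because the near quad has already claimed their depth. Drawing the same two sprites back to front gives the expected result.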