I'm just curious if shaders are purely an API construct or if they are built into GPU hardware. Is it that OpenGL takes what you define as a shader and just interacts with the GPU in ways to make the shader behave properly? Or are there special cores or some other hardware construct on the GPU that deal with, say, vertex shader processing, and then other cores/constructs involved in fragment shader processing, etc.? If so, does this mean not all GPUs support all shader types, like compute shaders?
Shaders as in HLSL, GLSL or SPIR-V code exist purely as an API construct. The GPU does not know or care about those languages or their encodings.
What happens when you use one of these languages (or their binary encodings) is that the user-space part of the Direct3D, OpenGL or Vulkan driver compiles & optimizes the source code into GPU-specific instructions for the current state (what buffers are bound, what state is set, texture formats & similar info). Then it sends the produced instruction stream to the GPU to execute when you issue draw calls.
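For example, with OpenGL the GLSL text goes to the driver as a plain string and the driver's own compiler turns it into whatever the installed GPU understands. A minimal sketch (assuming a GL context already exists and the gl* functions are loaded):

    // GLSL source is just text as far as the API is concerned
    const char* fragment_source =
        "#version 330 core\n"
        "out vec4 color;\n"
        "void main() { color = vec4(1.0, 0.0, 0.0, 1.0); }\n";

    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 1, &fragment_source, NULL);
    glCompileShader(shader);                 // the driver's compiler runs here (or at link/draw time)

    GLint ok = 0;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(shader, sizeof(log), NULL, log);  // errors come back from the driver's compiler
    }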
In a lot of ways this transformation is similar to what a C compiler does to your C source code - it produces x86 or ARM or other instructions that can execute on the CPU. GPUs have different instructions depending on their architecture & model. The GPU driver that is loaded knows what GPU you have, so it knows what kind of instructions to produce from HLSL/GLSL/SPIR-V.
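You can even ask the driver for the machine code it produced. A sketch, assuming 'program' is a linked GL program and GL 4.1 / ARB_get_program_binary is available - the blob you get back is specific to this GPU and driver version, much like an object file is specific to x86 or ARM:

    GLint length = 0;
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);

    void* binary = malloc(length);
    GLenum format = 0;
    glGetProgramBinary(program, length, NULL, &format, binary);
    // 'binary' now holds whatever the driver compiled for this particular GPU;
    // unlike the GLSL source, it is not portable to other GPUs or drivers.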
Older GPUs ~15 years ago had dedicated cores for vertex or fragment shaders with different instructions & encodings. Nowadays they all have the so-called unified shader model - which means they have generic shader cores that can execute any shader workload.
For AMD and Intel you can find documentation on how these shader instructions need to be encoded (you won't be able to use them directly unless you're writing your own OS/driver). For example, for Intel check the intel-gfx-prm-osrc-tgl-vol02a-commandreference-instructions.pdf file for Tiger Lake GPU instructions - you can find docs for older Intel GPUs at https://01.org/linuxgraphics/documentation
Thanks for the links. I will check them out a little later and see if I can gain some insight.
Ah, okay. So this means that D3D, OpenGL, etc. are responsible for creating the correct series of instructions from all the shaders you pass them to get the desired behavior, as if there were still dedicated hardware on the GPU for each shader stage. If I got that right, then at this point in time are shaders even a necessary construct? I mean, why break it down into these separate stages instead of just writing a GPU program like we would a CPU program? Is this just for legacy reasons, or is there still an advantage to thinking about GPU programs in set stages like this?
I'm currently reading this series of blog posts by Fabian: https://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/ (it was written in 2011, but I'm hoping it's still pretty relevant today). From what I gather, there are technically two stages of compilation: there's a compilation stage that D3D performs itself, generating a sort of IR with some high-level optimizations completed, and then there's further compilation done by the user-mode graphics driver DLL, which gives you the final GPU assembly instructions. Am I understanding this correctly?
There is still some specialized dedicated hardware functionality that is not programmable - like rasterization, the zbuffer and blending. Also, caches behave a bit differently depending on what kind of memory you are loading (vertex input, pre/post-transform, etc.). So splitting shaders into stages is mainly for performance - to use this fixed functionality to your benefit.
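To make that concrete: depth testing and blending are configured, not programmed. From OpenGL they are just state switches on the fixed-function hardware that runs after your fragment shader (a sketch, assuming a current GL context):

    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);            // zbuffer comparison done by dedicated hardware

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);   // blending with the framebuffer, also fixed function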
But conceptually you are right - there's no reason it has to be split like this. You can already write compute shaders today that do exactly the same rasterization as the vertex/fragment shader combo. For many cases it will be fast enough and will work well.
This is basically what Unreal Engine 5 did with their Nanite rendering - custom geometry rendering that doesn't go through the regular vertex/fragment shaders.
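As a toy illustration of that idea (nothing like what Nanite actually does), here is a compute shader that "rasterizes" points by writing straight into an image, skipping the vertex/fragment stages entirely; the bindings and buffer layout are made up for the example:

    const char* compute_source =
        "#version 430\n"
        "layout(local_size_x = 64) in;\n"
        "layout(std430, binding = 0) buffer Points { vec4 points[]; };\n"   // xy assumed in [0..1]
        "layout(rgba8, binding = 1) uniform image2D target;\n"
        "void main() {\n"
        "    uint i = gl_GlobalInvocationID.x;\n"
        "    ivec2 size = imageSize(target);\n"
        "    ivec2 pixel = ivec2(points[i].xy * vec2(size));\n"
        "    imageStore(target, pixel, vec4(1.0));\n"                       // write the 'fragment' ourselves
        "}\n";
    // Compile with glCreateShader(GL_COMPUTE_SHADER) like before, link, bind the buffer and
    // the image, then glDispatchCompute(point_count / 64, 1, 1) instead of a draw call.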
Am I understanding this correctly?
Yes, that is exactly correct.
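To put those two stages in code: stage one is the D3DCompile call below, which produces the hardware-independent bytecode blob, and stage two happens inside the user-mode driver once you hand that blob to the device. A sketch - the HLSL string, entry point and target profile are just examples:

    #include <d3dcompiler.h>   // link with d3dcompiler.lib
    #include <string.h>

    const char* hlsl =
        "float4 main() : SV_Target { return float4(1, 0, 0, 1); }";

    ID3DBlob* bytecode = NULL;   // stage 1 output: hardware-independent D3D bytecode (the "IR")
    ID3DBlob* errors = NULL;
    HRESULT hr = D3DCompile(hlsl, strlen(hlsl), NULL, NULL, NULL,
                            "main", "ps_5_0", 0, 0, &bytecode, &errors);

    // Stage 2 runs inside the user-mode driver when the blob reaches the device, e.g.
    //   device->CreatePixelShader(bytecode->GetBufferPointer(),
    //                             bytecode->GetBufferSize(), NULL, &pixelShader);
    // That is where the bytecode gets translated into the actual GPU's instruction set.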