First of all, remember that the perspective divide involves a division by different amounts for each vertex position.
That is why you can't represent that with a single matrix(!).
You can try to incorporate that into a matrix, but then you'd need a slightly different matrix for each vertex (which nobody does...). And then you couldn't have that matrix as a "uniform" in OpenGL, because "uniforms" are provided .. well, "uniformly" to ALL vertex positions (same matrix in all vertex shader invocations).
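A tiny sketch of why that is, in plain Python (the numbers are made up for illustration): after a projection matrix runs, each vertex carries its *own* W, so the divide step varies per vertex and can't be folded into any single shared matrix.

```python
# The perspective divide divides each vertex by its OWN w component,
# so no single (uniform) matrix can perform it for all vertices at once.

def perspective_divide(clip):
    x, y, z, w = clip
    return (x / w, y / w, z / w)

# Two clip-space positions with different w components (illustrative values):
v0 = (2.0, 1.0, -1.5, 2.0)   # this one gets divided by 2.0
v1 = (2.0, 1.0, -4.5, 6.0)   # this one gets divided by 6.0

print(perspective_divide(v0))  # (1.0, 0.5, -0.75)
print(perspective_divide(v1))  # x and y shrink more: further away
```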
> From what you said it looks like the point of the perspective projection part is NOT to actually take a point and convert it to another point to get the perspective effect, but to get things into this -1 to 1 clip space which has z set up to be placed in the w component of the point's new clip-space vector position. This is so openGL can then actually perform the perspective divide, which gives the perspective look, during the transition to NDC space.
That is correct. Except that clip space is NOT really this -1 to 1 cube (yet...) - that's actually NDC space.
Clip space is "whatever it would need to be for it to be convertible to NDC space, by a perspective divide".
So, if you consider that what the perspective divide is going to do is "divide each vector by its own W component", then Clip space would have all of the vectors as they are to be in NDC space, "multiplied" by their own W component (the inverse of that Clip->NDC transformation). So, because NDC space is -1 to +1, the outskirts of the vertex positions in Clip space would have all their X, Y and Z coordinates between the negative and positive of their own W component (-W -> +W).
Any vertex position in Clip space that has an X, Y or Z component that is either greater than its own W component, or smaller than its own negative W component, would end up landing outside the -1 to +1 box in NDC space (after the perspective divide), and so would be considered outside the view frustum.
So in Clip space, it is pretty trivial to do "frustum culling", and reject any triangles that would end up outside the view frustum - by a simple comparison of their X, Y and Z components with their own W component.
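That comparison is literally all there is to the per-vertex test. A minimal sketch (assuming the usual case of a positive W):

```python
def inside_frustum(clip):
    # A vertex is inside the view frustum iff each of x, y, z
    # lies in the [-w, +w] range of its OWN w component.
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

print(inside_frustum((0.5, -0.3, 0.9, 1.0)))   # True: all within [-1, +1]
print(inside_frustum((2.5, 0.0, 0.0, 2.0)))    # False: x > w
```

Note there's no division anywhere - just comparisons, which is exactly the point.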
That's the purpose of this whole odd scheme in the first place - being able to cull/reject triangles that are outside the view frustum before the perspective divide (because a division has historically been considered a more expensive computation than a multiplication, or an addition/subtraction).
Additionally, for the same reason, it is also possible to Clip triangles (not to be confused with Cull, which is outright rejecting) that are only partially within the view frustum (say, one vertex is inside and the other 2 are outside, or vice versa). Triangles like that must be clipped, forming one or more new smaller triangles that have all their vertices inside the view frustum. And again, all of that before the perspective divide.
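To make that concrete, here's a sketch of clipping a single edge against just one frustum plane (the "right" plane, x <= w) - the kind of step that Sutherland-Hodgman-style clipping repeats for every edge against all six planes. The function and variable names are mine, and the numbers are illustrative:

```python
def lerp4(a, b, t):
    # Linearly interpolate all four components, w included.
    return tuple(a[i] + t * (b[i] - a[i]) for i in range(4))

def clip_edge_right_plane(a, b):
    """Return the surviving vertices of edge a->b after clipping to x <= w."""
    da = a[3] - a[0]          # w - x: >= 0 means inside this plane
    db = b[3] - b[0]
    out = []
    if da >= 0:
        out.append(a)         # keep the inside endpoint
    if (da >= 0) != (db >= 0):
        t = da / (da - db)    # parameter where the edge crosses the plane
        out.append(lerp4(a, b, t))
    return out

a = (0.0, 0.0, 0.0, 1.0)      # inside  (x <= w)
b = (3.0, 0.0, 0.0, 1.0)      # outside (x > w)
print(clip_edge_right_plane(a, b))   # inside vertex + new vertex with x == w
```

Note that the new vertex lands exactly on the plane (x equals w), so after the later divide it sits exactly on the +1 edge of the NDC box.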
This in fact is where Clip space gets its name. It's for frustum clipping (but also frustum culling).
When you're working with OpenGL, all your vertex shaders must output vertices in this Clip space(!) Always.
That means NO perspective divide on your part.
Otherwise things just don't work.
OpenGL needs to have the vertices before the perspective divide, so that it can do the frustum culling and clipping itself in hardware, and then apply the perspective divide (again, in hardware - parallelized to the max).
You should NEVER apply the perspective divide yourself, whenever working with ANY graphics API (It's the same for Vulkan, DirectX, and I presume Metal as well).
You only really need to worry about that if/when you're implementing your own rasteriser in software.
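In that software-rasteriser case, the divide and the viewport transform are on you, and they look roughly like this (a sketch with assumed conventions: top-left pixel origin, y flipped):

```python
def clip_to_screen(clip, width, height):
    x, y, z, w = clip
    # Step 1 - perspective divide: clip space -> NDC (-1..+1 cube).
    # This is the step the GPU normally does for you.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Step 2 - viewport transform: NDC -> pixel coordinates.
    sx = (ndc_x * 0.5 + 0.5) * width
    sy = (1.0 - (ndc_y * 0.5 + 0.5)) * height   # flip y for top-left origin
    return sx, sy, ndc_z

# A clip-space point dead-centre in view lands at the centre of the screen:
print(clip_to_screen((0.0, 0.0, 0.0, 1.0), 800, 600))  # (400.0, 300.0, 0.0)
```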
> I'm thinking about taking those new perspective coordinates and using the aspect ratio to come up with some equation to get -1 to 1 coordinates for clip space. And then these are the coordinates I'm passing to openGL.
With a better understanding of what I just said in this post, you should probably re-read my prior comment(s).
The role of the aspect ratio is to squeeze/stretch a rectangle of the screen proportion into a square (NDC space).
Later, after NDC space, the hardware would re-scale that NDC space back up to some rectangle of the same proportions (the viewport), just before rasterising triangles. This whole "normalized" device coordinate space (NDC) exists so that all the clipping and culling can happen in a resolution-independent way (hence the normalization).
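A minimal sketch of that round trip, with made-up numbers: dividing x by the aspect ratio squeezes a screen-proportioned rectangle into the square NDC range, and the viewport transform later stretches it back out to pixels.

```python
aspect = 16 / 9   # a widescreen proportion, for illustration

def squeeze_x(x):
    # Done by the projection: rectangle -> square.
    return x / aspect

def viewport_x(ndc_x, width):
    # Done by the API after NDC: square -> pixels.
    return (ndc_x * 0.5 + 0.5) * width

x_edge = aspect                     # rightmost visible x at some fixed depth
ndc = squeeze_x(x_edge)             # lands exactly on +1 in NDC
print(ndc, viewport_x(ndc, 1920))   # 1.0 1920.0
```

Notice the clipping/culling comparison against +1 in the middle never had to know whether the screen is 1920 or 800 pixels wide - that's the resolution independence.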
Again, whenever you use a graphics API, you don't (and actually shouldn't) concern yourself with any of this...
If you try to apply any of what the graphics API is going to do, it'll just end up being applied twice, producing a wrong result.
The role of the near and far clipping planes is to define how far into the screen, and how close to the screen, vertex positions should still be considered "in view" (within the view frustum).
The role of the fov (field of view) is to define specifically how far to the right and to the left of the camera (in its own space) vertex positions should still be considered within the view frustum. Obviously this determination changes with depth (vertices that are further away from the camera into the screen can still be considered in-view even when they are further away to the left or right than closer ones).
So, the fov ends up determining the maximum distance to the left or right for the vertices that are furthest away from the camera into the screen (right on the far clipping plane). That's one way to think about it.
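All four of these parameters (fov, aspect, near, far) come together in the classic OpenGL-style perspective matrix (the same form as the old gluPerspective). A sketch - the helper names are mine - showing that a point exactly on the near plane lands on the -1 face of the NDC cube after the divide, and a point on the far plane lands on +1:

```python
import math

def perspective(fovy_deg, aspect, near, far):
    # f controls the fov "squeeze"; the aspect division squares up x.
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0,  0.0,                         0.0],
        [0.0,        f,    0.0,                         0.0],
        [0.0,        0.0,  (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                         0.0],  # puts -z into w
    ]

def transform(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

m = perspective(60.0, 16 / 9, 0.1, 100.0)
x, y, z, w = transform(m, (0.0, 0.0, -0.1, 1.0))   # a point on the near plane
print(z / w)   # ≈ -1.0: right on the near face of the NDC cube
```

The bottom row is what makes all of this work: it copies -z into w, setting up that per-vertex divisor the rest of this post has been about.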