First of all, remember that the perspective divide involves a division by different amounts for each vertex position.
That is why you can't represent that with a single matrix(!).
You can try to incorporate that in a matrix, but then you'd have to have a slightly different matrix for each vertex (which nobody does...). And then you couldn't have that matrix as a "uniform" in OpenGL, because "uniforms" are provided... well, "uniformly" to ALL vertex positions (the same matrix in all vertex shader invocations).
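Here's a tiny C++ sketch of just that divide (the Vec4 type and the two example positions are made up for illustration) - note how the two vertices get divided by different amounts:

```cpp
#include <cstdio>

struct Vec4 { float x, y, z, w; };

// The perspective divide: each vertex is divided by ITS OWN w.
// No single uniform matrix can do this, because the divisor
// differs per vertex.
Vec4 perspectiveDivide(Vec4 v) {
    return { v.x / v.w, v.y / v.w, v.z / v.w, 1.0f };
}

int main() {
    // Two hypothetical clip-space positions with different w:
    Vec4 a = { 2.0f, 2.0f, -1.0f, 2.0f };  // gets divided by 2
    Vec4 b = { 2.0f, 2.0f, -8.0f, 8.0f };  // gets divided by 8
    Vec4 na = perspectiveDivide(a);        // -> (1.0, 1.0, -0.5)
    Vec4 nb = perspectiveDivide(b);        // -> (0.25, 0.25, -1.0)
    printf("a in NDC: (%g, %g, %g)\n", na.x, na.y, na.z);
    printf("b in NDC: (%g, %g, %g)\n", nb.x, nb.y, nb.z);
}
```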
boagz57
From what you said it looks like the point of the perspective projection part is NOT to actually take a point and convert it to another point to get the perspective effect, but to get things into this -1 to 1 clip space which has z setup to be placed in the w component of the point's new clip space matrix/vector position. This is so openGL can then actually perform the perspective divide, which gives the perspective look, during the transition to NDC space.
That is correct. Except that clip space is NOT really this -1 to 1 cube (yet...) - that's actually NDC space.
Clip space is "whatever it would need to be for it to be convertibale to NDC space, by a perspective divide".
So, if you consider that what the perspective divide is going to do is "divide each vector by its own W component", then Clip space would have all of the vectors as they would be in NDC space, "multiplied" by their own W component (the inverse of that Clip->NDC transformation). And because NDC space is -1 to +1, the outskirts of the vertex positions in Clip space would have all their X, Y and Z coordinates between the negative and positive of their own W component (-W to +W).
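For example, a clip-space position of (2, 1, -3, 4) has W = 4, so the perspective divide turns it into the NDC position (0.5, 0.25, -0.75) - comfortably inside the -1 to +1 box, because each of its X, Y and Z was between -4 and +4.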
Any vertex position in Clip space that has an X, Y or Z component that is either greater than its own W component, or smaller than its own negative W component, would end up landing outside the -1 to +1 box in NDC space (after the perspective divide), and so would be considered outside the view frustum.
So in Clip space it is pretty trivial to do "frustum culling", and reject any triangles that would end up outside the view frustum - by a simple comparison of their X, Y and Z components with their own W component.
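As a sketch of that test (all names here are mine, not any API's; this uses OpenGL's -W <= Z <= +W convention - Direct3D uses 0 <= Z <= W instead):

```cpp
struct Vec4 { float x, y, z, w; };

// True if a clip-space position would land inside the -1..+1 NDC box
// after the perspective divide - WITHOUT actually doing the divide.
bool insideFrustum(Vec4 v) {
    return -v.w <= v.x && v.x <= v.w
        && -v.w <= v.y && v.y <= v.w
        && -v.w <= v.z && v.z <= v.w;
}

// A whole triangle can be trivially rejected (culled) when all three
// of its vertices are outside the SAME plane:
bool triviallyOutside(Vec4 a, Vec4 b, Vec4 c) {
    if (a.x >  a.w && b.x >  b.w && c.x >  c.w) return true; // right
    if (a.x < -a.w && b.x < -b.w && c.x < -c.w) return true; // left
    if (a.y >  a.w && b.y >  b.w && c.y >  c.w) return true; // top
    if (a.y < -a.w && b.y < -b.w && c.y < -c.w) return true; // bottom
    if (a.z >  a.w && b.z >  b.w && c.z >  c.w) return true; // far
    if (a.z < -a.w && b.z < -b.w && c.z < -c.w) return true; // near
    return false;
}
```

If all three vertices pass insideFrustum, the triangle needs no further attention; if they're outside different planes, it may still cross the frustum, and that's where clipping (below) comes in.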
That's the purpose of this whole odd scheme in the first place - being able to cull/reject triangles that are outside the view frustum before the perspective divide (because a division has historically been considered a more expensive computation than a multiplication, or an addition/subtraction).
Additionally, for the same reason, it is also possible to clip triangles (not to be confused with culling, which is outright rejecting) that are partially within the view frustum (say, one vertex is inside and the other 2 are outside, or vice versa). Triangles like that must be clipped, forming one or more new, smaller triangles that have all their vertices inside the view frustum. And again, all of that happens before the perspective divide.
This, in fact, is where Clip space gets its name: it's for frustum clipping (but also frustum culling).
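One step of that clipping, as a C++ sketch (names are mine; a real clipper would run this, Sutherland-Hodgman style, against all six planes): finding where a triangle edge crosses the near plane, which in Clip space is simply Z = -W:

```cpp
struct Vec4 { float x, y, z, w; };

Vec4 lerp(Vec4 a, Vec4 b, float t) {
    return { a.x + t * (b.x - a.x),
             a.y + t * (b.y - a.y),
             a.z + t * (b.z - a.z),
             a.w + t * (b.w - a.w) };
}

// Where an edge from p0 to p1 crosses the near plane (Z = -W in
// clip space). "Inside" means z + w >= 0; the crossing point sits
// where that signed distance hits zero.
Vec4 clipEdgeAgainstNear(Vec4 p0, Vec4 p1) {
    float d0 = p0.z + p0.w;        // signed "distance" of p0
    float d1 = p1.z + p1.w;        // signed "distance" of p1
    float t  = d0 / (d0 - d1);     // where the edge hits d = 0
    return lerp(p0, p1, t);
}
```

Note that the crossing point comes from plain linear interpolation - which works because Clip space is still linear. That's another reason the clipping has to happen before the perspective divide.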
When you're working with OpenGL, all your vertex shaders must output vertices in this Clip space(!) Always(!). That means NO perspective divide on your part. Otherwise things just don't work.
OpenGL wants to have the vertices before the perspective divide, so that it can do the frustum culling and clipping itself in hardware, and then apply the perspective divide (again, in hardware - parallelized to the max).
You should NEVER apply the perspective divide yourself, whenever working with ANY graphics API (It's the same for Vulkan, DirectX, and I presume Metal as well).
You only really need to worry about that if/when you're implementing your own rasteriser in software.
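In that software case, the order of operations would look roughly like this (a C++ sketch with stub functions - all names are mine):

```cpp
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Stubs standing in for the machinery described above - a real
// software rasteriser would fill these in.
Vec4 runVertexShader(Vec4 v) { return v; }                 // -> CLIP space
std::vector<Vec4> clipToFrustum(Vec4 a, Vec4 b, Vec4 c) {  // may split the tri
    return { a, b, c };
}
void rasterise(Vec4, Vec4, Vec4) {}

Vec4 perspectiveDivide(Vec4 v) {
    return { v.x / v.w, v.y / v.w, v.z / v.w, 1.0f };
}

// The order of operations is the point here:
void drawTriangle(Vec4 a, Vec4 b, Vec4 c) {
    // 1. "Vertex shader": ends in Clip space, NO divide yet.
    a = runVertexShader(a); b = runVertexShader(b); c = runVertexShader(c);
    // 2. Cull/clip against the frustum, still in Clip space.
    std::vector<Vec4> verts = clipToFrustum(a, b, c);
    // 3. Only now the perspective divide, per surviving vertex.
    for (Vec4 &v : verts) v = perspectiveDivide(v);
    // 4. Viewport transform + rasterisation come after this.
    if (verts.size() >= 3)
        for (std::size_t i = 1; i + 1 < verts.size(); ++i)
            rasterise(verts[0], verts[i], verts[i + 1]);   // triangle fan
}
```

With a hardware API you only ever write step 1; everything after it is done for you.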
boagz57
I'm thinking about taking those new perspective coordinates and using the aspect ratio to come up with some equation to get -1 to 1 coordinates for clip space. And then these are the coordinates I'm passing to openGL.
With a better understanding of what I just said in this post, you should probably re-read my prior comment(s).
The role of the aspect ratio is to squeeze/stretch a rectangle of the screen's proportions into a square (NDC space).
Later, after NDC space, the hardware API would re-scale that NDC space back up to some rectangle of the same proportions, just before rasterising triangles. This whole "normalized" device coordinate space (NDC) is there so that all the clipping and culling happen in a resolution-independent way (hence the normalization).
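That final re-scale is just the viewport transform - a sketch of it, assuming glViewport-style parameters (function and type names are mine):

```cpp
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// NDC -> window coordinates, the way glViewport(x0, y0, width, height)
// sets it up: the -1..+1 square is scaled back out to the given
// rectangle of pixels. (Z gets a similar treatment into the depth
// range, via glDepthRange - omitted here.)
Vec2 viewportTransform(Vec3 ndc, float x0, float y0,
                       float width, float height) {
    return { x0 + (ndc.x + 1.0f) * 0.5f * width,
             y0 + (ndc.y + 1.0f) * 0.5f * height };
}
```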
Again, whenever you use a graphics API, you don't (and actually shouldn't) concern yourself with any of this...
If you try to apply any of what the graphics API is going to do, it'll just end up being applied twice, producing a wrong result.
The role of the near and far clipping planes is to define how far into the screen, and how close to the screen, vertex positions should still be considered "in view" (within the view frustum).
The role of the fov (field of view) is to define specifically how far to the right and to the left of the camera (in its own space) vertex positions should still be considered within the view frustum. Obviously this determination changes along depth (vertices that are further away from the camera into the screen can still be considered in-view even when they are further away to the left or right than closer ones).
So, the fov ends up determining the maximum distance to the left or right at which vertices can still be in view - the largest such distance being for vertices that are furthest away from the camera into the screen (right on the far clipping plane). That's one way to think about it.
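To tie fov, aspect ratio, near and far together: here's a sketch of the usual OpenGL-style perspective matrix built from exactly those four parameters (the same shape gluPerspective / glm::perspective produce; written row-major here, and the function name is mine):

```cpp
#include <cmath>

// One common OpenGL-style perspective matrix.
// fovY is the VERTICAL field of view, in radians.
void perspectiveMatrix(float m[4][4], float fovY, float aspect,
                       float zNear, float zFar) {
    float f = 1.0f / std::tan(fovY * 0.5f);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            m[r][c] = 0.0f;
    m[0][0] = f / aspect;  // the aspect ratio squeezes X into the NDC square
    m[1][1] = f;           // the fov sets how much X/Y spread counts as in-view
    m[2][2] = (zFar + zNear) / (zNear - zFar);          // near/far map Z
    m[2][3] = (2.0f * zFar * zNear) / (zNear - zFar);   //   into -W..+W
    m[3][2] = -1.0f;       // copies -Z(eye) into W - the setup for the divide
}
```

That bottom row is the whole trick: it puts the eye-space depth into the W component, which is what makes the later per-vertex divide a divide by depth.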