How to get correct perspective projection

To be clear, OpenGL expects clip space coordinates at the end of the vertex shader, not normalized coordinates. Clip space coordinates (what you put in gl_Position) are between -w and w on all axes when they are visible from the camera. The clipping stage tests each of a point's components (x, y, z) against its w component to determine whether the point is in view (-w < x < w; -w < y < w; -w < z < w for the point to be visible). The perspective divide, which is executed between the vertex shader and the fragment shader, then divides the coordinates by w to produce the normalized device coordinates (NDC).
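
As a rough illustration of the clip test and the divide described above, here is a minimal C++ sketch. The Vec4/Vec3 types and both function names are made up for this example; OpenGL performs these steps itself, so nothing like this belongs in real rendering code. The comparisons here are inclusive, which treats points exactly on the boundary as visible.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// True when the clip-space point lies inside the view volume:
// -w <= x <= w, -w <= y <= w, -w <= z <= w.
bool IsInsideClipVolume(const Vec4& p)
{
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           -p.w <= p.z && p.z <= p.w;
}

// The perspective divide: clip space -> normalized device coordinates.
Vec3 PerspectiveDivide(const Vec4& p)
{
    assert(p.w != 0.0f); // the divide is undefined for w == 0
    return Vec3{p.x / p.w, p.y / p.w, p.z / p.w};
}
```

For a visible point every NDC component ends up between -1 and 1, which is why there is no need to convert to that range yourself before writing gl_Position.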


Yep, this was me being dumb. I was definitely trying to convert everything to -1 to 1 space first. Arnon definitely stated this but I still wasn't reading things thoroughly enough. There were a couple of other assumptions I was making that were screwing up my calculations and making me come up with values that didn't make sense. Everyone on this thread has been an enormous help, especially you Arnon, and this information should be plenty for me to fully understand this process. I plan on posting code here in the near future: a complete yet stripped-down version of this pipeline for anyone else who's struggling with this to look through. I'll try to make it so you can copy and paste it into your own OpenGL-ready code and step through it as well.
As for what happens to X and Y, I have already explained it conceptually at length in my prior comments, but mathematically, it really depends on what your inputs are - meaning, how you are representing the "ratio" I was referring to many times.

That is why if you search the web for "perspective projection matrix", you are likely to come across seemingly very different matrix formulations(!) Even for just OpenGL(!!!)

I remember I found that VERY confusing when I first tried to learn about it...

The reason is, again, that at least for the X and Y scaling factors, the formulation is always in terms of what the inputs to the matrix are (in other words, what the arguments are to the function that creates the matrix).

If you've chosen to represent the magnitude of the perspective as an "fov angle" (field of view), then you'll need a bit of trigonometry to extract a ratio out of that:
The fov angle typically represents the "horizontal" angle between the left and right "planes" of the view frustum.

The ratio you're after, just to reiterate, is the ratio between half of the width of the projection plane and its distance from the camera. All in world-units (NOT PIXELS!!!)

You can place the projection plane at any distance you want.
It's actually completely unrelated and decoupled from how far the near clipping plane is...
Even though people tend to like to imagine projecting the scene onto the near clipping plane.

It doesn't really matter, you can do that - it's arbitrary, because what really matters is the ratio between its size and its distance. You can push it forward to twice its distance, but then scale it up by 2 (in both width and height, maintaining the aspect ratio), and that "ratio" would still be the same.

If you only change its distance but not its size, or vice-versa, that would change the ratio.
Changing the fov angle is basically scaling the projection plane up or down (by the same amount in both width and height), while keeping the same distance.

To get the ratio between half of the width of the projection plane and its distance, a simple observation from the top shows that it's the tangent of half of the horizontal fov angle (opposite over adjacent).

scaling factor = (projection plane width / 2) / projection plane distance = tan( fov_angle / 2 )

Now, remember that I said that we want the focal length to be constant at 1, and that if it's different then we'd simply scale horizontally and vertically by the "inverse" amount to compensate? Well, that's why you actually need to take the "inverse" of that ratio, which would be the distance to the camera over half the width, or adjacent over opposite, or the cotangent of half the angle, or 1 over the tangent of half the angle:

scaling factor = projection plane distance / (projection plane width / 2) = cot( fov_angle / 2 ) = 1 / tan( fov_angle / 2 )
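
To make the two formulas above concrete, here is a small C++ sketch. The function names are invented for this example, and it assumes the fov is the horizontal angle in radians, as described above.

```cpp
#include <cassert>
#include <cmath>

// Ratio of half the projection-plane width to the plane's distance from the
// camera, both in world units. Equals tan(fov / 2) for a horizontal fov.
float HalfWidthOverDistance(float planeWidth, float planeDistance)
{
    assert(planeDistance > 0.0f);
    return (planeWidth * 0.5f) / planeDistance;
}

// The scaling factor is the inverse of that ratio: cot(fov / 2).
float ScaleFromHorizontalFov(float fovRadians)
{
    float tanHalfFov = std::tan(fovRadians * 0.5f);
    assert(tanHalfFov > 0.0f); // holds for 0 < fov < pi
    return 1.0f / tanHalfFov;
}
```

A plane of width 2 at distance 1 and a plane of width 4 at distance 2 give the same ratio, which is the point about the plane's placement being arbitrary. A 90 degree fov gives a scaling factor of about 1.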

Lastly, remember that we want to squeeze the rectangle into a square.
For the ratio, we've considered the width, so the width is going to be considered the "reference" dimension of the square.
The height is usually going to be shorter, so to get a square the height needs to be stretched up to the length of the width. That's done by scaling the vertical axis by how much longer the width is than the height (a.k.a. the "aspect ratio").

aspect_ratio = width / height

The width and height are "technically" the dimensions of the projection plane, but since we only care about their ratio, we can use the render resolution instead.

So the scaling factors are:

X = cot( fov_angle / 2 )
Y = cot( fov_angle / 2 ) * aspect_ratio

Notice how in both terms we're dealing with ratios - that reflects how it really doesn't matter where the projection plane is or how big it is; all that matters is how much to squeeze the rectangle into a square, and how wide the plane is relative to its distance from the camera. So you never really need to compute the actual dimensions of the projection plane or its distance from the camera, because these are arbitrary and irrelevant.

So, every vertex position in view(camera)-space needs to have its .x component multiplied by that X factor, and its .y component multiplied by that Y factor.
That is why if you search the web for "perspective projection matrix", you are likely to come across seemingly very different matrix formulations(!) Even for just OpenGL(!!!)
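
A minimal C++ sketch of applying those two factors to one view-space position. The types and the function name are hypothetical, and the fov is assumed to be the horizontal angle in radians, matching the X and Y formulas above. Note that glm::perspective takes a vertical fov instead, so with that convention the aspect ratio shows up as a divisor in X rather than a multiplier in Y.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Scales view-space x and y by X = cot(fov / 2) and Y = cot(fov / 2) * aspect.
Vec2 ScaleViewSpaceXY(const Vec3& viewPos, float horizontalFovRadians, float aspectRatio)
{
    float tanHalfFov = std::tan(horizontalFovRadians * 0.5f);
    assert(tanHalfFov > 0.0f && aspectRatio > 0.0f);
    float xFactor = 1.0f / tanHalfFov;
    float yFactor = xFactor * aspectRatio;
    return Vec2{viewPos.x * xFactor, viewPos.y * yFactor};
}
```

The returned values are what would go into the x and y of gl_Position; the divide by w happens later and is not part of this step.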

I remember I found that VERY confusing when I first tried to learn about it...


This. That's why I've been asking on here because almost everything I search tries to explain things with matrices and their own defined coordinate systems and it gets very difficult to sift through when trying to learn the fundamentals. I need to try and learn this with my own numbers/framework for the time being.

It's funny, going back and re-reading your comments I'm really seeing everything you were explaining more clearly now. I think when you're first learning you tend to really focus on certain parts of explanations and gloss over other sections you think you understand (though you usually don't). I guess that's the nature of learning. Trust me, I will be going over all your explanations thoroughly over the next few days.

Okay, so I have code that seems to be outputting the correct perspective look for 6 vertices that I'm using to represent the side of a cube. I say this because I projected these verts with my own equations and then with glm's perspective projection matrix and got the same results for the cube. Here is the code I used for the projection:

    //Thinking of vert positions in meters
    Array<glm::vec4, 6> squareVerts_meters =
    {
        glm::vec4{2.0f, 1.0f, 3.0f, 1.0f},
        glm::vec4{2.4f, 1.0f, 2.0f, 1.0f},
        glm::vec4{4.0f, 1.0f, 3.0f, 1.0f},
        
        glm::vec4{2.0f, -0.5f, 3.0f, 1.0f},
        glm::vec4{2.4f, -0.5f, 2.0f, 1.0f},
        glm::vec4{4.0f, -0.5f, 3.0f, 1.0f}
    };
    
    Array<glm::vec4, 6> squareVerts_openGLClipSpace;

    {//Projection transform
        f32 fov = glm::radians(90.0f);
        f32 aspectRatio = 16.0f/9.0f;
        f32 tanHalfFov = TanR(fov / 2.0f);
        f32 xScale = 1.0f / (tanHalfFov * aspectRatio);
        f32 yScale = 1.0f / tanHalfFov;
        
        for(i32 vertI{}; vertI < 6; ++vertI)
        {
            squareVerts_openGLClipSpace[vertI].x = squareVerts_meters[vertI].x * xScale;
            squareVerts_openGLClipSpace[vertI].y = squareVerts_meters[vertI].y * yScale;
            squareVerts_openGLClipSpace[vertI].z = 1.0f;
            squareVerts_openGLClipSpace[vertI].w = squareVerts_meters[vertI].z;
        };
    };

    ...code sending verts to openGL


and the resulting cube image:



Now, in order to have a more comprehensive understanding of things I tried to project/transform my verts using focal length calculations as well. The code for that is as follows:

Array<glm::vec4, 6> squareVerts_meters =
    {
        glm::vec4{2.0f, 1.0f, 3.0f, 1.0f},
        glm::vec4{2.4f, 1.0f, 2.0f, 1.0f},
        glm::vec4{4.0f, 1.0f, 3.0f, 1.0f},
        
        glm::vec4{2.0f, -0.5f, 3.0f, 1.0f},
        glm::vec4{2.4f, -0.5f, 2.0f, 1.0f},
        glm::vec4{4.0f, -0.5f, 3.0f, 1.0f}
    };
    
    Array<glm::vec4, 6> squareVerts_openGLClipSpace;
    
    {//Projection transform
        f32 focalLength = .8f;
        
        for(i32 vertI{}; vertI < 6; ++vertI)
        {
            f32 distanceToPoint = focalLength + squareVerts_meters[vertI].z;
            f32 perspectiveDivide = (distanceToPoint/focalLength);
            f32 aspectRatio = 16.0f/9.0f;
            
            squareVerts_openGLClipSpace[vertI].x = squareVerts_meters[vertI].x;
            squareVerts_openGLClipSpace[vertI].y = squareVerts_meters[vertI].y * aspectRatio;
            squareVerts_openGLClipSpace[vertI].z = 1.0f;
            squareVerts_openGLClipSpace[vertI].w = perspectiveDivide;
        };
    };
    
...code sending verts to openGL


image for this cube is here:



The only thing is I'm getting a slightly different perspective look on the second image using focal length. I'm not quite sure yet why this is - if I'm miscalculating something or if it's just because focal length adjustments will create a different perspective look due to zooming (as I understand focal length typically deals with zooming in or out from the scene).

1. The image links don't work.

2. Your first code sample is partially correct:
X, Y seem correct.
W is correct, only if your coordinate system is left handed, otherwise it should be negated.
Z is incorrect (Not going to repeat what has already been explained to death, just go over it again if you need to).

3. Your second code example is wholly incorrect:
Focal length is an alternative/more-direct way of getting at the perspective magnitude ratio,
which needs to be accounted for in X and Y.
Your X and Y there are not accounting for perspective at all, so they are incorrect.
W should always just have the original Z (or its negative, as mentioned above).
Z should not use the focal length at all. It should account for the near and far clipping plane distances.
There shouldn't be anything in your code named "perspective divide", again, that's not your code's role to care about.
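
For reference, one way the focal length could enter X and Y along the lines described above, as a hedged C++ sketch. The function name and the planeHalfWidth parameter are assumptions made for this example: the focal length is treated as the distance to the projection plane, so focalLength / planeHalfWidth is the same ratio as cot(fov / 2) for a horizontal fov. A left-handed view space is assumed, and Z is left as a placeholder.

```cpp
#include <cassert>

// Hypothetical type for illustration only.
struct Vec4 { float x, y, z, w; };

// focalLength: distance from the camera to the projection plane (world units).
// planeHalfWidth: half the width of the projection plane (world units).
Vec4 ProjectWithFocalLength(const Vec4& viewPos, float focalLength,
                            float planeHalfWidth, float aspectRatio)
{
    assert(focalLength > 0.0f && planeHalfWidth > 0.0f);
    float xScale = focalLength / planeHalfWidth; // perspective magnitude ratio
    float yScale = xScale * aspectRatio;         // squeeze rectangle into square

    Vec4 clip;
    clip.x = viewPos.x * xScale;
    clip.y = viewPos.y * yScale;
    clip.z = 0.0f;      // placeholder: real depth needs the near and far planes
    clip.w = viewPos.z; // original view-space z (negate for right-handed)
    return clip;
}
```

Nothing here divides by anything that depends on the vertex; the divide by w is left to OpenGL.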
W is correct, only if your coordinate system is left handed, otherwise it should be negated.


Ya, it was just easier for me to think about things with a left handed coordinate system.

Z is incorrect (Not going to repeat what has already been explained to death, just go over it again if you need to).


Ya, sorry. Didn't mention that I wasn't worried about z at the moment (for either example). Just wanted to get the perspective effect so just stuck an arbitrary value in z.

W should always just have the original Z (or its negative, as mentioned above).


I was trying to mimic what Casey performed in one of his episodes when he was just trying to get perspective working. I see how things are off with it now.

And I'll try to fix the image links.
Okay, so I think I have things worked out and here is the final code that someone should be able to just copy and paste into their openGL ready code and step through:

struct v4
{
    f32 x, y, z, w;
};

void ProjectionTestUsingFOV_InMeters(f32 windowWidth, f32 windowHeight)
{
    //These are assumed to be in camera space, with camera looking down positive z axis (left handed coordinate system)
    v4 squareVerts_meters[6] =
    {
        v4{2.0f, 1.0f, 3.0f, 1.0f},
        v4{2.4f, 1.0f, 2.0f, 1.0f},
        v4{4.0f, 1.0f, 3.0f, 1.0f},
        
        v4{2.0f, -0.5f, 3.0f, 1.0f},
        v4{2.4f, -0.5f, 2.0f, 1.0f},
        v4{4.0f, -0.5f, 3.0f, 1.0f}
    };
    
    //Where we will store clip space coordinates
    v4 squareVerts_openGLClipSpace[6];
    
    {//Projection transform which will convert x, y and z to clip space as well as store original z value in clip space w
        f32 fov = glm::radians(90.0f);
        f32 aspectRatio = 16.0f/9.0f;
        f32 tanHalfFov = TanR(fov / 2.0f);
        f32 xScale = 1.0f / (tanHalfFov * aspectRatio);
        f32 yScale = 1.0f / tanHalfFov;
        
        f32 farClip = 100.0f;
        f32 nearClip = 1.0f;
        
        //These equations were calculated assuming camera z values are positive
        f32 a = (-farClip - nearClip) / (nearClip - farClip);
        f32 b = (2.0f * farClip * nearClip) / (nearClip - farClip);
        
        for(i32 vertI{}; vertI < 6; ++vertI)
        {
            squareVerts_openGLClipSpace[vertI].x = squareVerts_meters[vertI].x * xScale;
            squareVerts_openGLClipSpace[vertI].y = squareVerts_meters[vertI].y * yScale;
            squareVerts_openGLClipSpace[vertI].z = squareVerts_meters[vertI].z * a + b;
            squareVerts_openGLClipSpace[vertI].w = squareVerts_meters[vertI].z;
        };
    };
    
    //Send down newly projected verts to openGL
    GLfloat verts[] =
    {
        squareVerts_openGLClipSpace[0].x, squareVerts_openGLClipSpace[0].y, squareVerts_openGLClipSpace[0].z, squareVerts_openGLClipSpace[0].w,
        1.0f, 0.0f, 0.0f,
        squareVerts_openGLClipSpace[1].x, squareVerts_openGLClipSpace[1].y, squareVerts_openGLClipSpace[1].z, squareVerts_openGLClipSpace[1].w,
        0.0f, 1.0f, 0.0f,
        squareVerts_openGLClipSpace[2].x, squareVerts_openGLClipSpace[2].y, squareVerts_openGLClipSpace[2].z, squareVerts_openGLClipSpace[2].w,
        1.0f, 0.0f, 0.0f,
        squareVerts_openGLClipSpace[3].x, squareVerts_openGLClipSpace[3].y, squareVerts_openGLClipSpace[3].z, squareVerts_openGLClipSpace[3].w,
        1.0f, 0.0f, 0.0f,
        squareVerts_openGLClipSpace[4].x, squareVerts_openGLClipSpace[4].y, squareVerts_openGLClipSpace[4].z, squareVerts_openGLClipSpace[4].w,
        0.0f, 1.0f, 0.0f,
        squareVerts_openGLClipSpace[5].x, squareVerts_openGLClipSpace[5].y, squareVerts_openGLClipSpace[5].z, squareVerts_openGLClipSpace[5].w,
        1.0f, 0.0f, 0.0f
    };
    
    GLuint bufferID;
    glGenBuffers(1, &bufferID);
    glBindBuffer(GL_ARRAY_BUFFER, bufferID);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 7, 0);
    glEnableVertexAttribArray(1);
    //Color starts after the 4 position floats (x, y, z, w) of each vertex
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 7, (char*)(sizeof(GLfloat)*4));
    
    GLushort indicies[] =
    {
        0, 1, 3,  3, 1, 4,  1, 2, 4,  2, 5, 4
    };
    
    GLuint indexBufferID;
    glGenBuffers(1, &indexBufferID);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBufferID);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indicies), indicies, GL_STATIC_DRAW);
    
    glDisable(GL_TEXTURE_2D);
    glDrawElements(GL_TRIANGLES, 12, GL_UNSIGNED_SHORT, 0);
    glEnable(GL_TEXTURE_2D);
};


Just wanted to give anyone else struggling to understand this something concrete to work with that doesn't involve matrices, to help with the fundamentals. This should just produce something that looks like the corner of a cube. Obviously, if someone notices issues please let me know. Here is the image you are supposed to get:

final image
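
A quick way to sanity-check the a and b depth terms from the code above is to confirm that the near plane lands at -1 and the far plane at +1 after the divide by w. This is a minimal C++ sketch under the same assumptions as that code (left-handed view space, clip w equal to view-space z); the struct and function names are made up for the check.

```cpp
#include <cassert>

// Depth terms for a left-handed view space (camera looks down +z).
struct DepthTerms { float a, b; };

DepthTerms MakeDepthTerms(float nearClip, float farClip)
{
    assert(nearClip > 0.0f && farClip > nearClip);
    DepthTerms t;
    t.a = (-farClip - nearClip) / (nearClip - farClip);
    t.b = (2.0f * farClip * nearClip) / (nearClip - farClip);
    return t;
}

// NDC depth: clip z = viewZ * a + b, divided by clip w = viewZ.
float NdcDepth(const DepthTerms& t, float viewZ)
{
    assert(viewZ != 0.0f);
    return (viewZ * t.a + t.b) / viewZ;
}
```

With nearClip = 1 and farClip = 100, a point at z = 1 should come out at roughly -1 and a point at z = 100 at roughly +1, up to floating-point error.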
Looks all right to me.
Nice work! :)

(Now make the full cube and not just 2 squares...:P )