Why do we use homogeneous coordinates?

1. Why does most of the game industry use a 4D homogeneous coordinate system? Why don't we just use a 3D vector?
2. What does the w in [x, y, z, w] mean?
3. Why does most of the game industry use quaternions instead of Euler angles in the form of a vector3?
4. What does a quaternion represent? (In an abstract/high-level view)

Edited by longtran2904 on Reason: Initial post
1. because it means that a matrix multiplication can encode translation, one of the most common operations in games. And using a slightly adjusted matrix you get a nearly free perspective projection by having the matrix write z into w and then re-homogenizing (dividing through by w).

2. there is no special meaning to the letter; it just happens to be the 4th-to-last letter of the alphabet (w, x, y, z). In the math, all homogeneous coordinates are points projected onto the w=1 plane, with the origin as the projection point.

3. fewer games use quaternions than you'd think. But quaternions can interpolate and compose very easily

4. it's an axis+angle representation but instead of representing that as a unit length axis and angle in radians, you instead have the w component be the cosine of the half angle and the axis length is the sine of the half angle. This happens to have all the nice properties that we use quaternions for.
Can you elaborate on this a little bit more? Maybe give some examples? Also, when I asked about the 'w', I meant: what does the 'w' encode?

As I'm sure you're aware it's really common to use matrices to encode the transformations you do in graphics programming. They have the lovely property where, if you have a bunch of matrices for specific transformations (translation, rotation, scaling, projection), you can multiply the matrices together to get one transformation that does everything in one go.

The catch is that matrices can only encode linear transformations, and translation is not a linear transformation. One of the conditions for a linear transformation is that it does not change the origin; in other words, the point [0, 0] must remain at [0, 0] after the transformation. Translation obviously does not do this; a translation of [x, y] moves [0, 0] to [x, y].

There's a neat trick you can do, though - by using an extra dimension (e.g. using 3D vectors to represent 2D points) you can represent translation with a shear, which is indeed a linear transformation. It's easy to visualize in the 2D/3D case, and Wikipedia has a good animation: https://en.wikipedia.org/wiki/File:Affine_transformations.ogv

Indeed, if you look up shear matrices and translation matrices online, you'll see that they have the exact same structure, because they're the exact same thing.
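To make the shear concrete, here is a minimal sketch of the 2D case in C. The names (HVec2, Mat3, Translation2D, Mat3MulVec) are invented for this example, and it assumes column vectors with a row-major matrix layout; other conventions put the translation in the bottom row instead.

```c
/* A 2D point or direction with the extra homogeneous component. */
typedef struct { float x, y, w; } HVec2;

/* 3x3 matrix, indexed m[row][column]. */
typedef struct { float m[3][3]; } Mat3;

/* Translation by (tx, ty): structurally a shear along the w axis. */
Mat3 Translation2D(float tx, float ty) {
    Mat3 t = {{ {1.0f, 0.0f, tx},
                {0.0f, 1.0f, ty},
                {0.0f, 0.0f, 1.0f} }};
    return t;
}

/* Ordinary matrix * column-vector product. */
HVec2 Mat3MulVec(Mat3 a, HVec2 v) {
    HVec2 r;
    r.x = a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.w;
    r.y = a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.w;
    r.w = a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.w;
    return r;
}
```

With w = 1 the origin [0, 0, 1] moves to [tx, ty, 1], while a vector with w = 0 (a direction rather than a position) passes through unchanged, since the translation column is multiplied by zero.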

So that's one reason to use homogeneous coordinates, but there are other reasons like ratchetfreak alluded to. Perspective projection is another nonlinear transformation, but again one where the properties of homogeneous coordinates are very useful. I don't have a nice visual for that unfortunately; maybe someone else knows of a good one.

If you wanted, you absolutely could just use 3D vectors for everything and not use matrices to do your transformations. But homogeneous coordinates are so darn convenient that they're just the standard.
As for what the w encodes - that's the scaling factor on the rest of the members of the vector.

Say we have the 2D vector [1, 2]. In homogeneous coordinates we have a third member, that scaling factor. The following vectors all represent the same point in homogeneous coordinates:

  • [1, 2, 1]
  • [2, 4, 2]
  • [0.5, 1, 0.5]


With any of these, you can divide the whole vector by that scaling factor to get back to the normalized version where w = 1: [1, 2, 1].
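As a small sketch of that normalization in C (HPoint2 and HomogeneousNormalize are hypothetical names, and the function assumes w is not zero):

```c
/* A 2D point in homogeneous coordinates. */
typedef struct { float x, y, w; } HPoint2;

/* Divide every component by w so that w becomes 1.
   Assumes p.w != 0; w == 0 has no finite normalized form. */
HPoint2 HomogeneousNormalize(HPoint2 p) {
    HPoint2 r = { p.x / p.w, p.y / p.w, 1.0f };
    return r;
}
```

Passing in [2, 4, 2] or [0.5, 1, 0.5] gives back [1, 2, 1] in both cases.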

If you play around with this, getting a bunch of random 3D vectors [x, y, z] and then normalizing them this way (divide the whole vector by the last component), you'll see that this appears to "project" those points onto the plane where z = 1.

The projection method actually used in computer graphics is a bit different, but you can hopefully see why homogeneous coordinates are helpful for this kind of thing.
What about quaternions? What advantage does a quaternion have over a vector3?
Regarding homogeneous coordinates, I found a clip that talks about this fairly well.

If by a vector3 you mean an angle for each axis (yaw, pitch, roll): if you ever test such a rotation, you'll notice that the result depends on the order in which you apply the rotations (yaw->pitch->roll will give a different result than pitch->yaw->roll).

As ratchetfreak said, quaternions are a way of encoding a rotation as (axis + angle).

Before quaternions (or rotators now) were commonly known, some early 3D games could get away with such rotations (for example, if I'm not mistaken, Quake used them, and its entities could only rotate on the up axis).

Edited by Guntha on
Euler angles have the danger of gimbal locking, where you lose a degree of freedom

representing the rotation as a full 3x3 rotation matrix fixes that, but a matrix is more difficult to orthonormalize (to keep it from representing something that isn't a rotation) and doesn't interpolate well
longtran2904
What about quaternion? What advantage does quaternion have over a vector3?



Let me try and explain what the problem is with treating 3D vectors as rotations...

So, a rotation is a transformation, i.e. a function that maps vectors to vectors.

For example, which 2D rotation is this?

Vector2D MyRotation(Vector2D input) {
	// Maps (x, y) to (y, -x).
	Vector2D output;
	output.x = input.y;
	output.y = -input.x;
	return output;
}


It's a 90 degree, clockwise rotation, about the origin, right? (You can convince yourself this is true by plugging a few values into the function and seeing what you get!)

However, this is not a very nice form for rotations to be in. If I gave you an arbitrarily complicated function that does a rotation, there's no easy way to find the inverse rotation (the one that has the opposite effect).

So instead humans assign numbers to 2D rotations, which we call angles! If I want the inverse of a rotation of "angle 30", I negate the number, giving the rotation "angle -30". Furthermore, if I want to find an equivalent rotation to applying two rotations one after another, I add their angles together. And importantly for animation, if I want to find a rotation somewhere between two rotations with angles x and y, I linearly interpolate them, i.e. (y-x)t+x, where t ranges between 0 and 1.
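Written out as C, those three operations on 2D angles are one-liners. The function names are made up for this sketch, angles are in degrees to match the text, and the interpolation does not try to take the shorter way around when the angles wrap past 360.

```c
/* Inverse of a 2D rotation: negate the angle. */
float InverseAngle(float angle) {
    return -angle;
}

/* Applying one rotation after another: add the angles. */
float ComposeAngles(float first, float second) {
    return first + second;
}

/* Rotation between x and y, with t ranging from 0 to 1. */
float LerpAngle(float x, float y, float t) {
    return (y - x) * t + x;
}
```

So the inverse of "angle 30" is "angle -30", 30 followed by 45 is 75, and a quarter of the way from 10 to 50 is 20.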

But then what's the problem with doing this for 3D rotations? For a 3D rotation, we need 3 angles, often called roll, pitch and yaw. But the effects of these angles are intertwined! To apply a 3D rotation, we actually apply the roll, pitch and yaw rotations in succession, one after the other. Therefore, our rotation now looks like a C function again!

Vector3D Apply3DRotation(Vector3D input, Vector3D angles) {
	Vector3D vector = input;
	vector = ApplyPlanarRotation(vector, angles.x, THE_X_PLANE);
	vector = ApplyPlanarRotation(vector, angles.y, THE_Y_PLANE);
	vector = ApplyPlanarRotation(vector, angles.z, THE_Z_PLANE);
	return vector;
}


This means all the nice things that we used to be able to do with angles in 2D, we now can't do =(. Therefore, we have to go back to the drawing board and get quaternions involved -- a 4D generalisation of complex numbers.
bvisness
I don't have a nice visual for that unfortunately; maybe someone else knows of a good one


Actually, there hasn't been one, until a few days ago when I finished mine :)



I go over the whole spiel of projective space and the projective coordinate system (a.k.a. "Homogeneous")

BTW, calling 4D vectors and matrices "Homogeneous" when they are used only for translation is technically incorrect:
There's absolutely no need there to invoke the term "Homogeneous", which is just a common synonym for "projective", because it's not actually a projective space at all, and "Homogeneous Normalization" doesn't even apply there.
You don't need any projective space at all to do translation by shearing - it happens in completely standard Euclidean space with a Cartesian coordinate system (as opposed to Projective/Homogeneous).
It just happens to have 4 dimensions because the shear has to happen in one dimension higher.

longtran2904
Why do most of the game industry use a 4d homogenous coordinate system? Why don't we just use a 3d vector

For perspective projection though, there you DO need a Projective space and coordinate system (Homogeneous), and most of my video covers exactly what that looks like geometrically, and how the matrices naturally fall out of that process.

Conceptually you can consider the perspective projection process as involving 2 separate transformations of 2 kinds:
You could conceptualize the first as happening in a standard 4D Euclidean space with a standard 4D Cartesian coordinate system, involving a composite of multiple purely-linear transformations (which I detail in my video).
You can then conceptualize taking the resulting 4D Euclidean vectors you get out of that, and re-interpreting them "as-if" they were projective (Homogeneous) coordinates. In a sense, super-imposing a projective space onto a Euclidean space, where the 4D Cartesian coordinate system is superimposed over the 4D Projective (Homogeneous) coordinate system, so they match up.
Then do the homogeneous (re)normalization (a.k.a. "The Perspective Divide"), then re-interpret the result back as 4D Cartesian coordinates, and then slice/sub-space it, taking only the first 3 dimensions.
That's just an alternative conceptualization though; arithmetically it's the same as doing it all in projective space.

It can be thought of as representing 3D space as a "slice" of 4D space (just like 2D space would be a "slice" of 3D space).
So you get:
1) Re-interpret 3D space as a slice of 4D space.
2) Situate that 3D slice in 4D space at level 1 of the 4th dimension (creating a 4D vector out of the 3D one, with w=1).
3) Do all the linear transformations there.
4) Re-interpret the result as "projective".
5) Re-normalize projectively (homogeneously).
6) Re-interpret the result back as Euclidean 4D.
7) Slice it back to 3D by ignoring the 4th dimension (creating a 3D vector out of the 4D one, dropping the w coordinate).

Et voilà(!)
You've applied perspective projection in 3D by going up to 4D and then down again.
It's just like with translation-by-shearing, only with an added step of reinterpreting as projective for a little while for projective normalization, then back to Euclidean again (which doesn't actually happen with translation-by-shearing, despite the common misconception).
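The seven steps can be sketched in C roughly as follows. This is a toy pinhole projection onto the plane z = d, not the depth-preserving projection matrix that graphics APIs actually use; the names (P3, P4, M4, ProjectOntoPlane) are hypothetical, and it assumes column vectors and a point with z not equal to zero.

```c
typedef struct { float x, y, z; } P3;
typedef struct { float x, y, z, w; } P4;
typedef struct { float m[4][4]; } M4;   /* indexed m[row][column] */

/* Ordinary 4x4 matrix * column-vector product. */
P4 M4MulP4(M4 a, P4 v) {
    P4 r;
    r.x = a.m[0][0]*v.x + a.m[0][1]*v.y + a.m[0][2]*v.z + a.m[0][3]*v.w;
    r.y = a.m[1][0]*v.x + a.m[1][1]*v.y + a.m[1][2]*v.z + a.m[1][3]*v.w;
    r.z = a.m[2][0]*v.x + a.m[2][1]*v.y + a.m[2][2]*v.z + a.m[2][3]*v.w;
    r.w = a.m[3][0]*v.x + a.m[3][1]*v.y + a.m[3][2]*v.z + a.m[3][3]*v.w;
    return r;
}

/* Project p through the origin onto the plane z = d. */
P3 ProjectOntoPlane(P3 p, float d) {
    /* Steps 1-2: lift the 3D point into 4D at w = 1. */
    P4 v = { p.x, p.y, p.z, 1.0f };

    /* Step 3: a purely linear 4D transform; the last row writes z/d into w. */
    M4 proj = {{ {1.0f, 0.0f, 0.0f,     0.0f},
                 {0.0f, 1.0f, 0.0f,     0.0f},
                 {0.0f, 0.0f, 1.0f,     0.0f},
                 {0.0f, 0.0f, 1.0f / d, 0.0f} }};
    P4 h = M4MulP4(proj, v);

    /* Steps 4-5: treat the result as projective and divide through by w. */
    h.x /= h.w;
    h.y /= h.w;
    h.z /= h.w;

    /* Steps 6-7: back to Euclidean and drop the w coordinate. */
    P3 out = { h.x, h.y, h.z };
    return out;
}
```

For instance, projecting [2, 4, 8] onto the plane z = 2 gives w = 4 after the matrix multiply, and [0.5, 1, 2] after the divide.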

Edited by Arnon on
Thanks for all the docs and videos you guys sent me. I kind of understand why we use homogeneous coordinates now. The next thing that I want to know is how to quickly read and understand what a specific matrix or quaternion means. Are there any tips or tricks to easily visualize a given matrix/quaternion and identify which of its numbers I should focus on? For example, in a particular context, I may need to read the matrix by its rows. But in another context, I may need to read the same matrix by its columns. Also, what are some common matrices and quaternions, and what are their uses in the game industry?
Vector2D MyRotation(Vector2D input) {
	Vector2D output;
	output.x = input.y;
	output.y = -input.x;
	return output;
}

For the first example, I can just treat the function as matrix multiplication. The inverse rotation is just the inverse of the matrix.
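A minimal sketch of that idea in C: the quoted function corresponds to the 2x2 matrix with rows [0, 1] and [-1, 0], and for a pure rotation matrix the inverse is simply the transpose. Mat2, Mat2MulVec and Mat2Transpose are names invented for this example, and column vectors are assumed.

```c
typedef struct { float x, y; } Vector2D;
typedef struct { float m[2][2]; } Mat2;   /* indexed m[row][column] */

/* Ordinary matrix * column-vector product. */
Vector2D Mat2MulVec(Mat2 a, Vector2D v) {
    Vector2D r;
    r.x = a.m[0][0] * v.x + a.m[0][1] * v.y;
    r.y = a.m[1][0] * v.x + a.m[1][1] * v.y;
    return r;
}

/* For a rotation matrix, the transpose is also the inverse. */
Mat2 Mat2Transpose(Mat2 a) {
    Mat2 t = {{ {a.m[0][0], a.m[1][0]},
                {a.m[0][1], a.m[1][1]} }};
    return t;
}
```

Multiplying [3, 5] by the matrix above gives [5, -3], the same as the function, and multiplying that by the transpose gives [3, 5] back.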

Even if we use quaternions, we can still only use matrix operations, right? I don't understand the advantage you were trying to point out here.

So the only reason we use quaternions is that they don't have the gimbal lock problem?


Replying to nakst (#24713)