float vs double?

About float/real32 being good enough compared to double/real64, I think it really depends on the scale of your world vs the precision you want to achieve.

As an example, if the minimum absolute position precision you need is around 0.001 meter (i.e. 1 mm, or about 1/25th of an inch), then with floats you can't achieve that once you go much beyond a 10 km scale (6 miles).
But with doubles the same limit is around 10,000,000,000 km, which is about the size of the solar system.
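To check numbers like these yourself, here is a minimal sketch that measures the gap between a value and the next representable one. The function names are made up for illustration, and it assumes IEEE 754 float and double, with positions stored in meters.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
float floatSpacing(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Same thing for double.
double doubleSpacing(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

With these, floatSpacing(10000.0f) is just under 1 mm (2^-10 m), while floatSpacing(20000.0f) is already about 2 mm. For doubles, the spacing is still 2^-10 m at 5.0e12 m (5 billion km).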
I don't know if you would reference a position in such a global manner. I think most games of such size tend to have smaller maps that share borders and are streamed in when the player is close to a border. So you would only reference a position inside the current map.
If I remember correctly, this was discussed on the stream and we will be using integer world positions, so there should be no problem with accuracy.
I think float can be used for the relative position inside the tile.
Right, Casey talks about this in the context of HMH in stream #30 or #31 I think.

It's true that an alternative solution is to divide the world into smaller sections, and use floats as a relative position within subsections. That works well for HMH.
But this solution is more complicated when the world is truly open and dynamic, with a mix of very big and very small objects, like a space sim where capital ships could be many miles long.
In video #32, even though floats are used to just track the position within a tile, Casey is running into a precision issue when doing a modulo on floats:
Transforming a tiny negative position around zero -dx (in the current tile) into a positive position tileWidth-dx (relative to the next tile) just doesn't work. The ratio dx/tileWidth is so small that tileWidth-dx gets stored as tileWidth.
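A minimal sketch of that failure, using a simplified wrap rather than the actual code from the stream (wrapNaive and the values are hypothetical):

```cpp
// Naive wrap of a relative position into the previous tile:
// a negative offset is shifted up by one tile width.
float wrapNaive(float rel, float tileWidth) {
    if (rel < 0.0f) {
        rel += tileWidth;   // for tiny |rel| the sum rounds back to tileWidth
    }
    return rel;
}
```

wrapNaive(-0.25f, 1.0f) gives 0.75f as expected, but wrapNaive(-1.0e-9f, 1.4f) gives exactly 1.4f, which is outside the intended [0, tileWidth) range. The float spacing near 1.4 is about 1.2e-7, so an offset of 1e-9 is simply absorbed.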

The interesting thing is that the same issue could even happen with doubles, but it's just more likely to happen with floats (which is actually a good thing since you can catch it earlier).
Operations with floats often require expressing wanted precision explicitly with an epsilon, so that if |dx| < epsilon, the value can be forced to a clean zero.
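As a rough sketch of the epsilon idea (the helper name and the epsilon value are assumptions; the right epsilon depends on your units and scale):

```cpp
#include <cmath>

// Force values with magnitude below epsilon to a clean zero.
float snapToZero(float v, float epsilon) {
    return (std::fabs(v) < epsilon) ? 0.0f : v;
}
```

Calling snapToZero(rel, 1.0e-6f) before wrapping a relative position turns a stray -1.0e-9f into 0.0f, while a real offset like -0.25f passes through unchanged.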

This is what Tom Forsyth talks about in his article 'A matter of precision' https://home.comcast.net/~tom_forsyth/blog.wiki.html He recommends avoiding both float and double and using fixed point for space and time. It's an interesting read, and when you think about fixed point, to me at least, it's a more natural way of dividing up a space, as it creates constant intervals across the space.

So I think I have this correct: an unsigned 32-bit int could represent a 24.8 fixed point value, giving a range of 0 to 16777215+255/256, where the fractional part provides a precision of 1/256th of a unit. That is a very reasonable division and quite a range. You can vary the position of the binary point to balance the integral range and the fractional precision.
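A minimal sketch of an unsigned 24.8 layout, to make the range and precision concrete. The type and function names are made up for illustration and are not taken from Tom Forsyth's article.

```cpp
#include <cstdint>

// Unsigned 24.8 fixed point: upper 24 bits are whole units,
// lower 8 bits are 1/256ths of a unit.
typedef uint32_t fixed24_8;

// Pack whole units and a count of 1/256ths (0..255) into one value.
fixed24_8 fixedFromParts(uint32_t whole, uint32_t frac256) {
    return (whole << 8) | (frac256 & 0xFFu);
}

// Convert to double for inspection or display.
double fixedToDouble(fixed24_8 f) {
    return (double)f / 256.0;
}
```

Addition and subtraction are plain integer operations on the packed value. The largest value, 0xFFFFFFFF, converts to 16777215 + 255/256, and the step between neighbouring values is 1/256 everywhere in the range.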

1/256th of a unit seems OK at first but I don't think it's precise enough to handle the acceleration/friction thing.
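One way to see the concern, as a sketch with made-up numbers (a 60 Hz frame and speeds in units per second are assumptions, as is the helper name): the per-frame movement has to be rounded to a whole number of 1/256ths.

```cpp
#include <cmath>
#include <cstdint>

// Per-frame position step, rounded to the nearest 1/256th of a unit.
// The result is the step expressed as a count of 1/256ths.
int32_t stepInFixed8(double speedUnitsPerSec, double dtSeconds) {
    return (int32_t)std::lround(speedUnitsPerSec * dtSeconds * 256.0);
}
```

At 1/60 s per frame, a speed of 1.0 unit/s becomes a step of 4/256 instead of about 4.27/256, and a speed of 0.1 unit/s rounds to a step of 0, so a slowly decelerating object would stop moving early.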
Fred
But this solution is more complicated when the world is truly open and dynamic, with a mix of very big and very small objects, like a space sim where capital ships could be many miles long.

I use a similar (two-coord, not double) scheme for exactly such things.

- Casey

In this case, would you end up baking the two-coord scheme (say, int32 + real32) into your vector class and rewriting your vector math functions to manipulate both, for example to compute the length of a very long vector?
For simple operations, like sub and add, it's probably OK to operate on the two coords directly, but maybe for more complex math (like vector length) it's best to convert to a temporary double? Hmm.
I'm torn...
Another possibility is that instead of storing world coordinates as int+float, we can use doubles:

- Both solutions are the same size in memory.

- With doubles, even if they're slower than floats, you can use standard vector math. With int+float you need to write world position manipulation that operates on the int coords and the float vec, then possibly "recanonicalize" (which requires divides and mods, etc).
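For reference, a rough sketch of what "recanonicalize" could look like on one axis. This is not the HMH implementation; the function name, the tile side parameter, and the unsigned tile index are assumptions.

```cpp
#include <cmath>
#include <cstdint>

// Move whole tiles out of the float offset and into the integer tile index,
// so the offset ends up in [0, tileSide) again (up to float rounding).
void recanonicalizeAxis(uint32_t *tile, float *rel, float tileSide) {
    float tileOffset = std::floor(*rel / tileSide);   // whole tiles, may be negative
    // Go through int32_t so a negative offset wraps instead of being undefined.
    *tile += (uint32_t)(int32_t)tileOffset;
    *rel -= tileOffset * tileSide;
}
```

Starting from tile 5 with an offset of 2.5 (tile side 1.0) this yields tile 7, offset 0.5. Starting from tile 5 with an offset of -0.25 it yields tile 4, offset 0.75. Note the divide and the floor, which the double version never needs.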

Say that you want the distance between two points far apart in the game world.
With doubles it's straightforward:

(I don't use operator overloading here)

#include <math.h>   // sqrt

typedef double real64;

struct v2d {
    real64 x, y;   // a pair of doubles
};

// Returns p1 - p2.
v2d
subtract(v2d p1, v2d p2) {
    v2d res = {};
    res.x = p1.x - p2.x;
    res.y = p1.y - p2.y;
    return res;
}

double
length(v2d p) {
    double res = sqrt(p.x*p.x + p.y*p.y);
    return res;
}

// The sign of the difference doesn't matter here, only its length.
double
distance(v2d p1, v2d p2) {
    v2d d = subtract(p1, p2);
    return length(d);
}


With int+float, you need:

#include <stdint.h>

typedef float    real32;
typedef uint32_t uint32;

struct v2 {
    real32 x, y;   // offset within the tile
};

struct coord {
    uint32 x, y;   // tile indices
};

struct world_pos {
    coord c;
    v2 p;
};

// Returns p1 - p2, component by component.
world_pos
subtract(world_pos p1, world_pos p2) {
    world_pos res = {};
    res.c.x = p1.c.x - p2.c.x;   // unsigned: wraps around if p2 is ahead of p1
    res.c.y = p1.c.y - p2.c.y;
    res.p.x = p1.p.x - p2.p.x;
    res.p.y = p1.p.y - p2.p.y;
    //possibly recanonicalize res???
    return res;
}

// Open question: what type should this return, and how do you compute it?
// ?? length (world_pos p) { ??? }


For a subtraction, you're roughly operating on the same number of bits in both cases (128). But if you need to recanonicalize, int+float is gonna be slower.

To compute the length of a world vector, with int/float, you would need to return a scalar int+float to maintain full precision. And the math isn't obvious to me.
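One possible compromise, sketched below, is the "temporary double" idea: keep int+float for storage, but widen to double only for the distance computation. This does not return an int+float scalar, so it gives up some precision for very long vectors. kTileSideInMeters and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

struct v2 { float x, y; };
struct coord { uint32_t x, y; };
struct world_pos { coord c; v2 p; };

// Hypothetical tile size, used only for this illustration.
static const double kTileSideInMeters = 1.0;

// Signed distance along one axis from b to a, computed in double.
double axisDelta(uint32_t tileA, float relA, uint32_t tileB, float relB) {
    // Unsigned subtraction wraps; reinterpreting as signed gives the tile delta
    // as long as the two tiles are less than 2^31 apart.
    int32_t tileDelta = (int32_t)(tileA - tileB);
    return (double)tileDelta * kTileSideInMeters + ((double)relA - (double)relB);
}

double worldDistance(world_pos a, world_pos b) {
    double dx = axisDelta(a.c.x, a.p.x, b.c.x, b.p.x);
    double dy = axisDelta(a.c.y, a.p.y, b.c.y, b.p.y);
    return std::sqrt(dx * dx + dy * dy);
}
```

The tile delta is exact in a double and the float offsets widen without loss, so the only rounding happens in the final arithmetic.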


With world positions as doubles, when you want to do lots of fast "local" physics, you can convert all your world double positions for objects in the local sub-space into some float vector in a local frame of reference:

v2
convertToLocalPos(v2d localOrigin, v2d worldPosition) {
    // Subtract in double precision first, then narrow to float.
    // Assumes subtract(a, b) returns a - b.
    v2d localPos = subtract(worldPosition, localOrigin);
    v2 res = {};
    res.x = (real32) localPos.x;
    res.y = (real32) localPos.y;
    return res;
}
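The order of operations matters here: subtract in double, then narrow. A small self-contained sketch of the difference (the struct definitions are repeated so the block stands alone, and both function names are made up for this comparison):

```cpp
struct v2  { float x, y; };
struct v2d { double x, y; };

// Subtract in double precision, then narrow the small result to float.
v2 subtractThenNarrow(v2d origin, v2d world) {
    v2 res;
    res.x = (float)(world.x - origin.x);
    res.y = (float)(world.y - origin.y);
    return res;
}

// Narrow each large coordinate to float first, then subtract.
v2 narrowThenSubtract(v2d origin, v2d world) {
    v2 res;
    res.x = (float)world.x - (float)origin.x;
    res.y = (float)world.y - (float)origin.y;
    return res;
}
```

With an origin at x = 1000000.0 and an object at x = 1000000.001, the first version returns roughly 0.001, while the second returns exactly 0 because both coordinates round to the same float.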



int+float does have the advantage that the precision doesn't depend on world location at all.
Also, the coord part (the ints) can be used to do some sort of broad phase collision detection (you can easily use them as indices into a quadtree).
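As a rough sketch of the quadtree idea (the names and the level convention are assumptions): shifting the integer coords right by the quadtree level gives the cell that contains the tile at that level.

```cpp
#include <cstdint>

struct coord { uint32_t x, y; };

// Cell containing tile c at a given level. Level 0 cells are single tiles,
// and each level up doubles the cell side. Level must be below 32.
coord cellAt(coord c, uint32_t level) {
    coord res = { c.x >> level, c.y >> level };
    return res;
}

// Cheap broad phase test: are two tiles in the same cell at this level?
bool sameCell(coord a, coord b, uint32_t level) {
    coord ca = cellAt(a, level);
    coord cb = cellAt(b, level);
    return ca.x == cb.x && ca.y == cb.y;
}
```

Tiles (5, 9) and (6, 10) share a cell at level 2 but not at level 0. A real broad phase would also need to look at neighbouring cells, since two objects can be close together while sitting on opposite sides of a cell border.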

The main difference is that floats and doubles are binary floating point types, whereas a decimal stores the value as a decimal floating point type. So decimals have much higher precision and are usually used in monetary (financial) applications that require a high degree of accuracy. But performance-wise, decimals are slower than doubles and floats (up to 20 times slower in some tests).

Float - 7 digits (32 bit)

Double - 15-16 digits (64 bit)

Decimal - 28-29 significant digits (128 bit)

Decimals and floats/doubles cannot be compared without a cast, whereas floats and doubles can. Decimals also allow the encoding of trailing zeros.
What is this "Decimal" you are talking about? SQL datatype?
Seems they're just quoting verbatim from the web page they linked to, for whatever reason. The web page is talking about the C# decimal type, which is not relevant here.