Example outputs:-

1)

1 | -15.000000 unsigned rounds to -14 |

Should round to -15?

2)

1 | -14.099999 unsigned rounds to -13 |

Should round to -14?

Code and debug output file are attached below.

Rounding functions giving wrong output

Edited by
Class GenericHuman
on
When I run the handmade intrinsic functions, I get wrong output for negative numbers.

Example outputs:-

1)

Should round to -15?

2)

Should round to -14?

Code and debug output file are attached below.

Example outputs:-

1)

1 | -15.000000 unsigned rounds to -14 |

Should round to -15?

2)

1 | -14.099999 unsigned rounds to -13 |

Should round to -14?

Code and debug output file are attached below.

Rounding functions giving wrong output

Solution to this might be using:

instead of the original function

1 2 3 4 5 6 7 8 9 10 11 12 13 14 | inline int32 RoundReal32ToInt32(real32 Real32) { if (Real32 >= 0.0f) { int32 Result = (int32)(Real32 + 0.5f); return(Result); } else { int32 Result = (int32)(Real32 - 0.5f); return(Result); } } |

instead of the original function

Rounding functions giving wrong output

I think this is one of the few cases in video game programming where unit tests would actually make sense.

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Rounding functions giving wrong output

Just for the record (it's a TODO in the source code), C99 has the rint family of functions, such as lrintf(), and C++11 has an overloaded round() function.

The semantics are slightly different, in that it uses the prevailing rounding mode, which by default should be "round to even":

Why would you bother? Consider the following sequence of operations:

One way to judge a rounding mode is to see what happens when you do things like this.

The semantics are slightly different, in that it uses the prevailing rounding mode, which by default should be "round to even":

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | int32 RoundFloatToInt32(float f) { // Round down int32 i = f < 0 ? -((int32)-f) : (int32)f; // Get the fractional part f -= i; // If less than 0.5, round down. if (f < 0.5f) { return i; } // If greater than 0.5, round up. if (f > 0.5f) { return i + 1; } // If exactly 0.5, round to the nearest // even number. return (i % 2 == 0) ? i : i + 1; } |

Why would you bother? Consider the following sequence of operations:

1 2 3 4 5 6 7 | float f = 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; |

One way to judge a rounding mode is to see what happens when you do things like this.

Rounding functions giving wrong output

Edited by
Class GenericHuman
on
Round to even is a biased round but that is not what I am talking about. In Casey's round function

In this image, you place the point whose RoundReal32ToInt32 you want to see on the slanted line on top of the nunmber line and follow the slanted line until the arrow. The integer pointed by the arrow is the output of the function.

Since we are adding 0.5 and then truncuating, anything between I and I-0.5f(where I is some negative number) is rounding to the wrong value. That is why for negative number we have to subtract 0.5 instead of adding 0.5f.

In this image, you place the point whose RoundReal32ToInt32 you want to see on the slanted line on top of the nunmber line and follow the slanted line until the arrow. The integer pointed by the arrow is the output of the function.

Since we are adding 0.5 and then truncuating, anything between I and I-0.5f(where I is some negative number) is rounding to the wrong value. That is why for negative number we have to subtract 0.5 instead of adding 0.5f.

Rounding functions giving wrong output

Pseudonym73, Do you know why round to even is a common rounding mode? It seems kind of strange but I assume it must have some nice properties that are not obvious to me, either computational or technical (maybe ease of hardware implementation).

Rounding functions giving wrong output

Uberstedt, round to even is useful when you have a lot of half fractions in your data(3.5, -4.5). By using round to even(or odd), it keeps the average close to the original average (because they sometimes lose 0.5 and sometimes gain 0.5)

Example:- Average of (5x 3.5 + 10x 4 + 8x4.5) = 4.065

Rounding up average = ( 5x 4 + 10x 4 + 8x5) = 4.34

Rounding down average = (5x 3 + 10x 4 + 8x4) = 3.78

Round to even average = (5x 4 + 10x 4 + 8x4 ) = 4

Round to odd average = (5x 3 + 10x 4 + 8x5) = 4.13

You'll have to confirm the math but you get the idea.

Example:- Average of (5x 3.5 + 10x 4 + 8x4.5) = 4.065

Rounding up average = ( 5x 4 + 10x 4 + 8x5) = 4.34

Rounding down average = (5x 3 + 10x 4 + 8x4) = 3.78

Round to even average = (5x 4 + 10x 4 + 8x4 ) = 4

Round to odd average = (5x 3 + 10x 4 + 8x5) = 4.13

You'll have to confirm the math but you get the idea.

Casey Muratori

801 posts
/ 1 project

Casey Muratori is a programmer at Molly Rocket on the game 1935 and is the host of the educational programming series Handmade Hero.

Rounding functions giving wrong output

Yeah, sorry for not putting a big disclaimer on these things, but until we *actually* get to writing these functions for real, you definitely don't want to be using them for anything other than what we're doing with them in the code!!!

- Casey

- Casey

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Rounding functions giving wrong output

Uberstedt

Pseudonym73, Do you know why round to even is a common rounding mode?

I actually hinted at the answer. Consider the sequence of operations:

1 2 3 4 5 6 7 | float f = 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; f = round(f) + 0.5f; f = round(f) - 0.5f; |

Suppose this is your round function:

1 2 3 | int round(float f) { return (int)(f + 0.5f); } |

What you'll find is that f steadily increases. When you consider what is happening semantically (you're just adding and subtracting 0.5), this is highly undesirable behaviour. Using round-to-even limits the amount of artificial increase that can happen, keeping the whole sequence more numerically stable.

It seems a bit weird when you're converting floats to integers, but it makes a bit more sense when you consider the kind of rounding that happens in more usual floating point operations, such as adding two floats. If the exact answer doesn't fit in your floating point number format, the result is rounded.

There are some numeric methods which act like the above example, in the least significant digit of the mantissa. If it's a simulation which runs for a long time, you don't want numbers creeping in one direction just because of the rounding mode.

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Rounding functions giving wrong output

Incidentally, I've been mucking around with SSE2, and this is the simplest and fastest floor function that I could come up with:

While I'm at it, here's the world's best round function. It only works for well-behaved normal floats.

That constant, float_to_int, is one of the most important constants in all of IEEE-754 bit hackery. I'm certain it will come up again before Handmade Hero is done.

One final thought. I'm not having a go at Casey here, given in the discussion of the floor() function in Day... 31, I think it was, and quite rightly noting that we don't need all full compliance, describing floating point numbers as "real" is a barefaced lie. What you name something is, of course, more contentious than what it is, but does the "real32" typedef bother anyone other than me?

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | inline int32_t sse2_floor(float value) { // Get a vector of all ones. // mmozeiko's version: __m128 all_ones = _mm_set1_ps(1.0f); // My original version was this. It generates essentially the // same code as the above on Clang, but worse code on MSVC. // float one = 1.0f; // __m128 all_ones = _mm_load1_ps(&one); // Load the value into a SSE register. If you need a version which does // four floor operations at once, use _mm_load_ps or _mm_loadu_ps here. __m128 x = _mm_set1_ps(value); // Convert x to int and back to float __m128i x_to_int = _mm_cvttps_epi32(x); __m128 x_to_int_to_float = _mm_cvtepi32_ps(x_to_int); // A result needs adjustment if the round-trip resulted in an // increase. __m128 adjustment_mask = _mm_cmpgt_ps(x_to_int_to_float, x); // The adjustment is to subtact one. __m128 adjustment = _mm_and_ps(adjustment_mask, all_ones); __m128 x_floor = _mm_sub_ps(x_to_int_to_float, adjustment); // Extract the result as an integer. return _mm_cvtss_si32(x_floor); // If you are doing four floor operations and want the result as // a vector of int32s, this is probably more efficient: // WARNING: Untested code. May need to coerce __m128i to __m128 or something. // __m128i all_ones_int = _mm_setr_epi32(1,1,1,1); // __m128 adjustment = _mm_and_ps(adjustment_mask, all_ones_int); // return _mm_sub_ps(x_to_int, adjustment); } |

While I'm at it, here's the world's best round function. It only works for well-behaved normal floats.

1 2 3 4 5 6 | #include <float.h> int32_t worlds_best_round(float x) { const float float_to_int = 0.75f * (1u << FLT_MANT_DIG); return (int32_t)((x + float_to_int) - float_to_int); // Hope that /fp:fast doesn't optimise this away. } |

That constant, float_to_int, is one of the most important constants in all of IEEE-754 bit hackery. I'm certain it will come up again before Handmade Hero is done.

One final thought. I'm not having a go at Casey here, given in the discussion of the floor() function in Day... 31, I think it was, and quite rightly noting that we don't need all full compliance, describing floating point numbers as "real" is a barefaced lie. What you name something is, of course, more contentious than what it is, but does the "real32" typedef bother anyone other than me?

Rounding functions giving wrong output

Edited by
Mārtiņš Možeiko
on
You can use _mm_set1_ps(1.0f) to get number 1 in all four lanes of SSE register, no need for two lines (float one = 1.0f and _mm_load1_ps). You can use same intrinsic to load "value" argument into SSE register type, there won't be need to take address of value

Same value from _mm_set1_ps(1.0f) will work in case of four flooring operations (no need for _mm_setr_epi32).

For Visual Studio using _mm_set1_ps will generate fewer instructions than using _mm_load1_ps/_mm_load_ss.

With _mm_load1_ps and _mm_load_ss:

With _mm_set1_ps:

Same value from _mm_set1_ps(1.0f) will work in case of four flooring operations (no need for _mm_setr_epi32).

For Visual Studio using _mm_set1_ps will generate fewer instructions than using _mm_load1_ps/_mm_load_ss.

With _mm_load1_ps and _mm_load_ss:

1 2 3 4 5 6 7 8 9 10 11 12 13 | movss xmm1, DWORD PTR __real@3f800000 movss DWORD PTR [rsp+8], xmm0 movss xmm2, DWORD PTR value$[rsp] cvttps2dq xmm0, xmm2 movss DWORD PTR one$[rsp], xmm1 cvtdq2ps xmm1, xmm0 movss xmm0, DWORD PTR one$[rsp] shufps xmm0, xmm0, 0 cmpltps xmm2, xmm1 andps xmm2, xmm0 subps xmm1, xmm2 cvtss2si eax, xmm1 ret 0 |

With _mm_set1_ps:

1 2 3 4 5 6 7 8 9 | movss xmm2, xmm0 shufps xmm2, xmm2, 0 cvttps2dq xmm0, xmm2 cvtdq2ps xmm1, xmm0 cmpltps xmm2, xmm1 andps xmm2, XMMWORD PTR __xmm@3f8000003f8000003f8000003f800000 subps xmm1, xmm2 cvtss2si eax, xmm1 ret 0 |

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Rounding functions giving wrong output

Good catch! FYI, I tested it on Clang, which generates the same code. I've edited it to suit.

Rounding functions giving wrong output

To get assembly I posted at bottom of post it should be:

Note that now all_ones variable is same as all_ones_int in your comment (where it has wrong type uint32_t).

1 2 | __m128 all_ones = _mm_set1_ps(1.0f); // set not load __m128 x = _mm_set1_ps(value); // set not load, and pass argument by value |

Note that now all_ones variable is same as all_ones_int in your comment (where it has wrong type uint32_t).

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Rounding functions giving wrong output

Fixed.

It shouldn't be, because _mm_setr_epi32 should load four integers as integers, where as _mm_set1_ps should loads four floats. The point of the "possibly more efficient" method is to avoid the final float-to-int conversion, which means that the adjustment should be done on integers.

mmozeiko

Note that now all_ones variable is same as all_ones_int in your comment [...]

It shouldn't be, because _mm_setr_epi32 should load four integers as integers, where as _mm_set1_ps should loads four floats. The point of the "possibly more efficient" method is to avoid the final float-to-int conversion, which means that the adjustment should be done on integers.

Rounding functions giving wrong output

Edited by
Mārtiņš Možeiko
on
But how will call to _mm_and_ps work? It accepts two arguemnts with __m128 type. But _mm_setr_epi32 returns __m128i type. Code won't compile.

[strike]And result of function must be __m128i, not __m128 (what _mm_sub_ps returns).[/strike]Ah I see, you want to return float, not ints. And continue calculations with float types (but with floored values).

[strike]And result of function must be __m128i, not __m128 (what _mm_sub_ps returns).[/strike]Ah I see, you want to return float, not ints. And continue calculations with float types (but with floored values).