Incidentally, I've been mucking around with SSE2, and this is the simplest and fastest floor function that I could come up with:
| #include <emmintrin.h> // SSE2 intrinsics
#include <stdint.h>
inline int32_t
sse2_floor(float value)
{
    // Get a vector of all ones.
    // mmozeiko's version:
    __m128 all_ones = _mm_set1_ps(1.0f);
    // My original version was this. It generates essentially the
    // same code as the above on Clang, but worse code on MSVC.
    // float one = 1.0f;
    // __m128 all_ones = _mm_load1_ps(&one);
    // Load the value into an SSE register. If you need a version which does
    // four floor operations at once, use _mm_load_ps or _mm_loadu_ps here.
    __m128 x = _mm_set1_ps(value);
    // Convert x to int (truncating toward zero) and back to float.
    __m128i x_to_int = _mm_cvttps_epi32(x);
    __m128 x_to_int_to_float = _mm_cvtepi32_ps(x_to_int);
    // A result needs adjustment if the round-trip resulted in an
    // increase, which happens for negative non-integers.
    __m128 adjustment_mask = _mm_cmpgt_ps(x_to_int_to_float, x);
    // The adjustment is to subtract one.
    __m128 adjustment = _mm_and_ps(adjustment_mask, all_ones);
    __m128 x_floor = _mm_sub_ps(x_to_int_to_float, adjustment);
    // Extract the result as an integer.
    return _mm_cvtss_si32(x_floor);
    // If you are doing four floor operations and want the result as
    // a vector of int32s, this is probably more efficient:
    // WARNING: Untested code. The function would also have to return __m128i.
    // __m128i all_ones_int = _mm_set1_epi32(1);
    // __m128i adjustment_int = _mm_and_si128(_mm_castps_si128(adjustment_mask), all_ones_int);
    // return _mm_sub_epi32(x_to_int, adjustment_int);
}
|
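For anyone who wants the four-lane variant hinted at in the comments, here is a minimal sketch of how it might look. The names sse2_floor4 and sse2_floor_array are hypothetical, not from the original code, and it leans on the fact that a true comparison mask is all-ones bits, which reads as -1 when treated as an int32. Like the scalar version, it assumes the inputs fit in an int32.

```c
#include <assert.h>
#include <emmintrin.h> // SSE2 intrinsics
#include <stddef.h>
#include <stdint.h>

// Hypothetical four-lane variant: floors four floats into four int32s.
static inline __m128i sse2_floor4(__m128 x)
{
    // Truncate toward zero, then convert back so we can compare against x.
    __m128i x_to_int = _mm_cvttps_epi32(x);
    __m128 x_to_int_to_float = _mm_cvtepi32_ps(x_to_int);
    // Lanes where truncation increased the value (negative non-integers)
    // become all-ones bits, i.e. the integer -1. Other lanes are 0.
    __m128i adjustment = _mm_castps_si128(_mm_cmpgt_ps(x_to_int_to_float, x));
    // Adding -1 in those lanes completes the floor.
    return _mm_add_epi32(x_to_int, adjustment);
}

// Hypothetical helper: floors `count` floats, four at a time.
static void sse2_floor_array(const float *in, int32_t *out, size_t count)
{
    assert(count % 4 == 0); // This sketch does not handle a ragged tail.
    for (size_t i = 0; i < count; i += 4) {
        __m128 x = _mm_loadu_ps(in + i); // Unaligned load, so any pointer works.
        _mm_storeu_si128((__m128i *)(out + i), sse2_floor4(x));
    }
}
```

Adding the mask directly saves the AND with a vector of ones, at the cost of being a little less obvious to read.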
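While I'm at it, here's the world's best round function. It only works for well-behaved normal floats: magnitude below 2^22, with the FPU in its default round-to-nearest mode. Note that ties go to the nearest even integer, so 2.5 rounds to 2.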
| #include <float.h>
#include <stdint.h>
int32_t worlds_best_round(float x) {
    const float float_to_int = 0.75f * (1u << FLT_MANT_DIG);
    return (int32_t)((x + float_to_int) - float_to_int); // Hope that /fp:fast doesn't optimise this away.
}
|
That constant, float_to_int, is one of the most important constants in all of IEEE-754 bit hackery. I'm certain it will come up again before Handmade Hero is done.
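To spell out why it works: FLT_MANT_DIG is 24 for IEEE-754 single precision, so the constant is 0.75 × 2^24 = 1.5 × 2^23 = 12582912. Floats in the range [2^23, 2^24) are spaced exactly 1 apart, so adding the constant forces the hardware to round x to a whole number, and subtracting it again recovers that whole number. Here is a sketch with the precondition made explicit. The name magic_round, the assert, and the volatile are my additions, not part of the original; the volatile is only an attempt to make the compiler round the sum to a 32-bit float, not a guarantee under every compiler flag.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

// Hypothetical name; same trick as worlds_best_round, with the range checked.
static int32_t magic_round(float x)
{
    // 0.75 * 2^24 = 1.5 * 2^23 = 12582912.0f for IEEE-754 single precision.
    const float float_to_int = 0.75f * (float)(1u << FLT_MANT_DIG);
    // Outside this range the sum leaves [2^23, 2^24) and the trick breaks.
    assert(x > -4194304.0f && x < 4194304.0f); // |x| < 2^22
    // Assumption: volatile keeps the sum as a real 32-bit float, so it is
    // rounded to a whole number using the current rounding mode.
    volatile float shifted = x + float_to_int;
    return (int32_t)(shifted - float_to_int);
}
```

With the default rounding mode, magic_round(2.6f) gives 3 and magic_round(-2.6f) gives -3, while the ties 0.5f, 1.5f and 2.5f give 0, 2 and 2 respectively, because ties go to even.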
One final thought. I'm not having a go at Casey here, given that in the discussion of the floor() function (in Day... 31, I think it was) he quite rightly noted that we don't need full compliance. Still, describing floating-point numbers as "real" is a barefaced lie. What you name something is, of course, more contentious than what it is, but does the "real32" typedef bother anyone other than me?