On every frame, the variable tSine is incremented. After a while, won't we have issues with precision? Shouldn't 2*pi be subtracted from tSine if it exceeds 2*pi? Or is the precision of floating point so high that we don't need to care about this on the time scale of the game?

Casey Muratori

801 posts
/ 1 project

Casey Muratori is a programmer at Molly Rocket on the game 1935 and is the host of the educational programming series Handmade Hero.

Day 009: Won't tSine lose precision after a while?

The sine wave is just a test to make sure the sound is working, so we don't care at all about precision or wrapping or anything. If we were actually trying to make a program that generated a sine wave for many hours, we would think about that!

- Casey


Day 009: Won't tSine lose precision after a while?

It seems likely that it will, and it seems to me it will probably get "stuck" at some point, once adding the increment no longer changes the float at all. So I wrote some code:

I whipped up a simple script in D that maintains two floats: one that grows monotonically, and one that is reduced modulo 2*PI every step. It shows that there definitely is quite a variation, and surprisingly quickly. Or that I have a bug in my code.

#!/usr/bin/env rdmd

import std.stdio;
import std.range;
import std.math;

void main()
{
    float f = 0;
    float fmod = 0;
    float increment = 0.001;

    foreach (long i; iota(long.max))
    {
        f += increment;
        fmod += increment;
        fmod %= PI * 2.0;

        float fsin = sin(f);
        float fmodsin = sin(fmod);
        if ((i % 1000) == 0)
        {
            writefln("%d %f", i, abs(fmodsin - fsin));
        }
    }
}

The earliest deviation shows up around the 7000th iteration (7000 × 0.001 = 7.0, just past 2*PI, which is where the mod first kicks in). It is fairly small though:

7000 0.000035

Then the next iteration reported is even smaller:

8000 0.000022

I don't have enough time to look into it, but I'm guessing the two values drift apart and back together a number of times rather than diverging steadily. Also, since the printed value is an absolute difference, the underlying error may flip between positive and negative a few times.

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Day 009: Won't tSine lose precision after a while?

As Casey noted, this is just a placeholder sound, and premature optimisation is the root of all kinds of evil. Nonetheless, we need something to do while waiting for the next instalment.

So let's try the well-known fast but stable algorithm for generating sine waves based on a second-order equation:

cos(x + 2d) = 2 cos (x+d) cos d - cos x
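For reference, this identity is just the cosine angle-addition formula applied to (x+d)±d; adding the two expansions cancels the sine terms:

```latex
\cos(x+2d) = \cos\big((x+d)+d\big) = \cos(x+d)\cos d - \sin(x+d)\sin d
\cos(x)    = \cos\big((x+d)-d\big) = \cos(x+d)\cos d + \sin(x+d)\sin d
% Summing the two lines:
\cos(x+2d) + \cos(x) = 2\cos(x+d)\cos d
\implies \cos(x+2d) = 2\cos(x+d)\cos d - \cos(x)
```

Since cos(d) is a constant for a fixed frequency, each new sample costs one multiply and one subtract.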

Here's the test program:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main()
{
    float x = 0;
    float xmod = 0;

    // Easy way to detect frequency drift: make
    // the period of the wave a round number.
    float d = (2 * M_PI) / 100;

    float cosd = cosf(d);
    float c0 = cosf(x);
    float c1 = cosf(x + d);

    for (uint32_t i = 0; i < (1u << 31); ++i)
    {
        printf("%u\t%f\t%f\t%f\n", i, cosf(x), cosf(xmod), c0);

        x += d;
        xmod = fmodf(xmod + d, 2 * M_PI);

        float c2 = 2 * c1 * cosd - c0;
        c0 = c1;
        c1 = c2;
    }

    return 0;
}

I compiled this with clang -O3 -msse2 on a recent MacBook Air. Here's the output after 10 million cycles:

9999989	0.519647	0.877383	0.770501
9999990	0.571997	0.905778	0.809006
9999991	0.622113	0.930598	0.844318
9999992	0.669799	0.951746	0.876299
9999993	0.714870	0.969138	0.904821
9999994	0.757149	0.982704	0.929772
9999995	0.796472	0.992393	0.951053
9999996	0.832685	0.998165	0.968581
9999997	0.865645	0.999997	0.982287
9999998	0.895226	0.997884	0.992116
9999999	0.921311	0.991832	0.998030
10000000	0.943798	0.981865	1.000005
10000001	0.962599	0.968024	0.998033
10000002	0.977642	0.950362	0.992122
10000003	0.988867	0.928950	0.982296
10000004	0.996230	0.903871	0.968594
10000005	0.999703	0.875226	0.951069
10000006	0.999273	0.843126	0.929790
10000007	0.994940	0.807699	0.904842
10000008	0.986722	0.769084	0.876323

As you can see, the second-order recurrence overshoots slightly (peaking at 1.000005), so we might need to scale the samples down a little. However, unlike cosf and cosf-with-fmod, the second-order recurrence is still exactly in phase!

This is what we refer to as "low harmonic distortion", and it's why you often see this algorithm in digital synthesisers.

Finally, testing the performance:

#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <boost/timer/timer.hpp>

#define CYCLES 100000000
#define DELTA ((2 * M_PI) / 100)

float sine_wave_by_cosf()
{
    float x = 0;

    // Compute the sum to ensure that this doesn't get aggressively
    // optimised away.
    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i)
    {
        sum += cosf(x);
        x += DELTA;
    }
    return sum;
}

float sine_wave_by_cosf_mod()
{
    float x = 0;
    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i)
    {
        sum += cosf(x);
        x = fmodf(x + DELTA, 2 * M_PI);
    }
    return sum;
}

float sine_wave_by_secondorder()
{
    float x = 0;
    float cosd = cosf(DELTA);
    float c0 = cosf(x);
    float c1 = cosf(x + DELTA);

    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i)
    {
        sum += c0;
        float c2 = 2 * c1 * cosd - c0;
        c0 = c1;
        c1 = c2;
    }
    return sum;
}

// Making this variable global ensures that assignments to
// it aren't optimised away.
float result;

int main()
{
    printf("sine_wave_by_cosf:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_cosf();
    }

    printf("sine_wave_by_cosf_mod:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_cosf_mod();
    }

    printf("sine_wave_by_secondorder:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_secondorder();
    }

    return 0;
}

Compiled with clang++ -O3 -msse2, I get:

sine_wave_by_cosf:
 1.353193s wall, 1.340000s user + 0.000000s system = 1.340000s CPU (99.0%)
sine_wave_by_cosf_mod:
 2.127797s wall, 2.110000s user + 0.020000s system = 2.130000s CPU (100.1%)
sine_wave_by_secondorder:
 0.384126s wall, 0.380000s user + 0.000000s system = 0.380000s CPU (98.9%)

Doing the fmod operation adds over 50% to the runtime, while the second-order recurrence takes only about 30% of the original time. So if we need to generate lots of sine-wave samples over the long term, this method looks like a winner.

Of course, if we only need one sample per frame (e.g. we're not generating audio, but the game features a flashing warning light or something), then this is probably overkill.

Day 009: Won't tSine lose precision after a while?

Thanks! The cos(x + 2d) = 2 cos (x+d) cos d - cos x method is cool.

Day 009: Won't tSine lose precision after a while?

This is a really cool way of generating sine waves indeed. One disadvantage of the method is that every calculation depends on the previous one, which rules out SIMD optimization and stalls the CPU pipeline. It's also difficult/expensive to change the frequency dynamically on a sample-by-sample basis. The fact that it stays in phase is really important for many serious audio-processing purposes, though, so that's nice. I would probably use double for this calculation, since it would cost basically the same and improve the precision. The cost could show up if we were generating a lot of interleaved sines, which could otherwise benefit from SIMD optimizations.

Just my 2 cents.


Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Day 009: Won't tSine lose precision after a while?

For some reason, I'm not seeing my reply, so let's try reposting.

The obvious way to get SIMD happening is to increase the stride. So let's give that a go.

Code:

#include <emmintrin.h>

#ifdef _WIN32
#define ALIGN(a) __declspec(align(a))
#else
#define ALIGN(a) __attribute__((aligned(a)))
#endif

// DELTA and CYCLES are the macros from the previous listing.
float sine_wave_by_secondorder_simd()
{
    float x = 0;

    ALIGN(16) float c0v[4] = {
        cosf(x), cosf(x + DELTA), cosf(x + 2*DELTA), cosf(x + 3*DELTA)
    };
    __m128 c0 = _mm_load_ps(c0v);

    ALIGN(16) float c1v[4] = {
        cosf(x + 4*DELTA), cosf(x + 5*DELTA), cosf(x + 6*DELTA), cosf(x + 7*DELTA)
    };
    __m128 c1 = _mm_load_ps(c1v);

    float cos4d2f = 2.0f * cosf(4*DELTA);
    __m128 cos4d2 = _mm_load1_ps(&cos4d2f);

    __m128 sum = _mm_setzero_ps();

    for (uint32_t i = 0; i < CYCLES / 4; ++i)
    {
        sum = _mm_add_ps(c0, sum);
        __m128 c2 = _mm_sub_ps(_mm_mul_ps(cos4d2, c1), c0);
        c0 = c1;
        c1 = c2;
    }

    float s;
    _mm_store_ss(&s, sum); // This is just a dummy anyway.
    return s;
}

And results:

Max abs error: 0.000041
RMS error: 0.000018

sine_wave_by_secondorder:
 0.372116s wall, 0.370000s user + 0.000000s system = 0.370000s CPU (99.4%)
sine_wave_by_secondorder_simd:
 0.072629s wall, 0.070000s user + 0.000000s system = 0.070000s CPU (96.4%)

The first two lines are the "error" between this and the previous second-order version, expressed as maximum absolute error and RMS error. To put that number in perspective, 0.000041 is a little over 1 bit of difference if this is being rendered to 16-bit audio samples. So it's not perfect, but it's close enough that nobody will hear the difference.

And in return, 4-way SIMD gives you more than 5-fold better performance. Pipelining matters!

Day 009: Won't tSine lose precision after a while?

That was genius. Thank you for posting that, disproving my assumption, and teaching me something new. I will try this with double precision and experiment further. I am eternally grateful. :cheer:

Day 009: Won't tSine lose precision after a while?

Precision of the sine operation in Intel processors is actually pretty bad when the numbers are big, as discovered by Bruce Dawson: http://randomascii.wordpress.com/...-error-bounds-by-1-3-quintillion/

Day 009: Won't tSine lose precision after a while?

Edited by Johan Öfverstedt

jon_valdes wrote:

Precision of the sine operation in Intel processors is actually pretty bad when the numbers are big, as discovered by Bruce Dawson: http://randomascii.wordpress.com/...-error-bounds-by-1-3-quintillion/

Very interesting blog post. Thanks!

It's a scary truth that at every level of computing there is a bunch of erroneous functionality that flies under the radar, often for the lifetime of the component.

Andrew Bromage

183 posts
/ 1 project

Research engineer, resident maths nerd (Erdős number 3).

Day 009: Won't tSine lose precision after a while?

If you have a decent C implementation, then sin() and sinf() don't use the x87 instruction anyway. I refer you to the Intel Architecture Optimization Manual, section 3.8.5:

Although x87 supports transcendental instructions, software library implementation of transcendental function [sic] can be faster in many cases.

Day 009: Won't tSine lose precision after a while?

I've been trying to make clang output the sin intrinsic without luck (even with SSE4.1 and autovectorization enabled), so it seems you might be right.

Interestingly, Casey just said in the stream he would rather have the intrinsic version:

https://www.youtube.com/watch?v=zN7llTrMMBU#t=1090


Day 009: Won't tSine lose precision after a while?

Imho, Casey said that he would use SSE intrinsics to implement sin, not that he wants the x87 "sin" FPU opcode.

Day 009: Won't tSine lose precision after a while?

It could very well be any of the two options, based on what he said:

"If the compiler knows how to do an intrinsic version of sinf, which is basically say: do the assembly code right there that just calls the processor and does something, you know if the processor knows how to do a sine, for example, just do that"

Anyway, if people have stumbled upon the "sin" opcode precision issues, then it must have been used somewhere...

sagar, you are right, and it is actually very noticeable.

I'm only on day 10, so I ran into this problem with day 9 just last night. What I noticed is that at first (and only for a very short time), the pitch changes very smoothly with the stick. After a not-very-long period, though, the pitch starts moving up and down in distinct steps as you move the stick. And if you let it run for a few minutes, you can only get about two extra pitches going up and two going down. If you restart the program, you can of course change the frequency smoothly again. (Before the code used the stick, when there was just a button to switch between two frequencies, I would sometimes hear the frequencies change in buggy ways too.)

So I added the following while loop, right after increasing tSine, to get it back down into the 0 to 2PI range. I made it a while loop just in case tSine was even bigger than expected, but it should only need to be done once in the normal case anyway.

SoundOutput->tSine += 2.0f * Pi32 / (real32)SoundOutput->WavePeriod;
while (SoundOutput->tSine > 2.0f * Pi32)
{
    SoundOutput->tSine -= 2.0f * Pi32;
}
SoundOutput->RunningSampleIndex++;

Although this is just test code for the game and won't be kept, I'm also interested in doing some music synthesis, so this will become important for me later.

After this small change, it worked great.
