Day 009: Won't tSine lose precision after a while?

On every frame, the variable tSine is incremented. After a while, won't we have issues with precision? Shouldn't 2*pi be subtracted from tSine if it exceeds 2*pi? Or is the precision of floating point so high that we don't need to care about this on the time scale of the game?

The sine wave is just a test to make sure the sound is working, so we don't care at all about precision or wrapping or anything. If we were actually trying to make a program that generated a sine wave for many hours, we would think about that!

- Casey
It seems likely that it will, and at some point it will probably get "stuck", once the increment is small enough relative to the value that the addition has no effect any more. So I wrote some code:

I whipped up a simple script in D that maintains two floats - one that grows monotonically, and another that is wrapped with a mod. It shows that there is definitely quite a lot of variation, and surprisingly quickly. (Or that I have a bug in my code.)

#!/usr/bin/env rdmd

import std.stdio;
import std.range;
import std.math;

void main() {
    float f = 0;     // grows without bound
    float fmod = 0;  // wrapped into [0, 2*PI)

    float increment = 0.001;
    foreach(long i; iota(long.max)) {
        f += increment;
        fmod += increment;
        fmod %= PI * 2.0;

        float fsin = sin(f);
        float fmodsin = sin(fmod);

        if ((i % 1000) == 0) {
            writefln("%d %f", i, abs(fmodsin - fsin));
        }
    }
}


The earliest deviation happens around the 7000th iteration, which makes sense: with an increment of 0.001, the mod first kicks in near iteration 6283, i.e. around 2*PI. It is fairly small though:

7000 0.000035


Then the next iteration reported is even smaller:

8000 0.000022


I don't have enough time to look into it, but I'm guessing that the error drifts apart and back together repeatedly, rather than growing monotonically. Also, since the printed value is an abs, the underlying difference may be changing sign a few times.
As Casey noted, this is just a placeholder sound, and premature optimisation is the root of all kinds of evil. Nonetheless, we need something to do while waiting for the next instalment.

So let's try the well-known fast but stable algorithm for generating sine waves based on a second-order equation:

cos(x + 2d) = 2 cos (x+d) cos d - cos x

Here's the test program:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int
main()
{
    float x = 0;
    float xmod = 0;

    // Easy way to detect frequency drift: make
    // the period of the wave a round number.
    float d = (2 * M_PI) / 100;
    float cosd = cosf(d);
    float c0 = cosf(x);
    float c1 = cosf(x+d);

    for (uint32_t i = 0; i < (1u << 31); ++i) {  // 1 << 31 would overflow a signed int
        printf("%u\t%f\t%f\t%f\n",
            i, cosf(x), cosf(xmod), c0);

        x += d;
        xmod = fmodf(xmod + d, 2 * M_PI);
        float c2 = 2 * c1 * cosd - c0;
        c0 = c1;
        c1 = c2;
    }
    return 0;
}


I compiled this with clang -O3 -msse2 on a recent MacBook Air. Here's the output after 10 million cycles:

9999989 0.519647        0.877383        0.770501
9999990 0.571997        0.905778        0.809006
9999991 0.622113        0.930598        0.844318
9999992 0.669799        0.951746        0.876299
9999993 0.714870        0.969138        0.904821
9999994 0.757149        0.982704        0.929772
9999995 0.796472        0.992393        0.951053
9999996 0.832685        0.998165        0.968581
9999997 0.865645        0.999997        0.982287
9999998 0.895226        0.997884        0.992116
9999999 0.921311        0.991832        0.998030
10000000        0.943798        0.981865        1.000005
10000001        0.962599        0.968024        0.998033
10000002        0.977642        0.950362        0.992122
10000003        0.988867        0.928950        0.982296
10000004        0.996230        0.903871        0.968594
10000005        0.999703        0.875226        0.951069
10000006        0.999273        0.843126        0.929790
10000007        0.994940        0.807699        0.904842
10000008        0.986722        0.769084        0.876323


As you can see, the second order equation overshoots slightly, so we might need to make sure the samples are scaled down a little. However, unlike cos and cos-fmod, the second order equation is still in phase!

This is what we refer to as "low harmonic distortion", and it's why you often see this algorithm in digital synthesisers.

Finally, testing the performance:

#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <boost/timer/timer.hpp>

#define CYCLES 100000000
#define DELTA ((2 * M_PI) / 100)

float
sine_wave_by_cosf() {
    float x = 0;

    // Compute the sum to ensure that this doesn't get aggressively
    // optimised away.
    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i) {
        sum += cosf(x);
        x += DELTA;
    }
    return sum;
}

float
sine_wave_by_cosf_mod() {
    float x = 0;

    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i) {
        sum += cosf(x);
        x = fmodf(x + DELTA, 2 * M_PI);
    }
    return sum;
}


float
sine_wave_by_secondorder() {
    float x = 0;
    float cosd = cosf(DELTA);
    float c0 = cosf(x);
    float c1 = cosf(x+DELTA);

    float sum = 0;
    for (uint32_t i = 0; i < CYCLES; ++i) {
        sum += c0;
        float c2 = 2 * c1 * cosd - c0;
        c0 = c1;
        c1 = c2;
    }
    return sum;
}

// Making this variable global ensures that assignments to
// it aren't optimised away.
float result;

int
main()
{
    printf("sine_wave_by_cosf:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_cosf();
    }

    printf("sine_wave_by_cosf_mod:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_cosf_mod();
    }

    printf("sine_wave_by_secondorder:\n");
    {
        boost::timer::auto_cpu_timer t;
        result = sine_wave_by_secondorder();
    }

    return 0;
}


Compiled with clang++ -O3 -msse2, I get:

sine_wave_by_cosf:
 1.353193s wall, 1.340000s user + 0.000000s system = 1.340000s CPU (99.0%)
sine_wave_by_cosf_mod:
 2.127797s wall, 2.110000s user + 0.020000s system = 2.130000s CPU (100.1%)
sine_wave_by_secondorder:
 0.384126s wall, 0.380000s user + 0.000000s system = 0.380000s CPU (98.9%)


Doing the fmod operation adds almost 60% to the runtime, while the second-order equation takes under 30% of the original's time. So if we need to generate lots of sine wave samples over the long term, this method seems like a winner.

Of course, if we only need one sample per frame (e.g. we're not generating audio, but the game features a flashing warning light or something), then this is probably overkill.
Thanks! The cos(x + 2d) = 2 cos (x+d) cos d - cos x method is cool.
This is a really cool way of generating sine waves indeed. One disadvantage of the method is that every calculation depends on the previous one, which prevents SIMD optimisation and stalls the CPU pipeline. It's also difficult/expensive to change the frequency dynamically on a sample-by-sample basis. The fact that it stays in phase is really important for many serious audio-processing purposes, though, so that's nice. I would probably use double for this calculation, because it would cost basically the same and improve the precision. The cost could come if we were generating a lot of interleaved sines, which would be able to benefit from SIMD optimisations.

Just my 2 cents.
For some reason, I'm not seeing my reply, so let's try reposting.

The obvious way to get SIMD happening is to increase the stride. So let's give that a go.

Code:

#include <emmintrin.h>

#ifdef _WIN32
#define ALIGN(a)  __declspec(align(a))
#else
#define ALIGN(a)  __attribute__((aligned(a)))
#endif


float
sine_wave_by_secondorder_simd() {
    float x = 0;

    ALIGN(16) float c0v[4] = {
        cosf(x),
        cosf(x+DELTA),
        cosf(x+2*DELTA),
        cosf(x+3*DELTA)
    };
    __m128 c0 = _mm_load_ps(c0v);

    ALIGN(16) float c1v[4] = {
        cosf(x+4*DELTA),
        cosf(x+5*DELTA),
        cosf(x+6*DELTA),
        cosf(x+7*DELTA)
    };
    __m128 c1 = _mm_load_ps(c1v);

    float cos4d2f = 2.0f * cosf(4*DELTA);
    __m128 cos4d2 = _mm_load1_ps(&cos4d2f);

    __m128 sum = _mm_setzero_ps();
    for (uint32_t i = 0; i < CYCLES / 4; ++i) {
        sum = _mm_add_ps(c0, sum);
        __m128 c2 = _mm_sub_ps(_mm_mul_ps(cos4d2, c1), c0);
        c0 = c1;
        c1 = c2;
    }

    float s;
    _mm_store_ss(&s, sum); // This is just a dummy anyway.
    return s;
}


And results:

Max abs error: 0.000041
RMS error: 0.000018
sine_wave_by_secondorder:
 0.372116s wall, 0.370000s user + 0.000000s system = 0.370000s CPU (99.4%)
sine_wave_by_secondorder_simd:
 0.072629s wall, 0.070000s user + 0.000000s system = 0.070000s CPU (96.4%)


The first two lines are the "error" between this and the previous second-order version, expressed as maximum absolute error and RMS error. To put that number in perspective, 0.000041 is a little over 1 bit of difference if this is being rendered to 16-bit audio samples. So it's not perfect, but it's close enough that nobody will hear the difference.

And in return, 4-way SIMD gives you more than 5-fold better performance. Pipelining matters!
That was genius. Thank you for posting that, disproving my assumption, and teaching me something new. I will try this with double precision and experiment further. I am eternally grateful. :cheer:
Precision of the sine operation in Intel processors is actually pretty bad when the numbers are big, as discovered by Bruce Dawson: http://randomascii.wordpress.com/...-error-bounds-by-1-3-quintillion/
jon_valdes
Precision of the sine operation in Intel processors is actually pretty bad when the numbers are big, as discovered by Bruce Dawson: http://randomascii.wordpress.com/...-error-bounds-by-1-3-quintillion/


Very interesting blog post. Thanks!

It's a scary truth that at every level of computing, there is a bunch of erroneous functionality that flies under the radar, often for the lifetime of the component.

If you have a decent C implementation, then sin() and sinf() don't use the x87 instruction anyway. I refer you to the Intel Architecture Optimization Manual, section 3.8.5:

Although x87 supports transcendental instructions, software library implementation of transcendental function [sic] can be faster in many cases.
I've been trying to make clang output the sin intrinsic without luck (even with SSE4.1 and autovectorization enabled), so it seems you might be right.

Interestingly, Casey just said in the stream he would rather have the intrinsic version:
https://www.youtube.com/watch?v=zN7llTrMMBU#t=1090
Imho, Casey said that he will use SSE intrinsics to implement sin, not that he wants the "sin" FPU opcode.
It could very well be either of the two options, based on what he said:
If the compiler knows how to do an intrinsic version of sinf, which is basically say: do the assembly code right there that just calls the processor and does something, you know if the processor knows how to do a sine, for example, just do that

Anyway, if people have stumbled upon the "sin" opcode precision issues, then it must have been used somewhere...
Sagar, you are right, and it is actually very noticeable.

I'm only on day 10, so I noticed this problem with day 9 just last night. At first (and for a very short time), the pitch changes very smoothly with the stick. However, after a not-very-long period of time, the pitch goes up and down in distinct steps as you move the stick. And if you let it run for a few minutes, you can only get about 2 extra pitches going up and 2 extra pitches going down. If you restart the program, then of course you can change the frequency smoothly again. (Before the code used the stick, when there was just a button to switch between two frequencies, I would sometimes hear the frequencies change in buggy ways too.)

So I added the following while loop, right after increasing tSine, to get it back down into the 0 to 2*PI range. I made it a while loop just in case tSine was even bigger than expected, but it should only need to run once in the normal case anyway.

SoundOutput->tSine += 2.0f * Pi32 / (real32)SoundOutput->WavePeriod;
while (SoundOutput->tSine > 2.0f * Pi32)
{
    SoundOutput->tSine -= 2.0f * Pi32;
}
SoundOutput->RunningSampleIndex++;

Although it is just test code for this game, and won't be used, I am also interested in doing some music synthesizing, so it will become important later for me.

After this small change, it worked great.
