I wouldn't call the write and play cursors inaccurate. The values they report are correct. It's more that you can't accurately predict how far they will advance in a given time frame.
To avoid glitches in the sound, we want to write enough samples into the buffer that the play cursor never reads invalid samples. That could be done by writing a lot of samples in advance, but since we want to be able to modify the sound, we also want to minimize the number of samples we write so we can modify the sound with minimum latency.
If we could write anywhere in the buffer (if there was no concept of a write cursor), we would query the position of the play cursor and write at least one frame of data ahead of it (a frame being the amount of time we expect to elapse before we write to the buffer again).
But DirectSound has a concept of a write cursor, meaning that some samples in front of the play cursor can't be modified and thus add latency to the sound playback. Write cursor - play cursor gives that latency, which is about 30 ms.
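The subtraction above has to account for the buffer being circular: the write cursor can wrap to the start while the play cursor is still near the end. Here is a minimal sketch in C of that computation. The helper names `ring_distance` and `bytes_to_ms` are made up for illustration, and the cursor values are assumed to be the byte offsets that `GetCurrentPosition` reports.

```c
#include <assert.h>
#include <stdint.h>

/* Forward distance in bytes from `from` to `to` inside a ring buffer
   of `size` bytes. Handles the case where `to` has wrapped around. */
static uint32_t ring_distance(uint32_t from, uint32_t to, uint32_t size)
{
    assert(size > 0 && from < size && to < size);
    return (to >= from) ? (to - from) : (size - from + to);
}

/* Convert a byte count to milliseconds of audio.
   bytes_per_sample covers all channels (e.g. 4 for 16-bit stereo). */
static double bytes_to_ms(uint32_t bytes, uint32_t samples_per_second,
                          uint32_t bytes_per_sample)
{
    assert(samples_per_second > 0 && bytes_per_sample > 0);
    return 1000.0 * (double)bytes /
           ((double)samples_per_second * (double)bytes_per_sample);
}
```

For example, assuming 48000 samples per second and 4 bytes per sample, a gap of 5760 bytes between the two cursors corresponds to 30 ms.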
When you write the sound the first time, the play cursor and write cursor are both 0. If you write 1 frame's worth of samples you'll get a problem on the second frame: the write cursor will sit after samples that were never initialized, and those can no longer be written. So we need to write 2 frames' worth of samples at the start. This is only needed when you start the playback; after that you would be fine writing a single frame each time.
p: play cursor
w: write cursor
s: valid samples
-: invalid samples
|: expected frame boundary

/* Frame 1 */
|ssss|----|----|
|p   |    |    |
|w   |    |    |

/* Frame 2 */
|ssss|----|----|
|    |p   |w   |

/* Solution */
/* Frame 1 */
|ssss|ssss|----|
|p   |    |    |
|w   |    |    |

/* Frame 2 */
|ssss|ssss|----|
|    |p   |w   |
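The start-up case can be checked with a small simulation. This is only a sketch under simplifying assumptions: positions are linear sample counts (no wraparound), every frame takes exactly the expected time, and the write cursor sits a fixed `cursor_gap` ahead of the play cursor once playback has started. The function name `has_gap` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if, on any frame, the write cursor has moved past the last
   valid sample, meaning unwritten samples are now out of reach.
   first_write_frames: how many frames' worth we write on the first call. */
static int has_gap(uint32_t first_write_frames, uint32_t samples_per_frame,
                   uint32_t cursor_gap, uint32_t frame_count)
{
    assert(samples_per_frame > 0);
    uint32_t valid_end = 0; /* one past the last valid sample */
    for (uint32_t frame = 0; frame < frame_count; ++frame) {
        uint32_t play = frame * samples_per_frame;
        /* Both cursors are 0 on the very first write. */
        uint32_t write = (frame == 0) ? 0 : play + cursor_gap;
        if (write > valid_end) {
            return 1;
        }
        /* Append this frame's samples after the existing valid data. */
        valid_end += (frame == 0 ? first_write_frames : 1) * samples_per_frame;
    }
    return 0;
}
```

With 4 samples per frame and a cursor gap of 4, as in the diagram, `has_gap(1, 4, 4, 3)` reports a gap on the second frame, while `has_gap(2, 4, 4, 3)` does not.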
If the game takes longer than the expected time before writing new samples, then the write cursor will again be too far ahead. There are several reasons for a frame taking longer than expected:
- Handmade Hero uses Sleep to try to get 30 fps, and it's not precise enough;
- even if using vsync to synchronize with the monitor refresh rate there are small variations;
- the operating system could give you less time on the CPU if it has more important things that need to run;
- you're doing too much work.
To compensate for that we need to write more samples. Not necessarily 2 frames, but enough for the region between the play cursor and the write cursor to contain valid samples the next time we write. This adds to the latency but is necessary to avoid sound glitches.
/* Frame 2 */
|ssss|ssss|----|
|    |p   |    |
|    |    |w   |

/* Frame 3 */
|ssss|ssss|ssss|----|
|    |    | p  | w  |
            ^    ^
            p and w are 1 too far
                 There are invalid samples here.

/* Solution */
/* We always write more than necessary,
   trying to have 5 s after w as we expect
   4 samples to be needed with a safety
   margin of 1. */
/* Frame 1 */
|ssss|ssss|s---|----|----|
|p   |    |    |    |    |
|w   |    |    |    |    |

/* Frame 2 */
|ssss|ssss|ssss|s---|----|
|    |p   |w   |    |    |

/* Frame 3 */
|ssss|ssss|ssss|ssss|ss--|
|    |    | p  | w  |    |
                 ^
                 p and w are 1 too far but the samples are valid
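The "5 s after w" rule in the solution can be sketched as a small function. This is an illustration only, not necessarily how Handmade Hero computes it. Positions are linear sample counts with no wraparound, and the name `samples_needed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* How many samples to append so that valid data reaches
   expected_per_frame + safety_margin samples past the write cursor.
   valid_end is one past the last valid sample already in the buffer. */
static uint32_t samples_needed(uint32_t write_cursor, uint32_t valid_end,
                               uint32_t expected_per_frame,
                               uint32_t safety_margin)
{
    assert(expected_per_frame > 0);
    uint32_t target = write_cursor + expected_per_frame + safety_margin;
    /* If we are already far enough ahead, write nothing. */
    return (target > valid_end) ? (target - valid_end) : 0;
}
```

Using the numbers from the diagram: on frame 2 the write cursor is at 8 and 9 samples are valid, so 4 more are written (13 valid in total). On frame 3 the write cursor is at 13, one further than expected, so 5 are written (18 in total). The first write is a special case because both cursors are still 0, which is why the diagram writes two frames plus the margin there.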