Day 19: Direct Sound Audio Sync

Henry Kloren

#21107

May 29, 2019

Hi Guys,

I'm a little confused with Casey's discussion on Audio Sync on Day 19:

Now, from what I can tell, the ideal value for FramesOfLatency is 2: if you snap the PlayCursor on the previous frame's flip (as Casey does on Day 19), then you want to write from the byte you left off on up to 2 frames ahead of the LastPlayCursor. E.g., in the diagram below we want to write from BTL to 2 frames ahead of LPC. Correct?

s for sound already written
LPC for LastPlayCursor
BTL for ByteToLock

|sssssssss|----------|
LPC       BTL
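
A minimal sketch of the calculation described above, assuming a circular buffer addressed in bytes. Every name and parameter here is hypothetical and chosen for illustration; it follows the idea of "write from BTL up to N frames ahead of LPC" and is not a copy of the Day 19 code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range to fill in a circular sound buffer. */
typedef struct {
    uint32_t byte_to_lock;   /* where writing resumes (BTL) */
    uint32_t bytes_to_write; /* bytes to fill, may wrap past the buffer end */
} write_region;

/* last_play_cursor is a byte offset (LPC); running_sample_index counts
   every sample written so far. Both are assumptions of this sketch. */
write_region compute_write_region(uint32_t last_play_cursor,
                                  uint64_t running_sample_index,
                                  uint32_t bytes_per_sample,
                                  uint32_t samples_per_frame,
                                  uint32_t frames_of_latency,
                                  uint32_t buffer_size)
{
    assert(buffer_size > 0 && bytes_per_sample > 0);

    write_region region;
    region.byte_to_lock =
        (uint32_t)((running_sample_index * bytes_per_sample) % buffer_size);

    /* Target: frames_of_latency frames ahead of the last play cursor. */
    uint32_t target_cursor =
        (last_play_cursor +
         frames_of_latency * samples_per_frame * bytes_per_sample) % buffer_size;

    if (region.byte_to_lock > target_cursor) {
        /* The region wraps around the end of the buffer. */
        region.bytes_to_write =
            (buffer_size - region.byte_to_lock) + target_cursor;
    } else {
        region.bytes_to_write = target_cursor - region.byte_to_lock;
    }
    return region;
}
```

With a 1-second buffer of 192000 bytes (48000 samples of 4 bytes), 1600 samples per frame and 2 frames of latency, a play cursor at 0 with one frame already written gives a region starting at byte 6400 that is 6400 bytes long, i.e. exactly one more frame.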

Also, Casey mentions that an ideal amount of sound latency is 1 frame, since it'll allow close to perfect sync between video and audio. I suppose this corresponds to 2 FramesOfLatency as above: since the above calculation is measured from the LastPlayCursor, it is actually 1 frame of latency. (Is this true?)

Also, I was a bit confused about why we got skips in the sound unless we set FramesOfLatency to 3.
https://youtu.be/hELF8KRqSIs?t=9597

Casey explains it in the following clip I saw in the Day 20 Q&A, but what I don't understand is how the PlayCursor can be "a little bit off" and what he even means by that. Is he referring to DSound reporting it incorrectly? I mean, we know from Day 19 that there is a 480-sample granularity in the reporting, and there are 48000 samples/second, so (480 / 48000) * 1000 = 10ms of inaccuracy. So where we expect the Play/Write cursor to be may be up to 10ms off? Is this what he is getting at?
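
The arithmetic above can be written as a tiny helper. This is only a sketch; the function name is made up for illustration.

```c
#include <assert.h>

/* Duration in milliseconds of sample_count samples at the given rate. */
double samples_to_ms(unsigned sample_count, unsigned samples_per_second)
{
    assert(samples_per_second > 0);
    return 1000.0 * (double)sample_count / (double)samples_per_second;
}
```

For example, samples_to_ms(480, 48000) gives 10.0, the granularity mentioned above, and samples_to_ms(1440, 48000) gives 30.0.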

Edited by Henry Kloren on May 29, 2019, 3:11pm

Simon Anciaux

#21112

May 29, 2019

I have a hard time understanding exactly what you mean in your explanations (I didn't recently watch the episode).

The thread "[Day 019] - Possible solution to the 3-frame latency" probably contains information that can help you understand the problem better.

If you still have questions feel free to ask them here.

Henry Kloren

#21113

May 30, 2019

hmmm i'll try my best to explain the crux of the problem.

in DSound we get skips in the sound unless we write 1 frame ahead of the frame we actually want to write sound for. Yet, DSound only has 30ms of Audio Latency. How is it we are skipping given a frame is 33ms and our latency is only 30ms (why do we need to write a frame ahead given these values)? I presume this could be because of inaccuracy in the Play/Write cursor reporting given it has 10ms granularity to begin with or maybe due to variability in the update time.

Simon Anciaux

#21116

May 30, 2019

I wouldn't call the write and play cursor inaccurate. They report a value that is correct. It's more that you can't accurately predict how much they will advance in a given time frame.

To avoid glitches in the sound, we want to write enough samples in the buffer so that the play cursor will never read invalid samples. That could be done by writing a lot of samples in advance, but since we want to be able to modify the sound, we also want to minimize the number of samples we write, so that changes are heard with minimum latency.

If we could write anywhere in the buffer (if there was no concept of a write cursor), we would query the position of the play cursor and write at least a frame of data in front of it (a frame being the amount of time we expect to elapse before we write to the buffer again).

But DirectSound has a concept of a write cursor, meaning that some samples in front of the play cursor can't be modified and are therefore latency in the sound playback. Write cursor - play cursor gives that latency, which is about 30 ms.
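
As a rough sketch of "write cursor - play cursor", here is the subtraction done on a circular buffer, where the write cursor may already have wrapped past the end. The function names are hypothetical; the cursors are assumed to be byte offsets, which is how IDirectSoundBuffer::GetCurrentPosition reports them.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from play_cursor forward to write_cursor in a circular buffer. */
uint32_t cursor_gap_bytes(uint32_t play_cursor, uint32_t write_cursor,
                          uint32_t buffer_size)
{
    assert(play_cursor < buffer_size && write_cursor < buffer_size);
    if (write_cursor >= play_cursor) {
        return write_cursor - play_cursor;
    }
    /* The write cursor has wrapped around the end of the buffer. */
    return (buffer_size - play_cursor) + write_cursor;
}

/* Convert a byte gap to milliseconds of audio. */
double gap_to_ms(uint32_t gap_bytes, uint32_t bytes_per_sample,
                 uint32_t samples_per_second)
{
    assert(bytes_per_sample > 0 && samples_per_second > 0);
    return 1000.0 * (double)gap_bytes /
           ((double)bytes_per_sample * (double)samples_per_second);
}
```

With 4-byte samples at 48000 samples per second, a gap of 5760 bytes is 1440 samples, which is the 30 ms mentioned above.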

When you write the sound the first time, the play cursor and write cursor are both 0. If you write 1 frame worth of samples you'll get a problem on the second frame: the write cursor will be after uninitialized samples. So we need to write 2 frames worth of samples at the start. This is only needed when you start the playback; afterwards you would be fine just writing a single frame.

p: play cursor
w: write cursor
s: valid samples
-: invalid samples
|: expected frame boundary

/* Frame 1 */
|ssss|----|----|
|p   |    |    |
|w   |    |    |

/* Frame 2 */
|ssss|----|----|
|    |p   |w   |

/* Solution */

/* Frame 1 */
|ssss|ssss|----|
|p   |    |    |
|w   |    |    |

/* Frame 2 */
|ssss|ssss|----|
|    |p   |w   |

If the game takes longer than the expected time before writing new samples, then the write cursor would again be too far. There are several reasons for having a frame longer than expected:
- Handmade Hero uses Sleep to try to get 30 fps, and it's not precise enough;
- even if using vsync to synchronize with the monitor refresh rate there are small variations;
- the operating system could give you less time on the cpu if it has more important things that need to run;
- you're doing too much work.
To compensate for that we need to write more samples. Not necessarily 2 frames, but enough for the region between the play cursor and the write cursor to contain valid samples the next time we write. This adds to the latency but is necessary to avoid sound glitches.

/* Frame 2 */
|ssss|ssss|----|
|    |p   |    |
|    |    |w   |

/* Frame 3 */
|ssss|ssss|ssss|----|
|    |    | p  | w  |
            ^    ^
       p and w are 1 too far
       There are invalid samples here.

/* Solution */
/* We always write more than necessary,
trying to have 5 s after w as we expect
4 samples to be needed with a safety
margin of 1. */

/* Frame 1 */
|ssss|ssss|s---|----|----|
|p   |    |    |    |    |
|w   |    |    |    |    |

/* Frame 2 */
|ssss|ssss|ssss|s---|----|
|    |p   |w   |    |    |

/* Frame 3 */
|ssss|ssss|ssss|ssss|ss--|
|    |    | p  | w  |    |
            ^
        p and w are 1 too far but the samples are valid
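
The "expected frame plus safety margin" rule from the diagrams can be sketched as follows. This is a simplified model with hypothetical names: positions are absolute sample counts that never wrap, which a real circular buffer would have to handle.

```c
#include <assert.h>
#include <stdint.h>

/* How many new samples to write so that valid data extends
   frame_samples + safety_samples past the write cursor.
   valid_end is the position just after the last valid sample. */
uint64_t samples_to_write(uint64_t valid_end, uint64_t write_cursor,
                          uint64_t frame_samples, uint64_t safety_samples)
{
    assert(frame_samples > 0);

    uint64_t target_end = write_cursor + frame_samples + safety_samples;

    /* Samples before the write cursor can no longer be modified, so
       writing starts at whichever of the two positions is further along. */
    uint64_t start = (valid_end > write_cursor) ? valid_end : write_cursor;

    return (target_end > start) ? (target_end - start) : 0;
}
```

Using the numbers from the diagrams (4 samples per frame, margin of 1): at frame 2 there are 9 valid samples and w is at 8, so 4 samples are written and the valid data ends at 13. At frame 3, w is at 13 (1 too far), so 5 samples are written and the valid data ends at 18.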

Edited by Simon Anciaux on May 30, 2019, 2:15pm Reason: formating

Henry Kloren

#21117

May 30, 2019

that was an amazing explanation. you've seriously helped me a ton in my understanding of this. i only have 1 more question if you don't mind.

You said: "I wouldn't call the write and play cursor inaccurate. They report a value that is correct. It's more that you can't accurately predict how much they will advance in a given time frame."

why is it that you can't predict how much the play/write cursor will advance in a given frame time? isn't it true that if DSound plays 0.5 seconds of sound then we can calculate how many samples that would be, and in doing so map it into our buffer?

is this what you were addressing by saying that our frame, for whatever reason, could take longer than expected due to inaccuracy in sleep, etc. or is this related to the 10ms granularity leading to unpredictability? or am i just missing the plot entirely.

Edited by Henry Kloren on May 30, 2019, 3:31pm

Simon Anciaux

#21118

May 30, 2019

HFKloren
isn't it true that if DSound plays 0.5 seconds of sound then we can calculate how many samples that would be, and in doing so map it into our buffer?

The problem is that you don't know when DirectSound has played 0.5s of sound.

We are asking it at intervals where the play cursor is. We try to make those intervals the same every time but there is always a small variation.

I remember trying to read the play and write cursors at regular intervals, and the result was that most of the time you get a similar advance, but not all the time. For example, most of the time the advance was 960 samples, but sometimes it was 480 samples. I don't know if those results are due to me not getting the interval precise enough or to DirectSound internals (remember that DirectSound is "emulated" on modern Windows).

Henry Kloren

#21122

May 31, 2019

makes sense, although couldn't we just use QueryPerformanceCounter to figure out when it has played 0.5 seconds of sound? i guess it would be unsafe to assume the 2 clocks perfectly line up, which may lead to some sort of propagation of error.
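
A sketch of that idea, for illustration only: it converts an elapsed tick count into the number of samples we would expect to have played. The function name is hypothetical. On Windows the tick values would come from QueryPerformanceCounter and QueryPerformanceFrequency; here they are plain integers so the arithmetic stands on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Samples expected to have played after elapsed_ticks of a timer that
   runs at ticks_per_second, assuming the audio clock runs at exactly
   samples_per_second (which is the assumption in question). */
int64_t expected_samples_played(int64_t elapsed_ticks,
                                int64_t ticks_per_second,
                                int64_t samples_per_second)
{
    assert(ticks_per_second > 0);
    return (elapsed_ticks * samples_per_second) / ticks_per_second;
}
```

With a 10 MHz timer, 5000000 ticks (0.5 seconds) at 48000 samples per second gives 24000 samples. If the audio hardware actually ran at a hypothetical 48005 samples per second, the prediction would be off by 300 samples after one minute, which is the kind of error propagation mentioned above.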

Edited by Henry Kloren on May 31, 2019, 1:16am

Simon Anciaux

#21127

June 1, 2019

You can measure the time elapsed in your application but it doesn't guarantee anything about DirectSound internals. I can only guess what is happening in DirectSound, but the play cursor is most likely not the "real" sample that is currently playing (since the play cursor is always on a multiple of x samples).
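
To illustrate that guess, here is a purely hypothetical model (not documented DirectSound behaviour) in which the reported cursor is the true position rounded down to a multiple of some granularity. Names are made up, and wrap-around is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the reported cursor is the true sample position
   rounded down to a multiple of granularity. */
uint32_t reported_cursor(uint32_t true_position, uint32_t granularity)
{
    assert(granularity > 0);
    return (true_position / granularity) * granularity;
}

/* Advance of the reported cursor between poll number poll_index and the
   next poll, when the true position moves samples_per_poll per poll. */
uint32_t reported_advance(uint32_t poll_index, uint32_t samples_per_poll,
                          uint32_t granularity)
{
    uint32_t before = reported_cursor(poll_index * samples_per_poll,
                                      granularity);
    uint32_t after = reported_cursor((poll_index + 1) * samples_per_poll,
                                     granularity);
    return after - before;
}
```

Polling at perfectly regular intervals of 800 samples (60 polls per second at 48000 samples per second) with a granularity of 480, this model reports advances of 480, 960, 960, 480, and so on. Under these assumptions the reported advance varies even when the polling interval does not, which resembles the 960/480 pattern mentioned earlier in the thread.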