DAY 18: Approaches to avoid audio dropouts

On day 18, at timestamp 20:15 [url=https://hero.handmade.network/episode/code/day018], Casey mentions 4 approaches to help prevent audio dropouts. I couldn't understand those approaches properly, so I'll describe what I have understood from the video:

1) Enforcing a frame rate: This means that we write code in such a way that we always finish the computation for the next frame within our target frame time. Thus the audio buffer will always contain some "meaningful" sound for a specific frame.

2) Overwrite next frame: Assuming that our target frame rate is 30fps, we'll need to output 0.033 seconds' worth of audio within 0.033 seconds. In this approach, if we are currently displaying frame A (i.e. we are currently computing frame A+1), then we'll fill the audio buffer with samples for frames A+1 and A+2, i.e. we'll write 0.066 seconds' worth of audio per frame. (How can we write audio samples for frame A+2 without knowing the events that occur in frame A+1?)

3) Frame of lag: From what I understood, this is the same as method (2).

4) Guard thread: A thread will run concurrently. If the thread sees that enough time is left to compute audio samples for frame A+1, it will do nothing; otherwise it will fill the audio buffer with samples for frame A+1. (What will it fill into the audio buffer for frame A+1? Is it a copy of the samples in the audio buffer for frame A?)

Can someone explain these approaches to me and correct me if I am wrong?
Sorry if this is a really simple question with an obvious answer.


If you are at frame A (frame A is on the screen and sound for frame A is playing), you are computing frame B and sound for frame B.

In option 1: You output exactly 1 frame (16.6 ms) worth of audio each frame. If frame A has 16.6 ms worth of audio, the computation for frame B needs to be less than or equal to 16.6 ms, otherwise there will be a gap in the sound buffer.
gap = length_of_computations_for_frame_B - 16.6

The problem here is that the frame time always slightly changes, even with vsync on. So you generally need to add more samples than exactly one frame.
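
For what it's worth, here is a toy C sketch of option 1 (write_samples and the frame times are made up for illustration; this is not the actual Handmade Hero code): each frame writes exactly one frame's worth of samples, so any frame that overruns its target time leaves a gap, matching the formula above.

#include <stdio.h>

#define SAMPLES_PER_SECOND   48000
#define TARGET_FRAME_SECONDS (1.0f / 60.0f)

/* Stand-in for whatever copies samples into the sound API's buffer. */
static void write_samples(int sample_count)
{
    printf("wrote %d samples\n", sample_count);
}

int main(void)
{
    /* Pretend these are the times our game loop actually took each frame. */
    float frame_seconds[] = {0.0166f, 0.0170f, 0.0250f, 0.0166f};

    for (int i = 0; i < 4; ++i)
    {
        /* Option 1: output exactly one frame (16.6 ms) worth of audio. */
        int samples_this_frame = (int)(SAMPLES_PER_SECOND * TARGET_FRAME_SECONDS);
        write_samples(samples_this_frame);

        /* If the frame ran long, the play cursor runs past the valid data: a dropout. */
        float gap_seconds = frame_seconds[i] - TARGET_FRAME_SECONDS;
        if (gap_seconds > 0.0f)
        {
            printf("frame %d overran by %.4f s -> gap in the sound buffer\n", i, gap_seconds);
        }
    }
    return 0;
}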

In option 2: You write more than a frame (let's say 2 frames) worth of audio so that if you miss your target frame rate there will be no gap. If your computation takes less than a frame, the next frame will also write 2 frames but will overwrite the last frame that was output. The overwrite reduces the audio latency.
During frame A we output for B and C.
B hits the frame rate, so it outputs for C and D. C is overwritten.
C misses the frame rate, so it can't overwrite D because D has started playing. It will only output E.
 [B][C][D][E]
A -  -
B    -  -
C          - 

This solution is not always possible. In DirectSound I believe you can overwrite the sound buffer, but in WASAPI you can't.
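
To make the overwrite idea concrete, here is a rough C simulation (the ring buffer, play cursor and write_ring helper are invented stand-ins for what DirectSound would give you, not real API calls): every frame rewrites two frames' worth of samples starting at the play cursor, so a missed frame is already covered by the speculative second half, and a hit frame replaces that half with fresh data.

#include <stdio.h>

#define SAMPLES_PER_SECOND 48000
#define SAMPLES_PER_FRAME  (SAMPLES_PER_SECOND / 60)
#define RING_SIZE          SAMPLES_PER_SECOND      /* one second of mono samples */

static short ring[RING_SIZE];

/* Write 'count' samples of 'value' starting at 'start', wrapping around the ring. */
static void write_ring(int start, int count, short value)
{
    for (int i = 0; i < count; ++i)
    {
        ring[(start + i) % RING_SIZE] = value;
    }
}

int main(void)
{
    int play_cursor = 0;

    for (int frame = 0; frame < 4; ++frame)
    {
        /* Fill two frames' worth ahead of the play cursor.  The second frame of
           samples is speculative; if we hit the frame rate, the next iteration
           overwrites it with better data before it is played. */
        write_ring(play_cursor, 2 * SAMPLES_PER_FRAME, (short)frame);

        /* Pretend the sound card advanced one frame while we were working. */
        play_cursor = (play_cursor + SAMPLES_PER_FRAME) % RING_SIZE;
        printf("frame %d: wrote %d samples ahead of the cursor\n",
               frame, 2 * SAMPLES_PER_FRAME);
    }
    return 0;
}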

In option 3: You always output one frame ahead. While playing frame A you output audio for frame C, during frame B you output D... The first frame needs to fill 2 frames' worth of audio. There is always one frame of latency.
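
A minimal sketch of that timing, assuming 48000 Hz at 60 fps and an invented write_samples helper: prime the buffer with two frames of audio once, then queue exactly one frame ahead on every frame after that.

#include <stdio.h>

#define SAMPLES_PER_FRAME 800   /* 48000 Hz / 60 fps */

/* Stand-in for mixing/copying one frame's worth of audio into the sound buffer. */
static void write_samples(int for_frame, int count)
{
    printf("audio for frame %d: %d samples\n", for_frame, count);
}

int main(void)
{
    /* Prime: while frame 0 is on screen (and frame 1 is being computed),
       queue audio for frames 1 and 2. */
    write_samples(1, SAMPLES_PER_FRAME);
    write_samples(2, SAMPLES_PER_FRAME);

    /* Steady state: while frame N is on screen, queue audio for frame N + 2,
       so playback always stays exactly one frame behind the game. */
    for (int frame = 1; frame <= 4; ++frame)
    {
        write_samples(frame + 2, SAMPLES_PER_FRAME);
    }
    return 0;
}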

In option 4: The audio processing is separated from the image processing. You fill the audio buffer with an amount of audio (which could be less than a frame). When that audio has been played, you fill the buffer with the next values. This can be totally decoupled from the frame rate; you could, for example, update your audio 100 times a second. This solution has less latency. Remember that when playing a sound it will take more than a frame to finish, so you "know" what to put next in the buffer.
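
Here is a small illustrative sketch of that decoupling in C, assuming POSIX threads and made-up names (mix_into_sound_buffer, audio_thread); it only simulates the idea and is not a real WASAPI/DirectSound integration. The audio thread tops up the buffer in 10 ms chunks, roughly 100 times a second, no matter how fast or slow the render loop runs (compile with -pthread).

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define SAMPLES_PER_SECOND 48000
#define CHUNK_SAMPLES      (SAMPLES_PER_SECOND / 100)   /* 10 ms of audio per top-up */

/* Stand-in for mixing the game's sounds into the next chunk of the sound buffer. */
static void mix_into_sound_buffer(int chunk_index)
{
    printf("audio thread: mixed chunk %d (%d samples)\n", chunk_index, CHUNK_SAMPLES);
}

static void *audio_thread(void *arg)
{
    (void)arg;
    /* Top up the buffer ~100 times a second, independent of the frame rate.
       A real thread would block on the audio API instead of sleeping. */
    for (int chunk = 0; chunk < 50; ++chunk)
    {
        mix_into_sound_buffer(chunk);
        usleep(10 * 1000);
    }
    return NULL;
}

int main(void)
{
    pthread_t thread;
    pthread_create(&thread, NULL, audio_thread, NULL);

    /* Meanwhile the render loop runs at whatever rate it manages,
       including an occasional slow frame, without starving the audio. */
    for (int frame = 0; frame < 20; ++frame)
    {
        printf("render loop: frame %d\n", frame);
        usleep((frame == 10) ? 40 * 1000 : 16 * 1000);   /* frame 10 is deliberately slow */
    }

    pthread_join(thread, NULL);
    return 0;
}
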
Thank you for the great explanation :)
I have one more question.

In option 2, considering your example, suppose a user provides some input that should create an explosion.
-> [Frame B is on screen and frame C is being computed]
   Input provided on frame B
-> [Frame C is on screen and frame D is being computed]
   Input processed during frame C and the explosion image for frame D is created. C misses the frame
   rate, so the explosion audio is now written for frame E.
-> [Frame D is on screen and frame E is being computed]
   Frame D with the explosion image is displayed. (Ideally the explosion audio should have been played
   here.)
-> [Frame E is on screen and frame F is being computed]
   Frame E is displayed and the explosion audio is played here.

So the audio that ideally should have been played with the image on frame D is now being played with the image on frame E, causing a mismatch between audio and image. So my questions are:

Is the above made-up scenario wrong? If frame C missed the frame rate, would the image that should have been on frame D be shifted on the timeline to be displayed on frame E? Or can the above scenario actually happen, and would it be perceived as 1 frame of audio lag?
Yes, this situation can happen and you will get one frame of lag for audio. Some games do this.
That's why some sound APIs are better (where you can get more direct access to the buffer for overwriting data), and some APIs are worse.
In addition to what mmozeiko said: I may be wrong about this, but if you have vsync on and miss the frame rate, the last frame is displayed twice. So in our case, C would be displayed twice, the D image would be displayed with the E sound and they would be synchronized (the E image would become F).