DAY 020 : Understanding the calculation of target cursor for high and low latency sound cards

PCx : Play Cursor position when we are computing frame x and displaying frame(x-1)
WCx : Write Cursor position when we are computing frame x 
TCx : Target Cursor computed when we are computing frame x 
FlipPCx : Play Cursor position at the frame flip, i.e. PCx + the time left until the frame flip occurs

Consider the game loop looks as follows:
1) Gather Input
2) Update game state and render
3) Fill auxiliary sound buffer
4) Wait till we reach (1/gameUpdateHertz) secs
5) Blit to screen(Frame Flip)

Assume that steps 1) and 2) take 10ms.

For a high latency (let's say 25ms) sound card:
Suppose we are computing frame 0 (currently displaying a blank screen and playing no audio).

Because the latency is high, our audio write starts at the write cursor (WC0).

To determine the target cursor, there are two criteria that we need to satisfy:
1) We must account for the next chance we'll get to write the audio buffer (i.e. the next time we reach step (3) of our game loop).
Thus the minimum target cursor location (TC0) is the position of the play cursor on the next frame (PC1); otherwise we'll have unfilled samples between TC0 and PC1, causing audio bugs.
2) We have to position TC0 in such a way that when we reach step (3) of our game loop on
the next frame (frame 1), the target cursor doesn't fall between the play cursor and write cursor of that
frame (i.e. between PC1 and WC1).
If we don't follow this criterion and instead set our target cursor (TC0) at, say, LATENCY seconds
past the write cursor (WC0) and fill the buffer accordingly, then when we are computing frame 1,
step (3) will have to start writing from TC0, which falls between PC1 and WC1.
(This scenario is shown in FIGURE 1).
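
Criterion 2 can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python, working in milliseconds on a linear timeline (ring-buffer wrap-around is ignored). The absolute position picked for WC0 and the helper name `in_unsafe_region` are hypothetical; the 25ms latency and the 7ms gap between WC0 and PC1 are the numbers used in this example, and the write cursor is assumed to lead the play cursor by exactly LATENCY.

```python
LATENCY_MS = 25          # high latency card from the example

def in_unsafe_region(cursor_ms, play_ms, write_ms):
    """Return True if cursor_ms lies between the play and write cursors."""
    return play_ms <= cursor_ms < write_ms

wc0 = 35                 # hypothetical absolute position of WC0
pc1 = wc0 + 7            # PC1 is 7 ms past WC0 in the example
wc1 = pc1 + LATENCY_MS   # assumption: write cursor leads play cursor by LATENCY

naive_tc0 = wc0 + LATENCY_MS   # target placed only LATENCY past WC0

print(in_unsafe_region(naive_tc0, pc1, wc1))  # True: naive target is inside [PC1, WC1)
print(in_unsafe_region(wc1, pc1, wc1))        # False: a target at WC1 is safe
```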

So, taking these two criteria into consideration: we need to be (PC1 - WC0) samples ahead of WC0 to satisfy criterion (1), and an additional LATENCY seconds ahead of that to satisfy criterion (2).
Thus Position of TC0 = Position of WC0 + (PC1 - WC0) + LATENCY
= Position of WC0 + 7ms + 25ms
Samples will be filled from WC0 to TC0.
This computation puts TC0 exactly at WC1. (This scenario is shown in FIGURE 2)
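
As a quick sanity check of the formula above, here is a minimal Python sketch that plugs in the example numbers. The function name and the absolute value chosen for WC0 are hypothetical, everything is in milliseconds on a linear timeline, and WC1 is assumed to be PC1 + LATENCY.

```python
def target_cursor_high_latency(wc_ms, next_pc_ms, latency_ms):
    """TC = WC + (next PC - WC) + LATENCY, all in milliseconds."""
    return wc_ms + (next_pc_ms - wc_ms) + latency_ms

latency = 25      # high latency card from the example
wc0 = 35          # hypothetical absolute position of WC0
pc1 = wc0 + 7     # the 7 ms gap from the example

tc0 = target_cursor_high_latency(wc0, pc1, latency)
wc1 = pc1 + latency   # assumption: write cursor leads play cursor by LATENCY

print(tc0 - wc0)      # 32: milliseconds of audio written from WC0 to TC0
print(tc0 == wc1)     # True: TC0 lands exactly on WC1
```

Note that the WC0 terms cancel, so the target reduces to PC1 + LATENCY regardless of where WC0 sits.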

For a low latency (let's say 5ms) sound card:

When we are computing frame 0 (displaying a blank screen), we need to start writing from FlipPC0.
So, only in this case of computing frame 0,
Position of TC0 = Position of FlipPC0 + (PC1 - FlipPC0) + LATENCY
= Position of FlipPC0 + 10ms + 5ms
Samples will be filled from FlipPC0 to TC0.
This computation puts TC0 exactly at WC1. (This scenario is shown in FIGURE 3)

For subsequent frames (assume that we are now displaying frame 0 and computing frame 1),
Position of TC1 = Position of our last write (i.e. TC0) + (FlipPC1 - WC1) + (PC2 - FlipPC1) + LATENCY
= Position of TC0 + 18ms + 10ms + 5ms
Samples will be filled from TC0 to TC1.
(This scenario is shown in FIGURE 4)
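
The two low latency formulas can be combined into a small simulation. This is a sketch in Python under assumptions that are not stated explicitly above: a 33ms frame period (which is what the 18ms and 10ms figures imply with 5ms of latency), a linear timeline in milliseconds with frame 0 starting at 0, and a write cursor that leads the play cursor by exactly LATENCY. All function and variable names are hypothetical.

```python
FRAME_MS = 33     # assumed frame period (about 30 Hz)
WORK_MS = 10      # time taken by steps 1) and 2)
LATENCY_MS = 5    # low latency card from the example

def cursors(frame):
    """Return (PC, WC, FlipPC) in ms for the given frame index."""
    pc = frame * FRAME_MS + WORK_MS
    wc = pc + LATENCY_MS
    flip_pc = (frame + 1) * FRAME_MS
    return pc, wc, flip_pc

def low_latency_targets(frame_count):
    """Return a list of (write_start, target) pairs, one per frame."""
    writes = []
    last_target = None
    for frame in range(frame_count):
        _, wc, flip_pc = cursors(frame)
        next_pc, _, _ = cursors(frame + 1)
        if last_target is None:
            # Frame 0: nothing written yet, so start at FlipPC0.
            start = flip_pc
            target = flip_pc + (next_pc - flip_pc) + LATENCY_MS
        else:
            # Later frames: continue from the previous target cursor.
            start = last_target
            target = last_target + (flip_pc - wc) + (next_pc - flip_pc) + LATENCY_MS
        writes.append((start, target))
        last_target = target
    return writes

for frame, (start, target) in enumerate(low_latency_targets(3)):
    print(frame, start, target, target - start)
```

Under these assumptions frame 0 writes 15ms of audio (33 to 48), and every later frame writes exactly one frame period (48 to 81, then 81 to 114), with each target landing on the next frame's write cursor.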

My questions are:
- Is whatever I mentioned above correct?
- It seems that we follow the calculation for the high latency card the way I described.
However, for low latency we are writing an entire frame (i.e. we are writing from FlipPC0 to FlipPC1) even when we could be less latent than
that; is this done to avoid passing that complexity on to the game?

All figures:
FIGURE 1, FIGURE 2, FIGURE 3, FIGURE 4

Your explanation seems correct (although it has been a while since I watched those episodes).

Even with low latency we want to write an entire frame because we work in frames; we don't want to write sound several times per frame. I believe that even if we wrote sound several times per frame, the "perceived" latency would not change because sound is continuous (the membrane of your speaker is always moving).

Since Windows Vista, DirectSound is "emulated" (using the Core Audio API) and you can't have low latency. I believe the lowest you can get is 30ms, meaning the write cursor will always be 30ms ahead of the play cursor. If you want a low latency API, mmozeiko made a version of day 19 using WASAPI.