> audio typically is handled in a separate high-priority thread since it is more time-critical than graphics
Isn't that what DirectSound does for us, though? Let's say we used a separate thread for mixing our sound samples and copying them into DirectSound's buffer. Between graphics frames, that thread would still see the same game state until the next GameUpdateAndRender call came through to, well, update the state.
So it wouldn't matter much whether we generated a tenth of a frame's worth of sound samples 15 times in a separate thread (with half a frame as a safety margin), or just the whole frame and a half up front, especially with a (more or less) fixed frame rate. The only thing we would gain is an optimization for the case where we do hit our target frame time: we would have generated only 1.1 frames' worth of samples instead of 1.5. I'm not experienced enough with audio programming to say whether that is worth the overhead of the necessary synchronization, though.
This ended up happening due to the way he decided to time the sound output relative to the frame output.
I can't say I fully understood what Casey was trying to do with the ExpectedBytesUntilFlip shenanigans, so I reviewed his audio code (from day 31). I'm still not sure I understand it, but as far as I can tell, the only thing actually used from all those calculations is the BytesToWrite that tells GameGetSoundSamples how many samples to generate. In low-latency cases, BytesToWrite is a little smaller than in high-latency cases. Is that really the whole point? An optimization to reduce the number of samples we need to generate? Or am I still missing something here?
Anyway, here's how I wrote the audio part of my platform layer; feel free to poke holes in it.
- Create an auxiliary ring buffer with the same size as DirectSound's ring buffer.
- Maintain a "game time cursor" into this auxiliary buffer.
- GameUpdateAndRender writes one frame of audio (plus a safety margin) into the SoundBuffer.
- Advance the game time cursor by the same dT that was passed to GameUpdateAndRender.
- Copy the piece of sound that was obtained from GameUpdateAndRender into the auxiliary ring buffer, starting from the game time cursor.
- Lock the DirectSound buffer, starting from the write cursor with DSBLOCK_FROMWRITECURSOR. The number of bytes to lock is however many bytes GameUpdateAndRender produced.
- Query the position of the write cursor with GetCurrentPosition. Remember that this position is also valid for our auxiliary ring buffer.
- Starting from the write cursor, copy data from the auxiliary ring buffer into the DirectSound buffer.
- Unlock the DirectSound buffer.
- Next frame, please!
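The steps above can be sketched as a per-frame function. This is a platform-independent mock, not the real thing: the names (OutputFrameAudio, dt_bytes, and so on) are mine, `dsound` is a plain array standing in for the region that IDirectSoundBuffer::Lock with DSBLOCK_FROMWRITECURSOR would hand back, and `write_cursor` stands in for the write cursor returned by GetCurrentPosition.

```c
#include <stdint.h>

#define BUFFER_SIZE 48000  /* both ring buffers share this size, in bytes */

typedef struct {
    uint8_t  aux[BUFFER_SIZE];   /* auxiliary ring buffer */
    uint32_t game_time_cursor;   /* byte offset tracking game time */
} AudioState;

/* One frame of the steps above. frame_sound holds what GameUpdateAndRender
   produced (one frame plus safety margin, frame_bytes long); dt_bytes is the
   frame's dT converted to bytes. In the real platform layer, the last loop
   would be bracketed by IDirectSoundBuffer::Lock (DSBLOCK_FROMWRITECURSOR),
   GetCurrentPosition, and Unlock. */
void OutputFrameAudio(AudioState *s,
                      const uint8_t *frame_sound, uint32_t frame_bytes,
                      uint32_t dt_bytes,
                      uint8_t *dsound, uint32_t write_cursor)
{
    /* Advance the game time cursor by dT worth of bytes, past the audio
       committed last frame. */
    s->game_time_cursor = (s->game_time_cursor + dt_bytes) % BUFFER_SIZE;

    /* Copy this frame's sound into the aux ring buffer at the game time
       cursor; the tail overwrites last frame's safety margin. */
    for (uint32_t i = 0; i < frame_bytes; ++i)
        s->aux[(s->game_time_cursor + i) % BUFFER_SIZE] = frame_sound[i];

    /* Starting at the write cursor, copy the same region from the aux ring
       into DirectSound's ring. The offset is valid in both buffers because
       they have the same size. */
    for (uint32_t i = 0; i < frame_bytes; ++i) {
        uint32_t p = (write_cursor + i) % BUFFER_SIZE;
        dsound[p] = s->aux[p];
    }
}
```

The per-byte modulo copy is just for clarity; the real copy would use the split memcpy that Lock's two region pointers invite.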
The most complicated part of all this is copying from one ring buffer into the other, but it's not that difficult. (Edit: Come to think of it, it's really easy in this case, since we always copy identically sized regions.) You could probably do without the auxiliary ring buffer by doing some gymnastics with the game time cursor and the write cursor, but I find it easier to reason about two ring buffers than about juggling cursors.
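To illustrate why the copy is easy here (the function name and signature are mine, for illustration): because both rings have the same size and the region sits at the same offset in both, the wrap point splits source and destination identically, so at most two straight memcpys are needed.

```c
#include <stdint.h>
#include <string.h>

/* Copy len bytes at ring offset start from src to dst. Both rings have the
   same size, and the region occupies the same offsets in both, so the wrap
   splits them identically: no per-byte modulo needed. */
static void CopyRingRegion(uint8_t *dst, const uint8_t *src,
                           uint32_t size, uint32_t start, uint32_t len)
{
    uint32_t until_wrap = size - start;  /* bytes before the wrap point */
    if (len <= until_wrap) {
        memcpy(dst + start, src + start, len);
    } else {
        memcpy(dst + start, src + start, until_wrap);
        memcpy(dst, src, len - until_wrap);  /* wrapped tail starts at 0 */
    }
}
```

If the two rings had different sizes, the wrap points would not line up and you would need up to three copies; keeping the auxiliary buffer the same size as DirectSound's buffer sidesteps that entirely.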
In any case, this approach always fills DirectSound's buffer from its current write cursor. I don't see how you can do any better in terms of latency without switching to another API altogether. Granted, it does always generate the same amount of sound samples, which might not always be necessary. If that turns out to be a bottleneck, the sound sample generation can still be separated from GameUpdateAndRender. I'm not entirely sure a dedicated thread would be necessary for that, though.