GameUpdateAndRender and GameGetSoundSamples

Hi there,
Casey probably mentioned it in one of the streams, but I must have missed it. Why are the GameUpdateAndRender and GameGetSoundSamples functions separated? It seems to me like it would be simpler to pass the sound buffer to GameUpdateAndRender and have it filled with a piece of sound that should start playing at the same moment when the graphics frame is displayed.
Nimbal
Hi there,
Casey probably mentioned it in one of the streams, but I must have missed it. Why are the GameUpdateAndRender and GameGetSoundSamples functions separated? It seems to me like it would be simpler to pass the sound buffer to GameUpdateAndRender and have it filled with a piece of sound that should start playing at the same moment when the graphics frame is displayed.


This ended up happening due to the way he decided to time the sound output relative to the frame output. When we output the sound, we want to take into account how long it took to render the frame. So we give the game information so it can update the buffer, we render the buffer, and then we do the sound. I think Casey mentioned he wanted to move those things back together, so we might revisit that when we polish the sound part of the platform layer.
I seem to recall that Casey touched on this briefly when he was doing the sound stuff, but audio typically is handled in a separate high-priority thread since it is more time-critical than graphics. No decision was made, but if it does end up in its own thread, then it will remain separate from GameUpdateAndRender I suspect.
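
If it does end up on its own thread, the Win32 side of that is not much code. A minimal sketch of what I mean, with the actual mixing left as a placeholder (MixNextAudioChunk is a made-up name, not anything from the stream):

[code]
#include <windows.h>

// Hypothetical stand-in for whatever ends up mixing the next chunk of
// samples into the sound buffer; not part of the actual platform layer.
static void MixNextAudioChunk(void *SoundState);

static DWORD WINAPI
AudioThreadProc(LPVOID Param)
{
    for(;;)
    {
        MixNextAudioChunk(Param);
        Sleep(1); // or block on an event that the main loop signals
    }
}

static void
StartAudioThread(void *SoundState)
{
    HANDLE Thread = CreateThread(0, 0, AudioThreadProc, SoundState, 0, 0);

    // Ask the scheduler to prefer this thread over the frame/render work.
    SetThreadPriority(Thread, THREAD_PRIORITY_TIME_CRITICAL);
}
[/code]
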
[quote=Flyingsand]
audio typically is handled in a separate high-priority thread since it is more time-critical than graphics


Isn't that what DirectSound does for us, though? Let's say we used a separate thread for mixing our sound samples and copying them into DirectSound's buffer. In between graphics frames, we would still be using the same game state the whole time until the next GameUpdateAndRender call got around to, well, updating the state.

So it wouldn't matter much whether we generated a 10th of a frame's worth of sound samples 15 times (with half a frame as a safety margin) in a separate thread, or just the whole frame and a half up front, especially with a (more or less) fixed framerate. The only thing we would gain is an optimization in case we do hit our target frame time, in which case we would only have generated 1.1 frames' worth of samples instead of 1.5 frames. I'm not experienced enough with audio programming to say whether that is worth the overhead of the necessary synchronization, though.
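
To put rough numbers on that (purely illustrative, assuming the 48000 Hz output from the stream and a 60 fps target):

[code]
// Illustrative arithmetic only; assumes 48000 Hz output and a 60 fps target.
int SamplesPerSecond = 48000;
int FramesPerSecond  = 60;
int SamplesPerFrame  = SamplesPerSecond / FramesPerSecond;        // 800

// A whole frame plus half a frame of safety margin, generated up front:
int UpfrontSamples = SamplesPerFrame + SamplesPerFrame / 2;       // 1200 (1.5 frames)

// Best case for the incremental approach, if the frame lands on time:
int IncrementalSamples = SamplesPerFrame + SamplesPerFrame / 10;  // 880 (1.1 frames)
[/code]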

[quote=ChronalDragon]
This ended up happening due to the way he decided to time the sound output relative to the frame output.

I can't say I fully understood what Casey was trying to do with the ExpectedBytesUntilFlip shenanigans, so I reviewed his audio code (from day 31). I'm still not sure I understand it, but, as far as I can tell, the only thing that is really used from all those calculations is the BytesToWrite that tells GameGetSoundSamples how many samples to generate. In low latency cases, BytesToWrite is a little smaller than in high latency cases. Is that really the whole point? An optimization to reduce the number of samples we need to generate? Or am I still missing something here?

Anyway, here's how I wrote the audio part of my platform layer, feel free to poke holes into it.

  1. Create an auxiliary ring buffer with the same size as DirectSound's ring buffer.
  2. Maintain a "game time cursor" into this auxiliary buffer.
  3. GameUpdateAndRender writes one frame of audio (plus a safety margin) into the SoundBuffer.
  4. Advance the game time cursor by the same dT that was passed to GameUpdateAndRender.
  5. Copy the piece of sound that was obtained from GameUpdateAndRender into the auxiliary ring buffer, starting from the game time cursor.
  6. Lock the DirectSound buffer, starting from the write cursor with DSBLOCK_FROMWRITECURSOR. The number of bytes to lock is however many bytes GameUpdateAndRender produced.
  7. Query the position of the write cursor with GetCurrentPosition. Remember that this position is also valid for our auxiliary ring buffer.
  8. Starting from the write cursor, copy data from the auxiliary ring buffer into the DirectSound buffer.
  9. Unlock the DirectSound buffer.
  10. Next frame, please!

The most complicated part of all this is copying from one ring buffer into the other, but it's not that difficult (Edit: Come to think of it, it's really easy in this case since we always copy identically sized regions). You could probably do without the auxiliary ring buffer by doing some gymnastics with the game time cursor and write cursor, but I find it easier to reason about two ring buffers than juggling cursors.
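
For what it's worth, here is a rough sketch of steps 6 to 9, assuming both buffers have the same size. All the names (CopyAuxIntoDirectSound, AuxBuffer and so on) are mine, not from the actual platform layer, and error handling is mostly omitted:

[code]
#include <dsound.h>
#include <stdint.h>

static void
CopyAuxIntoDirectSound(LPDIRECTSOUNDBUFFER SecondaryBuffer,
                       uint8_t *AuxBuffer, DWORD BufferSize,
                       DWORD BytesToWrite)
{
    VOID *Region1; DWORD Region1Size;
    VOID *Region2; DWORD Region2Size;

    // Step 6: lock BytesToWrite bytes starting at DirectSound's write cursor
    // (the offset argument is ignored with DSBLOCK_FROMWRITECURSOR).
    if(SUCCEEDED(SecondaryBuffer->Lock(0, BytesToWrite,
                                       &Region1, &Region1Size,
                                       &Region2, &Region2Size,
                                       DSBLOCK_FROMWRITECURSOR)))
    {
        // Step 7: the write cursor is also a valid offset into the auxiliary
        // ring buffer, because both buffers have the same size. (The cursor
        // can move a few bytes between Lock and this call, which I ignore.)
        DWORD PlayCursor, WriteCursor;
        SecondaryBuffer->GetCurrentPosition(&PlayCursor, &WriteCursor);

        // Step 8: copy out of the auxiliary ring buffer, wrapping as needed.
        DWORD SourceIndex = WriteCursor;
        uint8_t *DestRegions[2] = {(uint8_t *)Region1, (uint8_t *)Region2};
        DWORD RegionSizes[2] = {Region1Size, Region2Size};
        for(int RegionIndex = 0; RegionIndex < 2; ++RegionIndex)
        {
            uint8_t *Dest = DestRegions[RegionIndex];
            for(DWORD ByteIndex = 0; ByteIndex < RegionSizes[RegionIndex]; ++ByteIndex)
            {
                *Dest++ = AuxBuffer[SourceIndex];
                SourceIndex = (SourceIndex + 1) % BufferSize;
            }
        }

        // Step 9: hand the regions back to DirectSound.
        SecondaryBuffer->Unlock(Region1, Region1Size, Region2, Region2Size);
    }
}
[/code]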

In any case, this approach will always fill DirectSound's buffer from its current write cursor. I don't see how you could do any better in terms of latency without switching to another API altogether. Granted, it does always generate the same number of sound samples, which might not always be necessary. If that turns out to be a bottleneck, the sound sample generation can still be separated from GameUpdateAndRender. I'm not entirely sure it would be necessary to create a dedicated thread for it, though.

Nimbal
[quote=Flyingsand]
audio typically is handled in a separate high-priority thread since it is more time-critical than graphics


Isn't that what DirectSound does for us, though? Let's say we used a separate thread for mixing our sound samples and copying them into DirectSound's buffer. In between graphics frames, we would still be using the same game state the whole time until the next GameUpdateAndRender call got around to, well, updating the state.

So it wouldn't matter much whether we generated a 10th of a frame's worth of sound samples 15 times (with half a frame as a safety margin) in a separate thread, or just the whole frame and a half up front, especially with a (more or less) fixed framerate. The only thing we would gain is an optimization in case we do hit our target frame time, in which case we would only have generated 1.1 frames' worth of samples instead of 1.5 frames. I'm not experienced enough with audio programming to say whether that is worth the overhead of the necessary synchronization, though.


Yes, DirectSound will operate on its own thread, but the game is still on the hook for providing it with 48000 samples/second. So the issue isn't synchronization with the game as much as it is actually getting the sound samples from the game to the sound card in time.

Take background music, for example: the sample-to-sample data of the music does not depend on user input, unlike the graphics, where every frame is a result of user input. So they're fundamentally different in this respect. When the game decides to start playing a piece of music or a sound effect at some time T, it has all the data it needs to keep sending it to the sound card. It doesn't need to wait for each frame to provide the audio samples, but we do it so that we can synchronize audio timing with the game.

However, if they drift apart due to framerate hiccups or something, the game should still keep providing the sound card with audio samples even if it means a temporary desynchronization, or else you'll start hearing clicks and pops which are incredibly irritating.
What I found, at least with the first version of the DirectSound code, is that it updates the DirectSound buffer "too often".

Even if the number of samples written each time is small, even smaller than a frame's worth, this is actually more time-consuming than timing the update to each frame.
My guess is that locking the buffer takes much more time than writing the samples does, so you become DirectSound lock-"bound".

Therefore, one should write as much as possible per lock, yet not so much that it takes too long for the sound to change in response to user input. I am guessing a bit here, because it's hard to get it exactly right. And even if it seems I did, things may change later? I don't know.
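
Just to put numbers on the difference (made-up figures for illustration, not measurements from the actual code):

[code]
// Made-up figures, assuming a 60 fps update rate; nothing here is measured.
int UpdateHz = 60;
int WritesPerFrame = 10;                             // many small writes per frame
int LocksPerSecondSmall = UpdateHz * WritesPerFrame; // 600 Lock/Unlock pairs per second
int LocksPerSecondFrame = UpdateHz;                  // 60 pairs when writing a whole frame each time
[/code]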

But when I tested the original code and the modified code, on Wine of all things, the original code didn't make it in Wine: it caused stuttering sound. The code that writes exactly a frame at a time, or as close as possible, played without problems in Wine as well. That's how I came to the conclusion above.

Furthermore, I found that trying to write exactly one frame is only possible as long as you can actually meet that framerate. And the sleep calls in the code make it impossible to reach framerates higher than 60 fps (in Windows). On Wine, 30 Hz is/was the max (in my test).

So in Windows, the original code would play even when you could not meet the framerate, and that may be important. However, if you know you can, then I would try to write as close to a frame boundary as possible.