Day 20: The Tragedy of Cases

I started working on an engine some time ago and just decided to implement audio support. So I turned to Handmade Hero and have finished Day 20. After banging my head for a while, I think I understand the issues related to playing audio (with respect to timing). Below is a comment I wrote in my code to document everything I understand. Can someone please read through it and tell me if there are any errors?

/* NOTE(naman): Description of current Audio system.

   --------------------------------------------------------------------

   We have two positions inside the audio buffer: play_cursor and
   write_cursor. The play_cursor signals what bytes are currently
   being played by the audio hardware (well...) while the write_cursor
   signals where in the audio buffer it is safe to write new data.
   MSDN says that write_cursor is usually ahead of the play_cursor
   by about 15 ms. As the audio gets played, both these positions
   slowly move forward (and wrap around when they reach the edge
   of the audio buffer).

   -------------------------------------------------------------------

   Now, there are two parameters that affect our audio:

     1. (write_cursor - play_cursor): The difference between the
        two positions indicates the latency of our audio system.
        Since we will be writing after the write_cursor (usually,
        read further for more details), any new data that we write
        will get played after the time indicated by the difference
        in the two positions.

     2. Difference between two back-to-back positions: This indicates
        responsiveness of the sound system. If we find the difference
        between the write_cursor (or play_cursor) of this frame and the
        one of the last, we can hit upon two cases:

       i)  The difference is zero: The cursors are actually not hardware
           variables. This means that they might have very low resolution and
           increment only once in a while. The time between two changes
           in the value of cursors could even be more than a frame time.

       ii) The difference is non-zero: At this point, the cursors have
           finally moved. However, because the cursors might have such low
           resolution, we can not be sure that play_cursor actually indicates
           the position where the audio is being played. Instead, the cursors
           would only give a rough approximation of what sound might actually
           be playing. Also, because of the low resolution, the difference
           might be very high, higher than a frame time.

   -------------------------------------------------------------------

   High latency would mean that the audio that we write -- that corresponds
   to the video image that is going to be displayed at the upcoming
   frame flip -- will actually play some time after the video image has been
   displayed.

   Low latency would mean that if we were to write the audio data at
   the write_cursor, it might actually start playing before the
   corresponding video image has appeared on the screen. In this case,
   we need to figure out what position in the sound buffer would
   correspond with the frame flip and write the audio data there. It also
   means that if we spend too much time between querying for the
   cursors and actually writing the data, the play_cursor might
   catch up to or leave behind the cached write_cursor that we queried.

   Low responsiveness means that our write cursor might jump ahead
   of the position in the sound buffer up to which we have actually
   written the audio data. This would mean that we have to write data
   into the part of audio buffer where it is technically not safe to write.

   High responsiveness doesn't seem to have any problems.
   TODO(naman): Make sure that this is actually true (regarding high
                responsiveness).

   ------------------------------------------------------------------------

   So, this is what we should do:

     1. Subtract the play_cursor from the write_cursor to find out the
        latency of the system.

     2. If it is low latency, calculate how much time ahead the next
        frame boundary lies (wrt the play_cursor). Find out what that time
        means in terms of samples and bytes of audio data. This is where we
        would write from instead of write_cursor.

     3. Else, if it is high latency, then see if this is the first frame ever.
        If it is, then just write the audio data worth one frame of time
        from the write_cursor.

     4. If this is not the first frame, find out the position until
        where we wrote the data last time. Call it target_cursor.

     5. If target_cursor is ahead of the write_cursor, everything is well.
        Just get a frame's worth of audio data and write it in the buffer
        starting at target_cursor. Finally, update the target_cursor.

     6. If target_cursor is behind the write_cursor, find the difference
        between the two. Call it unsafe_region.

     7. If the duration of unsafe_region is less than the best responsiveness
        ever seen, then get (unsafe_region + one frame's worth) audio data and write
        it from target_cursor.

     8. If the duration of unsafe_region is more than that, then our audio
        probably skipped. At this point, the play cursor has either already
        passed or is about to reach the last sample we wrote.

     9. First, try locking the buffer from target_cursor for a size of
        (unsafe_region + one frame's worth) audio data. If this succeeds, get
        the data, write it in and hope that we were fast enough to avoid a skip.

     10. If the lock fails, there would definitely be a skip. So, just get a
         frame's worth of data and write it from write_cursor. We fucked up,
         shame on us!

   -------------------------------------------------------------------------

   TODO(naman): Steps 2, 7 and 9 cause us to write behind the write_cursor.
                Is it safe?

*/
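
The cursor arithmetic in the comment above can be sketched in C roughly as follows. This is only a sketch of the simpler variant that never writes behind write_cursor (steps 3 to 6, falling back to write_cursor whenever target_cursor has been overtaken). All names here (ring_distance, seconds_to_bytes, WriteRegion, choose_write_region) are hypothetical and not taken from the actual engine, and no DirectSound calls are made; the cursors are assumed to have been queried already.

```c
#include <assert.h>
#include <stdint.h>

/* Forward distance in bytes from `from` to `to` inside a ring buffer of
   `size` bytes. Both positions must already be wrapped into [0, size). */
static uint32_t ring_distance(uint32_t from, uint32_t to, uint32_t size)
{
    assert(size > 0 && from < size && to < size);
    return (to >= from) ? (to - from) : (size - from + to);
}

/* Convert a duration into bytes of audio, rounded down to a whole
   sample frame (one sample for every channel). */
static uint32_t seconds_to_bytes(double seconds,
                                 uint32_t samples_per_second,
                                 uint32_t bytes_per_sample_frame)
{
    uint32_t sample_frames = (uint32_t)(seconds * (double)samples_per_second);
    return sample_frames * bytes_per_sample_frame;
}

typedef struct {
    uint32_t offset;  /* byte offset in the ring where writing starts */
    uint32_t bytes;   /* number of bytes to write                     */
    int      skipped; /* non-zero if write_cursor overtook our data   */
} WriteRegion;

/* Decide where this frame's audio goes. Positions are compared by their
   distance from play_cursor so that wrap-around is handled.
   Limitation (assumption of this sketch): if play_cursor itself has
   already passed target_cursor, the target looks "far ahead" and the
   skip is not detected. */
static WriteRegion choose_write_region(uint32_t play_cursor,
                                       uint32_t write_cursor,
                                       uint32_t target_cursor,
                                       int first_frame,
                                       uint32_t frame_bytes,
                                       uint32_t buffer_size)
{
    WriteRegion region;
    uint32_t unsafe = ring_distance(play_cursor, write_cursor, buffer_size);
    uint32_t target = ring_distance(play_cursor, target_cursor, buffer_size);

    region.bytes = frame_bytes;
    if (first_frame) {
        region.offset = write_cursor;   /* step 3: nothing written yet */
        region.skipped = 0;
    } else if (target >= unsafe) {
        region.offset = target_cursor;  /* step 5: continue where we stopped */
        region.skipped = 0;
    } else {
        region.offset = write_cursor;   /* target is inside the unsafe region */
        region.skipped = 1;
    }
    return region;
}
```

After writing, the caller would advance target_cursor to (region.offset + region.bytes) modulo the buffer size.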

Edited by Naman Dixit on Reason: Initial post
You should use Windows Audio Sessions API (msdn) instead of DirectSound. See the thread Day 19 - Audio Latency in these forums. You can also search for other threads about DirectSound and WASAPI.

Otherwise your summary seems correct (and would be helpful to people following Handmade Hero). But I think you're trying to tackle cases that will never happen on Windows Vista and up (low latency, high responsiveness).

I think writing between the play and write cursors could lead to audio glitches. We don't actually know what DirectSound does internally, but assume that region is buffered, meaning that at some point DirectSound copied it to its internal memory or to the sound card. Writing into it would then have no effect on the currently playing region (the region between the play and write cursors). So some of your samples would land in that region and never be played, while the rest would land after the write cursor and be played. At best that would create a truncated sound, and probably an audible click as well.
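
To make the risk concrete, here is a minimal sketch of a check for whether a planned write would touch the region between the play and write cursors. The function names are hypothetical, positions are byte offsets already wrapped into the buffer, and the check says nothing about what DirectSound actually does with that region.

```c
#include <assert.h>
#include <stdint.h>

/* Forward distance from `from` to `to` in a ring of `size` bytes. */
static uint32_t ring_distance(uint32_t from, uint32_t to, uint32_t size)
{
    assert(size > 0 && from < size && to < size);
    return (to >= from) ? (to - from) : (size - from + to);
}

/* Returns 1 if the write [offset, offset + bytes) overlaps the region
   [play_cursor, write_cursor), taking wrap-around into account. */
static int write_touches_unsafe_region(uint32_t offset, uint32_t bytes,
                                       uint32_t play_cursor,
                                       uint32_t write_cursor,
                                       uint32_t size)
{
    uint32_t unsafe_len;
    uint32_t rel;

    assert(bytes <= size);
    if (bytes == 0) {
        return 0;
    }

    /* Measure everything relative to play_cursor, so the unsafe region
       becomes the interval [0, unsafe_len). */
    unsafe_len = ring_distance(play_cursor, write_cursor, size);
    rel = ring_distance(play_cursor, offset, size);

    if (unsafe_len == 0) {
        return 0;               /* cursors coincide: nothing is unsafe */
    }
    if (rel < unsafe_len) {
        return 1;               /* the write starts inside the region */
    }
    /* The write starts after the region; it only overlaps if it is long
       enough to wrap all the way around past play_cursor. */
    return bytes > size - rel;
}
```

A debug build could assert that this returns 0 before every lock, which would at least flag the cases where samples might be silently dropped.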

Edited by Simon Anciaux on Reason: typo
Thanks. I will give WASAPI a look. Having two audio backends will probably be good for debugging any future audio issues too.

WRT writing behind write_cursor, buffering is what I was worried about. If we try to get a lock for a position behind the write_cursor and it fails, then at least we know we can't write there. But if we get a lock, write new data and then that data doesn't play, we won't have any way of detecting the failure.

I guess the question I have then is: how can we keep latency to a minimum while making sure to not skip/glitch if we miss our frame rate (perhaps by a large duration, due to driver bugs or such)? The only way I could think of was to write a large amount of data and then -- if we hit the frame rate -- overwrite it. But if we can't overwrite already written data (if it is behind write_cursor in DirectSound, or if we use WASAPI), then how do we make sure that there is enough data to keep playing through long frame rate hiccups and yet not so much data that the audio lags if no frame rate hiccup happens?

Edited by Naman Dixit on Reason: Added monospace formatting for variables
In WASAPI it is very common to have a separate thread that submits buffers to the sound card. This thread should run at the highest priority (call AvSetMmThreadCharacteristics with the "Pro Audio" task name). The thread should submit buffers of the minimum possible size to get the lowest possible latency. You can query what that size is with IAudioClient::GetStreamLatency.

Then you need a circular mixing buffer. The mixing buffer should be large: as large as the longest stall you expect. It's OK for it to hold a second of audio or more. The audio submission thread simply reads whatever is next in the buffer and submits it to the sound card. Ideally this should be done with lockless primitives. You should really, really avoid using a CriticalSection/Mutex here.

So all this thread does is sit in a loop waiting until it can submit the next buffer (you can wait on the event registered with IAudioClient::SetEventHandle), fill the audio buffer from your circular mixing buffer, and submit it. Then it goes back to sleep, waiting on the event handle. Not much code.

In your main game loop, you get the location that the audio thread has not yet started to read from, and write mixed audio from there, up to the circular buffer size. Here you can have all the complexity you want: 3D spatial audio, echo effects, etc.

If the game loop stalls, the audio thread will still submit data to the audio card and you won't have any annoying sound glitches. On the next frame, the game will start mixing sound from the place where the thread finished reading data.

The tricky part is getting the synchronization between the main game thread and the audio thread right, especially when mixing data on the game thread can take a while. You can manually introduce extra latency: never start writing at the position where the audio thread is about to start reading. Leave an extra 10 ms or so, and only write from that position onward.
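
As a rough illustration of the lockless mixing buffer described above, here is a single-producer, single-consumer ring in portable C11. It is a simplified sketch, not the setup from any real engine: it is a plain FIFO (the game thread appends and never overwrites), it does not model the extra 10 ms safety margin, and it makes no WASAPI calls. MixRing, MIX_CAPACITY and the function names are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Capacity in samples. Must be a power of two so that the free-running
   indices below keep working when they overflow. */
#define MIX_CAPACITY 4096u

typedef struct {
    int16_t     samples[MIX_CAPACITY];
    atomic_uint read_index;  /* total samples consumed; audio thread only */
    atomic_uint write_index; /* total samples produced; game thread only  */
} MixRing;

static void mix_ring_init(MixRing *ring)
{
    atomic_init(&ring->read_index, 0u);
    atomic_init(&ring->write_index, 0u);
}

/* Game thread: append up to `count` mixed samples.
   Returns how many samples actually fit. */
static unsigned mix_ring_write(MixRing *ring, const int16_t *src, unsigned count)
{
    unsigned w = atomic_load_explicit(&ring->write_index, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&ring->read_index, memory_order_acquire);
    unsigned free_space = MIX_CAPACITY - (w - r);
    unsigned i;

    if (count > free_space) {
        count = free_space;
    }
    for (i = 0; i < count; ++i) {
        ring->samples[(w + i) & (MIX_CAPACITY - 1u)] = src[i];
    }
    /* Publish the samples only after they are fully written. */
    atomic_store_explicit(&ring->write_index, w + count, memory_order_release);
    return count;
}

/* Audio thread: always fills `dst` with `count` samples, padding with
   silence if the game thread has fallen behind.
   Returns how many real samples were read. */
static unsigned mix_ring_read(MixRing *ring, int16_t *dst, unsigned count)
{
    unsigned r = atomic_load_explicit(&ring->read_index, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&ring->write_index, memory_order_acquire);
    unsigned available = w - r;
    unsigned n = (count < available) ? count : available;
    unsigned i;

    for (i = 0; i < n; ++i) {
        dst[i] = ring->samples[(r + i) & (MIX_CAPACITY - 1u)];
    }
    for (i = n; i < count; ++i) {
        dst[i] = 0; /* underrun: output silence rather than stale data */
    }
    /* Hand the consumed slots back to the game thread. */
    atomic_store_explicit(&ring->read_index, r + n, memory_order_release);
    return n;
}
```

Each index is written by exactly one thread, so no lock is needed; the acquire/release pairs make sure sample data is visible before the index that announces it. The more samples the game thread keeps queued, the longer a stall it can survive, at the cost of that much extra latency.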

Edited by Mārtiņš Možeiko on
And in the case of DirectSound you can overwrite, so you'll just have the usual amount of latency (write_cursor - play_cursor).

Edited by Simon Anciaux on
mmozeiko
In WASAPI it is very common to have a separate thread that submits buffers to the sound card.


Thanks, I'll implement WASAPI support once I tackle multithreading.

mrmixer
And in the case of DirectSound you can overwrite, so you'll just have the usual amount of latency (write_cursor - play_cursor).


No, I know about that. I was more worried about putting in enough data so that target_cursor never gets behind write_cursor (since writing behind write_cursor is risky) while still maintaining low latency. But I'll just leave the DirectSound implementation as it is for now and later make a more polished WASAPI based audio system.

Edited by Naman Dixit on
I have fixed a few bugs in the above description and made it so that it never writes behind write_cursor (skips abound!). I am uploading it as a GitHub Gist here, where I'll fix any other bugs that I might find. I'm posting it here as it might serve as a nice reference for someone else wanting to implement a single-threaded, DirectSound-based real-time audio system.

Edited by Naman Dixit on