Naman Dixit
7 posts
Day 20: The Tragedy of Cases
Edited by Naman Dixit on Reason: Initial post
I started working on an engine some time ago and just decided to implement audio support. So, I turned to Handmade Hero and have finished Day 20. After banging my head for a while, I think I understand the issues related to playing audio (with respect to timing). Following is a comment I have made in my code to document everything I understand. Can someone please read through it and tell me if there are any errors?

/* NOTE(naman): Description of the current audio system.
   --------------------------------------------------------------------
   We have two positions inside the audio buffer: play_cursor and write_cursor. The play_cursor signals what bytes are currently being played by the audio hardware (well...) while the write_cursor signals where in the audio buffer it is safe to write new data. MSDN says that the write_cursor is usually ahead of the play_cursor by about 15 ms. As the audio gets played, both these positions slowly move forward (and wrap around when they reach the edge of the audio buffer).
   --------------------------------------------------------------------
   Now, there are two parameters that affect our audio:

   1. (write_cursor - play_cursor): The difference between the two positions indicates the latency of our audio system. Since we will be writing after the write_cursor (usually; read further for details), any new data that we write will get played after the time indicated by the difference between the two positions.

   2. Difference between two back-to-back positions: This indicates the responsiveness of the sound system. If we find the difference between the write_cursor (or play_cursor) of this frame and that of the last, we can hit upon two cases:

      i)  The difference is zero: The cursors are not actually hardware variables. This means that they might have very low resolution and increment only once in a while. The time between two changes in the value of the cursors could even be more than a frame time.

      ii) The difference is non-zero: At this point, the cursors have finally moved. However, because the cursors might have such low resolution, we can not be sure that the play_cursor actually indicates the position where the audio is being played. Instead, the cursors only give a rough approximation of what sound might actually be playing. Also, because of the low resolution, the difference might be very high, higher than a frame time.
   --------------------------------------------------------------------
   High latency means that the audio we write -- corresponding to the video image that is going to be displayed at the upcoming frame flip -- will actually play some time after the video image has been displayed.

   Low latency means that if we were to write the audio data at the write_cursor, it might actually start playing before the corresponding video image has appeared on the screen. In this case, we need to figure out what position in the sound buffer corresponds to the frame flip and write the audio data there. It also means that if we spend too much time between querying the cursors and actually writing the data, the play_cursor might catch up to or pass the cached write_cursor that we queried.

   Low responsiveness means that our write_cursor might jump ahead of the position in the sound buffer up to which we have actually written audio data. This would mean that we have to write data into the part of the audio buffer where it is technically not safe to write.

   High responsiveness doesn't seem to have any problems. TODO(naman): Make sure that this is actually true (regarding high responsiveness).
   --------------------------------------------------------------------
   So, this is what we should do:

   1.  Subtract the play_cursor from the write_cursor to find out the latency of the system.
   2.  If latency is low, calculate how far ahead the next frame boundary lies (with respect to the play_cursor). Find out what that time means in terms of samples and bytes of audio data. This is where we would write from instead of the write_cursor.
   3.  Else, if latency is high, see if this is the first frame ever. If it is, just write audio data worth one frame of time from the write_cursor.
   4.  If this is not the first frame, find the position up to which we wrote data last time. Call it target_cursor.
   5.  If target_cursor is ahead of the write_cursor, everything is well. Just get a frame's worth of audio data and write it into the buffer starting at target_cursor. Finally, update the target_cursor.
   6.  If target_cursor is behind the write_cursor, find the difference between the two. Call it unsafe_region.
   7.  If the duration of unsafe_region is less than the best responsiveness ever seen, then get (unsafe_region + one frame's worth of) audio data and write it from target_cursor.
   8.  If the duration of unsafe_region is more than that, then our audio probably skipped. At this point, the play_cursor has either already passed or is about to reach the last sample we wrote.
   9.  First, try locking the buffer from target_cursor for a size of (unsafe_region + one frame's worth of) audio data. If this succeeds, get the data, write it in, and hope that we were fast enough to avoid a skip.
   10. If the lock fails, there will definitely be a skip. So, just get a frame's worth of data and write it from the write_cursor. We fucked up, shame on us!
   --------------------------------------------------------------------
   TODO(naman): Steps 2, 7 and 9 cause us to write behind the write_cursor. Is it safe? */
Simon Anciaux
1108 posts
Day 20: The Tragedy of Cases
Edited by Simon Anciaux on Reason: typo
You should use Windows Audio Sessions API (msdn) instead of DirectSound. See the thread Day 19 - Audio Latency in these forums. You can also search for other threads about DirectSound and WASAPI.

Otherwise your summary seems correct (and would be helpful to people following handmade hero). But I think you're trying to tackle cases that will never happen on Windows Vista and up (low latency, high responsiveness).

I think writing between the play and write cursors could lead to audio glitches. Assuming that that region is buffered by DirectSound (we don't actually know what DirectSound does internally), meaning that at some point DirectSound copied it to its internal memory or to the sound card, writing into it would have no effect on the currently playing region (the region between the play and write cursors). So some of the samples you write would land in that region and never be played, while the rest would land in a region that will be played (after the write cursor); at best that creates a truncated sound, and probably an audible click as well.
Naman Dixit
7 posts
Day 20: The Tragedy of Cases
Edited by Naman Dixit on Reason: Added monospace formatting for variables
Thanks. I will give WASAPI a look. Having two audio backends will probably be good for debugging any future audio issues too.

WRT writing behind the write_cursor, buffering is what I was worried about. If we try to get a lock for a position behind the write_cursor and it fails, then at least we know we can't write there. But if we get a lock, write new data, and then that data doesn't play, we won't have any way of detecting the failure.

I guess the question I have then is: how can we keep latency to a minimum while making sure not to skip/glitch if we miss our frame rate (perhaps by a large duration, due to driver bugs or such)? The only way I could think of was to write a large amount of data and then -- if we hit the frame rate -- overwrite it. But if we can't overwrite already written data (if it is behind the write_cursor in DirectSound, or if we use WASAPI), then how do we make sure that there is enough data to keep playing through long frame rate hiccups, and yet not so much data that the audio lags when no hiccup happens?
Mārtiņš Možeiko
2233 posts / 1 project
Day 20: The Tragedy of Cases
Edited by Mārtiņš Možeiko on
In WASAPI it is very common to have a separate thread that submits buffers to the sound card. This thread should run with the highest priority (call AvSetMmThreadCharacteristics with the "Pro Audio" task name). The thread should submit buffers with the minimum possible size to get the least possible latency. You can query what that size is with IAudioClient::GetStreamLatency.

Then you need to have a mixing circular buffer. The mixing buffer should be large -- sized for the maximum stall you expect. It's ok for it to hold even something like a second or more of audio. The audio submission thread will simply read whatever is next in the buffer and submit it to the sound card. Ideally this should be done with lockless primitives. You should really, really avoid using a CriticalSection/Mutex here.

So all this thread does is sit in a loop waiting until it can submit the next buffer (you can wait on the event associated with IAudioClient::SetEventHandle), then fill the audio buffer from your mixing circular buffer and submit it. Then it goes back to sleep waiting on the event handle. Not much code.

In your main game loop, you get the location from which the audio thread has not yet started to read, and write mixed audio from there up to the circular buffer size. Here you can have all the complexity you want - 3d spatial audio, echo effects, etc...

In case the game loop stalls, the mixing thread will still submit data to the audio card and you won't have any annoying sound glitches. Next frame the game will start mixing sound from the place where the thread finished reading data.

The tricky part is getting the synchronization between the main game thread and the audio thread done correctly, especially when mixing data on the game thread can take a while. You can probably manually introduce extra latency - never start writing into the buffer exactly where the audio thread will want to start reading data. Give an extra 10 or so msec, and only write from that position.
Simon Anciaux
1108 posts
Day 20: The Tragedy of Cases
Edited by Simon Anciaux on
And in the case of DirectSound you can overwrite, so you'll just have the usual amount of latency (write_cursor - play_cursor).
Naman Dixit
7 posts
Day 20: The Tragedy of Cases
Edited by Naman Dixit on
mmozeiko
In WASAPI it is very common to have separate thread that submits buffer to sound card.

Thanks, I'll implement WASAPI support once I tackle multithreading.

mrmixer
And in the case of DirectSound you can overwrite, so you'll just have the usual amount of latency (write_cursor - play_cursor).

Yeah, I know about that. I was more worried about putting in enough data so that target_cursor never gets behind write_cursor (since writing behind write_cursor is risky) while still maintaining low latency. But I'll just leave the DirectSound implementation as it is for now and later make a more polished WASAPI based audio system.
Naman Dixit
7 posts
Day 20: The Tragedy of Cases
Edited by Naman Dixit on
I have fixed a few bugs in the above description and made it so that it never writes behind write_cursor (skips abound!). I am uploading it as a Github Gist here, where I'll fix any other bugs that I might find. Posting it here as it might serve as a nice reference to someone else wanting to implement a single-threaded DirectSound-based real-time audio system.