Day 19 - Audio Latency

On Day 19, Casey said that by Windows 7 they should have figured out low-latency audio. They have. It is called the Windows Core Audio API and includes the Windows Audio Session API (WASAPI). It was introduced in Windows Vista. DirectSound, XAudio2 and WinMM are all emulated on Vista+; they all sit on top of Core Audio. Because of that extra layer, none of the three (DSound, XAudio2, WinMM) gets the lowest possible latency.

I modified the win32_handmade.cpp code from Day 19 to use Core Audio/WASAPI instead of DirectSound. On my machine DirectSound works only with FramesOfAudioLatency = 3 frames. Core Audio works fine with FramesOfAudioLatency = 1, so it produces exactly the next frame of audio - just as we wanted. It sometimes drops a few samples at startup, during the first or second frame (I'm guessing Windows is figuring out or caching something in the background), but after that it is smooth. The vertical white lines stay pretty stable and evenly spaced - they sometimes move only a tiny bit, a pixel or two.

Obviously this won't work on WinXP. It would be possible to adjust the code to use Core Audio if it is available and otherwise fall back to DirectSound, as Casey mentions in the Q&A.
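
A minimal sketch of such a fallback (InitWASAPI and InitDirectSound are hypothetical names, not functions from the Day 19 code): since the MMDeviceEnumerator COM class does not exist on XP, trying to create it tells you which path to take.

#include <mmdeviceapi.h>

// Assumes CoInitializeEx has already been called.
IMMDeviceEnumerator *Enumerator = 0;
HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), 0, CLSCTX_ALL,
                              __uuidof(IMMDeviceEnumerator), (void **)&Enumerator);
if (SUCCEEDED(hr))
{
    InitWASAPI(Enumerator);   // Vista+ path (hypothetical init function)
}
else
{
    InitDirectSound(Window);  // XP fallback (hypothetical init function)
}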

Here is the code, with win32_handmade.cpp adjusted to use WASAPI: https://gist.github.com/mmozeiko/38c64bb65855d783645c Using WASAPI is, imho, a bit simpler than DirectSound - no more two-region nonsense. The setup part is a bit more code than before, but nothing too crazy (except the COM stuff).
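
For reference, the setup has roughly this shape - a minimal sketch, not the gist's exact code, with error handling omitted (every call returns an HRESULT that should be checked):

#include <mmdeviceapi.h>
#include <audioclient.h>

CoInitializeEx(0, COINIT_MULTITHREADED); // WASAPI is COM-based

IMMDeviceEnumerator *Enumerator;
CoCreateInstance(__uuidof(MMDeviceEnumerator), 0, CLSCTX_ALL,
                 __uuidof(IMMDeviceEnumerator), (void **)&Enumerator);

IMMDevice *Device;
Enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &Device);

IAudioClient *AudioClient;
Device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, 0, (void **)&AudioClient);

// 16-bit stereo PCM at 48 kHz; real code should check IsFormatSupported /
// GetMixFormat, since shared mode may want a different format.
WAVEFORMATEX Format = {};
Format.wFormatTag = WAVE_FORMAT_PCM;
Format.nChannels = 2;
Format.nSamplesPerSec = 48000;
Format.wBitsPerSample = 16;
Format.nBlockAlign = (Format.nChannels * Format.wBitsPerSample) / 8;
Format.nAvgBytesPerSec = Format.nSamplesPerSec * Format.nBlockAlign;

REFERENCE_TIME Duration = 10000000; // 1 second, in 100-nanosecond units
AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, Duration, 0, &Format, 0);

IAudioRenderClient *RenderClient;
AudioClient->GetService(__uuidof(IAudioRenderClient), (void **)&RenderClient);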

Edited by Mārtiņš Možeiko
Kudos! That's nice to know and I bet it will be useful later on.
mmozeiko
It sometimes drops a few samples at startup, during the first or second frame (I'm guessing Windows is figuring out or caching something in the background).

I believe the reason is that you call GlobalSoundClient->Start(); before filling the buffer. From the IAudioClient::Start documentation:
To avoid start-up glitches with rendering streams, clients should not call Start until the audio engine has been initially loaded with data by calling the IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer methods on the rendering interface.
Filling the audio buffer at initialization is not the solution, since it would introduce latency (we can't overwrite samples we've already submitted). One solution is to call Start the first time we fill the buffer with data.
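
A minimal sketch of that deferral (FirstAudioFrame is a hypothetical flag; RenderClient stands for the IAudioRenderClient obtained during setup):

// Defer Start() until after the first GetBuffer/ReleaseBuffer pair, so the
// engine already has data when the stream starts (per the IAudioClient::Start docs).
static bool FirstAudioFrame = true;

BYTE *Data;
RenderClient->GetBuffer(SamplesToWrite, &Data);
// ... write SamplesToWrite samples into Data ...
RenderClient->ReleaseBuffer(SamplesToWrite, 0);

if (FirstAudioFrame)
{
    GlobalSoundClient->Start();
    FirstAudioFrame = false;
}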

For calculating the number of samples to write, I believe it's better to do:
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    int MaxSampleCount = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    // Cast the padding to int before subtracting: LatencySampleCount - padding
    // can be negative, and int minus UINT32 would be evaluated as unsigned and wrap.
    SamplesToWrite = (int)SoundOutput.LatencySampleCount - (int)SoundPaddingSize;
    if (SamplesToWrite < 0)
    {
        SamplesToWrite = 0;
    }
    assert(SamplesToWrite <= MaxSampleCount);
}

/* Instead of
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    SamplesToWrite = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    if (SamplesToWrite > SoundOutput.LatencySampleCount)
    {
        SamplesToWrite = SoundOutput.LatencySampleCount;
    }
}
*/

We want x samples (one frame's worth plus some latency). GetCurrentPadding returns the number of samples that are queued in the buffer and haven't been consumed by the audio engine yet, so the number of samples to write should be x - padding.
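
As a worked example (the numbers are assumptions for illustration, not values from the post): at 48000 Hz and 30 FPS, one frame is 48000 / 30 = 1600 samples. With LatencySampleCount = 1600, if GetCurrentPadding reports 1200 samples still queued, we write 1600 - 1200 = 400 samples to top the buffer back up to the target, rather than always writing a full frame and letting the queue (and therefore the latency) grow.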
I tried to implement a WASAPI version of Day 20 and would like to know whether this implementation is correct, or whether there are errors to correct / improvements to make.

On running this (I set the latency to 3 ms), I found the padding (pending samples) to be around 23 ms, and the time elapsed from the start of the frame to the start of this section of code to be around 10 ms; which tells us that at the start of this frame the pending samples amounted to around 33 ms (about a frame's worth of samples).
[EDIT]: I set the latency to 950 ms and tested it; the sum of pending ms and ms elapsed from the start of the frame to the audio write is about 950 ms, so it seems to work for both low and high audio latency.

UINT32 pendingSamples;
UINT32 samplesToFill = (uint32)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);
UINT32 skipSamples = 0;
UINT64 playPosition, queryWallClock, flipPlayPosition, playPositionOnNextFrame;

globalSoundOutputClock->GetPosition(&playPosition, &queryWallClock);
globalSoundClient->GetCurrentPadding(&pendingSamples);

// Project where the play cursor will be at the frame flip and one frame later.
flipPlayPosition = playPosition + (uint64)(secondsRemainingUntilFrameFlip * (real32)globalSoundOutput.samplesPerSecond);
playPositionOnNextFrame = playPosition + (uint64)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);

if (flipPlayPosition > (playPosition + pendingSamples))
{
    // The queued samples run out before the flip.
    if (firstFrameAudio)
    {
        // Nothing is playing yet: skip ahead so the first samples land on the flip.
        skipSamples = (UINT32)(flipPlayPosition - (playPosition + pendingSamples));
    }
    else
    {
        // Cover the gap up to the flip as well as the next frame.
        samplesToFill += (UINT32)(flipPlayPosition - (playPosition + pendingSamples));
    }
}
else
{
    // Samples past the flip are already queued; don't write them twice.
    UINT32 alreadyFilled = (UINT32)((playPosition + pendingSamples) - flipPlayPosition);
    if (samplesToFill > alreadyFilled)
    {
        samplesToFill -= alreadyFilled;
    }
    else
    {
        samplesToFill = 0;
    }
}

// Top up to the latency target if the projected fill level is below it.
UINT64 currentFillLevel = playPosition + pendingSamples + skipSamples + samplesToFill;
UINT64 minimumFillLevel = playPositionOnNextFrame + (UINT64)(((real32)globalSoundOutput.latency / 1000.0f) * (real32)globalSoundOutput.samplesPerSecond);
if (minimumFillLevel > currentFillLevel)
{
    samplesToFill += (UINT32)(minimumFillLevel - currentFillLevel);
}

BYTE *wasapiMemory;
if ((samplesToFill + skipSamples) > 0)
{
    HRESULT bufferAcquisition = globalSoundOutputClientDevice->GetBuffer((samplesToFill + skipSamples), &wasapiMemory);
    if (FAILED(bufferAcquisition))
    {
        ASSERT(!"FAILED TO ACQUIRE BUFFER");
    }

    gameSoundBuffer.memory = (int16 *)(wasapiMemory + (skipSamples * globalSoundOutput.bytesPerSample));
    gameSoundBuffer.samplesToOutput = samplesToFill;
    gameSoundBuffer.samplesPerSecond = globalSoundOutput.samplesPerSecond;
    ASSERT((gameSoundBuffer.samplesToOutput / globalSoundOutput.bytesPerSample) < globalSoundOutput.soundBufferSize);

    gameGetSoundSamples(&gameMemory, &gameSoundBuffer);

    HRESULT bufferRelease = globalSoundOutputClientDevice->ReleaseBuffer((samplesToFill + skipSamples), 0);
    if (FAILED(bufferRelease))
    {
        ASSERT(!"FAILED TO RELEASE BUFFER");
    }

    if (firstFrameAudio)
    {
        // Start the stream only after the first buffer has been submitted.
        firstFrameAudio = false;
        playResult = globalSoundClient->Start();
        if (FAILED(playResult))
        {
            ASSERT(!"FAILED TO START PLAYING");
        }
    }
}

#if HANDMADE_INTERNAL
soundDebugMarkers[soundDebugCurrentMarker].outputPlayCursor = (DWORD)(playPosition * globalSoundOutput.bytesPerSample);
soundDebugMarkers[soundDebugCurrentMarker].outputWriteCursor = (DWORD)((playPosition + pendingSamples) * globalSoundOutput.bytesPerSample);
soundDebugMarkers[soundDebugCurrentMarker].outputStartLocation = (DWORD)((playPosition + pendingSamples + skipSamples) * globalSoundOutput.bytesPerSample);
soundDebugMarkers[soundDebugCurrentMarker].targetCursor = (DWORD)((playPosition + pendingSamples + skipSamples + samplesToFill) * globalSoundOutput.bytesPerSample);
soundDebugMarkers[soundDebugCurrentMarker].flipPlayCursor = (DWORD)(flipPlayPosition * globalSoundOutput.bytesPerSample);

sprintf_s(title, "ElapsedFrameTime:%.2fms, Pending:%.2fms, Sum:%.2fms",
          secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f,
          ((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f,
          (secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f) + (((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f));
// THE SUM IS ALWAYS ABOUT 30ms ON A LOW LATENCY CARD

SetWindowText(windowHandle, title);
#endif


Edited by Karan Joisher. Reason: Added what happens when latency is too high
If it sounds good to you and there is no perceptible latency when you play a sound, it's probably good.

I'm not sure about the skipSamples thing: it's only used on the first frame, so its value will always be flipPlayPosition, since playPosition and pendingSamples should be 0 on the first frame. If you add that offset when writing to the buffer, you add that amount of latency to the sound. In addition, the skipped region of the buffer could contain garbage, so you should probably fill it with zeroes. Maybe I'm missing something about this.
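
If you do keep a skipped region, here are two ways to guarantee it plays as silence - a sketch against the code above, where the memset variant assumes the skip region sits at the start of the acquired block:

// Option 1: zero the skipped region before the game writes after it.
memset(wasapiMemory, 0, skipSamples * globalSoundOutput.bytesPerSample);

// Option 2: before acquiring the main block, submit the skipped region as its
// own GetBuffer/ReleaseBuffer pair and let WASAPI fill it with silence.
BYTE *skipMemory;
globalSoundOutputClientDevice->GetBuffer(skipSamples, &skipMemory);
globalSoundOutputClientDevice->ReleaseBuffer(skipSamples, AUDCLNT_BUFFERFLAGS_SILENT);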

When "converting" time to samples, you may need to ceil the floating point result before casting it to an integer.
0.016666 * 48000 = 799,968
Since you can't write partial samples, to represent 799.968 you need 800 samples.
if you do ( uint32 ) (0.016666 * 48000), 799.968 will be truncated and you will get 799 samples.
So you need to do ( uint32 ) ceil(0.016666 * 48000) to get 800 samples.
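
A tiny helper along these lines keeps that from being repeated all over (the name is hypothetical, not from the posted code):

#include <math.h>

inline UINT32 SecondsToSamples(real32 Seconds, UINT32 SamplesPerSecond)
{
    // ceil so a fractional sample count rounds up to a whole sample
    return (UINT32)ceilf(Seconds * (real32)SamplesPerSecond);
}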

@Mārtiņš Možeiko, any chance you could make your code available online again?

If you want the code for Handmade Hero, you'll have to wait for mmozeiko to reply, or maybe ask them on the Handmade Network Discord.

If you want an example on how to initialize WASAPI you can have a look at Minimal WASAPI from d7samurai.

Note that you often want/need the audio processing/mixing to be in its own thread, because by default WASAPI will request audio samples every 10 ms, which doesn't work well if your main loop runs every 16 ms.

[EDIT] Removed the code as it wasn't good.


Edited by Simon Anciaux. Reason: Removed the code example

This code is a bit strange - why is ole32 loaded dynamically? It's not like it can be missing from the system. It would be like loading user32 dynamically for the CreateWindowEx function. It's not technically wrong, just pretty useless imho.

Note that you often want/need the audio processing/mixing to be in its own thread, because by default WASAPI will request audio samples every 10 ms, which doesn't work well if your main loop runs every 16 ms.

That's not really a problem. You just submit an audio frame or two ahead, just like with DirectSound. The real reason to put WASAPI on a thread is to make the latency smaller, which can be done much better than with DS. But that obviously complicates the code and makes you use WASAPI very differently - that's why I deleted that gist. It was just not a good way to use WASAPI.
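
For the curious, the low-latency threaded approach is built on WASAPI's event callback mode. Roughly this shape - a sketch, not the deleted gist, with error handling omitted and AudioClient/RenderClient/Format/Duration assumed from the earlier setup:

// Initialize with the event-callback flag and hand WASAPI an event to signal.
HANDLE BufferReady = CreateEventW(0, FALSE, FALSE, 0);
AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                        Duration, 0, &Format, 0);
AudioClient->SetEventHandle(BufferReady);

// Audio thread: wake up whenever the engine wants more data (every ~10 ms by
// default in shared mode) and keep the buffer topped up with fresh samples.
for (;;)
{
    WaitForSingleObject(BufferReady, INFINITE);

    UINT32 BufferSize, Padding;
    AudioClient->GetBufferSize(&BufferSize);
    AudioClient->GetCurrentPadding(&Padding);

    UINT32 FrameCount = BufferSize - Padding;
    BYTE *Data;
    RenderClient->GetBuffer(FrameCount, &Data);
    // ... mix FrameCount samples into Data here ...
    RenderClient->ReleaseBuffer(FrameCount, 0);
}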

I have a different gist though - https://gist.github.com/mmozeiko/5a5b168e61aff4c1eaec0381da62808f#file-win32_wasapi-h
It wraps WASAPI in a tiny single-header library that exposes a very similar API to DirectSound, so replacing DS with it would be very, very easy. It's also not the best way to use WASAPI, but it is decent enough. The example source at the top of the gist shows how to use it.


Edited by Mārtiņš Možeiko

I think I had a reason for loading ole32 dynamically at the time I wrote that, but I can't remember it, and it was probably not a good one.

I'll remove the code, as it was just meant to give a concrete example, but I should have cleaned it up before posting it.

You just submit an audio frame or two ahead.

For some reason I thought that when using the event callback we had to submit a specific size. Thanks for reminding me.

