Day 19 - Audio Latency

Casey said on Day 19 that for Windows 7 they should have figured out low-latency audio by now. They have. It is called the Windows Core Audio API and includes the Windows Audio Session API (WASAPI); it was introduced in Windows Vista. DirectSound, XAudio2, and WinMM are all emulated on Vista+ and are layered on top of Core Audio to make them work. Because of that extra layer, none of the three (DSound, XAudio2, WinMM) will get the lowest possible latency.

I modified the win32_handmade.cpp code from day 19 to use Core Audio/WASAPI instead of DirectSound. On my machine DirectSound works only with FramesOfAudioLatency = 3 frames. Core Audio works fine with FramesOfAudioLatency = 1, so it produces exactly the next frame of audio, just as we wanted. It sometimes drops a few samples during the first frame or two after startup (I'm guessing Windows is figuring out or caching something in the background), but after that it is smooth. The vertical white lines stay pretty stable and evenly spaced; they sometimes move only a tiny bit, a pixel or two.

Obviously this won't work on WinXP. It would be possible to adjust the code to use Core Audio when it is available and fall back to DirectSound otherwise, as Casey mentions in the Q&A.
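A minimal sketch of how that fallback could be detected at runtime (this is not from the gist; Win32InitWASAPI and Win32InitDSound below in the usage note are hypothetical names for whatever your two init routines are): the MMDeviceEnumerator COM class only exists on Vista and later, so trying to create it tells you whether Core Audio is there.

#include <windows.h>
#include <mmdeviceapi.h>

static bool Win32CoreAudioAvailable(void)
{
    // The MMDeviceEnumerator COM class is only registered on Vista and later,
    // so CoCreateInstance fails cleanly on XP. CoInitializeEx must already have been called.
    IMMDeviceEnumerator *Enumerator = 0;
    HRESULT Result = CoCreateInstance(__uuidof(MMDeviceEnumerator), 0, CLSCTX_ALL,
                                      __uuidof(IMMDeviceEnumerator), (void **)&Enumerator);
    if(SUCCEEDED(Result))
    {
        Enumerator->Release();
        return(true);
    }
    return(false);
}

Then at startup something like if(Win32CoreAudioAvailable()) { Win32InitWASAPI(...); } else { Win32InitDSound(...); } would pick the right path.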

Here is the code with win32_handmade.cpp adjusted to use WASAPI: https://gist.github.com/mmozeiko/38c64bb65855d783645c Using WASAPI is, imho, a bit simpler than DirectSound: no more two-region nonsense. The setup part is a bit more code than before, but nothing too crazy (except the COM stuff).
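To illustrate the "no two regions" point, here is a rough sketch of the per-frame write (not the gist verbatim; the interface pointers and LatencyFrames are assumed): GetBuffer hands back one contiguous region, unlike DirectSound's Lock which can return two.

#include <windows.h>
#include <audioclient.h>

static void Win32FillWASAPI(IAudioClient *SoundClient, IAudioRenderClient *RenderClient,
                            UINT32 LatencyFrames)
{
    UINT32 PaddingFrames;   // frames already queued and waiting to be played
    if(SUCCEEDED(SoundClient->GetCurrentPadding(&PaddingFrames)))
    {
        // Top the buffer up to the desired latency; never request a negative count.
        UINT32 FramesToWrite = (LatencyFrames > PaddingFrames) ? (LatencyFrames - PaddingFrames) : 0;

        BYTE *Region;
        if(FramesToWrite && SUCCEEDED(RenderClient->GetBuffer(FramesToWrite, &Region)))
        {
            // Write FramesToWrite frames of samples into the single contiguous Region here.
            RenderClient->ReleaseBuffer(FramesToWrite, 0);
        }
    }
}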

Kudos! That's nice to know and I bet it will be useful later on.
mmozeiko
It sometimes drops a few samples during the first frame or two after startup (I'm guessing Windows is figuring out or caching something in the background).

I believe the reason is that you call GlobalSoundClient->Start() before filling the buffer. From the IAudioClient::Start documentation:
To avoid start-up glitches with rendering streams, clients should not call Start until the audio engine has been initially loaded with data by calling the IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer methods on the rendering interface.
Filling the audio buffer at initialization is not the solution, since it would introduce latency (we can't overwrite samples that are already queued). One solution is to start the stream the first time we write data into it.
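A minimal sketch of that idea (GlobalSoundClient, GlobalSoundRenderClient and SamplesToWrite are the names assumed from the code above; SoundIsStarted is a new flag):

static bool SoundIsStarted = false;

BYTE *Region;
if(SamplesToWrite > 0 &&
   SUCCEEDED(GlobalSoundRenderClient->GetBuffer(SamplesToWrite, &Region)))
{
    // ... fill Region with SamplesToWrite frames of audio here ...
    GlobalSoundRenderClient->ReleaseBuffer(SamplesToWrite, 0);

    if(!SoundIsStarted)
    {
        // Only start the stream once the engine actually has data, per the docs above.
        GlobalSoundClient->Start();
        SoundIsStarted = true;
    }
}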

For calculating the samples to write, I believe it's better to do
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    int MaxSampleCount = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    SamplesToWrite = (int)SoundOutput.LatencySampleCount - (int)SoundPaddingSize; // cast both to int so the subtraction can go negative
    if (SamplesToWrite < 0)
    {
        SamplesToWrite = 0;
    }
    assert(SamplesToWrite <= MaxSampleCount);
}

/* Instead of
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    SamplesToWrite = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    if (SamplesToWrite > SoundOutput.LatencySampleCount)
    {
        SamplesToWrite = SoundOutput.LatencySampleCount;
    }
}
*/

We want x samples (one frame's worth plus some latency). GetCurrentPadding returns the number of samples in the buffer that are queued but haven't been read yet. So the number of samples to write should be x - padding.
I tried to implement a WASAPI version of day 20 and would like to know whether this implementation is correct, or whether there are errors to be corrected or improvements to be made.

On running this (I set the latency to 3ms), I found the padding (pending samples) to be around 23ms and the time elapsed from the start of the frame to the start of this section of code to be around 10ms, which tells us that at the start of this frame the pending samples amounted to around 33ms (a frame's worth of samples).
[EDIT]: I set the latency to 950ms and tested it; the sum of the pending ms and the ms elapsed from the start of the frame to the audio write is about 950ms, so it seems to work for both low and high audio latency.

UINT32 pendingSamples;
UINT32 samplesToFill = (uint32)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);
UINT32 skipSamples = 0;
UINT64 playPosition, queryWallClock, flipPlayPosition, playPositionOnNextFrame;

globalSoundOutputClock->GetPosition(&playPosition, &queryWallClock);
globalSoundClient->GetCurrentPadding(&pendingSamples);

flipPlayPosition = playPosition + (uint64)(secondsRemainingUntilFrameFlip * (real32)globalSoundOutput.samplesPerSecond);
playPositionOnNextFrame = playPosition + (uint64)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);

if(flipPlayPosition > (playPosition + pendingSamples))
{
	if(firstFrameAudio)
	{
		skipSamples = (UINT32)(flipPlayPosition - (playPosition + pendingSamples));
	}
	else
	{
		samplesToFill += (UINT32)((flipPlayPosition - (playPosition + pendingSamples)));
	}
}
else
{
	UINT32 alreadyFilled = (UINT32)((playPosition + pendingSamples) - flipPlayPosition);

	if(samplesToFill > alreadyFilled)
	{
		samplesToFill -= alreadyFilled;
	}
	else
	{
		samplesToFill = 0;
	}
}

UINT64 currentFillLevel = (playPosition + pendingSamples + skipSamples + samplesToFill);
UINT64 minimumFillLevel = playPositionOnNextFrame + (UINT64)(((real32)globalSoundOutput.latency / 1000.0f) * (real32)globalSoundOutput.samplesPerSecond);
if(minimumFillLevel > currentFillLevel)
{
	samplesToFill += (UINT32)(minimumFillLevel - currentFillLevel);
}

BYTE *wasapiMemory;
if((samplesToFill + skipSamples) > 0)
{
	HRESULT bufferAcquisition = globalSoundOutputClientDevice->GetBuffer((samplesToFill + skipSamples), &wasapiMemory);
	if(FAILED(bufferAcquisition))
	{
		ASSERT(!"FAILED TO ACQUIRE BUFFER");
	}
	
	gameSoundBuffer.memory = (int16*)((BYTE*)(wasapiMemory) + (skipSamples * globalSoundOutput.bytesPerSample));
	gameSoundBuffer.samplesToOutput = samplesToFill;
	gameSoundBuffer.samplesPerSecond = globalSoundOutput.samplesPerSecond;
	ASSERT((gameSoundBuffer.samplesToOutput/globalSoundOutput.bytesPerSample) < globalSoundOutput.soundBufferSize);
	
	gameGetSoundSamples(&gameMemory, &gameSoundBuffer);
	HRESULT bufferRelease = globalSoundOutputClientDevice->ReleaseBuffer((samplesToFill + skipSamples), 0);
	if(FAILED(bufferRelease))
	{
		ASSERT(!"FAILED TO RELEASE BUFFER");
	}
	if(firstFrameAudio)
	{
		firstFrameAudio = false;
		playResult = globalSoundClient->Start();
		if(FAILED(playResult))
		{
			ASSERT(!"FAILED TO START PLAYING");
		}
	}
}
#if HANDMADE_INTERNAL
soundDebugMarkers[soundDebugCurrentMarker].outputPlayCursor = (DWORD)playPosition * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].outputWriteCursor = (DWORD)(playPosition + pendingSamples) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].outputStartLocation = (DWORD)(playPosition + pendingSamples + skipSamples) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].targetCursor = (DWORD)(playPosition + pendingSamples + skipSamples + samplesToFill) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].flipPlayCursor = (DWORD)(flipPlayPosition * globalSoundOutput.bytesPerSample);

sprintf_s(title, "ElapsedFrameTime:%.2fms, Pending:%.2fms, Sum:%.2fms", secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f, ((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f, (secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f) + (((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f));
//THE SUM IS ALWAYS ABOUT 30ms ON A LOW LATENCY CARD

SetWindowText(windowHandle, title);
#endif


If it sounds good to you and there is no perceptible latency when you play a sound, it's probably good.

I'm not sure about the skipSamples thing: it's only used on the first frame, so its value will always be flipPlayPosition, since playPosition and pendingSamples should be 0 on the first frame. If you add that offset when writing to the buffer, you add that amount of latency to the sound. In addition, the content of the buffer could contain garbage, so you should probably fill it with zeroes. Maybe I'm missing something about this.
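If you do want to keep the skip, one way to be safe is to clear the skipped region before handing the rest of the buffer to the game, e.g. (a sketch using the names from the code above; memset is from <string.h>):

// Zero the skipped frames so they play back as silence rather than garbage.
memset(wasapiMemory, 0, skipSamples * globalSoundOutput.bytesPerSample);
gameSoundBuffer.memory = (int16 *)(wasapiMemory + skipSamples * globalSoundOutput.bytesPerSample);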

When "converting" time to samples, you may need to ceil the floating-point result before casting it to an integer.
0.016666 * 48000 = 799.968
Since you can't write partial samples, to represent 799.968 samples you need 800 samples.
If you do (uint32)(0.016666 * 48000), 799.968 will be truncated and you will get 799 samples.
So you need to do (uint32)ceil(0.016666 * 48000) to get 800 samples.
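Applied to the code above, that would look something like this (a sketch; ceilf is from <math.h>, names as in the snippet above):

// Round up so a fractional frame's worth of time still yields a whole extra sample.
UINT32 samplesToFill = (UINT32)ceilf(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);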