Day 19 - Audio Latency

Casey said on Day 19 that for Windows 7 they should have figured out low-latency audio. They have. It is called the Windows Core Audio API and includes the Windows Audio Session API (WASAPI). It was introduced in Windows Vista. DirectSound, XAudio2 and WinMM are all emulated on Vista+; they all use Core Audio under the hood. Because of that extra layer, none of the three (DSound, XAudio2, WinMM) gets the lowest possible latency.

I modified the win32_handmade.cpp code from day 19 to use Core Audio/WASAPI instead of DirectSound. On my machine DirectSound works only with FramesOfAudioLatency = 3 frames. Core Audio works fine with FramesOfAudioLatency = 1, so it produces exactly the next frame of audio - just as we wanted. It sometimes drops a few samples at startup, in the first or second frame (I'm guessing Windows is figuring out or caching something in the background), but after that it is smooth. The vertical white lines stay pretty stable and evenly spaced - they sometimes move only a tiny bit, a pixel or two.

Obviously this won't work on WinXP. It would be possible to adjust the code to use Core Audio if it is available and otherwise fall back to DirectSound, as Casey discusses in the Q&A.
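A hypothetical sketch of that detection (not from the original post): creating the MMDeviceEnumerator fails on XP, where the class is not registered, so the existing DirectSound path can be used instead. Win32InitWASAPI is a made-up helper here; Win32InitDSound is Casey's existing function. This assumes COM has already been initialized with CoInitializeEx.

#include <mmdeviceapi.h>

IMMDeviceEnumerator *Enumerator = 0;
HRESULT Result = CoCreateInstance(__uuidof(MMDeviceEnumerator), 0, CLSCTX_ALL,
                                  __uuidof(IMMDeviceEnumerator),
                                  (void **)&Enumerator);
if(SUCCEEDED(Result))
{
    Win32InitWASAPI(Window, &SoundOutput); // hypothetical WASAPI init
}
else
{
    Win32InitDSound(Window, SoundOutput.SamplesPerSecond,
                    SoundOutput.SecondaryBufferSize); // Casey's DirectSound path
}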

Here is the code with win32_handmade.cpp adjusted to use WASAPI: https://gist.github.com/mmozeiko/38c64bb65855d783645c Using WASAPI is imho a bit simpler than DirectSound - no more two-region nonsense. The setup part is a bit more code than before, but nothing too crazy (except the COM stuff).
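For comparison, a sketch of the WASAPI write path (GlobalSoundRenderClient is an assumed name): GetBuffer hands back one contiguous region, so there is no Region1/Region2 split like with IDirectSoundBuffer::Lock.

UINT32 Padding;
if(SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&Padding)))
{
    UINT32 FramesToWrite = 0;
    if(Padding < (UINT32)SoundOutput.LatencySampleCount)
    {
        FramesToWrite = (UINT32)SoundOutput.LatencySampleCount - Padding;
    }

    BYTE *Region;
    if(SUCCEEDED(GlobalSoundRenderClient->GetBuffer(FramesToWrite, &Region)))
    {
        // Fill Region with FramesToWrite frames of samples, front to back.
        GlobalSoundRenderClient->ReleaseBuffer(FramesToWrite, 0);
    }
}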

Kudos! That's nice to know and I bet it will be useful later on.
mmozeiko
It sometimes drops a few samples at startup, in the first or second frame (I'm guessing Windows is figuring out or caching something in the background).

I believe the reason is that you call GlobalSoundClient->Start(); before filling the buffer. From the IAudioClient::Start documentation:
To avoid start-up glitches with rendering streams, clients should not call Start until the audio engine has been initially loaded with data by calling the IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer methods on the rendering interface.
Filling the whole audio buffer at initialization is not the solution, as it would introduce latency (we can't overwrite samples that were already submitted). One solution is to start the stream the first time we fill data into it, as sketched below.
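A minimal sketch of that idea (GlobalSoundRenderClient and SoundIsPlaying are assumed names): defer Start() until the engine has been fed once, per the documentation quoted above.

static bool SoundIsPlaying = false;

BYTE *Buffer;
if(SUCCEEDED(GlobalSoundRenderClient->GetBuffer(SamplesToWrite, &Buffer)))
{
    // ... fill Buffer with SamplesToWrite samples ...
    GlobalSoundRenderClient->ReleaseBuffer(SamplesToWrite, 0);

    if(!SoundIsPlaying && (SamplesToWrite > 0))
    {
        GlobalSoundClient->Start();
        SoundIsPlaying = true;
    }
}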

For calculating the samples to write, I believe it's better to do
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    int MaxSampleCount = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    // Cast both operands so the subtraction happens in signed arithmetic
    // and can actually go negative.
    SamplesToWrite = (int)SoundOutput.LatencySampleCount - (int)SoundPaddingSize;
    if (SamplesToWrite < 0)
    {
        SamplesToWrite = 0;
    }
    assert(SamplesToWrite <= MaxSampleCount);
}

/* Instead of
int SamplesToWrite = 0;
UINT32 SoundPaddingSize;
if (SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&SoundPaddingSize)))
{
    SamplesToWrite = (int)(SoundOutput.SecondaryBufferSize - SoundPaddingSize);
    if (SamplesToWrite > SoundOutput.LatencySampleCount)
    {
        SamplesToWrite = SoundOutput.LatencySampleCount;
    }
}
*/

We want x samples (one frame's worth plus some latency). GetCurrentPadding returns the number of samples in the buffer that have been submitted but not yet read. So the number of samples to write should be x - padding.
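For example (illustrative numbers, not from the original post): at 48000 Hz with 800 samples per frame and one extra frame of latency, x = 1600; if GetCurrentPadding reports 900 samples still queued, writing 1600 - 900 = 700 samples tops the buffer back up to exactly x.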
I tried to implement a WASAPI version of day 20 and would like to know whether this implementation is correct, or whether there are errors to correct or improvements to make.

On running this (I set the latency to 3ms), I found the padding (pending samples) to be around 23ms, and the time elapsed from the start of the frame to the start of this section of code to be around 10ms, which tells us that at the start of this frame the pending samples were around 33ms (a frame's worth of samples).
[EDIT]: I set the latency to 950ms and tested it; the sum of the pending ms and the ms elapsed from the start of the frame to the audio write is about 950ms, so it seems to work for both low and high audio latency.

UINT32 pendingSamples;
UINT32 samplesToFill = (uint32)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);
UINT32 skipSamples = 0;
UINT64 playPosition, queryWallClock, flipPlayPosition, playPositionOnNextFrame;

globalSoundOutputClock->GetPosition(&playPosition, &queryWallClock);
globalSoundClient->GetCurrentPadding(&pendingSamples);

flipPlayPosition = playPosition + (uint64)(secondsRemainingUntilFrameFlip * (real32)globalSoundOutput.samplesPerSecond);
playPositionOnNextFrame = playPosition + (uint64)(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);

if(flipPlayPosition > (playPosition + pendingSamples))
{
    if(firstFrameAudio)
    {
        skipSamples = (UINT32)(flipPlayPosition - (playPosition + pendingSamples));
    }
    else
    {
        samplesToFill += (UINT32)(flipPlayPosition - (playPosition + pendingSamples));
    }
}
else
{
    UINT32 alreadyFilled = (UINT32)((playPosition + pendingSamples) - flipPlayPosition);

    if(samplesToFill > alreadyFilled)
    {
        samplesToFill -= alreadyFilled;
    }
    else
    {
        samplesToFill = 0;
    }
}

UINT64 currentFillLevel = playPosition + pendingSamples + skipSamples + samplesToFill;
UINT64 minimumFillLevel = playPositionOnNextFrame + (UINT64)(((real32)globalSoundOutput.latency / 1000.0f) * (real32)globalSoundOutput.samplesPerSecond);
if(minimumFillLevel > currentFillLevel)
{
    samplesToFill += (UINT32)(minimumFillLevel - currentFillLevel);
}

BYTE *wasapiMemory;
if((samplesToFill + skipSamples) > 0)
{
    HRESULT bufferAcquisition = globalSoundOutputClientDevice->GetBuffer((samplesToFill + skipSamples), &wasapiMemory);
    if(FAILED(bufferAcquisition))
    {
        ASSERT(!"FAILED TO ACQUIRE BUFFER");
    }

    gameSoundBuffer.memory = (int16*)((BYTE*)wasapiMemory + (skipSamples * globalSoundOutput.bytesPerSample));
    gameSoundBuffer.samplesToOutput = samplesToFill;
    gameSoundBuffer.samplesPerSecond = globalSoundOutput.samplesPerSecond;
    ASSERT((gameSoundBuffer.samplesToOutput / globalSoundOutput.bytesPerSample) < globalSoundOutput.soundBufferSize);

    gameGetSoundSamples(&gameMemory, &gameSoundBuffer);
    HRESULT bufferRelease = globalSoundOutputClientDevice->ReleaseBuffer((samplesToFill + skipSamples), 0);
    if(FAILED(bufferRelease))
    {
        ASSERT(!"FAILED TO RELEASE BUFFER");
    }
    if(firstFrameAudio)
    {
        firstFrameAudio = false;
        playResult = globalSoundClient->Start();
        if(FAILED(playResult))
        {
            ASSERT(!"FAILED TO START PLAYING");
        }
    }
}
#if HANDMADE_INTERNAL
soundDebugMarkers[soundDebugCurrentMarker].outputPlayCursor = (DWORD)playPosition * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].outputWriteCursor = (DWORD)(playPosition + pendingSamples) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].outputStartLocation = (DWORD)(playPosition + pendingSamples + skipSamples) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].targetCursor = (DWORD)(playPosition + pendingSamples + skipSamples + samplesToFill) * globalSoundOutput.bytesPerSample;
soundDebugMarkers[soundDebugCurrentMarker].flipPlayCursor = (DWORD)(flipPlayPosition * globalSoundOutput.bytesPerSample);

sprintf_s(title, "ElapsedFrameTime:%.2fms, Pending:%.2fms, Sum:%.2fms", secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f, ((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f, (secondsElapsedFromFrameStartToAudioWriteBegin * 1000.0f) + (((real32)pendingSamples / (real32)globalSoundOutput.samplesPerSecond) * 1000.0f));
// THE SUM IS ALWAYS ABOUT 30ms ON A LOW LATENCY CARD

SetWindowText(windowHandle, title);
#endif


If it sounds good to you and there is no perceptible latency when you play a sound, it's probably good.

I'm not sure about the skipSamples thing: it's only used on the first frame, so its value will always be flipPlayPosition, since playPosition and pendingSamples should be 0 on the first frame. If you add that offset when writing to the buffer, you add that amount of latency to the sound. In addition, that part of the buffer could contain garbage, so you should probably fill it with zeroes. Maybe I'm missing something about this.
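If you keep the skip, one possible fix (a sketch using memset and the variable names from the code above) is to clear the skipped region before releasing the buffer:

// Clear the skipped samples so the device doesn't play garbage.
memset(wasapiMemory, 0, skipSamples * globalSoundOutput.bytesPerSample);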

When "converting" time to samples, you may need to ceil the floating point result before casting it to an integer.
0.016666 * 48000 = 799,968
Since you can't write partial samples, to represent 799.968 you need 800 samples.
if you do ( uint32 ) (0.016666 * 48000), 799.968 will be truncated and you will get 799 samples.
So you need to do ( uint32 ) ceil(0.016666 * 48000) to get 800 samples.
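In code, with the names from the snippet above (a sketch):

#include <math.h>

// Round up so a fractional sample count never loses the last sample.
UINT32 samplesToFill = (UINT32)ceilf(targetSecondsPerFrame * (real32)globalSoundOutput.samplesPerSecond);
// e.g. ceilf(0.016666f * 48000.0f) = ceilf(799.968f) = 800.0f -> 800 samples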

@Mārtiņš Možeiko, any chance you could make your code available online again?

If you want the code for handmade hero, you'll have to wait for mmozeiko to reply, or maybe ask them on the handmade network discord.

If you want an example on how to initialize WASAPI you can have a look at Minimal WASAPI from d7samurai.

Note that you often want/need the audio processing/mixing to be in its own thread, because by default WASAPI will request audio samples every 10ms, which doesn't work well if your main loop runs every 16ms.

This is an example of how I use it in my programs. It is more complicated than it should be because I tried to avoid using some Windows headers (I can't say I remember why), so things that start with csh_ or window_ are available in the real headers and you should use those instead.

Also, this is C code that uses COM interfaces, so if you use C++, things like IAudioClient_GetDevicePeriod( audio_client, &default_period, &minimum_period ); will look like audio_client->GetDevicePeriod( &default_period, &minimum_period );

You call audio_start before your main window loop, audio_update inside your main loop, and, if you want to clean up, audio_end after the main loop. audio_update's only use is to re-initialize the audio system if the default audio device changes while the application is running (like plugging in a headset).

The audio is processed in a thread, so you'll need some way to send and process commands from the main thread to the audio thread (I do it with a linked list of commands) in the audio_mixer_render function (not provided here).
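For context, a hypothetical usage sketch of the functions below (the running flag and loop structure are assumptions, not part of the provided code):

u32 error = 0;
audio_start( &platform, &mixer, &error );

while ( running ) {
    /* Re-initializes the audio system if the default device changed. */
    audio_update( &platform, &error );
    /* ... rest of the frame; mixing happens on the audio thread ... */
}

audio_end( &platform, &error );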

/* Core Audio APIs on msdn: https://msdn.microsoft.com/en-us/library/windows/desktop/dd316599(v=vs.85).aspx */
// #include <initguid.h>
#include "../lib/custom_system_header.h"

#define SOUND_ENABLED 1

#if SOUND_ENABLED
#define DEBUG_WASAPI 0

#ifndef window_WAVE_FORMAT_PCM
#define window_WAVE_FORMAT_PCM 0x0001
#endif

#ifndef window_WAVE_FORMAT_IEEE_FLOAT
#define window_WAVE_FORMAT_IEEE_FLOAT 0x0003
#endif

#ifndef window_WAVE_FORMAT_EXTENSIBLE
#define window_WAVE_FORMAT_EXTENSIBLE 0xFFFE
#endif

#define AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM 0x80000000

#define guid_wave_extensible( name, data_1 ) csh_define_guid_2( name, data_1, 0x0000, 0x0010, 0x8000, 0x00aa00389b71 )
guid_wave_extensible( guid_pcm, window_WAVE_FORMAT_PCM ); /* KSDATAFORMAT_SUBTYPE_PCM */
guid_wave_extensible( guid_ieee_float, window_WAVE_FORMAT_IEEE_FLOAT ); /* KSDATAFORMAT_SUBTYPE_IEEE_FLOAT */

csh_WAVEFORMATEXTENSIBLE g_device_mix_format = { 0 };

#if 1

typedef enum audio_error_t {
    audio_error_failed_to_load_ole32,
    audio_error_failed_to_find_CoCreateInstance,
    audio_error_failed_to_find_CoTaskMemFree,
    audio_error_failed_to_create_device_enumerator,
    audio_error_failed_to_get_the_default_audio_endpoint,
    audio_error_failed_to_active_device,
    audio_error_failed_to_get_the_mix_format,
    audio_error_failed_to_initialize_audio_client,
    audio_error_failed_to_get_the_buffer_size,
    audio_error_failed_to_get_the_render_client_service,
    audio_error_failed_to_create_audio_client_event,
    audio_error_failed_to_set_audio_client_event_handle,
    audio_error_failed_to_create_thread_stopped_event,
    audio_error_failed_to_create_audio_thread,
    audio_error_failed_to_get_the_buffer,
    audio_error_failed_to_release_the_buffer,
    audio_error_failed_to_get_padding,
    audio_error_failed_to_start,
    audio_error_failed_to_stop
} audio_error_t;

const char* audio_error_strings[ ] = {
    "audio_error_failed_to_load_ole32",
    "audio_error_failed_to_find_CoCreateInstance",
    "audio_error_failed_to_find_CoTaskMemFree",
    "audio_error_failed_to_create_device_enumerator",
    "audio_error_failed_to_get_the_default_audio_endpoint",
    "audio_error_failed_to_active_device",
    "audio_error_failed_to_get_the_mix_format",
    "audio_error_failed_to_initialize_audio_client",
    "audio_error_failed_to_get_the_buffer_size",
    "audio_error_failed_to_get_the_render_client_service",
    "audio_error_failed_to_create_audio_client_event",
    "audio_error_failed_to_set_audio_client_event_handle",
    "audio_error_failed_to_create_thread_stopped_event",
    "audio_error_failed_to_create_audio_thread",
    "audio_error_failed_to_get_the_buffer",
    "audio_error_failed_to_release_the_buffer",
    "audio_error_failed_to_get_padding",
    "audio_error_failed_to_start",
    "audio_error_failed_to_stop"
};

uint16_t audio_error_string_lengths[ ] = {
    32, 43, 40, 46, 52, 35, 40, 45, 41, 51,
    47, 51, 49, 41, 36, 40, 33, 27, 26
};

uint16_t audio_error_max_string_length = 52;

#else

/*
#include "meta_enums.h"

_meta_enum( audio_error,
          failed_to_load_ole32,
          failed_to_find_CoCreateInstance,
          failed_to_find_CoTaskMemFree,
          failed_to_create_device_enumerator,
          failed_to_get_the_default_audio_endpoint,
          failed_to_active_device,
          failed_to_get_the_mix_format,
          failed_to_initialize_audio_client,
          failed_to_get_the_buffer_size,
          failed_to_get_the_render_client_service,
          failed_to_create_audio_client_event,
          failed_to_set_audio_client_event_handle,
          failed_to_create_thread_stopped_event,
          failed_to_create_audio_thread,
          failed_to_get_the_buffer,
          failed_to_release_the_buffer,
          failed_to_get_padding,
          failed_to_start,
          failed_to_stop
          );
*/

#endif

b32 g_audio_restart = false;

HRESULT OnDeviceStateChanged_( csh_IMMNotificationClient* This, LPCWSTR pwstrDeviceId, DWORD dwNewState ) {
    return S_OK;
}

HRESULT OnDeviceAdded_( csh_IMMNotificationClient* This, LPCWSTR pwstrDeviceId ) {
    return S_OK;
}

HRESULT OnDeviceRemoved_( csh_IMMNotificationClient* This, LPCWSTR pwstrDeviceId ) {
    return S_OK;
}

HRESULT OnDefaultDeviceChanged_( csh_IMMNotificationClient* This, csh_EDataFlow flow, csh_ERole role, LPCWSTR pwstrDefaultDeviceId ) {
    
    g_audio_restart = true;
    
    return S_OK;
}

HRESULT OnPropertyValueChanged_( csh_IMMNotificationClient* This, LPCWSTR pwstrDeviceId, const csh_PROPERTYKEY key ) {
    return S_OK;
}

csh_IMMNotificationClientVtbl g_audio_notifications_vtbl = {
    .OnDeviceStateChanged = OnDeviceStateChanged_,
    .OnDeviceAdded = OnDeviceAdded_,
    .OnDeviceRemoved = OnDeviceRemoved_,
    .OnDefaultDeviceChanged = OnDefaultDeviceChanged_,
    .OnPropertyValueChanged = OnPropertyValueChanged_
};
csh_IMMNotificationClient g_audio_notifications = { &g_audio_notifications_vtbl };

#include <avrt.h>

DWORD audio_thread( void* thread_data ) {
    
    u32 error = 0;
    profiler_timeline_initialize( audio_thread, 100000, &error );
    
    platform_t* platform = cast( platform_t*, thread_data );
    
    DWORD task_index = 0;
    HANDLE task_handle = AvSetMmThreadCharacteristicsW( L"Pro Audio", &task_index );
    
    if ( !task_handle ) {
        log_l( "[WARNING] Couldn't set the audio thread characteristic to \"Pro Audio\".\n" );
    }
    
    breakable {
        
        REFERENCE_TIME default_period;
        REFERENCE_TIME minimum_period;
        HRESULT test = csh_IAudioClient_GetDevicePeriod( platform->audio_client, &default_period, &minimum_period );
        
        if ( test != S_OK ) {
            log_l( "[ERROR] Couldn't get the audio device period.\n" );
            break;
        }
        
        u64 samples_per_second = platform->mixer->format.samples_per_second;
        u64 period_to_second = 10000000; /* NOTE simon (06/10/23 15:28:41): period is expressed in "100 nanoseconds" unit*/
        
        u32 samples_to_fill = cast( u32, ( cast( f64, default_period ) / period_to_second ) * samples_per_second );
        
        HRESULT success = csh_IAudioClient_Start( platform->audio_client );
        
        if ( success != S_OK ) {
            log_l( "[ERROR] Couldn't start the audio client.\n" );
            break;
        }
        
        while ( WaitForSingleObject( platform->audio_client_event, INFINITE ) == WAIT_OBJECT_0  ) {
            
            if ( platform->audio_stop_thread ) {
                break;
            }
            
            u32 pending_samples;
            success = csh_IAudioClient_GetCurrentPadding( platform->audio_client, &pending_samples );
            
            if ( success != S_OK ) {
                log_l( "[ERROR] Couldn't retrieve the audio padding.\n" );
                break;
            }
            
            u32 to_fill = samples_to_fill;
            
            if ( pending_samples < samples_to_fill ) {
                to_fill = samples_to_fill - pending_samples;
            }
            
#if DEBUG_WASAPI
            memory_get_on_stack( message, kibioctet( 1 ) );
            memory_push_copy_l( &message, "audio_thread: " );
            string_push_u64( &message, to_fill );
            memory_push_u8( &message, "\n" );
            debug_d( message );
#endif
            
            u8* buffer;
            success = csh_IAudioRenderClient_GetBuffer( platform->audio_render_client, to_fill, &buffer );
            
            if ( success != S_OK ) {
                log_l( "[ERROR] Couldn't get the audio buffer.\n" );
                break;
            }
            
            profiler_event_start( audio_mixer_render );
            audio_mixer_render( platform->mixer, buffer, to_fill );
            profiler_event_end( audio_mixer_render );
            
            success = csh_IAudioRenderClient_ReleaseBuffer( platform->audio_render_client, to_fill, 0 /*csh_AUDCLNT_BUFFERFLAGS_SILENT */ );
            
            if ( success != S_OK ) {
                log_l( "[ERROR] Couldn't release the audio buffer.\n" );
                break;
            }
        }
        
        success = csh_IAudioClient_Stop( platform->audio_client );
        
        if ( success != S_OK ) {
            log_l( "[ERROR] Couldn't stop the audio client.\n" );
            break;
        }
    }
    
    if ( task_handle ) {
        AvRevertMmThreadCharacteristics( task_handle );
    }
    
    thread_event_signal( &platform->audio_thread_stopped );
    
    return 0;
}

u32 g_audio_error = 0;

stu void audio_end( platform_t* platform, u32* error ) {
    
    /* NOTE simon (13/10/23 17:22:18): Release functions that are "derived" from IUnknown return the reference count left for the object. We don't want to compare them to S_OK */
    
    if ( platform->audio_enumerator ) {
        
        HRESULT success = csh_IMMDeviceEnumerator_UnregisterEndpointNotificationCallback( platform->audio_enumerator, &g_audio_notifications );
        
        if ( success != S_OK ) {
            log_l( "[WARNING][Audio] Couldn't unregister audio notification callbacks.\n" );
        }
        
        csh_IMMDeviceEnumerator_Release( platform->audio_enumerator );
        platform->audio_enumerator = 0;
    }
    
    if ( platform->audio_thread ) {
        platform->audio_stop_thread = true;
        /* NOTE simon (13/10/23 16:29:55): Do I need a memory fence here ? */
        WaitForSingleObject( platform->audio_thread_stopped, INFINITE );
        platform->audio_stop_thread = false;
        platform->audio_thread = 0;
    }
    
    if ( platform->audio_thread_stopped ) {
        CloseHandle( platform->audio_thread_stopped );
        platform->audio_thread_stopped = 0;
    }
    
    if ( platform->audio_client_event ) {
        CloseHandle( platform->audio_client_event );
        platform->audio_client_event = 0;
    }
    
    if ( platform->audio_render_client ) {
        csh_IAudioRenderClient_Release( platform->audio_render_client );
        platform->audio_render_client = 0;
    }
    
    if ( platform->audio_client ) {
        csh_IAudioClient_Release( platform->audio_client );
        platform->audio_client = 0;
    }
}

stu void audio_start( platform_t* platform, audio_mixer_t* mixer, u32* error ) {
    
    /* NOTE simon: Kernel streaming (KS) refers to the Microsoft-provided services that support kernel-mode processing of streamed data.
https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/kernel-streaming
*/
    
    /* NOTE simon:
https://docs.microsoft.com/en-us/windows/win32/coreaudio/user-mode-audio-components

"In exclusive mode, the client can choose to open the stream in any audio format that the endpoint device supports. In shared mode, the client must open the stream in the mix format that is currently in use by the audio engine (or a format that is similar to the mix format). The audio engine's input streams and the output mix from the engine are all in this format.

In Windows 7, a new feature called low-latence mode has been added for streams in share mode. In this mode, the audio engine runs in pull mode, in which there a significant reduction in latency. This is very useful for communication applications that require low audio stream latency for faster streaming."

https://docs.microsoft.com/en-us/windows/win32/coreaudio/device-formats

"An application that uses WASAPI to manage shared-mode streams can rely on the audio engine to perform only limited format conversions. The audio engine can convert between a standard PCM sample size used by the application and the floating-point samples that the engine uses for its internal processing. However, the format for an application stream typically must have the same number of channels and the same sample rate as the stream format used by the device."

https://docs.microsoft.com/en-us/windows/win32/coreaudio/representing-formats-for-iec-61937-transmissions
*/
    
    /* NOTE simon: https://handmade.network/forums/t/8622/p/29139
If you want to do manual resampling you can do it with Media Foundation api using CLSID_CResamplerMediaObject object. wcap code uses it to resample captured wasapi audio for audio encoding to mp4.
    */
    
    /* NOTE simon: Raymond Chen on COM interface https://devblogs.microsoft.com/oldnewthing/20200909-00/?p=104198 */
    
    _assert( platform->audio_enumerator == 0 );
    _assert( platform->audio_client == 0 );
    _assert( platform->audio_render_client == 0 );
    _assert( platform->audio_client_event == 0 );
    _assert( platform->audio_thread == 0 );
    _assert( platform->audio_thread_stopped == 0 );
    _assert( platform->audio_stop_thread == 0 );
    
    csh_IMMDeviceEnumerator* enumerator = 0;
    csh_IMMDevice* device = 0;
    
    breakable {
        
        breakable_check( error );
        
        HMODULE ole_dll = LoadLibraryW( L"Ole32.dll" ); // LPUNKNOWN pUnkOuter
        
        if ( !ole_dll ) {
            set_error( error, audio_error_failed_to_load_ole32 );
            break;
        }
        
        struct IUnknown;
        typedef HRESULT CoCreateInstance_t( const IID* rclsid, struct IUnknown* pUnkOuter, DWORD dwClsContext, const IID* riid, LPVOID *ppv );
        CoCreateInstance_t* CoCreateInstance = ( CoCreateInstance_t* ) GetProcAddress( ole_dll, "CoCreateInstance" );
        
        if ( !CoCreateInstance ) {
            set_error( error, audio_error_failed_to_find_CoCreateInstance );
            break;
        }
        
        typedef void CoTaskMemFree_t( void* pv );
        CoTaskMemFree_t* CoTaskMemFree = ( CoTaskMemFree_t* ) GetProcAddress( ole_dll, "CoTaskMemFree" );
        
        if ( !CoTaskMemFree ) {
            set_error( error, audio_error_failed_to_find_CoTaskMemFree );
            break;
        }
        
        HRESULT success = CoCreateInstance( ( const IID * ) &csh_CLSID_MMDeviceEnumerator, NULL, csh_CLSCTX_ALL, ( const IID* ) &csh_IID_IMMDeviceEnumerator, ( void** ) &enumerator );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_create_device_enumerator );
            break;
        }
        
        platform->audio_enumerator = enumerator;
        
        FreeLibrary( ole_dll );
        
        success = csh_IMMDeviceEnumerator_GetDefaultAudioEndpoint( enumerator, csh_eRender, csh_eConsole, &device );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_get_the_default_audio_endpoint );
            break;
        }
        
        success = csh_IMMDevice_Activate( device, ( const IID* ) &csh_IID_IAudioClient, csh_CLSCTX_ALL, NULL, ( void** ) &platform->audio_client );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_active_device );
            break;
        }
        
        csh_WAVEFORMATEXTENSIBLE* device_mix_format;
        success = csh_IAudioClient_GetMixFormat( platform->audio_client, ( csh_WAVEFORMATEX** ) &device_mix_format );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_get_the_mix_format );
            break;
        }
        
        g_device_mix_format = deref( device_mix_format );
        CoTaskMemFree( device_mix_format );
        device_mix_format = 0;
        
        csh_WAVEFORMATEXTENSIBLE buffer_format = { 0 };
        /* NOTE simon (27/06/24 16:15:19): buffer_format.Format.cbSize is not the size of the whole structure, only the size of the additional bytes. It should be 22. */
        
#if 0
        buffer_format.Format.wFormatTag = window_WAVE_FORMAT_EXTENSIBLE;
        buffer_format.Format.nChannels = mixer->format.channel_count;
        buffer_format.Format.nSamplesPerSec = mixer->format.samples_per_second;
        buffer_format.Format.wBitsPerSample = mixer->format.container_size * 8;
        buffer_format.Format.nBlockAlign = mixer->format.channel_count * mixer->format.container_size;
        buffer_format.Format.nAvgBytesPerSec = buffer_format.Format.nSamplesPerSec * buffer_format.Format.nBlockAlign;
        buffer_format.Format.cbSize = sizeof( buffer_format ) - sizeof( buffer_format.Format );
        
        buffer_format.Samples.wValidBitsPerSample = mixer->format.bits_per_sample;
        buffer_format.dwChannelMask = mixer->format.channel_mask;
        
        buffer_format.SubFormat = csh_guid_pcm;
#else
        
        buffer_format.Format.wFormatTag = window_WAVE_FORMAT_EXTENSIBLE;
        buffer_format.Format.nChannels = mixer->format.channel_count;
        buffer_format.Format.nSamplesPerSec = mixer->format.samples_per_second;
        buffer_format.Format.wBitsPerSample = 32;
        buffer_format.Format.nBlockAlign = mixer->format.channel_count * ( 32 / 8 ); /* NOTE simon (27/06/24 14:27:42): 4 bytes per sample.*/
        buffer_format.Format.nAvgBytesPerSec = buffer_format.Format.nSamplesPerSec * buffer_format.Format.nBlockAlign;
        buffer_format.Format.cbSize = sizeof( buffer_format ) - sizeof( buffer_format.Format );
        
        buffer_format.Samples.wValidBitsPerSample = 32;
        buffer_format.dwChannelMask = mixer->format.channel_mask;
        buffer_format.SubFormat = csh_guid_ieee_float;
#endif
        
        /* NOTE simon (11/10/23 16:27:02):
        1 reference_time = 100 nanoseconds = 1 sec / 10 000 000
10000000 is 1 second.
        */
        u64 buffer_duration = 10000000;
        
        success = csh_IAudioClient_Initialize( platform->audio_client,
                                              csh_AUDCLNT_SHAREMODE_SHARED,
                                              csh_AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | csh_AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY | csh_AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                                              buffer_duration, 0, &buffer_format.Format, 0 );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_initialize_audio_client );
            break;
        }
        
#if 0
        u32 buffer_size_in_samples = 0;
        success = csh_IAudioClient_GetBufferSize( platform->audio_client, &buffer_size_in_samples );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_get_the_buffer_size );
            break;
        }
#endif
        
        success = csh_IAudioClient_GetService( platform->audio_client, ( const IID* ) &csh_IID_IAudioRenderClient, ( void** ) &platform->audio_render_client );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_get_the_render_client_service );
            break;
        }
        
        platform->mixer = mixer;
        platform->audio_client_event = CreateEventW( 0, 0, 0, 0 );
        
        if ( !platform->audio_client_event ) {
            set_error( error, audio_error_failed_to_create_audio_client_event );
            break;
        }
        
        success = csh_IAudioClient_SetEventHandle( platform->audio_client, platform->audio_client_event );
        
        if ( success != S_OK ) {
            set_error( error, audio_error_failed_to_set_audio_client_event_handle );
            break;
        }
        
        platform->audio_thread_stopped = CreateEventW( 0, 0, 0, 0 );
        
        if ( !platform->audio_thread_stopped ) {
            set_error( error, audio_error_failed_to_create_thread_stopped_event );
            break;
        }
        
        platform->audio_thread = CreateThread( 0, 0, audio_thread, platform, 0, 0 );
        
        if ( !platform->audio_thread ) {
            set_error( error, audio_error_failed_to_create_audio_thread );
            break;
        }
        
        SetThreadDescription( platform->audio_thread, L"Audio thread" );
        
        success = csh_IMMDeviceEnumerator_RegisterEndpointNotificationCallback( enumerator, &g_audio_notifications );
        
        if ( success != S_OK ) {
            log_l( "[WARNING][Audio] Couldn't register audio notification callbacks.\n" );
        }
    }
    
    if ( is_error( error ) ) {
        audio_end( platform, error );
    }
    
    if ( device ) {
        csh_IMMDevice_Release( device );
    }
}

stu void audio_update( platform_t* platform, u32* error ) {
    
    if ( g_audio_restart ) {
        audio_end( platform, error );
        g_audio_restart = false;
        audio_start( platform, platform->mixer, error );
    }
}


This is a bit strange code - why is ole32 loaded dynamically? It's not like it can be missing from the system. It would be like loading user32 dynamically for the CreateWindowEx function. It's not a technically wrong thing to do, just pretty useless imho.

Note that you often want/need the audio processing/mixing to be in its own thread, because by default WASAPI will request audio samples every 10ms, which doesn't work well if your main loop runs every 16ms.

That's not really a problem. You just submit audio a frame or two ahead, just like with DirectSound. The real reason to put WASAPI on a thread is to make the latency smaller, which can be done much better than with DS. But that obviously complicates the code and makes you write the WASAPI usage very differently - that's why I deleted that gist. It was just not a good way to use WASAPI.
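A sketch of what "a frame or two ahead" means (assumed names, not from the deleted gist): keep roughly two frames of samples queued and top the buffer up once per frame.

UINT32 Padding;
if(SUCCEEDED(GlobalSoundClient->GetCurrentPadding(&Padding)))
{
    UINT32 TargetQueued = 2*SamplesPerFrame; // ~2 frames of safety margin
    UINT32 FramesToWrite = (Padding < TargetQueued) ? (TargetQueued - Padding) : 0;
    // GetBuffer/fill/ReleaseBuffer FramesToWrite frames as usual.
}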

I have a different gist though - https://gist.github.com/mmozeiko/5a5b168e61aff4c1eaec0381da62808f#file-win32_wasapi-h
It wraps WASAPI in a tiny single-header library that exposes a very similar API to DirectSound, so replacing DS with it would be very easy. It's also not the best way to use WASAPI, but it is decent enough. The example source at the top of the gist shows how to use it.

