While you are at frame A (frame A is on the screen and the sound for frame A is playing), you are computing frame B and the sound for frame B.
In option 1: You output exactly 1 frame (16.6 ms) worth of audio each frame. If frame A has 16.6 ms worth of audio, the computation for frame B needs to take 16.6 ms or less, otherwise there will be a gap in the sound buffer.
gap = length_of_computations_for_frame_B - 16.6
The problem here is that the frame time always varies slightly, even with vsync on. So you generally need to write more samples than exactly one frame's worth.
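As a rough sketch of the gap formula above, here it is in integer microseconds, plus a conversion to samples. The function names and the 48000 Hz sample rate in the example are assumptions for illustration, not something from a particular audio API.

```c
#include <assert.h>

/* Hypothetical helper: length of the silent gap (in microseconds) when
   exactly one frame of audio was queued and computing the next frame
   took compute_us. Returns 0 when the frame finished in time. */
static long long audio_gap_us(long long compute_us, long long frame_us)
{
    assert(frame_us > 0 && compute_us >= 0);
    long long gap = compute_us - frame_us;
    return gap > 0 ? gap : 0;
}

/* Hypothetical helper: the same gap expressed in samples, rounded up,
   i.e. how many extra samples would have been needed to cover it. */
static long long gap_in_samples(long long gap_us, long long sample_rate)
{
    assert(gap_us >= 0 && sample_rate > 0);
    return (gap_us * sample_rate + 999999) / 1000000;
}
```

For example, with a 16.6 ms frame (16600 us) and a computation that took 20 ms, audio_gap_us(20000, 16600) gives 3400 us, which is about 164 samples at an assumed 48000 Hz.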
In option 2: You write more than a frame (let's say 2 frames) worth of audio so that if you miss your target frame rate there will be no gap. If your computation takes less than a frame, the next frame will also write 2 frames but will overwrite the last frame that was written. The overwrite reduces the audio latency.
During frame A we output audio for B and C.
B hits the frame rate, so it outputs audio for C and D. C is overwritten.
C misses the frame rate, so it can't overwrite D because D has already started playing. It will only output E.
A writes: B C
B writes:   C D
C writes:       E
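The A/B/C example above can be sketched at frame granularity like this. It is only a model: frames are numbered A = 0, B = 1, and so on, and frames_to_write, FrameRange and FRAMES_AHEAD are made-up names, not part of DirectSound or WASAPI.

```c
#include <assert.h>

enum { FRAMES_AHEAD = 2 };  /* assumed: each update targets two frames ahead */

typedef struct {
    int first;  /* index of the first frame whose audio we write */
    int count;  /* number of frames to write; 0 means we are too late */
} FrameRange;

/* writer:  index of the frame whose update is doing the write
   playing: index of the frame whose audio is playing right now
            (equal to writer if we hit the frame rate, larger if we missed) */
static FrameRange frames_to_write(int writer, int playing)
{
    assert(playing >= writer);
    FrameRange r;
    int last = writer + FRAMES_AHEAD;  /* last frame this update aims to cover */
    r.first = playing + 1;             /* never touch audio that is already playing */
    r.count = last >= r.first ? last - r.first + 1 : 0;
    return r;
}
```

With these assumptions, frames_to_write(0, 0) covers B and C, frames_to_write(1, 1) covers C and D (overwriting C), and frames_to_write(2, 3), where D has already started playing, covers only E.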
This solution is not always possible. In DirectSound I believe you can overwrite the sound buffer, but in WASAPI
In option 3: You always output one frame ahead. While playing frame A you output frame C, during frame B you output D, and so on. The first frame needs to fill 2 frames' worth of audio. There is always one frame of latency.
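A small sketch of the bookkeeping for option 3, assuming 48000 Hz audio at 60 fps (800 samples per frame) and frames numbered from 0. The names are hypothetical, and the audio for frame 0 itself is assumed to be already playing.

```c
#include <assert.h>

enum { SAMPLES_PER_FRAME = 800 };  /* assumed: 48000 Hz / 60 fps */

/* How many samples to queue during a given frame: the first frame
   primes the buffer with two frames, every later frame adds one. */
static long samples_to_queue(long frame_index)
{
    assert(frame_index >= 0);
    return frame_index == 0 ? 2L * SAMPLES_PER_FRAME : (long)SAMPLES_PER_FRAME;
}

/* Total samples queued once the given frame has done its write. */
static long total_queued_through(long frame_index)
{
    long total = 0;
    for (long i = 0; i <= frame_index; ++i) {
        total += samples_to_queue(i);
    }
    return total;
}
```

After frame n has written, (n + 2) frames of audio have been queued in total, so the queue always ends one full frame past the frame that is about to play. That extra frame is the constant latency.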
In option 4: The audio processing is separated from the image processing. You fill the audio buffer with some amount of audio (it could be less than a frame). When that part of the buffer has been played, you fill it with the next values. This can be totally decoupled from the frame rate; you could for example update your audio 100 times a second. This solution has less latency. Remember that a sound takes more than a frame to play, so you already "know" what to put next in the buffer.
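One way to picture option 4 is a ring buffer that gets topped up whenever the play position has moved, independent of the frame rate. This is only a sketch: AudioRing, ring_writable and ring_fill are invented names, the play position would come from whatever audio API you use, and the running counter stands in for a real mixer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t *samples;     /* storage for `capacity` samples */
    size_t   capacity;    /* ring size in samples */
    size_t   write_pos;   /* next index we will write */
    int16_t  next_value;  /* placeholder for the mixer's output */
} AudioRing;

/* Samples we may write without touching data that has not been played yet.
   One slot is kept empty so write_pos == play_pos always means "empty". */
static size_t ring_writable(const AudioRing *ring, size_t play_pos)
{
    assert(ring->capacity > 0 && play_pos < ring->capacity);
    return (play_pos + ring->capacity - ring->write_pos - 1) % ring->capacity;
}

/* Fill everything that has been played since the last call.
   Returns the number of samples written. */
static size_t ring_fill(AudioRing *ring, size_t play_pos)
{
    size_t n = ring_writable(ring, play_pos);
    for (size_t i = 0; i < n; ++i) {
        ring->samples[ring->write_pos] = ring->next_value++;
        ring->write_pos = (ring->write_pos + 1) % ring->capacity;
    }
    return n;
}
```

You would call ring_fill from a timer or audio thread (say 100 times a second) with the current play position. A smaller ring means lower latency, but less room for a late update.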