I may be beating a dead horse here; I tried looking at older threads that talk about the audio stuff, but didn't find anything that matches 1:1, so here goes.
Casey's style of game development comes from the desire to squeeze all possible power out of the target device, which is a noble goal. Personally I like to keep things a bit more relaxed: I don't care about frame rate that much (unless it becomes a problem), or what the screen resolution happens to be (which leads to not caring about pixel alignment). I also keep game physics and graphics refresh separate, because otherwise things simply wouldn't work with random frame rates =)
That said, I understand the desire to be frame-rate locked; I was coding real-time stuff back in the DOS days, when we used palette changes to draw raster bars showing how much time certain functions took, and wanted to keep everything running at full frame rate. In many ways, the move to Windows was a step backwards.
So when doing frame locked stuff, being able to play audio on a frame-by-frame basis feels really desirable. However, it's not really feasible on current systems (especially if you want to go cross-platform).
60Hz means 16.7ms per frame (or 735 samples at 44.1kHz). I've seen articles claiming that current systems are capable of 1-5ms audio latency, but in practice I've never seen it. The best I can do with ASIO drivers seems to be around 6ms (256 samples), and that's with dedicated, non-shared audio hardware. With shared hardware the latency always seems to be 20+ms (1024 samples). Requiring dedicated audio hardware for games on Windows is a no-no.
DirectSound on Windows Vista and later is not direct - it's an emulation layer on top of WASAPI. Using WASAPI directly doesn't help much either, since you'll want to keep it in shared mode.
I've understood that the Linux situation may be a bit better, but mobile platforms have it worse. Streaming through OpenAL is pretty much not done (or well, you CAN do it, but expect HUGE latency).
I've been working on a little open source audio engine called SoLoud, which pretty much breaks all the performance rules at the moment (uses mutexes, may do disk I/O on the audio thread, etc). I typically keep the audio buffers pretty large (2048 samples, or 46+ms), so that the audio doesn't break up.
To solve the audio latency issue, I actually delay sounds even more.
Since sound moves slower than light, we can get away with some audio latency, but the coherence of visuals and audio is still important. My solution is to make sure that sounds triggered 5ms apart also start 5ms apart, by delaying the start of the second sound by 5ms (about 220 samples at 44.1kHz).
This is enough for most uses, but I wouldn't want to try to play drums with it.. =)