Let's also keep in mind that audio and visuals are not equal: a frame or two of audio lag is very common in everyday life, since sound travels only about 34 cm per millisecond. So in reality audio always lags visuals, and perception has evolved to cope with this. One can experiment with this by playing some talking-head video in VLC, for example, and fiddling with the synchronization settings (Tools > Synchronization): even a slight audio lead feels very awkward, while a considerable lag (a few tens of ms) can be tolerated; you get used to it and stop noticing after a while. Not that lag is preferable either, but it's not nearly as bad.
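Just to make the arithmetic behind that 34 cm/ms figure concrete (this is my own back-of-the-envelope check, not from any reference), here's how many meters of natural listening distance each frame of audio lag corresponds to:

```c
/* Back-of-the-envelope: sound covers ~0.343 m per ms, so an audio lag
   of N ms is what you'd hear naturally standing ~0.343*N meters from
   the source. Frame rate of 30 fps assumed here for illustration. */
#include <stdio.h>

int main(void)
{
    double speed_of_sound_m_per_ms = 0.343;  /* ~343 m/s at room temperature */
    double frame_time_ms = 1000.0 / 30.0;    /* one frame at 30 fps */

    for (int frames = 1; frames <= 3; ++frames)
    {
        double lag_ms = frames * frame_time_ms;
        printf("%d frame(s) of lag = %5.1f ms, like hearing the source from %4.1f m away\n",
               frames, lag_ms, lag_ms * speed_of_sound_m_per_ms);
    }
    return 0;
}
```

So even a full frame of lag at 30 fps is roughly what you'd experience listening to someone across a large room; nothing perception hasn't dealt with before.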
Casey's current scheme, avoiding any audio lead before the frame is shown (on as-yet hypothetical minimal-lag systems/libraries), and otherwise adding minimal extra lag, seems nice from this perspective. Especially if this kind of synchronization helps keep the game code simple; but we haven't gotten to the game code yet, so no idea about its implications so far. I'm afraid there will be glitches once one starts testing this, though: try simulating a minimal-lag system by lowering the target frame rate (so that the frame time grows larger than the current lag), also test with variable simulated loads, and see whether the game code really doesn't need to care... (a sketch of what I mean below).
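A rough sketch of the kind of experiment I have in mind, with entirely hypothetical names and numbers (this is not Casey's actual code, just the shape of the decision his scheme has to make): turn the audio latency into a tunable variable, crank the target frame time above and below it, and watch which path the write-ahead logic takes.

```c
/* Hypothetical test harness: compare a simulated audio latency against
   the target frame time and pick the write-ahead strategy accordingly.
   Names and constants are made up for illustration. */
#include <stdio.h>
#include <stdbool.h>

#define SAMPLES_PER_SECOND 48000
#define BYTES_PER_SAMPLE   4            /* 16-bit stereo */

int main(void)
{
    /* Tunables for the experiment */
    double simulated_latency_ms = 10.0;  /* pretend the card needs this long */
    double target_frame_ms      = 33.3;  /* lowered frame rate: 30 fps */

    int latency_bytes = (int)(SAMPLES_PER_SECOND * BYTES_PER_SAMPLE
                              * simulated_latency_ms / 1000.0);
    int frame_bytes   = (int)(SAMPLES_PER_SECOND * BYTES_PER_SAMPLE
                              * target_frame_ms / 1000.0);

    bool low_latency = (latency_bytes < frame_bytes);
    if (low_latency)
    {
        /* "Minimal lag" path: audio for the next frame can start right
           at the flip, so write exactly one frame's worth of samples
           from the frame boundary -- no audio lead before it's shown. */
        printf("low-latency path: write %d bytes from the frame boundary\n",
               frame_bytes);
    }
    else
    {
        /* Fallback path: accept extra lag and pad with the latency as a
           safety margin so the play cursor never overruns our writes. */
        printf("fallback path: write %d bytes (frame + latency margin)\n",
               frame_bytes + latency_bytes);
    }
    return 0;
}
```

Flipping simulated_latency_ms above and below target_frame_ms (and jittering it per frame, to stand in for variable load) is exactly where I'd expect the seams to show, since the scheme switches strategies at that boundary.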