Audio along with frame

I may be beating a dead horse here; I tried looking at older threads that talk about the audio stuff, but didn't find anything that matches 1:1, so here goes.

Casey's style of game development comes from the desire to squeeze all the possible power out of the target device, which is a noble goal. Personally I like to keep things a bit more relaxed, and don't care about frame rate that much (unless it becomes a problem), or what the screen resolution happens to be (which leads to not caring about pixel alignment). I also keep game physics and graphics refresh separate, because otherwise things simply wouldn't work with arbitrary frame rates =)

That said, I understand the desire to be frame-rate locked; I was coding real-time stuff back in the DOS days, when we used palette changes to draw raster bars to show how much time certain functions took, and wanted to keep everything running at full frame rate. In many ways, the move to Windows was a step backwards.

So when doing frame locked stuff, being able to play audio on a frame-by-frame basis feels really desirable. However, it's not really feasible on current systems (especially if you want to go cross-platform).

60Hz means 16.6ms a frame (or 735 samples at 44.1kHz). I've seen some articles claiming that current systems are capable of 1-5ms audio latency, but in practice I've never seen this. The best I can do with ASIO drivers seems to be around 6ms (256 samples), and that's with dedicated, non-shared audio hardware. With shared hardware the latency always seems to be 20+ms (1024 samples). Requiring dedicated audio hardware for games on Windows is a no-no.
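For anyone who wants to check the numbers, here is a small sketch of the arithmetic. The helper names are made up for illustration, and 44.1kHz is simply the rate used above.

```python
SAMPLE_RATE = 44100  # Hz, the rate assumed in the post


def samples_to_ms(samples, rate=SAMPLE_RATE):
    """Duration of `samples` audio samples in milliseconds."""
    return samples * 1000.0 / rate


def samples_per_video_frame(fps, rate=SAMPLE_RATE):
    """How many audio samples are played during one video frame."""
    return rate / fps


if __name__ == "__main__":
    print(samples_per_video_frame(60))  # samples per frame at 60Hz
    for n in (256, 1024):
        # buffer size in samples and the latency it represents in ms
        print(n, round(samples_to_ms(n), 1))
```

The 256 and 1024 sample buffers come out at roughly 5.8ms and 23.2ms, which is where the "around 6ms" and "20+ms" figures come from.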

DirectSound on Windows Vista and later is not direct - it's an emulated layer over WASAPI. Using WASAPI directly doesn't help too much though, since you will want to keep it in shared mode.

I've understood that the Linux situation may be a bit better, but mobile platforms have it worse. Streaming through OpenAL is pretty much not done (or well, you CAN do it, but expect HUGE latency).

I've been working on a little open source audio engine called SoLoud, which pretty much breaks all the performance rules at the moment (uses mutexes, may do disk I/O on the audio thread, etc). I typically keep the audio buffers pretty large (at 2048 samples, or 46+ms), as the audio must not break up.

To solve the audio latency issue I'm actually delaying the sound more.

Since sound moves slower than light, we can get away with some audio latency, but the coherence of visuals and audio is still important. My solution to this is to make sure that sounds that are triggered 5ms apart also start 5ms apart, by delaying the start of the second sound by 5ms (or about 220 samples at 44.1kHz).
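A minimal sketch of that idea, not SoLoud's actual code: the function name is hypothetical, and it assumes the first sound in a batch is the reference point and that trigger times are given in seconds.

```python
import math

SAMPLE_RATE = 44100  # Hz, assumed


def clocked_delays(trigger_times, rate=SAMPLE_RATE):
    """Per-sound start delay in samples, relative to the first trigger.

    `trigger_times` are game-clock times in seconds, earliest first.
    """
    first = trigger_times[0]
    return [math.floor((t - first) * rate) for t in trigger_times]


if __name__ == "__main__":
    # Two sounds triggered 5ms apart keep their spacing as a sample offset.
    print(clocked_delays([0.000, 0.005]))
```

The second sound gets a delay of 220 samples, so the 5ms gap survives even though both sounds are queued before the same buffer is mixed.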

This is enough for most uses, but I wouldn't want to try to play drums with that.. =)
6ms, or even 33.33333 ms, is WAY faster than the response time of the human brain. For playing back recorded drums, no problem. For playing drums while you play, also no problem.
Kladdehelvete
6ms, or even 33.33333 ms, is WAY faster than the response time of the human brain. For playing back recorded drums, no problem. For playing drums while you play, also no problem.

That's funny, because playing synths with 5ms latency "feels" much better than playing them at 30ms.
Yeah but in that case is it not a computer producing the sound? And then you can queue it, so this solves it?

I don't know much about it. I find the topic very interesting though.

But the fact is that DirectShow plays back sound more than fast enough for me, as the user of my own game, to not perceive any delay. I have created this little TILEMAP game with a helicopter sprite flying around and shooting at tiles (to test and play with TILEMAPS).

And I find that I have to slow down the playback of sounds, or they will overlap so quickly that it will not sound good. And this is under GDI, where each frame the entire HD screen is cleared for no reason, then the TILEMAP is drawn, also redundantly, as 80x40*2 calls to the custom FillRect code, just to get a little border around tiles. And then the loop sleeps for the rest of the frame time. In other words, I could easily nullify most of the drawing code.

So even when you use the high-level functions of DirectShow to play many simultaneous sounds, the delay is not even perceivable. The human brain cannot accurately perceive sound at a frame boundary. It needs something like 3-6 frames (~100-190ms) before it detects that the sound has changed. So I don't feel we have any need to worry.
And if I am wrong in thinking this, I would be happy to be corrected. This is just my 2 cents.
Kladdehelvete
Yeah but in that case is it not a computer producing the sound? And then you can queue it, so this solves it?

The synth is running on the PC. I have a USB MIDI keyboard hooked to the PC.


Kladdehelvete
I don't know much about it. I find the topic very interesting though.

When playing slow instruments (like evolving pads), the latency can be hundreds of ms without an issue. With piano, I have no problems with 50+ms, but I can imagine someone who has played a real piano a lot feeling the latency already. Drums are impossible with high latency even for me.


Kladdehelvete
So even when you use the high-level functions of DirectShow to play many simultaneous sounds, the delay is not even perceivable. The human brain cannot accurately perceive sound at a frame boundary. It needs something like 3-6 frames (~100-190ms) before it detects that the sound has changed. So I don't feel we have any need to worry.

Curious. I hadn't even considered using DirectShow for game audio =)

Anyway, the need for delaying the sounds comes from looking at the alternative: only triggering sounds at the start of a new buffer, in which case sounds get "clumped" together. Here's a short vid showing the difference of these methods:

[video]https://www.youtube.com/watch?v=Qt79F8NRLcE[/video]
(youtube link)
sol_hsa

Curious. I hadn't even considered using DirectShow for game audio =)


I never made games before. I know nothing of them. But now it's been like 3-4-5 days or so with this "tileshooter", and it's been extremely fun. I am like: "why didn't I do this before"?

sol_hsa

Anyway, the need for delaying the sounds comes from looking at the alternative: only triggering sounds at the start of a new buffer, in which case sounds get "clumped" together. Here's a short vid showing the difference of these methods:

[video]https://www.youtube.com/watch?v=Qt79F8NRLcE[/video]
(youtube link)


Can you explain the video in a little more detail? When you say "buffer", do you mean the DSSoundbuffer?

Is the only difference between (3) and (4) the offset? That (3) is mixed into the same buffer, at different sample offsets? But what happens in (4)? Overwriting the buffer? I am not sure I understand. If I mixed 2 identical sounds into one DSbuffer, at the same sample offset, I would expect the volume to go up, or to just hear one sound. Not noise, unless I missed the play cursor.

It sounds like (4) has more sounds. If the difference is just the offset, then I understand why you would not consider DirectSound :) As it has, I guess, this overlapping noise, because they fire so often?

But isn't DirectShow supposed to take care of the mixing, and isn't the noise I hear in my tests just amplitude noise, coming from the fact that there are just too many (can be over 100, at least 50) of the same sound?
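A rough sketch of the arithmetic behind that guess. It assumes a mixer that simply sums 16-bit samples and hard-clips the result; whether DirectShow or DirectSound actually behaves this way is not established here, and the function name is made up.

```python
def mix_and_clip(sample_value, copies, limit=32767):
    """Sum `copies` identical 16-bit samples and hard-clip the total."""
    total = sample_value * copies
    return max(-limit - 1, min(limit, total))


if __name__ == "__main__":
    print(mix_and_clip(1000, 2))   # two copies: just louder
    print(mix_and_clip(1000, 50))  # fifty copies: pinned at the 16-bit limit
```

With two copies the sample is merely twice as loud, but with fifty the sum runs far past the 16-bit range and gets flattened, which is heard as distortion.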
Kladdehelvete
I never made games before. I know nothing of them. But now it's been like 3-4-5 days or so with this "tileshooter", and it's been extremely fun. I am like: "why didn't I do this before"?

=)

It's too bad games are sufficiently complex that you can't really start programming from a game project. Not saying you're a beginner. Just an explanation why we don't teach programming through game development.



Kladdehelvete
Can you explain the video in a little more detail? When you say "buffer", do you mean the DSSoundbuffer?

No. This has nothing to do with any single specific audio interface.

The way audio generally works is that you have two (or more) buffers of audio, one of which is being played while the other(s) can be written to. That means that from the application's point of view a new sound can only start from the beginning of the next buffer, at the earliest.
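To make the consequence concrete, here is a small sketch of what happens to trigger times when sounds can only start at a buffer boundary. The function name is hypothetical and the 2048 sample buffer is just the size mentioned earlier in the thread; positions are counted in samples since playback began.

```python
BUFFER_SAMPLES = 2048  # assumed buffer size


def next_buffer_start(trigger_sample, buffer_samples=BUFFER_SAMPLES):
    """Earliest sample position at which a sound triggered at
    `trigger_sample` can be heard if it must start on a buffer boundary."""
    return (trigger_sample // buffer_samples + 1) * buffer_samples


if __name__ == "__main__":
    triggers = [100, 600, 1100, 1600, 2148, 2648]
    starts = [next_buffer_start(t) for t in triggers]
    print(starts)  # [2048, 2048, 2048, 2048, 4096, 4096]
```

Six evenly spaced triggers end up as two clumps: the first four all start at sample 2048 and the last two at 4096.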

This is true, as far as I know, for DirectSound as well (which, as I've mentioned, isn't "direct" in the post-Vista world) - the hardware will get a buffer of N samples at a time and won't be smoothly reading a circular buffer, as neat as that would be.

The difference (in SoLoud) between play and playClocked is that "play" always starts the sounds from the beginning of the next buffer, while playClocked may delay the start by N samples based on the time provided by the game engine.

So to illustrate:

play:
  [buffer][buffer][buffer][buffer]
  Xxxxx
  Xxxxx
  Xxxxx
  Xxxxx
          Xxxxx
          Xxxxx
          Xxxxx
          Xxxxx
playClocked:
  [buffer][buffer][buffer][buffer]
  Xxxxx
  ..Xxxxx
  ....Xxxxx
  ......Xxxxx
          Xxxxx
          ..Xxxxx
          ....Xxxxx
          ......Xxxxx
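The same picture as a toy mixer, to show the difference in the output samples. This is only an illustration of the diagram, not SoLoud's implementation; `mix_buffer`, the tiny 8 sample buffer and the two sample "click" are all made up.

```python
def mix_buffer(buffer_len, voices):
    """Mix one output buffer.

    Each voice is (samples, delay): `samples` is a list of floats and
    `delay` is how many output samples to skip before the voice starts.
    """
    out = [0.0] * buffer_len
    for samples, delay in voices:
        for i, s in enumerate(samples):
            pos = delay + i
            if pos >= buffer_len:
                break  # the remainder would go into the next buffer (not modelled)
            out[pos] += s
    return out


click = [1.0, 0.5]

# "play": all four sounds start at the buffer boundary and pile up.
plain = mix_buffer(8, [(click, 0)] * 4)

# "playClocked" style: each sound keeps its own offset inside the buffer.
clocked = mix_buffer(8, [(click, d) for d in (0, 2, 4, 6)])

if __name__ == "__main__":
    print(plain)
    print(clocked)
```

In the first case the four clicks collapse into one loud click at the start of the buffer; in the second they stay spread out across it.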

Edited by Jari Komppa on
sol_hsa
Kladdehelvete
6ms, or even 33.33333 ms, is WAY faster than the response time of the human brain. For playing back recorded drums, no problem. For playing drums while you play, also no problem.

That's funny, because playing synths with 5ms latency "feels" much better than playing them at 30ms.


Actually, latency above 20-25 ms for instruments with a fast attack would be considered unplayable by musicians. This comes from having worked on and programmed audio plug-in effects and synths.

Even looking at the YouTube video that was linked, the latency of the play() examples was very noticeable -- and that was at 46ms latency unless I'm mistaken. That was just viewing it too, not actually playing/interacting with it. So I would argue that that amount of latency is not acceptable in games.
Flyingsand
Even looking at the YouTube video that was linked, the latency of the play() examples was very noticeable -- and that was at 46ms latency unless I'm mistaken. That was just viewing it too, not actually playing/interacting with it. So I would argue that that amount of latency is not acceptable in games.

Noted.