On the small effects of not blending audio samples

Miguel Lechón

#4142

July 5, 2015

I've been toying around with the audio mixer to see if I can get some intuition on why the linear blending of samples doesn't make much difference. It turns out that it actually does make a small but perceivable difference in some cases. If you take day 146's source code, silence the piano music and play "bloop_01.wav" at dsample=0.8 with no sample blending, you will hear an extra low-volume, high-pitched electric ringing along with the original audio that is not present when interpolation is used.

To illustrate the origin of the high-pitched component I will be using a 440 Hz pure sine wave instead of the more complex bloop sound. If you play that wave through the HMH mixer at 80% of its original speed (dsample=0.80), with and without linear blending, grab the outputs and superimpose them, you get the top plot in the following image (x axis: sample number, y axis: volume(-32768, +32767), green: blended, blue: nearest sample). The bottom plot shows the difference between the two waves.

As we've already seen in the HMH stream, mixing two sounds can be achieved by adding them. By that same logic, the non-blended sound can be seen as the combination of the blended sound plus the sawtooth wave at the bottom (note the change of vertical scale, though), which is the high-pitched sound we were looking for.
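A minimal sketch of that decomposition, not the actual HMH mixer code. It assumes a full-scale 16-bit sine as the source and takes "no blending" to mean truncating the read position to the previous sample. The names (`source_sample`, `resample`) and the amplitude are made up for illustration. The step of 0.8 is stored as the exact ratio 4/5 so that floating-point drift cannot change which sample gets picked.

```python
import math

SAMPLE_RATE = 48000          # Hz, as in the post
TONE_HZ = 440.0              # pure sine used in the post
STEP_NUM, STEP_DEN = 4, 5    # dsample = 0.8, kept as an exact ratio
AMPLITUDE = 32767.0          # assumed: full-scale 16-bit tone


def source_sample(index):
    """Sample `index` of the source sine, rounded to a 16-bit integer value."""
    phase = 2.0 * math.pi * TONE_HZ * index / SAMPLE_RATE
    return round(AMPLITUDE * math.sin(phase))


def resample(num_output_samples, blend):
    """Read the source at 0.8x speed, with or without linear blending."""
    output = []
    for n in range(num_output_samples):
        # Integer part and fractional part of the read position n * 0.8.
        index, remainder = divmod(n * STEP_NUM, STEP_DEN)
        frac = remainder / STEP_DEN
        s0 = source_sample(index)
        if blend:
            s1 = source_sample(index + 1)
            output.append((1.0 - frac) * s0 + frac * s1)
        else:
            output.append(float(s0))   # truncation: reuse the previous sample
    return output


blended = resample(200, blend=True)
unblended = resample(200, blend=False)

# The non-blended output is the blended output plus this residual.
residual = [u - b for u, b in zip(unblended, blended)]
peak_residual = max(abs(r) for r in residual)
print("peak residual:", peak_residual)
```

The residual is zero on every fifth output sample, where the read position lands exactly on a source sample, and it is never larger than the difference between two neighbouring source samples.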

The frequency of the sawtooth wave comes from the aliasing generated by sampling every 0.8 samples. We're taking samples at positions 0, 0.8, 1.6, 2.4, 3.2, 4.0 ..., which means the read position lands exactly on an original sample once every 5 output samples. With a sampling frequency of 48000 Hz, that pattern repeats at 48000/5 = 9600 Hz.
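The arithmetic above can be checked with a few lines. This is only a sketch: the step is written as the ratio 4/5 (an assumption made to keep the fractions exact) and `fractional_offsets` is a hypothetical helper name.

```python
from math import gcd

SAMPLE_RATE = 48000          # Hz
STEP_NUM, STEP_DEN = 4, 5    # dsample = 0.8 as an exact ratio


def fractional_offsets(count):
    """Fractional part of the read position for the first `count` output samples."""
    return [((n * STEP_NUM) % STEP_DEN) / STEP_DEN for n in range(count)]


offsets = fractional_offsets(10)

# Number of output samples after which the pattern of offsets repeats.
period = STEP_DEN // gcd(STEP_NUM, STEP_DEN)
sawtooth_hz = SAMPLE_RATE / period

print(offsets)       # [0.0, 0.8, 0.6, 0.4, 0.2, 0.0, 0.8, 0.6, 0.4, 0.2]
print(sawtooth_hz)   # 9600.0
```

The offsets fall from 0.8 back to 0 over each group of five samples, which is where the sawtooth shape comes from.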

Since the frequency of the input sound is much lower than the 48kHz sampling frequency, the volume of the sawtooth wave is mainly determined by the 16-bit quantization of the source file, so it's bound to be low.

The original 440Hz wave lacks a 9600 Hz component, which makes the sawtooth wave easy to spot. In a less structured, more frequency-rich sound environment, this extra component is likely to end up masked beyond perceptibility. That is not the case, however, when we only play the bloop sound, whose frequency spectrum is silent in the 5kHz-11kHz range.

Flyingsand

#4143

July 5, 2015

I also examined this recently, partly because I was a bit puzzled as to why Casey said he couldn't hear the difference between the linear interpolation and the truncation. In my experience with audio programming, I've never found truncation to be acceptable at lower sampling rates (the sampling rate needs to be at least 96 kHz). In fact, I hear a huge difference -- there is very noticeable aliasing noise when pitch shifting up or down using truncation.

Expanding a little on your sine wave illustration, here is the frequency spectrum of an 880 Hz sine wave at 48 kHz pitch shifted by 0.8 using linear interpolation:

And here is the same sine wave pitch shifted using truncation:

As you can see with lerp, the aliasing is very minimal and <= -72 dB. With truncation, however, the aliasing noise is 6 (!) times as loud (for reference, -6 dB == 0.5 in linear space, so every +6 dB roughly doubles the amplitude), with a lot of extra noise. It sounds like buzzing and has an inharmonic ring to it.
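For anyone who wants to reproduce the comparison without a spectrum analyzer, here is a rough sketch. It is not the code used for the plots above. It assumes a floating-point 880 Hz source with no 16-bit quantization, a step of 0.8 stored as the ratio 4/5, and one second of output so that every frequency of interest falls on a whole 1 Hz bin. The probe frequency is also an assumption: with this step the first aliasing images should sit at 9600 Hz plus or minus the shifted tone (704 Hz), so the sketch looks at 8896 Hz. All function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 48000
TONE_HZ = 880
STEP_NUM, STEP_DEN = 4, 5        # pitch shift of 0.8 as an exact ratio
NUM_SAMPLES = SAMPLE_RATE        # one second of output, so bins are 1 Hz wide


def source_sample(index):
    """Floating-point source sine (assumed: no 16-bit quantization)."""
    return math.sin(2.0 * math.pi * TONE_HZ * index / SAMPLE_RATE)


def pitch_shift(blend):
    """Resample the source at 0.8x speed with lerp (blend) or truncation."""
    out = []
    for n in range(NUM_SAMPLES):
        index, remainder = divmod(n * STEP_NUM, STEP_DEN)
        frac = remainder / STEP_DEN
        s0 = source_sample(index)
        if blend:
            out.append((1.0 - frac) * s0 + frac * source_sample(index + 1))
        else:
            out.append(s0)
    return out


def level_db(signal, freq_hz):
    """Level of one frequency component, in dB relative to a full-scale sine."""
    w = -2j * math.pi * freq_hz / SAMPLE_RATE
    acc = sum(s * cmath.exp(w * n) for n, s in enumerate(signal))
    amplitude = 2.0 * abs(acc) / len(signal)
    return 20.0 * math.log10(amplitude)


def db_to_linear(db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


IMAGE_HZ = 9600 - 704            # assumed location of one aliasing image
trunc_db = level_db(pitch_shift(blend=False), IMAGE_HZ)
lerp_db = level_db(pitch_shift(blend=True), IMAGE_HZ)

print("truncation image level (dB):", trunc_db)
print("lerp image level (dB):      ", lerp_db)
print("-6 dB as a linear ratio:    ", db_to_linear(-6.0))
```

Under these assumptions the image should come out tens of dB stronger with truncation than with lerp. The exact figures depend on the analysis settings, so they will not necessarily match the plots.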

Aliasing is a very common and crucial problem to deal with whenever non-linear operations are involved (anything that can add frequencies to a signal) because (1) it sounds awful, and (2) once it has been baked into a signal, it's practically impossible to get rid of.

It's closely related to the Sampling Theorem, which states that you can only represent frequencies up to half the sampling rate. Frequencies beyond this will be mirrored around half the sampling rate (i.e. fold over and become aliased). The best analogy I have come across to visualize this is the wagon wheel effect in film. If the wheel is spinning fast enough in relation to the frame rate, it looks like it's turning backwards.
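The fold-over can be written down in a few lines. This is a sketch for a single pure tone, and `aliased_frequency` is a made-up name. It reports where a given frequency ends up after being sampled at a given rate.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Frequency actually represented when `freq_hz` is sampled at `sample_rate_hz`."""
    nyquist = sample_rate_hz / 2.0
    # The sampled spectrum repeats every sample_rate_hz ...
    folded = freq_hz % sample_rate_hz
    # ... and anything above half the sampling rate mirrors back down.
    if folded > nyquist:
        folded = sample_rate_hz - folded
    return folded


print(aliased_frequency(10000, 48000))   # 10000 (below the limit, unchanged)
print(aliased_frequency(30000, 48000))   # 18000 (folded back)
print(aliased_frequency(47000, 48000))   # 1000  (the wheel turning backwards)
```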

Edited by Flyingsand on July 5, 2015, 12:57pm

Casey Muratori

#4145

July 5, 2015

Awesome! Very glad to see that lerp was not wasted effort :) It may just be that I am not using good enough headphones on the HH machine to hear some of the problems, or it may be that I'm too old and my hearing isn't good enough anymore :P Either way, I remember lerp being important in the past when I did audio programming, so that's why I put it in there, and so it's good to know that at least someone can hear the difference.

- Casey

Flyingsand

#4148

July 5, 2015

The quality of your headphones (or speakers in general) will definitely affect your ability to hear subtle audio problems. I would say this is much more likely the case than your hearing. :)

In fact, with really good speakers, you can even hear the extra noise from the lerp on the sine wave. Of course with music and more complex sounds, the signal-to-noise ratio is high enough for this not to matter on HmH.

Casey Muratori

#4150

July 5, 2015

Yes - on my dev machine at work I only use Etymotic ER4p's and you can hear every single bug on those. But on the HH machine I use Etymotic's much cheaper phone headset, since I need the microphone, and it may not have as good frequency response (in fact I'm sure it doesn't).

- Casey