Friday, 17 August 2012

A Convoluted Approach

I've realised that this blog has moved further and further away from audio programming, and so hopefully I'll get back on track with this post.  I've decided to follow up on a little project I did ages ago, but it was probably the coolest audio thing I've done so far.

The convolution is an operation commonly found in signal processing, and there are several ways of trying to understand it.  The most intuitive to me is, in the words of Wolfram (Mathsworld), "A convolution is an integral that expresses the amount of overlap of one function 'g' as it is shifted over another function 'f'.   In my words the convolution is equivalent to reflecting and sliding one signal along the time scale, and the convolution function is the area of the overlap at every point.  This however is much better seen than described, head over to this website for an interactive demonstration, and wikipedia has an animation.  For those unaware of calculus, when you hear integral, this just means mathsy way of saying getting the area.

Getting this overlap between two signals is useful because it allows us to blend two signals beautifully together.  This makes the convolution useful for creating reverb and filters.  Another important property is the convolution in the time domain is equivalent to multiplication in the frequency domain (and vice versa), which kind of confirms the idea of blending the signals.  In simpler terms the convolution of two audio signals is equivalent to multiplying all the corresponding frequencies.

This means a convolution can be done by taking the frequency spectra of the two signals we're convolving.  Remember that the spectra simply say how much of each frequency there is in the signal - if this is still new to you, go here.  This bit is done with a Fourier transform.  Every corresponding component of the two spectra is multiplied together to produce a new spectrum.  This spectrum is then converted back to a signal with the creatively named inverse Fourier transform.

The project itself was done in C (I'm trying to do about half my audio stuff in C, the other half in C++ as part of a balanced diet) and basically a showcase for all the different methods of convolution.  The first method was using my own DFT algorithm, the FFTW FFT algorithm, and a tapped delay line method (based off another intersting property of the convolution).  Getting the DFT up and running wasn't too painful, but there were plenty of small issues along the way.  The FFT method was harder in that I was now using an external library.  Fortunately the tapped delay line was really easy and I already had two working methods to debug it against.

The main issue I had was when I used large inputs, the convolution resulted in very quiet outputs.  So quiet, that all the samples had gone beyond floating point precision and came out 0.  However, if didn't normalise the samples would all be massive and so just clip to +-1.0.   The solution was to not normalise the transforms, and instead normalise the audio file at the very end.

Below is some output I got from a convolution reverb using my program, where you convolve an audio signal (a drum solo from "Blues For Tony" by Allan Holdsworth) with an impulse response.  The impulse response is basically a really short loud noise played in a big hall or something.  The "response" (the echoes and results of this noise) are then recorded, and this basically is a way of recording the acoustics of the room.  The impulse could be a balloon popping, or a starter gun.  The result of the reverb actually really took me by surprise; it convincingly sounded like it was being played in the big room the impulse response was apparently taken.  Up to this point in time, this is definitely the most exciting thing I've done with programming!

The process of implementing this was also an eye opener in terms of algorithms and how they do matter.  The crucial element of convolution is the Fourier transform, and while a slow DFT (i.e. mine) is O(n^2), the FFT is O(nlg n).  For audio, in just 10 seconds of audio you could have 441,000 samples, at which point the FFT is already hugely more efficient.  Already it could mean the difference between hours of waiting or a few seconds.

Finally, there are many more resources if you're knoweldge hungry.  This website also had good explanations.  Although I can't post the contents, the "Audio Programming Book" has been a thorough and guiding resource for this as well.

No comments:

Post a Comment