Wednesday, 11 July 2012

Fourier: A Transformation

I was about to return to something vaguely audio programming related, but soon realised that there's a lot of explaining of theory to do.  My current projects on the go include a matrix and vector class (finished mostly), a linear algebra project (almost done), and a neural network project (have only done the research for).  The idea behind this huge spread is to pick up lots of different skills.  I've so far got a basis in data structures and algorithms, and Windows and GUI programming,  and I want to be confident in numerical methods, and maybe some areas of games programming (AI especially).  I also have a plan to get to grips with another programming language, maybe Python.

The Fourier transform is a great of example of where more advanced maths has real application.  The Fourier transform is used in all kinds of things, especially when it comes to audio.  A more efficient algorithm to do a Fourier transform has had (and will have if even more efficient algorithms are discovered) a widespread effect.  For example, one impact that was brought up in a NewScientist article is it would benefit mobile phones, because in image compression Fourier transforms are used, and in mobiles less processing equals faster image uploads and longer battery life.

The Fourier transform - taking you from the time domain to frequency domain since 1807
So far I haven't explained what a Fourier transform is.  To give an overview, the Fourier transform will take an time based signal (for example, audio) and tell you what frequencies are in the signal.  In posh words, it transforms from the time domain to the frequency domain.  Starting from the top, a transform is like a function of functions, which maps many to many.  For the non-mathsy people the transform is a process that deals with a whole set of data, and acts on this entire set at once to produce a whole new set of data.

Every signal, from the humble square wave to "Stairway to Heaven" is made up of very simple building blocks - sine waves, which are of different frequencies - think pitch - and amplitude - think loudness.  If we had a pure sine tone at 440Hz, then if you ask what the frequency is of this tone, it's quite clearly 440Hz, no questions.  But if you play a concert 'A' on a piano, then the answer isn't nearly as clear.  As human listeners, the dominant frequency is 440Hz, but what makes the piano sound so nice and piano-ey is all the "harmonics" or "overtones" - sine waves at multiples of the fundamental frequency, which are quieter tones mainly of higher frequencies, produced at the same time.  So in fact, it's not frequency but frequencies that we can look at, and we want to know how much of each frequency there is.  Just remember that whenever "frequency" is mentioned, we're talking about a pure sine wave of that frequency, and how much that sine wave contributes to the "final" signal.

When mathematicians look at Fourier transforms the transforms are continuous - they transform a continuous (smooth) function of time to a continuous function of frequency and it uses calculus to do the dirty work. This works in theory but is not generally done algorithmically.  When engineers do an Fourier transform it will be discrete - there will be a finite number of values (for example a set of audio samples)  and will produce a finite amount of frequency data.  The discrete Fourier transform can be abbreviated DFT.

Perhaps one of the best ways to look at how the Fourier transform works, is to drop all the fancy maths and complicated language.  For now, think about us and our ability to tell different pitches, so frequencies apart.  We're also good at telling how loud these different pitches are.  To do this, we use our ears.  Our ears are biologically doing a Fourier transform and sending the results to our brain.  In our ears are loads of tiny "cili" like little hairs.  The hairs are of different lengths.  When a frequency that matches the length of a certain hair enters your ear, this cili resonates and so bounces around relative to how loud this component is.  Then your brain gets the message, this hair is bouncing around this much i.e. this frequency is present and is "this" loud.  So in the simplest terms, when we're doing a discrete Fourier transform, I see it as picking frequencies and see if they "resonate" with the signal, much like the ear does when we listen out for sounds.

I'm going to avoid the maths definition for a while, and move to pseudo-code.  I feel like a lot of the time, an equation on its own is completely unintuitive; I need to explore it more to get what's going on.  With (pseudo)code, I can see what's going on much better.

dft(in[], out[], transform size) :
for every frequency "band" from 0 to the transform size :
    initialise current band to 0 + 0i;
    for every sample in the transform (from 0 to transform size) :
        current sample = in[sample number]
        // real part
        add current sample * cos(2pi * band number * sample number / transform size)
        to the current band
        // complex part
        subtract current sample * i*sin(2pi * band number * sample number / transform size)
        from the current band
    divide current band by transform size // optional
    out[sample number] = current band // complex output

Hopefully the code isn't too cryptic.  I go for clarity whenever I can, but it's hard to know how other will intepret it.  Maybe the most confusing part is the cos and i*sin part.  Or the fact complex numbers have crept in.  Basically, the Fourier transform "by default" works with complex signals (signals with a real and imaginary part, or in layman's terms, a signal that has two seperate parts, like a left and right channel in stereo).   However, audio is a "real" signal (one part to the signal) and so a half of the transform will simply be redundant.  You'll see when I get some actual output.

The mysterious cos and i*sin are going back to this idea of resonance.  If a component in the audio signal matches up closely to the sine wave, as we iterate through that inner loop the two components, when multiplied together add up to a large value.  If no component matches that sine wave then the result will be 0, or very small.  A broad overview is we split the possible frequencies into bands.  In  the outer loop we go through each band in turn, and in the inner loop we accumulate the amount that component appears in the signal.  Below is some actual output based on a square wave at 4.4kHz.

Frequencies found in a 440Hz square wave, I converted the spectral bands to frequencies (Hz)
In the output above, I used a transform of size 1000, so the line graph contains 1000 individual points.  Although the square wave only contains exact odd harmonics (multiples of the fundamental frequency) because of the way the transform works, we can only test for bands of frequencies.  With a larger transform size then the graph will look slowly more and more exact as we narrow down the frequency bands we search in.

One thing that confused me is the relationship between the number of samples we look at and the number of spectral bands.  It turns out, this is the same number.  If you do a Fourier transform on 50 samples, you'll produce the results for 50 frequency "bands" (although remember in audio half of these frequencies are redundant).  So if you take a larger transform, you'll get a much larger number of frequencies to test against, or finer bands of frequencies to look at.

Another confusing part is how the different bands correlate to actual frequencies.  First remember, the maximum possible frequency in a discrete audio signal is the Nyquist limit, or half the sampling rate.  Each frequency band is basically a multiple of the sample rate divided by the transform size.  In a transform of 10 samples, the 0 band is 0Hz (called the DC band, it's like an overall amplitude), then band 1 is 44.1kHz / 10 = 4410Hz, then band 2 is 8820Hz, then band 3 is 13.23kHz, band 4 is 17.64kHz and band 5 is 22.05kHz (the Nyquist frequency).  This is an important one because we established the Nyquist frequency is the upper limit.  However, what happens to the next 4 bands?  This is where there is redundancy.  Band 6 is identical to 4, band 7 is identical to 3, band 8 is identical to 2, and 9 identical to band 1.  No band corresponds to band 0.  So in effect we've got a mirror image after the Nyquist limit.

Below is the same code, this time C++.  Hopefully this has allowed you to get some intuition as to what's going on.

// the dft takes a real input signal *in of size in_size and outputs
// spectrum *out with n pairs of complex values [real, imaginary]
void dft(float *in, float *out, int transform_size)
    int curr_band, curr_sample, i;
    float tmp;
    for(i = curr_band = 0; curr_band < transform_size; 
        curr_band++, i += 2)
        out[i] = out[i+1] = 0.f;
        for(curr_sample = 0; curr_sample < transform_size; 
            tmp = (float) (curr_band * curr_sample * twopi / transform_size);
            // real part
            out[i] += in[curr_sample] * (float) cos(tmp);
            // imaginary part
            out[i+1] -= in[curr_sample] * (float) sin(tmp);
        out[i] /= transform_size;
        out[i+1] /= transform_size;

However, this isn't quite the end.  In it's given form, the DFT, is not entirely convenient.  It works well enough for small inputs but is too inefficient for bigger transforms.  In big-O notation, it is a O(n^2) algorithm (looking at the code should make it clear with the two "for" loops).  When I tried it on audio input around 20 seconds long, it understandably took several hours to finish.  Fortunately the Fast Fourier transform (FFT) comes to the rescue.  This is a clever algorithm, which splits the task up into lots of smaller ones, although I currently don't understand the details, for a good implementation look to libraries, such as FFTW.  At some point in the future I'll look at understanding it, but for now, I kind of use it as a black box.  However, as long as I know the basis of the black box and what it does, I should be okay!

Finally, there is an inverse operation, called the inverse Fourier transform, which is a very similar process.  I'll cover it in the next post though.

No comments:

Post a Comment