Wednesday, 31 October 2012

Return to the Mandelbrot Win32 API Project

For a presentation that I've got to do for university, I've returned to the project I did writing a Windows application to show the Mandelbrot set.  Because I couldn't cram everything into a seven-minute PowerPoint presentation, I wrote up a little background.

The document can be found here.

The code can also be found here.  Enjoy!

Wednesday, 17 October 2012

The Magic Behind the Phase Vocoder Part A

Hopefully you've seen my post on my phase vocoder project.  Otherwise there won't be much motivation for this slog through the theory; when you have this much maths and theory there had better be a good reason for it.  Fortunately, the phase vocoder can do some extremely cool stuff, like shift the pitch of some audio while keeping it at the same speed.  If you haven't read about Fourier's theory, or the Fourier transform, look back to simpler times.  Without understanding this you really won't understand the next bit.

An FFT is pretty useful for breaking down the frequencies in a snapshot of audio.  However, a snapshot is just that - it captures an instant in time.  Generally we'd be talking a transform size of 1024, up to maybe 4096 samples.  You might wonder why you can't just take a transform of an entire 3-4 minutes of audio and then use that for all your spectral needs.  While that transform will have an astounding number of spectral points - i.e. really super "spectral resolution" - it will have awful "time resolution" - i.e. you have no way of seeing how the frequencies change over time.  You wouldn't be able to pick out the distinct thumps of a bass drum or the squeal of a 30-second guitar solo, because they're squished down into a generic spectral mush over the whole song.
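To put some rough numbers on that trade-off, here's a quick sketch of my own (it assumes a 44.1 kHz sample rate, which is just an illustrative choice):

    // Quick illustration of the spectral vs. time resolution trade-off.
    // Assumes a 44.1 kHz sample rate (an assumption for the example).
    #include <cstdio>

    int main()
    {
        const double sample_rate = 44100.0;
        const int sizes[] = { 1024, 4096, 44100 * 180 };  // last one ~ a 3-minute song

        for (int N : sizes)
        {
            double bin_spacing  = sample_rate / N;  // Hz between spectral points
            double frame_length = N / sample_rate;  // seconds covered by one transform
            std::printf("N = %8d: %9.4f Hz per bin, %8.2f s per frame\n",
                        N, bin_spacing, frame_length);
        }
        return 0;
    }

A 1024-point transform gives bins about 43 Hz apart but each frame only covers about 23 ms; transforming the whole song gives bins a tiny fraction of a Hz apart, but one "frame" that covers all three minutes.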

What would the intuitive next step be, if we want to see how the frequencies change in time?  What about taking two transforms, with the first transform covering the first half of the audio and the second covering the second half?  Now we have half the number of spectral points, but we do have some time resolution, i.e. we can see how the spectrum changes over the audio.  Moving on, the natural progression is to make the transforms smaller and have more of them.  You could make the transform size 1024 samples big and then just split the audio into tiny sections.  After taking loads of Fourier transforms, you end up with a load of spectral data.  You will now be able to see how the frequencies change over time, with good enough time and spectral resolution.  This is the beginnings of the STFT - the Short-Time Fourier Transform.  A good visual way to imagine this process is as the spectrum analyser display (often a set of bars that light up, indicating frequency levels) on a CD player or in a digital audio workstation.

It would be nice if the STFT was as simple as slicing up the audio and shifting the magical FFT along, but there are catches.  Because the FFT abruptly slices the audio but then assumes that snapshot is periodic, you get some nasty spectral leakage.  The phenomenon is illustrated below.

How the Fourier transform mauls your signal
Spectral leakage is the bane of DFT processing.  Discontinuities are pretty much guaranteed when transforming an arbitrary signal.  There is no perfect solution to this, but there is a very good one.  Often you'll hear mention of the "transform window"; the window is the set of samples that the DFT is currently "looking at".  If you leave the samples alone, this is called a rectangular window.  Ultimately we want to get rid of those jumps at the sides of the window.  To do this we use a windowing function, a function designed to smooth out discontinuities while not affecting the frequencies too much.

To apply the window you simply multiply each input sample by the corresponding sample of the windowing function.  There are many windowing functions, e.g. Blackman, triangular, Gaussian, but the two most popular are the Hann window and the Hamming window.  They're almost identical, apart from a slightly different constant.  The formula and pictures of what they look like are given nicely on Wikipedia.
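As a rough sketch of what that multiplication looks like in code (these are my own illustrative helpers, not from any particular library; the Hann formula is the standard raised-cosine one):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Build a Hann window of the given size: 0.5 * (1 - cos(2*pi*n/(N-1))).
    std::vector<double> make_hann(int size)
    {
        const double pi = 3.141592653589793;
        std::vector<double> window(size);
        for (int n = 0; n < size; ++n)
            window[n] = 0.5 * (1.0 - std::cos(2.0 * pi * n / (size - 1)));
        return window;
    }

    // "Applying" the window is just sample-by-sample multiplication.
    void apply_window(std::vector<double>& frame, const std::vector<double>& window)
    {
        for (std::size_t n = 0; n < frame.size(); ++n)
            frame[n] *= window[n];
    }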

Windowing, however, mangles the amplitude of the audio a bit, and on reconstruction it will sound like it's fading in and out really quickly (amplitude modulation) - not really what you wanted.  The solution is to overlap the snapshots enough that on reconstruction the original signal can be rebuilt fine.  For most windows the overlap required is 75%.  Often, for the STFT, we refer to the "hop size", i.e. how many samples you "hop" the window over each time.  If N is the transform size, a common hop size is N/4 or maybe, if you can afford the extra data, N/8.
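Putting the last two paragraphs together, the analysis half of an STFT ends up looking roughly like this.  This is only a sketch, reusing the make_hann and apply_window helpers from above, and fft_of is a stand-in for whatever FFT routine you actually use - it isn't a real function:

    // Hop a 1024-sample Hann window along the input with 75% overlap,
    // transforming each windowed frame.
    void stft_analysis(const std::vector<double>& input)
    {
        const int N   = 1024;     // transform size
        const int hop = N / 4;    // i.e. 75% overlap

        std::vector<double> window = make_hann(N);

        for (std::size_t pos = 0; pos + N <= input.size(); pos += hop)
        {
            std::vector<double> frame(input.begin() + pos, input.begin() + pos + N);
            apply_window(frame, window);
            // fft_of(frame);  // -> one frame of spectral data at time "pos"
        }
    }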

The STFT, pictorially (drawn badly)
The STFT is useful for some spectral things, for example the cross synthesis project I did earlier, but it's not capable of shifting pitch or stretching time without distortion.  However, with only a small modification of the spectral data, it can be - this modification is the heart of the phase vocoder algorithm.  All will be revealed in the next instalment.

Saturday, 13 October 2012

DSP Notes

I find that when I learn stuff, a good way not to forget it is to write notes.  I write notes on the things that took a long time to get, the things I found interesting and the answers to the questions that were in my head at the time.

As a result, I've written a pretty ridiculous amount of notes for all the stuff I've learnt on Digital Signal Processing (DSP) - about 10,000 words or 37 pages, done in a window of extreme boredom in the summer holidays.

I'm not sure how useful these notes will be to anyone else, as they won't be as thorough as a well-written textbook.  Anyway, here they are.  Enjoy.

Programming a Phase Vocoder and University Life

I've recently moved to university (college for Americans) and so life has been a bit hectic for the last 3 weeks.  The course is great (electronics) and I'm sure now that engineering suits me better than computer science.  Anyway, the result of settling in is that I've done hardly any programming, apart from the introductory C labs they give to everyone.

As a side note, I can appreciate how hard the programming aspect will be for a complete newcomer.  I'm not completely sure about lectures on programming and how much people will get out of them, but who knows?  The practicals seem to throw people in at the deep end a little, and there doesn't seem to be much critical feedback - the staff are teaching good style and methods, but I'm not sure what will stop people developing bad habits.

The lack of activity will finally let me catch this blog up to where I am in (audio) programming.  I've only just briefly started looking at Csound, which is kind of like learning a new language in itself.  The last major project I did, though, was creating a phase vocoder.  This was probably the most exciting thing I've done with C++.

In short, the phase vocoder breaks some audio down into the frequencies that make it up (spectral data), using an STFT, and then converts that data into a form which is independent of time.  Once you have data independent of time, you can play the audio back at any speed you want and keep the pitches all the same (time stretch).  Once you can time stretch, you can resample, bringing the audio back to speed but now with the pitch raised or lowered.
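To make that concrete, here's the rough shape of the algorithm in pseudocode - an outline only, not my actual implementation, and Ha/Hs are just my names for the analysis and synthesis hop sizes:

    for each analysis frame (taken Ha samples apart):
        FFT the windowed frame -> magnitude and phase for every bin
        phase deviation = unwrap(phase - previous phase - expected advance over Ha samples)
        true frequency  = bin centre frequency + phase deviation / Ha
        store (magnitude, true frequency)    <- no longer tied to the hop size

    for each synthesis frame (written Hs samples apart; Hs != Ha stretches time):
        accumulated phase += true frequency * Hs
        inverse FFT using (magnitude, accumulated phase)
        window and overlap-add into the output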

Maybe as motivation for the theory, here is the output of my phase vocoder on some audio samples.
I'm going to follow up with the theory and pseudocode in the near future.  Another pretty cool application of the phase vocoder is performing a "spectral freeze", where you hold onto a single spectral frame of data.  The result sounds pretty unnatural.  The phase vocoder can also cross two spectra, and even interpolate between spectral data.

I hit a fair number of hitches in the design process.  "The Audio Programming Book" has quite a compact implementation, in typical C style.  I wanted to use C++, and I wanted to create something much more general and expandable.  I created classes to encapsulate audio signal data and spectral data - this worked well because it followed the RAII (Resource Acquisition Is Initialisation) principle, and as a result I got no memory leaks in my program.  However, this layer of abstraction did have a performance impact every time you wanted to actually access the data.  The solution was to add a function that passes the internals out (sounds like bad practice, but I'm pretty sure it's okay for that kind of object) and allows the buffer array to be accessed normally.  I also had a class for the phase vocoder itself.
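To give an idea of what I mean, here's a minimal sketch (with a made-up class name, not the real code): the container owns its memory, so cleanup is automatic, and a data() function hands out the raw buffer for the loops that can't afford the extra indirection.

    #include <cstddef>
    #include <vector>

    class AudioSignal
    {
    public:
        explicit AudioSignal(std::size_t num_samples)
            : samples_(num_samples, 0.0) {}

        // Hand the internal buffer out for fast, direct access.
        double*       data()       { return samples_.data(); }
        const double* data() const { return samples_.data(); }
        std::size_t   size() const { return samples_.size(); }

    private:
        std::vector<double> samples_;   // freed automatically - RAII at work
    };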

Once I'd fudged something together, obviously the output was nothing like it was meant to be.  I searched for more answers online, but was met mostly with impenetrable articles from journals.  I intensely debugged my code and fixed some problems, but still wasn't getting output of any value.  Probably the most frustrating part of debugging is when you find a bug but the program still doesn't work.

In true programmers' fashion, after testing every function in turn and stepping through almost every line, comparing different outputs with manually calculated values, I pinned down the problem and slapped myself.  In the tiny function "phase_unwrap", which brings the phase (just an angle in radians) down to the range -pi to +pi, I was adding/subtracting pi instead of adding/subtracting two pi.  This changed the phase, and that tiny change was ruining the output.  I guess this acts as yet another reminder to always, always check every single function you write and run some expected input and output through it.  The moment you assume a function is too trivial to get wrong is the moment you set yourself up for hours of debugging!
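For the record, here's the idea done correctly (my own sketch, not the code from the project) - the loop adds or subtracts two pi, never pi, until the angle lands in [-pi, +pi]:

    // Wrap an angle in radians into the range [-pi, +pi].
    double phase_unwrap(double phase)
    {
        const double pi     = 3.141592653589793;
        const double two_pi = 2.0 * pi;

        while (phase >  pi) phase -= two_pi;
        while (phase < -pi) phase += two_pi;
        return phase;
    }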

Thursday, 20 September 2012

Trying to Understand Laplace('s Transform)

The Laplace transform is a commonly used tool in engineering and digital signal processing, e.g. filter design.  I think my first encounter with it was watching a Khan Academy video.  The title looked cool, I was procrastinating for an exam, and I set myself the challenge of trying to understand it.

The Khan Academy video is brilliant, and Khan is an excellent teacher.  For someone who's been exposed to a lot of calculus at school, given the basic transform rule, it's just a matter of integration (usually by parts) to derive any of the transforms and rules.  The process of doing it clicked straight away.
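For reference, the basic rule everything falls out of is just the definition of the transform (written here in LaTeX notation), and as a one-line example, transforming the constant function f(t) = 1 gives 1/s:

    \mathcal{L}\{f(t)\} = F(s) = \int_0^\infty f(t)\, e^{-st}\, dt

    \mathcal{L}\{1\} = \int_0^\infty e^{-st}\, dt
                     = \left[ -\tfrac{1}{s} e^{-st} \right]_0^\infty
                     = \frac{1}{s} \quad (\text{for } s > 0)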


However, the one thing lacking was a more fundamental explanation of what the transform actually represents.  What is the mysterious "s-domain"?  It just comes across as some magical tool that somehow simplifies evil differential equations into pathetic algebraic ones.  I like to have a deeper understanding of difficult concepts, because once they're broken down I can relate them to other concepts I already understand.  I also like to think about how the inventor(s) came to invent the formula or process; to me this makes it much more intuitive.

More recently I came across the Laplace transform, and the even weirder z-transform, again in filter design.  So I decided to really hunt for a better explanation of the Laplace transform.  I was met with many forum posts full of indecipherable replies mentioning "infinite vector spaces" and the "complex frequency domain".  I'm sure they're perfectly valid explanations, but as a simple-minded engineer I like things broken down into plainer, more relatable terms before I go on to describe them mathematically.  There was really only one video - an MIT lecture - that blew my mind and answered all the questions.


(There are two parts to the video, this is the first part)

This video connects Laplace transforms with the more straightforward power series.  When I questioned myself on why a power series should work, I realised I didn't know, so I had to backtrack a little.  After quite a few hours, though, I am now satisfied with the explanations of power series, Laplace transforms and z-transforms I found.  I compiled it all into a single document which I'll use later to prompt my memory.  I'm releasing it into the wild, however, for similarly confused people to share!
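In case it saves someone the backtracking, here is the chain of ideas from the lecture as I remember it (in LaTeX notation): a power series sums discrete coefficients a_n against x^n; replace the discrete index n with a continuous variable t and the sum with an integral, then write x = e^{-s} (with 0 < x < 1, i.e. s > 0, so the integral has a chance of converging) and the Laplace transform drops out.  The z-transform is the same sum left in discrete form, with x = 1/z.

    \sum_{n=0}^{\infty} a_n x^n = A(x)
    \;\longrightarrow\;
    \int_0^\infty f(t)\, x^t \, dt
    = \int_0^\infty f(t)\, e^{-st} \, dt
    = F(s), \qquad x = e^{-s}

    \text{z-transform (discrete again): } \sum_{n=0}^{\infty} f[n]\, z^{-n} = F(z)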

Here it is.

Thursday, 13 September 2012

if (programmers = obsolete); replace();

I've been slogging away at some phase vocoder code for the last couple of days, pretty much picking out all the bugs.  Bugs in digital signal processing code, especially audio stuff, tend to make themselves heard, which is good in many ways.

It's human to make mistakes.  It's human to be slow at computation.  But our brains are amazingly fast at some complex processes, just think of the incredible accuracy of (most of) our hand-eye coordination.  Our brains can also fortunately invent technologies and write code to reduce the effort of mindless calculation.

If you give a person a list of a million sums it will take them a while to do.  More importantly there will be errors; they might miss one or make a few careless mistakes, especially as they grow tired and extremely bored.  A computer will do a million sums in milliseconds, and with no mistakes and no complaints.

Although sums are a weak analogy, it also takes hundreds of people a fair amount of effort to write millions of lines of code.  In that code there will be human imperfections - bugs like the one in the title, structural redundancy, inefficiency and so on.  If computers could write that amount of code in a few minutes, with no mistakes and highly optimised, think of the incredible power this would give humanity.  At the moment, though, computers still have to be programmed by us.  Artificial intelligence is getting better, and so far we've enabled computers to beat a chess grandmaster, compose like Bach, and talk to each other in their own language, among lots of other things I find mind-blowing.  I think it's just a matter of time before we create a full simulation of a human brain, and start to debug our own minds.  From there it's only a matter of more time before we create a programmer's brain, and have a computer programming for another computer, or for itself.

Computers are of course already writing code, in a very limited way - compilers take a language like C and convert it to assembly, and in the process a load of crazy optimisation goes on.  This is ideal, because it allows the programmer to go for clear and readable code, leaving out little efficiency hacks, knowing the compiler will deal with them.  Another pretty cool thing is Visual Studio's "IntelliSense", which is kind of like spellcheck for programming.  It underlines errors with a red squiggly as they appear, and lets me instantly fix little mistakes like missing semicolons.


It still feels like we're a long way from a computer writing a program of any use, but then artificial intelligence still feels like it's in its infancy.  Reading "The Age of Spiritual Machines" has maybe given me an over-optimistic (but who knows?) view of the near future, but hopefully we'll at least see programming technologies making more and more possible, in terms of huge and reliable systems.  If computerised computer programmers were ever invented, it would definitely raise some interesting social issues for the hundreds of thousands of programmers made obsolete.  Ironically, we would have created the technology that made us redundant!

Wednesday, 5 September 2012

A Comment

Comments in programming languages are hugely useful.  As a programmer, comments are the quickest way to communicate with the reader (who may be yourself at a later moment).  As a beginner, comments were really a tool for learning the language, reminding me what that strange syntax or standard function was supposed to do.  After a while of learning programming from textbooks, I started looking around for different resources, mainly forums.  I got the impression that lots of comments were a great thing.  The result was heavy commenting in the code that followed.

Now that I've read "The Practice of Programming" I've finally been able to refresh my way of thinking and look back a little.  It reminded me to think about the purpose of a comment.  A comment should really help the reader along, maybe filling in some technical details of an algorithm or explaining why you've chosen to do an unusual bit fiddle.  It should add something.  Reading old code, I've seen comments like "// output <some crap>", and a function "get_foo()" preceded by "// get_foo() gets foo".  This is so clear from the code that it's wasting the reader's time being there.  I've moved away from the philosophy that "more comments is better" to "the fewest comments you can get away with, the better".  And by "get away with", I mean that the code should be completely clear - removing all comments and having cryptic code is definitely much worse to me than too many comments!
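A made-up example of the difference, just to illustrate (not from any real project):

    #include <cmath>

    // Bad: just restates the code.
    // square() squares x.
    double square(double x) { return x * x; }

    // Better: explains something the code can't say on its own.
    double safe_sqrt(double x)
    {
        // Round-off upstream can push x fractionally below zero, so clamp
        // rather than hand std::sqrt a negative number and get NaN back.
        return std::sqrt(x < 0.0 ? 0.0 : x);
    }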

Stylistically, my aim has slowly become to write code that is good enough to not need comments.

I'm definitely at a dangerous point in my programming journey.  I feel comfortable with the syntax, the idioms and the quirks (of C/C++).  I understand the value of style, and debugging, and the dangers of things like undefined behaviour.  But it's so easy to become complacent and think you've mastered it.  The reality is, in every area of programming, there is so much more to learn and to improve on.  Hopefully, knowing this I can keep moving in the right direction...