Saturday, 13 October 2012

Programming a Phase Vocoder and University Life

I've recently moved to university (college for Americans) and so life has been a bit hectic for the last 3 weeks.  The course is great (electronics) and I'm sure now that engineering suits me better than computer science.  Anyway, the result of settling in is that I've done hardly any programming, apart from the introductory C labs they give to everyone.

A side note: I can now appreciate how hard the programming aspect will be for a complete newcomer.  I'm not completely sure about lectures on programming and how much people will get out of them, but who knows?  The practicals seem to throw people in at the deep end a little, and there doesn't seem to be much critical feedback - the staff teach good style and methods, but I'm not sure what will stop people developing bad habits.

The lack of activity will at least let me catch up on writing about where I've got to with (audio) programming.  I've only just briefly started looking at Csound, which is kind of like learning a new language in itself.  The last major project I did, though, was creating a phase vocoder.  It was probably the most exciting thing I've done with C++.

In short, the phase vocoder breaks some audio down into the frequencies that make it up (spectral data) using an STFT (short-time Fourier transform), and then converts that data into a form that is independent of time.  Once you have time-independent data, you can play the audio back at any speed you want while keeping all the pitches the same (time stretching).  And once you can time stretch, you can resample, bringing the audio back up to speed but now with the pitch raised or lowered.
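To make that concrete, here's a rough sketch of the analysis step (not my actual code, and it assumes a standard STFT layout with a hop size in samples): it turns the raw bin phases of two successive frames into "true" frequencies, which is exactly what makes the data independent of the analysis rate.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

const double kPi = 3.14159265358979323846;

// Wrap an angle into the range [-pi, +pi].
double wrap_phase(double phase)
{
    while (phase >  kPi) phase -= 2.0 * kPi;
    while (phase < -kPi) phase += 2.0 * kPi;
    return phase;
}

// Turn two successive STFT frames into (magnitude, frequency) pairs.
// frame_prev / frame_curr: complex spectra of length fft_size / 2 + 1.
// hop: analysis hop size in samples.  Frequencies come out in rad/sample;
// multiply by sample_rate / (2 * pi) to get Hz.
std::vector<std::pair<double, double>>
analyse_frame(const std::vector<std::complex<double>>& frame_prev,
              const std::vector<std::complex<double>>& frame_curr,
              int fft_size, int hop)
{
    std::vector<std::pair<double, double>> out(frame_curr.size());
    for (std::size_t k = 0; k < frame_curr.size(); ++k) {
        // Phase a bin-centre sinusoid would have advanced over one hop.
        double bin_freq  = 2.0 * kPi * k / fft_size;
        double expected  = bin_freq * hop;
        // Measured phase advance minus the expected part, wrapped to [-pi, pi].
        double deviation = wrap_phase(std::arg(frame_curr[k]) -
                                      std::arg(frame_prev[k]) - expected);
        // True frequency of whatever is actually sitting in this bin.
        double true_freq = bin_freq + deviation / hop;
        out[k] = { std::abs(frame_curr[k]), true_freq };
    }
    return out;
}
```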

Maybe as motivation for the theory, here is the output of my phase vocoder on some audio samples.
I'm going to follow up with the theory and pseudocode in the near future.  Another pretty cool application of the phase vocoder is performing a "spectral freeze", where you hold onto a single spectral frame of data.  The result sounds pretty unnatural.  The phase vocoder can also cross two spectra, and even interpolate between spectral data.
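As a taste of what the freeze looks like in code, here's a minimal sketch (again, not my actual implementation): you hold onto one analysed frame of (magnitude, frequency) pairs and just keep accumulating phase from it to build every new output spectrum.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Build one output spectrum from a single held analysis frame.
// held:        (magnitude, frequency in rad/sample) per bin, from the analysis stage.
// phase_accum: running phase per bin, carried between calls (starts at zero).
// hop:         synthesis hop size in samples.
// The returned spectrum then goes through an inverse FFT and overlap-add as usual.
std::vector<std::complex<double>>
freeze_frame(const std::vector<std::pair<double, double>>& held,
             std::vector<double>& phase_accum,
             int hop)
{
    std::vector<std::complex<double>> spectrum(held.size());
    for (std::size_t k = 0; k < held.size(); ++k) {
        phase_accum[k] += held[k].second * hop;                   // advance by frequency * hop
        spectrum[k] = std::polar(held[k].first, phase_accum[k]);  // constant magnitude, moving phase
    }
    return spectrum;
}
```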

I hit a fair number of hitches in the design process.  "The Audio Programming Book" has quite a compact implementation, in typical C style.  I wanted to use C++, and I wanted to create something much more general and expandable.  I created classes to encapsulate audio signal data and spectral data - this worked well because it followed the RAII (Resource Acquisition Is Initialisation) principle, and as a result I got no memory leaks in my program.  However, this layer of abstraction did have a performance impact every time you wanted to actually access the data.  The solution was to add a function that passes the internals out (sounds like bad practice, but I'm pretty sure it's fine for this kind of object) and allows the buffer array to be accessed normally.  I also had a class for the phase vocoder itself.
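The shape of those classes was roughly like this (illustrative names, not my exact code): a std::vector owns the memory, so cleanup is automatic, and data() is the escape hatch that hands the raw buffer to the hot loops.

```cpp
#include <cstddef>
#include <vector>

class AudioSignal {
public:
    explicit AudioSignal(std::size_t num_samples, double sample_rate)
        : samples_(num_samples, 0.0), sample_rate_(sample_rate) {}

    // Safe element access for everyday use.
    double& operator[](std::size_t i)       { return samples_[i]; }
    double  operator[](std::size_t i) const { return samples_[i]; }

    // Escape hatch: hand out the underlying buffer so FFT routines and
    // tight loops can work on a plain double* without per-sample overhead.
    double*       data()       { return samples_.data(); }
    const double* data() const { return samples_.data(); }

    std::size_t size() const        { return samples_.size(); }
    double      sample_rate() const { return sample_rate_; }

private:
    std::vector<double> samples_;  // owns the memory: freed automatically, no leaks
    double sample_rate_;
};
```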

Once I'd fudged something together, the output was, of course, nothing like it was meant to be.  I searched for more answers online, but was met mostly with impenetrable journal articles.  I intensely debugged my code and fixed some problems, but still wasn't getting output of any value.  Probably the most frustrating part of debugging is when you find a bug but the program still doesn't work.

In true programmers' fashion, after testing every function in turn, stepping through almost every line, and comparing different outputs with manually calculated values, I pinned down the problem and slapped myself.  In the tiny function "phase_unwrap", which brings the phase (just an angle in radians) down to the range -pi to +pi, I was adding / subtracting pi instead of adding / subtracting two pi.  This changed the phase, and that tiny change was ruining the output.  I guess this acts as yet another reminder to always, always check every single function you write and run some expected input and output through it.  The moment you assume a function is too trivial to get wrong is the moment you set yourself up for hours of debugging!
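For the record, the corrected behaviour is roughly this (a sketch, not the exact function from my code):

```cpp
#include <cmath>

// Wrap an angle into the range [-pi, +pi] by adding or subtracting 2*pi.
// (The bug was adding / subtracting pi instead, which silently changed the phase.)
double phase_unwrap(double phase)
{
    const double pi     = 3.14159265358979323846;
    const double two_pi = 2.0 * pi;
    while (phase >  pi) phase -= two_pi;   // too far positive: step down by a full turn
    while (phase < -pi) phase += two_pi;   // too far negative: step up by a full turn
    return phase;
}
```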
