FFT aka Fast Fourier Transform explained well

From here: http://www.kvraudio.com/forum/viewtopic.php?p=989000&sid=6e7e5d21145a3efd03b8ebaa5a15f830#989000

The fourier transform converts a time representation ( samples ) into a frequency representation ( i.e. the frequency spectrum of these samples ) and vice versa.

Commonly it splits a number of audio samples, say 2048, into a corresponding number of sine waves of different frequencies, so that you know the volumes and the phase for each sine.

The principle is very simple:

If you multiply a set of samples with another set of reference samples and add each result, like this:

result = sample[ 0 ] * reference[ 0 ] + sample[ 1 ] * reference[ 1 ] + … + sample[ n ] * reference[ n ]

… the result is an indicator for the similarity of both sets of samples.

And now with sines

Now, if you want to know the volume and the phase of a sine at a certain frequency inside an audio sample, you have to calculate 2 of these similarity values:

FC = 2 * PI * frequency / samplerate; // “frequency constant”

result_i = sample[ 0 ] * sin( FC * 0.f ) + sample[ 1 ] * sin( FC * 1.f )+ … + sample[ n ] * sin( FC * n.f )

result_r = sample[ 0 ] * cos( FC * 0.f ) + sample[ 1 ] * cos( FC * 1.f )+ … + sample[ n ] * cos( FC * n.f )

To get the actual phase and magnitude, you use simple pythagoras to get length and trigonometry to get the angle:

magnitude = sqrt( result_i * result_i + result_r * result_r);

phase = atan2( result_i, result_r) // I think…

This way you can determine how much of a frequency is present in an audio file etc., and which phase it has relative to the beginning.

Complex Numbers?

In Fourier Transform lingo, the sine part is oftenly called “imaginary” while the cosine part is called “rational”. They pretend to be complex numbers, but that’s more or less only a convention. The phase/magnitude representation IMHO feels more like complex numbers.

Getting the whole spectrum

Now, the idea behind fourier transforms is, if you start with a “fundamental” frequency of exactly 1 cycle over the whole length of the analysed set of samples (call it n samples) and do this with every multiple up to n/2 of the fundamental, you get a representation that thoroughly reflects the spectrum of the original sample. – If you transform it back (create a new bunch of n samples out of adding sines and cosines), it will perfectly fit the original samples!

Each “frequency slot” or multiple of the fundamental (1, 2, 3, …, n/2) is called a “bin”. (Note, there’s also a zero-bin, I’ll explain that later)

Fast Fourier Transform – FFT

This would be a very time consuming technique, if there wasn’t the Fast Fourier Transform, or FFT. The FFT is a simple, yet highly difficult to understand algorithm. It takes advantage of the fact, that if you draw all those sines in a graph, you have many lines that cross each other at certain points. And because they cross each other pretty often, you need only 1 multiply to gather information for many results needed. Hence, what FFT does is recursively reordering (“prepacking”, “butterfly”) the samples in a tricky way so that only a fraction of calculations are needed to create the spectrum for a given range of audio samples. The drawback is, FFT requires certain lengths (or “ns”) of samples, which are a power of 2: like 16, 32, 1024, 4096 samples etc.

DC offset and Nyquist in FFTs

Typically, a FFT outputs the spectrum in sines and cosines, but it also output 2 special values: A cosine representing the DC (offset of powers above zero and below zero), which is bin 0, or the fundamental frequency * 0 and a sine representing Nyquist (because at Nyquist, there can’t be a cosine!). Hence some algorithms require N/2+1 bins, others “pack” the Nyquist bin into the sine field of the DC bin.

Hope this helps, or maybe it’s a start…


Wink Urs

How to remove the gloss / glare / white area from iPhone app icon?

To remove the gloss / glare / white area from iPhone app icon that iPhone adds automatically – go to the Xcode and open up the App’s Info.plist. On right side you see list like button. Press that and you’ll get new row with selectable property name. From there pick the “Icon already includes gloss..” and check the checkbox.

So in short: it’s a property in the App info.plist.

MIDI signal

MIDI signal in oscilloscope. Inverted serial 31200 bits per second. Captured from USB->MIDI adapter (SWIS). First bit time is about 32us (31.2us = microsecond ). Voltage difference was 3.92V.

A great page describing MIDI: http://www.tigoe.net/pcomp/code/serial-communication/midi which describes MIDI like:

“MIDI is a serial communications protocol, operating at 31,250 bits per second. Each byte has 8 bits, plus a start bit and a stop bit. It operates at 5 volts DC. The standard MIDI connector is a 5-pin DIN connector, and usually all connectors on the device are female, and both ends of a MIDI cable are male.”

From here http://home.roadrunner.com/~jgglatt/tech/midispec.htm :

  • MIDI is an asynchronous serial interface.
  • The baud rate is 31.25 Kbaud (+/- 1%).
  • There is 1 start bit, 8 data bits, and 1 stop bit (ie, 10 bits total), for a period of 320 microseconds per serial byte.
  • The MIDI circuit is current loop, 5 mA.
  • Logic 0 is current ON.
  • To avoid grounding loops and subsequent data errors, the input is opto-isolated. It requires less than 5 mA to turn on. The Sharp PC-900 and HP 6N138 optoisolators are satisfactory devices. Rise and fall time for the optoisolator should be less than 2 microseconds.
  • The standard connector used for MIDI is a 5 pin DIN.
  • Separate jacks (and cable runs) are used for input and output, clearly marked on a given device (ie, the MIDI IN and OUT are two separate DIN female panel mount jacks).
  • 50 feet is the recommended maximum cable length.
  • Cables are shielded twisted pair, with the shield connecting pin 2 at both ends.
  • The pair is pins 4 and 5. Pins 1 and 3 are not used, and should be left unconnected.

Working MD5 for Objective-C

  3. #import <CommonCrypto/CommonDigest.h>
  4. "%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X",
  5. result[0], result[1],
  6. result[2], result[3],
  7. result[4], result[5],
  8. result[6], result[7],
  9. result[8], result[9],
  10. result[10], result[11],
  11. result[12], result[13],
  12. result[14], result[15]];
  14. }

Technology snippets