ETD Displays

Ron Nossaman RNossaman@KSCABLE.com
Tue, 27 Feb 2001 13:09:09 -0600


>It's easy (at least for me) to imagine this upper line sliding
>back and forth as the pin is adjusted, at the same time listening
>(and watching) for the "best" partial and "wholistic" match. It's
>also very easy for me to imagine the usefulness of such a
>display in learning / teaching about just what tuning really is.
>Combined with present day curve graphs such a tool could be
>invaluable as a teaching / learning / tuning aid.
>
>Again I must say I fail to see why there is so much resistance to
>multipartial displays among ETD authors. Isn't it time we move
>past the dancing dial?
>--
>Richard Brekne

It's easy enough to imagine such a display, and easy enough to write the
display function once all the necessary information is gathered and sifted
out of the rubble. Let's take a look at what we have to work with. 

Raw microphone input is converted from an intermingled mixture of
continuous waves into a series of voltage levels by the analog to digital
converter in the ETD or computer sound card. Since the raw sound isn't
conveniently divided up into discrete increments, we have to decide how
many samples per second we want and how many samples make up one block,
which together determine the duration of the block. For instance, a
sampling rate of 22050 will give you twenty two thousand and fifty discrete
voltage levels per second ranging from 0 to 255, or 0 to 65535, depending
on whether you're doing 8 or 16 bit sound. The information is still just a
chopped up numerical approximation of the original sound through a specific
time interval, so it's not much use to us in this form other than as a
recording. Played back through a digital to analog converter and a speaker,
it will closely approximate the original sound input. To make some sense of
this raw data, it has to be converted from the time domain to the frequency
domain. That's what the FFT, or Fast Fourier Transform, does. It takes the
time sequence data and produces a complex number for each frequency
component of the original signal, with the real and imaginary parts
together encoding its amplitude and phase. The phase isn't of much use to
us in this application, so the real part is squared, added to the square of
the imaginary part, and the square root of the result is taken as the
amplitude to be used in any further calculations.

To choose a sampling rate and FFT array size for an application, it has to
be determined how high a frequency range will be needed. The sampling rate
must be at least twice the required upper frequency limit for the sample.
Half the sampling rate is called the Nyquist frequency, and the limit just
means that you can't represent details smaller than the granularity of the
data set. For instance, a sampling rate of 44100/sec can represent
frequencies up to 22050 Hz. Now you choose the FFT array size. For an array
of 1024 at a 44100 sampling rate, the sample length is 1024/44100, or
0.02322 seconds long, with an FFT frequency resolution of
1/0.02322=43.06641 Hz per array position, or "bin", and a frequency range
from 43.06641 Hz to (1024/2)*43.06641=22050 Hz, which is half the sampling
rate. The top of the frequency range is always half the sample rate. The
frequency resolution gets finer as the FFT array size is increased, but the
sample duration increases, as does the FFT processing time. In real time,
with current laptop systems, a 22050 samples/sec rate with a 4096 FFT array
size is probably about the best trade off between sample acquisition time,
FFT processing time, and frequency resolution that you'll get. That leaves
you with an FFT bin resolution of 5.383301 Hz. So how do you tune a piano in
5.383301 Hz increments? With more processing.
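
The arithmetic above is easy to check. This small Python sketch (the
function name is my own, purely for illustration) computes the block
duration, bin width, and top frequency for the two configurations
mentioned:

```python
def fft_parameters(sample_rate, fft_size):
    """Return (block duration in seconds, bin width in Hz, top frequency in Hz)."""
    duration = fft_size / sample_rate   # time needed to collect one block
    bin_width = sample_rate / fft_size  # same as 1 / duration
    nyquist = sample_rate / 2           # highest frequency the FFT can show
    return duration, bin_width, nyquist

for rate, size in [(44100, 1024), (22050, 4096)]:
    duration, bin_width, nyquist = fft_parameters(rate, size)
    print(f"{rate}/sec, {size} points: {duration:.5f} s per block, "
          f"{bin_width:.6f} Hz per bin, top frequency {nyquist:.0f} Hz")
```

Note the cost of the finer resolution: the 4096 point block at 22050/sec
takes roughly 0.19 seconds just to collect, before any processing starts.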

An FFT frequency peak that falls directly on a bin is at full amplitude
and very narrow, with the bins on either side of the peak at relatively
very low values. A frequency near half way between the bin values displays
a lower amplitude peak (about 2/3 height), and adjacent bin values are much
higher. This is called spectral leakage, and it actually extends through
the entire FFT to some degree. Various sliding windowing functions are used
to interpolate the actual frequency of any given peak, with accuracy
limited by the sampling rate of the original sample and the FFT array size.
I haven't found any information on resolving the spectral leakage effect on
peak amplitude, but from some preliminary investigations I've done, it
seems to closely approximate a sine function between 0° and 90° from the
bin position to half way between. Since I'm interested in extracting the
partials' amplitude information as accurately as I can, I'm still exploring
that one. There are also some annoying effects scattered throughout the
FFT that are the result of the sound sample starting and ending abruptly at
full amplitude. While it's not a major tragedy, it does affect the peak
frequency computations slightly, so it's a good idea to taper the beginning
and end of the sample down with a trapezoidal ramp function or a Blackman,
Hanning, or Hamming window before the FFT. That's an additional processing
pass through the data.
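
Both effects can be seen in a small experiment. The Python sketch below is
only an illustration under my own assumptions (a 256 sample block, pure
sine test tones, a Hann taper, and a slow single-bin DFT in place of an
FFT). It compares the peak height of a tone that sits on a bin with one
half way between bins, then checks how much energy leaks into a distant
bin with and without the taper.

```python
import cmath
import math

def bin_magnitude(samples, k):
    """Magnitude of DFT bin k for one block of samples."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return abs(acc)

def tone(cycles, n):
    """Unit sine wave completing `cycles` periods within n samples."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

def hann(n):
    """Hann taper: zero at the start of the block, one in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

N = 256
on_bin = tone(32.0, N)    # frequency falls exactly on bin 32
between = tone(32.5, N)   # frequency falls half way between bins 32 and 33

# Peak height: on a bin versus half way between two bins.
peak_on = bin_magnitude(on_bin, 32)
peak_between = bin_magnitude(between, 32)
ratio = peak_between / peak_on
print(f"half-way peak is {ratio:.2f} of the on-bin peak")

# Leakage into a distant bin, before and after tapering the block ends.
tapered = [x * w for x, w in zip(between, hann(N))]
far_raw = bin_magnitude(between, 60)
far_tapered = bin_magnitude(tapered, 60)
print(f"bin 60 untapered: {far_raw:.3f}, tapered: {far_tapered:.5f}")
```

The ratio should come out close to the "about 2/3 height" figure, and the
tapered block should leave far less residue in the distant bin. The taper
also widens and lowers the main peak, so any amplitude correction has to
account for whichever window is used.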

So to display the partial structures of two notes in real time you would
have to gather data at a high enough sampling rate to cover the frequencies
employed in the display, FFT the data with a large enough array to maximize
the overall frequency resolution and keep the low frequency minimum in the
range of the piano, run the FFT, reprocess the FFT data to find all the
frequency peaks, sort and separate the frequencies indicated to determine
what peak goes to what note (never mind how you might manage to separate
nearly alike coincident partial frequencies that both fall in the same area
between two bins), refigure the amplitudes of all the accumulated partial
frequencies to counteract the effects of spectral leakage, scale and render
the resulting graph onto your working bitmap and copy the whole thing over
the previous one, do all the usual program housekeeping and operating
system maintenance as another sound sample is being accumulated, and do it
all again. That's what the current ETDs are doing, only with considerably
more data processing. And this is to be a real time deal on the "average"
133 MHz Pentium - never mind the 486 crowd that can still run TuneLab?
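
As a rough idea of just the "find all the frequency peaks" step, here is a
Python sketch. The three-point parabolic interpolation is a common textbook
technique that I am substituting for illustration; it is not necessarily
what any particular ETD does. The sample rate, block size, and test tones
are made up, and nothing here attempts the harder job of deciding which
peak belongs to which note.

```python
import cmath
import math

def magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to half the block size."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]

def find_peaks(mags, threshold):
    """Indices of bins that are local maxima above the threshold."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold
            and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]

def refine_peak(mags, k, bin_width):
    """Estimate the peak frequency by fitting a parabola through three bins."""
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2 * b + c
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (k + offset) * bin_width

RATE = 8192             # samples per second (assumed)
N = 256                 # block size (assumed), so each bin is 32 Hz wide
BIN_WIDTH = RATE / N

# Two test tones, neither of which falls exactly on a bin.
block = [math.sin(2 * math.pi * 440.0 * i / RATE)
         + 0.5 * math.sin(2 * math.pi * 1000.0 * i / RATE)
         for i in range(N)]

# Taper the block ends (Hann window) before transforming.
block = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / N))
         for i, x in enumerate(block)]

mags = magnitudes(block)
peaks = find_peaks(mags, threshold=0.1 * max(mags))
estimates = [refine_peak(mags, k, BIN_WIDTH) for k in peaks]
for hz in estimates:
    print(f"peak near {hz:.1f} Hz")
```

Even in this toy case the estimates land within a couple of Hz of the true
frequencies rather than exactly on them, which is a long way from tuning
accuracy, and the whole thing still has to be repeated for every block.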

I'd sure like to see it working when you've taken us past the dancing dial
into the future. I'm curious whether any movement will be detectable on the
screen with the current hardware. Having gotten a little dirty in this
field, I'm prepared to be impressed. Meanwhile, I'd recommend doing a few
Google searches on digital signal processing and FFT, and reading for a
couple of weeks.


Ron N

