>It's easy (at least for me) to imagine this upper line sliding
>back and forth as the pin is adjusted, at the same time listening
>(and watching) for the "best" partial and "wholistic" match. It's
>also very easy for me to imagine the usefulness of such a
>display in learning / teaching about just what tuning really is.
>Combined with present day curve graphs such a tool could be
>invaluable as a teaching / learning / tuning aid.
>
>Again I must say I fail to see why there is so much resistance to
>multipartial displays among ETD authors. Isn't it time we move
>past the dancing dial ?
>--
>Richard Brekne

It's easy enough to imagine such a display, and easy enough to write the display function once all the necessary information is gathered and sifted out of the rubble. Let's take a look at what we have to work with.

Raw microphone input is converted from an intermingled mixture of continuous waves into a series of voltage levels by the analog to digital converter in the ETD or computer sound card. Since the raw sound isn't conveniently divided up into discrete increments, we have to decide how many of these discrete increments per second we want (the sampling rate) and how many of them make up the total sampling, which together determine the duration of the sample. For instance, a sampling rate of 22050 will give you twenty two thousand and fifty discrete voltage levels per second, each ranging from 0 to 255 or from 0 to 65535, depending on whether you're doing 8 or 16 bit sound. The information is still just a chopped up numerical approximation of the original sound through a specific time interval, so it's not much use to us in this form other than as a recording. Played back through a digital to analog converter and a speaker, it will closely approximate the original sound input.

To make some sense of this raw data, it has to be converted from the time domain to the frequency domain. That's what the FFT, or Fast Fourier Transform, does.
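Before moving on to the FFT, the sampling step described above can be sketched roughly as follows. This is only an illustration: the function name sample_tone and the 440 Hz test tone are made up for the example, and it follows the 0-to-255 / 0-to-65535 framing used above (real 16 bit WAV data is normally stored as signed values, so treat the unsigned 16 bit range here as an assumption).

```python
import math

def sample_tone(freq_hz, rate_hz, n_samples, bits):
    """Sample a unit-amplitude sine and quantize it to unsigned integers.

    Hypothetical helper for illustration; not taken from any ETD.
    """
    top = 2 ** bits - 1  # 255 for 8 bit, 65535 for 16 bit
    out = []
    for n in range(n_samples):
        x = math.sin(2 * math.pi * freq_hz * n / rate_hz)  # -1.0 .. 1.0
        out.append(round((x + 1.0) / 2.0 * top))           # 0 .. top
    return out

# One second of a 440 Hz tone at 22050 samples per second.
samples8 = sample_tone(440.0, 22050, 22050, 8)
samples16 = sample_tone(440.0, 22050, 22050, 16)
print(len(samples8), min(samples8), max(samples8))
print(len(samples16), min(samples16), max(samples16))
```

Either list is just a run of integers over time, which is the time sequence data that gets handed to the FFT.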
The FFT takes the time sequence data and produces a complex number for each frequency component of the original signal. The real and imaginary parts of that number together encode the phase and amplitude of the component. The phase isn't of much use to us in this application, so the real and imaginary parts are each squared and added together, and the square root of the sum is taken as the magnitude to be used in any further calculations.

To choose a sampling rate and FFT array size for an application, it has to be determined how high a frequency range will be needed. The sampling rate must be at least twice the required upper frequency limit for the sample. Half the sampling rate is called the Nyquist frequency, and the limit just means that you can't represent details smaller than the granularity of the data set. For instance, a sampling rate of 44100/sec can represent frequencies up to 22050 Hz.

Now you choose the FFT array size. For an array of 1024 at a 44100 sampling rate, the sample length is 1024/44100, or 0.02322 seconds, which gives an FFT frequency resolution of 1/0.02322 = 43.06641 Hz per array position, or "bin", and a frequency range from 43.06641 Hz to (1024/2)*43.06641 = 22050 Hz, which is half the sampling rate. The top of the frequency range is always half the sample rate. The frequency resolution gets finer as the FFT array size is increased, but the sample duration increases, as does the FFT processing time. In real time, with current laptop systems, a sample rate of 22050 samples per second with a 4096 FFT array size is probably about the best trade off between sample acquisition time, FFT processing time, and frequency resolution that you'll get. That leaves you with an FFT bin resolution of 5.383301 Hz.

So how do you tune a piano in 5.383301 Hz increments? With more processing. An FFT frequency peak that falls directly on a bin is at full amplitude and very narrow, with the bins on either side of the peak at relatively very low values. A frequency near half way between two bins displays a lower amplitude peak (about 2/3 height), and the adjacent bin values are much higher.
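The bin arithmetic and the on-bin versus between-bins behavior can be checked with a short sketch. It assumes a plain textbook radix-2 FFT written out in pure Python (not the routine any particular ETD uses), a unit sine test tone, and no window applied. The names bin_width_hz and peak_magnitude are invented for the example.

```python
import cmath
import math

def bin_width_hz(rate_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return rate_hz / fft_size

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def peak_magnitude(rate_hz, fft_size, freq_hz):
    """Largest bin magnitude for a unit sine at freq_hz (no window)."""
    samples = [math.sin(2 * math.pi * freq_hz * n / rate_hz)
               for n in range(fft_size)]
    spectrum = fft(samples)[: fft_size // 2]  # keep bins below Nyquist
    # abs() of a complex number is sqrt(real**2 + imag**2).
    return max(abs(c) for c in spectrum)

RATE, SIZE = 22050, 4096
width = bin_width_hz(RATE, SIZE)
on_bin = peak_magnitude(RATE, SIZE, 100 * width)     # exactly on bin 100
between = peak_magnitude(RATE, SIZE, 100.5 * width)  # half way to bin 101

print(bin_width_hz(44100, 1024))  # about 43.07 Hz per bin
print(width)                      # about 5.38 Hz per bin
print(between / on_bin)           # height of the between-bins peak, relative
```

With these settings the between-bins tone should peak at close to 0.64 of the on-bin height, in line with the "about 2/3" figure, because part of its energy has spread into the neighboring bins.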
This spreading of a peak into its neighboring bins is called spectral leakage, and it actually extends through the entire FFT to some degree. Various sliding windowing functions are used to interpolate the actual frequency of any given peak, limited by the sampling rate of the original sample and the FFT array size. I haven't found any information on resolving the spectral leakage effect on peak amplitude, but from some preliminary investigations I've done, it seems to closely approximate a sine function between 0° and 90° from the bin position to half way between. Since I'm interested in extracting this partial amplitude information as accurately as I can, I'm still exploring that one.

There are also some annoying effects scattered throughout the FFT that are the result of the sound sample starting and ending abruptly at full amplitude. While it's not a major tragedy, it does affect the peak frequency computations slightly, so it's a good idea to taper the beginning and end of the sample down with a trapezoidal ramp function or a Blackman, Hann (Hanning), or Hamming window before the FFT. That's an additional processing pass through the data.
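As a rough sketch of the tapering and interpolation passes, the example below applies a Hann window and then fits a parabola through the log magnitudes of the peak bin and its two neighbors. Parabolic interpolation is a common textbook choice that I'm substituting here, not necessarily what any given ETD does, and it only estimates frequency; it does not attempt the amplitude correction discussed above. The function names and the 440 Hz test tone are assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def hann(n_samples):
    """Hann window: tapers both ends of the sample smoothly to zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples)
            for n in range(n_samples)]

def estimate_peak_hz(samples, rate_hz):
    """Estimate the strongest frequency to a fraction of a bin."""
    size = len(samples)
    tapered = [s * w for s, w in zip(samples, hann(size))]  # extra pass
    mags = [abs(c) for c in fft(tapered)[: size // 2]]
    # Strongest bin, skipping the edges so both neighbors exist.
    k = max(range(1, size // 2 - 1), key=mags.__getitem__)
    # Log magnitudes of the peak bin and its neighbors; the small
    # constant only guards against log(0) on silent input.
    a, b, c = (math.log(mags[k + d] + 1e-12) for d in (-1, 0, 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)  # vertex of the parabola
    return (k + offset) * rate_hz / size

RATE, SIZE = 22050, 4096
tone = [math.sin(2 * math.pi * 440.0 * n / RATE) for n in range(SIZE)]
print(estimate_peak_hz(tone, RATE))
```

For a clean single tone this should land within a small fraction of the 5.38 Hz bin width of the true frequency. A real piano note, with many partials, noise, and decay, will be messier.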
So to display the partial structures of two notes in real time you would have to gather data at a high enough sampling rate to cover the frequencies employed in the display, choose an FFT array large enough to maximize the overall frequency resolution and keep the low frequency minimum in the range of the piano, run the FFT, reprocess the FFT data to find all the frequency peaks, sort and separate the frequencies indicated to determine what peak goes to what note (never mind how you might manage to separate nearly alike coincident partial frequencies that both fall in the same area between two bins), refigure the amplitudes of all the accumulated partial frequencies to counteract the effects of spectral leakage, scale and render the resulting graph onto your working bitmap and copy the whole thing over the previous one, do all the usual program housekeeping and operating system maintenance as another sound sample is being accumulated, and do it all again. That's what the current ETDs are doing, only with considerably more data processing piled on top.

And this is to be a real time deal on the "average" 133 MHz Pentium, never mind the 486 crowd that can still run TuneLab? I'd sure like to see it working when you've taken us past the dancing dial into the future. I'm curious whether any movement will be detectable on the screen with the current hardware. Having gotten a little dirty in this field, I'm prepared to be impressed. Meanwhile, I'd recommend doing a few Google searches on digital signal processing and FFT, and reading for a couple of weeks.

Ron N
This PTG archive page provided courtesy of Moy Piano Service, LLC