An Indexed Archive

Bill Ballard yardbird@pop.vermontel.net
Tue, 17 Sep 2002 07:48:12 -0400


At 12:29 AM -0500 9/17/02, Richard Moody wrote:
>Greetings Andy and List
>     I would like to propose an "indexing" of the Pianotech
>Archives.  This would entail going through the archive and
>retrieving all emails pertinent to piano technology.

That is of course a piece of manual labor, performed on the 117,000 
posts in the archive. Did you have anybody in mind for doing this? I 
hope not Andy. He just finished the massive job of bringing all those 
records forward from a "legacy" format, with all the associated data 
cleaning. And that did not entail any judgement in each individual 
case as to whether the post was "OT" (on-topic) or "OT" (off-topic).

We could certainly come up with an outline of subjects, but it would 
still require someone going through 117K records and tagging each one 
("Board/New/Ribs/Feathering"). Is this a job for a single person, or 
for a massively parallel team? Good question.

>Then entering these posts in HTML or similar format so that they can be
>indexed and accessed  easily by the home PC.

I'd like to encourage people to construct their own searchable 
databases of the archives. Certainly this is a local solution, based 
on an individual's platform (OS), hardware (Processor speed and RAM), 
and resident database.

I've talked with Andy about this, and thanks to his fine work, the 
whole enchilada is both 1.) plain text uncluttered by embedded 
formatting, and 2.) easily readable by an app based on the widely 
recognized Unix .mbox format. My own database will have no trouble 
importing. The most difficult part of the job is converting date info 
in text format into date format, and it's no special trick.
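
As a rough illustration of that date step, here is a minimal Python sketch. 
It assumes the date arrives in the usual email header form (as at the top 
of this post) and relies on the standard library's 
email.utils.parsedate_to_datetime. The function name parse_post_date is 
made up for the example, and Python is not part of the Panorama setup 
described here.

```python
from email.utils import parsedate_to_datetime


def parse_post_date(text):
    """Convert an email-style date string into a timezone-aware datetime."""
    return parsedate_to_datetime(text.strip())


stamp = parse_post_date("Tue, 17 Sep 2002 07:48:12 -0400")
print(stamp.isoformat())  # 2002-09-17T07:48:12-04:00
```

Once the value is a real date rather than text, sorting and range 
selection ("everything from 1999") become ordinary database operations.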

But that's my local solution, based on Mac OS 9.1, DualG4 450MHz 
processors, 1024MB of RAM, and the RAM-based and extremely capable 
cross-platform database Panorama by ProVue Development 
(<http://www.provue.com/panorama.html>). The whole enchilada barely 
takes up 36% of a CD-ROM. (No, I would not expect FileMaker Pro to do 
a good job of this. I have no direct experience with MS Access. By 
hearsay, it has a steep learning curve but is otherwise 
well-equipped.)

That's my solution which might be useful for list members with 
matching hardware/software. I'm sure there are others on the list 
with db experience who can work out solutions for local situations so 
common-denominator as to border on a general solution. (For instance, 
my conversion of the long text file into a database record/field 
structure would yield a tab-delimited text file with any 
data-cleaning already done, and ready for a simple import.)
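
For anyone without Panorama, a conversion of that kind might look like 
the following Python sketch, built on the standard library's mailbox 
module. Several things here are assumptions rather than facts about the 
archive: the four-column layout (date, from, subject, body), the choice 
to collapse all tabs and line breaks inside a field into single spaces, 
the latin-1 decoding of bodies, and every function name. It also assumes 
plain-text, non-multipart messages, which matches the description of the 
archive above.

```python
import mailbox
from email.utils import parsedate_to_datetime

FIELDS = ["date", "from", "subject", "body"]  # assumed column layout


def clean(value):
    """Collapse tabs, newlines and runs of spaces so the value fits one cell."""
    return " ".join(str(value or "").split())


def iso_date(header):
    """Return the Date header as ISO 8601 text, or '' if it cannot be parsed."""
    try:
        return parsedate_to_datetime(header).isoformat()
    except (TypeError, ValueError):
        return ""


def body_text(msg):
    """Return the body of a plain-text, non-multipart message."""
    payload = msg.get_payload(decode=True)  # None for multipart messages
    return payload.decode("latin-1") if payload else ""


def mbox_to_tsv(mbox_path, out):
    """Write a header row plus one tab-delimited row per message.

    `out` is any writable text file object. Returns the message count.
    """
    out.write("\t".join(FIELDS) + "\n")
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            row = [
                iso_date(msg["Date"]),
                clean(msg["From"]),
                clean(msg["Subject"]),
                clean(body_text(msg)),
            ]
            out.write("\t".join(row) + "\n")
            count += 1
    finally:
        box.close()
    return count
```

Flattening the body to one line loses the original line breaks. That is 
the price of a file that any database can import without special quoting 
rules, and a different trade-off may suit a different database.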

>     This would leave the present archive intact with every message
>every ever posted and also offer a "compiled archive" in which
>specific information can be better and more  quickly accessed.

What I have in mind will include streamlined search/select 
capabilities ("Build me a list of every thread containing the words 
'epoxy' AND 'bridge' in the message"), and also a writing console. 
Then of course, I'll need a way of transferring the packaged 
email to Eudora for xmitting. ("Beautiful dreamer, dream unto 
me.....").
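
To make the 'epoxy' AND 'bridge' query concrete, here is a small Python 
sketch of one way it could work. It is only an illustration under stated 
assumptions: records are taken to be (subject, body) pairs, a "thread" is 
approximated by the subject line with any leading "Re:" or "Fwd:" 
stripped, and matching is on whole words regardless of case. The names 
thread_key and threads_matching are invented for the example.

```python
import re


def thread_key(subject):
    """Strip leading Re:/Fw:/Fwd: prefixes, then normalise case and spacing."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(stripped.split()).lower()


def threads_matching(records, words):
    """List threads holding at least one message that contains every word.

    `records` is an iterable of (subject, body) pairs. Threads are
    returned in the order in which their first matching message appears.
    """
    wanted = [w.lower() for w in words]
    hits = []
    for subject, body in records:
        tokens = set(re.findall(r"[a-z0-9']+", body.lower()))
        if all(w in tokens for w in wanted):
            key = thread_key(subject)
            if key not in hits:
                hits.append(key)
    return hits
```

Whole-word matching means a message that says only "bridges" will not 
satisfy a search for "bridge". A substring test would catch it, at the 
cost of more false hits.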

Bill Ballard RPT
NH Chapter, P.T.G.

"The box said 'Requires Windows 98, or better.' So I bought a Macintosh."
     ...........a wag from the Web
+++++++++++++++++++++



This PTG archive page provided courtesy of Moy Piano Service, LLC