[pianotech] re: new list features

Andy Rudoff andy@rudoff.com
Tue, 10 Sep 2002 15:46:37 -0600 (MDT)


>I've been perusing the archives, and just tried the search function.... wow! 
>Much clearer than before, jumps right to the message, sorts the search by 
>how good of a match!  We finally have the messages back to 1994 as a 
>resource that is usable.
>
>Thanks Andy!

You're welcome!  I've been setting up the new server for quite a while
now and the main reason for the delay was the format of the pianotech
archives.  I decided that since pianotech was one of our most valued
assets, the archives really needed to be there and working well before
cutting over to the new server.  For those of you interested, here's
a list of the highlights that happened during this process:

	- I found that the Mailman software had never archived so many
	  messages, so I had to get in and fix many bugs in the code that
	  caused it to die.  The bugs needed fixing and I'll be giving the
	  fixes back to the Mailman maintainers.

	- I found that Mailman's cool archives did not allow a search
	  form at the top.  Another fix.

	- I found that majordomo had stripped some information I needed
	  from the top of messages, so I went back to my personal archives.

	- Many messages had wrong dates in them (years like 1904, or 2011,
	  some from y2k bugs, I think) and Mailman kept trying to create
	  folders for those dates.  I thought that looked confusing so
	  I had to fix them by assigning them dates from messages that
	  came in around the same time.

	- I found the search engine we were using also had problems
	  dealing with the list archives and I had to fix a bug there.
	  (It took over 20 hours to index the archives before the bug
	  fix, now it takes about an hour.)

	- I wrote a program to take my personal archives, sift out all
	  the messages for pianotech, ptg-l, caut, humor, and ce-l, fix
	  all the problems, and produce a file that Mailman could handle.

	- Running the Mailman archiver on the pianotech file takes about
	  6 hours!  So each time I hit a bug I had to fix it and start the
	  run again.
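
The date repair described in the list above can be sketched roughly as
follows.  This is only an illustration, not the actual program: the
function name repair_dates and the 1994-2002 plausible range are
assumptions, and it simply borrows the date of the nearest neighboring
message that has a believable year.

```python
from datetime import datetime


def repair_dates(dates, min_year=1994, max_year=2002):
    """Return a copy of `dates` (in arrival order) with implausible
    entries replaced by the date of a nearby plausible message.

    Hypothetical helper: the year range is an assumed sanity window.
    """
    def plausible(d):
        return min_year <= d.year <= max_year

    repaired = list(dates)

    # Forward pass: a bad date borrows the last good date seen before it.
    last_good = None
    for i, d in enumerate(repaired):
        if plausible(d):
            last_good = d
        elif last_good is not None:
            repaired[i] = last_good

    # Backward pass: bad dates at the very start borrow the first good
    # date that follows them.
    next_good = None
    for i in range(len(repaired) - 1, -1, -1):
        if plausible(repaired[i]):
            next_good = repaired[i]
        elif next_good is not None:
            repaired[i] = next_good

    return repaired
```

With a fix like this, a message stamped 1904 or 2011 is filed in the
same month as the messages that arrived next to it, so the archiver
does not create a folder for the bogus year.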

I'm quite happy with the way it turned out -- I think the archives are
organized in a more sensible way and searching works well.  I expect we'll
run into more "scaling" problems over time, since the archives are so big,
but I'll do my best to keep on top of it.  If we do hit some showstopper
problem, I can always move the past archives to a separate location and
start over (so there would be two places you'd look for old messages)
but I'm trying to avoid it if at all possible.

The one issue I'm not happy with is that I changed the names of all the
archive files from the old majordomo names to the new names.  That's fine
when you're reading them, but if you go to a search engine and find
an old URL it will give you an annoying "not found" error when you try
to look at it.  I'm thinking about a creative fix for that.
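
One possible shape for such a fix, purely as a sketch: keep a table of
old file names and their new names, and look the old name up before
giving a "not found" error.  Nothing here reflects a decided approach;
the function name resolve_old_path and the example paths are made up
for illustration.

```python
def resolve_old_path(old_path, rename_map):
    """Return the new archive path for an old one, or None if the old
    path is not in the table.

    Hypothetical helper: `rename_map` would be built while renaming
    the archive files.
    """
    return rename_map.get(old_path)


# Made-up example entries; the real old and new names are not shown here.
example_map = {
    "/old-archive/example-0001.html": "/new-archive/example-0001.html",
}
```

A web server could consult a table like this and send a redirect to
the new name when an old URL is requested.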

Go back and read some of those old 1994 messages!  It was really
interesting to find discussions that I've since seen a few times on the
list, and there is certainly some great information there...

-andy


This PTG archive page provided courtesy of Moy Piano Service, LLC