[lug] maildir to database format?
Nate Duehr
nate at natetech.com
Wed Apr 30 20:25:53 MDT 2008
David Kritzberg wrote:
> Hello BLUG people,
>
> I am trying to take the contents of an old mail list and turn it into
> some data format that I can analyze. The list is in maildir format.
> I have been looking for any existing tools to convert this data. I
> actually just want information on threads created by an initial post,
> and any replies, with the identity of the poster or replier, and the
> subject line. I'm trying to create a kind of social graph from this,
> for consumption purposes for mailing list members who've stuck it out
> on the list. By the way, I'm not talking about the BLUG list, but
> another one. Has anyone got any advice or suggestion on how to get
> started?
If you had an extra machine to play "server" with, you could set up
something like this... ignoring the SMTP integration (you don't need to
deliver mail to it)...
http://www.dbmail.org/index.php?page=overview
And then using your IMAP client, copy all your mail over to it.
(In Thunderbird, set up the new server and create a folder on it, or use
the Inbox or whatever for your account. Then select all the mail on your
main IMAP server, right-click, Copy To, and the new server and any of its
folders should be there in the list. Great trick for moving to a new or
"test" server too.)
Total overkill. But figured I'd mention it.
I wouldn't run DBMail as a production box... too many security
announcements about it over the last few months, for my tastes...
Figured maybe it was just useful as a tool for doing what you're trying
to do -- and save you from having to hack together something to stuff
your mail into a DB.
I think I'd shoot a backup of my Maildir before attempting any of the
above, too... if the mail is valuable to you.
:-)
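For comparison, the "hack together something" route can be fairly short. The sketch below is only an illustration, assuming Python's standard mailbox and sqlite3 modules; the function name maildir_to_sqlite, the table name "messages", and its columns are made up for this example and are not part of any existing tool. It opens the Maildir for reading and copies just the headers needed for thread analysis.

```python
import mailbox
import sqlite3
from email.utils import parseaddr


def maildir_to_sqlite(maildir_path, db_path):
    """Copy threading-related headers from a Maildir into one SQLite table.

    The table layout is a hypothetical choice for this sketch.
    """
    # create=False: fail if the Maildir is missing instead of creating one.
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT, in_reply_to TEXT, refs TEXT,"
        " sender TEXT, subject TEXT)"
    )
    rows = []
    for msg in box:
        def header(name):
            # str() also covers Header objects returned for non-ASCII values.
            return str(msg.get(name, "")).strip()

        # Keep only the address part of "Name <addr>" as the poster identity.
        _, address = parseaddr(header("From"))
        rows.append((
            header("Message-ID"),
            header("In-Reply-To"),
            header("References"),
            address.lower(),
            header("Subject"),
        ))
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Point it at a copy of the Maildir rather than the original, and pass ":memory:" as db_path to experiment without writing a file.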
> This might be a mess to work with. I see in mutt (the only MUA I'm
> really familiar with) that the thread characteristics are broken in
> places, and I'm not sure how that might have happened, with many
> contributors using many clients, and the mail list software and host
> changing at least once over the course of the history I have (3 years,
> 5000 messages).
It'll DEFINITELY be a mess to work with.
Threading is often broken by people replying to a message and changing
the subject line. Good clients that do proper threading leave the thread
intact, and you just see the subject line change. Other clients do no
"real" threading from the In-Reply-To or References headers, and just
sort by subject line...
Definitely messy.
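One way to cope with the mess is to trust the headers first and fall back to the subject line only when they are missing. The sketch below is a rough heuristic, not a full threading algorithm. It assumes the messages are dicts with the hypothetical keys message_id, in_reply_to, references, sender and subject, that they are already sorted oldest first, and that every message has a unique, non-empty message_id. The function names are invented for this example.

```python
import re
from collections import Counter

# Leading "Re:", "Fw:" or "Fwd:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, collapse whitespace, lowercase."""
    return " ".join(_PREFIX.sub("", subject or "").split()).lower()


def find_parents(messages):
    """Map each message_id to its parent's message_id, or None for a root."""
    known = {m["message_id"] for m in messages}
    first_by_subject = {}
    parents = {}
    for m in messages:
        # Prefer In-Reply-To, then References from the most recent backwards.
        candidates = [(m.get("in_reply_to") or "").strip()]
        candidates += reversed((m.get("references") or "").split())
        parent = next(
            (c for c in candidates
             if c and c in known and c != m["message_id"]),
            None,
        )
        subject = normalize_subject(m.get("subject"))
        if parent is None and subject:
            # Fallback guess: the earliest message with the same subject.
            parent = first_by_subject.get(subject)
        if subject:
            first_by_subject.setdefault(subject, m["message_id"])
        parents[m["message_id"]] = parent
    return parents


def reply_edges(messages, parents):
    """Count (replier, replied-to poster) pairs for a social graph."""
    sender = {m["message_id"]: m["sender"] for m in messages}
    return Counter(
        (sender[child], sender[parent])
        for child, parent in parents.items()
        if parent is not None
    )
```

The subject fallback will wrongly join unrelated posts that happen to share a subject, so it is worth spot-checking the result against what mutt shows for a few known threads.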
Nate