[lug] maildir to database format?

Nate Duehr nate at natetech.com
Wed Apr 30 20:25:53 MDT 2008


David Kritzberg wrote:
> Hello BLUG people,
> 
> I am trying to take the contents of an old mail list and turn it into
> some data format that I can analyze.  The list is in maildir format.
> I have been looking for any existing tools to convert this data.  I
> actually just want information on threads created by an initial post,
> and any replies, with the identity of the poster or replier, and the
> subject line.  I'm trying to create a kind of social graph from this,
> for consumption purposes for mailing list members who've stuck it out
> on the list.  By the way, I'm not talking about the BLUG list, but
> another one.  Has anyone got any advice or suggestion on how to get
> started? 


If you have a spare machine to play "server" with, you could set up 
DBMail, which stores mail in a SQL database. You can ignore the SMTP 
integration, since you don't need to deliver mail to it:

http://www.dbmail.org/index.php?page=overview

Then use your IMAP client to copy all your mail over to it.

(In Thunderbird: set up the new server as a second account, create a 
folder on it or just use the Inbox, then select all the mail on your 
main IMAP server, right-click, and choose Copy To. The new server and 
its folders should be in the list.  It's a great trick for moving to a 
new or "test" server, too.)

Total overkill.  But figured I'd mention it.

I wouldn't run DBMail in production, though... too many security 
announcements about it over the last few months for my taste.

But it might be useful as a tool for what you're trying to do, and 
save you from having to hack something together to stuff your mail 
into a DB.
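
If you'd rather skip the server, a rough sketch of the "hack something 
together" route could look like the following.  It assumes Python's 
standard mailbox and sqlite3 modules; the function name, table layout, 
and column names are made up for illustration, not taken from any 
existing tool.

```python
import mailbox
import sqlite3
from email.utils import parseaddr


def maildir_to_sqlite(maildir_path, db_path):
    """Copy a few headers from every message in a Maildir into SQLite.

    Returns the number of messages stored. The schema is a hypothetical
    one chosen for this sketch.
    """
    # create=False: fail loudly on a bad path instead of creating an
    # empty Maildir there.
    box = mailbox.Maildir(maildir_path, create=False)
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT, in_reply_to TEXT, refs TEXT, "
        "sender TEXT, subject TEXT, date TEXT)"
    )
    count = 0
    for msg in box:
        # parseaddr splits 'Name <addr>' into (name, addr); keep the address.
        _, sender = parseaddr(str(msg.get("From", "")))
        db.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
            (
                str(msg.get("Message-ID", "")).strip(),
                str(msg.get("In-Reply-To", "")).strip(),
                str(msg.get("References", "")).strip(),
                sender,
                str(msg.get("Subject", "")).strip(),
                str(msg.get("Date", "")).strip(),
            ),
        )
        count += 1
    db.commit()
    db.close()
    return count
```

Calling something like maildir_to_sqlite("/path/to/Maildir", "list.db") 
(both paths are placeholders) would give you one row per message that 
you can query by sender, subject, and the reply headers.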

I'd make a backup of the Maildir before attempting any of the above, 
too, if the mail is valuable to you.

:-)

> This might be a mess to work with.  I see in mutt (the only MUA I'm
> really familiar with) that the thread characteristics are broken in
> places, and I'm not sure how that might have happened, with many
> contributors using many clients, and the mail list software and host
> changing at least once over the course of the history I have (3 years,
> 5000 messages). 

It'll DEFINITELY be a mess to work with.

Threading is often broken by people replying to a message and changing 
the subject line.  Good clients that do proper threading keep the 
thread intact, and you just see the subject line change.  Other clients 
don't do "real" threading from the In-Reply-To and References headers, 
and just sort by subject line...
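
As a sketch of the difference, assuming you've already pulled the 
Message-ID, In-Reply-To, References, and Subject headers into dicts 
(the key names and function names here are hypothetical), header-based 
linking with a subject-line fallback might look like this.  The 
fallback is only a guess, which is exactly where the mess comes from.

```python
import re

# Message ids look like <something@host>; pull them out of header text.
_MSGID = re.compile(r"<[^<>\s]+>")
# One or more leading "Re:" / "Fw:" / "Fwd:" prefixes, any case.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    return " ".join(_REPLY_PREFIX.sub("", subject).lower().split())


def link_replies(messages):
    """Map each message id to its parent id, or None for a thread root.

    messages: list of dicts with keys 'id', 'in_reply_to', 'references',
    'subject' (only 'id' is required), assumed sorted oldest first.
    """
    known = set()
    first_by_subject = {}
    parents = {}
    for m in messages:
        # Prefer In-Reply-To, then walk References from newest to oldest.
        candidates = _MSGID.findall(m.get("in_reply_to", ""))
        candidates += reversed(_MSGID.findall(m.get("references", "")))
        parent = next((c for c in candidates if c in known), None)

        subject = m.get("subject", "")
        key = normalize_subject(subject)
        looks_like_reply = bool(_REPLY_PREFIX.match(subject))
        if parent is None and looks_like_reply and key in first_by_subject:
            # No usable headers: guess from the subject line instead.
            parent = first_by_subject[key]

        parents[m["id"]] = parent
        known.add(m["id"])
        if key:
            first_by_subject.setdefault(key, m["id"])
    return parents
```

With the sender stored next to each id, every (message, parent) pair 
becomes a "who replied to whom" edge for the kind of social graph 
David describes.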

Definitely messy.

Nate


