[lug] maildir to database format?

David Kritzberg david.kritzberg at colorado.edu
Sun May 4 14:15:51 MDT 2008


* Nate Duehr <nate at natetech.com> [2008-04-30 20:25:53 -0600]:

> David Kritzberg wrote:
>> Hello BLUG people,
>> I am trying to take the contents of an old mail list and turn it into
>> some data format that I can analyze.  The list is in maildir format.
>> I have been looking for any existing tools to convert this data.  I
>> actually just want information on threads created by an initial post,
>> and any replies, with the identity of the poster or replier, and the
>> subject line.  I'm trying to create a kind of social graph from this,
>> for consumption purposes for mailing list members who've stuck it out
>> on the list.  By the way, I'm not talking about the BLUG list, but
>> another one.  Has anyone got any advice or suggestions on how to get
>> started?
>
> If you had an extra machine to play "server" with, you could set up 
> something like this... ignoring the SMTP integration (you don't need to 
> deliver mail to it)...
>
> http://www.dbmail.org/index.php?page=overview

Thanks, dbmail looks cool, if I can figure out how to import maildir
into it.  The dbmail wiki shows that it is possible to go the other
direction.  Once the mail is in an SQL-friendly format, I could play
around with the data.
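
If the dbmail import route turns out to be awkward, one alternative (not
dbmail itself) is to pull just the headers needed for thread analysis
straight out of the maildir with Python's standard-library `mailbox` and
`sqlite3` modules.  This is a minimal sketch assuming Python 3; the table
layout and the names `load_maildir` and `first_msgid` are made up for
illustration.

```python
import mailbox
import re
import sqlite3
from email.utils import parseaddr

# Matches one "<local@domain>" message identifier.
MSGID_RE = re.compile(r"<[^>]+>")


def first_msgid(value):
    """Return the first <...> message ID in a header value, or None."""
    match = MSGID_RE.search(str(value or ""))
    return match.group(0) if match else None


def load_maildir(maildir_path, db_path=":memory:"):
    """Copy the headers needed for thread analysis into an SQLite table.

    Messages without a Message-ID are skipped; duplicate IDs (e.g. the
    same post saved twice) are ignored.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               msg_id      TEXT PRIMARY KEY,
               in_reply_to TEXT,
               sender      TEXT,
               subject     TEXT,
               date        TEXT)"""
    )
    # create=False: fail instead of silently creating an empty maildir.
    box = mailbox.Maildir(maildir_path, create=False)
    for msg in box:
        msg_id = first_msgid(msg.get("Message-ID"))
        if msg_id is None:
            continue
        # Use the bare, lowercased address as the poster's identity.
        _, address = parseaddr(str(msg.get("From", "")))
        conn.execute(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
            (
                msg_id,
                first_msgid(msg.get("In-Reply-To")),
                address.lower(),
                str(msg.get("Subject", "")),
                str(msg.get("Date", "")),
            ),
        )
    conn.commit()
    return conn
```

Something like `load_maildir("/path/to/saved-list", "list.db")` (a
hypothetical path) would leave a file that any SQLite client can query.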

Actually, I don't manage the list or need to host it; I just have all
the saved posts.

Dave

>> This might be a mess to work with.  I see in mutt (the only MUA I'm
>> really familiar with) that threading is broken in places, and I'm not
>> sure how that happened.  There have been many contributors using many
>> clients, and the list software and host changed at least once over the
>> course of the history I have (3 years, 5000 messages).
>
> It'll DEFINITELY be a mess to work with.
>
> Threading is often broken by people replying to a message and changing the
> subject line.  Good clients, doing proper threading, leave the thread intact,
> but you see the subject line change... other clients have no "real"
> threading based on "In-Reply-To" or other headers, and just sort by subject
> line...
>
> Definitely messy.
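
To see why subject-only grouping is fragile, here is a minimal sketch of
how a client without real threading might group messages: strip list tags
like "[lug]" and "Re:"/"Fwd:" prefixes, then group on whatever remains.
It assumes subjects have already been extracted; the function names and
the prefix list are illustrative assumptions, not any particular client's
rules.

```python
import re
from collections import defaultdict

# Strips any run of leading list tags ("[lug]") and Re:/Fw:/Fwd: prefixes.
PREFIX_RE = re.compile(r"^\s*((re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop list tags and reply/forward prefixes, then casefold."""
    return PREFIX_RE.sub("", subject).strip().casefold()


def group_by_subject(messages):
    """Group (msg_id, subject) pairs by normalized subject."""
    groups = defaultdict(list)
    for msg_id, subject in messages:
        groups[normalize_subject(subject)].append(msg_id)
    return dict(groups)
```

A reply whose subject was edited (say, to "Re: [lug] threading (was: ...)")
ends up in a new group even though it belongs to the same conversation.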

Mutt seems to ignore the subject line and use only the In-Reply-To
header, as you suggest.  For example, later in this thread, when the
subject line changes, my client still keeps the messages together.
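
That header-based approach can be approximated in a few lines: follow each
message's In-Reply-To link up to the message that started the thread and
ignore the subject entirely.  The sketch below (Python; the tuple format
and the `build_threads` name are assumptions for illustration) also counts
who replied to whom, one plausible starting point for the social graph.
It uses only In-Reply-To; many clients also consult the References header,
which this sketch does not.

```python
from collections import Counter


def build_threads(messages):
    """messages: iterable of (msg_id, in_reply_to, sender) tuples.

    Returns (root_of, reply_edges):
      root_of     maps each msg_id to the msg_id that started its thread
      reply_edges counts (replier, original_poster) pairs
    Replies whose parent is missing from the archive are treated as roots.
    """
    messages = list(messages)
    parent = {m: p for m, p, _ in messages}
    sender = {m: s for m, _, s in messages}

    root_of = {}
    for msg_id in parent:
        seen = set()
        node = msg_id
        # Walk up In-Reply-To links until we reach a message with no
        # archived parent; the seen-set guards against reply loops.
        while parent.get(node) in parent and node not in seen:
            seen.add(node)
            node = parent[node]
        root_of[msg_id] = node

    reply_edges = Counter()
    for msg_id, parent_id, replier in messages:
        if parent_id in sender:
            reply_edges[(replier, sender[parent_id])] += 1
    return root_of, reply_edges
```

The tuples could come from any source, for example a query over whatever
table the headers were loaded into.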

Dave

____________________________
Dave Kritzberg
http://dijon.colorado.edu/
Dave.Kritzberg at gmail.com


