[lug] maildir to database format?
David Kritzberg
david.kritzberg at colorado.edu
Sun May 4 14:15:51 MDT 2008
* Nate Duehr <nate at natetech.com> [2008-04-30 20:25:53 -0600]:
> David Kritzberg wrote:
>> Hello BLUG people,
>> I am trying to take the contents of an old mail list and turn it into
>> some data format that I can analyze. The list is in maildir format.
>> I have been looking for any existing tools to convert this data. I
>> actually just want information on threads created by an initial post,
>> and any replies, with the identity of the poster or replier, and the
>> subject line. I'm trying to create a kind of social graph from this,
>> for consumption purposes for mailing list members who've stuck it out
>> on the list. By the way, I'm not talking about the BLUG list, but
>> another one. Has anyone got any advice or suggestion on how to get
>> started?
>
> If you had an extra machine to play "server" with, you could set up
> something like this... ignoring the SMTP integration (you don't need to
> deliver mail to it)...
>
> http://www.dbmail.org/index.php?page=overview
Thanks, dbmail looks cool, if I can figure out how to convert from
maildir with it. I see on the dbmail wiki that it is possible to go
the other direction. Once the mail is in a SQL-friendly format, I
could play around with the data.
Actually I don't manage or need to host the list, I just have all the
saved posts.
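
Since only the saved posts are needed and not a running mail server, one
alternative is to read the maildir directly and copy a few headers into a
local database. The following is a minimal sketch under assumptions, not
a dbmail feature: it uses Python's standard mailbox and sqlite3 modules,
and the function name, the table name "messages", and the column names
are all made up for illustration.

```python
import mailbox
import sqlite3
from email.utils import parseaddr


def _header(msg, name):
    """Return a header as a stripped string, or None if it is absent."""
    value = msg.get(name)
    return str(value).strip() if value is not None else None


def maildir_to_sqlite(maildir_path, db_path):
    """Copy selected headers of every maildir message into one table.

    The schema below is a hypothetical choice for this sketch.
    Returns the number of messages stored.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    conn = sqlite3.connect(db_path)
    count = 0
    with conn:  # commit once at the end, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " message_id TEXT, in_reply_to TEXT,"
            " sender TEXT, subject TEXT, date TEXT)"
        )
        for msg in box:
            # Keep only the address part of From, lowercased, so the
            # same poster is counted once even if the display name varies.
            _, address = parseaddr(_header(msg, "From") or "")
            conn.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
                (
                    _header(msg, "Message-ID"),
                    _header(msg, "In-Reply-To"),
                    address.lower(),
                    _header(msg, "Subject"),
                    _header(msg, "Date"),
                ),
            )
            count += 1
    conn.close()
    box.close()
    return count
```

Calling maildir_to_sqlite("path/to/maildir", "mail.db") would leave one
row per message, which can then be queried with ordinary SQL. Message
bodies are deliberately skipped, since only poster, subject, and reply
information are wanted here.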
Dave
>> This might be a mess to work with. I see in mutt (the only MUA I'm
>> really familiar with) that the thread characteristics are broken in
>> places, and I'm not sure how that might have happened, with many
>> contributors using many clients, and the mail list software and host
>> changing at least once over the course of the history I have (3 years,
>> 5000 messages).
>
> It'll DEFINITELY be a mess to work with.
>
> Threading is often broken by people replying to a message and changing the
> subject line. Good clients doing proper threading leave the thread intact,
> but you see the subject line change... other clients have no "real"
> threading of "in-response-to" or other headers, and just sort by subject
> line...
>
> Definitely messy.
mutt seems to ignore the subject line and just use the in-reply-to,
as you suggest. For example, later in this thread, when the subject
line changes, my client doesn't care.
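
The same idea (follow In-Reply-To, ignore the subject) could be used to
build the social graph. Below is a rough sketch, assuming each message
has already been reduced to a (message_id, in_reply_to, sender) tuple;
the function names are hypothetical. It does not consult the References
header, which might recover some of the broken threads, and replies
whose parent is missing from the archive are simply skipped.

```python
from collections import Counter


def reply_edges(messages):
    """Count who replied to whom.

    messages: a list of (message_id, in_reply_to, sender) tuples.
    Returns a Counter keyed by (replier, original_poster).
    """
    sender_of = {mid: sender for mid, _, sender in messages}
    edges = Counter()
    for _, parent, sender in messages:
        # Thread starters have no parent; some parents are not archived.
        if parent is not None and parent in sender_of:
            edges[(sender, sender_of[parent])] += 1
    return edges


def thread_root(message_id, parent_of):
    """Follow In-Reply-To links upward to the first post of a thread.

    parent_of: dict mapping each known message_id to its in_reply_to
    value (or None). Stops at a missing parent or a reply loop.
    """
    seen = {message_id}
    while True:
        parent = parent_of.get(message_id)
        if parent is None or parent not in parent_of or parent in seen:
            return message_id
        seen.add(parent)
        message_id = parent
```

Grouping messages by thread_root gives the threads, and the counts from
reply_edges give weighted edges between list members.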
Dave
____________________________
Dave Kritzberg
http://dijon.colorado.edu/
Dave.Kritzberg at gmail.com