Photo by Irish Typepad
Dr John Wang [Update: sorry, wrong John Wang!] has just started a new site called EnronData.org, dedicated to developing and refining the Enron email dataset. It's off to a cracking start, offering all the Enron emails as 148 PST files, one for each 'custodian' (informally, each mail user). I did my own PST conversion, but primarily so that I had a large data set to load onto an Exchange server and test Mailana against. John's version is much closer to the original source data, so it will be a more realistic test for applications.
I'm really pleased John has put this together; it will be a boon to anyone doing heavy-duty email data mining. I can't wait to see what else the project produces.