[Update: There's now a good alternative that includes separate PSTs for each user]
After getting the Enron emails into the mbox format, the next step was to convert them into something that the Outlook/Exchange world can understand. Thankfully I already had a great conversion program in mind, Aid4Mail. At its core it's a translator between a large number of mail formats, including Outlook, Outlook Express, Windows Mail, Eudora, Thunderbird, Netscape Messenger, Pegasus Mail and a whole bunch of generic formats including several mbox variants. It can read and write all of these formats, and has a large number of options to transform the mail as you do so. For example, you can choose to only convert mails sent between certain dates, or to ignore attachments. If you're working with mail, I highly recommend giving this program a try. It's the Swiss Army knife of email tools.
To do the Enron conversion, I selected generic unix mbox as the input format. On the next screen I navigated to the root folder that contained all my files, and then chose Outlook pst as the destination type. I left all the other options at their defaults, so no filtering was done and the folder hierarchy was preserved. It took around 16 hours to process all 500,000 messages, and the pst file came out at around 5 GB.
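Aid4Mail is driven through its GUI, so there's no script to show for the conversion itself. If you just want a feel for what a date filter like the one mentioned above does to an mbox file, here's a minimal sketch using Python's standard mailbox module. It is not how Aid4Mail works internally, and the function names and paths are ones I've made up for illustration. Messages with a missing or unparseable Date header are skipped, which is an assumption you may want to change.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the message's Date header as an aware UTC datetime, or None."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
        return None
    if parsed.tzinfo is None:
        # Assumption: treat dates without a zone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def filter_mbox_by_date(src_path, dst_path, start, end):
    """Copy messages dated in [start, end) from src_path to dst_path.

    start and end must be timezone-aware datetimes.
    Returns the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    kept = 0
    try:
        for msg in src:
            when = message_date(msg)
            if when is not None and start <= when < end:
                dst.add(msg)
                kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept
```

Calling filter_mbox_by_date("in.mbox", "out.mbox", start, end) with two UTC datetimes writes the matching messages to a new mbox, which could then be fed to a converter in place of the full set.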
I'm able to open it in Outlook and browse through the messages, and can also add them to my Exchange server. There are some issues: it doesn't preserve the original user structure, since they're all in one pst, attachments aren't included, and some of the addresses are obfuscated. Still, it's good enough to give me the testbed I need to put some of my tools through some real-world stress tests.
Once the upload has finished, you should be able to access the pst yourself at
http://funhousepicture.com/enron.pst
It's 5 GB, so it won't be all there for a good few hours, and be prepared for a long download time.
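With a file this size, a quick sanity check after downloading is to compare the size on disk with the size on the server. Here's a minimal Python sketch of that check. The byte count is the one reported by ls -l on the server (quoted in the comments below); the local path and the function name are placeholders I've made up, and a matching size only rules out truncation, not corruption.

```python
import os

# Size of enron.pst as reported by `ls -l` on the server.
EXPECTED_BYTES = 5975000064


def check_download(path, expected_bytes=EXPECTED_BYTES):
    """Return (is_complete, missing_bytes) for the file at path."""
    actual = os.path.getsize(path)
    missing = expected_bytes - actual
    return missing == 0, missing


if __name__ == "__main__":
    local_path = "enron.pst"  # hypothetical download location
    if os.path.exists(local_path):
        complete, missing = check_download(local_path)
        if complete:
            print("Size matches the server copy.")
        else:
            print(f"Size differs from the server copy by {missing} bytes.")
    else:
        print(f"{local_path} not found.")
```

If the sizes differ, the download was cut short somewhere, and it's worth checking the filesystem and the download tool before blaming the file.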
I struggled with this conversion for a while when I went from Pegasus Mail to Outlook 2007.
What I found was a free way of doing it. You can use Gmail's IMAP feature to get the msgs from the Pegasus Folder up to the Gmail IMAP folder, and then back down to Outlook via IMAP.
Worked great. Wasn't terribly quick mind you (especially if you've got 5GB of mail), but it DOES work, and it's free.
Posted by: Dopefish | March 26, 2008 at 11:09 AM
Thanks Joe, that's very true. I'd considered running a local IMAP server, but hadn't thought of Gmail. If anyone else is dealing with less than 5 GB of data, that's probably a great way to go, plus you've got it all backed up by Google!
Posted by: Pete Warden | March 28, 2008 at 06:19 PM
You mention 5GB of pst, but the pst download is only 2GB. Please comment as I'd like to obtain the full 5GB of Enron mail.
p.s Thank you for making this available.
Posted by: Shawn Owens | May 06, 2008 at 09:05 AM
Sorry to hear you're having problems with it Shawn. I just investigated by ssh-ing into the server, and it claims the full 5GB pst is there:
ls -l enron.pst
-rw-r--r-- 1 petew users 5975000064 Mar 25 23:10 enron.pst
Once I'm back home on a decent connection, I'll try downloading it to confirm it's working.
Posted by: Pete Warden | May 09, 2008 at 06:53 PM
Thanks for converting the dataset to Outlook pst format.
As Shawn mentioned, the enron.pst file is 2GB instead of 5GB.
I tried to import it into my Outlook but got a message that the enron.pst is not a personal folders file.
Is it due to incomplete download or file corruption?
Appreciate if you can help. Thanks!
Posted by: maria chen | May 17, 2008 at 05:04 AM
I spoke to Maria some more, and it turned out that she was using a FAT32 filesystem as the download location. FAT32 can't hold an individual file of 4GB or more, so the full 5GB pst won't fit, which explains what went wrong.
This was the default file system on many versions of Windows, so if you're hitting a similar issue, make sure your volume is NTFS.
Posted by: Pete Warden | May 18, 2008 at 01:03 PM
The file size showed up as 1.56GB at the time of saving, while downloading to an NTFS volume. Also, as a previous user stated, MS Outlook did not recognize the PST file, again possibly because the download was incomplete.
For data transfer, I find that WinRAR would apply sufficient compression to reduce the mail store and facilitate quicker downloading.
Appreciate any help. Thanks
Posted by: Tim Fehilly | June 13, 2008 at 07:19 AM
I've heard back from both Shawn and Maria, and they have now successfully downloaded and opened the PST. Once I'm back home I'll give it another try and make sure it's still accessible.
Posted by: Pete Warden | June 13, 2008 at 05:54 PM