
How to extract and categorize email addresses

[Image: Headfiling. Photo by SlightlyLessRandom]

It's possible to extract some interesting information from someone's email address, such as which organization they represent, what type of organization it is, and whether it's a work or personal account. This is useful if you want to do automatic contact location in a Spoke-like way (e.g. "who do I know at company X?"), and for statistical analysis of large email stores, as I do in my own project, Mailana.
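As a rough illustration, here is a minimal sketch of how such a lookup could work, assuming a whitelist keyed by domain. The names `WHITELIST`, `extract_domain`, `candidate_domains` and `categorize` are hypothetical, and the `example.*` entries are placeholders rather than rows from the real list. Two further choices are assumptions, not necessarily how the downloadable categorizer works: subdomains such as `cs.example.edu` fall back to their parent domain, and ISP addresses are treated as personal accounts while everything else counts as work.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample entries: domain -> (display name, type).
# The type labels "company", "college" and "isp" are assumed.
WHITELIST: Dict[str, Tuple[str, str]] = {
    "example.com": ("Example Corp", "company"),
    "example.edu": ("Example University", "college"),
    "example.net": ("Example Net", "isp"),
}

# Assumption: ISP/webmail addresses are personal; all other types are work.
PERSONAL_TYPES = {"isp"}


def extract_domain(address: str) -> Optional[str]:
    """Return the lower-cased domain of an address, or None if malformed."""
    address = address.strip()
    # Accept 'Jane Doe <jane@example.com>' as well as bare addresses.
    if "<" in address and address.endswith(">"):
        address = address[address.rindex("<") + 1 : -1]
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower().rstrip(".")


def candidate_domains(domain: str) -> List[str]:
    """The domain and its parent domains, most specific first (no bare TLD)."""
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def categorize(address: str) -> Optional[Dict[str, str]]:
    """Look the address's domain (then its parents) up in the whitelist."""
    domain = extract_domain(address)
    if domain is None:
        return None
    for candidate in candidate_domains(domain):
        if candidate in WHITELIST:
            name, org_type = WHITELIST[candidate]
            account = "personal" if org_type in PERSONAL_TYPES else "work"
            return {"organization": name, "type": org_type, "account": account}
    return None  # Unknown domain: not covered by the whitelist.


if __name__ == "__main__":
    print(categorize("Jane Doe <jane@cs.example.edu>"))
    # {'organization': 'Example University', 'type': 'college', 'account': 'work'}
    print(categorize("bob@example.net"))
    # {'organization': 'Example Net', 'type': 'isp', 'account': 'personal'}
    print(categorize("carol@unknown.org"))  # None
```

The parent-domain fallback keeps the list short, because one entry per organization can cover all of its departmental subdomains. For country-code domains like `example.co.uk` it will also try `co.uk`, which is harmless as long as no such suffix appears in the whitelist.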

The key is the 80/20 rule: roughly 80% of emails come from 20% of organizations. That makes it feasible to create a whitelist that covers the most common US companies, colleges and ISPs, noting each organization's type and full name. With Liz's help, I've put together an initial list of 2,200 organizations.
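If you want to check how well the 80/20 rule holds for your own mail, and which domains are worth adding to a whitelist first, a simple frequency count is enough. This is a sketch under my own assumptions, not something the categorizer itself does: `domains_covering` is a hypothetical helper that takes raw addresses and returns the smallest set of most frequent domains whose messages reach a target share (80% by default), along with the share actually reached.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def domains_covering(
    addresses: Iterable[str], target: float = 0.8
) -> Tuple[List[str], float]:
    """Return the most frequent domains needed to cover `target` of all
    messages, plus the fraction of messages they actually cover."""
    # Count one hit per address, keyed by the part after the last '@'.
    counts = Counter(
        a.rpartition("@")[2].strip().lower() for a in addresses if "@" in a
    )
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen: List[str] = []
    covered = 0
    # Greedily take domains from most to least frequent until the target is met.
    for domain, n in counts.most_common():
        if covered / total >= target:
            break
        chosen.append(domain)
        covered += n
    return chosen, covered / total


if __name__ == "__main__":
    sample = ["a@x.com"] * 5 + ["b@y.com"] * 3 + ["c@z.com", "d@w.com"]
    print(domains_covering(sample))  # (['x.com', 'y.com'], 0.8)
```

Running this over the From: addresses of a real mailbox shows how long the tail is; the domains it returns, minus the ones already in the list, are natural candidates to add.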



You can also download the source code and the domain list at http://mailana.com/labs/addresscategorizer.zip

It's definitely not infallible, but it's good enough to be useful for my purposes. The more organizations are added, the more accurate it gets. To add your own, edit the domaininformation.txt file, which has one line per organization in this format:

organization domain|display name|type
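Reading that file back in takes only a few lines. The sketch below assumes the three pipe-separated fields are the only content on each line; how blank or malformed lines are handled (skipped here) and the file's text encoding are my assumptions, and `parse_domain_information` is a hypothetical name rather than a function from the download.

```python
from typing import Dict, Iterable, Tuple


def parse_domain_information(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Parse 'organization domain|display name|type' lines into
    {domain: (display_name, type)}.

    Blank lines and lines without exactly three fields are skipped
    (an assumption about the file). If a domain appears twice, the
    later line wins.
    """
    table: Dict[str, Tuple[str, str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not fields[0]:
            continue
        domain, display_name, org_type = fields
        table[domain.lower()] = (display_name, org_type)
    return table


if __name__ == "__main__":
    sample = [
        "example.com|Example Corp|company\n",
        "\n",
        "example.edu|Example University|college\n",
        "malformed line without pipes\n",
    ]
    # With the real file (UTF-8 is an assumption):
    #   with open("domaininformation.txt", encoding="utf-8") as f:
    #       table = parse_domain_information(f)
    table = parse_domain_information(sample)
    print(table["example.edu"])  # ('Example University', 'college')
    print(len(table))            # 2
```

Because the parser accepts any iterable of lines, an open file object can be passed straight in, and the resulting dictionary has the same shape as the whitelist used for lookups.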

Let me know if you do generate a larger list you're willing to share, and I'll update the example. Thanks to Christine DeMello for compiling her directory of colleges.
