Image by Yankee in Canada
One of my favorite parts of my own Facebook research has been discovering existing work in this area that I didn't know about. Here are some of the most interesting papers:
Inference of Profile Elements of Individuals Using Publicly Available Social Web Data
Using Rapleaf's massive data store of publicly available social network data, Piotr Kozikowski wrote his master's thesis on inferring attributes like gender, location and age from other known information about a person.
http://current.cs.ucsb.edu/fac
Contains details on the EuroSys '09 academic data set, which includes both connections and interactions for Facebook.
Real-world separation effects in an online social network
A paper on how geography influences social networks, using public friendship data from 30,000 users of a German social network.
http://randomwalker.info/
Arvind's got a few notes about the LiveJournal, Twitter and Flickr data they're using. It sounds like Mislove has been willing to share LiveJournal network data with other academics in the past.
Cameron Marlow is the head of Facebook's data mining team, and covers their internal research on his blog.
Finally, it's in a different area, but one of the scariest datasets I've run across is the Enron collection of 500,000 emails released as part of the investigation. I was a heavy user of this for developing my email services, but I'm still amazed it's out there!