I'm excited to be presenting at Defrag again this year; Eric Norlin gathers together an amazing bunch of people. As I was exchanging emails with him about the conference, I found the theme to The Wombles, the kids' show from the '70s, going through my head. I've always struggled to find the right label for what I do. Implicit Data was the original inspiration for Defrag, and these days Big Data is in vogue, but neither is very descriptive. I realized that Recycled Data might be a better theme, which makes me a Womble:
What's really changed in the last few years is that the technology for grabbing large amounts of data and analyzing it is now incredibly cheap. Just as mining companies are using new technology to extract metal from decades-old piles of waste material, so researchers are starting to pull useful information from data that the big players see as valueless.
I think the root cause of my troubles with Facebook was that they didn't realize how rich a source of information the public profiles they exposed to search engines were. Each profile displayed only a name, a handful of friends and some pages the user liked, which seemed worthless on its own. What they didn't understand was that if you have enough of them, important and interesting patterns start to emerge. Even junk data becomes valuable at scale. Who'd have thought that analyzing which pages link to each other could become a gushing fountain of money for Google, once they had enough pages crawled?
I feel like a kid in a candy store: there are so many great sources of public data to choose from that I hardly know what to visualize first, and I'm surprised there aren't more people taking advantage of this bounty. From Crunchbase to Google Profiles, Twitter and US Census data, make good use of the things you can find, things that the everyday folks leave behind, and remember you're a Womble: