Brad asked an interesting question on his blog today - Boulder seems packed with entrepreneurs, but what's the real density of that sort of folk relative to the general population? His guess is that almost everyone in Boulder is either working at an entrepreneurial company or going to college.
The data to answer that is floating around on the web, so I thought it would be a great demo of the value of grabbing data in bulk (as opposed to siphoning data through a preset API), and of visualizing the results. Crunchbase has a liberal robots.txt and data license, so I wrote up a crawler that pulled down the information on all 45,000 companies in their database. The US Census releases population data for zip codes, so then it was just a simple matter of programming to derive some per-person stats for different areas. I didn't trust the employee counts in Crunchbase (they're not the first thing someone would update), so instead I chose a couple of related indicators - the total number of companies in a location, and how much venture money they'd raised between them. Here are the top 10 zip codes for each category:
Amount raised per-person
CA 94104 - $629m total - $1,681,925 per person
CA 94304 - $2,822m total - $1,656,031 per person
CA 94105 - $972m total - $472,540 per person
MA 02142 - $1,013m total - $448,833 per person
IL 60606 - $739m total - $439,744 per person
CA 92121 - $1,826m total - $429,847 per person
CA 95113 - $202m total - $373,077 per person
MA 02210 - $135m total - $229,442 per person
WA 98033 - $5,662m total - $186,292 per person
NY 10004 - $168m total - $137,404 per person
Companies per-person
CA 94104 - 87 companies - 0.233 per person
CA 94105 - 173 companies - 0.084 per person
CA 95113 - 24 companies - 0.044 per person
MA 02142 - 73 companies - 0.032 per person
MA 02210 - 19 companies - 0.032 per person
CA 94111 - 103 companies - 0.031 per person
CA 92121 - 116 companies - 0.027 per person
NY 10004 - 29 companies - 0.024 per person
IL 60606 - 39 companies - 0.023 per person
NY 10005 - 20 companies - 0.023 per person
This is a crude approach to take, since the Crunchbase data may not be a representative sample, etc., but it gives a good first approximation. I've open-sourced all the code and data, so if you have ideas on improving this, jump in.
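For anyone who wants to try the per-person calculation themselves, here's a minimal sketch of how the numbers above could be derived. It is not the code from the open-sourced project; the function names (`per_person_stats`, `top_n`) and the input shapes (a list of `(zip_code, raised_usd)` pairs and a zip-to-population dictionary) are assumptions made up for illustration.

```python
from collections import defaultdict


def per_person_stats(companies, population_by_zip):
    """Aggregate companies by zip code and divide by census population.

    companies: iterable of (zip_code, raised_usd) pairs (hypothetical shape).
    population_by_zip: dict mapping zip_code -> resident population.
    """
    totals = defaultdict(lambda: {"companies": 0, "raised": 0})
    for zip_code, raised in companies:
        totals[zip_code]["companies"] += 1
        totals[zip_code]["raised"] += raised

    stats = {}
    for zip_code, t in totals.items():
        pop = population_by_zip.get(zip_code, 0)
        if pop <= 0:
            # No usable population figure, so a per-person value is undefined.
            continue
        stats[zip_code] = {
            "companies": t["companies"],
            "raised": t["raised"],
            "companies_per_person": t["companies"] / pop,
            "raised_per_person": t["raised"] / pop,
        }
    return stats


def top_n(stats, key, n=10):
    """Return the n zip codes with the highest value for the given key."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1][key], reverse=True)
    return ranked[:n]
```

Calling `top_n(stats, "raised_per_person")` and `top_n(stats, "companies_per_person")` would give the two rankings. Note that the divisor is resident population, so zip codes where few people live but many companies have offices will float to the top.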
Next, of course, I wanted to visualize this data. Thanks to the sheer mindblowing awesomeness(*) of my OpenHeatMap project, all I had to do was upload my spreadsheets to get these maps of the data:
And here are a couple of detailed views of the funds raised in Colorado and the Bay Area:
* Mileage may vary. Standard terms and conditions apply