Ayasdi - A very seductive new visualization and analysis tool; it feels like they've learned a lot from Palantir's success.
Benford's Law: A revised analysis - I'd been using the original study, which used Benford's Law to analyze public company accounts for fraud over time, as a poster child for applying numeric methods to journalism. I'm sorry to see that it turned out to be a bogus correlation (caused by an increase of zeroes in revenue figures), but it's a good reminder of how important peer review and humility are as we charge ahead with our new techniques. It's the sort of mistake that keeps me awake at night, knowing how easy it would be to make.
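For anyone who hasn't met it, Benford's Law predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d), so a 1 leads about 30% of the time and a 9 under 5%. Below is a minimal Python sketch of that comparison. The function names are hypothetical, and the mean absolute deviation score is just one simple way to summarize the gap; it is an assumption here, not necessarily the statistic the study used.

```python
import math
from collections import Counter

def benford_expected():
    """Expected leading-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Return the leading non-zero digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    if isinstance(x, int):
        # Integers (e.g. revenue figures) can be read exactly from their decimal string.
        return int(str(abs(x))[0])
    x = abs(x)
    # Scale floats into [1, 10) so the integer part is the leading digit.
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def first_digit_frequencies(values):
    """Observed share of each leading digit 1-9, ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

def mean_absolute_deviation(values):
    """Average absolute gap between observed and Benford-expected digit shares."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Example: powers of two are a classic data set that closely follows Benford's Law.
powers_of_two = [2 ** k for k in range(1000)]
print(round(mean_absolute_deviation(powers_of_two), 4))
```

A score like this only says that a set of numbers looks unusual, not why. As the revised analysis shows, a quirk in how figures are reported can push the distribution away from Benford's curve without any fraud being involved.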
Tika - A lovely collection of open source code for converting all sorts of file formats to text. I built some similar functionality into the Data Science Toolkit, but I'm excited to see an Apache-supported alternative.
Stanford Part-of-Speech Tagger - A walk-through of a slick project for tagging each word in unstructured English-language text with its part of speech.
The Next Big Thing - How Amazon should be using their information on customers' book habits to drive a social network. I'm convinced that implicit signals will win out over the follow/friend model when it comes to building communities of people, but nobody's built an example that actually works yet.