Heritrix - The open-source web crawler created by the Internet Archive, with its source code publicly available. It's easy to get started writing a crawler, but there are a lot of deep issues you have to wrestle with if you want sophisticated features, so it's great to have production-tested code to reference.
The animals of O'Reilly - A wonderful initiative to highlight worthwhile wildlife projects, many of which involve fascinating technology hacks.
What makes Paris look like Paris? - Automatically extracting the visual elements that define a place.
MangoDB - MongoDB has been fine for the applications I've used it on, and the support has been top-notch, but some frustrated person has put way too much thought into this open-source parody.