Photo by Larry & Flo
Have you noticed more useless or deceptive links showing up in your Google search results? I have, and it looks like I'm not alone. More and more often when I run a technical query, I click on pages whose summaries suggest they might have the answer, only to find they're bait-and-switch. They're usually a small amount of text copied from somewhere else, surrounded by a mass of related ads.
Why is this happening? The short answer is that publishers are getting better at tricking Google into ranking low-quality articles highly. There's always been a battle between content producers, who want their content to appear in search results, and search engines, which are trying to send their users to relevant articles. In the '90s it was enough to repeat popular keywords hundreds of times at the bottom of the page, but Google killed such simple approaches using a combination of PageRank and algorithms to assess the quality of the content.
Unfortunately, truly evaluating the quality of a text article is an AI-complete problem. Instead, Google has relied on statistical tests to spot repetitions, copied content and obvious nonsense, but publishers have figured out the limits of that approach and are busy churning out cheap, low-quality content that squeaks through the tests.
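To make "statistical tests for copied content" a bit more concrete, here's a minimal sketch of one textbook technique: break each text into overlapping word sequences ("shingles") and measure how much the two sets overlap. To be clear, this is not Google's actual method, which isn't public. The function names, the shingle size and the 0.5 threshold are all made up for illustration.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text too short for a full shingle: treat it as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(candidate, original, k=3, threshold=0.5):
    """Flag candidate as a likely copy when shingle overlap is high.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return jaccard(shingles(candidate, k), shingles(original, k)) >= threshold
```

A test like this catches verbatim copies easily, but swapping out a word here and there breaks up the shared shingles and drags the score down, which hints at how lightly reworded content can slip past purely statistical checks.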
Demand Media takes the Mechanical Turk route: it pays people a tiny amount of money to write articles based on popular searches, then publishes them on sites like eHow. As you might expect, the articles tend to be pretty shallow and I doubt they help many people, but they at least pay lip service to creating original content.
Mahalo began as a reputable startup using professional editors to create good answers to common search queries. Over the last couple of years they've apparently switched to outsourcing instead, and recently asked for 17 volunteer interns. Most recently, they've been credibly accused of scraping content created by other people without permission in order to game Google's rankings. Any content that's a blatant copy of other material should be spotted and down-ranked, but for some reason it isn't.
So what can you do to improve your search results? You can report sites you consider spam to Google, but even if they agree, it may be a while before those sites are removed from the listings. As a short-term measure, you can add -site: operators to your searches. For example, this query removes eHow from the results:
-site:ehow.com -site:ehow.co.uk how to move from houston to canada
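If you find yourself excluding the same sites over and over, you could script the query. Here's a rough sketch in Python. The helper names are my own invention, and it assumes the usual https://www.google.com/search?q=... URL form.

```python
from urllib.parse import urlencode


def build_query(terms, excluded_sites):
    """Prefix the search terms with one -site: operator per excluded domain."""
    exclusions = " ".join(f"-site:{site}" for site in excluded_sites)
    return f"{exclusions} {terms}".strip()


def search_url(terms, excluded_sites):
    """Build a search URL for the filtered query (assumes the q= parameter)."""
    query = build_query(terms, excluded_sites)
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical personal blocklist; edit to taste.
BLOCKLIST = ["ehow.com", "ehow.co.uk"]

print(build_query("how to move from houston to canada", BLOCKLIST))
# -site:ehow.com -site:ehow.co.uk how to move from houston to canada
```

Calling search_url with the same arguments gives you a link you can paste into the browser or bookmark.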
In the end, the only thing that can really remove low-quality content from search results is having some human judgment in the ranking process. Google are pushing hard to use social information to build results based around what they know of your friends, and I expect that they will expand this to use information about the sites you and your friends actually visit. It's much more likely that I'm interested in news.ycombinator.com discussions than TechCrunch, because I frequently visit HN. In an ideal world, links from there should rank high in my search results, even though they might not for other people. Until that beautiful day dawns, I'll just pull down 100 results a page and let my eyeballs do the sorting.