


Utterly fascinating idea. Our only concern would be the additional unrelated but statistically probable bulk in the keywords.

Imagine taking each word in a keyword and projecting its semantic value onto an axis of abstractness. Generally, search keywords vary quite a lot, with some words being very abstract and others quite specific.

We argue that the utility of a tag cloud is directly proportional to the overlap between the abstractness of the words in the tag cloud and the user's desired degree of abstractness.

For instance, people searching for the St Regis Hotel probably search "St Regis Hotel Room" frequently. But once someone is on the St Regis site, the term "hotel room" is probably quite useless as a navigation term.

The problem, we think, is one of asking which portion of the semantic hierarchy is worth exploring. "Hotel" or "Room Service" is probably not terribly valuable for in-site navigation once on the St Regis site. But it could be highly valuable in a broader search on Google for distinguishing between the person St Regis and the hotel chain.

Hence, reverse-indexing data provides useful insight on a number of these topics, but if used wholesale it will probably result in many terribly obvious keywords. How many people still append keywords like "web site", for instance, when searching?

But with a little fiddling, we bet this data could become highly useful.
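
One form that fiddling could take, as a rough sketch only: count the words in the referring search phrases, but drop both generic filler words and the terms that are already obvious once a visitor is on the site. The stop-word list, the `site_terms` set, the sample phrases and the function name `cloud_terms` are all made up for illustration; a real list would have to come from looking at actual referrer data.

```python
from collections import Counter

# Hypothetical filler words, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "web", "site"}


def cloud_terms(phrases, site_terms, top_n=10):
    """Count words across search phrases, skipping filler and site-obvious terms."""
    ignore = STOP_WORDS | {term.lower() for term in site_terms}
    counts = Counter()
    for phrase in phrases:
        for word in phrase.lower().split():
            if word not in ignore:
                counts[word] += 1
    # Most frequent first; ties keep the order in which words were first seen.
    return counts.most_common(top_n)


# Invented sample phrases, not real referrer data.
phrases = [
    "the st regis hotel room",
    "st regis hotel room service",
    "st regis spa",
    "the st regis web site",
    "st regis spa prices",
]
# Assumed to be obvious to anyone already on the hotel's own site.
site_terms = {"st", "regis", "hotel", "room"}

print(cloud_terms(phrases, site_terms))
```

The same phrases counted for a broader, off-site search would use an empty `site_terms` set, which keeps "hotel" and "room" in the cloud where they help tell the hotel chain from the person.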

Pete Warden

Very true: the top keyword shown on Darren's cloud is 'the'! It feels like looking at the whole search phrase might help too, but then, as you say, there are a lot of variations.

What this needs is some practical experimentation. I've just installed statcounter here since I can't download referrer logs from Typepad's default stats panel.
