Comments

Josh Fraser

Thanks for such an interesting post Pete.

I've played with automatic tag generation for EventVue. We have the advantage of being able to use the community at a conference to determine the contextual significance of each word. The key for us is to use the dataset of existing tags as the dictionary by which we detect tags in the future.
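As a rough illustration of that approach (a sketch only, not EventVue's actual code), the existing tags can be counted, filtered into a dictionary, and then matched against new text. The function names, the `min_count` threshold, and the sample tags are all assumptions made up for the example.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and collapse it to space-separated word tokens."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def build_tag_dictionary(existing_tags, min_count=2):
    """Keep tags that were applied at least min_count times (hypothetical cutoff)."""
    counts = Counter(normalize(tag) for tag in existing_tags)
    return {tag for tag, n in counts.items() if tag and n >= min_count}


def detect_tags(text, dictionary):
    """Return dictionary tags that occur in text as whole words or phrases."""
    padded = f" {normalize(text)} "
    return sorted(tag for tag in dictionary if f" {tag} " in padded)


# Made-up tags standing in for what a conference community has already applied.
existing = ["Python", "python", "machine learning", "Machine Learning", "startups", "python"]
dictionary = build_tag_dictionary(existing)
found = detect_tags("A talk on machine learning with Python.", dictionary)
```

Tags applied only once ("startups" here) are dropped, which is one simple way to let the community decide which words are significant.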

The dictionaries we use for a tech conference would look very different from the ones for a lawyers' convention. Do you think it is possible to find a master dictionary that would work in any context? The best attempt that I've seen is tagthe.net, and they have a long way to go.

Pete Warden

There are some interesting subject-specific word lists here:
http://www.nzdl.org/Kea/download.html
http://esw.w3.org/topic/SkosDev/DataZone
These cover specialized areas like medicine and physics, and they include synonyms, which could be useful.

I think you're right that the usefulness of an automatic tagging system goes down as the domain gets broader. You'll get a lot less noise if you have a small, hand-tuned list of keywords to look for.
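A minimal sketch of the small, hand-tuned list idea, with synonyms folded into one canonical keyword. The `KEYWORDS` table and the function names are hypothetical and not taken from the word lists linked above.

```python
import re

# Hypothetical hand-tuned list: canonical keyword -> accepted synonyms.
KEYWORDS = {
    "tagging": ["tagging", "tags", "keywords"],
    "email": ["email", "e-mail", "mail"],
}


def build_lookup(keywords):
    """Invert the table so every synonym points at its canonical keyword."""
    lookup = {}
    for canonical, synonyms in keywords.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = canonical
    return lookup


def tag_text(text, keywords=KEYWORDS):
    """Return the sorted canonical keywords whose synonyms appear in text."""
    lookup = build_lookup(keywords)
    # Words may contain internal hyphens, so "e-mail" stays one token.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return sorted({lookup[word] for word in words if word in lookup})


tags = tag_text("Automatic keywords for company-wide e-mail")
```

Because only listed words can ever match, the noise stays low, at the cost of missing anything outside the list.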

http://tagthe.net is interesting, thanks for the link. You're right that it's a long way from human tagging, but running this article through the web interface gave me about 8 keywords, half of them useful. I'm hopeful that something of that quality would be enough for what I'm thinking of.

Al Bismark

http://nosyjoe.com is using an intelligent tagging engine...

Josh Fraser

Thanks for sharing those links to the domain-specific word lists. Lists like that could be quite useful to get everything kick-started.
