I've been intrigued by the promise of automatically extracting information from raw text using semantic analysis, but I've never found a publicly available component good enough to get excited about integrating into my own work. When OpenCalais was released I wanted to give it a spin, but there wasn't a demo page available to run tests with. So I've taken some of the PHP demo code they've released, added a CAPTCHA to deter robots, and put it online at http://funhousepicture.com/calaisdemo/
To use it, copy and paste some text, answer the CAPTCHA test, and click on Show Results. You should see some of the places, people and technical terms highlighted. Mouse over a highlighted item to see what kind of object it is. You can download the source to my version of the demo here, though you'll need to grab your own reCAPTCHA keys before it will run.
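For anyone curious how that kind of highlighting can be done, here's a minimal sketch in Python. It is not the demo's actual PHP code, and it doesn't call OpenCalais at all: the `highlight_entities` function and the list of (phrase, type) pairs are hypothetical stand-ins for whatever an extraction service returns. It just wraps each recognized phrase in a span whose title attribute shows the object type on mouse-over.

```python
import html


def highlight_entities(text, entities):
    """Wrap each entity occurrence in a <span> whose title shows its type.

    `entities` is a hypothetical list of (phrase, entity_type) pairs,
    standing in for the output of an entity-extraction service.
    """
    # Collect every occurrence of every phrase as (start, end, type).
    spans = []
    for phrase, etype in entities:
        if not phrase:
            continue  # an empty phrase would match everywhere
        start = text.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase), etype))
            start = text.find(phrase, start + len(phrase))

    # Earliest match first; at the same position, prefer the longest match.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))

    out = []
    pos = 0
    for start, end, etype in spans:
        if start < pos:
            continue  # overlaps a span that was already emitted
        out.append(html.escape(text[pos:start]))
        out.append(
            '<span class="entity" title="%s">%s</span>'
            % (html.escape(etype), html.escape(text[start:end]))
        )
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    sample = "Alice flew to Paris."
    print(highlight_entities(sample, [("Alice", "Person"), ("Paris", "City")]))
```

The surrounding text is HTML-escaped so pasted input can't inject markup, and overlapping matches are resolved by keeping the earliest and longest one.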
Give it a try for yourself and let me know what you think. I'm primarily interested in automatically tagging business emails, and from my tests it's got some promise. It didn't seem to mistakenly identify many items in my material, but there were a lot of nouns it's not designed to handle. I'd love to see something that understood dates, addresses and locations, but it doesn't do a great job with these yet.
I'll be running some more bake-offs to figure out what off-the-shelf semantic technology can do these days, so stay tuned.