Monday, July 13, 2009

Taking a look at Open Calais

Thomson Reuters provides OpenCalais (http://opencalais.com/), a service for tagging web content and returning the results as RDF. Several projects are using its API, including the content management system Drupal (http://drupal.org/project/opencalais), the blog site WordPress.com (http://tagaroo.opencalais.com/), Microsoft, and several others. The service can be invoked over SOAP or REST, and HTTP traffic compression is supported (http://opencalais.com/documentation/calais-web-service-api/api-invocation). You can download an HTML page for making the REST calls (http://opencalais.com/files/HTMLform.zip). I supplied only my API key and the HTML to be tagged, and it returned the data as RDF.
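For anyone who would rather script the REST call than use the downloadable form, here is a rough sketch in Python. It only builds the request; it does not send it. The endpoint URL and the form field names (licenseID, content) are my assumptions, not something confirmed in this post, so check them against the API invocation documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: this endpoint is a placeholder for illustration only.
# Confirm the real URL in the API invocation documentation.
CALAIS_REST_URL = "http://api.opencalais.com/enlighten/rest/"


def build_calais_request(api_key, content, endpoint=CALAIS_REST_URL):
    """Build (but do not send) a form-encoded POST carrying the key and content.

    The field names "licenseID" and "content" are assumed, not verified.
    """
    form = urlencode({"licenseID": api_key, "content": content}).encode("utf-8")
    return Request(
        endpoint,
        data=form,  # supplying a body makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To actually send it you would pass the result to urllib.request.urlopen and read the response body, which should be the RDF if the assumptions above hold.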

Open Calais is officially part of the Linked Open Data Cloud, and therefore the entities it returns are represented as dereferenceable URIs (http://opencalais.com/documentation/linked-data-entities). When I supplied an article about the Tour de France, one of the Lance Armstrong entities was marked http://d.opencalais.com/pershash-1/050fd058-00ac-3453-a376-45df3198a109.html and so it is now dereferenceable. However, if you click it, it probably won't tell you much. Initially it is only a stub for the entity "Lance Armstrong". Eventually the entity disambiguation system (http://opencalais.com/documentation/calais-web-service-api/api-metadata/entity-disambiguation) will process it and add what it knows, including sameAs relationships and, hopefully, a link to the Wikipedia article for Lance Armstrong. At that point it will be marked as disambiguated. The last question in the FAQ gives a good description (http://opencalais.com/documentation/calais-linked-data/linkfaq).
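To make the sameAs idea concrete, here is a small Python sketch that pulls owl:sameAs targets out of an RDF/XML document. The sample RDF is hand-written by me for illustration and is not real Open Calais output; in particular, the DBpedia link and the rdf:about form of the entity URI are assumptions about what a disambiguated entity might look like.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def same_as_links(rdf_xml):
    """Return the rdf:resource target of every owl:sameAs element."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}sameAs" % OWL_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target:
            links.append(target)
    return links


# Hypothetical sample, NOT actual service output.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://d.opencalais.com/pershash-1/050fd058-00ac-3453-a376-45df3198a109">
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Lance_Armstrong"/>
  </rdf:Description>
</rdf:RDF>"""

# Hypothetical stub: an entity that has not been disambiguated yet.
STUB_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://d.opencalais.com/pershash-1/050fd058-00ac-3453-a376-45df3198a109"/>
</rdf:RDF>"""
```

Calling same_as_links on a stub like STUB_RDF gives an empty list, which is roughly what "not yet disambiguated" would look like from the client side under these assumptions.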

So how do we use Open Calais' data? That will be the subject of the next blog post.
