Cloud-based Text Annotation Marketplace

Announcing the AnnoMarket News Pipeline

In preparation for our first public beta (watch this space!) we have been busy creating text annotation services (pipelines, to their friends). One of our stated priorities is “news”, a term used here in the broadest sense to cover text on subjects such as politics, entertainment and sports.

The AnnoMarket consortium is in a particularly good position to address this market, benefitting from the text mining expertise at the University of Sheffield and the news know-how at the Press Association. So we have put our heads together and produced the AnnoMarket News Pipeline. This is what it looks like on its home page in the forthcoming AnnoMarket on-line store:

The AnnoMarket News Pipeline details page in the on-line store.

The pipeline produces a wide range of annotations covering names of people, locations and organizations, dates and date ranges, measurements, numbers and ratios, percentages, amounts of money, sentences and tokens. It also performs content detection (separating the main article text from surrounding boilerplate such as navigation and adverts) using the Boilerpipe tagger.

Here is the pipeline in action on the store page:

Example of annotations produced by the AnnoMarket News Pipeline

For Person and Organization annotations that cover newsworthy entities, the pipeline also includes ontology URIs linking into the Press Association knowledge base. These can be used to classify the entities according to various criteria relevant to journalists. We have been experimenting with new user interfaces that make use of this information, and here is one possibility:

The “News Prospector”: a user interface for exploring an annotated document collection and searching it by full text, annotations and semantics.

We hope to be able to offer a tool along these lines when the full AnnoMarket system is launched, in the first half of 2014.
