Announcing the AnnoMarket News Pipeline
In preparation for our first public beta (watch this space!), we have been busy creating text annotation services (pipelines, to their friends). One of our stated priorities is “news”, a term we use here in its broadest sense to cover text about subjects such as politics, entertainment and sports.
The AnnoMarket consortium is in a particularly good position to address this market, benefitting from the text mining expertise at the University of Sheffield and the news know-how at the Press Association. So we have put our heads together and produced the AnnoMarket News Pipeline. This is what it looks like on its home page in the forthcoming AnnoMarket on-line store:
The pipeline produces a wide range of annotations covering names of people, locations, organizations, dates and date ranges, measurements, numbers and ratios, percentages, amounts of money, sentences and tokens. It also performs content detection, using the Boilerpipe tagger.
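To give a feel for how output like this might be consumed, here is a minimal Python sketch. It assumes a stand-off representation in which each annotation has a type, start and end character offsets, and a dictionary of features. The `Annotation` class, the `spans_by_type` helper and the sample data are all hypothetical and written by hand for illustration; they are not the pipeline's actual output format or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span over the source text."""
    type: str
    start: int  # offset of the first character covered
    end: int    # offset just past the last character covered
    features: dict = field(default_factory=dict)


def spans_by_type(text, annotations):
    """Group the text covered by each annotation under its annotation type."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.type].append(text[ann.start:ann.end])
    return dict(grouped)


# Hand-written illustrative data, not real pipeline output.
text = "Jane Smith joined Acme Ltd in London."
annotations = [
    Annotation("Person", 0, 10),
    Annotation("Organization", 18, 26),
    Annotation("Location", 30, 36),
]

for ann_type, spans in spans_by_type(text, annotations).items():
    print(ann_type, spans)
```

Keeping the annotations separate from the text, rather than embedding markup in it, means overlapping types such as sentences, tokens and names can coexist over the same characters.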
Here is the pipeline in action on the store page:
For Person and Organization annotations that cover newsworthy entities, the pipeline also includes ontology URIs linking into the Press Association knowledge base. These can be used to classify the entities according to various criteria relevant to journalists. We have been experimenting with new user interfaces that can make use of this information, and here is one possibility:
We hope to be able to offer a tool along these lines when the full AnnoMarket system is launched, in the first half of 2014.
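In the meantime, the ontology URIs attached to Person and Organization annotations could be picked out by client code along the following lines. This is only a sketch under assumptions: the dictionary layout, the `uri` feature name, the `linked_entities` helper and the `example.org` URIs are all placeholders invented for illustration, not the real feature names or Press Association identifiers.

```python
def linked_entities(annotations, feature="uri"):
    """Return (type, uri) pairs for Person/Organization annotations that carry a URI.

    The feature name "uri" is an assumption made for this sketch.
    """
    wanted = {"Person", "Organization"}
    return [
        (ann["type"], ann["features"][feature])
        for ann in annotations
        if ann["type"] in wanted and feature in ann.get("features", {})
    ]


# Hand-written illustrative data with placeholder URIs.
annotations = [
    {"type": "Person", "features": {"uri": "http://example.org/kb/person/1"}},
    {"type": "Person", "features": {}},  # not linked to the knowledge base
    {"type": "Location", "features": {"uri": "http://example.org/kb/place/7"}},
    {"type": "Organization", "features": {"uri": "http://example.org/kb/org/42"}},
]

for ann_type, uri in linked_entities(annotations):
    print(ann_type, uri)
```

Entities without a URI are skipped rather than treated as errors, since only newsworthy entities are expected to be linked.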