Cloud-based Text Annotation Marketplace

Consortium

The University of Sheffield (USFD)

The Natural Language Processing (NLP) group at the University of Sheffield is one of the largest and most successful research groups in language and information in the EU. The group is based in the Department of Computer Science, and includes world-class teams in the areas of speech, language, knowledge and information processing, biotechnology, and machine learning for medical informatics. The Department of Computer Science was awarded a top Grade 5 in the most recent nationwide Research Assessment Exercise. Companies that support its work include Daimler-Chrysler, GlaxoSmithKline, Motorola and Nokia.

The Natural Language Processing Group has focused on robust engineering of open source NLP software and on quantitative evaluation and repeatability. The group has extensive experience in the fields of NLP infrastructures (GATE), information extraction, text summarisation, machine learning methods for NLP, dialogue systems, question answering, terminology extraction, and NLP methods for Knowledge Management and the Semantic Web. USFD has a world-leading research record on human language technologies, developed within national and international research projects in these areas. Our participation in ARCOMEM (sentiment analysis), GateCloud (UK exploratory project), LARKC (large-scale reasoning and web search), and KHRESMOI (bio-informatics) will form a solid base for this project.

Ontotext AD (ONTO)

Ontotext is the leading provider of core semantic technology distinctive for its performance, scale, and compliance with open standards. It is the developer of OWLIM, the most scalable semantic database. Another outstanding product of Ontotext is KIM, the most popular semantic annotation and search platform. Ontotext’s technology delivers real-world applications in Life Sciences, Financial Intelligence, Telecommunications, Publishing, Online Recruitment, Web Mining and Search, and other areas. Our customers include a top-10 pharmaceutical company (AstraZeneca), a top-5 US military contractor, financial intelligence institutions, leading UK media (BBC) and news agencies (Press Association), as well as customer-facing semantic technology companies, e-business and social media start-ups. Ontotext is involved in several joint ventures, which deliver vertical solutions.

Ontotext was founded in 2000 as the semantic technology lab of Sirma (a top-3 software house in Bulgaria). It took part in several projects in FP5, FP6, and FP7, which allowed it to invest more than 150 person-years in product development. In September 2008, Ontotext was spun off as a separate legal entity in order to accommodate an investment from NEVEQ, a venture capital fund which acquired a minority share in a deal worth 2.5 million euro. At present Ontotext has about 50 employees in its offices in Sofia and Varna (Bulgaria).

Ontotext’s engineers are heavily involved in a variety of research and academic activities, e.g. scientific publications, teaching and organization of events. Recent developments include reasonable-views – public services which demonstrate reasoning with billions of linked data facts and provide unique facilities for efficient semantic search and querying of data from multiple sources.

Internet Memory Research (IMR)

Internet Memory Research is a spin-off of the Internet Memory Foundation dedicated to Web archiving and web-scale extraction of information for professional use. Based in Paris, IMR operates the first large-scale shared Web archiving platform in Europe, which helps public institutions (e.g. libraries, archives, heritage organisations) preserve Web content to enrich their collections. This platform is used, for instance, by the UK National Archives to capture and redirect all UK central government websites.

In addition to its archiving services, IMR currently develops tools to collect and extract information at large scale (millions of sites and social networks) for professional applications (web intelligence). While currently crawling dozens of terabytes of data per month, IMR plans to scale to one petabyte this year.

The Press Association Ltd (PA)

The Press Association Group is a global content operation with a specific focus on news, sport and entertainment. The group operates a diverse range of businesses across the UK, Ireland, Europe, Canada and Asia, as well as separate brands specialising in areas such as Marketing (TNR) and Weather (Meteo Group).

On a daily basis, the Press Association creates, curates and distributes vast amounts of content and data, and provides a range of services to all major media organisations and many global corporations. Recently, the Press Association was appointed as host national news agency for the London 2012 Olympics.

In order to address the growing demand for ‘fast, fair and accurate’ content and data, the Press Association recently committed to a large-scale project to redesign its IT infrastructure by putting semantic web technologies and principles at its core. By adopting established standards such as IPTC’s G2 and RDF, preferring existing ontologies such as geonames, FOAF and Dublin Core, and using technologies such as cloud-based computing, open and RESTful approaches, XML-driven DBs and triple stores, the Press Association addresses real business needs such as managing enormous, disparate and heterogeneous sets of data, reducing silos, and streamlining content and data interchange within the organisation and between PA and its customers.

