Cloud-based Text Annotation Marketplace

AnnoMarket Contract Signed

The contract is now signed and we’re all ready for the project to begin on June 1st 2012. AnnoMarket is an EU Framework 7 project funded jointly by the European Commission and the project partners — Ontotext AD, Internet Memory Research SAS, the Press Association Ltd. and the University of Sheffield (who will coordinate). The project will run for two years.

Cloud computing is making disruptive changes throughout ICT. This creates opportunities, especially for small fast-moving organisations and SMEs. AnnoMarket will create a marketplace for multilingual content analysis resources adapted to cloud computing and big data. This innovation will change the business model of suppliers and the cost basis for customers in this growing sector.

Three areas in which EU business is currently experiencing a high rate of innovation are:

  • cloud computing, which represents a new commodification of compute power and applications software
  • online web mining and analysis to support core business functions like customer relations or market research
  • data publishing and repurposing (via the Linked Data and Linked Open Data initiatives)

We intend to increase uptake of ICT products and services in these areas by taking our state-of-the-art tools for multilingual content analysis, resource production and linked data management, porting them to the cloud, and creating a sustainable business model based on two income streams:

  • Marketplace facilitation. Every developer of language resources and digital content crawling, analysis and indexing software will be able to become a supplier on the marketplace (with entry effort comparable to selling on eBay).
  • Cloud resource reselling. Providers of infrastructure-as-a-service or data-as-a-service facilities will be able to facilitate resale of their services for specialist markets. AnnoMarket will create a resale environment for digital content processing and management.

As an example of the first revenue stream (marketplace facilitation), a consultant specialising in legal information extraction will be able to customise existing analysis pipelines, add value for the market niche that is their domain of expertise, and offer pay-as-you-go scalable processing to customers via the AnnoMarket portal.

As an example of the second revenue stream (cloud resource reselling), the portal itself will consume pay-per-use compute resources from the big suppliers (e.g. Amazon, RightScale, Cloud Foundry). We will resell these resources at close to cost. The value to the consumer is that we remove the complexity of selecting, marshalling and managing these resources, of the associated payment and security infrastructures, and of adapting content analysis software to the cloud platforms.

In addition to these innovations, we will create the first properly open content analysis service, based on open source. At present, organisations that need to analyse large volumes of text have two choices: A) they can invest in server hardware, systems administrator time and so on, and run their analyses on this infrastructure, or B) they can use a service like OpenCalais, where they pump their data document-by-document over to Thomson Reuters' servers. AnnoMarket will create a third option by exploiting open source systems adapted to cloud computing.

For example, users will be able to define a web crawl of millions of web pages and have this crawl run on commodity server farms. The results of the crawl will be fed into a highly parallel and distributed multilingual content analysis engine (based on existing mature open source software), again running on third-party server farms. The output from the content analysis engine will then be available for populating index servers or COTS data mining and analysis software to feed diverse business processes. Users will pay only for the compute time and the network bandwidth that they use. They will not require any investment in hardware, and will not incur any fixed costs. Where charges are levied for specialisations of the analysis software or linked data sets, pricing will be transparent and usage-based, with no heavy up-front licence fees. Customers will also be free from vendor lock-in and similar closed-world disadvantages.
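To make the pay-per-use idea concrete, here is a minimal sketch in Python of how a bill for a single crawl-and-analyse job might be worked out. Everything in it is an assumption for illustration: the `Rates` class, the `job_cost` function and all of the prices are hypothetical, and none of them reflect actual AnnoMarket or cloud supplier pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit prices (illustrative only, not real rates)."""
    per_cpu_hour: float                # compute time, resold at close to cost
    per_gb_transferred: float          # network bandwidth
    pipeline_fee_per_cpu_hour: float = 0.0  # optional fee for a specialised pipeline


def job_cost(cpu_hours: float, gb_transferred: float, rates: Rates) -> float:
    """Return the usage-based charge for one job.

    There is no fixed or up-front component: zero usage costs zero.
    """
    if cpu_hours < 0 or gb_transferred < 0:
        raise ValueError("usage must be non-negative")
    compute = cpu_hours * rates.per_cpu_hour
    bandwidth = gb_transferred * rates.per_gb_transferred
    pipeline = cpu_hours * rates.pipeline_fee_per_cpu_hour
    return round(compute + bandwidth + pipeline, 2)


if __name__ == "__main__":
    # Assumed example prices, chosen only to keep the arithmetic easy to follow.
    rates = Rates(per_cpu_hour=0.10, per_gb_transferred=0.05,
                  pipeline_fee_per_cpu_hour=0.02)
    print(job_cost(cpu_hours=100, gb_transferred=40, rates=rates))
```

With these assumed rates, a job using 100 CPU hours and 40 GB of transfer is charged for exactly that usage, and a customer who runs nothing pays nothing.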

Watch this space!

