Text-mining and open access

There is an excellent opinion piece in the latest edition of Research Fortnight by Professor Doug Kell on text-mining and open access. Because the article will be behind a paywall for many readers (the irony…), I thought I would summarize the argument and post a few quotes here.

The argument goes like this:

  • New research findings are being added to the literature so quickly that no one can read them all, let alone assimilate and make sense of them. The only solution is to use text-mining.
  • Text-mining the scientific literature has clear benefits for researchers, business and policy-makers. For example, a recent report from JISC concludes that “there is clear potential for significant productivity gains, with benefit both to the sector and to the wider economy”.
  • But for text-mining to be effective, it needs access to the full text. Abstracts are not enough, and embargo periods get in the way of rapid interpretation of new research.

And here are some key paragraphs from the article:

The PubMed database records two new peer-reviewed papers in the life sciences every minute. Across all the sciences, the number is five.

Such is the rate at which scholarly papers are produced that only computers can read them all. As a result, text-mining techniques are infiltrating every field of research, from genomics to the social sciences and humanities. Historians are using text mining to analyse court records from the Old Bailey. Business has been mining newswires since the 1980s to acquire competitive intelligence and today companies use text mining, including of social media, to discover what customers think of their products and services.


To get the most from text mining requires open access to the literature. And it requires it as soon after publication as possible. In the life sciences, six months—the maximum embargo allowed in Research Councils UK’s policy on ‘green’ open access—is a very long time.

This is one reason why the research councils’ policy on open access announced this July made the ‘gold’ model the preferred route. Pursuing gold open access will help the UK to get ahead of the curve in exploiting the opportunities, including text mining, that come from open access.