The measurement of article citations is becoming increasingly prevalent in research policy circles. While the attractions of quantitative measures of research quality are many, it is time to take stock. There is so much momentum behind the notion of citation counts that it is easy to forget the limitations, some of which are fundamental and beyond technical fixes. I strongly believe we need better evidence to inform research policy decision making, but I think there is a real tendency to overplay citation data as it is currently used.
There are a number of well-established and often discussed limitations to citation data.
Although it has been improving, disciplinary coverage remains poor. The coverage of arts and humanities disciplines is inadequate, both for journal articles and for the more diverse research outputs that characterize these fields of inquiry. Maybe this will improve over time, but maybe not. There is a considerable amount of material that would need to be indexed to bring coverage on a par with the natural sciences, and many technical challenges to be overcome for outputs other than journal articles. Non-standard citation formats, and the combination of endnotes, footnotes and bibliographies, all stand in the way of the universal coverage needed to make citation data at all usable when considering whole research systems.
A further limitation comes from the differences in citation behaviour between disciplines. The different citation rates are a particular problem, which can be addressed by various 'field-weighting' approaches. But these approaches themselves introduce a problem. In order for citation counts to be adjusted for fields, each article needs to be allocated to a field, which can be challenging for work that spans disciplinary boundaries.
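To make the allocation problem concrete, here is a minimal sketch of one simple form of field-weighting: dividing an article's citation count by the average for its assigned field. The field names and citation counts are invented for illustration, and real field-weighting schemes are more elaborate than this.

```python
from statistics import mean

# Hypothetical citation counts per field (illustrative numbers, not real data).
FIELD_CITATIONS = {
    "cell_biology": [40, 25, 60, 35],
    "mathematics": [4, 2, 6, 8],
}

def field_baselines(citations_by_field):
    """Return the mean citations per article for each field."""
    return {field: mean(counts) for field, counts in citations_by_field.items()}

def field_weighted_score(citations, field, baselines):
    """Express an article's citations relative to its field's average."""
    return citations / baselines[field]

baselines = field_baselines(FIELD_CITATIONS)

# The same hypothetical article, with 10 citations, allocated to each field in turn.
as_biology = field_weighted_score(10, "cell_biology", baselines)
as_maths = field_weighted_score(10, "mathematics", baselines)

print(as_biology, as_maths)
```

With these made-up numbers, the same article looks well below average if it is treated as cell biology and well above average if it is treated as mathematics, so for work that spans both fields the allocation decision largely determines the score.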
From a policy perspective, citations also bring challenges because of the long lag time. If a policy change is made, say increased funding to a particular disciplinary area, it could be over a decade before any resulting signal might be seen in citation data. Some of this delay comes from the time it takes to award funding, carry out research, and publish it. However, extended periods are needed following publication before reliable citation information is available. In some extreme cases this could be multiple decades. In terms of measuring the effect of a policy intervention this is just too slow.
There are also challenges introduced by the highly skewed distribution of citation counts. As a result, care needs to be taken when aggregating article-level data to calculate averages for researchers, universities or nations. But even when this care is taken, a single number doesn't give a truly representative picture. Major differences between groupings can be the result of just a handful of articles that are either highly cited or receive no citations at all. This is the reason why journal impact factors are so problematic when they are used as a proxy for the citations of individual papers.
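A small sketch shows how a single article can drive the difference between two groupings. The two groups below are invented for illustration and differ in only one article.

```python
from statistics import mean, median

# Hypothetical citation counts for two groups of five articles each.
# The groups are identical except for one highly cited article in group_b.
group_a = [0, 1, 2, 3, 4]
group_b = [0, 1, 2, 3, 94]

mean_a, mean_b = mean(group_a), mean(group_b)
median_a, median_b = median(group_a), median(group_b)

print("mean:  ", mean_a, mean_b)
print("median:", median_a, median_b)
```

On these numbers the mean of the second group is ten times that of the first, while the medians are identical. Neither single figure describes the typical article in both groups, which is the difficulty with reading an average as a proxy for individual papers.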
But perhaps the most important of the limitations of citation data is the weak link to the quality of research. Citations measure a particular property of the cited work: the extent to which it is receiving attention from academic peers. Research quality has many different dimensions, and citation reflects them in different ways. Using the language of the UK Research Excellence Framework, citation may imply a minimum standard of rigour and some academic significance. It tells us little about originality, as the citing author may simply have been unaware of other similar work, or even cited multiple very similar studies. A citation may also provide little information about the impact of the work beyond the academic sphere, an important component of research quality.
The relationship between citation and academic significance is also difficult. Citation is in fact a combination of the academic significance of the research and its discoverability. While the discoverability of research is likely to be related to aspects of its quality, there are other factors that determine how easy it is to find research. Most scholars will be careful to read the work of the leading people in their field, or will focus on particular journals. Some articles will appear higher in searches in either research-focused or general search engines, based on the algorithms built into them. All of these effects will tend to increase citation of particular articles for reasons that are independent of their academic quality.
What is particularly worrying about these effects is the potential for a positive feedback loop. Once an article begins to get citations, the citations themselves increase its discoverability. And so it gets more citations, and so on. Becoming highly cited might result from small differences in citation early on, which then become amplified through this positive feedback loop. At best this argument suggests that high quality is necessary, but not sufficient, for high citation. It is also conceivable that the highly skewed distribution of citations reflects the effect of this feedback loop, rather than the massive differences in research quality that are implied.
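The feedback loop can be illustrated with a toy simulation. This is a sketch under a strong assumption of my own, not a model fitted to real citation data: every article is of identical quality, and each new citation goes to an article with probability proportional to the citations it already has, plus one. The function name and parameters are hypothetical.

```python
import random

def simulate_citations(n_articles=100, n_citations=2000, seed=0):
    """Hand out citations one at a time to articles of identical quality.

    Assumption: the chance that an article receives the next citation is
    proportional to (its current citations + 1), standing in for
    discoverability that grows with citation.
    """
    rng = random.Random(seed)
    counts = [0] * n_articles
    for _ in range(n_citations):
        weights = [c + 1 for c in counts]
        chosen = rng.choices(range(n_articles), weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts

counts = simulate_citations()

# Share of all citations held by the ten most-cited articles.
top_share = sum(sorted(counts, reverse=True)[:10]) / sum(counts)
print(f"max={max(counts)}, min={min(counts)}, top-10 share={top_share:.2f}")
```

Because quality plays no part in this toy model, any unevenness in the resulting counts comes entirely from early chance differences being amplified. If every article were cited equally, the top ten would hold exactly a tenth of all citations, so a larger share in the output points to the feedback effect alone.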
There is evidence that discoverability is an important factor in citation rates. The well documented 'citation advantage' of open access articles is hard to explain in terms of research quality. In aggregate, it seems unlikely that open access articles are of a higher quality than closed access, but their discoverability (and accessibility) will be higher. There is also a fascinating piece of analysis that shows that identical content receives more citations if it is published in a journal with a higher journal impact factor. The simplest explanation for this is that the versions in the higher impact factor journals are more likely to be seen, and so cited.
While discoverability is an important attribute of a research article, my conclusion is that citation rates are a severely limited and potentially misleading proxy for research quality. There are many limitations, but the positive attention cycle that underpins citation makes teasing apart research quality from other factors impossible.
This isn't to say that there isn't information contained in the citation network of articles, or that we don't need better, preferably quantitative, indicators of research quality. Rather than concentrating on refining citation counts into ever more complex but still flawed measures, we need new approaches. There is innovation happening. For example, the approach of 'semantometrics' is based on the citation network, but uses textual analysis to evaluate an article's contribution. We need to encourage this innovation and, as policy-makers, reduce our reliance on citation rates as proxies for research quality.