Peer reviewers often disagree in their assessments. Whether reviewing research proposals for grant funding or manuscripts for publication, there is ample evidence of low 'interrater reliability' [1]. This is regularly presented as evidence that the process of peer review is somehow flawed. For example, a recent review [2] of the evidence on grant peer review in health research states without qualification that:
If peer review is reliable, the judgements of different peer reviewers on the same proposal should be highly correlated.
While peer review is not without problems, I think there are ample reasons to expect differences in the evaluation of research by different reviewers. Far from being a weakness in the process, the diversity of assessment between reviewers is a potential strength. Indeed, if there is an expectation that reviewers will agree, it casts some doubt on the practice of inviting multiple reviews. Instead, we should be celebrating the differences, and making the most of them in supporting robust evaluation.
A thoughtful analysis [3] of the issue of differences between reviewer assessments is provided by Carole Lee. In this paper Lee argues that there are reasons to expect differences between reviewers, and specifically that:
low interrater reliabilities might reflect reasonable forms of disagreement among reviewers.
Lee suggests that there are at least two reasons to expect differences between reviewers. First, reviewers might come from distinct subspecialisms, and therefore bring different but equally valid interpretations of the work [4]. Second, reviewers may bring alternative interpretations of the criteria used for the evaluation. Lee hypothesises that:
experts can have diverging evaluations about how significant, sound, or novel a submitted paper or project is because they make different antecedent judgments about the relevant respects in which a submission must fulfill these criteria.
Although there are legitimate reasons for reviewer disagreement, disagreements can still pose a challenge for peer review. If differing views are not appropriately combined, there is a risk that decisions acquire a degree of randomness, which has implications for the integrity of the process. As Lee puts it:
low interrater reliabilities can make peer review outcomes an arbitrary result of which reviewer perspectives are brought to bear
Therefore, the line of thinking proposed by Lee moves the debate about reviewer agreement away from questions of the reliability of peer review, and towards the process for resolving differing opinions. Because differences can be legitimate, collapsing assessments into numerical scores early and combining them arithmetically will discard rich qualitative information. Instead, there needs to be space for discussion between reviewers, so that they can collectively explore their differing views and reach agreement on a composite view of the article in question.
However, the processes of peer review do not, in general, provide this space for discussion. In grant peer review, discussion space is provided through panel meetings, although panel members are often discussing the assessments of reviewers who are not present. In journal peer review, arbitration between reviewers is usually conducted by a third party (an editor) without further input from the reviewers. In both processes, researchers often have the opportunity to respond to reviews, but this is limited to text responses to the text of the reviews. In my experience, this can sometimes be more a process of 'rebuttal' than deep engagement with the points made.
Better approaches for exploring reviewer differences need to become more central to the peer-review process. In journal peer review, for example, even introducing a step where reviewers comment on one another's assessments would help, and such a step would be especially straightforward to introduce in open peer review. More generally, open peer review offers the opportunity for a broader discussion of the reviews, as well as of the article itself. Introducing more opportunities for reflection into grant peer review is perhaps more challenging.
Either way, an increasing focus on exploring the legitimate differences of opinion that reviewers bring will contribute to better decision-making, and so increase the benefits delivered by research.
1. I don't like this terminology, as it conflates the consistency of reviewers with the reliability of the assessment. This essay argues that disagreement does not necessarily reflect a lack of reliability.

2. Guthrie, S., Ghiga, I., & Wooding, S. (2018). What do we know about grant peer review in the health sciences? F1000Research, 6, 1335. https://doi.org/10.12688/f1000research.11917.2

4. In the case of multi- or interdisciplinary research, reviewers may come from distinct disciplinary domains. It is sometimes suggested that this is a particular problem or challenge for the review of research that crosses disciplinary boundaries, but it is also important to consider the value that distinctive perspectives bring to the evaluation.