<h1>Thinking through research policy</h1>

<h2>Using Readwise as a commonplace book</h2>
<p>Several years ago, I wrote about the <a href="https://stevenhill.org.uk/on-commonplace-books/">potential</a> of the digital commonplace book, and my frustration that it wasn't possible to tie together insight from digital reading effectively. In recent months I have begun using a service - <a href="https://readwise.io/">readwise.io</a> - that solves the problem of integrating notes and highlights from a range of sources online.</p>
<p>The offering from Readwise is both simple and powerful. It links up to a selection of sources for reading highlights and collects the information together in a single place. For me, the sources I am using are ebooks on the Kindle platform, highlights within <a href="https://www.instapaper.com/">Instapaper</a>, and annotations from <a href="https://web.hypothes.is/">Hypothesis</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>. There are other options available, including import of annotated pdf files (currently in beta). The collated information can be searched and read within Readwise itself, but there are also options to export to popular note-taking apps: <a href="https://evernote.com/">Evernote</a>, <a href="https://www.notion.so/">Notion</a> and <a href="https://roamresearch.com/">Roam Research</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>.</p>
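<p>For anyone who wants to script against the collated highlights, Readwise also exposes an HTTP API. The sketch below is a minimal illustration of paging through highlights in Python; the endpoint and token authentication reflect my reading of the Readwise API documentation, so check the current docs before relying on it, and the token value is a placeholder.</p>

<pre><code>
# Minimal sketch: pull all highlights from the Readwise API into a list.
# Assumes the v2 'highlights' endpoint and token authentication; verify
# against the current Readwise API documentation before use.
import requests

READWISE_TOKEN = "your-access-token"  # placeholder; substitute your own token

def fetch_highlights():
    """Page through /api/v2/highlights/ and return every highlight record."""
    url = "https://readwise.io/api/v2/highlights/"
    headers = {"Authorization": f"Token {READWISE_TOKEN}"}
    params = {"page_size": 1000}
    highlights = []
    while url:
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        payload = response.json()
        highlights.extend(payload["results"])
        url = payload.get("next")  # None once the last page is reached
        params = None  # the 'next' URL already carries its query string
    return highlights

if __name__ == "__main__":
    for highlight in fetch_highlights()[:5]:
        print(highlight.get("text", "")[:80])
</code></pre>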
<p>Readwise can also send a daily email with five highlights or annotations from your library for review. The email is a form of <a href="https://en.wikipedia.org/wiki/Spaced_repetition">spaced repetition</a>, a technique for improving recall of information. While I am not trying to memorise highlights from my reading, the regular email is undoubtedly a helpful reminder of material I have read in the past.</p>
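<p>To make the mechanism concrete, here is a toy scheduling rule in Python. It is purely illustrative and not Readwise's actual algorithm: the interval between reviews simply doubles after each successful recall, whereas real schedulers such as SM-2 also track item difficulty.</p>

<pre><code>
# Toy spaced-repetition rule: double the interval on success, reset on failure.
# Illustrative only; not Readwise's algorithm.
from datetime import date, timedelta

def next_review(last_review: date, interval_days: int, remembered: bool):
    """Return (next_due_date, new_interval_in_days)."""
    new_interval = interval_days * 2 if remembered else 1
    return last_review + timedelta(days=new_interval), new_interval

due, interval = next_review(date(2021, 4, 4), interval_days=4, remembered=True)
print(due, interval)  # 2021-04-12 8
</code></pre>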
<p>I have been using Readwise for several months now and am pleased to say that it has delivered the digital commonplace book that I was hoping for.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Interestingly, I am using exactly the same options for reading as I was when I wrote the original post six years ago. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Unlike my reading options, I have in the last year transferred from Evernote to Roam Research, which is probably a subject for a future post. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h2>Thoughts on Science Fictions by Stuart Ritchie</h2>
<p><em><a href="https://www.sciencefictions.org/">Science Fictions</a></em> by <a href="https://twitter.com/StuartJRitchie">Stuart Ritchie</a> is a thorough review of the so-called 'replication crisis' in science and research. Building out from the well-documented issues in psychology, Ritchie's discipline, the book is an excellent survey of the problems of fraud, bias, negligence and hype in research. While much of the material and argument are familiar to anyone who has been following these debates, drawing the material together into a well-written narrative is a valuable contribution.</p>
<p>Overall, I enjoyed reading the book and agree with much of Ritchie's diagnosis of the problem and proposed solutions. In particular, his analysis of the issues with the publication system is spot on. More openness in research publication, the use of preprints, and reducing the dominance of journal brand are all part of the solution. In some aspects, I was less convinced by the argument of the book and had two particular reflections.</p>
<p>Despite taking popular science books to task for hype and oversimplification, in places this book has precisely those problems. For example, in discussing the practice of paying researchers cash bonuses for publication in specific journals, Ritchie states that it occurs in "certain universities in other countries, including […] the UK". The claim turns out to refer to one department in one UK university - thin evidence for a widespread issue. Similarly, Ritchie tells us that "many scientists quit the profession out of frustration", but the evidence to support this statement isn't at all compelling. There are other places where a rather black-and-white argument is presented, hiding considerable nuance. The story isn't quite as simple as Ritchie would have us believe.</p>
<p>The solutions to the problems with science that Ritchie presents in the book's closing chapter also merit further scrutiny and consideration. First, as has been pointed out in a <a href="https://www.nature.com/articles/d41586-020-02147-1">review</a> of the book, there is a tendency to refer back to a 'golden age' of scientific research whose existence is not necessarily supported by the evidence. While there is no doubt that improvements to the extrinsic incentives on researchers are needed, there will still be intrinsic motivations that might encourage bad practice. We also need to pay attention to the balance between false positives and false negatives in the research record. <em>Science Fictions</em> is rightly concerned that there are too many false positives, but false negatives are problematic too, with the potential to prematurely close off promising lines of inquiry. An effective research system balances false-positive and false-negative risk, acknowledging that it is impossible to eliminate either.</p>

<h2>Disagreements in peer review</h2>
<p>Peer reviewers often disagree in their assessments. Whether reviewing research proposals for grant funding or manuscripts for publication, there is ample evidence of low 'interrater reliability'<sup id="fnref:reliability" role="doc-noteref"><a href="#fn:reliability" class="footnote">1</a></sup>. This is regularly presented as evidence that the process of peer review is somehow flawed. For example, a <a href="https://f1000research.com/articles/6-1335">recent review</a><sup id="fnref:gutherie" role="doc-noteref"><a href="#fn:gutherie" class="footnote">2</a></sup> of the evidence on grant peer review in health research states without qualification that:</p>
<blockquote>
<p>If peer review is reliable, the judgements of different peer reviewers on the same proposal should be highly correlated.</p>
</blockquote>
<p>While peer review is not without problems, I think there are ample reasons to expect differences in the evaluation of research by different reviewers. Far from being a weakness in the process, the diversity of assessment between reviewers is a potential strength. Indeed, if there is an <em>expectation</em> that reviewers will agree, it casts some doubt on the practice of inviting multiple reviews. Instead, we should be celebrating the differences, and making the most of them in supporting robust evaluation.</p>
<p>A <a href="https://doi.org/10.1086/667841">thoughtful analysis</a><sup id="fnref:kuhnian" role="doc-noteref"><a href="#fn:kuhnian" class="footnote">3</a></sup> of the issue of differences between reviewer assessments is provided by <a href="http://faculty.washington.edu/c3/">Carole Lee</a>. In this paper Lee argues that there are reasons to expect differences between reviewers, and specifically that:</p>
<blockquote>
<p>low interrater reliabilities might reflect reasonable forms of disagreement among reviewers.</p>
</blockquote>
<p>Lee suggests that there are at least two reasons to expect differences between reviewers. First, reviewers might come from distinct subspecialisms, and therefore bring different but equally valid interpretations of the work<sup id="fnref:IDR" role="doc-noteref"><a href="#fn:IDR" class="footnote">4</a></sup>. Secondly, reviewers may bring alternative interpretations of the criteria used for the evaluation. Lee hypothesises that:</p>
<blockquote>
<p>experts can have diverging evaluations about how significant, sound, or novel a submitted paper or project is because they make different antecedent judgments about the relevant respects in which a submission must fulfill these criteria.</p>
</blockquote>
<p>Although there are legitimate reasons for reviewer disagreement, this does not mean that disagreements are not a challenge for peer review. If differing views are not appropriately combined then there is a risk that decisions might have a degree of randomness. This does have implications for the integrity of the process. As Lee puts it:</p>
<blockquote>
<p>low interrater reliabilities can make peer review outcomes an arbitrary result of which reviewer perspectives are brought to bear</p>
</blockquote>
<p>Therefore, the line of thinking proposed by Lee moves the debate about reviewer agreement away from questions of reliability of peer review, and towards the process for resolving differing opinions. Because differences can be legitimate, collapsing assessments to numerical scores early and combining them arithmetically will remove rich qualitative information. Instead, there needs to be space for discussion between reviewers so that they can collectively explore their differing views and reach an agreement on a composite view of the article in question.</p>
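<p>A toy calculation makes the point about early numerical collapsing. In the hypothetical scores below, two proposals receive an identical mean, yet one of them conceals a sharp disagreement that only survives if the individual assessments are kept visible.</p>

<pre><code>
# Hypothetical reviewer scores: identical means, very different disagreement.
from statistics import mean, stdev

proposal_a = [6, 6, 6]  # reviewers agree
proposal_b = [9, 6, 3]  # reviewers disagree sharply

for name, scores in [("A", proposal_a), ("B", proposal_b)]:
    print(name, mean(scores), round(stdev(scores), 2))
# A 6 0.0
# B 6 3.0  -- the spread, not the mean, signals the need for discussion
</code></pre>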
<p>However, the processes of peer review do not, in general, provide this space for discussion. In grant peer review, discussion space is provided through panel meetings, although panel members are often discussing the assessments of reviewers who are not present. In journal peer review, arbitration between reviewers is usually conducted by a third party (an editor) without further input from the reviewers. Through both of these processes researchers often have the opportunity to respond to reviews, but this is limited to text responses to the text of reviews. In my experience, this can sometimes be more a process of 'rebuttal' than deep engagement with the points made.</p>
<p>Better approaches for exploring reviewer differences need to become more central to the peer-review process. For example, in journal peer review, even introducing a step where reviewers comment on one another's assessments would help. A step like this would be especially straightforward to introduce into open peer review, and open approaches more generally offer the opportunity for a broader discussion of reviews, as well as of the article itself. Introducing more reflection opportunities into grant peer review is perhaps more challenging.</p>
<p>Either way, an increasing focus on exploring the legitimate differences of opinion that reviewers bring will contribute to better decision-making, and so improve the benefits delivered from research.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:reliability" role="doc-endnote">
<p>I don't like this terminology as it conflates the notion of consistency of reviewers with the reliability of the assessment. This essay argues that disagreement does not necessarily reflect a lack of reliability. <a href="#fnref:reliability" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:gutherie" role="doc-endnote">
<p>Guthrie, S., Ghiga, I., & Wooding, S. (2018). What do we know about grant peer review in the health sciences? F1000Research, 6, 1335. https://doi.org/10.12688/f1000research.11917.2 <a href="#fnref:gutherie" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:kuhnian" role="doc-endnote">
<p>Lee, C. J. (2012). A Kuhnian Critique of Psychometric Research on Peer Review. Philosophy of Science, 79(5), 859–870. https://doi.org/10.1086/667841. <a href="http://pdfs.semanticscholar.org/a1f2/b80fa7747e4e864495e84f2f73c81c6487de.pdf">Open access pdf version</a> <a href="#fnref:kuhnian" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:IDR" role="doc-endnote">
<p>In the case of multi- or interdisciplinary research reviewers may come from distinct disciplinary domains. Sometimes it is suggested that this is a particular problem or challenge for the review of research that crosses disciplinary boundaries, but it is also important to consider the <em>value</em> that distinctive perspectives bring to the evaluation. <a href="#fnref:IDR" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h2>Tracing the signals of research impact</h2>
<p>Research has impact.
The knowledge, understanding and know-how that come from deep investigation lead to concrete changes of considerable diversity,
from cultural benefits to environmental improvements,
from the economic benefits of new and improved products and services,
to the health benefits of enhanced treatment approaches and novel drugs.</p>
<p>While there is ample evidence of the impact that research has, in all its diversity,
we know much less about <em>how</em> research impact comes about.
There are many distinct routes, which are complex and non-linear,
but how the steps fit together,
and the effect of different choices in the route to impact are less well established.
There are also limited theoretical frameworks for the analysis of the process of impact generation.</p>
<p>These gaps in knowledge are important,
because they limit the extent to which the process of impact generation can be optimised,
both at a system level and in terms of individual choices made by researchers and research managers.
At the system level, questions include: how should we organise the allocation of funding to maximise impact, or to support particular impact outcomes?
And what structures should we build to facilitate the interactions between researchers and other actors in the system?
At the level of individual choices, questions include how to identify research outcomes with 'impact potential', to assist investment decisions.</p>
<p>A <a href="https://doi.org/10.3389/frma.2020.00005">recent article</a> presents an interesting approach to analysing the steps that link research funding interventions to impacts,
providing more evidence on this complex process.
The authors investigate the links between a set of research grants in the life sciences,
the publications generated (through 3 generations via citation relationships),
patents that cite these publications and drug registrations.
The analysis provides insight into only one specific pathway to impact -
the development of drugs that are built on patentable intellectual property -
but the work nonetheless reveals rich complexity in that pathway.
The study provides insights into the long timelines resulting from investment in fundamental research,
and on the potential scale of the return on investment,
at least in medical research.</p>
<p>Although the starting point for the analysis is a specific limited set of grant proposals, the scale of the analysis is large.
There are 18,197 articles that acknowledge support from the grant programme,
and 760,516 second generation articles (that cite the original 18,197) and over 8 million third generation articles.
The full set of articles can be linked to 334,908 patents,
which in turn are mentioned in the registration information for 774 different drug products.</p>
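<p>The mechanics of this generational tracing can be sketched with a small citation graph. The fragment below uses invented data, not the study's, and collects, for a seed article, the papers that cite it, then the papers that cite those, and so on.</p>

<pre><code>
# Sketch of generational citation tracing; toy data, not the study's pipeline.
import networkx as nx

G = nx.DiGraph()  # an edge (a, b) means "a cites b"
G.add_edges_from([
    ("p2", "p1"), ("p3", "p1"), ("p4", "p2"), ("p5", "p3"), ("p6", "p4"),
])

def citation_generations(graph, seeds, depth=3):
    """Generation 1 cites the seeds, generation 2 cites generation 1, etc."""
    generations, current = [], set(seeds)
    for _ in range(depth):
        # a node's predecessors are the papers that cite it
        nxt = {citing for cited in current for citing in graph.predecessors(cited)}
        generations.append(nxt)
        current = nxt
    return generations

print(citation_generations(G, {"p1"}))
# [{'p2', 'p3'}, {'p4', 'p5'}, {'p6'}]
</code></pre>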
<p>Having established the links, clustering analysis was used to identify patterns in the data, as illustrated below:</p>
<p><img src="https://www.frontiersin.org/files/Articles/566787/frma-05-00005-HTML/image_m/frma-05-00005-g007.jpg" alt="" /></p>
<p>The cluster highlighted in the network is densely linked and was subjected to further analysis.
There are a number of drugs contained within the cluster that are all based on a similar mechanistic approach, the use of short lengths of DNA or RNA as drugs.
Delving more deeply, the authors then move on to look at the network associated with just one drug, <a href="https://www.fda.gov/news-events/press-announcements/fda-approves-first-its-kind-targeted-rna-based-therapy-treat-rare-disease">Onpattro</a>:</p>
<p><img src="https://www.frontiersin.org/files/Articles/566787/frma-05-00005-HTML/image_m/frma-05-00005-g009.jpg" alt="" /></p>
<p>This network illustrates the complex relationship between the original grant investment and the development of Onpattro.
Information from many articles, combined in different ways, is involved,
resulting in many patents, several of which are central to the development of the drug.
And this network may not reflect all the research that went into the development of the drug,
as it is limited to research that can be traced to a specific set of grants or whose contribution is visible in the citation network.</p>
<p>Given the complexity of the network, the impact pathway is surely non-linear,
but it is possible from the data to identify the sequence in time related to the development, as illustrated below:</p>
<p><img src="https://www.frontiersin.org/files/Articles/566787/frma-05-00005-HTML/image_m/frma-05-00005-g010.jpg" alt="" /></p>
<p>The network can be traced back 38 years from the drug registration to the first journal articles in 1980,
with the grant funding made as early as the late 1970s.
Impact timelines can indeed be long.</p>
<p>The study is not without limitations,
especially resulting from the focus on journal articles and their citation as the only mechanism of knowledge diffusion.
Other routes are undoubtedly important:
diffusion through other forms of codified output, or the transfer of people along with their tacit knowledge.
These alternative mechanisms may be important even in the specific impact area of this article,
and they may have increasing importance in other impact areas.
However, this study illustrates the power of systematically tracing the links between research and its impact
and offers the prospect of developing new theoretical frameworks to understand the process.
More work of this type, alongside other methods to investigate other pathways to impact, is required to support an increasingly evidence-based approach to research impact decision-making in the future.</p>

<h2>Connecting journal articles</h2>
<p>Surveying the literature is a task with which all researchers are familiar.
A common approach is to start from a known relevant article,
identified either on the basis of a keyword search or a recommendation,
and then to work 'outwards' from it by looking at the reference list and cited papers.
This works well enough, but I recently came across<sup id="fnref:discovery" role="doc-noteref"><a href="#fn:discovery" class="footnote">1</a></sup> a new tool - <a href="https://www.connectedpapers.com/">Connected Papers</a> - which adds a new and different dimension to literature searching.</p>
<p>As explained in <a href="https://medium.com/connectedpapers/announcing-connected-papers-a-visual-tool-for-researchers-to-find-and-explore-academic-papers-89146a54c7d4">a post announcing its launch</a>,
Connected Papers surfaces articles related to one another,
not through direct citation/referencing relationships,
but as a result of articles having references or citations in common.
The related articles are presented visually,
clustered based on their relatedness,
and coloured based on their year of publication.
The service also provides lists of 'prior' and 'derivative' works.
Prior works are those articles that <em>are cited by</em> many of the related cluster,
and this list is a useful place to look for key (or at least widely cited) contributions to the area.
Derivative works are articles that <em>cite</em> many of the related cluster,
and this list is the place to find current thinking in the area or recent review articles.
Of course, the usual limitations of citation-based tools apply:
the tool only works for journal articles and is limited in its disciplinary coverage.
Nonetheless, in experimenting with Connected Papers over the last few weeks, I have found it a useful tool for finding new articles on topics that I am researching.</p>
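<p>The relatedness idea underneath this is worth sketching. Two articles can be connected because they share references (bibliographic coupling) or because later papers cite them together (co-citation). The toy snippet below scores the former with a plain Jaccard overlap; treat it as an illustration of the principle, not as Connected Papers' actual similarity metric.</p>

<pre><code>
# Bibliographic coupling sketch: papers that share references are related.
# Toy data and a plain Jaccard overlap; not Connected Papers' actual metric.
references = {
    "paper_a": {"r1", "r2", "r3", "r4"},
    "paper_b": {"r2", "r3", "r4", "r5"},
    "paper_c": {"r6", "r7"},
}

def coupling_strength(a, b):
    """Jaccard overlap between two papers' reference sets."""
    shared = references[a] & references[b]
    union = references[a] | references[b]
    return len(shared) / len(union)

print(coupling_strength("paper_a", "paper_b"))  # 0.6 -- strongly related
print(coupling_strength("paper_a", "paper_c"))  # 0.0 -- unrelated
</code></pre>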
<p>The best way to illustrate the tool is with an example.
The image below shows the graph generated using an <a href="https://doi.org/10.1007/s11024-016-9290-0">article</a> written by <a href="https://www.lancaster.ac.uk/educational-research/people/gemma-derrick">Gemma Derrick</a> and <a href="https://www.kcl.ac.uk/people/gabrielle-samuel">Gabrielle Samuel</a> on impact evaluation in the 2014 Research Excellence Framework (REF)<sup id="fnref:full-ref" role="doc-noteref"><a href="#fn:full-ref" class="footnote">2</a></sup>.</p>
<p><img src="/images/connected-papers-example.jpg" alt="Example connected papers output" title="Example connected papers output" /></p>
<p>I have added a red circle to indicate the paper used to generate the graph.
To the 'north' there is a cluster of papers concerned with the assessment of research impact,
including two key review articles (Bornman, 2013 and Penfield, 2014),
and some contributions to impact assessment that are distinct from the approach taken in the REF (Spaapen, 2011 and Joly, 2015).
The more 'north east' part of this cluster contains contributions to impact assessment specifically in the health research area,
in which a lot of the earlier thinking for research impact evaluation was carried out.
The cluster to the 'south west' of the central paper represents the emerging literature on the potential (or not) for alternative metrics to capture research impact.
The full interactive version of the graph,
including the prior and derivative works,
can be explored by clicking <a href="https://www.connectedpapers.com/main/920a85d802ad2b25ce2e06110d7ff59e1d853259/The-Evaluation-Scale-Exploring-Decisions-About-Societal-Impact-in-Peer-Review-Panels/graph">here</a>.</p>
<p>I know the field of research impact evaluation reasonably well,
so this graph did not generate any surprises for me in terms of work I had not previously come across.
As an introduction to the field, however,
I think Connected Papers represents a very slick way of going from one article to understanding the broader landscape.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:discovery" role="doc-endnote">
<p>How I came across Connected Papers is itself an interesting story of serendipity. Over the last few months I have been using the personal knowledge management and note taking app, <a href="https://roamresearch.com">Roam</a>. While reading <a href="https://www.roambrain.com/roaming-in-the-past/">a blog post</a> on using Roam, I spotted a reference to Connected Papers in an image illustrating the post. <a href="#fnref:discovery" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:full-ref" role="doc-endnote">
<p>The paper used is Derrick, G. E., & Samuel, G. N. (2016). The Evaluation Scale: Exploring Decisions About Societal Impact in Peer Review Panels. Minerva, 54(1), 75–97. <a href="https://doi.org/10.1007/s11024-016-9290-0">https://doi.org/10.1007/s11024-016-9290-0</a> <a href="#fnref:full-ref" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h2>Virtual peer review panels - the evidence</h2>
<p>Over recent months many of us have become accustomed to conducting all manner of business through video calls.
Tasks that, we would have argued,
had an absolute requirement to be conducted face-to-face have transferred almost seamlessly to an online or virtual context.
From a personal perspective,
I have found this transition to be a successful one,
with business being conducted as effectively as before, and sometimes more so.</p>
<p>But it is important to challenge perceptions and seek evidence about the effectiveness of virtual meetings,
especially when face-to-face interactions are central to a process.
An example that demands particular scrutiny in the context of research policy is the use of panel meetings for peer review.
Panel meetings are at the heart of funding decisions for research projects, and play a part in some research evaluation processes.
As we get used to living with a dangerous and endemic virus,
the remote conduct of peer review panels is set to become the norm.
We need to understand the effectiveness of this new setting
so that we can mitigate any negative effects
and ensure the robustness of processes.</p>
<p>All peer review panels for grant proposal review operate in a similar manner.
Panel members read the proposals in advance of the meeting and assign a provisional score or rating to them.
A subset of the panel focuses on particular proposals as lead reviewers or readers,
in order to manage and distribute the volume of work.
At the panel meeting, the panel as a whole reviews each proposal,
listening to the views of the lead reviewer(s) and coming to a collective and agreed score or rating.
While not always stated explicitly,
the purpose of the panel discussion is to bring a greater diversity of views to bear
and to mitigate against any implicit or unconscious biases that the lead reviewer(s) might have.</p>
<p>The extent to which panels are effective in delivering their objectives is an important question.
However, in this post I want to focus on the performance of virtual panel meetings compared to the more usual face-to-face setting.
From the evidence I have looked at (see the annotated bibliography below) I draw four conclusions.</p>
<p><strong>There is limited evidence on the effectiveness of virtual panel meetings.</strong>
There are only a handful of studies that have examined this question systematically,
and they are all concerned with the assessment of medical research funding applications.
The disciplinary focus is perhaps not a big problem,
but the small number of studies does limit the extent to which the findings can be confidently generalised.
It is also worth noting that none of the studies examined the effectiveness of the panel process directly;
they did not ask whether the correct funding decisions were made.
Instead two types of information have been collected:
the perceptions of panel members of the effectiveness of the process,
and data relating to the panel operation,
like the extent to which scores were adjusted or the time spent discussing each application.
The studies were exclusively in the North American context.</p>
<p><strong>In general, panel members prefer face-to-face meetings to virtual meetings.</strong>
Surveys of panel members reveal a preference for face-to-face meetings.
Panel members suggested that better communication,
both during the formal parts of the meeting and in the spaces around the formal sessions,
is driving their preference.
The minority of panel members who preferred virtual meetings did so because of the logistical convenience.</p>
<p><strong>Despite their preferences, panel members do not report major differences in effectiveness between face-to-face and virtual meetings.</strong>
Survey evidence did not reveal statistically significant differences in panel members' perceptions of the effectiveness of the process using tele- or videoconference methods.
Effectiveness was assessed through questions relating to use of expertise,
quality of discussion and its facilitation,
and the perceived robustness of the outcomes.
There were some perceived reductions in quality for 'web-based meetings'.
A precise definition of 'web-based meetings' wasn't provided,
but the implication is that this was a text-only format not involving an audio or video link.</p>
<p><strong>Virtual panel meetings have similar outcomes, but less time is spent discussing proposals.</strong>
Measures of panel process are similar for face-to-face and virtual meetings.
For example, the extent to which scores were adjusted from the reviewers' personal scores after discussion was similar in different settings.
The only notable difference identified is that discussions tended to be shorter in virtual settings.
This may be important, as panel members perceive that shorter discussions are more likely to result in biased outcomes.
However, there is no evidence that this is the case from the scoring distributions,
and further work would be needed to explore this matter fully.</p>
<p>There does not appear to be significant evidence to suggest that remote peer review panels are less effective than those conducted face-to-face.
However, the evidence base is small and limited in scope,
and is focussed on perceptions of panel members,
and measures of the mechanics of panel processes.
The current shift to more virtual panel meetings presents an opportunity to improve the evidence base on this important question.</p>
<p>Finally, the evidence available is limited to panels assessing grant proposals.
I was not able to find any studies looking at the less common use of panels for the <em>ex post</em> evaluation of research outputs and impacts,
such as used in national research evaluations like the UK's Research Excellence Framework.
With the likelihood that the assessment phase of the 2021 REF will be conducted at least partially via remote meetings,
there is an opportunity to gather new evidence for this different context.</p>
<p><em>If you are aware of relevant evidence that I have missed, please add a comment below.</em></p>
<hr />
<h3 id="annotated-bibliography">Annotated bibliography</h3>
<h4 id="primary-research">Primary research</h4>
<p>Gallo, S. A., Carpenter, A. S., & Glisson, S. R. (2013). Teleconference versus Face-to-Face Scientific Peer Review of Grant Application: Effects on Review Outcomes. PLoS ONE, 8(8), e71693. https://doi.org/10.1371/journal.pone.0071693 <br />
<em>This paper reports on a difference-in-difference approach to examine the effect of switching from face-to-face grant panel meetings to meetings conducted by teleconference.
The study examines a number of metrics of the peer review process and finds no significant differences,
with the exception of a reduction in discussion times for meetings conducted by teleconference.
Surveys of panel members did not reveal any differences in their perceptions of the robustness of the process.</em></p>
<p>Carpenter, A. S., Sullivan, J. H., Deshmukh, A., Glisson, S. R., & Gallo, S. A. (2015). A retrospective analysis of the effect of discussion in teleconference and face-to-face scientific peer-review panels. BMJ Open, 5(9), e009138. https://doi.org/10.1136/bmjopen-2015-009138 <br />
<em>This study uses the same dataset and approach as the authors' previous work, above.
In addition to confirming the findings of the previous study,
this analysis shows that there is a small reduction in the changes between pre-meeting and post-discussion scoring in the teleconference environment.</em></p>
<p>Gallo, S. A., Schmaling, K. B., Thompson, L. A., & Glisson, S. R. (2020). Grant reviewer perceptions of the quality, effectiveness, and influence of panel discussion. Research Integrity and Peer Review, 5(1). https://doi.org/10.1186/s41073-020-00093-0 <br />
<em>This paper reports on a survey of reviewer perceptions of the role of panel discussion in grant peer review outcomes.
It reports that many panel members think that the discussion is valuable,
but there are concerns about poor and/or short discussions not addressing or even accentuating bias.
The central role of the panel chair in facilitating discussions is a key factor for panellists,
accompanied with evidence that the role is sometimes perceived to have been carried out poorly.
Training for panel chairs is recommended as an outcome from the research.
There is only passing mention of remote panel meetings,
with evidence that shifts in scoring and the length of discussions tend to be shorter in virtual settings.</em></p>
<p>Gallo, S. A., Schmaling, K. B., Thompson, L. A., & Glisson, S. R. (2019). Grant reviewer perceptions of panel discussion in face-to-face and virtual formats: lessons from team science? bioRxiv https://www.biorxiv.org/content/10.1101/586685v2?versioned=true <br />
<em><strong>Note:</strong> This is a preprint version of Gallo et al. (2020) that is substantially different and contains more information on virtual panels. <br />
Based on a survey of panel members,
this paper reports a preference among panel members for face-to-face meetings,
but no evidence to suggest that panellists felt that the quality of discussion was significantly impacted when face-to-face is compared to tele- or video-conferencing.</em></p>
<p>Pier, E. L., Raclaw, J., Nathan, M. J., Kaatz, A., Carnes, M., & Ford, C. E., (2015). Studying the study section: How group decision making in person and via videoconferencing affects the grant peer review process (WCER Working Paper No. 2015-6). https://wcer.wisc.edu/docs/working-papers/Working_Paper_No_2015_06.pdf <br />
<em>This study examines the effects of discussion on score adjustment,
compares different panels' assessments of the same proposals,
and examines whether there were any differences between face-to-face and videoconference panels.
The panel meetings adopted NIH procedures,
but used already reviewed proposals,
and did not lead to any changes in outcome for those proposals.
There were three face-to-face panels and one by videoconference.
There was no evidence of any difference between the face-to-face and virtual panels,
except less time was spent on each proposal in the virtual panel.
Some panel members perceived that the face-to-face meetings were more effective,
although the evidence does not support the existence of material differences.</em></p>
<p>Vo, N. M. & Trocki, R. (2015). Virtual and Peer Reviews of Grant Applications at the Agency for Healthcare Research and Quality. Southern Medical Journal, 108(10) 622-626 <br />
<em>This paper reports on six review panel meetings that were conducted via video conference because of hurricanes in 2012.
The virtual panels are compared to five face-to-face panels.
The report concludes that there was no evidence of reduced effectiveness in the virtual panels.</em></p>
<h4 id="review-articles">Review articles</h4>
<p>Guthrie, S., Ghiga, I., & Wooding, S. (2018). What do we know about grant peer review in the health sciences? F1000Research, 6, 1335. https://doi.org/10.12688/f1000research.11917.2 <br />
<em>This review article summarises evidence available on peer review, with a focus on the life and health sciences.
A range of areas of evidence are covered including the use of remote meeting formats for peer review.</em></p>
<p>Shepherd, J., Frampton, G. K., Pickett, K., & Wyatt, J. C. (2018). Peer review of health research funding proposals: A systematic map and systematic review of innovations for effectiveness and efficiency. PLOS ONE, 13(5), e0196914. https://doi.org/10.1371/journal.pone.0196914 <br />
<em>This paper reports a systematic review of literature examining the effectiveness of innovations in peer review.
It includes a review of evidence on using remote meetings for peer review.</em></p>

<h2>A post-pandemic research agenda</h2>
<p>Much has already been written about the new world that will emerge following the acute phase of the coronavirus pandemic.
One idea that crops up repeatedly is the notion that, as well as presenting many challenges,
the current circumstances present an opportunity to rethink and reset.
We don't need to go back to what came before and that is a positive from the crisis.
This was captured especially eloquently by Arundhati Roy who <a href="https://www.ft.com/content/10d8f5e8-74eb-11ea-95fe-fcd274e920ca">described</a> the pandemic as a portal to a new world which
"we can walk through lightly, with little luggage, ready to imagine another world".</p>
<p>This implies making some choices to do things differently.
Writing in the <a href="https://futurecrunch.substack.com">Future Crunch</a> newsletter,
Gus Hervey <a href="https://futurecrunch.substack.com/p/fc97-portal-economics-part-1">argues</a> for rethinking our economic goals and structures.
The piece draws on the work of two economists - <a href="https://en.wikipedia.org/wiki/Mariana_Mazzucato">Mariana Mazzucato</a> and <a href="https://en.wikipedia.org/wiki/Kate_Raworth">Kate Raworth</a>.
Mazzucato has argued in a number of books,
most notably <a href="https://www.amazon.co.uk/dp/0857282522/"><em>The Entrepreneurial State</em></a>, that
there is a need to recognise the central role of the state in innovation.
This book and its arguments have received considerable attention in the innovation and research policy communities.
I was personally much less familiar with Raworth's work, and while I had heard of her book, <a href="https://www.amazon.co.uk/dp/1847941397/"><em>Doughnut Economics</em></a>, I hadn't read it.
Having read the book, I find that the arguments are compelling.
As well as providing a valuable impetus to thinking differently about the future economy,
the ideas in <em>Doughnut Economics</em> provide a useful framework for thinking about research strategy and direction.</p>
<!--The principles of the doughnut-->
<p>The central idea in <em>Doughnut Economics</em> is that we need to replace GDP growth as the principal goal of economic and public policy.
As well as being too narrow a measure<sup id="fnref:value" role="doc-noteref"><a href="#fn:value" class="footnote">1</a></sup>,
Raworth argues that the focus on growth inevitably creates unequal societies and leads to environmental degradation.
Instead, Raworth introduces the idea of a target to maintain the world in a 'safe zone' of her 'doughnut'.
The minimum requirement is that everyone on the planet is able to reach a set of standards, the 'social foundation',
defined by the <a href="https://www.un.org/sustainabledevelopment/sustainable-development-goals/">UN Sustainable Development Goals</a>.
The maximum corresponds to a set of clearly defined limits of the earth's environmental system, the 'ecological ceiling'.</p>
<p><a title="DoughnutEconomics / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)" href="https://commons.wikimedia.org/wiki/File:Doughnut_(economic_model).jpg"><img width="512" alt="Doughnut (economic model)" src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Doughnut_%28economic_model%29.jpg/512px-Doughnut_%28economic_model%29.jpg" /></a></p>
<p>Currently, we are definitely outside of the 'safe zone',
with huge numbers of people globally not reaching the social foundation,
while at the same time significantly exceeding the ecological ceiling.
Having set this goal,
Raworth goes on to explain a range of changes needed to the world's economic system
that will contribute to achieving the goal.
Some of these are radical,
some surprisingly simple,
but, overall, the book presents a compelling agenda for a new future.</p>
<p>While there are many actions that are needed to stimulate progress into the safe zone,
how could we orient the research system more with this vision of the future?
I think there are three ways that research strategy and policy could respond.</p>
<p><strong>Fund more research into developing the ideas behind <em>Doughnut Economics</em>.</strong>
Although the book contains evidence to support the alternative economic model it espouses,
in order to convince decision-makers,
and to refocus away from GDP growth,
further research is needed.
I was especially struck by the chapter that discusses the application of a systems approach
and complexity theory to problems of the economy and society.
This seems a powerful approach,
and instrumental in demonstrating the overly simple view of much policy making,
and so is worthy of more research effort.</p>
<p><strong>Focus research on the safe zone.</strong>
If my first point is focussed on a very specific area of research,
the second is about a broad research agenda.
If we are to get into the safe zone of the doughnut,
then there is a major research effort needed against each aspect of both the social foundation
and the ecological ceiling.
We need to focus effort on these areas and only support research that is delivering against the lower and upper limits.
This does not necessarily imply that all research should be aimed at immediate problem solving,
as there are many questions that require better fundamental understanding.
But if we are to get to the safe zone,
there is a need to incentivise and focus on research that has a plausible pathway to get there.
Not everything will achieve its expected outcome,
but the idea of the doughnut gives a framework within which to consider research impact from a more normative perspective.
The type of research impact matters and
impact that only generates narrow economic growth at the expense of the social foundation or the ecological ceiling should not be a priority for public investment.</p>
<p><strong>Consider adopting a 'doughnut' approach to research policy.</strong>
There is also the potential to explore how the ideas of <em>Doughnut Economics</em> could be applied to policy for research,
and some of these ideas have already attracted interest.
For some time,
researchers, funders and others have been promoting open research and the construction of a global knowledge commons,
an idea advocated by Raworth.
A more radical avenue to explore would be the application of a more redistributive approach to research funding.
There are already advocates for alternatives to a competitive approach to funding distribution,
including using <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0183967">basic research income</a>, <a href="https://www.embopress.org/doi/full/10.1002/embr.201338068">peer allocation mechanisms</a>, and <a href="https://riviste.unimi.it/index.php/roars/article/view/3834">random allocations</a>.
Maybe now is the right moment to give these alternatives some serious consideration.
Finally, one of the key notions that Raworth puts forward in the book is to treat the economy holistically,
and recognise that it is a complex system.
This is also true of the research system, and,
while the language of systems thinking is often used in the context of research,
a systems approach is less often used in practice.
Considering how to make research policy in a whole system context is an interesting challenge
that merits further exploration.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:value" role="doc-endnote">
<p>Mazzucato's <em>Value of Everything</em> is an excellent longer review of this question. <a href="#fnref:value" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h2>Four trends shaping the future of research evaluation</h2>
<p><em>Earlier today I gave a keynote presentation at the <a href="https://www.eventbrite.co.uk/e/lis-bibliometrics-conference-the-future-of-research-evaluation-tickets-89329542065">10th anniversary LIS-bibliometrics conference</a>. This post is a summary of the argument that I made in my presentation; you can also view <a href="https://doi.org/10.6084/m9.figshare.11918130.v1">the slides</a>.</em></p>
<p>The evaluation of research operates at a range of levels within the research system, from individuals to comparisons between nations. The methods and approaches used need to be sensitive to the purpose(s) of the evaluation and the level at which it is applied, and the recently proposed <a href="https://thebibliomagician.wordpress.com/2019/12/11/introducing-scope-aprocess-for-evaluating-responsibly/">SCOPE framework</a> is a useful tool for thinking carefully about these questions.</p>
<p>It is also important to consider how research evaluation could or should evolve in the future. Making predictions is always hard, but one tool is to consider the potential trends and drivers that might bring about change. These drivers can arise either within or outside the research system itself, and might have a range of implications for evaluation. Considering trends and drivers helps us think about possible futures, and the options that we have in shaping them.</p>
<p>In this post, I want to consider four trends that have the potential to influence research evaluation in the future. My thoughts draw on a range of evidence sources, including <a href="https://re.ukri.org/news-opinions-events/blog/work-on-research-environment/">recent work</a> commissioned by <a href="https://re.ukri.org/">Research England</a>, an <a href="https://www.elsevier.com/__data/assets/pdf_file/0003/898005/Research_Futures_full_report_Feb2019.pdf">international study (pdf)</a> conducted by Elsevier and Ipsos MORI, and <a href="https://demos.co.uk/project/research-4-0-research-in-the-age-of-automation/">early outputs</a> from a <a href="https://demos.co.uk/">Demos</a> project on 'Research 4.0'. I have also been influenced by my general exposure to the research system and literature.</p>
<p>While the four trends I discuss are not the only factors influencing the future of research evaluation, I do believe they are important ones, that deserve considered attention.</p>
<p><strong>1. The nature of research outputs is changing</strong></p>
<p>Despite extensive digitisation, the fundamental nature of research outputs has remained much the same for four hundred years. Research is disseminated in writing either in short or long form (journal articles and book chapters, or books, respectively). There are signs that this is beginning to change, with researchers themselves <a href="https://www.rand.org/pubs/research_reports/RR3200.html">predicting</a> an increase in the diversity of the output types that they produce. These changes are partly driven by the imperative to reach wider audiences, and the easier availability and distribution of non-text-based media. At the same time, the outputs of research are becoming more openly accessible, with <a href="https://doi.org/10.1101/795310">evidence</a> suggesting that more than three-quarters of journal article views will be through open routes by the end of the decade.</p>
<p>One of the key drivers changing research outputs is increased collaboration. The outputs of research increasingly have international co-authorship, with <a href="https://doi.org/10.1038/497557a">analysis</a> indicating that many nations (including the UK) have overseas authors on more than 50% of their 'national' output. There are also <a href="https://doi.org/10.1371/journal.pone.0149504">trends towards</a> an increasing number of authors on publications.</p>
<p>Researchers are also starting to really take advantage of the digital medium of publishing, with different components of the research process - text, data, and associated outputs - being published separately as citable and linked entities. The push towards more reproducibility of research is also leading to the idea that data and code used in analysis should be published, and there are <a href="https://repro.elifesciences.org/example.html#">experiments</a> going on to make the journal article fully executable.</p>
<p>These changes have big implications for research evaluation. Journal articles and books are self-contained entities that are 'finished' at a specific point in time. In the future research outputs are likely to be much more dynamic and need to be considered not in isolation but in the context of other related and linked outputs.</p>
<p><strong>2. Insight from the citation network is increasing in sophistication</strong></p>
<p>Citation data is regularly used as part of research evaluations, notwithstanding its considerable limitations (which I have <a href="http://stevenhill.org.uk/citations-and-the-attention-loop/">discussed</a> <a href="http://stevenhill.org.uk/expanding-ideas-of-excellence-in-research/">previously</a>). But many current approaches are essentially limited to counting the citations an article receives, with little attention paid to the network context within which articles sit. The only data from the rest of the network that is routinely considered are the citation counts of supposedly similar articles, in an effort to normalise or contextualise the counts.</p>
<p>But there is richer information contained in the network relationships and the full text of articles, which is beginning to be exploited. Perhaps the longest standing of these methods are co-citation and co-authorship analyses that can be used to investigate dynamic disciplinary groupings and interdisciplinary research (<a href="https://doi.org/10.1002/asi.23243">this paper</a> is an example; there is an OA version <a href="https://arxiv.org/abs/1310.4966">available</a>).</p>
<p>Methods are emerging to combine information about the citation network with analysis of the full text of articles, such as the <a href="http://doi.org/10.1045/november2014-knoth">approach</a> that has been termed 'semantometrics', which I have <a href="http://stevenhill.org.uk/citations-and-the-attention-loop/">written about before</a>. More recent <a href="https://doi.org/10.1038/s41586-019-0941-9">work</a> has sought to use information from the citation network to measure how disruptive an article has been, a property that seems to be not necessarily related to its citation count. The article concerned has also been the subject of a <a href="http://stevenhill.org.uk/measuring-disruption-in-research-articles/">previous post</a>.</p>
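<p>The disruption measure has a compact definition that is easy to state in code. The sketch below follows the (n_i - n_j) / (n_i + n_j + n_k) formulation used in this line of work: among later papers, n_i cite only the focal article, n_j cite both the focal article and its references, and n_k cite only the references. The example sets are invented for illustration.</p>

<pre><code>
# Disruption index sketch: D = (n_i - n_j) / (n_i + n_j + n_k).
# Toy data; real calculations run over full citation databases.
def disruption(focal_citers, reference_citers):
    """focal_citers: papers citing the focal paper;
    reference_citers: papers citing any of the focal paper's references."""
    n_i = len(focal_citers - reference_citers)  # cite the focal paper only
    n_j = len(focal_citers & reference_citers)  # cite focal paper and references
    n_k = len(reference_citers - focal_citers)  # bypass the focal paper
    return (n_i - n_j) / (n_i + n_j + n_k)

# Most later work cites the focal paper without its antecedents: disruptive.
print(disruption({"a", "b", "c", "d"}, {"d", "e"}))  # (3 - 1) / 5 = 0.4
</code></pre>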
<p>Of course, these new approaches come with many of the challenges associated with current citation counting methods, not least the still-poor coverage of some output types and, so, certain disciplines in bibliometric databases. They will no doubt raise their own issues too, but the approaches are likely to become more accessible and mainstream over the coming years. We need to think carefully how they will fit, if at all, into our future responsible evaluations.</p>
<p><strong>3. There is an increasing focus on the culture of research</strong></p>
<p>Recent years have seen an explosion of interest in issues of research culture, culminating in the <a href="https://wellcome.ac.uk/reports/what-researchers-think-about-research-culture">recent report</a> from Wellcome. There is general agreement that all is not right in our research organisations, and considerable debate about the source of the problems or the potential solutions. Despite differing views, there is some agreement, however, that the reward, recognition and evaluation approaches used within the research system do not pay enough attention to issues of research culture.</p>
<p>For example, <a href="https://re.ukri.org/documents/2019/int-landscape-study-annex-to-final-report-july-20192-pdf/">recent analysis</a> of national research evaluations in 20 nations reveals that there is precious little attention paid to the process of research, or issues of research culture. If we are serious about tackling the challenge of improving research culture, expanding the horizons of research evaluation will need to be part of the mix.</p>
<p>I think we need to accept that doing so will involve qualitative evaluation approaches. There are also emerging prospects of automated methods (to examine, for example, <a href="http://statreviewer.com/">statistical robustness</a>), or more quantitative approaches. An example of the latter is <a href="https://doi.org/10.6084/m9.figshare.7583402.v1">interesting work</a> looking at gender representation in research articles from UK universities.</p>
<p>Responding to this challenge will need care. It would be easy to design evaluations of research culture that do more harm than good. Keeping the <a href="https://responsiblemetrics.org/the-metric-tide/">principles of responsible research evaluation</a> front of mind will help to mitigate this risk.</p>
<p><strong>4. AI has the potential to revolutionise research assessment</strong></p>
<p>Finally, we need to consider if and how the rapidly expanding and increasingly effective tools of Artificial Intelligence (AI) should be applied to research evaluation. The pace at which AI tools are increasing in power is dramatic, whether the AI is <a href="https://www.wired.com/2016/01/in-a-huge-breakthrough-googles-ai-beats-a-top-player-at-the-game-of-go/">winning</a> complex games of strategy, <a href="https://doi.org/10.1038/s41586-019-1335-8">predicting</a> materials with specific properties, or <a href="https://www.nature.com/articles/d41586-020-00018-3">designing</a> new antibiotics.</p>
<p>There are active experiments in the use of AI for tasks in research evaluation. Microsoft uses an <a href="https://doi.org/10.1162/qss_a_00021">algorithm</a> to determine the 'saliency' of sources in its <a href="https://academic.microsoft.com/home">Microsoft Academic</a> product. In the publishing sector there are experiments like <a href="https://unsilo.ai/">Unsilo</a>, which appears to be able to extract key findings out of articles, as well as identify 'missing' references from the bibliography. The <a href="https://www.natureindex.com/news-blog/living-systematic-reviews-emerging-solution-problem-superseded-research-zika-virus">Cochrane Collaboration</a> are also examining the potential for machine learning to assess whether articles should be included in systematic reviews, alongside, but not replacing, human reviewers.</p>
<p>Whether or not you think AI has a place in research evaluation, it will inevitably be raised as a possibility in the near future. The key for the evaluation community is to begin researching this question now, so that we have a sound evidence base on the challenges and opportunities. Just as we needed frameworks for the responsible use of citation metrics, we need guidelines for the responsible use of AI in research evaluation.</p>
<p>Those guidelines will need to go beyond a simple technocratic assessment of the capabilities of AI to include broader considerations of the impact on the research system. For example, a <a href="https://demos.co.uk/project/research-4-0-research-in-the-age-of-automation">recent report</a> has highlighted that AI-augmented systems are being developed both for writing grant proposals and for evaluating them, raising the prospect of an 'arms race' of competing AIs that seems unlikely to serve the system well.</p>
<p>The four trends considered in this article are not based on speculation, but on evidence of what is happening now. This doesn't mean that how the trends play out in the future of research evaluation is fixed. There are many possible trajectories, and highlighting these trends is aimed at encouraging those in the research system to begin thinking about the implications now. Early thought and action will also inform our response, using the changes and opportunities to build a more effective research system in the future.</p>Earlier today I gave a keynote presentation at the 10th anniversary LIS-bibliometrics conference. This post is a summary of the argument that I made in my presentation; you can also view the slides.Text-mining the research literature2020-01-31T15:38:00+00:002020-01-31T15:38:00+00:00https://stevenhill.org.uk//posts/text-mining-the-research-literature<p>The size and complexity of the research literature is growing rapidly, and this trend has been evident for a number of years. More nations are investing in research, and within established research nations investment is increasing. In both cases, more research outputs are generated. Alongside increased investment, there is increasing pressure on researchers to publish more and more, especially in nations that have chosen to attach incentives to the volume of research.</p>
<p>The increasing scale of the research literature presents a considerable challenge for researchers: maintaining an understanding of developments across even a narrow disciplinary focus can be difficult. And when considering new questions and research directions, the difficulty of extracting latent insights from the literature can be a significant barrier.</p>
<p>Advances in AI and machine learning are beginning to offer real potential to help with this challenge. This is illustrated by an <a href="https://doi.org/10.1038/s41586-019-1335-8">article</a> from last year that used relatively simple, unsupervised text-mining approaches on the materials science literature (a read-only version of the article is <a href="https://www.nature.com/articles/s41586-019-1335-8.epdf?author_access_token=NB1RRPZTDGRDUyjJsVicPtRgN0jAjWel9jnR3ZoTv0P9QxlcO86f_GXZRxwYijrqVZp6i8RcDehbFoibDsaMWW41O3qexhJAZZaR8aHNX-gDwSaeWiSaMEe291D7g-msWZFrZ9mOotgkboEp2Pl1XQ%3D%3D">available</a>).</p>
<p>The article reports on an analysis using text from the materials science literature, with minimal human intervention. The authors were able to predict, with a relatively high level of accuracy, the properties of materials, even when the literature did not itself contain reports on those properties. Most strikingly, the authors demonstrate that their approach could have predicted the discovery of novel materials. Using historically bounded sets of literature, they show that materials exhibiting particular properties (thermoelectric or photovoltaic behaviour, for example) could have been identified years in advance of their 'discovery' using wholly empirical approaches.</p>
<p>The method used is based entirely on text analysis, so could, at least in principle, be applied in other research domains. The study used only text from article abstracts, and the authors suggest that working with full text may actually be more difficult, due to the more complex and nuanced language used in the full articles. Some initial filtering of abstracts made for a more effective prediction process, and using a larger, more general dataset (the full corpus of <a href="https://en.wikipedia.org/wiki/Main_Page">Wikipedia</a>) performed less well.</p>
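<p>The unsupervised approach in the article rests on word embeddings: words that appear in similar contexts end up close together in a vector space, so a candidate material can be scored against a property term even when the two never co-occur directly. The sketch below shows that idea at toy scale using the gensim library; the abstracts, candidate formulae and hyperparameters are placeholder assumptions, and the real analysis trains on the full corpus of abstracts (restricted, for the historical test, to those published before a cutoff year).</p>
<pre><code class="language-python"># Toy version of the embedding approach: train word vectors on abstracts,
# then rank candidate materials by similarity to a target property term.
from gensim.models import Word2Vec

# Placeholder tokenised abstracts, repeated so the toy vocabulary clears min_count
abstracts = [
    "pbte is a promising thermoelectric material with low thermal conductivity".split(),
    "bi2te3 exhibits strong thermoelectric performance near room temperature".split(),
    "sio2 is a common dielectric used as an insulating layer in devices".split(),
] * 50

model = Word2Vec(
    sentences=abstracts,
    vector_size=100,  # embedding dimension
    window=8,
    min_count=5,      # drop very rare tokens
    sg=1,             # skip-gram variant
    workers=1,
)

# Hypothetical candidate materials, normalised to match the tokenised text
candidates = ["pbte", "bi2te3", "sio2"]
ranked = sorted(
    (m for m in candidates if m in model.wv),
    key=lambda m: model.wv.similarity(m, "thermoelectric"),
    reverse=True,
)
print(ranked)  # most 'thermoelectric-like' candidates first
</code></pre>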
<p>The authors conclude:</p>
<blockquote>
<p>"Scientific progress relies on the efficient assimilation of existing knowledge in order to choose the most promising way forward and to minimize re-invention. As the amount of scientific literature grows, this is becoming increasingly difficult, if not impossible, for an individual scientist. We hope that this work will pave the way towards making the vast amount of information found in scientific literature accessible to individuals in ways that enable a new paradigm of machine-assisted scientific breakthroughs."</p>
</blockquote>
<p>There is huge potential in this 'new paradigm', and not just for scientific disciplines. Part of the challenge is assembling and accessing the required data, but many are seeking to address this issue and to assemble large corpora for analysis (as <a href="https://www.nature.com/articles/d41586-019-02142-1">reported</a> in Nature last year).</p>
<p>Alongside the potential, there are some important implications for the processes and culture of research. Activities that have relied on time-consuming work, like extracting insight from the literature, may switch to machine-led or machine-augmented alternatives. So, the scale and skills of the workforce will need to change in response.</p>
<p>In the short term, researchers who are familiar with the tools, techniques and algorithms of AI and machine learning, and with their limitations, are likely to be able to advance their research more effectively. And in the medium term, the policy challenge is to ensure these new approaches become embedded in the training of all researchers.</p>The size and complexity of the research literature is growing rapidly, and this trend has been evident for a number of years. More nations are investing in research, and within established research nations investment is increasing. In both cases, more research outputs are generated. Alongside increased investment, there is increasing pressure on researchers to publish more and more, especially in nations that have chosen to attach incentives to the volume of research.Thinking about CRediT2020-01-27T09:47:00+00:002020-01-27T09:47:00+00:00https://stevenhill.org.uk//posts/thinking-about-credit<p><a href="https://blogs.lse.ac.uk/impactofsocialsciences/2020/01/20/credit-check-should-we-welcome-tools-to-differentiate-the-contributions-made-to-academic-papers/"><strong>CRediT Check – Should we welcome tools to differentiate the contributions made to academic papers?</strong></a></p>
<p>Writing on the <a href="https://blogs.lse.ac.uk/impactofsocialsciences/">LSE Impact Blog</a>, <a href="https://about.me/elizabeth.gadd">Lizzie Gadd</a> makes <a href="https://blogs.lse.ac.uk/impactofsocialsciences/2020/01/20/credit-check-should-we-welcome-tools-to-differentiate-the-contributions-made-to-academic-papers/">thoughtful points</a> about the potential consequences of the <a href="https://casrai.org/credit/">CASRAI CRediT approach</a> to identifying researcher contributions.</p>
<p>Two especially important points stood out for me.</p>
<p>On the one hand, we need to acknowledge the reality that different roles carry different weights, which will vary depending on both disciplinary norms and the context of individual research projects:</p>
<blockquote>
<p>"Assuming that CRediT are not seeking to abolish the role of author altogether and assuming they don’t believe non-author-contributors should be relegated to the acknowledgements, where presumably they’d get no formal credit at all, I’m not entirely sure where this leaves us. Are they creating a third category of research participant, slightly more than ‘acknowledgee’, but less than author? And assuming such a status could easily be incorporated into the world’s bibliographies, can someone’s contribution be assessed merely on the role name (e.g., ‘Software’) or would it need to be assessed on the level of their contribution in that role?"</p>
</blockquote>
<p>On the other hand, there needs to be real care in how different roles are rewarded and recognised:</p>
<blockquote>
<p>"We are already seeing bibliometric analyses based on contributor roles. Whilst this is interesting at a ‘science of science’ level (e.g., are roles gender based?), it worries me on an individual researcher evaluation level. Are we going to see some roles prized above others? Will some roles literally ‘count’ and some roles not? And what impact will this have on those early career researchers in project administration and literature searching roles that CRediT seeks to give previously unacknowledged credit to? Will they, in another terrible fit of irony, be excluded from some forms of credit altogether?"</p>
</blockquote>
<p>I must admit I have always had a level of discomfort with the CRediT approach, which I had not been able to articulate until reading this post. The issue, as ever, is the potential for unintended consequences of well-meaning and superficially straightforward solutions. As the research community collectively considers solutions to the research culture challenge, really thinking through the potential implications of interventions is more important than ever.</p>CRediT Check – Should we welcome tools to differentiate the contributions made to academic papers?