On data management and open research

I recently attended a fascinating workshop showcasing the work Oxford University has been doing to develop systems for the management of research data. The Data Management Roll-out project (DaMaRo) has built and enhanced a suite of tools for handling data across the research life-cycle. Central to the thinking in this project has been ensuring seamless interoperability between the components of the system, using open standards. There has also been considerable effort to make sure that the components ‘talk to’ national and international infrastructure. The tools are being made available as open source software, so they can be used by other institutions.

The project has clearly given thought to making data available. The data warehousing tools can make data open to anyone, and a metadata catalogue, which is intended to include both open and closed datasets, will make that data discoverable.

Another key part of Oxford’s work has been to tackle policy issues in parallel with the technical aspects. The University has developed a data policy, and the tools have been developed as part of the route to implementing that policy.

All good stuff. DaMaRo has developed an impressive set of tools, which do their best to provide an easy-to-use interface for researchers. The real test will be the uptake and use of the system, but the data management policy, the mandates coming from funders and the apparent ease of use of the systems should all help.

Valuable as these tools are, I wonder whether they go far enough. DaMaRo has been built around a very traditional research life-cycle, which views data as something that is initially available only to researchers and their collaborators and is then made open at some specific point. That point is tied to the publication of research outputs: the research project is ‘finished’, outputs are published and data are made available. If this were fully implemented it would be a huge positive step for research openness, and conforming to the traditional life-cycle may well increase uptake by researchers. But it isn’t compatible with more radical views of open research, which see the entire research process being made open as it happens. In this model, data are released and ideas about interpretation and conclusions are shared closer to real time. Rather than the corpus of knowledge increasing in steps as research projects are finished and their outputs released, there is a gradual accumulation of insights, taking place in the public domain.

I think it is interesting to ask what tools are available to support this more radical vision of open research. Are the tools we already have online (blogs, wikis, repositories, social networks) sufficient? Or is there a need for tools designed specifically for research? At a policy level, we also need to think about what sorts of data policies are needed to enable, or at least not hinder, those researchers who want to open their work completely to the world.