Research funders across the world are implementing data management and sharing policies to maximize the openness of data and the transparency and accountability of the research they support. This guide offers advice on how to plan research using a data management checklist, how to format and organize data, and how to publish and cite data. It is a useful guide for students and researchers on a topic of increasing importance, writes Emily Grundy.
Managing and Sharing Research Data: A Guide to Good Practice. Louise Corti, Veerle Van den Eynden, Libby Bishop & Matthew Woollard. SAGE. 2014.
Social scientists in the UK have long benefited from access to a wide range of government and market research and researcher-generated databases via what is now known as the UK Data Archive. This started life in 1967 as the Social Science Research Council (now Economic and Social Research Council) Data Bank. The London School of Economics played a role in this initiative through its support for the committee which recommended establishing an archive, and indeed the first Chair of the SSRC, Michael Young, tried to interest LSE in a collaborative bid to host it. The archive was eventually set up at the University of Essex and has remained there ever since.
The authors of Managing and Sharing Research Data: A Guide to Good Practice all work at the UK Data Archive and collectively have a wealth of experience in the complex range of activities involved in promoting, managing and facilitating data sharing. In many ways data sharing has become much easier thanks to technical advances that have greatly sped up and simplified the process of acquiring and processing data (users no longer have to wait for magnetic tapes to arrive by post). In other respects, however, the whole business of depositing, managing, acquiring and using data responsibly has become more complicated, not least because of the sheer number of data sets of different types now available. This guide sets out to help students, researchers, academics and research support staff through these processes, covering the documentation, formatting, storage and transfer of data as well as legal and ethical issues, publication and citation.
Chapter 1 sets out the case for managing and sharing research data. The authors argue that this is essential on grounds of scientific rigour (published findings should be replicable) and on the principle that publicly funded research should be regarded as a common good. Moreover, duplicating data collection exercises is obviously wasteful and, cumulatively, may also increase response burden (if people are asked to take part in multiple surveys) and so lower response rates. The chapter documents the various statements, guidelines and policies produced by UK research funders and by national and international agencies, including the UK government's Open Data White Paper published in 2012. Although this is probably beyond the scope of a short guide such as this, it would be interesting to see an evaluation somewhere of the progress made towards meeting all the commitments these agencies have made. In the UK, for example, although access to data has improved, including via government virtual microdata labs, there have also been setbacks and some oscillation in practice, particularly in response to publicity around 'mislaid' or misused data. Devolution has also brought extra complications, as the policies of the Northern Ireland Statistics and Research Agency, the Office for National Statistics (ONS) and National Records of Scotland differ in some respects. The chapter includes an informative account of surveys of researchers' attitudes to data sharing, and the authors highlight how these attitudes vary by discipline. There is also an interesting piece on the growth of 'citizen science' projects.
Subsequent chapters give further guidance on planning for data sharing throughout the data life cycle, and on documentation, data management, formatting and storage. Chapter 7 returns to the legal and ethical issues involved in sharing research data in more detail. This includes a discussion of statistical disclosure control techniques, an important and expanding area of research and practice that is often poorly understood. The chapter is therefore useful, although, understandably in such a short guide, it does not go into the underlying theory. Suggestions are made about how to reduce the risk of individuals being identified in data sets (in addition, of course, to removing personal identifiers), for example by top-coding (recording all values above a threshold as the threshold itself) and by including only month of birth rather than the full date. All of this is sensible, although it would perhaps have been useful to acknowledge that such practices limit the potential for further uses of the data that may not be foreseen at the time of deposit.
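To make these two techniques concrete, here is a minimal Python sketch (not taken from the book) of how top-coding and coarsening date of birth might be applied to a single survey record before deposit. The field names, the set of direct identifiers dropped and the income ceiling are all hypothetical choices for illustration, not a scheme the authors prescribe.

```python
from datetime import date

# Hypothetical direct identifiers to remove before deposit.
DIRECT_IDENTIFIERS = {"name", "address"}


def top_code(value: float, ceiling: float) -> float:
    """Top-coding: record any value above `ceiling` as `ceiling`."""
    return min(value, ceiling)


def coarsen_birth_date(dob: date) -> str:
    """Keep only the year and month of birth; discard the day."""
    return f"{dob.year:04d}-{dob.month:02d}"


def anonymise(record: dict, income_ceiling: float) -> dict:
    """Return a copy of `record` with simple disclosure controls applied.

    The field names ("income", "dob") and the ceiling are illustrative
    assumptions only.
    """
    # Drop direct identifiers first.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Then reduce the detail of indirect identifiers.
    out["income"] = top_code(out["income"], income_ceiling)
    out["dob"] = coarsen_birth_date(out["dob"])
    return out


raw = {"name": "A. Respondent", "address": "1 High St",
       "income": 250_000, "dob": date(1961, 3, 17)}
print(anonymise(raw, income_ceiling=100_000))
# {'income': 100000, 'dob': '1961-03'}
```

The example also illustrates the trade-off noted above: once incomes are capped at the ceiling, a later user can no longer distinguish between respondents earning 100,000 and those earning far more, and any analysis needing exact age in days is ruled out.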
The book has many case studies, exercises and fun quizzes. Some of the quizzes are slightly irritatingly basic, but others could be a useful basis for class discussion. Overall, this is a useful guide for students and researchers on a topic of increasing importance.
Emily Grundy is Professor of Demography at the London School of Economics.