Blog Admin

December 16th, 2015

Who’s afraid of Open Data: Scientists’ objections to data sharing don’t stand up to scrutiny.

13 comments

Estimated reading time: 5 minutes

Many scientists are still resisting calls to openly share underlying data. Whilst their concerns should be taken seriously, Dorothy Bishop doesn’t think the objections withstand scrutiny. Concerns about being scooped are frequently cited, but are seldom justified. If we move to a situation where a dataset is a publication, then the original researcher will get credit every time someone else uses the dataset. And in general, having more than one person doing an analysis is an important safeguard for science.

I was at a small conference last year, catching up on gossip over drinks, and somehow the topic moved on to journals, and the pros and cons of publishing in different outlets. I was doing my best to advocate for open access, and to challenge the obsession with journal impact factors. I was getting the usual stuff about how early-career scientists couldn’t hope to have a career unless they had papers in Nature and Science, but then the conversation took an interesting turn.

“Anyhow,” said eminent Professor X. “One of my postdocs had a really bad experience with a PLOS journal.”

Everyone was agog. Nothing better at conference drinks than a new twist on the story of evil reviewer 3. We waited for him to continue. But the problem was not with the reviewers.

“Yup. She published this paper in PLOS Biology, and of course she signed all their forms. She then gave a talk about the study, and there was this man in the audience, someone from a poky little university that nobody had ever heard of, who started challenging her conclusions. She debated with him, but then, when she gets back she has an email from him asking for her data.”

Image credit: John R. McKiernan (CC BY) See more at Why Open Research

We wait with bated breath for the next revelation.

“Well, she refused of course, but then this despicable person wrote to the journal, and they told her that she had to give it to him! It was in the papers she had signed.”

Murmurs of sympathy from those gathered round. Except, of course, me. I just waited for the denouement. What had happened next, I asked.

“She had to give him the data. It was really terrible. I mean, she’s just a young researcher starting out.”

I was still waiting for the denouement. Except that there was no more. That was it! Being made to give your data to someone was a terrible thing. So, being me, I asked, why was that a problem? Several people looked at me as if I was crazy.

“Well, how would you like it if you had spent years of your life gathering data, data which you might want to analyse further, and some person you have never heard of comes out of nowhere demanding to have it?”

“Well, they won’t stop you analysing it,” I said.

“But they may scoop you and find something interesting in it before you have a chance to publish it!”

I was reminded of all of this at a small meeting that we had in Oxford last week, following up on the publication of a report of a symposium I’d chaired on Reproducibility and Reliability of Biomedical Research. Thanks to funding by the St John’s College Research Centre, a small group of us were able to get together to consider ways in which we could take forward some of the ideas in the report for enhancing reproducibility. We covered a number of topics, but the one I want to focus on here is data-sharing.

A move toward making data and analyses open is being promoted in a top-down fashion by several journals, and universities and publishers have been developing platforms to make this possible. But many scientists are resisting this process, and putting forward all kinds of argument against it. I think we have to take such concerns seriously: it is all too easy to mandate new actions for scientists to follow that have unintended consequences and just lead to time-wasting, bureaucracy or perverse incentives. But in this case I don’t think the objections withstand scrutiny. Here are the main ones we identified at our meeting:

1. Lack of time to curate data: data are only useful if they are understandable, and documenting a dataset adequately is a non-trivial task;
2. Personal investment – sense of not wanting to give away data that had taken time and trouble to collect to other researchers who are perceived as freeloaders;
3. Concerns about being scooped before the analysis is complete;
4. Fear of errors being found in the data;
5. Ethical concerns about confidentiality of personal data, especially in the context of clinical research;
6. Possibility that others with a different agenda may misuse the data, e.g. perform selective analysis that misrepresents the findings.

These have partial overlap with points raised by Giorgio Ascoli (2015) when describing NeuroMorpho.Org, an online data-sharing repository for digital reconstructions of neuronal morphology. Despite the great success of the repository, it is still the case that many people fail to respond to requests to share their data, and points 1 and 2 seemed the most common reasons.

As Ascoli noted, however, there are huge benefits to data-sharing, which outweigh the time costs. Shared data can be used for studies that go beyond the scope of the original work, with particular benefits arising when there is pooling of datasets. Some illustrative examples from the field of brain imaging were provided by Thomas Nichols at our meeting (slides here), where a range of initiatives is being developed to facilitate open data. Data-sharing is also beneficial for reproducibility: researchers will check data more carefully when it is to be shared, and even if nobody consults the data, the fact it is available gives confidence in the findings. Shared data can also be invaluable for hands-on training. A nice example comes from Nicole Janz, who teaches a replication workshop in social sciences in Cambridge, where students pick a recently published article in their field and try to obtain the data so they can replicate the analysis and results.

These are mostly benefits to the scientific community, but what about the ‘freeloader’ argument? Why should others benefit when you have done all the hard work? In fact, when we consider that scientists are usually receiving public money to make scientific discoveries, this line of argument does not appear morally defensible. But in any case, it is not true that the scientists who do the sharing have no benefits. For a start, they will see an increase in citations, as others use their data. And another point, often overlooked, is that uncurated data often become unusable by the original researcher, let alone other scientists, if they are not documented properly and stored on a safe digital site. Like many others, I’ve had the irritating experience of going back to some old data only to find I can’t remember what some of the variable names refer to, or whether I should be focusing on the version called final, finalfinal, or ultimate. I’ve also had the experience of data being stored on a kind of floppy disk, or coded by a software package that had a brief flowering of life for around 5 years before disappearing completely.
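To make the point about forgotten variable names concrete, here is a minimal sketch of a codebook check, written in Python. It assumes a dataset's column names and a plain dictionary of descriptions; every name and description below is invented for illustration and does not come from any real study.

```python
# Hypothetical codebook: each variable name maps to a human-readable description.
codebook = {
    "pid": "Participant identifier (pseudonymised)",
    "age_m": "Age at testing, in months",
    "nwr_raw": "Nonword repetition, raw score",
}


def undocumented_columns(columns, codebook):
    """Return the column names that have no non-empty description in the codebook."""
    return [c for c in columns if not codebook.get(c, "").strip()]


# Hypothetical column names as they might appear in an old data file.
columns = ["pid", "age_m", "nwr_raw", "x2_final"]
print(undocumented_columns(columns, codebook))
```

Running a check like this before archiving a file flags any variable, such as the invented `x2_final` here, that a future reader (including the original researcher) would have no way to interpret.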

Concerns about being scooped are frequently cited, but are seldom justified. Indeed, if we move to a situation where a dataset is a publication with its own identifier, then the original researcher will get credit every time someone else uses the dataset. And in general, having more than one person doing an analysis is an important safeguard, ensuring that results are truly replicable and not just a consequence of a particular analytic decision (see this article for an illustration of how re-analysis can change conclusions).

The ‘fear of errors’ argument is, of course, understandable but not defensible. The way to respond is to say of course there will be errors – there always are. We have to change our culture so that we do not regard it as a source of shame to publish data in which there are errors, but rather as an inevitability that is best dealt with by making the data public so the errors can be tracked down.

Ethical concerns about confidentiality of personal data are a different matter. In some cases, participants in a study have been given explicit reassurances that their data will not be shared: this was standard practice for many years before it was recognised that such blanket restrictions were unhelpful and typically went way beyond what most participants wanted – which was that their identifiable data would not be shared. With training in sophisticated anonymization procedures, it is usually possible to create a dataset that can be shared safely without any risk to the privacy of personal information; researchers should be anticipating such usage and ensuring that participants are given the option to sign up to it.
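As a rough illustration of the first step in such a procedure, the Python sketch below drops direct identifiers from a record and replaces the participant ID with a keyed hash. The field names, the key and the record are all hypothetical. Note that this is pseudonymisation only: a real release would also need to consider indirect identifiers (rare combinations of age, location, diagnosis and so on), which this sketch does not address.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}


def pseudonymise(record, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace the ID with a keyed SHA-256 hash.

    The same ID and key always give the same pseudonym, so records from one
    participant can still be linked without revealing the original ID.
    """
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    shared[id_field] = digest.hexdigest()[:16]
    return shared


# Invented example record; the key would be kept private by the research team.
record = {"participant_id": 17, "name": "A. Person", "email": "a@example.org", "score": 42}
print(pseudonymise(record, secret_key=b"keep-this-key-private"))
```

A keyed hash is used rather than a plain hash so that someone who knows or can guess the original IDs cannot simply recompute the pseudonyms without the key.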

Fears about misuse of data can be well-justified when researchers are working on controversial areas where they are subject to concerted attacks by groups with vested interests or ideological objections to their work. There are some instructive examples here and here. Nevertheless, my view is that such threats are best dealt with by making the data totally open. If this is done, any attempt to cherrypick or distort the results will be evident to any reputable scientist who scrutinises the data. This can take time and energy, but ultimately an unscientific attempt to discredit a scientist by alternative analysis will rebound on those who make it. In that regard, science really is self-correcting. If the data are available, then different analyses may give different results, but a consensus of the competent should emerge in the long run, leaving the valid conclusions stronger than before.

I’d welcome comments from those who have started to use open data and to hear your experiences, good or bad.

P.S. As I was finalising this post, I came across some recent tweets from the OpenCon meeting. Anyone seeking inspiration and guidance for moving to an open science model should follow the #opencon hashtag, which links to materials such as these: Slides from keynote by Erin McKiernan, and resources at .

This post originally appeared on the author’s personal blog and is reposted with permission.

Note: This article gives the views of the author, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

About the Author

Dorothy Bishop is Professor of Developmental Neuropsychology and a Wellcome Principal Research Fellow at the Department of Experimental Psychology in Oxford and Adjunct Professor at The University of Western Australia, Perth. The primary aim of her research is to increase understanding of why some children have specific language impairment (SLI). Dorothy blogs at BishopBlog and is on Twitter @deevybee.

Posted In: Academic communication | Data science

13 Comments

Sasha says:

December 16, 2015 at 3:16 pm

This is an important topic and this is a very well-argued post. But with the focus on the implications for researchers, there’s a key perspective missing: that of patients who positively want their data to be shared.

The PACE trial of CBT and graded exercise therapy for chronic fatigue syndrome (CFS) is now rapidly becoming notorious. The PACE investigators abandoned all their main outcome measures and their criteria for “recovery” partway through the trial and replaced them with new ones. The results of the planned analyses were never reported.

The new threshold for “recovery” of physical function (SF-36) is so low that it’s below the level of trial entry – that is, you could get worse during the trial and be considered “recovered”. It’s close to the mean of patients with Class II coronary heart failure.

People have been trying and failing for years to get the raw data from this study to analyse it independently, but all requests have been refused. Shockingly, Professor James Coyne’s request for data from one of the PACE papers has been denied by King’s College London, even though the paper was published in PLOS One and he made his request under PLOS One’s data-sharing policy.

As Professor Chris Chambers of Cardiff University has tweeted, “If @KingsCollegeLon is seeking to do itself ‘reputational damage’, hiding trial data shd do the job.”

http://www.meaction.net/2015/12/12/vexatious-kings-college-london-dismisses-james-coynes-request-for-plos-one-pace-data/

PLOS One are now considering their next step but these shenanigans shouldn’t be necessary. As an ME patient, I was offered a place on PACE and refused. But if I had taken part I would be horrified at the travesty of science that this trial has become. I wouldn’t have wanted to have risked my health in a clinical trial only for the study authors to publish bizarre and misleading analyses and then hide the data so that others couldn’t challenge them.

Open science isn’t just for scientists: it’s also for the people who are on the receiving end of science. It’s time that both scientists and their universities faced up to that.

geneticist says:

December 16, 2015 at 4:21 pm

I have this friend who has been carrying out a selection experiment in Genetics since 1980. He spends possibly 10% of his time taking care of it, which means 3.5 whole years. He also has good and original ideas about how to analyse the data and get newer results, the limiting factor being his personal time.
Of course he is not sharing data. Being cited is not reward enough compared to discovering new things from your data.
People tend to think that one experiment is a one-time thing. But for people dealing with biology, you carry out non-stop experiments.

1. Chris says:
  
  January 18, 2016 at 11:18 am
  
  Is the project something that is specifically publicly-funded?
  
Bill says:

December 16, 2015 at 5:35 pm

I and others are currently having issues with PLOS in getting data from a specific author from two studies. While he & his coauthors made a cursory attempt at data release, the data were actually summary statistics, not raw values, as required by PLOS. Enforcement of policies, therefore, appears to be a problem that should be addressed.

Sheri Oberman says:

December 16, 2015 at 6:05 pm

With open data now coming to the fore, research methods courses in research training programs need to encourage the practice and use the data for more hands on experience with training.

Sneha Kulkarni says:

December 30, 2015 at 11:06 am

This is an interesting article. I would like to add that data sharing has another important benefit i.e. it prevents data from getting lost. 80% of datasets over 20 years old are not available and the practice of sharing the data will help prevent data loss.
