To move towards a more open science, we must free the data

Data sharing is a key principle of open science, and research funders are increasingly including this as a condition of grant awards. Despite this, Jessica Couture reports on research that found little more than a quarter of relevant research projects to be compliant. While there are valid reasons for certain data not to be made available – its sensitivity or the ease of its interpretation, for example – these findings indicate more needs to be done. A fundamental obstacle to data sharing is the absence of a professional reward structure, such as recognition that data citations are as valuable as article citations. Funders can also encourage data sharing compliance by creating dedicated data archives for funded projects and providing technological assistance to awardees.

Open science can be incorporated into every step of the scientific process and emphasises data sharing. Making data publicly available facilitates its reuse by scientists, such as in synthesis research, and can thus have a much greater impact than data limited to the creator’s initial analysis or intention.

With huge amounts of money dedicated each year to supporting scientific research, there is a growing push from funders to increase the impact and prestige of the money they award by requiring or encouraging data sharing. In particular, when scientists receive public funds, research data is considered a public good and therefore carries an expectation of public accessibility. Additionally, new tools are emerging that make data annotation and sharing easier to incorporate into the research process.

However, while tools and protocols are changing to improve data sharing among researchers, my colleagues and I found that, in practice, data was mostly not being made public. In an article published in PLoS ONE, our team of scientists tested compliance with funder-imposed data-sharing requirements among projects in the environmental sciences over a 20-year period. We were able to collect data from only 26 per cent of funded projects. As scientists, we believe everyone in the scientific community can play a role in increasing data publication and sharing, and it is our responsibility to do so to improve the efficiency of research.

In our analysis, data availability did differ based on the project’s field of study, influenced by factors such as the time required to prepare data, whether a field has established data collection protocols and standardised methods, the sensitivity of data, and the ease of its interpretation. Nonetheless, we assert that a fundamental obstacle facing data sharing is the absence of a professional reward structure, such as the recognition that data citations are as valuable as paper citations. This absence disincentivises the time spent formatting, annotating, and preparing data to be shared.

While some publication platforms are starting to apply digital object identifiers (DOIs) to published data as a reliable way to enable attribution, similar to journal publications, it is ultimately up to the scientific community to recognise data citations as scientific currency that is equally valuable, and to encourage and practice the inclusion of data citations in their overall scientific output.

Image credit: Artem Bali, via Unsplash (licensed under a CC0 1.0 license).

To move toward more open science, scientists must take on some of the responsibility of learning about the benefits of data sharing and incorporating open science methods into their daily work. Creating data in a way that others – and, in future, you – can access and easily interpret may require an extra initial step, but it will reduce additional work down the road.

Using data formats that are easy to share and read on multiple platforms, including open source ones – for example, CSV files rather than MS Excel – and publishing data in open archives will also save time when other researchers or the funder request data. Refined data preparation protocols can also expedite the publication process, as many journals, like funders, now require proof of data publication.
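As a rough illustration of what this extra initial step can look like, the sketch below writes a small table to a plain CSV file and saves a short description of each column alongside it. It uses only Python's standard library. The function name `export_dataset`, the `.metadata.json` sidecar file, and the layout of that file are assumptions made for this example, not a standard required by any funder or archive.

```python
import csv
import json
from pathlib import Path


def export_dataset(rows, columns, out_dir, name):
    """Write rows to <name>.csv and column descriptions to <name>.metadata.json.

    rows    -- list of dicts, one per record, keyed by column name
    columns -- dict mapping each column name to a plain-language description
    Both the function and the sidecar layout are hypothetical, for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{name}.csv"
    meta_path = out_dir / f"{name}.metadata.json"

    # Plain UTF-8 CSV with a header row, readable without proprietary software.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # A small sidecar file so that others (and future you) can interpret the columns.
    metadata = {
        "file": csv_path.name,
        "row_count": len(rows),
        "columns": columns,
    }
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, meta_path
```

Calling `export_dataset(rows, columns, "shared_data", "water_temperature")` with a list of records and a dictionary of column descriptions would produce `water_temperature.csv` and `water_temperature.metadata.json` in a `shared_data` folder. A real archive will usually ask for richer metadata than this, so treat the sidecar as a starting point rather than a substitute for the archive's own requirements.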

Funders can also make changes that will incentivise data sharing. Many have long required their awardees to make data publicly available without following up on these requirements or providing any resources to help researchers preserve their data. Some funders, such as the National Science Foundation (NSF), are starting to ensure data sharing compliance by creating dedicated data archives for the projects they fund and providing technological assistance to awardees. For example, the Arctic Data Center houses all data about the Arctic collected under NSF grants and provides awardees with a team of technicians to assist with data attribution, metadata creation, formatting, and publication. NSF also requires funded Arctic researchers to publish their data in the archive, or prove their publication in a similar archive, before awarding further funding. This two-fold approach not only facilitates data publication but also provides funders with easy confirmation of data sharing compliance.

Data sharing is pivotal to ensuring open science and research efficiency. In the ways outlined above, scientists, funders, and publishers alike can play important roles in increasing data liberation. Thinking about data as a valuable scientific currency is an important step forward, and it requires support from the entire scientific community. It starts with how you think about and treat your own and other people’s data.

This blog post was originally published by the National Center for Ecological Analysis and Synthesis (NCEAS) and is reposted here with permission. It is based on the author’s co-written article, “A funder-imposed data publication requirement seldom inspired data sharing”, published in PLoS ONE (DOI: 10.1371/journal.pone.0199789).

Note: This article gives the views of the author, and not the position of the LSE Impact Blog, nor of the London School of Economics. Please review our comments policy if you have any concerns on posting a comment below.

About the author

Jessica Couture is a PhD student at the Bren School for Environmental Science and Management at UC Santa Barbara, studying the environmental impacts of aquaculture. Previously, she worked at NCEAS for four years on various data projects with DataONE and participated in three working groups.

5 Comments

Lewis Johnson says:

August 16, 2018 at 6:04 pm

I agree with you. As a person who studied biology, I found fresh new research publications hard to get. If a paper wasn’t on the databases provided by the university, chances were that you were not going to get access to it. If so much money is invested in research and publications, it should be more widely available to people.

Anonymous says:

August 17, 2018 at 7:35 am

While I agree in principle, I think you severely underestimate the complexity of the problem.

Just to mention a few issues:

It is not only about data, but also about programming code used to gather and analyze data. There are hundreds of different data formats used in science (i.e., CSV and Excel files hardly suffice). What about images, videos, and whatnot? How about documentation for the data (and the code)? Should we store both raw data and pre-processed data? What about big data? Should we store whole databases?

Before going forward with these kinds of initiatives, funders and institutions (NSF, EU, national foundations, etc.) should first solve the problem of repositories in which these data can be deposited. These must scale at least to the terabyte level. Integrity must be guaranteed. Standards and formats must also be agreed upon.

A long way ahead.

Patricia Galloway says:

August 20, 2018 at 3:56 pm

Data, code, especially libraries, not to mention standardization–what scientists need are digital archivists!



Blog Admin

August 16th, 2018
