Julien Larregue

Philippe Vincent-Lamarre

Frédéric Lebaron

Vincent Larivière

November 30th, 2020

COVID-19: Where is the data?


The arrival of the COVID-19 pandemic has led many to argue that scholarly communication and publishing are undergoing a revolution, in terms of not only wider access to research, but also access to the data underlying it. In this post Julien Larregue, Philippe Vincent-Lamarre, Frédéric Lebaron, and Vincent Larivière discuss findings from their study of papers submitted to the preprint server medRxiv, which shows levels of open data to be stubbornly low.

On January 31st, 2020, the Wellcome Trust issued a press release that seemed to mark a major advance for the accessibility and reproducibility of biomedical research. Among other commitments, such as open access for coronavirus-related publications, it set the expectation that “researchers share interim and final research data relating to the outbreak, together with protocols and standards used to collect the data, as rapidly and widely as possible”. The statement was quickly endorsed by a large number of signatories, from preprint repositories (arXiv, bioRxiv, medRxiv) to prestigious journals (Nature, Science, The Lancet), as well as scientific institutions across the globe (NIH, INSERM, Chinese Center for Disease Control and Prevention) and publishers (Elsevier, SAGE, Springer, Taylor & Francis).

As controversies surrounding vaccines and treatments are still ongoing, it is crucial that scientists be able to evaluate each other’s claims, and access to data is central to doing so. To assess whether this commitment has been honoured, we analysed the data availability statements of 7,394 COVID-19 preprints submitted to medRxiv between January 1st and November 2nd, 2020, and compared them with 5,350 preprints from the same repository that were not related to the coronavirus. Using automated identification of targeted keywords related to data availability, we obtained an approximate data availability status for 9,953 of the 12,744 manuscripts (our full methods, code and data are available here).
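The keyword list itself is not given in this post, so the following is only a minimal sketch of how such a keyword-based approximation might work. The rule names, regular expressions and first-match-wins ordering are all hypothetical illustrations, not the authors’ published code; statements that match no rule are left unclassified, which mirrors the fact that the approach yielded a status for only part of the corpus.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical keyword rules, checked in order: the first matching rule wins.
# Illustrative assumptions only, not the keyword list used in the study.
RULES = [
    # "available upon request" / "on reasonable request" -> conditional availability
    ("conditional", re.compile(r"\b(?:up)?on\s+(?:reasonable\s+)?request\b", re.IGNORECASE)),
    # explicit refusals are checked before the generic "available" rule
    ("not_available", re.compile(r"\bnot\s+(?:be\s+)?(?:publicly\s+)?available\b", re.IGNORECASE)),
    # a URL pointing to a repository or dataset
    ("hyperlink", re.compile(r"https?://\S+", re.IGNORECASE)),
    # reuse of already public data
    ("public_data", re.compile(r"\bpublicly\s+available\b", re.IGNORECASE)),
    # any remaining claim that data are available
    ("available", re.compile(r"\bavailable\b", re.IGNORECASE)),
]


def classify_statement(statement: str) -> Optional[str]:
    """Return the label of the first rule that matches, or None if none does."""
    for label, pattern in RULES:
        if pattern.search(statement):
            return label
    return None  # unclassified statement


def share_by_label(statements: List[str]) -> Dict[str, float]:
    """Percentage of classified statements per label; unclassified ones are excluded."""
    counts = Counter(
        label for label in map(classify_statement, statements) if label is not None
    )
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    examples = [
        "The datasets generated and analyzed during the current study are "
        "available from the corresponding author on reasonable request.",
        "All data are available at https://example.org/dataset.",
        "We used publicly available data from national registries.",
        "Not applicable.",
    ]
    for text in examples:
        print(classify_statement(text), "|", text)
    print(share_by_label(examples))
```

Checking the conditional and refusal rules before the generic “available” rule matters: otherwise a statement such as “data are available on reasonable request” would be counted as open data.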

The results are rather disappointing (Figure 1): overall, COVID-19 preprints declare levels of data openness similar to those of preprints on other topics; that is, only a minority of papers make their data available without restriction. This finding indicates, contrary to what many have suggested, that a global pandemic is not sufficient to radically modify scientific practices, and that the Wellcome Trust statement had little effect. Although scientists working on COVID-19 are slightly more likely to declare that their data are available, to include a hyperlink to the data, or to use already publicly available data, the proportions of manuscripts concerned remain very low (11.2%, 11.8% and 18.2% respectively).

Fig.1: Percentage of preprints submitted to medRxiv with a data statement between January 1st and November 2nd 2020, by data (left panel) and code (right panel) availability statement

Additionally, a sizeable proportion of preprints state that data are available upon request (conditional availability), both among COVID-19 preprints (42.8%) and in the comparison group (52%), and this share has been increasing over time for COVID-19 preprints (Figure 2). Such statements are problematic, as data sharing remains dependent on the authors’ good will. In fact, many data availability statements are ambiguous and could actually work against data sharing. For example, one COVID-19 preprint stated that “the datasets generated and analyzed during the current study are available from the corresponding author on reasonable request”, which makes us wonder what would constitute an unreasonable request. Such cases are not exceptional: 8% of coronavirus preprints, nearly a fifth of all “conditional availability” statements, mention reasonableness as a criterion for agreeing or refusing to share data with fellow scientists. Of course, medRxiv preprints are not necessarily representative of the entire biomedical literature, and we cannot extend our findings to peer-reviewed journal papers. However, our non-systematic observations of COVID-19 publications in prominent journals tend to confirm that data openness remains wishful thinking.
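The “nearly a fifth” figure can be checked from the two percentages cited for COVID-19 preprints, assuming (as the wording suggests) that both are expressed relative to the same set of preprints. A quick sketch of the arithmetic:

```python
# Percentages reported above for COVID-19 preprints (assumed to share one base).
conditional_share = 42.8  # data "available upon request"
reasonable_share = 8.0    # statements invoking a "reasonable request"

# Share of conditional-availability statements that mention reasonableness.
fraction = reasonable_share / conditional_share
print(f"{fraction:.1%}")  # prints 18.7%
```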

Fig.2: Percentage of preprints submitted to medRxiv with a data statement between January 1st and November 2nd 2020, by month, for COVID preprints (left panel) and other preprints (right panel)

To comply with the Wellcome commitment, it is not enough for publications to be accompanied by supplementary information containing more or less detailed tables of the material used in the article. This was, for instance, the case of the now infamous Surgisphere articles. What the community needs is access to the original data, be they observational data, randomized controlled trials or administrative registers. This is the only way to foster the rapid discovery of a successful, agreed-upon treatment for COVID-19, and to avoid sterile polarization. Without access to original data and the possibility to replicate findings, we are paving the way for free speculation, political indeterminacy and media turmoil. At a time when we need collaboration and the utmost transparency in drug trials, it would seem that all biomedical scientists can think about is getting to the finish line before their peers and inflating their h-index. Competition for priority in scientific discovery is probably as old as science itself, but is it really what we need right now?

Since the advent of the Internet, scholarly journals have lost their quasi-monopoly over the dissemination of knowledge. In addition to preprint servers, which provide access to research results before they are reviewed and published in journals, post-publication peer review platforms have shown that peer review is far from perfect. Perhaps this unprecedented health crisis is an opportunity for journals to restore faith in the publication process. Enforcing the commitments they made in the Wellcome Trust statement would be a great leap forward. It is time for scientists to get serious about open data.

Note: This article gives the views of the authors, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

Image Credit: Jeremy Perkins, via Unsplash.

About the authors

Julien Larregue

Julien Larregue is a sociologist at the University of Copenhagen whose research interests revolve around the production of scientific knowledge, the uses of quantification and the circulation of ideas across social fields and countries. He recently published a book on the contemporary return of biological theories of crime (Héréditaire. L’éternel retour des théories biologiques du crime, Paris, Éditions du Seuil, 2020).

Philippe Vincent-Lamarre

Philippe Vincent-Lamarre is an SSHRC postdoctoral researcher at the Canada Research Chair on the Transformations of Scholarly Communication of the Université de Montréal. He is currently interested in using quantitative and computational methods to understand the impact of the COVID-19 pandemic on the scientific community.

Frédéric Lebaron

Frédéric Lebaron is professor of sociology at the École normale supérieure Paris-Saclay, part of Université Paris-Saclay. He is a member of the research group Institutions et dynamiques historiques de l'économie et de la société (CNRS). He recently co-edited Empirical Investigations of the Social Space (Springer, 2019) and in 2020 published Savoir et agir. Chroniques de conjoncture 2007-2020 (Éditions du Croquant).

Vincent Larivière

Vincent Larivière holds the Canada Research Chair on the Transformations of Scholarly Communication at the Université de Montréal, where he is professor of information science and associate vice-president (planning and communications). He is also scientific director of the Érudit journal platform, associate scientific director of the Observatoire des sciences et des technologies (OST) and regular member of the Centre interuniversitaire de recherche sur la science et la technologie (CIRST).

2 Comments

Michael Hall says:

November 30, 2020 at 11:37 pm

“Since the advent of the Internet, scholarly journals have lost their quasi-monopoly over the dissemination of knowledge”. Yes, but they have retained their hold on impact factors and on which bibliometric services they are indexed in. These factors profoundly affect the careers of many academics, potentially even more so given the credence universities give to various ranking systems. While making datasets available is entirely appropriate from the perspective of the public good, it is extremely understandable that many researchers, some of whom hold tenuous contract positions, are reluctant to share data until they have squeezed the last publication from it.

Encouraging knowledge sharing is what we should be doing, but it is extremely difficult when another part of the knowledge economy, journal publishing, is highly corporatised and promotes metrics that universities and institutions have bought into, further reinforcing them and therefore limiting researchers’ willingness to share their databases, because other people may publish from them before they can.
