Providing access across subjects and regions, the Data Citation Index is linking up with repositories to provide a single-point recognition mechanism for quality research data. Christopher Lortie welcomes this development as it will profoundly reshape the publication pipeline and further fuel the open science movement. Data can now be recognised and cited as independent products, with or without them being linked to papers. There is now no excuse not to publish those datasets.
I had the good fortune to attend the DataCite Annual Conference this year, which was about giving value to data. Thomson Reuters presented their new Data Citation Index, which I had previously explored only cursorily for my ESA annual meeting ignite presentation on data citations. However, after the presentation by Thomson Reuters and the Q&A, I realized that a truly profound moment is upon us: the opportunity to FULLY & independently give value to data within the current framework of merit recognition (I love altmetrics and we need them too, but we can make a huge change right now with a few simple steps).
The following attributes of the process are what you need to know to fully appreciate the value of the new index: the Data Citation Index is partnering with DataCite to ensure that it captures citations to datasets in repositories with DOIs; citations from papers to datasets are weighted equally to paper-to-paper citations; and (through the partnership with DataCite) citations from one dataset to another will also be captured and weighted equally. Unless I misunderstood the answers provided by Thomson Reuters, this is absolutely amazing.
Disclaimer: As I mentioned in my ignite presentation, citations are not everything and are only one of many estimates of use and reuse. However, we can leverage and link citations to other measures and products to make a change now.
The revolution
If we publish our data in repositories, with or without them being linked to papers, we can now give data the recognition they need as independent evidence products. Importantly, if you use other datasets to build your own, for example a derived dataset for a synthesis activity such as a meta-analysis, or if you aggregate data from other datasets, cite those data sources in your metadata. The Data Citation Index will capture these citations too. This will profoundly reshape the publication pipeline we are now stuck in and further fuel the open science movement.
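To make the idea of citing data sources in your metadata concrete, here is a minimal sketch of what such a record might look like. This is my illustration under stated assumptions, not the format any particular repository requires: the field names are loosely modelled on the DataCite metadata schema (`relatedIdentifiers`, `relationType`), the `build_metadata` helper is hypothetical, and every title, name and DOI is a placeholder. Check your repository's documentation for the fields it actually accepts.

```python
# Hypothetical sketch: every title, creator and DOI below is a placeholder,
# and build_metadata is an illustrative helper, not a repository API.

def build_metadata(title, creators, year, source_dois):
    """Return a metadata record that cites each source dataset by DOI."""
    return {
        "title": title,
        "creators": list(creators),
        "publicationYear": year,
        # One entry per source dataset used to build this one.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsDerivedFrom",
            }
            for doi in source_dois
        ],
    }


record = build_metadata(
    title="Example synthesis dataset",
    creators=["A. Researcher"],
    year=2014,
    source_dois=["10.5072/example-source-1", "10.5072/example-source-2"],
)

# List the data sources this record cites.
for related in record["relatedIdentifiers"]:
    print(related["relationType"], related["relatedIdentifier"])
```

The point of the sketch is simply that each source dataset gets its own machine-readable entry, so that a dataset-to-dataset citation can be picked up in the same way as a reference in a paper.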
Consequently, publish your datasets now (no excuses) and cite the data sources you used to build both your papers and your datasets. Open science and discovery await.
This piece originally appeared on Christopher Lortie’s personal blog and is reposted with permission.
Featured image credit: To deposit or not to deposit, that is the question – journal.pbio.1001779.g001 (Wikimedia, CC BY 4.0)
Note: This article gives the views of the author, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.
Christopher Lortie is an Associate Professor of Biology and Geography at York University in Canada. He is an integrative scientist with expertise in community theory, sociology, and quantitative methods. Empirical research by Lortie and collaborators includes biogeographical comparisons of many forms of community dynamics including invasion, climate change, keystone species, and plant-insect interactions. Research efforts include structural network analyses to couple trophic and non-trophic interactions with basal facilitation in desert ecosystems. Lortie is also a senior editor for Oikos and serves on other boards including PLOS ONE, Journal of Ecology, and PeerJ.
Does anyone understand anything the author just said? What does this mean for qualitative data, for example?
I don’t quite get the value of datasets living in their own free-flowing world disconnected from papers. Data, by itself, is ultimately valueless. Data only has value when you use it to answer a question and to generate knowledge and understanding. To achieve that, someone, sometime, has to actually use the data and tell us what they’ve learned, and that means writing a paper.
That also means that there will always be a tension in when and how people make the data available. I cannot imagine a time when we write letters for colleagues that say “you should promote this person–they’ve generated many great data sets, even though they’ve never told us what they mean.” So there will also be concern about making new data available before the person generating them has had a chance to do the first interpretation.
Yet, because data gain value when used, the more they are used, and the more varied the ways in which they are used, the more value they gain. That is really the value of open data: it allows greater use, and as long as the data are citable independently, that will give more recognition back to the generator.
There are many datasets that live on their own; we’ve got more than 300 standalone statistical datasets at OECD and they provide a valuable resource for anyone working or studying in economics and social policy. They’ve been citable for many years (with DOIs) but the challenge is getting authors who use data to cite it! The tradition is to simply put ‘Source: OECD’ under a chart, table or graph, which is pretty useless for the subsequent reader. The launch of the Data Citation Index (which, by the way, is incorporating datasets from other places as well as DataCite) is a great step forward, but the real challenge is persuading authors to cite data as they would a research paper or book.