
Blog Admin

January 16th, 2015

What’s the point of open academic data? Redefining valuable research based on re-use and reproducibility.



The benefits of open research to both those who fund it and to wider society mean that academia is on a course which cannot be altered, or returned to a previous state. Mark Hahnel discusses the momentum behind making all research outputs openly available online and what changes may be in store for universities in this transition. Funder mandates could shift how researchers receive credit for their work, and librarians might also continue to help research staff better manage and disseminate content.

Openly available academic data on the web will soon become the norm. Funders and publishers are already making preparations for how this content will be best managed. With the coming open data mandates, we are now talking about ‘when’, not ‘if’, the majority of academic outputs will live somewhere on the web, so the big question becomes: ‘what next?’ The answer in the open government space is one of the drivers for the movement in academia: re-use. There is a fundamental difference between academic and government data that is often overlooked: heterogeneity in data between academic fields, or even in what constitutes data, is a point that needs addressing. The primary concern, and that of the funders, is the digital outputs of research. Whatever is needed to reproduce the investigation should be considered ‘data’. This can be spreadsheet data, videos, code and the field-specific proprietary formats that dominate the landscape in certain areas. There are, however, many more outputs of a research project, such as posters, presentations and other traditional outputs including theses, which are essentially extended research papers. Funders are also interested in making these outputs open and monitoring the subsequent impact they have.

The UK’s EPSRC is mandating dissemination of all the digital products of the research it funds this year. The European Commission and the White House’s OSTP are pushing ahead with directives that are causing a chain reaction of open data mandates amongst European governments and North American funding bodies.

While this process may seem to have been slow, given that the world wide web was originally created for exactly this purpose, other industries were rapidly disrupted. Examples in the music and film industries reinforce the idea that dissemination of content on the web is a problem that has been solved. However, changing the incentive structure of an industry in which papers have been the currency for more than 300 years obviously has some complications. The Finch report, and the subsequent reaction to it, highlighted the idiosyncrasies in each research field that make the interpretation of guidelines heterogeneous across academia.

Image credit: Blude (Flickr, CC BY)

It is quickly becoming clear that the benefits of open research to both those who fund it and to wider society have put academia on a course which cannot be altered, or returned to a previous state. As new value, career advancement incentives and metrics are built on top of open content, new concepts of valuable research may arise. If we define new criteria for what counts as ‘good’ or ‘valuable’ research, based on re-usability and reproducibility, what will become of the millions of academic papers existing today which do not conform to the new standards, and what would the new criteria for valuable research look like? There is a case for reverse-engineering value in non-digital content as newer technologies arise. A case in point is the ability to text-mine a large proportion of books never intended to be digitised, let alone indexed.

As is the case with the huge (and growing) number of papers being published each year, publishing platforms need to develop to cope with the dissemination of other digital content. With new business models arising based on open content, these new platforms should not need to restrict access to any content which serves as a public good. With the dawn of altmetrics providing more detailed ways to filter content, interrogate reward systems and track levels of re-use, funders will be able to better measure their return on investment and better allocate funds. In terms of the existing research life cycle, there are a couple of stakeholders who will be affected more significantly in the coming years.

Researchers

If researchers are being directed to make all research outputs openly available, there is huge scope for how those same researchers receive credit for these outputs. At figshare, we have expanded the ‘types’ of academic content that can be filtered on from an initial offering of datasets, figures and multimedia to include posters, code, papers, presentations, theses, and groups of files as a single citable object (filesets), based on requests from academics themselves. We also accept any file type, with the idea that these can be converted and downloaded in open formats. The ability to more efficiently mine research data and draw new conclusions may be better suited to machines and coders. Human interpretation of results, informed by field knowledge, will always be essential, but those outside a field may be better placed to interrogate the large corpus of open content.

This may lead to a dual tiering of academia: one tier of raw data gathering and hypothesising, and a second tier of interrogation, akin to the ideas put forward as the ‘4th paradigm’ of academic research. Where new funding for these roles will come from, and how they will affect the traditional ‘tenure track’, remains an unknown unknown.

Librarians

The library has gone through a transition at what must be all academic institutions over the last ten years. From 2007 to 2011, when I was completing my PhD, I never once physically entered the library to find content or conduct research. I did, however, access content that my library had arranged access to on a daily basis, through my browser. As the access discussion evolves, so too do the role of the library and, even more importantly, the librarian.

A survey of our users informed us that those who were using figshare for private research data management, rather than for dissemination of their research outputs, were unsure whether their PIs, institutions and even funders allowed them to do so. This is surprising given that the NIH lists figshare as a trusted repository and funders like the Wellcome Trust use it to disseminate their research findings. Whilst funders have public statements on their expectations regarding publication of all research they fund, this suggests the message is not getting through to academics.

One area where this expertise lies is within the library. Librarians are transitioning from accessing content on behalf of others to helping their research staff better manage and disseminate content in line with funder expectations. The librarian as a source of knowledge for all things data, code and policy is fast becoming a job with ‘many hats’, cementing their role within the institution as a growing need, even if it comes under several titles.

Institutions

As with academics, institutions are aware that once funders start asking to see all the outputs of the funding they provide, measuring the associated impact and ranking individuals and institutions on that basis will not be far behind. In the UK, the recent interest in the results of the REF demonstrated just how important reporting on impact is for continued progress and funding at an institutional level.

As with all things in this space, it is often the early adopters who garner the most benefit. Bigger institutions have been trying to forge ahead in the digital commons space for several years, with institutions like Harvard creating solutions such as Dataverse. Our existing and potential clients for the ‘figshare for institutions’ offering see this as a cost-effective and timely way to establish an early footprint of impact in this space.

Work done by the RDA working and interest groups has identified some of the questions that will need to be answered before any standards come into place, but the fast-moving nature of the mandates and the technical capabilities of the web mean that institutional reporting will need to be addressed globally in the not-too-distant future.

Consumers of academic content

As with the new roles within academia mentioned above, there may also be similar developments outside the academic world. Open government directives have led to new applications, business models and ways of making use of datasets. Whilst the data produced from governmental surveys and reporting is often much more homogeneous than the varied and increasingly diverse structures of academic research outputs, the sheer volume of research in specific fields has led to new and improved ways to interpret the data, ranging from genetic applications such as 23andMe to sentiment analysis on huge cohorts of social science data from systems such as Twitter and Facebook.

Through the power of linked open data, the web should evolve to return more accurate data in response to any question it is posed. As the world’s largest driver of knowledge, the academic system should provide data to better answer queries at all stages of the learning and educational process.

As always, feedback, comments, suggestions and ideas are welcomed. Please get in touch via Twitter, Facebook or Google+. If you would like to hear more about our institutional offering, please get in touch via any of the above channels and we will be more than happy to discuss your requirements in more detail.

This piece originally appeared on the figshare blog and is reposted with the author’s permission.

Note: This article gives the views of the author, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

About the Author

Mark Hahnel is the founder of figshare and has a PhD in stem cell biology from Imperial College London, having previously studied genetics in both Newcastle and Leeds. He is passionate about open science and the potential it has to revolutionise the research community. You can follow figshare at @figshare.



Posted In: Academic communication | Big data | Evidence-based research | Open Access


This work by the LSE Impact of Social Sciences blog is licensed under a Creative Commons Attribution 3.0 Unported License.