Research into pressing societal challenges increasingly depends on data coming from across different disciplines and research contexts. Gordon Blair argues that to create a research culture that makes the best use of available data, the 2016 FAIR principles need to be extended in ways that address issues that have emerged in the decade following their creation.
Modern science gives researchers unprecedented volumes of data from myriad sources. Combined with advances in digital research infrastructure and artificial intelligence, how we use and analyse data is changing, creating ideal conditions for data-intensive science to flourish.
As this new era of science dawns, our best-practice framework for managing and sharing data, the FAIR Principles, is at risk of being left behind.
Introduced in 2016, the FAIR Guiding Principles (Findability, Accessibility, Interoperability and Reusability) were a significant step forward for open science. They were designed to make research outputs, like data, easier to find and integrate into studies with minimal human input. Eight years on, if research is to be up to the task of tackling the complex environmental and societal challenges of our time, the FAIR Principles need to be extended so that open, interoperable and AI-ready data isn't just a goal, but part of scientific culture.
FAIR’s hidden challenges
In a research ecosystem where data is increasing in volume, variety and complexity, FAIR should be an enabler for open, collaborative and interdisciplinary science. But, like any guiding framework that is nearly a decade old, it is not without issues.
For example, metadata is a critical underpinning for FAIR to function. To play this role, metadata should describe data well, yet metadata descriptions aren't standardised across disciplines, making data messy and harder to locate. Data also isn't always stored in open-access repositories, which excludes valuable datasets from being exchanged and reused. In some cases, data remains on a researcher's hard drive or institutional file system until their analysis is published. Cultural and technical barriers such as these stem from the siloed nature of science and from the incentives and pressures placed on researchers to publish.
This contrasts with best practices in data sharing that take a ‘One Health’-inspired approach to deliver systemic insights into a range of critical environmental questions. Take water quality as an example – 30 per cent of people don’t have access to reliable supplies of clean water. Data-led approaches have real potential to unpick the complex interactions affecting the availability and quality of water resources for people and industries globally. To achieve this, there is a need to access and bring together data from a range of disciplines, including genomic data from the life sciences, ecosystems data from a variety of environmental science sub-disciplines, as well as economic, social and health-related datasets. The way researchers integrate this data is currently limited by the unique set of standards and interfaces that each domain uses for data access and storage – hampering what science can achieve.
Moving beyond FAIR
This systemic approach is central to how we carry out research at the UK Centre for Ecology & Hydrology. Rather than simply working to understand soils or water in isolation, we harness data and digital technologies to understand whole ecosystems and their interactions across the planet and its populations. For us to develop holistic solutions for the world, we must be able to harmonise and integrate data across different domains. Extending the scope of FAIR offers a route to achieve this:
- Findable – Discoverable is better than findable: Discoverable data goes beyond the ability to locate and access a specific dataset you already know about; discoverable data can also be found serendipitously. For example, when data is easier to discover, a search for the dataset you know you want has the potential to unearth useful data you didn't even know existed, such as data about the river catchment you are studying. This discovered data can also provide an AI engine with contextual information that enriches research.
- Accessible – True accessibility for all: Currently, accessible simply means data can be accessed. A broader picture of accessibility would take an inclusive, cross-domain approach. Data would not just be findable and downloadable, but readily accessible to all through a variety of mechanisms; for instance, via applications and workflows that automatically discover, retrieve and process relevant data sources, so that researchers don't have to search for them manually.
- Interoperable – Striving for interoperability across domains: Interoperability needs to harmonise data use across domains, so open data from different disciplines can come together to deliver powerful new insights. Standardising metadata descriptions, for example, is one important step to achieve this. More generally, there is a need to work on common standards and interfaces wherever possible, together with mechanisms to translate between domains where differences inevitably exist.
- Reusable – Building a culture of reusability: We need to move from periodically reusing data to building a culture where data reuse is the norm. Going beyond this, we need to consider the reuse of a broader range of digital assets, including models and methods. For example, embracing an ethos focused on reuse and exchange would help make data analysis and modelling, which come at a high energy cost, more sustainable. It would also reduce the need to repeat experiments, limiting the environmental impact of research.
Extending the FAIR Principles in this way would encourage a more open economy of science, where expertise and knowledge are valued above data. This ethos has already taken hold in the open-source software movement, where developers readily invest their time and money, yet give their software (their equivalent of data) away. There, value lies in expertise and knowledge, not in raw materials. Contrast this with the environmental sciences, where many see their data as intellectual property worth holding on to – a culture we must move beyond.
Overcoming barriers
Culture change doesn’t happen overnight. Researchers, funders and institutions each have a role to play in coordinating a cross-disciplinary effort that establishes common standards, interfaces and vocabularies for sharing digital assets including data.
Good practices exist. One example is the Australian Research Data Commons. By bringing together thematic communities – people; planet; and humanities, arts and social sciences – they've developed a set of common standards that emphasise interoperability and the translation of data between domains, all underpinned by a cloud-based infrastructure. The goal is to produce a national knowledge infrastructure that gives Australia's researchers a competitive advantage.
In the UK, Health Data Research UK is working towards an open-first and reuse-based approach that brings together multi-modal data across the health research community. Similarly, the UKRI-funded BioFAIR is developing a BioCommons infrastructure for UK life sciences researchers, with shared commons and services to facilitate AI-readiness and improve open science practices. The UK Data Service takes a similar approach for economic, population and social research data.
While the UK’s approach is more siloed, we have the building blocks to build on FAIR and embrace a more inclusive, discoverable, interoperable and reuse-centred culture of data sharing.
From addressing climate change to ensuring food security, the world’s grand challenges demand a more united, data-driven response that rises above disciplinary silos and individual priorities. This would ensure the future of science is not only FAIR but truly fit for purpose.
The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.
Image credit: robuart on Shutterstock.