The impact of generative AI on research is often presented as one of fundamentally changing knowledge production, through innovative research methods and writing. However, as Niki Scaplehorn and Henning Schoenenberger argue, generative AI tools could have a more transformative effect on how science is shared – for example, through the creation and publication of open datasets – and on how science is assessed.
The past decade has revealed how open information exchange can dramatically accelerate scientific progress. The explosion in the sharing of preliminary results, datasets, and protocols during the COVID-19 pandemic arguably hastened the development of vaccines, treatments and effective public health measures. It was a key moment for open science, highlighting in practical terms how access to a diverse range of research outputs, not just the final article, fuels breakthroughs.
Yet significant hurdles remain. Despite a proliferation of high-quality data repositories and an increasing number of funder and institutional mandates, many researchers still lack consistent guidance on how to share data in ways that add value – by aligning with FAIR (Findable, Accessible, Interoperable, and Reusable) standards. Moreover, the existing maze of overlapping sharing policies leaves authors unsure where, when, and in what format to deposit their research materials. Adding to these practical challenges are substantial cultural barriers. Data sharing, code publication, and detailed protocol documentation are yet to be fully recognised or rewarded in many academic circles.
A researcher-centric approach to AI
Emerging technologies (particularly those built on generative AI) could be part of the answer – helping resolve the bottlenecks that keep many results locked behind institutional walls.
AI is already reshaping the research ecosystem and has the potential to transform how we think about open science. Notably, it could help shift these practices from being another directive from funders or journals, to more of a carefully designed “product” that aligns with how researchers work. In turn, this requires shifting mindsets from top-down policy enforcement to a service-oriented approach that places researchers’ needs and goals at the centre.
In practice, adopting a product mindset begins with empathy for the realities of researchers’ day-to-day workflows. From data collection and experiment design to code development and protocol sharing, these steps are too often fragmented by time-consuming administrative tasks. When open science is framed as a ‘policy imperative’, it can feel like a burden to already overstretched scholars.
At Springer Nature, we recently conducted a pilot study simply requiring authors to explain why any unshared data hadn’t been deposited in a public repository before final acceptance. This request alone raised data-sharing compliance from 51% to 87% in participating journals. However, while such editorial engagement clearly works, scaling it across hundreds of titles poses a challenge if it depends solely on manual oversight.
This is where generative AI has promise. Automating metadata creation, flagging overlooked requirements, and suggesting best-practice workflows can all free researchers to focus on discovery rather than documentation. Crucially, these tools can connect researchers more directly with the benefits of openness (tracking citations of datasets, code usage, or protocol adoption) to better reflect their full range of contributions.
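To make this concrete, the kind of requirement-flagging described above can be sketched as a simple rule-based pass over a manuscript's data availability statement. This is a hypothetical illustration, not Springer Nature's actual tooling: the field names, repository list, and patterns below are assumptions chosen for the example.

```python
import re

# Hypothetical sketch: flag data-sharing requirements that a manuscript's
# data availability statement appears to miss. The checks below (repository
# name, DOI-like identifier, licence) are illustrative assumptions, not a
# real publisher's compliance rules.
CHECKS = {
    "repository": r"\b(Zenodo|Dryad|Figshare|GenBank|OSF)\b",
    "identifier": r"\b10\.\d{4,9}/\S+",      # a DOI-like pattern
    "licence":    r"\b(CC[- ]BY|CC0|MIT|public domain)\b",
}

def flag_missing_requirements(statement: str) -> list[str]:
    """Return the labels of checks the statement does not appear to satisfy."""
    return [field for field, pattern in CHECKS.items()
            if not re.search(pattern, statement, re.IGNORECASE)]

statement = "Data are available from Zenodo at https://doi.org/10.5281/zenodo.1234567."
print(flag_missing_requirements(statement))  # prints ['licence']
```

In practice an AI-assisted workflow would go well beyond pattern matching, but even a sketch like this shows how the routine part of editorial oversight – the kind that lifted compliance in the pilot – could be scaled without relying solely on manual checks.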
We are currently conducting a pilot with authors on a small number of our OA journals, to see whether generative AI can identify promising datasets buried in traditional articles and help transform them into data manuscripts. Crucially, authors can then review and edit these drafts, ensuring the final text accurately represents their work and meets community standards. This human-in-the-loop approach is vital to ensure the accuracy and integrity of the generated content.
Ultimately, a researcher-centric approach to AI makes generative tools part of the process (rather than the whole process) and encourages and supports detailed documentation and better data stewardship. Instead of viewing open science as ‘extra’, we can embed it into the infrastructure of academic work. This could ensure that openness, equity, and innovation become the norm rather than the exception. Further, once a dataset is shared, it becomes more likely that protocols, code, and supplementary assets will be similarly recognised.
Tools alone can’t deliver change
The process of research is currently grounded in a mutual exchange of trust: researchers receive support from institutions, funders, and society, and they share the outcomes of their work in return. Until now, this workflow has largely revolved around publishing research articles. However, in today’s interconnected and data-driven world, this model arguably no longer fully captures the multifaceted nature of modern research outputs. A framework that values only the final manuscript misses these broader contributions and stifles the culture of openness required to reap the benefits of collaboration, usability and re-usability.
As a sector, we could better recognise, enable and support all components of the research lifecycle, as could research institutions through academic promotions, grant decisions, and professional evaluations. In scholarly communication, changing incentives always feels over the horizon, but if we acknowledged researchers who share reproducible datasets, publish well-documented code, or refine and disseminate experimental methods, we could build a system where creating a high-quality dataset or widely used software tool earns credit on par with writing a journal article.
Technology remains crucial to this vision, providing the infrastructure for measuring and surfacing these contributions. As collaborative projects such as the DataCite initiative, and our own AI pilot work, demonstrate, AI can be deployed to detect references to specific datasets, code snippets, or methodological protocols in publications. This granular tracking and linking creates robust evidence for how shared research objects influence future work and broadens the scope of what can be cited, counted, or rewarded. While citation metrics have dominated the evaluation of research articles, AI-driven analytics might illuminate a broader range of markers for the value of underlying research components. Used carefully within university or funder evaluation systems, these metrics could help researchers who prioritise openness.
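The first step of such tracking – spotting candidate identifiers of shared research objects in publication text – can be illustrated with a few simplified patterns. Production systems (such as those built on DataCite metadata) rely on trained models and curated registries rather than regexes; the patterns, example identifiers, and repository URL below are assumptions for illustration only.

```python
import re

# Illustrative sketch of detecting mentions of shared research objects
# (datasets, code) in publication text. The patterns are deliberately
# simplified assumptions, not a real extraction pipeline.
PATTERNS = {
    "doi":     r"\b10\.\d{4,9}/[\w./-]*\w",       # dataset/article DOIs
    "genbank": r"\b[A-Z]{1,2}\d{5,8}\b",          # e.g. GenBank accessions
    "github":  r"github\.com/[\w.-]+/[\w.-]*\w",  # code repositories
}

def extract_research_objects(text: str) -> dict[str, list[str]]:
    """Collect candidate identifiers of shared research objects in a text."""
    return {kind: re.findall(pattern, text) for kind, pattern in PATTERNS.items()}

paragraph = ("Sequence data (GenBank MN908947) and analysis code "
             "(github.com/example/sarscov2-pipeline) were reused, alongside "
             "the dataset at 10.5061/dryad.abc123.")
found = extract_research_objects(paragraph)
# found["genbank"] → ["MN908947"]; found["doi"] → ["10.5061/dryad.abc123"]
```

Once such mentions are extracted and linked to their canonical records, reuse of a dataset or software tool becomes countable in the same way an article citation is – which is what makes the broader evaluation argument above practicable.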
Equally important is reducing the practical friction of sharing. AI-based platforms can guide researchers through publisher or funder requirements, auto-generate metadata, and highlight relevant repositories. This saves time while encouraging thorough documentation and broad accessibility – strengthening the transparency and trust at the heart of any social contract. In turn, once open science becomes less cumbersome and time consuming, it is more likely to be embraced by researchers.
Realising the benefits of openness
When designed around researchers’ real-world needs, AI systems and tools can streamline data-sharing requirements, automate the tedious parts of compliance, and elevate the visibility of otherwise hidden contributions. This lowers the barriers to openness, making it simpler and more appealing to deposit data, code, and protocols in ways that others can easily find and reuse.
Ultimately, this collaboration between human-centric policy and AI-driven facilitation benefits not only researchers, but everyone who relies on scientific progress. Policymakers and practitioners have more data at their fingertips, driving more informed decisions; funders can ensure that investments lead to broadly accessible resources; and the public gains greater transparency into the research it helps to underwrite. While current visions of AI research often seem to include researchers as an afterthought, by aligning technology and incentives under a broader view of scholarship, the research ecosystem can evolve into one that is open, equitable, and primed for new discoveries.
The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.
Image Credit: Sergey Nivens on Shutterstock.