
Rebecca Lawrence

Sabina Alam

December 15th, 2022

AI paper mills and image generation require a co-ordinated response from academic publishers



The role of AI in the production of research papers is rapidly moving from futuristic vision to everyday reality, a situation with significant consequences for research integrity and the detection of fraudulent research. Rebecca Lawrence and Sabina Alam argue that for publishers, collaboration and open research workflows are key to ensuring the reliability of the scholarly record.


The latest iteration of OpenAI’s Artificial Intelligence (AI) chatbot, ChatGPT, with its almost uncanny capability to write poetry and academic essays that are very difficult to distinguish from human writing, has recently caused a stir in the world of research, much as other ventures linked to Elon Musk have done. This raises the spectre of AI in the service of research fraud and a race to the bottom in research output and publication. As John Gapper warned in the Financial Times, “…if an unreliable linguistic mash-up is freely accessible, while original research is costly and laborious, the former will thrive”. Does a new age of desktop paper mills that are in easy reach of everyone anywhere present a real and present danger to research integrity?

In short, the risk is already with us. In May this year, data sleuth Elisabeth Bik tweeted about how image fraud was being boosted by AI, with Generative Adversarial Network (GAN) technology (in which two neural networks are pitted against each other to produce increasingly convincing synthetic data) capable of producing deepfakes in the biomedical literature. Ethics and integrity issues are growing exponentially across scholarly communication. F1000’s and Taylor & Francis’ figures tell a story that is reflected across academic publishing, with such cases representing 34% of ethics cases for F1000 and about 50% of T&F’s ethics cases. Other major issues include duplicate submissions, data integrity, citation manipulation and authorship integrity issues. As Sabina noted recently, the problem is significant, not just because of the volume and extent of the growth in the number of these issues, but also because there are different types of paper mills, and they are all highly adaptive.

Does a new age of desktop paper mills that are in easy reach of everyone anywhere present a real and present danger to research integrity?

Investigating these issues within a context of shifting sands poses many challenges. Nevertheless, publishers play a vital role in ensuring the legitimacy and integrity of what we publish and disseminate across the world. We invest in systems, safeguards and expertise to ensure due process has been applied to the scholarly content we publish. So, when this is manipulated and the integrity of the scholarly record is under threat, it’s vital we take all steps necessary to protect it. Technology is playing an ever more important role for publishers. The ability to detect research integrity and publishing ethics issues needs to be scalable, because some types of misconduct only become noticeable when patterns are detected across a number of different articles and datasets. This is a key area where developers, publishers and other scholarly organisations are collaborating and investing, not just financially, but with time and effort too.

The ability to detect research integrity and publishing ethics issues needs to be scalable, because some types of misconduct only become noticeable when patterns are detected across a number of different articles and datasets.

Systems, safeguards and expertise are just one part of the solution. At a recent Westminster Higher Education Forum, there was “wide agreement across the global research system” that open research is critical in reducing research waste and enabling scrutiny of data. We agree. Open data and materials make it harder to fabricate data and conclusions, and access to the underlying data by readers and AI means that issues are more likely to be noticed. As AI and automation, such as Automated Research Workflows, increasingly become an integral part of research (particularly in the analysis of big data), making that data open will significantly benefit the use of AI to interrogate data for fraud. Furthermore, enabling and encouraging the publication of a broad range of outputs including negative/null findings, protocols and incremental studies – a key element of the open research model – minimises publication and editorial bias and provides additional accessible data for AI tools aimed at combating research fraud.

Publishers themselves also need to be open to collaboration with stakeholders (including other publishers) across the research ecosystem to tackle the root causes, including a system of rewards and incentives that currently feeds, rather than deters, the use of paper mills. The STM Integrity Hub and its prototype paper mill detector show what can be achieved through cross-publisher collaboration. The use of automated, AI-driven processes that can spot duplicate submissions and other issues across publishers is a crucial development, given publishers’ often distinct submission and publication systems.


However, AI tools cannot do this alone and human judgement also plays a crucial role in safeguarding research integrity. From F1000’s experience of open research, we know that rigorous checks prior to publication by both AI and experienced experts are integral to maintaining research integrity in publications.

There is also a crucial need for more training and education for researchers in publishing ethics as well as research integrity. Many types of misconduct or bad practice are not deliberate, but rather the consequence of inconsistent quality of training. This includes training in both good research and publishing practices, including the roles and responsibilities of authors. It is also important for researchers to know what good peer review looks like, given that most peer review is still conducted anonymously: most researchers only see peer review reports on their own work (unless they also happen to be an editor of a journal). There is a key role for many of the stakeholders in the scholarly ecosystem to collaborate on this and to make such training open, so that researchers, wherever they are based, can access high-quality information.

There is already a rapidly accelerating race taking place between paper mills using ever more complex AI to produce fake papers at scale and publishers employing ever more sophisticated AI technology to detect issues. Ultimately, to paraphrase one of the best-known popular reflections on AI and humanity: the future of research is not set. There is no research integrity but what we make for ourselves.

 


The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.

Image Credit: Fritzchens Fritz, Better Images of AI, GPU shot etched 1, (CC-BY 4.0)



About the author

Rebecca Lawrence

Rebecca Lawrence is Managing Director of F1000. She was responsible for the launch of F1000Research in 2013. She was a member of the EC’s Open Science Policy Platform, and has co-chaired many working groups for the Research Data Alliance (RDA) and ORCID. She is a Steering Committee member of DORA (the San Francisco Declaration on Research Assessment) and has worked in STM publishing for over 20 years.

Sabina Alam

Dr Sabina Alam is the Director of Publishing Ethics and Integrity at Taylor & Francis Group. After a research career in neuroscience, Sabina moved into scientific and medical publishing in 2008, working on open access journals as well as a variety of peer review and publishing models at BMC journals (Springer Nature) and later at F1000Research. Sabina joined Taylor & Francis in 2018, leading the Publishing Ethics and Integrity team in providing investigative support and expert guidance on editorial policies, publishing ethics and research integrity. Sabina also serves on several cross-industry working groups on publishing ethics and research integrity issues, including paper mills.

Posted In: AI Data and Society | Featured | Research ethics
