
Andy Tattersall

June 10th, 2024

How to translate academic writing into podcasts using generative AI


Estimated reading time: 6 minutes


One of the benefits of generative AI is the ability to transform one medium into another: text to speech, to imagery, to video. In this post Andy Tattersall explores one aspect of this ability by transforming his archive of written blogposts into a podcast format, Talking Threads, and discusses why and how this could be beneficial for research communication.


Over the past decade I have written multiple articles for the LSE Impact Blog and other platforms. I believe in blogging as a vital medium to share new research and ideas. As standalone pieces, or companions to longer formal outputs, blogposts play an important role in the information landscape.

However, there are inherent limits to text and the development of new generative AI tools has made switching from text to speech easier. I therefore wanted to see whether I could combine blogging with another favourite activity of mine, podcasting.

My plan was to breathe new life into this archive of written articles, many of which remain relevant and well read. Of course, I could record these articles as podcasts using my own voice (I have the skills and technology), but I wanted to explore whether AI could create more streamlined and accessible ways of podcasting for academics.

Text to speech technology is progressing rapidly and there are a growing number of tools that can help with accessibility. Audio versions are nothing new: audiobooks are well established, and audio versions of research papers are something publishers have experimented with for some time, although usually with mixed results. Podcasts are still regarded by many in academia as a new way to disseminate ideas and research, but they too have been around for the last two decades. According to Statista, podcast listenership in the UK is continually growing, with an estimated 21.2 million podcast listeners in 2022. Why not then (to coin a new word) blogcasts?

Text to speech technology is progressing rapidly and there are a growing number of tools that can help with accessibility.

Like most new digital technologies, the slow adoption of podcasts in academia is largely due to barriers relating to time, finances, confidence and knowledge. However, with the right support and training, anyone can make and share a podcast. This could be a rough and ready lo-fi podcast, although even this requires planning, hosting and editing skills. For many, anxiety around recording (or even hearing) their own voice can be a real barrier. Ultimately, content has to be worthy of a listener’s time; it might not be BBC quality, but the better the presentation and sound quality, the more likely listeners are to engage.

Hence my project. After seeking permission from this blog and The Conversation to republish my articles in a new format, I set up a new podcast account on Spotify called Talking Threads. Whilst there are many available options in the market, I chose a tool called Augie, which largely focuses on creating video and animations using AI. It has a feature that allows you to add text which can be used to generate a video with an AI voice narrating it with your words. I used it primarily to export an audio file, as the AI’s choice of images remains well wide of the mark for such niche topics.

I provided an intro to the podcast using my own voice and then reformatted all of my old articles so that Augie had a better chance of reading the text correctly. It was surprisingly good at pronouncing names, but struggled with compound words such as paywall, which I changed to pay wall. The same applied to acronyms such as API and DOI, which I changed to A P I and D O I. It was interesting hearing what I had written read back to me so clearly, but it also highlighted a few occasions where I needed to modify the text to sound better. Once I had ironed out any issues and had a full read-through of the modified article, the process was fairly simple.
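For anyone preparing a larger archive, this kind of pronunciation clean-up can be scripted rather than done by hand. The sketch below is a minimal, hypothetical example based on the substitutions described above (pay wall, A P I, D O I); the word lists are assumptions you would extend for your own articles, and it is not part of Augie itself.

```python
import re

# Hypothetical substitution lists, based on the examples in this post.
# Extend these for the compound words and acronyms in your own writing.
COMPOUND_WORDS = {"paywall": "pay wall"}
ACRONYMS = ["API", "DOI"]

def prepare_for_tts(text: str) -> str:
    """Rewrite text so a text-to-speech voice is more likely to read it correctly."""
    # Split compound words the voice tends to mangle.
    for word, spoken in COMPOUND_WORDS.items():
        text = re.sub(rf"\b{word}\b", spoken, text, flags=re.IGNORECASE)
    # Space out acronyms ("API" -> "A P I") so they are spelled letter by letter.
    for acronym in ACRONYMS:
        text = re.sub(rf"\b{acronym}\b", " ".join(acronym), text)
    return text

print(prepare_for_tts("The API returns a DOI behind a paywall."))
```

You would run each article through a function like this before pasting it into the text-to-speech tool, then listen back and add any further problem words to the lists.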

The brief guide by Augie explains how to add text to generate your audio. The steps are as follows:

1. Select ‘create’ and then choose ‘I have text that I want to turn into a video’.
2. Paste in your script, choose a voice and preview it.
3. Once you are happy with the recording, click the three dots on the play/preview button and select download. This downloads your audio as an mp3.

One consideration I had to make was the choice of voice. Augie has a few dozen voices to choose from. The majority of these are American; as with AI image generators, there appears to be a bias towards U.S. outputs. I decided, as a white English male, that I would choose white English male voices for the podcasts, as I initially felt they should represent the author. Then it struck me that audiobooks may not be read by someone with the same background as the author. So I experimented with creating a couple with American and Australian accents and one with a female voice. Whilst this may raise other issues around representation, if I were to do this again from scratch, I would certainly consider increasing the variety of voices in the mix.

My approach to technology adoption is driven by two factors. Firstly, I adopt technology for pedagogical reasons through the lens of research communication. Secondly, to explore the novelty of new technologies and their possibilities. Podcasts can take an idea, a theory, something written, and make it more accessible. This naturally assists people who have sight impairments or disabilities, but it also includes people who simply want a break from a computer screen. Podcasts are also portable, offering relief from the onslaught of the written word in the work setting. While a journal paper or book demands your full attention, audio also provides a more ambient way of engaging with ideas, as people commute or go about doing other things.

While a journal paper or book demands your full attention, audio also provides a more ambient way of engaging with ideas, as people commute or go about doing other things.

The other factor, novelty, by no means underplays the first reason. As the communication theorist Marshall McLuhan championed, ‘the medium is the message’, and exploring AI and audio may be a catalyst to bring attention to your work. Of course, like podcasts themselves, these tools are not for everyone, and ideally you would record your own podcast using your own voice. This was certainly an option for me, but it was refreshing to use a variety of voices. Their quality and tone are professional; they were able to pick up irony and apply pauses in the right places. Reading one’s own text aloud well requires practice, otherwise it can sound just as wooden and boring as a generated voice. We have all experienced presentations where the speaker read pages of text verbatim to an audience and, through the wrong tone, lost their attention after the first few minutes.

The AI did not notably suffer from the issue of sounding bored of its own voice. If you want others to take an interest in your ideas and research, then it is key that the narrator sounds engaged with the text they are reading. AI-produced podcasts open up other future possibilities, such as translated-language versions (as long as they are properly reviewed). AI podcasts may not be for everyone, but they do offer a solution for individuals and groups who do not have access to recording equipment or lack confidence. And, just in case you were wondering, this blogpost was 100% authored by myself.

 


You can listen to Andy’s blogcast, Talking Threads – Where AI Meets Impact by following the link. You can also find all of Andy’s LSE Impact blogposts here

The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.

Image Credit: Google DeepMind via Unsplash.



About the author

Andy Tattersall

Andy Tattersall is an Information Specialist at The School of Health and Related Research (ScHARR) and writes, teaches and gives talks about digital academia, technology, scholarly communications, open research, web and information science, apps, altmetrics, and social media, in particular their applications for research, teaching, learning, knowledge management and collaboration. Andy received a Senate Award from The University of Sheffield for his pioneering work on MOOCs in 2013 and is a Senior Fellow of the Higher Education Academy. He is also Chair of the Chartered Institute of Library and Information Professionals’ Multi Media and Information Technology Committee. Andy was listed as one of Jisc’s Top Ten Social Media Superstars for 2017 in Higher Education. He has edited a book on altmetrics for Facet Publishing aimed at researchers and librarians. He tweets @Andy_Tattersall and his ORCID ID is 0000-0002-2842-9576.

Posted In: Academic communication | AI Data and Society | Featured | Podcasts and Research
