This article is by Marco Lehner, Product Developer at BR AI + Automation Lab. It recaps some of the lessons learned by the BR team while working on Science in Context, a project developed as part of the 2021 JournalismAI Collab Challenges, in partnership with the Science Media Center Germany.
Public perceptions of journalistic articles written by computers range from horror scenarios to the promise of universal happiness. At BR’s AI + Automation Lab, we work with various text-generation technologies, including neural language models such as GPT-3 and other approaches based on artificial intelligence (AI). But which tools are best suited to journalism? Can AI really write like a human?
When the Guardian published an article written by a computer in September 2020, the authors provocatively asked in the headline: “Are you scared yet, human?” – cleverly alluding to journalists’ fears of eventually being replaced by AI.
The Guardian used the GPT-3 language model to generate the article. Language models are mathematical models designed to represent natural language. To do this, they are trained on large amounts of text, from which they learn how likely a word is to follow a given sequence of words.
Where are all the machine-written texts?
GPT-3’s journalistic career has been pretty stagnant since then. At the Lab, we have only noticed one book review in the New York Times that was written in part by GPT-3. Even the article in the Guardian was not written by a computer alone: “GPT-3 produced eight different outputs, or essays. Each was unique, interesting and advanced a different argument. The Guardian could have just run one of the essays in its entirety. However, we chose instead to pick the best parts of each, in order to capture the different styles and registers of the AI.”
This means a human editor helped GPT-3 arrange the text. Simpler machine-generated texts, however, have been published for years, including at BR. They are produced by data- and rule-based processes, which makes them predictable and reliable.
GPT-3 and the facts: It’s complicated
In our experiment as part of the 2021 JournalismAI Collab Challenges, we created fact boxes for climate change stories using AI and automation techniques.
In doing so, we put GPT-3’s factual accuracy to the test. Our system was asked to automatically create a fact box with the most important background information for a given article. The content for the fact box came from a database built up by the Science Media Center Germany, and GPT-3 was then instructed to turn the individual statements from the database into fluent text.
In the next step, we asked our colleagues responsible for writing fact boxes to review these texts. In the very first example, a fictitious number appeared in the generated text. After this blatant error, our editorial colleagues were clear: every single statement in the text must be fact-checked.
Up to this point, this is not unusual, because texts by human writers are checked as well. Human writers, however, disclose their sources. GPT-3 does not, which is why fact-checking the generated texts takes an incredibly long time: when the text mentions a very specific number, that exact number has to be confirmed or refuted.
A number made up by the language model is confusing and hard to spot for human editors. The conclusion of our colleagues was that it is easier and faster to research and write the whole text yourself instead of dissecting the generated text.
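One partial aid, offered here purely as an illustration and not as something the BR team built, is to flag every number in a generated text that does not occur anywhere in the input statements. The sketch assumes plain-text statements and simple digit patterns; the helper names `extract_numbers` and `unsupported_numbers` are hypothetical, and the example sentences are invented. It only catches invented figures, not distorted wording, so it cannot replace the full fact-check our colleagues demanded.

```python
import re

# Integers or decimals; both "." and "," are accepted as the decimal
# separator because German sources typically write "1,5".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Return every number in the text, normalised to use '.' as separator."""
    return {match.replace(",", ".") for match in NUMBER_PATTERN.findall(text)}


def unsupported_numbers(source_statements: list[str], generated_text: str) -> set[str]:
    """Return numbers in the generated text that none of the source statements contain."""
    source_numbers: set[str] = set()
    for statement in source_statements:
        source_numbers |= extract_numbers(statement)
    return extract_numbers(generated_text) - source_numbers


# Invented example: the "model" added "2.7" and "2100" on its own.
statements = ["Warming has reached about 1.1 degrees."]
generated = "Warming has reached about 1.1 degrees and may hit 2.7 by 2100."
print(sorted(unsupported_numbers(statements, generated)))  # ['2.7', '2100']
```

A thousands separator such as "1,000" would be misread as a decimal by this pattern, so a production check would need locale-aware number parsing.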
Using Language AI responsibly
This example illustrates an important principle for the use of AI, as laid out in the BR AI + Automation Lab’s ethical guidelines:
Therefore, we ask ourselves every time before employing new technology: Does this really offer a tangible benefit to our users and employees at BR? The work of our journalists is and will be irreplaceable. Working with new technologies will augment their invaluable contribution and introduce new activities and roles to the newsroom.
AI is not an end in itself; it should help us do better journalism. In our experiment, however, GPT-3 did not simplify the workflow at all. In fact, it made it more complicated. We therefore ruled out GPT-3 for this particular use case.
However, we have learned a lot about setting up hybrid workflows between humans and AI technologies. We will continue to explore the idea behind the automated fact boxes with the editors from the archive.
Three Types of Text Automation
It was important to us during the JournalismAI Collab Challenges to try out different approaches and to reflect publicly on the results. We would therefore like to take a closer look at what we currently consider the three most important approaches to text generation:
- Clozes
- Rule-based texts
- Language models
In each of the automation variants mentioned, the experience of journalists plays a key role. In the following paragraphs we dive deeper into the different kinds of text automation.
Clozes with data
The simplest variant of text automation is the cloze. We use this in the Lab for the automation of our Corona newsletter.
The cloze is written by journalists as if they were writing a normal article. The journalists know the dataset behind the article. If the data changes, the text adapts to it as the journalists intended.
In the cloze, individual words can be swapped when incidence rates rise or fall, but the structure of the text stays the same. If the sentence structure needs to change, a human must rewrite the sentence and pass it to the system as an alternative for clearly defined cases.
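A minimal sketch of the word-swapping idea, assuming a single sentence about the seven-day incidence; the function name and the wording are hypothetical and not taken from BR’s newsletter. The journalist fixes the sentence and the allowed phrases in advance, and the data only decides which phrase is used.

```python
def fill_cloze(region: str, incidence_today: float, incidence_last_week: float) -> str:
    """Fill a fixed, journalist-written sentence with data (hypothetical wording)."""
    # Only this phrase changes with the data; the sentence structure is fixed.
    if incidence_today > incidence_last_week:
        trend = "has risen to"
    elif incidence_today < incidence_last_week:
        trend = "has fallen to"
    else:
        trend = "has stayed at"
    # The wording deliberately says "reported cases", never just "cases".
    return (
        f"The seven-day incidence of reported cases in {region} "
        f"{trend} {incidence_today:.1f} per 100,000 inhabitants."
    )


print(fill_cloze("Munich", 120.0, 100.0))
# The seven-day incidence of reported cases in Munich has risen to 120.0 per 100,000 inhabitants.
```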
The strength of this form of automation is that journalists can precisely control the wording. This is especially important for the Corona newsletter: reported cases are just that – cases that have been reported and nothing else. When less data-savvy writers vary the wording, errors can creep in; a machine that fills in a carefully worded cloze eliminates this risk.
The cloze is reviewed thoroughly once before it is published. Because the text can only take a finite number of variants, we can run through every one of them and check it. After that, the newsletter can be published automatically, because we know it will be correct for the given data.
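Because every slot in a cloze has a fixed list of allowed phrases, the full set of variants is simply the Cartesian product of those lists. The sketch below uses a hypothetical two-slot template (not BR’s actual newsletter text) and lists every variant so that an editor can review each one once; the number itself is left as a placeholder.

```python
from itertools import product

# Hypothetical cloze: each slot lists every phrase the journalists allowed.
SLOTS = {
    "trend": ["has risen to", "has fallen to", "has stayed at"],
    "admissions": [
        "more COVID-19 admissions than",
        "fewer COVID-19 admissions than",
        "as many COVID-19 admissions as",
    ],
}
TEMPLATE = (
    "The seven-day incidence of reported cases {trend} {value}. "
    "Hospitals report {admissions} last week."
)


def all_variants(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Return every text the cloze can produce, with the value left as a placeholder."""
    names = list(slots)
    variants = []
    for choice in product(*(slots[name] for name in names)):
        filled = dict(zip(names, choice))
        variants.append(template.format(value="<value>", **filled))
    return variants


if __name__ == "__main__":
    for text in all_variants(TEMPLATE, SLOTS):
        print(text)  # 3 x 3 = 9 variants to review
```

The number of variants multiplies with every slot, which is one reason to keep a cloze small enough to review by hand.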
This automated newsletter is by far our most popular piece internally. It compiles and calculates all COVID-related numbers and indicators that are relevant for BR’s coverage of the state of Bavaria. Regional studios, radio hosts and local journalists no longer have to dig up relevant data for their area on their own, so they can spend the time they save on their actual reporting.
Rule-Based Texts with Grammar
When automating basketball coverage, we have other challenges. Here, we don’t want to publish two identical texts. Therefore, we work with a framework of rules that dictates how the individual parts of a sentence are strung together, but not in detail how each sentence should look. The basis for this is still a table of data.
This way, it is easier to change the sentences. Since the program views each sentence as a pattern of rules, it can rearrange sentences by itself if we want it to. Unlike the cloze, we do not have to rewrite the sentences for this.
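The following is a toy illustration of rule-based generation, not BR’s basketball system: rules pick a verb class from the data (here, the winning margin) and a sentence pattern that orders the parts, and a random generator varies the realisation so that two reports rarely read identically. The margin thresholds, verbs, team names and score are invented for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_points: int
    away_points: int


# Verb choices depend on the winning margin (thresholds are assumptions).
VERBS = {
    "blowout": ["crushed", "routed"],
    "close": ["edged", "narrowly beat"],
    "regular": ["beat", "defeated"],
}

# Each pattern orders the same parts differently.
PATTERNS = [
    "{winner} {verb} {loser} {score}.",
    "Final score {score}: {winner} {verb} {loser}.",
]


def describe_result(game: Game, rng: random.Random) -> str:
    """Build one sentence from rules and data; basketball games cannot end in a tie."""
    if game.home_points > game.away_points:
        winner, loser = game.home, game.away
    else:
        winner, loser = game.away, game.home
    high = max(game.home_points, game.away_points)
    low = min(game.home_points, game.away_points)
    margin = high - low
    if margin >= 20:
        category = "blowout"
    elif margin <= 5:
        category = "close"
    else:
        category = "regular"
    return rng.choice(PATTERNS).format(
        winner=winner,
        loser=loser,
        verb=rng.choice(VERBS[category]),
        score=f"{high}-{low}",
    )


game = Game("Munich", "Ulm", 88, 80)
print(describe_result(game, random.Random()))
```

Because the rules work on the parts of a sentence rather than on finished sentences, adding a third pattern would reorder the reports without rewriting any text.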
Our system writes each newly generated text into the CMS as a draft. The sports editors receive a notification so that they can check the text and approve or adapt it. The human touch is important in this workflow: editors can add important background information or out-of-the-ordinary events, especially content that is not represented in the match data.
Language Models and Stochastic Parrots
Language models generate text without a journalist defining the structure or content beforehand. For text generation, we experimented primarily with GPT-3, so we describe our experience with this model. However, we believe the findings can be applied to many other models.
With language models, the structure cannot be specified, and this causes a major problem that we encountered in one of our experiments. As described earlier, numbers suddenly appeared in a fact box on global warming that were not contained in the scientific input from the Science Media Center’s database.
Even after extensive research, we could not find any evidence for these numbers – the model had simply made them up. How could this happen?
To write a fact box, we give the model a few examples of what we expect. Each consists of a sentence like “I am a science journalist and would like to write an article from these sentences:”, followed by some key statements. Then we show the model an example of what the resulting fact box could look like.
The last example is our actual input. Here we don’t show GPT-3 a fact box, but let it generate one for us, as shown in the screenshot below. The bold text is our input, the light text was generated by GPT-3 itself.
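A minimal sketch of how such a few-shot prompt could be assembled as plain text. The introductory sentence is the one quoted above; the bullet format, the “Fact box:” label and the function name are assumptions, and the call that would send the prompt to GPT-3 is deliberately left out. The prompt ends with an open “Fact box:” so that the model continues from there.

```python
INSTRUCTION = (
    "I am a science journalist and would like to write an article "
    "from these sentences:"
)


def build_fact_box_prompt(
    examples: list[tuple[list[str], str]], key_statements: list[str]
) -> str:
    """Assemble a few-shot prompt: worked examples first, then the open task."""
    parts: list[str] = []
    for statements, fact_box in examples:
        parts.append(INSTRUCTION)
        parts.extend(f"- {statement}" for statement in statements)
        parts.append(f"Fact box: {fact_box}")
        parts.append("")  # blank line between examples
    # The actual input: same structure, but the fact box is left for the model.
    parts.append(INSTRUCTION)
    parts.extend(f"- {statement}" for statement in key_statements)
    parts.append("Fact box:")
    return "\n".join(parts)


prompt = build_fact_box_prompt(
    [(["Statement A.", "Statement B."], "Example fact box text.")],
    ["Statement C."],
)
print(prompt)
```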
The examples make it more likely that the model will actually produce a fact box, because it has already seen the introductory sentence and matching examples in the prompt. But in this rather scientific context, numbers come up frequently in the training data, so GPT-3 is tempted to supply them, even if the input does not include any.
In one paper, large language models were somewhat disparagingly called “stochastic parrots”, because they do not understand the text they write. Instead, they parrot what they have seen before, producing whatever they estimate to be the most likely output at that moment.
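The parroting effect can be illustrated with a deliberately tiny stand-in: a bigram model that only counts which word followed which in its training sentences and always picks the most frequent continuation. GPT-3 is a neural network working on sub-word tokens and usually samples rather than always taking the top choice, so this is an analogy, not a description of GPT-3. The three training sentences are invented.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words followed it in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def parrot(counts: dict[str, Counter], start: str, max_words: int = 10) -> str:
    """Greedily append the most frequent follower until none is known."""
    words = [start]
    while len(words) < max_words and counts.get(words[-1]):
        next_word, _ = counts[words[-1]].most_common(1)[0]
        words.append(next_word)
    return " ".join(words)


corpus = [
    "temperatures rose by 2 degrees",
    "temperatures rose by 2 degrees in summer",
    "temperatures rose by 3 degrees",
]
model = train_bigrams(corpus)
print(parrot(model, "temperatures"))  # temperatures rose by 2 degrees in summer
```

The “2” appears only because it was the most frequent continuation in the training text, not because any input asked for it.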
By giving examples to GPT-3, we increase the probability of a desired output. However, since the model does not understand what the input means, we can never be sure that the output is actually what we need it to be.
Therefore, each text of a language model must be editorially approved. It is true that the texts are more stylistically and linguistically diverse than texts from the more rigid systems. However, compared to the other two methods, the review process is quite time-consuming and, from our perspective, only worthwhile in rare circumstances.
Our approach: High control through hybrid workflows
When we use artificial intelligence at BR, we always first ask ourselves: “Which technology is the best match for the problem we are trying to solve for our users?”
In text generation, we try to find a balance between stylistic variance and controllability, and then select the appropriate technology. The next step is then to work with the users to find the right workflow and define responsibilities and approval processes.
The Corona newsletter with the most important figures on the pandemic, for example, can be published automatically every day, because the incoming data from the Robert Koch Institute (the German counterpart of the CDC) mostly arrives in a recurring format.
For the automatically generated articles on basketball games, we use data from the German sports publisher Deutscher Sportverlag. However, the texts can only contain what is represented in the data and was taken into account when the set of rules was established.
If something extraordinary happens, such as a paraglider flying into the stadium, as happened during a game at the European Football Championship, it will not be in the data, even though it might be the most remarkable event of the game. In such cases, people have to adapt the text before publication. That is why the articles arrive in the CMS only as drafts and are checked, supplemented and then published by the sports editorial team.
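The approval rules described in this section can be summarised as a small routing function. This is a hypothetical sketch of the policy as the article describes it, not BR’s actual code: cloze texts are published automatically only when the incoming data has the expected format, rule-based reports go into the CMS as drafts, and language-model output requires every statement to be fact-checked. Sending a cloze to the draft queue when the data looks unusual is an added assumption.

```python
from enum import Enum


class Technique(Enum):
    CLOZE = "cloze"
    RULE_BASED = "rule-based"
    LANGUAGE_MODEL = "language model"


class Action(Enum):
    PUBLISH = "publish automatically"
    DRAFT = "save as CMS draft and notify editors"
    FACT_CHECK = "fact-check every statement before approval"


def route(technique: Technique, data_in_expected_format: bool) -> Action:
    """Decide how much human review a generated text needs (hypothetical policy)."""
    if technique is Technique.LANGUAGE_MODEL:
        return Action.FACT_CHECK
    if technique is Technique.CLOZE and data_in_expected_format:
        return Action.PUBLISH
    # Rule-based reports, and clozes fed with unusual data, wait for an editor.
    return Action.DRAFT


print(route(Technique.RULE_BASED, True).value)  # save as CMS draft and notify editors
```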
Does AI write better than journalists?
When it comes to suitable forms of presentation, such as the match report, machine-generated texts can hardly be distinguished from human-written texts. But humans remain indispensable for the design, control and approval of content.
A computer cannot take editorial responsibility. It lacks an understanding of truth and has no knowledge of the world around it. World knowledge is necessary, however, to judge whether an article reflects reality.
With well-crafted approval workflows, we can use AI to make our journalistic work easier. This leaves us more time for creative tasks and research, which is also a lot more fun than typing out rows of numbers from tables.
The BR AI + Automation Lab works at the intersection of journalism, computer science and product development for German public broadcaster Bayerischer Rundfunk. Our mission is to make the benefits of AI and automation accessible to journalism. In doing so, we pay particular attention to the ethical dimension of new technologies and that our solutions bring added value to our users and coworkers. Our journalistic focus is on algorithmic accountability reporting.
This article was originally published in German on Medium, with the title: Sind Künstliche Intelligenzen die besseren Journalist:innen?
Explore all the projects of the 2021 JournalismAI Collab Challenges on the website and sign up for the JournalismAI newsletter to stay up-to-date about our activities in 2022.
JournalismAI is a project of Polis, supported by the Google News Initiative.
Header Image: Alina Constantin / Better Images of AI / Handmade A.I / CC-BY 4.0