Emilia Díaz-Struck is research editor and Latin American coordinator for the International Consortium of Investigative Journalists (ICIJ). She oversees data projects and has been involved in major cross-border investigations including the Panama Papers, the Paradise Papers and the Offshore Leaks. The ICIJ receives vast amounts of files from whistleblowers and uses AI-powered technologies to sift through that data more efficiently. For our interview series with women working at the intersection of AI and journalism, Emilia spoke to us about how exactly AI is deployed and what impact it will have on investigative journalism.
JournalismAI: You have a very diverse background in journalism, having worked with major organisations such as The Washington Post, the Press and Society Institute of Venezuela and co-founding your own news site Armando.info. How did you initially move into a data-driven role?
Emilia: When I started working as a journalist, I was not involved in AI projects but it’s interesting how one thing leads to another. I was based in Caracas and I had already started working with data, not because it was fashionable but out of necessity: it was a way to bring in more transparency and to ‘bulletproof’ stories.
That’s when I started collaborating with colleagues in other countries around the world. The first ICIJ report that involved nearly 100 journalists working together was the Offshore Leaks. The use of technology was already essential to connect people and information but things have evolved since then and AI has become a powerful tool for investigative journalism. I have seen the size of the files, records and data we explore grow significantly, and that’s tied to the evolution of technology and to how journalists can now receive information from sources and whistleblowers. From the Offshore Leaks to the Panama Papers – where we were working on millions of records – we were using technology more and more often to mine data, explore it, and share it with other colleagues. Our approaches to the data have developed as a result of us thinking: “How can we best respond to our journalistic needs? What are the best ways in which data analysis and technology can help us? Is there more we can do with it?”
How have AI and machine learning been integrated in ICIJ’s uncovering and reporting of such significant stories?
I remember that it was an ICIJ data analyst from Costa Rica, Rigoberto Carvajal, who first suggested that we try to use machine learning. We thought, why not, and decided to figure out a use case to experiment with. We ended up using it to identify loan agreements in the 13.4 million records of the Paradise Papers. Machine learning helped us identify a specific type of document that was useful for following the money: loan agreements tied to a big corporation. We also used machine learning on the Implant Files to identify, in the reports sent to the U.S. Food and Drug Administration, patient deaths potentially caused by faulty medical devices that had been misclassified as malfunctions or injuries by the device manufacturers and others filing the information.
During a fellowship at Stanford, our then director of strategic initiatives, Marina Walker, also brought new machine learning partnerships to ICIJ so we could continue developing our expertise in the area and see how AI can help journalists investigate corruption and money laundering, among other topics.
What we have figured out is that AI can be really powerful but it’s not magic. I see there is a lot of potential to use machine learning in the type of work we do because we deal with vast amounts of data. For these kinds of investigations, it would take years to manually go through and screen millions of records and make sense of them. We have all the reporting to do as well: talking to sources, cross-checking the data with public records and so on. Machine learning can help us find a needle in a haystack, help us be more efficient and help journalists figure out if we are missing connections that could actually help with our reporting.
The other thing that is interesting about machine learning is that it’s actually a team effort so it responds to the way we work – you train a computer, you build a model and you teach the computer to identify things, but the human factor is key. You need humans to give the input to the computer, to check if it’s getting things right and to check if the model can be improved.
Thinking about recent events, do you see potential to deploy AI-powered tools to improve reporting of emergencies like the COVID-19 pandemic?
Yes, I do think there is potential for that, although the key is also having the time and resources, as covering the emergency can already be quite overwhelming. Machine learning has already been used in the medical field to analyse medical imaging, for example. For the reporting and research, we will need to see what kind of data is being gathered at the moment, what data is made available, the quality of it and for which cases machine learning would be a good approach. We should be careful because there are already challenges brought by the variations and differences in the quality and availability of data across countries.
With machine learning, there are different models, including classification models and predictive regression models. In our investigations we have used mostly the classification type. I think there’s the potential to use predictive models in successful ways in the context of COVID-19 coverage, as long as a thorough fact-checking process to analyse the results of the model is always in place.
The ICIJ has been among the early adopters of AI and machine learning in journalism, while other media organisations seem to be less receptive. Why do you think that is?
I think that the case of ICIJ is special in a sense, because exploring the potential of AI has been a response to our own needs. Not everyone deals with millions of records every day. We work on large collaborations and we dedicate a lot of time to our investigations so experimenting with AI has been the outcome of us asking ourselves: How can we be more efficient? Are there any new things in the data world that can help us? Are there new approaches that we can apply and implement?
What we try to do is to find the best ways to meet the journalistic needs that we have in each project and this doesn’t mean we only use AI. We also use other data analysis approaches while we’re working on an investigation. The key is to recognise when what we have in front of us is a problem machine learning can help us solve. I think other journalists will learn the same way; it will come with time.
You mentioned previously how important the human factor is in all the investigations you’ve conducted. Do you think that there might be a misconception that this human factor will eventually be made redundant by AI developments?
Yes, I do think there is a misconception, especially in the area of investigative journalism. Take the example of the Implant Files, where we analysed about 8 million health records to track the harm caused by medical devices that have been tested inadequately or not at all.
A source told one of our colleagues that patients’ deaths potentially caused by medical devices were underreported and so we questioned: How can we look into that? How can we read 8 million records? With the help of technology and Rigoberto Carvajal’s analysis, we were able to identify more than 3,400 ways in which death was reported in the data. After that, we did a full refining process where reporters, researchers and members of our team were providing input so that the computer was properly trained to identify deaths in the unstructured part of the data.
Then there was the stage of analysing the results. We wanted to know whether the medical device might have contributed to the death or not. For that we also used machine learning, but there were a number of false positives that, luckily, were identified by the researchers. The computer was able to recognise deaths in the documents but sometimes it was actually not the patient dying – it was a relative of the patient, or it was the device “dying”, as in: the device had expired. Those mistakes could only be caught by the human eye.
The fact-checking process involved a team of eleven people manually going through the results, to make sure that every case flagged by the algorithm was correctly identified. That allowed us to finally get to a precise number of cases in which a medical device was involved in the death of a patient and the event was not reported as such to the FDA. Would that have been possible without machine learning? No. Who would have read 8 million records? But the work of the journalists and researchers was equally essential to get to the results with the highest level of accuracy.
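Editor's illustration: the flag-then-review workflow described above can be sketched in a few lines of Python. This is a minimal sketch and not ICIJ's actual pipeline: a simple keyword rule stands in for the trained model, and the term list, field names and sample reports are all invented for illustration. It shows why the false positives Emilia mentions (a relative dying, a device "expiring") end up in the queue that humans have to check.

```python
import re

# Hypothetical phrases; the interview says the real analysis found
# more than 3,400 ways in which death was reported.
DEATH_TERMS = ["died", "passed away", "expired", "death"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, DEATH_TERMS)) + r")\b",
    re.IGNORECASE,
)


def flag_for_review(reports):
    """Return (id, matched term) for reports whose free text mentions a
    death term while the filed event type is something other than death."""
    flagged = []
    for report in reports:
        if report["event_type"].lower() == "death":
            continue  # already reported as a death, nothing to uncover
        match = PATTERN.search(report["text"])
        if match:
            flagged.append((report["id"], match.group(1).lower()))
    return flagged


# Invented sample records, for illustration only.
reports = [
    {"id": 1, "event_type": "malfunction",
     "text": "Patient died two days after the implant failed."},
    {"id": 2, "event_type": "injury",
     "text": "The device battery expired and was replaced."},
    {"id": 3, "event_type": "injury",
     "text": "Patient's mother passed away last year; patient reports pain."},
    {"id": 4, "event_type": "death",
     "text": "Patient death reported by the hospital."},
    {"id": 5, "event_type": "malfunction",
     "text": "Lead fractured; device replaced without incident."},
]

flagged = flag_for_review(reports)
for report_id, term in flagged:
    print(f"Report {report_id} flagged on '{term}': needs human review")
```

In this toy data, only report 1 is a real unreported patient death. Reports 2 and 3 are flagged too, which is exactly the kind of mistake that a reviewer, not the rule, has to catch.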
You spoke about the potential of AI and your hopes about how it can lead to more effective journalism. But do you think that AI might also be overhyped?
I think it’s important to recognise that AI is not the answer for everything. The key is to learn for which kind of problems we can use AI to help make our journalism better.
One thing we’re doing is integrating AI with other technologies. For example, we use a tool called Datashare, developed by our tech team, to mine and explore the documents we decide to investigate. In the case of the Luanda Leaks, in partnership with the Quartz AI Studio, we used machine learning to cluster records of similar types, which allowed us, for example, to group all the bills we found in the leaked documents, of which there were more than 700,000 records. We integrated those results into Datashare, so that even users who were not familiar with AI and data analysis techniques would have a filter they could click on to see all the bills, or any other cluster that was identified by the machine learning model.
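Editor's illustration: the idea of grouping similar records and exposing each group as a filter can be sketched as follows. This is a rough sketch under stated assumptions, not the method used by ICIJ or the Quartz AI Studio, and it does not use Datashare's API: it groups documents greedily by word overlap (Jaccard similarity), and the threshold, document IDs and texts are invented.

```python
import re


def tokens(text):
    """Lower-cased set of alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.4):
    """Greedy clustering: each document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each entry: {"tokens": set of words, "ids": [doc ids]}
    for doc_id, text in docs.items():
        doc_tokens = tokens(text)
        for group in clusters:
            if jaccard(doc_tokens, group["tokens"]) >= threshold:
                group["ids"].append(doc_id)
                break
        else:
            clusters.append({"tokens": doc_tokens, "ids": [doc_id]})
    return [group["ids"] for group in clusters]


# Invented sample documents, for illustration only.
docs = {
    "inv1": "Invoice number 1001 amount due total payable",
    "inv2": "Invoice number 1002 amount due total",
    "loan1": "Loan agreement between lender and borrower interest rate",
    "loan2": "Loan agreement lender borrower repayment interest",
    "email1": "Please call me tomorrow about the meeting",
}

clusters = cluster(docs)

# Each cluster becomes a named filter a reporter could click on.
filters = {f"cluster_{i}": ids for i, ids in enumerate(clusters)}
print(filters)
```

The point of the last step is the one made in the interview: once the groups exist, a reporter only needs the filter name, not any knowledge of how the grouping was computed.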
When you combine the best of traditional reporting with the powerful things that technology can do, you can get very powerful results. But it’s not for everything and that’s what we need to recognise.
AI and technology in general traditionally tend to be very male-dominated fields. Do you think you face any challenges in this particular area because of your gender?
Fortunately, our team is quite diverse and we have some truly great women who are experts in data and technology. But the gender imbalance in these fields is a fact. It’s a challenge and we need to recognise that there should be more space for women to work and lead at the intersection of data, tech and journalism. When you have great people with a diverse skill set and different backgrounds, then you have better journalism. That’s what I believe in and that’s what I promote in our teams. Our investigations involve journalists from across the world and I can say from experience that this diversity is a powerful combination, which makes for better and more impactful stories.
Global collaboration is clearly central to the work of ICIJ and to the success of your projects. Do you think there is a difference in the perception of AI and how it is used in different contexts and parts of the world?
We need to recognise that there are many challenges that journalism faces beyond technology. There are free press challenges across the world and what we have seen is that when journalists are targeted for their work, collaboration can make a difference. It can allow stories to be investigated in places where it wouldn’t be possible for a local journalist alone, because of censorship and other risks. There are also significant gaps in terms of access to technology and resources across regions that play a role in the ability to adopt AI and benefit from its potential. In our collaborations, we share the results of the work with our global network, so all journalists from all regions can benefit from the potential machine learning can bring to their work during a journalistic investigation.
What developments do you think we should keep an eye out for in the coming years in terms of the role AI might play in journalism?
I think that we will see more and more AI-powered tools that journalists will be able to use without needing coding skills. I would closely monitor any new tools that integrate AI to allow journalists to explore data more efficiently.
We will also see more interdisciplinary collaborations, like our partnership with the Stanford AI lab and Professor Chris Ré, as well as more collaborations between media organisations, as we have done with the Quartz AI Studio. The more organisations explore these kinds of collaborations, the sooner we will realise as an industry the value that collaboration can bring to help us make the most of the potential offered by AI technology.
The interview was conducted by Venuri Perera, Polis intern and LSE MSc student. It is part of a JournalismAI interview series with women working at the intersection of journalism and artificial intelligence.