The JournalismAI Fellowship began in June 2022 with 46 journalists and technologists from news organisations around the world collaborating on using artificial intelligence to enhance their journalism. At the halfway mark of the six-month programme, our Fellows describe their journey so far, the progress they’ve made, and what they’ve learned along the way. In this blog post, you’ll hear from team Image2Text.
In June 2022, we – a team of journalists and technologists from Argentina, Paraguay and the Philippines – decided to collaborate in the JournalismAI Fellowship to develop a product that uses AI to describe images produced in our newsrooms. We seek to process photos, videos and infographics to automatically generate tags – such as names of people – in order to categorise, distribute, and archive material more efficiently. Right from the outset, we encountered a challenge and an opportunity: most computer-vision models available on the market are not trained for our specific contexts.
On the one hand, the tools available were not created for the context of journalism. For example, we ran some trials with a video that showed the recently appointed president of Chile in his first official meeting with the president of Argentina. The AI-based tool we used described the video with some success, but the resulting tags were incomplete and out of context: “suit” (their outfit), “red carpet” (there was in fact one), “men”, and “first date” (let’s say the algorithm is a bit of a romantic, and interpreted the elegance of the occasion as a date).
Although accurate (with the exception of “first date”, ok), this description wasn’t relevant for journalistic purposes. There were other, more useful elements in the images: journalists holding telephones, microphones, and stands, which together could have been interpreted by the model as a press conference. If a journalist later typed “Gabriel Boric, press conference” into a search engine and got that video as a result, that would indeed be helpful.
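To illustrate the kind of post-processing we have in mind, generic tags that co-occur can be mapped to newsroom-relevant concepts. This is only a sketch: the tag names and rules below are hypothetical, not our actual taxonomy.

```python
# Hypothetical sketch: map co-occurring generic vision-model tags
# to newsroom-relevant concepts. Tags and rules are illustrative.

CONTEXT_RULES = {
    "press conference": {"microphone", "journalist", "podium"},
    "official meeting": {"suit", "red carpet", "flag"},
}

def infer_context(raw_tags):
    """Return newsroom concepts whose trigger tags all appear."""
    tags = set(raw_tags)
    return [concept for concept, triggers in CONTEXT_RULES.items()
            if triggers <= tags]

print(infer_context(["suit", "red carpet", "men", "flag"]))
# ['official meeting']
```

A real version would work with model confidence scores rather than exact tag matches, but the idea is the same: a thin layer of editorial knowledge on top of a generic model.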
On the other hand, the AI-based tools available were not created for the specific context of our countries. When we ran our first trials, none of the presidents and prominent political figures of our countries were recognised (unlike, for example, the president of the United States). The story behind these results is a broader one, about the way AI tools are being built: the datasets used to train the models lack diversity and representation from the Larger World. They are biased toward the regions where most AI development is done.
So, even if what got us to collaborate in the first place was to use AI to describe images, we discovered along the way that we had other motivations in common: to contribute to a more diverse AI ecosystem by having more people from diverse regions, genders and cultures building datasets, training models, and developing AI products.
After many discussions about these challenges, and the wonderful support of everyone involved in the Fellowship, we concluded that we want to develop two products:
The first one is a model trained to recognise at least 30 women politicians in our countries (this is the scope that we set out to achieve in this first stage, but it should grow, of course). The model should be shareable with others through an open-source API, and we are building it in a way that allows others to contribute. A newsroom in another country can train it to recognise its own politicians (or cultural figures, celebrities, athletes, etc.) and share that knowledge. The main objective is to have numerous newsrooms from different places sharing and creating knowledge.
The second product will be a starter pack designed for any newsroom that is interested in using AI to describe images. We want it to be helpful for media organisations that aren’t already familiar with this kind of technology, and might think that it is out of scope for them.
Once we agreed on what to achieve, designed a methodology, and began working on it, some hard questions arose. Since the product we are building already exists (although plagued with bias), how will our model add value? Newsrooms can use the existing tools created by big tech companies and, to an extent, train them for their specific use case. However, this will not be shareable with others, which means they will be producing valuable knowledge, but won’t be contributing to the bigger ecosystem. What sets our model apart is that it is open and contributive. Knowledge accumulates and crosses borders.
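As a rough sketch of what “contributive” could look like in practice (the names and data structure here are illustrative assumptions, not our actual design), each newsroom could publish the labels its model was trained on, merged into a common registry:

```python
# Illustrative sketch of a shared, contributive label registry:
# each newsroom contributes the labels (e.g. politicians) its model
# recognises, and contributions accumulate across borders.
# All names here are hypothetical.

def merge_contributions(registry, newsroom, labels):
    """Record which newsrooms can recognise each label."""
    for label in labels:
        registry.setdefault(label, set()).add(newsroom)
    return registry

shared = {}
merge_contributions(shared, "newsroom_ar", ["Cristina Fernández", "Gabriel Boric"])
merge_contributions(shared, "newsroom_ph", ["Gabriel Boric", "Leni Robredo"])
# "Gabriel Boric" is now backed by two contributors
```

The real work, of course, is in sharing trained model weights and datasets rather than label names, but the registry captures the principle: knowledge added in one newsroom becomes visible to all.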
After answering that important question and learning the importance of focusing on our unique differentiator, we came across the main takeaway that we want to share here: journalists have an opportunity to participate in the creation of databases, models, and tools. We can and should be having conversations with big tech companies even as we build our models, because our motivations are different. And motivations matter for the final outcome.
Still, some challenges remain. Due to the diverse backgrounds of our organisations, we found that newsrooms manage archives differently. Some have archival material that is not digitised, others have difficulties processing their digital content. Small newsrooms may not have the tools or personnel to process images.
In some scenarios media outlets have a Media Asset Manager (MAM) with extensive tagging and taxonomy. Yet, most MAMs are not compatible with APIs. If we find solutions to these challenges and make these tools available to newsrooms, we will help them find stories in their archives that they might not even know are there.
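One possible workaround for a MAM without an API (sketched here with assumed column names, not a finished integration) is to export AI-generated tags as a CSV file that the MAM can import in bulk:

```python
import csv
import io

# Hedged sketch: export AI-generated tags to CSV for bulk import
# into a MAM that has no API. Column names are assumptions.

def tags_to_csv(records):
    """records: iterable of (asset_id, [tags]) -> CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["asset_id", "tags"])
    for asset_id, tags in records:
        writer.writerow([asset_id, ";".join(tags)])
    return buf.getvalue()

csv_text = tags_to_csv([("vid_001", ["Gabriel Boric", "press conference"])])
```

Each MAM will expect its own import format, so in practice this export step would be configured per newsroom.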
These and other challenges remain as we continue the development of our project. We won’t be able to answer all of them during our time in the Fellowship, but we will definitely keep researching and building solutions in a collaborative way.
The Image2Text team is formed by:
- Lucila Pinto, Product Manager, Grupo Octubre (Argentina)
- Nicolas Russo, Product Manager, Grupo Octubre (Argentina)
- Jaemark Tordecilla, Editor-in-Chief and Head of Digital Media, GMA News Online (Philippines)
- Raymund Sarmiento, Chief Technology Officer, GMA News Online (Philippines)
- Sara Campos, Product Editor, El Surti (Paraguay)
- Eduardo Ayala, Senior Full-Stack Developer, El Surti (Paraguay)
Do you have skills and expertise that could help team Image2Text? Get in touch by sending an email to Fellowship Manager Lakshmi Sivadas at firstname.lastname@example.org.
Image by Alan Warburton / © BBC / Better Images of AI / Nature / CC-BY 4.0