The need to represent: How AI can help counter gender disparity in the news

Can AI-powered tools help counter the misrepresentation of women in the news?

For the first in our new series of JournalismAI Community Workshops, we decided to look at three recent projects that demonstrate how AI can help raise awareness on issues with misrepresentation of women in the news.

The Political Misogynistic Discourse Monitor is a web application and API that journalists from AzMina, La Nación, CLIP, and DataCrítica developed to uncover hate speech against women on Twitter.

When Women Make Headlines is an analysis by The Pudding of the (mis)representation of women in news headlines, and how it has changed over time.

In the AIJO project, journalists from eight different organisations worked together to identify and mitigate biases in gender representation in news.

We invited Bárbara Libório of AzMina, Sahiti Sarva of The Pudding, and Delfina Arambillet of La Nación to walk us through their projects and share what they learned, including how they taught the machine to recognise what constitutes bias and hate speech.

The Political Misogynistic Discourse Monitor

To fight hate speech against women on social media, a group of Latin American news organisations came together to develop a tool that can quickly and assertively monitor attacks on women candidates on social media in the lead-up to an election.

Bárbara Libório, Data Journalist & Journalism Manager at AzMina in Brazil, was part of the team working on the Political Misogynistic Discourse Monitor. AzMina uses information, technology, and education to fight gender violence and, in 2020, monitored attacks on social media on women candidates for the municipal elections in Brazil.

To identify which tweets were offensive, a filter of offensive terms and expressions was applied. This process required a huge human effort to analyse each tweet and check if it constituted a misogynistic attack. During the 2021 JournalismAI Collab Challenges, the team of journalists at AzMina decided to join forces with other organisations in Latin America to find out if AI technologies could help do the same type of analysis on Twitter faster and in a more efficient and scalable way.
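The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not AzMina's actual code: the term list, the sample tweets, and the function names are all invented for the example.

```python
import re

# Hypothetical term list for illustration only; the team's real lexicon
# is not given in this article.
OFFENSIVE_TERMS = ["stupid", "shut up", "ugly"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Longest terms first, so multi-word phrases win over shorter overlaps.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_for_review(tweets, pattern):
    """Return the tweets that contain at least one listed term, for a human to check."""
    return [t for t in tweets if pattern.search(t)]


# Invented sample tweets.
tweets = [
    "She should shut up and go home",
    "Great debate performance tonight",
    "What an UGLY campaign poster",
]
flagged = flag_for_review(tweets, build_pattern(OFFENSIVE_TERMS))
```

The third sample tweet is flagged even though it insults a poster rather than a candidate, which is why every match still needed a human reviewer.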

To learn to detect misogynistic discourse, a machine learning model needs to first understand what misogyny is by seeing many examples of it. So the first step in the process was to create a database with hand-labelled tweets that indicated whether or not they were misogynistic.

The team decided to train a model to detect misogynistic discourse in both Portuguese and Spanish. The results were strong in Spanish and somewhat weaker, but still good enough, in Portuguese.

The next step was to create a web application to allow users to analyse text samples or even entire files with just one click. This application is currently being developed and, ultimately, it will be available for other projects that are mapping gender violence across the web.

To summarise what they learned from the project, Libório explained: “So far, we’ve learned a couple of key things. One is that words are always subject to context. So every time we want to put together a dictionary to train a model, we need to make sure to adapt it to the context. Another learning is that even for human annotators, it’s often difficult to agree on whether something is misogynistic or not. So we definitely shouldn’t delegate these important decisions to algorithms only!”
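One common way to quantify how much two annotators agree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The article does not say which measure, if any, the team used; the sketch below is only an illustration, and the labels are invented (1 for "misogynistic", 0 for "not misogynistic").

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented labels for eight tweets.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 1, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on six of eight tweets (75%), but because chance alone would produce about 53% agreement, kappa comes out at roughly 0.47.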

When Women Make Headlines

What does it look like when women make the headlines? According to the report of the 6th Global Media Monitoring Project, women make up only about 20 to 25% of news mentions. Half of the world’s population is systematically underrepresented in the news. But how do they actually appear when they do make headlines?

Sahiti Sarva and Leonardo Nicoletti co-authored this analysis for The Pudding of the (mis)representation of women in news headlines.

They analysed headlines from 186 publications in four countries (UK, US, India, and South Africa), published between 2005 and 2021. They scraped all of the headlines tagged with a set of twenty keywords that refer to women: woman, girl, female, and so on.

Before applying any machine learning model to the research, there’s an immense amount of work that goes into cleaning and removing seemingly useless words from the headlines. In NLP (Natural Language Processing), these are called “stop words”: a dictionary of words that are so commonly used that they carry very little useful information.

But that’s not always a straightforward process. The second most commonly used word in the headlines analysed by the project is “first”, normally a “stop word” in NLP dictionaries. In this case, though, the word “first” is very important because in the headlines it was often used to talk about “the first female president” or “the first woman to be defence minister”. After removing stop words from the data, the authors had to go back and manually reinsert the word “first” when it was used in a context meaningful for the research.
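A minimal sketch of this problem, assuming a tiny invented stop-word list and a hypothetical `clean_headline` helper. Note that it swaps in a simpler technique than the one the authors describe: instead of reinserting “first” by hand after removal, it protects the word with a keep-list, and it keeps every occurrence rather than judging the context.

```python
# Tiny illustrative stop-word list; real NLP libraries ship much longer ones.
STOP_WORDS = {"the", "to", "be", "a", "of", "in", "is", "as", "first"}

# Words that stay in the headline even though they are on the stop-word list.
KEEP = {"first"}


def clean_headline(headline, stop_words=STOP_WORDS, keep=KEEP):
    """Lowercase a headline, strip punctuation, and drop stop words not in `keep`."""
    tokens = [t.strip(".,:;!?\"'") for t in headline.lower().split()]
    return [t for t in tokens if t and (t not in stop_words or t in keep)]


headline = "The first woman to be defence minister"
with_first = clean_headline(headline)
# -> ['first', 'woman', 'defence', 'minister']
without_first = clean_headline(headline, keep=set())
# -> ['woman', 'defence', 'minister']
```

Without the keep-list, the headline loses the one word that explains why the story was news.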

As relying on existing dictionaries alone was not useful for this purpose, the authors decided to create their own dictionaries in order to be able to identify gender bias in headlines. They relied on a taxonomy proposed by Hitti et al. that combines terms of gendered language like “actress”, “daughter”, “wife”, with words that reflect behavioural and social stereotypes, such as “beautiful”, “emotional”, “supportive”, or “dramatic”. They assigned a bias index to every headline in the datasets to categorise all the publications. This allowed them to find out and describe how some publications are more biased than others in their headlines.
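The article does not give the formula behind the bias index, so the sketch below is a hypothetical stand-in: it scores a headline by the share of its words that appear in either dictionary, then ranks publications by their average score. The word lists reuse only the examples quoted above; the publication names and headlines are invented.

```python
# Word lists limited to the examples quoted in this article.
GENDERED_TERMS = {"actress", "daughter", "wife"}
STEREOTYPE_TERMS = {"beautiful", "emotional", "supportive", "dramatic"}


def bias_score(headline):
    """Hypothetical score: fraction of headline words found in either dictionary."""
    tokens = [t.strip(".,:;!?\"'").lower() for t in headline.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(t in GENDERED_TERMS or t in STEREOTYPE_TERMS for t in tokens)
    return hits / len(tokens)


def rank_publications(headlines_by_publication):
    """Average the headline scores per publication, highest average first."""
    averages = {
        pub: sum(bias_score(h) for h in headlines) / len(headlines)
        for pub, headlines in headlines_by_publication.items()
        if headlines
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Invented sample data.
sample = {
    "Paper A": [
        "Emotional actress thanks supportive husband",
        "Minister announces new budget",
    ],
    "Paper B": ["Scientist wins national award"],
}
ranking = rank_publications(sample)
```

In this toy sample, "Paper A" ranks first because three of the five words in one of its headlines come from the dictionaries.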

To further their research, Sarva and Nicoletti performed a sentiment analysis on the headlines and found that, when women make headlines, the story is often a lot more sensational than the average headline. This could be because stories that focus on women are twice as likely to be violent in nature as empowering. The graph here shows that headlines about women are twice as likely to be about crime and violence as about empowerment.

Sarva explains: “At the end of this project, we realised that there are so many questions that are yet to be answered. How often are perpetrators mentioned in news headlines about crime and violence against women? How often do we refer to women in leadership positions as ‘women vice president’ or ‘female mayor’ instead of calling them by their name? And what are the patterns we are not seeing?”

The authors plan to continue researching this topic and explore how unsupervised machine learning could further enhance their research.

The AIJO project and Gender Gap Tracker

The AIJO project came out of the JournalismAI Collab in 2020 as a collaboration between eight different news organisations that worked together to understand, identify, and mitigate newsroom biases.

Delfina Arambillet, Data & Innovation Journalist at La Nación in Argentina, was part of the team working on the AIJO project. She joined our community workshop and described how the project was imagined and developed.

The team focused on the use of computer vision, which is one of the many fields under the AI umbrella. They used this technology to identify gender biases in news media by analysing who is depicted in the images used by news organisations to illustrate their stories on the front page.

The process had two steps. First, the team deployed a face detection model introduced by Imperial College London in 2020 to sift through the images in search of human faces. The second step was gender classification: a second model was used to identify the gender of the people who appeared in the images selected in the first step.

The results showed that 77% of pictures on the front pages of the participating organisations depict men and only 23% depict women. Even though the sample was too small to allow for statistical significance, these results already offer valuable insight into the lack of representation of women in news media.

Arambillet and the AIJO team then decided to take the research a step further and extended this experiment to text analysis, partnering with the Gender Gap Tracker.

This analysis was run on the English-language publications of the participating organisations, with the goal of measuring how often men and women are quoted in news articles. The results were quite similar to the image analysis: 73% of quotes came from men, and only 22% from women. For the remaining 5%, it was not possible to identify the gender of the source.

Conclusion

By analysing the misrepresentation of women in the news, these three projects help raise awareness on the biases of our journalism – the first step in mitigating them and making our journalism more inclusive.

As AI tools become increasingly sophisticated, we hope and expect to see news outlets investing more in these applications, using them not only to report on global issues with gender representation but also to begin mitigating the effects of gender disparity in their own content and production. Hopefully, the open sourcing of tools like the Harassment Manager by Jigsaw will make these kinds of analysis more accessible for newsrooms worldwide.

The three projects described in this article were presented during the first in our new series of JournalismAI Community Workshops. You can watch the recording of the session on YouTube:

Don’t miss out on our next community workshops. Make sure to sign up for our newsletter and join our Telegram group to stay up-to-date with our upcoming events and exchange ideas with our global community of journalists and technologists.

JournalismAI is a project of Polis, supported by the Google News Initiative.

Sabrina Argoub

March 16th, 2022
