The JournalismAI Fellowship began in June 2022 with 46 journalists and technologists from news organisations globally collaborating on using artificial intelligence to enhance their journalism. In this series of articles, our Fellows describe their journey so far, the progress they’ve made, and what they learned along the way. In this blog post, you’ll hear from team Context Cards.
Context Cards is a machine learning model that creates and suggests context (data, bio, summary, location, timeline) as short-form content alongside an article, for audiences and journalists following long-lasting news cycles. The model trains on newsroom archives, and learns from editors’ feedback.
The closest similar product is X-Ray in Amazon Prime Video. It showcases scenes, cast, trivia, etc. in the context of the movie scene that one is watching.
We know from our collective experience at Times of India (TOI) and Code for Africa that providing context and data alongside news stories builds trust among audiences. Furthermore, debunking and contextualising take time, so misinformation often spreads faster than it can be debunked.
Why short-form content
Coming into the JournalismAI Fellowship, we already knew that audiences:
- Aren’t reading long-form fully
- Prefer to read short-form content that can be digested quickly
- Aren’t necessarily aware of the context behind long-lasting topics
Hence, TOI spent the better part of a year building NewscardCMS — a platform for developers to create modular content templates and for editors to author them. You can read more about Newscards on Medium.
Why automate with AI
From deploying NewscardCMS, we learned that:
- It isn’t straightforward for editors trained in writing in the inverted pyramid style to author content in cards
- The newsroom workflow chases what’s new. Hence, the workflow doesn’t allow for updating evergreen cards
- It also isn’t obvious for the desk to attach existing evergreen cards to stories
Hence, it is a worthy goal to use AI to automate not only the creation but also the plugging of context cards.
We decided to focus on long-lasting, slow-developing topics instead of fast-developing and possibly short-lived topics.
Editorially speaking, many of the development and divisive political issues we face in our information space are long-lasting, slow-developing topics. For example, stories related to gender rights in Taliban-ruled Afghanistan will last for years to come.
It also meant that our algorithms wouldn’t be live pushing content to audiences. Our editors would have the time to supervise and curate the output from the algorithms.
From a business perspective too, we would be spending precious computational infrastructure on topics that have a long arc and thus a long shelf life.
Finally, the algorithms that mine nuance and context out of an archive of stories are different from the algorithms that predict early signals and momentum.
Why news cycles and not topics
In computer science parlance, a topic is usually tied to a taxonomy: a fixed set of buckets into which content can be classified. To experience taxonomy or topics, browse through your Twitter feed and you’ll find tags like the one below highlighted in pink.
However, what we are interested in is news cycles. To experience a news cycle, let’s look at Twitter again. In the top-right corner, Twitter showcases the top trending news topic. For example, protests in Iran.
And when you click on it, it takes you to an aggregation page dedicated to that news.
Building the case for our stakeholders
With the project, our editorial team wanted to serve clear information needs of audiences.
However, most news products are broad, i.e., they cover everything from politics and elections to sports and markets. Each topic serves the audience’s information needs in different ways and forms. This meant that there was no unifying framework to organise, structure, and document the various pieces of information served.
To address this, we decided to build on The Algebra for Modular Journalism that was built by Clwstwr, Deutsche Welle, Il Sole 24 Ore, and Maharat Foundation, as part of an earlier edition of the JournalismAI Fellowship.
All information produced by the project would be mapped back to one of the 60 user needs questions identified in the above-mentioned Algebra. Below are some of the user needs questions that we could serve:
- Q-1001: What happened?
- Q-1003: When did it happen?
- Q-1004: Who is it about?
- Q-1005: Where did it happen?
- Q-1010: What has got us here?
- Q-1017: Can you tell me what happened in very few words?
- Q-1029: How many points of view are there on this topic?
To avoid disturbing the existing product, we intend to produce context as small widgets that can go on existing pages or create completely new pages. For example, the timeline could look like this:
The project also needed to serve clear product and business goals. Hence, we decided that all output from the project should serve one of two metrics: Pages Per Session and Sessions Per User.
- Sessions Per User: Features that get audiences back to the platform.
- Pages Per Session: Features that get audiences to consume more content once they are on the platform.
For example, The Verge has a timeline of stories within the topic “Elon Musk bought Twitter” both as an embed and as a story or feed. This increases pages per session.
If the feed had a follow button, The Verge could send a notification or email to those who follow the topic, increasing sessions per user.
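As a rough illustration of how the two metrics relate, here is a minimal Python sketch that computes them from a page-view log. The log format, the sample rows, and the function name `engagement_metrics` are assumptions made for this example and do not describe TOI’s actual analytics setup.

```python
from collections import defaultdict

# Hypothetical page-view log: (user_id, session_id, page_url).
page_views = [
    ("u1", "s1", "/story/a"),
    ("u1", "s1", "/timeline/x"),
    ("u1", "s2", "/story/b"),
    ("u2", "s3", "/story/a"),
]


def engagement_metrics(views):
    """Return (pages_per_session, sessions_per_user) for a list of page views."""
    pages_by_session = defaultdict(int)
    sessions_by_user = defaultdict(set)
    for user_id, session_id, _url in views:
        pages_by_session[session_id] += 1
        sessions_by_user[user_id].add(session_id)
    if not pages_by_session:
        return 0.0, 0.0
    # Average number of pages viewed within one session.
    pages_per_session = sum(pages_by_session.values()) / len(pages_by_session)
    # Average number of distinct sessions started by one user.
    total_sessions = sum(len(s) for s in sessions_by_user.values())
    sessions_per_user = total_sessions / len(sessions_by_user)
    return pages_per_session, sessions_per_user


pages_per_session, sessions_per_user = engagement_metrics(page_views)
```

In this toy log, a timeline widget that earns one extra click inside a session raises Pages Per Session, while a notification that brings a user back for a new session raises Sessions Per User.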
Progress so far
We’ve split into two teams:
We hired a consultant, Anuj Karn, to build out a Named Entity Recognition algorithm.
Meanwhile, our lead data scientist and project partner, Karn Bhushan, started exploring topic modelling algorithms with guidance from Dr. Tess Jeffers, Director of Data Science at the Wall Street Journal. We were able to bring the accuracy to around 70%. You can read more about it at tech.timesinternet.in.
The biggest challenge that we foresee is translating the output from topic modelling algorithms (topics) into news cycles. Let me elaborate:
From topic to news cycle. Topic modelling algorithms are unsupervised and solve for similarity, i.e. they find clusters (buckets) of similar content.
- The output from topic modelling will need to be given to editors so they can derive meaning from it.
- Editors interpret whether one or more topics from the algorithm add up to a news cycle.
- The news cycle then needs to be labelled (headlined).
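To make “solving for similarity” concrete, here is a minimal Python sketch that groups stories by cosine similarity over word counts. It is a simple stand-in for illustration, not the topic modelling algorithm the team used; the headlines, the `threshold` value, and all function names are assumptions.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case bag of words; a real pipeline would also drop stop words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stories(stories, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed story is
    similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster: {"seed": Counter, "story_ids": [...]}
    for story_id, text in stories.items():
        vec = word_counts(text)
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= threshold:
                cluster["story_ids"].append(story_id)
                break
        else:
            clusters.append({"seed": vec, "story_ids": [story_id]})
    return [c["story_ids"] for c in clusters]


# Made-up headlines, for illustration only.
stories = {
    "s1": "Protests in Iran spread to more cities",
    "s2": "Iran protests continue as cities see more unrest",
    "s3": "Elon Musk completes Twitter takeover",
    "s4": "Twitter staff brace for changes after Musk takeover",
}
clusters = cluster_stories(stories)
```

The output is only a list of unlabelled buckets of story IDs. Deciding whether a bucket is a news cycle, and what to call it, is still the editor’s job.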
Refining the news cycle. We will need to give editors the ability to refine the output.
- False Positives: The algorithm decides that a story is part of the topic but it really isn’t.
- False Negatives: The algorithm decides that a story is not part of the topic but it really is.
Maintain backward compatibility. To find new topics and thus news cycles, it becomes critical that we retrain the model with new stories and with the False Negative and False Positive tagging.
- However, when we retrain, the model spits out a completely new set of topics.
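One way to picture this refinement step is as set arithmetic on story IDs. The sketch below is an assumption about how editor feedback could be applied, not a description of the team’s tooling; the function name and story IDs are hypothetical.

```python
def refine_news_cycle(predicted_ids, false_positives=(), false_negatives=()):
    """Apply editor feedback to the stories the algorithm assigned to a news cycle."""
    refined = set(predicted_ids) - set(false_positives)  # drop stories wrongly included
    refined |= set(false_negatives)                      # add stories wrongly left out
    return refined


# The algorithm put s1, s2 and s7 into the cycle; an editor flags s7 as
# unrelated and adds s9, which the algorithm missed.
predicted = {"s1", "s2", "s7"}
refined = refine_news_cycle(predicted, false_positives={"s7"}, false_negatives={"s9"})
```

Storing the two feedback sets separately from the algorithm’s output keeps the editors’ corrections available for later retraining.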
- Hence, we need the ability to find nearness between the new topics and the old topics so that all the manual tagging and labelling can be carried forward.
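A simple way to measure nearness between old and new topics is the overlap of the stories they contain. The Python sketch below uses Jaccard similarity for that purpose. This is one possible approach under stated assumptions, not the team’s chosen method; the data layout, the `min_overlap` threshold, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Share of stories two topics have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def carry_forward_labels(old_topics, new_topics, min_overlap=0.5):
    """Map each new topic ID to the label of its nearest old topic,
    or None when no old topic overlaps enough."""
    labels = {}
    for new_id, new_stories in new_topics.items():
        best_label, best_score = None, 0.0
        for old in old_topics.values():
            score = jaccard(new_stories, old["story_ids"])
            if score > best_score:
                best_label, best_score = old["label"], score
        labels[new_id] = best_label if best_score >= min_overlap else None
    return labels


# Topics from the previous model run, already labelled by editors.
old_topics = {
    0: {"label": "Protests in Iran", "story_ids": {"s1", "s2", "s3"}},
    1: {"label": "Elon Musk bought Twitter", "story_ids": {"s4", "s5"}},
}
# Topics from the retrained model, with fresh and arbitrary IDs.
new_topics = {
    "a": {"s4", "s5", "s6"},
    "b": {"s1", "s2", "s3", "s7"},
    "c": {"s8", "s9"},
}
labels = carry_forward_labels(old_topics, new_topics)
```

New topics that come back as `None`, such as `"c"` here, are candidates for a genuinely new news cycle and would go to editors for labelling.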
Team Context Cards is made up of:
- Ritvvij Parrikh, Director of News Products, Times of India
- Karn Bhushan, Lead Data Analyst, Times of India
- Amanda Strydom, Senior Programme Manager, Code for Africa
Do you have skills and expertise that could help the team? Get in touch by sending an email to Fellowship Manager Lakshmi Sivadas at firstname.lastname@example.org.
JournalismAI is a global initiative of Polis and it’s supported by the Google News Initiative. Our mission is to empower news organisations to use artificial intelligence responsibly.