The JournalismAI Fellowship began in June 2022 with 46 journalists and technologists from news organisations around the world collaborating on using artificial intelligence to enhance their journalism. At the halfway mark of the six-month programme, our Fellows describe their journey so far, the progress they’ve made, and what they’ve learned along the way. In this blog post, you’ll hear from team Tracking Influencers.
It was the end of May when we jumped on a video call from Milan, London, Mexico City, Buenos Aires and New York. We were total strangers but, by the time we hung up, we had agreed on creating a methodology that allows us to investigate influencers at a large scale to uncover “shady” accounts. And we had six months to develop it.
Easy peasy (ha!).
Weeks of research and conversations with key experts produced mixed results. We received very encouraging feedback on the importance of our idea, we found similar studies and projects, and some experts confirmed that the idea was doable (an important element), but not in the time available.
The two main problems with our project were its size (“too big to be completed” on time) and the need for a sharper definition of the problem itself.
We then agreed to concentrate on one specific problem, though still not a simple one. Our method will allow us to find influencers who don’t disclose their commercial partnerships with brands, in breach of their country’s advertising rules and of the codes of conduct set by the EU and by Instagram itself.
Let’s break down the problem.
1. How to select who to track
The first phase of our model needs an unbiased method for identifying the Instagram accounts to analyse. Given the restrictions of the Instagram API and our lack of privileged access through a research account, we found the solution in social media marketing platforms. These platforms have also been used in academic work.
These companies have proliferated with the growth of social media marketing, and they all tend to offer similar products at a cost. Apart from pricing, the main differences lie in how the data is collected, the size of their databases, and how and to what extent that information can be exported.
Some of these platforms use an opt-in method, so users pay to appear in their search engine. Others only track people who are promoting brands with which the platform has an agreement. Their databases can vary widely, from 1.6M users to 123M (as an example), and the cost depends on the number of detailed reports available for download.
All of these platforms include filters to search for accounts, and some of these filters use algorithms (not always transparent) to retrieve that information. Finally, a few provide a RESTful API to query the database.
We considered all of these limitations before selecting a platform. After that, and again following academic research, we defined a sector and used the marketing platform’s dashboard to get the list of influencers, testing hashtags and applying filters such as country, language and performance metrics.
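To make the selection step concrete, here is a minimal sketch of the same kind of filtering applied to a list exported from a marketing platform. It is only an illustration: the `Influencer` fields, the thresholds and the sample accounts are all hypothetical, and real exports differ from one platform to another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Influencer:
    # Hypothetical field names; a real export will use the platform's own columns.
    username: str
    country: str
    language: str
    followers: int
    engagement_rate: float  # share of followers interacting per post, between 0 and 1


def select_accounts(rows, country, language, min_followers, min_engagement):
    """Keep accounts matching every filter, sorted by username so runs are repeatable."""
    kept = [
        r for r in rows
        if r.country == country
        and r.language == language
        and r.followers >= min_followers
        and r.engagement_rate >= min_engagement
    ]
    return sorted(kept, key=lambda r: r.username)


# Made-up sample rows standing in for a dashboard export.
export = [
    Influencer("account_a", "IT", "it", 120_000, 0.031),
    Influencer("account_b", "IT", "it", 8_000, 0.052),
    Influencer("account_c", "MX", "es", 300_000, 0.012),
]

selected = select_accounts(export, "IT", "it", min_followers=10_000, min_engagement=0.02)
print([r.username for r in selected])
```

Writing the filters down as explicit parameters, rather than as clicks in a dashboard, makes it easier to document exactly how the sample was drawn.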
2. How to track their content
We tested Instascrape, Instaloader and Selenium to scrape Instagram directly; all of them required a user login or a session ID. To avoid being blocked or exceeding Instagram’s limits, we implemented a procedure that uses ephemeral cloud functions to collect basic influencer information, posts, images and videos.
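One way to picture this procedure is as a plan that splits the account list into small batches, each handled by a separate short-lived function call, with irregular gaps between calls. The sketch below only builds that plan and does not contact Instagram. The batch size, the gap range and the function names are assumptions made for illustration, not a description of the team's actual setup.

```python
import random


def make_batches(usernames, batch_size, seed=0):
    """Shuffle accounts and split them into batches, one per short-lived function call."""
    shuffled = list(usernames)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def schedule(batches, min_gap_s, max_gap_s, seed=0):
    """Give each batch a start offset in seconds, separated by random gaps."""
    rng = random.Random(seed)
    offset = 0.0
    plan = []
    for batch in batches:
        plan.append((round(offset, 1), batch))
        offset += rng.uniform(min_gap_s, max_gap_s)
    return plan


# Hypothetical account names; in practice these come from the selection step.
names = [f"user{i}" for i in range(7)]
batches = make_batches(names, batch_size=3)
plan = schedule(batches, min_gap_s=60, max_gap_s=180)

for start, batch in plan:
    print(f"t+{start}s -> {batch}")
```

Fixing the random seed keeps the plan reproducible, which helps when a run has to be resumed after a failure.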
If we also need Stories, we may have to consider automating the scraping. But extra content means extra storage, which comes at a cost too.
We have collected data from a random sample of influencers to roughly estimate the storage space needed, and we are also working with a dataset shared by a researcher who specialises in social mining to familiarise ourselves with the data and the storage requirements.
The amount of content we can store will also limit the number of accounts we select using the marketing platform’s dashboard.
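The arithmetic behind this trade-off is simple enough to sketch. The example below extrapolates from a sample to a total, and then works backwards from a storage budget to a maximum number of accounts. The sample sizes, the 20% safety margin and the budget are invented figures, used only to show the calculation.

```python
def estimate_total_bytes(sample_bytes, n_accounts, margin_pct=20):
    """Extrapolate the sample mean to n_accounts, adding a safety margin (rounded up)."""
    numerator = sum(sample_bytes) * n_accounts * (100 + margin_pct)
    denominator = len(sample_bytes) * 100
    return -(-numerator // denominator)  # ceiling division on integers


def max_accounts(sample_bytes, budget_bytes, margin_pct=20):
    """Largest number of accounts whose estimated content fits within the budget."""
    numerator = budget_bytes * 100 * len(sample_bytes)
    denominator = sum(sample_bytes) * (100 + margin_pct)
    return numerator // denominator


# Invented sample: bytes collected per account for three sampled influencers.
sample = [1_500_000_000, 2_500_000_000, 2_000_000_000]

total = estimate_total_bytes(sample, n_accounts=500)
limit = max_accounts(sample, budget_bytes=600_000_000_000)
print(f"{total / 1e9:.0f} GB for 500 accounts; {limit} accounts fit in 600 GB")
```

A mean taken from a small sample is a rough guide at best, since a handful of video-heavy accounts can dominate the total, which is one reason to keep a margin.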
3. How to identify undisclosed partnerships
We have found similar research that will help us solve this problem. We have identified examples of what an “undisclosed partnership” looks like (such as this, this and this). And we have researched what the regulations say about promoting brands, which will be important when classifying a post as “properly disclosing the commercial interest” or not.
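As a purely illustrative starting point, a first pass could flag posts that mention a brand but carry no obvious disclosure marker, so that a person can review them. The sketch below is not the team's method: the list of disclosure hashtags is an assumption, the `paid_partnership_label` flag is a hypothetical field, and what counts as proper disclosure depends on each country's rules.

```python
import re

# Illustrative markers only; accepted disclosure wording varies by country.
DISCLOSURE_TAGS = {"ad", "adv", "advertising", "sponsored", "gifted"}
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@([\w.]+)")


def has_disclosure(caption, paid_partnership_label=False):
    """True if the post carries a platform label or a whole-word disclosure hashtag."""
    if paid_partnership_label:
        return True
    tags = {t.lower() for t in HASHTAG.findall(caption)}
    return bool(tags & DISCLOSURE_TAGS)


def needs_review(caption, brand_handles, paid_partnership_label=False):
    """Flag posts that mention a known brand without any disclosure marker."""
    # Strip trailing dots so "@brandx." at the end of a sentence still matches.
    mentions = {m.rstrip(".").lower() for m in MENTION.findall(caption)}
    mentions_brand = bool(mentions & {b.lower() for b in brand_handles})
    return mentions_brand and not has_disclosure(caption, paid_partnership_label)


brands = {"brandx"}  # hypothetical brand handle
print(needs_review("Loving my new bag from @brandx", brands))
print(needs_review("Loving my new bag from @brandx #ad", brands))
```

Matching whole hashtags rather than substrings avoids treating a tag like #adventure as a disclosure, but a rule this simple will still miss disclosures written in plain text, in images or in video.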
And no, we haven’t solved this problem yet, and we might need more video calls to come close to finding a solution. But we now know who likes cats, who has children, who is well organised, who has enjoyed holidays abroad… At least, we are not complete strangers anymore.
The Tracking Influencers team is made up of:
- Carmen Aguílar Garcia, Senior Data Journalist, The Guardian (was with Sky News at the start of the Fellowship)
- Przemyslaw Pluta, Head of Platform Solutions, Sky News
- Juliana Fregoso, Project Manager for AI and Special Projects Newsroom, Infobae
- Matias Contreras, Chief Technology Officer, Infobae
- Pier Paolo Bozzano, Journalist and Head of Content Innovation Lab, Il Sole 24 Ore
- Marina Caporlingua, Software Engineer, Il Sole 24 Ore
Do you have skills and expertise that could help team Tracking Influencers? Get in touch by sending an email to Fellowship Manager Lakshmi Sivadas at email@example.com.
Header image by Alan Warburton / © BBC / Better Images of AI / Virtual Human / CC-BY 4.0