Sentiment analysis is an increasingly popular metric for news and social media platforms. Alison Powell reflects here on the implications of sentiment analysis and its potential connection with the rise and intensification of emotion-driven politics. The data inputted to ‘train’ algorithms on sentiment analysis has enormous impact and is imbued with assumptions about the world. What mechanisms might make these algorithms more accountable?
Recently I was at the Big Boulder social data conference discussing the use of algorithms in managing social data. Then, since I live in the UK, Brexit Events intervened. Sadness and shock for many have since morphed into uncertainty for all. Online media, driven by the social analytics I heard about in Boulder, shape and intensify these feelings as we use them to get our news and connect with people we care about. This raises some really important issues about accountability, especially as more news and information about politics gets transmitted through social media. It also raises some interesting questions about the relationship between the industry’s focus on sentiment analysis of social media for brands and the rise of emotion-driven politics.
So in this post I’ll talk about why algorithms matter in moments of uncertainty, what it might mean to make them accountable or ethical, and what mechanisms might help to do this.
Algorithms present the world to you – and that’s sometimes based on how you emote about it
Algorithmic processes underpin the presentation of news stories, posts and other elements of social media. An algorithm is a recipe that specifies how a number of elements are supposed to be combined. It usually has a defined outcome – like a relative ranking of a post in a social media newsfeed. Many different data will be introduced, and an algorithm’s function is to integrate them together in a way that delivers the defined outcome. Many algorithms can work together in the kinds of systems we encounter daily.
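To make the ‘recipe’ idea concrete, here is a minimal sketch of a feed-ranking function. Everything in it is an assumption made for illustration: the `Post` fields, the `WEIGHTS` values and the scoring formula are hypothetical, and real platform ranking systems are proprietary and far more complex.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    clicks: int        # how often the link was clicked
    reactions: int     # number of emotional reactions (hypothetical signal)
    hours_old: float   # age of the post in hours


# Hypothetical weights: the 'recipe' for combining the ingredients.
WEIGHTS = {"clicks": 1.0, "reactions": 2.0, "age_penalty": 0.5}


def score(post: Post) -> float:
    """Combine several data points into one number (the defined outcome)."""
    return (
        WEIGHTS["clicks"] * post.clicks
        + WEIGHTS["reactions"] * post.reactions
        - WEIGHTS["age_penalty"] * post.hours_old
    )


def rank_feed(posts):
    """Return posts ordered from highest to lowest score."""
    return sorted(posts, key=score, reverse=True)


posts = [
    Post("Council budget report", clicks=40, reactions=2, hours_old=3.0),
    Post("Outrage over referendum remark", clicks=25, reactions=30, hours_old=5.0),
]
feed = rank_feed(posts)
print([p.title for p in feed])
```

With these made-up weights, the post that drew more emotional reactions is ranked first even though it was clicked less often. Changing a single weight changes what people see.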
There is one element of algorithmic systems that I find interesting at this moment in time, and that’s sentiment. Measuring how people say they feel about particular brands in order to better target them has been a key pillar of the advertising industry for decades. With the expansion of social analytics, it’s now also the backbone of political analysis aimed at seeing which leaders, parties and approaches to issues acquire more positive responses. But could too much of a focus on sentiment also intensify emotional appeals from politicians, to the detriment of our political life? What responsibility do social media companies bear?
Image credit: Pixabay Public Domain
Social Media Companies Filter Politics Emotionally
Increasingly, media companies are sensitive to the political and emotional characteristics of responses to the kinds of elements that are presented and shared. Sentiment analysis algorithms, trained on data that categorizes words as ‘positive’ or ‘negative’, are widely employed in the online advertising sphere to try to ascertain how people respond to brands. Sentiment analysis also underpinned the infamous ‘Facebook emotion study’, which sought to investigate whether people spent more time using the platform when they had more ‘positive’ or ‘negative’ posts and stories in their feeds.
With the expansion of the emotional response buttons on Facebook, more precise sentiment analysis is now possible, and it is certain that emotional responses of some type are factored into the subsequent presentation of online content, along with other signals such as clicks on links.
Sentiment analysis is based on categorizations of particular words as ‘positive’ or ‘negative’. Algorithms that present media in response to such emotional words have to be ‘trained’ on this data. For sentiment analysis in particular, there are many issues with training data, because the procedure depends on the assumption that words are most often associated with particular feelings. Sentiment analysis algorithms can have difficulty identifying when a word is used sarcastically, for example.
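A rough sketch of word-list sentiment scoring shows why sarcasm is a problem. The tiny `LEXICON` below is invented for illustration, and the approach is deliberately the simplest possible one; commercial tools are more sophisticated, but many rest on a similar assumption that words carry fixed feelings.

```python
import re

# Hypothetical word list: +1 for 'positive' words, -1 for 'negative' ones.
LEXICON = {
    "great": 1, "love": 1, "brilliant": 1,
    "terrible": -1, "hate": -1, "disaster": -1,
}


def sentiment(text: str) -> int:
    """Sum the scores of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


sincere = "I love this brilliant plan"
sarcastic = "Oh great, another brilliant plan that ends in disaster"

print(sentiment(sincere))    # 2
print(sentiment(sarcastic))  # 1
```

The sarcastic sentence still comes out as mildly positive, because ‘great’ and ‘brilliant’ outvote ‘disaster’. The scorer counts words; it has no way of knowing how they were meant.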
Similarly, other algorithms used to sort or present information are also trained on particular sets of data. As Louise Amoore’s research investigates, algorithm developers will place computational elements into systems that they build, often without much attention to the purposes for which they were first designed.
In the case of sentiment analysis, I am curious about the consequences of long-term investments in this method by analytics companies and the online media industry. In particular, I wonder whether focusing on sentiment, or optimizing the presentation of content in relation to sentiment, is in any way connected to the rise of ‘fact-free’ politics and the ascendancy of emotional arguments in campaigns like the Brexit referendum and the American presidential primaries.
Algorithms have to be trained: training data establish what’s ‘normal’ or ‘good’
The way that sentiment analysis depends on whether words are understood as positive or negative gives an example of how training data establishes baselines for how algorithms work.
Before algorithms can run ‘in the wild’ they have to be trained to ensure that the outcome occurs in the way that’s expected. This means that designers use ‘training data’ during the design process. This is data that helps to normalize the algorithm. For face recognition, training data will be faces; for chatbots it might be conversations; and for decision-making software it might be correlations.
But the data that’s put in to ‘train’ algorithms has an impact: it shapes the function of the system in one way or another. A series of high-profile examples illustrates what kinds of discrimination can be built into algorithms through their training data. Facial recognition algorithms have categorized black faces as gorillas, or Asian faces as blinking. Financial risk data have been used to train algorithms that underpin border control. Historical data on crime is used to train ‘predictive policing’ systems that direct police patrols to places where crimes have occurred in the past, focusing attention on populations who are already marginalized.
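The way training data sets the baseline can be sketched with a toy example. The ‘training’ here is just word counting, and the two labelled data sets (`skate_forum` and `health_forum`) are invented for illustration; this is a simplification, not how any particular production system works.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Learn word scores: +1 each time a word appears in a 'positive'
    example, -1 each time it appears in a 'negative' one."""
    scores = Counter()
    for text, label in examples:
        for word in tokenize(text):
            scores[word] += 1 if label == "positive" else -1
    return scores


def classify(scores, text):
    total = sum(scores[word] for word in tokenize(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Two hypothetical training sets drawn from different communities.
skate_forum = [
    ("that trick was sick", "positive"),
    ("the ramp was broken", "negative"),
]
health_forum = [
    ("I felt sick all week", "negative"),
    ("the nurse was kind", "positive"),
]

sentence = "this is sick"
print(classify(train(skate_forum), sentence))   # positive
print(classify(train(health_forum), sentence))  # negative
```

The code is identical in both cases. Only the training data differs, and with it the system’s idea of what a ‘normal’ use of the word looks like.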
These data make assumptions about what is ‘normal’ in the world, from faces to risk taking behavior. At Big Boulder a member of the IBM Watson team described how Watson’s artificial intelligence system uses the internet’s unstructured data as ‘training data’ for its learning algorithms, particularly in relation to human speech. In a year where the web’s discourse created GamerGate and the viral spread of fake news stories, it’s a little worrying not to know exactly what assumptions about the world Watson might be picking up.
Image credit: Clockready CC BY-SA Wikimedia
So what shall we do?
You can’t make algorithms transparent as such
There’s much discussion currently about ‘opening black boxes’ and trying to make algorithms transparent, but this is not really possible as such. In recent work, Mike Ananny and Kate Crawford have created a long list of reasons for this, noting that transparency is disconnected from power, can be harmful, can create false binaries between the ‘invisible’ and the ‘visible’ algorithms, and that transparency doesn’t necessarily create trust. Instead, it simply creates more opportunities for professionals and platform owners to police the boundaries of their systems. Finally, Ananny and Crawford note that looking inside systems is not enough, because it’s important to see how they can actually be manipulated.
Maybe training data can be reckoned and valued as such
If it’s not desirable (or even really possible) to make algorithmic systems transparent, what mechanisms might make them accountable? One strategy worth thinking about might be to identify or even register the training data that are used to set up the frameworks that key algorithms employ. This doesn’t mean making the algorithms transparent, for all the reasons specified above, but it might create a means for establishing more accountability about the cultural assumptions underpinning the function of these mechanisms. It might be desirable, in the public interest, to establish a register of training data employed in key algorithmic processes judged to be significant for public life (access to information, access to finance, access to employment, etc). Such a register could even be encrypted if required, so that training data would not be leaked as a trade secret, but held such that anyone seeking to investigate a potential breach of rights could have the register opened on request.
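One possible shape for a register entry is sketched below. Note that this swaps in a different technique from the encryption mentioned above: it records a cryptographic fingerprint (a SHA-256 hash) of the training data rather than an encrypted copy, so the register holds no trade secrets, yet an investigator can later check that the data a company discloses is the data that was registered. The function names, entry fields and sample records are all hypothetical.

```python
import hashlib
import json


def fingerprint(records):
    """SHA-256 digest of a canonical JSON serialization of the records."""
    canonical = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register_entry(system, purpose, records):
    """Build a public register entry that does not contain the data itself."""
    return {
        "system": system,
        "purpose": purpose,
        "n_records": len(records),
        "sha256": fingerprint(records),
    }


def verify(entry, disclosed_records):
    """Check that disclosed data matches what was originally registered."""
    return entry["sha256"] == fingerprint(disclosed_records)


# Hypothetical training data for a hypothetical system.
training_data = [
    {"text": "that trick was sick", "label": "positive"},
    {"text": "the ramp was broken", "label": "negative"},
]
entry = register_entry("feed-ranker", "access to information", training_data)

print(verify(entry, training_data))       # True
print(verify(entry, training_data[:1]))   # False: data was altered
```

A fingerprint like this only proves that the data has not changed. Who may demand disclosure, and under what circumstances, is a question for institutions rather than for code.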
This may not be enough, as Ananny and Crawford intimate, and it may not yet have adequate industry support, but given the failures of transparency itself it may be the kind of concrete step needed to begin firmer thinking about algorithmic accountability.
This piece originally appeared on the author’s personal blog and is reposted with permission.
Note: This article gives the views of the author, and not the position of the LSE Impact blog, nor of the London School of Economics.
Dr. Alison Powell is Assistant Professor and Programme Director of the MSc in Media and Communication (Data & Society). She researches how people’s values influence the way technology is built, and how technological systems in turn change the way we work and live together.