*Following a special workshop convened by the Media Policy Project on ‘Automation, Prediction and Digital Inequalities’, Suresh Venkatasubramanian, Associate Professor at the School of Computing, University of Utah, here outlines the case for interrogating the inner workings of algorithms. A summary of the workshop will be available on this website shortly.*

The algorithm is out of the box. Decision-by-algorithm is no longer something amusing to help us decide where to eat, or what to watch. These decisions have a material impact on our lives in ways small and large.

And that is worrying. Algorithms seem inscrutable. Mysterious. They’re balls of mathematics wrapped up in layers of code, patiently constructed by wizards sitting at keyboards in campuses.

And so we demand explanations. Unwrap the code. Expose the inner workings. Demand accountability and transparency, and above all hope that algorithms are fair: that they do not transmit or amplify the biases already present in society.

But algorithms — learning algorithms — aren’t what we think they are. And that’s the problem. They aren’t sequences of well-defined instructions solving well-defined problems that we want solved. Rather, they are algorithms for making algorithms — meta-algorithms, if you wish. They follow random walks through hundred-dimensional spaces while trying to climb hills, and each little peak they reach is an algorithm that will make your decisions for you.

Even the people building meta-algorithms can’t be quite sure what they’ll spit out, and what unintended consequences the results will have. If we want to persuade these systems to generate acceptable decision-making procedures (for approving housing loans, admitting students to college, hiring, or even sentencing criminals), we have to speak their language: the language of mathematics. And we have to translate notions of social good (fairness, transparency, accountability and the like) into an abstract framework that can be understood by code.

In my research, what I’m looking for is no less than a mathematical theory of fairness in learning. Not the one definition that will rule them all, but a systematic way to formalise hidden assumptions about fairness and justice so that we can evaluate an algorithm for how well it complies with these assumptions.

Machine learning algorithms work by building a model of behaviour and then using it to make predictions. We do this by collecting training data and feeding it to an algorithm that “learns” a model. Let’s consider whether such an algorithm might be acquiring bias from the data it’s trained on. In our work, we’ve exploited a simple trick that computer scientists have used for a long time: simulation.

Here’s how it works. Let’s imagine that we’re trying to build an algorithm to predict whether a customer is a good risk for a car loan. Given historical patterns of discrimination in the United States, we’d like to avoid using an individual’s race as part of this model. We could strip the training data of this attribute, but it’s possible that other attributes in the data (where a person lives, what their name is, and so on) could provide enough correlated information to determine an individual’s race.

But how do we determine whether this might happen? Essentially we’re saying that an algorithm might still learn a biased outcome *if a separate algorithm can learn the missing attributes from the remaining data*. In other words, *we can measure the information flow from the training data to the algorithm by determining how well we can predict attributes that might lead to bias*.
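To make the test above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the synthetic applicants, the `district` and `income` features, the simple lookup-table predictor, and the use of balanced accuracy as the score. It is not the procedure from our papers, only the shape of the idea: hide the sensitive attribute, train a separate predictor on what remains, and see how well it recovers what was hidden.

```python
import random
from collections import Counter, defaultdict

def make_records(n, leak, seed=0):
    """Synthetic applicants (illustrative only). Each record is
    ((district, income), group). `district` matches the hidden group
    with probability `leak`; `income` is unrelated to the group."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.randint(0, 1)
        district = group if rng.random() < leak else 1 - group
        income = rng.randint(0, 4)
        records.append(((district, income), group))
    return records

def fit_lookup(train):
    """Learn the majority group for every feature combination seen."""
    votes = defaultdict(Counter)
    for features, group in train:
        votes[features][group] += 1
    fallback = Counter(g for _, g in train).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return lambda features: table.get(features, fallback)

def balanced_accuracy(predict, test):
    """Average of the per-group accuracies; 0.5 means pure guessing."""
    hits, totals = Counter(), Counter()
    for features, group in test:
        totals[group] += 1
        hits[group] += int(predict(features) == group)
    return sum(hits[g] / totals[g] for g in totals) / len(totals)

def leakage(records):
    """Train on the first half, score recovery of the hidden group on the rest."""
    split = len(records) // 2
    predict = fit_lookup(records[:split])
    return balanced_accuracy(predict, records[split:])

if __name__ == "__main__":
    print("strong proxy:", leakage(make_records(4000, leak=0.9)))
    print("no proxy:    ", leakage(make_records(4000, leak=0.5)))
```

A score near 0.5 says the remaining data carries little recoverable information about the hidden attribute; a score well above it says a proxy is present, even though the attribute itself was stripped.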

This idea — that we can quantify information flow via computation — is an old and time-honoured idea in computer science and information theory, and has been exploited in other domains where information flow is important, like privacy.

There are two important consequences of the way we frame the problem of detecting bias. Firstly, we eliminate the messy issue of what forms of information might be transmitted or not, because our definition of information flow does not depend on any specific kind of information, or any specific way in which an attacker might try to get at it. Information is transferred if we can reconstruct it. No more, no less. Secondly, this sets up a virtuous ‘arms race’ where even if algorithms that learn biased outcomes get more and more powerful, algorithms for *detecting* potential bias get correspondingly better as well.

We can take this idea further. Suppose we want to eliminate the possibility of an algorithm learning bias from its training data. Then we must modify the training data to remove information in such a way that our simulator can no longer predict the eliminated attributes. No prediction, no information. Of course, there are many ways to remove information, and some are more dramatic than others. For example, if we only cared about ensuring a fair outcome, we could scrub the data of *all* identifying features, rendering it fair – but useless. So while maintaining the condition that we can no longer predict eliminated attributes, we also want to do this “minimally” so that we don’t change the data too much. This was the basis of our recent work on certifying disparate impact.
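One way to picture such a “minimal” modification is a rank-preserving repair of a single numeric feature, sketched below. The scores, the two groups, the helper names and the single-threshold predictor are all assumptions for illustration, and this is a simplified sketch rather than a faithful reproduction of the published method. Each value is replaced by the median, across groups, of the values sitting at the same within-group rank, so everyone keeps their standing inside their own group while the feature stops revealing which group they belong to.

```python
import random
from statistics import median

def quantile_repair(values, groups):
    """Replace each value by the median, across groups, of the values
    found at the same within-group quantile. Ranks inside a group are kept."""
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    sorted_vals = {g: sorted(values[i] for i in idx) for g, idx in by_group.items()}

    def value_at(g, q):
        vals = sorted_vals[g]
        return vals[min(int(q * len(vals)), len(vals) - 1)]

    repaired = list(values)
    for g, idx in by_group.items():
        ranked = sorted(idx, key=lambda i: values[i])
        for rank, i in enumerate(ranked):
            q = (rank + 0.5) / len(ranked)
            repaired[i] = median(value_at(h, q) for h in sorted_vals)
    return repaired

def best_threshold_accuracy(values, groups):
    """Balanced accuracy of the best single-threshold guess of a 0/1 group."""
    n0, n1 = groups.count(0), groups.count(1)
    best = 0.5
    for t in sorted(set(values)):
        tpr = sum(1 for v, g in zip(values, groups) if g == 1 and v > t) / n1
        tnr = sum(1 for v, g in zip(values, groups) if g == 0 and v <= t) / n0
        acc = (tpr + tnr) / 2
        best = max(best, acc, 1 - acc)
    return best

# Hypothetical data: group 1 scores are shifted upwards relative to group 0.
rng = random.Random(1)
groups = [0] * 200 + [1] * 200
values = [rng.gauss(50, 10) for _ in range(200)] + [rng.gauss(60, 10) for _ in range(200)]

repaired = quantile_repair(values, groups)
before = best_threshold_accuracy(values, groups)
after = best_threshold_accuracy(repaired, groups)
print("group predictable before repair:", before)
print("group predictable after repair: ", after)
```

The design choice worth noticing is what is preserved: the ordering of individuals within each group survives, which is what keeps the repaired data useful for the original prediction task.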

And in new work, we’ve gone even further. Suppose we have a black-box algorithm that makes predictions based on a number of different features, or knobs. Can we tell which knobs appear to be more important than others? Again, the simulation perspective provides an answer.

We can use the methods from above to eliminate the information provided by a certain feature (“knob”) and see how the results change. If they don’t change, the knob couldn’t have been that important. And if they do, it probably was.
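As a rough illustration of the knob test, the sketch below queries a stand-in black box before and after one feature is blanked out, and reports how often the decision changes. The `black_box` rule, the three features and the data are hypothetical. The sketch is also deliberately simplified: it only replaces the knob itself with its median, so it captures direct influence, whereas the approach described above would also remove whatever the other features reveal about that knob.

```python
import random
from statistics import median

def black_box(applicant):
    """Hypothetical opaque scorer, standing in for a model we can only query."""
    income, debt, shoe_size = applicant
    return 1 if income - 2 * debt > 10 else 0

def obscure(rows, k):
    """Replace feature k with its median so it no longer distinguishes rows."""
    fill = median(r[k] for r in rows)
    return [r[:k] + (fill,) + r[k + 1:] for r in rows]

def influence(model, rows, k):
    """Fraction of rows whose prediction changes once feature k is obscured."""
    changed = sum(model(a) != model(b) for a, b in zip(rows, obscure(rows, k)))
    return changed / len(rows)

# Hypothetical applicants: (income, debt, shoe_size).
rng = random.Random(2)
rows = [(rng.uniform(0, 60), rng.uniform(0, 20), rng.uniform(35, 47))
        for _ in range(1000)]

for k, name in enumerate(["income", "debt", "shoe_size"]):
    print(name, influence(black_box, rows, k))
```

Here `shoe_size` never enters the decision rule, so obscuring it changes nothing, while obscuring `income` or `debt` flips a sizeable share of decisions.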

Ultimately, what we seek is an abstract framework to reason about fairness and algorithms. One in which assumptions and beliefs are clearly laid out, and algorithms can be tested against these beliefs.

All of what I describe above reflects joint work with Sorelle Friedler, Carlos Scheidegger, and our students.

*This blog gives the views of the author and does not represent the position of the LSE Media Policy Project blog, nor of the London School of Economics and Political Science.*

*This post was published to coincide with a workshop held in April 2016 by the Media Policy Project, ‘Automation, Prediction and Digital Inequalities’. This was the third of a series of workshops organised throughout 2015 and 2016 by the Media Policy Project as part of a grant from the LSE’s Higher Education Innovation Fund (HEIF5). To read a summary of the workshop, please click here.*
