Public policies are increasingly underpinned by automated algorithmic decision-making processes that have, in some instances, failed spectacularly.
Daan Kolkman and Gijs van Maanen argue that efforts to make these algorithmic decisions accountable by making them transparent and explainable are ultimately limited if the underlying systems remain unresponsive to the contextual knowledge and expertise that exists where they are put to work.
Five years have passed since two high-profile failures in algorithmic policy, the UK A-level grading fiasco and the Dutch childcare benefits scandal. Yet, approaches to algorithmic decision-making remain static.
Despite advances in explainable AI (XAI), algorithmic registries, and new regulatory frameworks, algorithmic failures continue to negatively impact millions. Why haven’t efforts to increase transparency prevented new mistakes?
The answer stems from a misconception: that algorithmic transparency alone is enough to make the decisions based on those algorithms accountable. Interpreting even simple models is challenging for experts; expecting non-experts to grasp complex algorithms is unrealistic. While algorithm registers and watchdogs have their place, progress demands a shift from transparency to “responsiveness.”
What is responsiveness?
Responsiveness goes beyond transparency or explainability. It focuses on our ability to influence and adjust an algorithm’s mechanisms — the way it transforms inputs to outputs. However, it is not solely a feature of an algorithm itself, but of a wider algorithmic system, one which includes social and environmental factors influenced by an algorithm.
Responsiveness is therefore the “duty” of an algorithm’s developer to act in the best interest of all those affected by it. This includes acknowledging that algorithms are fallible, enabling people to question and meaningfully influence the mechanisms of an algorithm, and allowing them to contest its results.
Responsiveness is not the reactivity provided by generative AI. We can steer responses from chatbots, but we cannot directly influence the underlying mechanisms of large language models themselves. Responsiveness should also not be confused with “control”, the term often used to describe the influence developers have over algorithms. Rather, responsiveness implies an interaction between those affected by an algorithm and those building it. This requires channels through which people can influence the algorithm’s mechanisms. This influence need not be direct, but people affected by an algorithm need to have “a way in”.
Take for instance the difference between statistical models and machine learning. Statistical models are designed to formalise beliefs and structure data in ways that invite scrutiny and debate. Equations and coefficients specify the mechanisms that transform inputs to outputs, allowing statisticians to incorporate domain-specific knowledge and align models with theory or mental models of stakeholders. These qualities make statistical models transparent and responsive.
In contrast, machine learning algorithms rely on large numbers of intermediate variables to optimise predictive accuracy, often at the expense of transparency. However, this relative lack of transparency is not their main shortcoming: machine learning algorithms offer no meaningful representation of the world in their mechanisms that can become the focal point of scrutiny and debate.
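To make this contrast concrete, here is a minimal sketch in Python. The data and variable names are entirely invented for illustration and do not correspond to any model discussed in this piece; the point is only that a logistic regression exposes one named coefficient per input, while a gradient-boosted model spreads its mechanism over hundreds of tree nodes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic, purely illustrative data: a toy outcome predicted from two
# invented inputs ("income_thousands" and "years_in_system").
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income_thousands": rng.normal(30, 8, 1000),
    "years_in_system": rng.integers(0, 20, 1000),
})
y = (X["income_thousands"] < 25).astype(int)  # toy ground truth

# Statistical model: each input gets one named coefficient, so the
# mechanism linking inputs to outputs can be read, debated, and
# constrained by domain knowledge.
stat_model = LogisticRegression().fit(X, y)
print(dict(zip(X.columns, stat_model.coef_[0].round(2))))

# Machine learning model: same data, but the mechanism is spread over
# hundreds of tree nodes with no single interpretable parameter.
ml_model = GradientBoostingClassifier(n_estimators=200).fit(X, y)
print(sum(tree.tree_.node_count for tree in ml_model.estimators_.ravel()))
```

The logistic regression can be argued with; the ensemble can only be measured.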
As Herbert Simon described, machine learning produces “synthetic” mechanisms: systems that perform impressive functions without mirroring human logic or understanding. Thus, even if we succeeded in making these algorithms transparent or explainable, they would never be responsive. This unresponsiveness constrains our ability to use machine learning in ways that drive understanding and permit alignment with human reasoning, qualities that matter not just for research but also for high-stakes decision-making.
Responsiveness in practice
Returning to the UK A-level fiasco and the Dutch childcare benefits scandal through this lens, we can see why a shift towards responsiveness is critical, especially when these cases are contrasted with a model like SAFFIER II, which is highly responsive.
The SAFFIER II model is used by the Netherlands Bureau for Economic Policy Analysis (CPB) to, amongst other things, guide the Netherlands’ government spending. Composed of behavioural equations and “rules of thumb”, SAFFIER II is designed not just for prediction, but also to quantify the impact of proposed policy measures. The model’s structure allows people to trace how economic assumptions influence forecasts, which opens space for public discourse. As a result, SAFFIER II has a particularly critical audience. The CPB considers, and is responsive to, critique and can adjust the model to better align with economic theory and shifting mental models.
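For illustration only, the snippet below shows what a behavioural “rule of thumb” can look like in code. It is a stylised textbook consumption equation with made-up numbers, not the actual SAFFIER II specification. Because the assumption (the propensity to consume) is an explicit, named parameter, a critic can rerun the forecast under a different assumption and see exactly how the conclusion shifts.

```python
# A stylised behavioural equation with invented parameters; NOT taken
# from SAFFIER II. Consumption is forecast from disposable income.
def consumption_forecast(disposable_income: float,
                         autonomous_consumption: float = 10.0,
                         propensity_to_consume: float = 0.8) -> float:
    """Forecast household consumption (in billions) from disposable income."""
    return autonomous_consumption + propensity_to_consume * disposable_income

# The mechanism is explicit, so someone who believes households save more
# than assumed can test a lower propensity and trace its effect.
print(consumption_forecast(300, propensity_to_consume=0.8))  # 250.0
print(consumption_forecast(300, propensity_to_consume=0.7))  # 220.0
```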
In contrast, the UK’s Direct Centre Performance (DCP) algorithm, created during the COVID-19 pandemic to assign A-level grades in lieu of exams, was met with significant backlash, but was not adjusted. Despite Ofqual’s efforts to ensure transparency (the documentation of the DCP and its modelling process is vast), the algorithm’s reliance on historical grade distributions and student rankings led to unfair outcomes. By excluding sociodemographic factors, the DCP masked attainment gaps, and its rigid design made meaningful discussion nearly impossible, let alone adjustment in response to feedback.
Similarly, the Dutch Risk Classification Model (RCM), designed to detect social benefit fraud, faced serious issues with both transparency and responsiveness. Although the Tax Authority promoted the RCM as a “self-learning” model, positioning the algorithm itself as the culprit for its disastrous impacts, the RCM was actually based on logistic regression, a statistical method that requires manual variable selection and does not generate new variables automatically. This misleading portrayal was compounded by the Tax Authority’s refusal to disclose key model details, citing sensitivity concerns. This case highlights that, although some types of algorithms may facilitate responsiveness, true responsiveness requires organisational accountability.
The lesson here is clear: models designed with responsiveness in mind, like SAFFIER II, can adapt and improve through public engagement. The DCP case highlights the pitfalls of using algorithms that are transparent but not responsive, while the RCM case shows that using the right algorithm is a necessary, but not sufficient, condition for responsiveness.
Should decision makers rely on algorithms?
For high-stakes decision-making, algorithmic systems require responsiveness at their core. Rather than merely focusing on the transparency of algorithms, we must emphasise building models that can be adapted based on domain knowledge and feedback. Machine learning algorithms are not the right tool for that job.
Does this mean we will have to shy away from machine learning altogether?
Not necessarily. Policymakers should carefully consider whether machine learning has any added benefits. There is mounting evidence that it fails to outperform statistical models, casting doubt on its added predictive value in medicine and the judiciary. Moreover, we can take a leaf from approaches that seek to leverage machine learning to improve theorisation. The same representational qualities that limit machine learning’s responsiveness in decision-making contexts also make it challenging to employ for theory development. To test hypotheses, we need to take existing theory into account and not just maximise data fit.
While many have suggested machine learning can be used inductively, to identify hidden patterns in data, we advocate for “co-duction”. This process combines inductive, deductive, and abductive reasoning and clearly outlines the agents and techniques involved in each step. In short, it recommends using machine learning as an exploratory tool to identify complex patterns, which can then be isolated, connected to existing theory, and integrated into a statistical model. This ensures the patterns identified can be rigorously examined and formalised using statistical models that are responsive and aligned with theory.
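The sketch below is one possible, highly simplified reading of such a workflow, using synthetic data and invented variables rather than the authors’ own procedure: a machine learning model is used only to flag that a variable carries unexplained signal, and the suspected pattern is then written into a statistical model where it can be examined against theory.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data with a hidden nonlinear effect of an invented "age" variable.
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, 2000)
income = rng.normal(30, 5, 2000)
y = 2.0 * income + 0.05 * (age - 50) ** 2 + rng.normal(0, 2, 2000)
X = np.column_stack([age, income])

# Step 1 (exploration): a machine learning model flags that "age" carries
# predictive signal that a purely linear specification would miss.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
imp = permutation_importance(rf, X, y, random_state=0)
print(dict(zip(["age", "income"], imp.importances_mean.round(2))))

# Step 2 (formalisation): the suspected pattern is written into a
# statistical model as an explicit quadratic term, where its size can be
# estimated, scrutinised, and debated against existing theory.
design = sm.add_constant(np.column_stack([income, age, (age - 50) ** 2]))
ols = sm.OLS(y, design).fit()
print(dict(zip(["const", "income", "age", "(age-50)^2"], ols.params.round(2))))
```

The machine learning step proposes; the statistical model, and the people scrutinising it, dispose.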
The efforts in the field of interpretable machine learning should not be overlooked in this context. Causal AI in particular shows promise for using machine learning responsibly in high-stakes decision-making.
A way forward with responsive AI
The UK A-level fiasco and the Dutch childcare benefits scandal serve as reminders of the dangers of unchecked algorithmic power. Transparency is necessary but not enough; interpretability is better, but still falls short. What we truly need is a voice. We need influence over the algorithms.
Instead of asking, “How can we make AI transparent?” we should ask, “How can we ensure AI is responsive?” This shift places responsibility squarely on the developers of algorithms, requiring them to go beyond superficial explanation. It forces them to build channels for dialogue and develop algorithms capable of integrating human feedback.