This is an edited excerpt from Mark Carrigan’s interview of Emma Uprichard published in the LSE Impact Blog.
There is something non-trivial about the violence being committed under the guise of big data. This is happening at a methodological level and is not being discussed. Many people tend to focus on the ethical, privacy and security issues intrinsic to big data, or on big data’s capacity to transform particular areas of social life, such as health, education and cities. These issues are certainly important and they are of course closely linked with methodology too. However, there are some even more fundamental questions that are largely being ignored.
I’m not saying that everything about big data is bad, negative or terrible, far from it. I think a lot of good is already coming from it. Making more visible the data systems that are already making and shaping everyday life is a good thing. We do need to pay attention to these interconnected systems. Who is generating the most or least data? Who owns it or parts of it? Who can and who does access and process the data? Are these the same people? To whom do the analytics and findings go and for which purposes? Who is profiting most and least from big data? These are questions that need urgent attention and they also need constant updating since, as we know, organisations’ personnel changes, data circuits morph as well as split and merge, and laws and cultures around what is or is not acceptable in relation to data keep changing.
The issue, though, is not whether we have small or big, or even bad or good, data. A more difficult question is: how are we to study the social world empirically given that all social systems – including big data systems – are open, historical, complex, dynamic, nonlinear, and tend towards states far from equilibrium as they evolve through time and space?
Moreover, how might we produce, within these systems, empirical descriptions and explanations that are also adequate at the level of both cause and meaning? And how might we do this in a way that is useful for policy and planning purposes? How precisely is our use of big data going to contribute to understanding the social? Big data is part of the social structures from which it emerges. These issues are far from clear.
My worry is that there are modes of practice that are already being laid down, attitudes and cultures that are being normalised, laws being set up, and global networked infrastructures that are being created with little thought to what is social about them – and indeed often by those with little or no training in how the social is considered to work. To many, the question of what is social about the data is not even necessary, because there seems to be an assumption in some circles that the ‘social bit’ doesn’t really matter; instead what matters are the data. But in the case of big data, the data usually are the social bit! I cannot emphasise this point enough: most big data is social data.
Yet the modes of analysis applied to big data tend to mirror approaches that have long been rejected by social scientists. Importantly, they have been rejected not because of the ‘discomfort’ that comes from the idea of modelling human systems with modes of analysis used to model and think about atoms, fluid dynamics, engine turbulence or social insects (although that may be part of it). These kinds of big data methodologies might well be used meaningfully to address certain kinds of questions, including ones we haven’t even asked before. But one of the main reasons these ‘social physics’ approaches tend to be rejected by social scientists is that they don’t so much as nod to how important social meaning, context, history, culture, and notions of agency or structure might be – and yet these matter enormously to how we use data to study social change and continuity.
At the end of the day, it doesn’t matter how much or how good our data is if the epistemological approach to modelling social systems is all backwards. To use big data to simply drive forward positivist notions of the social without accounting for the importance of history, culture, meaning, context, agency and structure in shaping social life is not good.
Yet so far the signs are precisely that big data analytics is going down that path. So when I say that there is a methodological genocide going on, what I am getting at is that data analytics needs serious interrogation at a methodological level, particularly as to whether it can or will improve our lives. As Shaw argues, there are a number of ‘sides’ to genocide, and likewise the methodological genocide that I am hinting at is epistemological, disciplinary, technological, urbanised, elitist, commercialised and, I dare say, often gendered.
There may be a view that these things are not so important because what really matters is finding tools that fit the data. But unless the tools also fit the social that you want to re-produce, it is unlikely they will be much help in transforming anything for the better. And if you produce bad findings and they are picked up and used, then somewhere along the line the issue of liability needs to be raised.
I am sure that with a bit of brainstorming we could develop a number of ways to ensure that those driving big data futures are held more accountable, both now and in the future. Big data has potentially serious repercussions for the kinds of social futures we are each complicit in shaping, and the people with the analytical capacity, and the institutions supporting big data developments, need to be held accountable. This also means taking stock of the methodological harm already present in many big data practices.
Notes: The interview is part of a series on the Philosophy of Data Science. Other interviews in the series: Rob Kitchin, Evelyn Ruppert, Deborah Lupton, Susan Halford, Noortje Marres, and Sabina Leonelli.
Emma Uprichard is Associate Professor and Deputy Director at the Centre for Interdisciplinary Methodologies and Co-Director of the Warwick Q-Step Centre, at the University of Warwick. She has recently completed the ESRC project ‘Food Matters’, which explored food hates and avoidances through the life course and is currently on the ESRC Seminar Series on ‘Complexity and Methods in the Social Sciences: An interdisciplinary approach’.
Mark Carrigan is a sociologist based in the Centre for Social Ontology at the University of Warwick. He edits the Sociological Imagination and is an assistant editor for Big Data & Society. His research interests include asexuality studies, sociological theory and digital sociology. He’s a regular blogger and podcaster and tweets at @