Bento Natura

March 8th, 2022

5 Minutes with Michelle Kendall: data, privacy and COVID-19

Estimated reading time: 10 minutes

Michelle Kendall is a Research Fellow at the University of Warwick and has an honorary position as an independent scientific advisor with the UK Health Security Agency working on the NHS COVID-19 App. In February 2022, she gave a talk at our Women in Mathematics Seminar, where she discussed the epidemiological impact of the NHS COVID-19 app. This interview took place the following day with Bento Natura.

Thanks for your great talk on your involvement with the NHS Covid-19 App. As a statistician who wants to collect as much data as possible, while preserving privacy, how would you tweak the app for your needs?

The design of the app has to tread a fine balance in order to optimise functionality, privacy and epidemiological impacts. Sometimes those forces pull in opposite directions, but privacy can also help with the epidemiological impact by increasing trust in the app – the more people who use it, the better, and privacy is an important part of that. I would obviously defer to privacy and legal experts before recommending any changes, but I can explain some things which would be useful for us as we analyse the data. They mainly boil down to being able to link multiple data packets to the same user over time, which would obviously come with a privacy cost. So, for example, linking a notification to a subsequent positive test, or being able to identify whether a positive test result is “new” or one of many entered by the same user over consecutive days, would help us to be more accurate with our estimates.

And what do you think about location data?

In some of our analyses we make use of postcode data, which the user enters into the app when they install it, to understand regional variations and to adjust for statistically confounding factors. However, particularly when there are fewer restrictions and people move around more, it is a stretch to assume that all users get exposed to the virus within their home postcode area or even their home region, and some of our analytical methods don’t work well outside of lockdown periods. But I doubt you could get more accurate information about exposure locations without severely compromising privacy. We can see in aggregate how many positive tests are entered and how many notifications go out each day across England and Wales. For example, we can see that somebody who tests positive, puts the result into the app and consents to contact tracing, on average triggers two or three notifications to other app users with whom they’ve been in close contact over the past few days. (This changes over time and can be explored via this dashboard.) However, it would be helpful to know how varied that measure is, as some people probably don’t have any close contacts shortly before their positive test, whereas for others it could be many more, for example due to workplace contacts, public transport or parties. If there was more information about locations of exposures you might be able to make interesting use of that data and understand population behaviours better, but again that would compromise privacy.

I am sometimes surprised by the functionalities for which people are willing or not willing to sacrifice a little privacy. My go-to example is Google Maps and sharing location data. Many people are happy to share their location, and I’m really happy they do, because it means I don’t get caught in traffic jams as often because my navigation will try to find me a different route. It helps everyone: even for people who aren’t using Google Maps, if they do end up in a traffic jam it’s a shorter one than it would have been, because others haven’t joined it thanks to their phones diverting them. A lot of people are very willing to share their location data for that functionality. In some ways, it surprises me that people are suspicious of sharing more data if it could mean helping against the epidemic: reducing everyone’s risk of exposure, reducing cases, reducing the need for harsh restrictions, reducing the burden on the NHS, etc.

At the same time, I do understand that health data feels more sensitive, especially when it might feel associated with the threat of self-isolation fines, even though the app never made it mandatory to isolate – it was always just advisory. The NHS COVID-19 app is deliberately kept separate from the NHS App, which contains your vaccine passport and is linked to GP records, but there was some confusion about that in the media when vaccine passports were introduced. Some countries have made use of opt-in solutions where you can choose to share more data, in addition to your Apple/Google Exposure Notification system, to help researchers, and that could be worth considering. However, that would still come with interpretational challenges because we already know that app users are not a representative sample of the general population, and we must assume that the people who opt in to share more data are not even representative of app users.

I’m getting into a grey area here, but could the app randomly use different sensitivity to notify users to help in gathering more useful data? Like an analogue of A/B testing, or the use of a placebo in clinical trials?

I don’t think that would be approved by any ethics board because the app is having a positive effect in reducing case numbers. The app is genuinely helping, so you can’t ethically take that away from people, nor can you deliberately isolate people unnecessarily. It would be wonderful for research, bringing higher confidence results much more quickly, but this is the challenge we face as statisticians and epidemiologists: trying to figure out ways of understanding the impacts of interventions on outbreaks in the real world using limited data.

Sometimes you get natural experiments which, as a researcher, you can try to probe for information. For example, there was a lab in southwest England around Autumn 2021 that gave out a lot of false negatives by mistake, and – horribly – this allowed us to see something of what happens when testing is effectively unavailable. There was also the infamous Excel error in September 2020, where some rows got left off a spreadsheet. For a few weeks, there were people who should have been contact traced but weren’t. As they were randomly spread throughout the country it made for an ideal natural experiment and there’s a brilliant paper by Fetzer and Graeber on this. They showed the mistake was associated with hundreds of thousands of extra infections in the six-week period following the discovery of the error, which tells you something about the important role that contact tracing plays when it’s going well. But of course, you can’t set up such experiments deliberately.

How did you end up where you are now in your career?

I started off with a bachelor’s and master’s in pure mathematics, and my PhD was in cryptography and information security. From there, I moved into mathematical biology, looking at evolutionary trees, before working on HIV trees. It so happened that I was working in the multi-disciplinary Fraser Group at the Big Data Institute in Oxford, alongside epidemiologists, when Covid struck, and so I chose to move into working on Covid-19. It’s been quite an unusual journey!

I am motivated by real-world problems that involve interesting mathematics, and that has been the common theme across all the topics I’ve worked on. The biggest change was probably from cryptography to mathematical biology. In that situation, I wanted to find a new problem to investigate, and I saw a job advert that explained the position. Immediately, even with limited understanding of the biology, I could imagine mathematically that it was one of those problems that’s easy to state but quickly becomes intractable. What really helped was emailing my future supervisor, Caroline Colijn, and saying essentially that I hadn’t done any biology for years, but this sounded like a very interesting problem. Getting her to explain the project in her own words and hearing her passion for it, and me in turn explaining my passion for the relevant maths, laid the foundation for a great collaboration. That’s something I’d really recommend to anyone looking to change field – contact one of your possible future colleagues and hear them get enthused about it.

How did you get into cryptography for your PhD, as this can be completely different mathematics to what one usually encounters in pure mathematics degrees?

I was really lucky. I was at the University of York, and we had a free-choice module in our final year where students could organise a topic for self-study with the help of an academic sponsor, and a small group of us chose to study cryptography. This inspired me to take a PhD in cryptography at Royal Holloway, where they were quite used to people coming in with a strong mathematics background and learning cryptography ‘on the job’. My focus ended up being more on information security than cryptography. Mathematically this was mostly graph theory and combinatorics, which I hadn’t encountered in my pure mathematics degree so also had to learn as I went along.

Which then helped you later when working on phylogenetic trees…

Exactly – once you’ve studied complicated networks, a tree is a relatively simple version of that as it doesn’t have any cycles, and it is slightly easier to nail down mathematically. But trees can still get quite complicated and are used in the representation of plenty of interesting problems.
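For readers who like to see the idea in code, the "no cycles" property can be checked directly. The following is a minimal illustrative sketch, not anything taken from the interviewee's software: the function name `is_tree` and the toy graphs are hypothetical. It relies on the standard fact that an undirected graph on n nodes is a tree exactly when it has n - 1 edges and contains no cycle, and it detects cycles with a small union-find structure.

```python
def is_tree(num_nodes, edges):
    """Return True if the undirected graph on nodes 0..num_nodes-1 is a tree.

    Hypothetical helper for illustration only. The empty graph is treated
    as not being a tree.
    """
    # A tree on n nodes always has exactly n - 1 edges.
    if num_nodes == 0 or len(edges) != num_nodes - 1:
        return False

    parent = list(range(num_nodes))  # union-find forest: each node starts alone

    def find(x):
        # Follow parent pointers to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b were already connected, so this edge closes a cycle.
            return False
        parent[root_a] = root_b  # merge the two components

    # n - 1 edges and no cycle together imply the graph is connected.
    return True


# Toy examples (made up for illustration).
path = [(0, 1), (1, 2), (2, 3)]                # a simple chain: a tree
triangle_plus_isolated = [(0, 1), (1, 2), (2, 0)]  # contains a cycle: not a tree
print(is_tree(4, path))
print(is_tree(4, triangle_plus_isolated))
```

A general network needs more machinery than this (cycles allow several routes between the same two nodes), which is one sense in which trees are the simpler object.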

I’d been interested in coding for a long time but it was during my postdoc at Imperial College London that I really learned to use the programming language R, and to create Shiny apps which help to make your ideas more accessible. From then on, it has been a common theme for me to develop R packages and Shiny apps to communicate my findings – software has formed quite a large part of my research contributions. I think it’s really important: research can be relevant and useful, but if all you publish is a formula and leave it up to the reader to implement, it is less likely to get the impact it deserves than if you package it up nicely and make it accessible so it is immediately ready for people to use.

Compared to two years ago, do you see more students of mathematics, statistics or related subjects deciding to do research in your area, whether as PhD students or in industry?

I haven’t seen it myself, because I haven’t been involved in hiring, but I could imagine so. Certainly, it has really brought epidemiology to everyone’s attention. I’m guessing there’s going to be a lot more funding available for such topics, and indeed many applications of big data skills. If I was advising someone, I would definitely say that if you pursue those skills, you’ll be really useful to lots of people.

Once Covid-19 is off our collective minds and your professional mind, are there other applications which could use the tools developed by you and your collaborators? And where do you see yourself?

It’s hard to say, because it completely depends on where the epidemic goes next. I would like to think there is always an option somewhere in the background to be able to reinstate a similar kind of app quickly in the case of another pandemic. Infections can spread in different ways and some control strategies would benefit from fast, anonymised contact tracing more than others, but if needed I’d like to think that a digital contact tracing solution could be spun up quickly. It does make such a big difference if it is available at the start of a wave.

My research project is all about looking at the impact of interventions on outbreaks. Part of what I hope to be doing in the longer term is creating a toolkit of methods for evaluating the impacts. I think there is a lot we can learn from economics particularly, and in fact we worked with some economists when we were looking at the impact of the app on the launch of Test and Trace on the Isle of Wight. They introduced me to the idea of a synthetic control study, which was a good way of approaching the problem, and I can see there are lots of useful existing tools in that space.

I hope that we will be better prepared for future outbreaks. What I would like to see is more collaboration between epidemiologists, economists, behavioural scientists, virologists, and anybody else with relevant skills to categorise impacts in advance, including cost-benefit analyses. Ideally, we would have analyses ready so that when a new pathogen emerges, as soon as we have early data about the way it is spreading, we are able to determine which interventions are likely to have the greatest impacts on limiting the spread and the least harmful impacts on society. This would also include assessments on timing and availability: typically, vaccines will have the most impact but take some time to develop, and other interventions can have a comparatively larger impact in the shorter term. To be better able to quickly quantify those impacts, even with really early and uncertain data, would be amazing, and I hope to be able to contribute to that body of knowledge.

I’ve really enjoyed doing some fast-paced work in my role as an independent scientific advisor for the NHS COVID-19 app, and it has been an interesting personal challenge to learn new ways of working. In the academic world, you often spend a lot of time looking for an interesting question, and then you spend quite a lot of time answering it as thoroughly and rigorously as possible and presenting your findings very carefully. Over the last couple of years, I’ve been thrust into a world where sometimes a question is asked by someone senior and an answer is needed within a few hours, so you just have to give the best answer you can. It has been a fun and exciting experience but it’s also been a lot of pressure, and I am looking forward to spending more time in the more familiar academic world and working on more long-term projects again soon.

Posted In: 5 Minutes with | Analytics | Big Data | Data Science | Featured | General | Networks | Operations Research
