Much of the social and medical sciences depends on randomised controlled trials. But while this may be considered the foundational experimental method, a certain degree of bias inevitably arises in any trial, whether sample bias, selection bias, or measurement bias. This is important, as the validity of a trial's causal claims can be a matter of life or death. To Alexander Krauss, the scientific process is a complex human process, involving many actors required to take many unique decisions at many different stages, and so some degree of bias is unavoidable. This has implications for the reproducibility crisis, as variation between study outcomes becomes the norm and one-to-one replication is not possible.
Randomised controlled trials (RCTs) are generally viewed as the foundational experimental method of the social and medical sciences. Economists depend on them, for certain questions, as their most valued method. Yet RCTs are not flawless. In my study, “Why all randomised controlled trials produce biased results”, I argue that RCTs are not able to establish precise causal effects of an intervention.
Many of us have likely used some medication, own some technology, or support some public policy tested in a trial. To assess how effective these may be before supporting them, whether as patients, consumers, or voters, researchers often conduct RCTs by splitting a sample of people into a treatment group and a control group. Contrary to common belief, I argue that some degree of bias inevitably arises in any trial. This is because some share of recruited people refuse to participate in any trial (which leads to sample bias), some degree of partial blinding or unblinding of the various trial persons generally arises in any trial (which leads to selection bias), and participants generally take treatment for different lengths of time and at different dosages in any trial (which leads to measurement bias), among other issues.
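To make the sample-bias point concrete, here is a minimal simulation sketch. It is my own illustration, not taken from the study, and all numbers and variable names are assumptions: even with perfect randomisation among those who enrol, if willingness to participate depends on a characteristic that also shapes the treatment's effect, the trial's estimate drifts away from the population-level effect.

```python
# Minimal sketch of sample bias from refusal to participate.
# All quantities are illustrative assumptions, not data from any real trial.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 'severity' affects both the outcome and the
# treatment's effectiveness (the treatment is assumed to help severe cases more).
n = 100_000
severity = rng.uniform(0, 1, n)
true_effect = 2.0 * severity          # individual treatment effect
population_ate = true_effect.mean()   # the average effect we would like to estimate

# Sample bias: assume sicker people are more willing to enrol, so the
# recruited sample over-represents high-severity individuals.
p_enrol = 0.2 + 0.6 * severity
enrolled = rng.random(n) < p_enrol

# Perfect randomisation *within* the enrolled sample.
treated = rng.random(n) < 0.5
noise = rng.normal(0, 1, n)
outcome = 1.0 + 3.0 * severity + treated * true_effect + noise

estimate = (outcome[enrolled & treated].mean()
            - outcome[enrolled & ~treated].mean())

print(f"population average effect: {population_ate:.2f}")   # ~1.00
print(f"trial estimate:            {estimate:.2f}")         # ~1.20, systematically too high
```

Randomisation here is flawless, yet the estimate is systematically inflated, because the enrolled sample is no longer the population the result is meant to describe.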
The ten most-cited RCTs worldwide, which I assess in the study, suffer from such general issues. But they also suffer from other methodological issues that affect their estimated results: participants' background characteristics (like age, health status, and level of need for treatment) are often poorly allocated across trial groups, participants at times switch between trial groups, and trials often neglect alternative factors contributing to their main reported outcome, among others. Some of these issues cannot be avoided in trials, but they affect the robustness and validity of their results and conclusions.
This is important, as the validity of a trial's causal claims is at times a life-or-death matter, for example in public health. The study itself is about the RCT method, not about any individual RCT, and the insights outlined are useful and important for researchers using RCTs in economics, psychology, agriculture, and the like (though the ten most-cited RCTs worldwide assessed in the study happen to be medical trials).
Assumptions and biases generally increase at each step when carrying out trials
That is, from how we create our variables, select our initial sample, and randomise participants into trial groups, to how we analyse the data for participants with different lengths of time and amounts of treatment, and how we try to ensure everyone involved is fully blinded before the trial begins and throughout its entire implementation, among many other steps before, in between, and after these. The sketch below illustrates one of these steps.
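As one illustration of the analysis step for participants with different amounts of treatment, the following sketch (again my own, with assumed numbers, not from the study) shows how varying adherence, i.e. participants taking different shares of the prescribed dose, dilutes the estimated effect relative to the effect of the full dose.

```python
# Minimal sketch of measurement bias from varying adherence.
# All quantities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

n = 50_000
treated = rng.random(n) < 0.5

# Assume the full prescribed course raises the outcome by 1.0 unit, but the
# share of the dose actually taken varies from person to person.
adherence = np.where(treated, rng.beta(2, 2, n), 0.0)   # mean adherence ~0.5
outcome = 0.5 + 1.0 * adherence + rng.normal(0, 1, n)

# Comparing groups as assigned mixes full, partial, and near-zero adherers.
as_assigned = outcome[treated].mean() - outcome[~treated].mean()

print("effect of the full dose:                     1.00")
print(f"estimate from comparing groups as assigned: {as_assigned:.2f}")  # ~0.50, diluted
```

Under these assumptions the trial does measure something real, but not the effect of the treatment as designed, and the gap between the two is invisible unless adherence itself is measured.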
I thus argue that the reproducibility crisis is, to a large extent, the result of the scientific process always being a complex human process, one that involves many actors (study designers, all participants, data collectors, implementing practitioners, study statisticians, etc.) who must make many unique decisions at many different steps over time when designing, implementing, and analysing any given study. Some degree of bias unavoidably arises during this process. Variation between study outcomes is thus the norm, and one-to-one replication is not possible.
Researchers should thus not assume that the RCT method inevitably produces valid causal results. In fact, some degree of bias is simply the trade-off for studies actually being conducted in the real world: a number of things inevitably do not go as planned or designed, given the multiple complex processes involved in carrying out trials over time. Once a study is completed, some biases will have arisen, and nothing can be done about a number of them. At the same time, the study aims to improve how RCTs are carried out by outlining how researchers can reduce some of these biases.
Are biased results in trials still good enough to inform our decisions in public health and social policy?
In many cases they are. But that judgement generally depends on how useful the results are in practice and on how robust they are relative to other studies that use the same method or, at times, other methods. Yet no single study should be the sole and authoritative source used to inform policy and our decisions.
Some may respond: "Are RCTs not still more credible than other methods even if they may have biases?" For most questions we are interested in, RCTs cannot be more credible because they cannot be applied, e.g. to most of the complex phenomena we study, such as effective government institutions, long life expectancy, democracy, inequality, education systems, psychological states, etc. Other methods (such as observational studies) are needed for the many questions generally not amenable to randomisation, but also at times to help design trials, to interpret and validate their results, and to provide further insight into the broader conditions under which treatments may work, among other reasons discussed in the study. Different methods are thus complements (not rivals) in improving understanding.
Taken together, researchers, practitioners, and policymakers need to become more aware of the broader range of biases facing trials. Journals need to begin requiring researchers to outline in detail the assumptions, biases, and limitations of their studies. If researchers do not report this crucial information, practitioners and citizens will have to rely solely on the information and warning labels provided by the policymakers implementing the tested policies, the biopharmaceutical companies selling the tested treatments, and the like.
This blog post originally appeared under a different title on the Institute for New Economic Thinking blog. It is based on the author’s article, “Why all randomised controlled trials produce biased results”, published in the Annals of Medicine (DOI: 10.1080/07853890.2018.1453233).
Featured image credit: Analise Benevides, via Unsplash (licensed under a CC0 1.0 license).
Note: This article gives the views of the author, and not the position of the LSE Impact Blog, nor of the London School of Economics. Please review our comments policy if you have any concerns on posting a comment below.
About the author
Alexander Krauss is a Marie Curie Research Fellow, with his research financed by the European Commission. Alexander has also taught at University College London. His postdoctoral research focuses largely on the limits of science and scientific methodology. He holds a Bachelor's in Philosophy, Master's degrees in Philosophy and in Political Science, and a PhD in Development Economics.
14 July 2018
Dear Dr. Krauss,
I agree that we cannot avoid bias by simply randomizing and have written as much (Treves et al. 2016). But the alternative methods of sampling you advocate would inevitably introduce selection bias. The distinction is important. Selection bias CAN arise in randomized trials, as described by Mukherjee (2010), Ioannidis (2005), our work on predator management (Treves et al. 2016, Ohrens et al. in press), and many others. But other forms of sampling WILL generate selection bias. Your essay does not, in my opinion, do enough to clarify that selection bias is pervasive in other types of studies that claim causal inference.
The distinction is crucial because the best and safest way to sample without bias is to randomize, and thereafter one should add other elements to the experimental design to remove bias. For example, when the extraneous variables you list and others intrude to confound a treatment effect, one might add a cross-over (reverse-treatment) design to introduce within-subject controls over variation. Other steps you mention, such as blinding, are also essential to removing bias at other stages in the process. I would add safeguards against reporting bias (not publishing all relevant methods or data) and treatment bias (varying the quantity or the quality of the treatment). These are all compatible with randomized trials.
I feel your essay will suggest to some researchers that they can justify non-random selection by citing you. It is not an either-or choice. Randomization with rigorous and vigilant constraints on other forms of bias is needed. The history of biomedical research and other fields is littered with the graves of other approaches. Therefore, while I agree that other types of studies can help shed light on results, I differ on the weight you place on bias and your solution to it.
Cordially,
Adrian Treves
Ioannidis JPA. 2005. Why Most Published Research Findings Are False. PLOS Medicine 2:e124.
Mukherjee S. 2010. The Emperor of All Maladies: A Biography of Cancer. Scribner, New York.
Treves A, Krofel M, McManus J. 2016. Predator control should not be a shot in the dark. Frontiers in Ecology and the Environment 14:380-388.
Ohrens O, Bonacic C, Treves A. in press. Non-lethal defense of livestock against predators: Flashing lights deter puma attacks in Chile. Frontiers in Ecology and the Environment.
The point is not whether a study design results in bias but rather how much those sources of bias could explain the observed effect. It is hard to explain away big results without a large methodological flaw in ascertainment or follow-up. But it is certainly an important consideration for modest effects.
There is also the naturalistic fallacy (the is-ought distinction) to consider, which I am currently writing about in connection with behavioural economics. I agree that science is a social activity and therefore it cannot escape the influence of social, cultural, political, and economic norms.
A major flaw in the current positivist revival is that it has trodden over old critiques: the fallibility of scientific methodology, the fact that the verification principle is itself unverifiable, the absence of any privileged, mind- and culture-independent, 'objective' vantage point accessible only to choice architects and behavioural economists, and so on.
In public policy, qualitative methodology as a feedback mechanism is highly valuable in that it allows citizens a space for meaningful and, above all, democratic dialogue. Without that dialogue, behavioural economics is simply a prop for failing neoliberal policies, entailing an authoritarian state that simply ACTS UPON passive subjects without their consent, without respect for their autonomy, and without any inclusion of them in discussion of their needs or participation in decision making.
The implications of that are ghastly.
Any system that involves humans is unlikely to be perfect, but the perfect should not be the enemy of the good. RCTs are the best method we have of proving that an intervention works, and they are (no pun intended) proving invaluable in ridding our health systems of medieval nonsense like acupuncture, homeopathy and other forms of alternative “medicine.” I guarantee that your work will be pounced on by pseudoscientists of every stripe as “evidence” that their voodoo “works.”