Randomised controlled trials (RCTs) are widely viewed as the foundational experimental method of the social and medical sciences, and for certain questions economists treat them as their most trusted method. Yet RCTs are not flawless. In my study, “Why all randomised controlled trials produce biased results”, I argue that RCTs cannot establish precise causal effects of an intervention.
Many of us have taken a medication, own a piece of technology, or support a public policy that was tested in a trial. To assess how effective these are before adopting them, whether as patients, consumers, or voters, researchers often conduct RCTs by splitting a sample of people into a treatment group and a control group. Contrary to common belief, I argue that some degree of bias inevitably arises in any trial. This is because some share of recruited people refuse to participate in any trial (which leads to sample bias), some degree of partial blinding or unblinding of the various people involved generally arises in any trial (which leads to selection bias), and participants generally take the treatment for different lengths of time and in different doses (which leads to measurement bias), among other issues. A toy simulation of the first of these follows.
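The sketch below is purely illustrative and not taken from the study: it assumes, for the sake of the example, that people with more severe illness are more likely to refuse to enrol, and that the treatment helps severe cases more. Under those assumptions the effect estimated in the enrolled sample drifts away from the population-average effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Baseline severity in the full target population; the treatment is
# assumed (illustratively) to help severe cases more, so the average
# effect depends on who actually enrols.
severity = rng.normal(0.0, 1.0, n)
effect = 1.0 + 0.5 * severity          # heterogeneous true treatment effect

# Illustrative assumption: more severe cases are more likely to refuse,
# so enrolment probability falls with severity.
p_enrol = 1.0 / (1.0 + np.exp(severity))
enrolled = rng.random(n) < p_enrol

print(f"average effect, full population: {effect.mean():.3f}")
print(f"average effect, enrolled sample: {effect[enrolled].mean():.3f}")
```

In this toy setup the enrolled sample is systematically healthier than the population the treatment would eventually serve, so the trial estimates the effect for a different group than the one of policy interest.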
The ten most-cited RCTs worldwide, which I assess in the study, suffer from these general issues. They also suffer from further methodological problems that affect their estimated results: participants’ background characteristics (such as age, health status, and level of need for treatment) are often poorly allocated across trial groups, participants at times switch between trial groups, and trials often neglect alternative factors that contribute to their main reported outcome. Some of these issues cannot be avoided in trials, but they affect the robustness and validity of the trials’ results and conclusions. A simulation of the allocation point appears below.
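To make the allocation point concrete, here is a minimal simulation sketch (again illustrative, with arbitrary numbers) showing that simple randomisation in a small trial routinely leaves a background characteristic such as age noticeably imbalanced across arms purely by chance:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_patients = 10_000, 50

# Simulate many small two-arm trials and record the chance gap in
# mean age between the arms under simple randomisation.
gaps = []
for _ in range(n_trials):
    age = rng.normal(60, 10, n_patients)             # illustrative ages
    arm = rng.permutation(n_patients) < n_patients // 2
    gaps.append(abs(age[arm].mean() - age[~arm].mean()))

gaps = np.array(gaps)
print(f"median age gap between arms: {np.median(gaps):.1f} years")
print(f"share of trials with a gap over 5 years: {(gaps > 5).mean():.1%}")
```

Whether such chance imbalance invalidates a trial’s conclusions is contested; see the comment and coverage sketch at the end of this post for the standard statistical counter-argument.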
This is important because the validity of a trial’s causal claims is at times a life-or-death matter, for example in public health. The study is about the RCT method rather than any individual RCT, and its insights are useful for researchers using RCTs in economics, psychology, agriculture, and related fields (though the ten most-cited RCTs it assesses happen to be medical trials).
Assumptions and biases generally increase at each step when carrying out trials
That is, from how we create our variables, select our initial sample, and randomise participants into trial groups, to how we analyse the data for participants who receive different lengths and amounts of treatment, and how we try to ensure that everyone involved is fully blinded before the trial begins and throughout its implementation, among many other steps before, in between, and after these. The analysis step for participants with uneven treatment take-up is sketched below.
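One of these steps, analysing data for participants who take different amounts of treatment, can be illustrated with a toy simulation (assumptions are mine, not the study’s: adherence is allowed to rise with underlying health) comparing an intention-to-treat analysis with a naive per-protocol one:

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_effect = 50_000, 1.0

# Illustrative assumption: healthier participants are more likely to
# take the full course of treatment.
health = rng.normal(0.0, 1.0, n)
treated = rng.random(n) < 0.5
adherent = rng.random(n) < 1.0 / (1.0 + np.exp(-health))
y = true_effect * (treated & adherent) + health + rng.normal(0.0, 1.0, n)

# Intention-to-treat: compare groups as randomised (diluted, unconfounded).
itt = y[treated].mean() - y[~treated].mean()
# Naive per-protocol: adherent treated vs. all controls (confounded by health).
pp = y[treated & adherent].mean() - y[~treated].mean()
print(f"true effect of full treatment: {true_effect:.2f}")
print(f"intention-to-treat estimate:   {itt:.2f}")
print(f"naive per-protocol estimate:   {pp:.2f}")
```

In this sketch the intention-to-treat estimate is diluted below the true effect because non-adherent participants are averaged in, while the per-protocol estimate is inflated because adherence is entangled with health; neither analysis recovers the effect of full treatment exactly.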
I thus argue that the reproducibility crisis is, to a large extent, the result of science being a complex human process that involves many actors (study designers, participants, data collectors, implementing practitioners, study statisticians, etc.) who must make many unique decisions at many different steps when designing, implementing, and analysing any given study; some degree of bias unavoidably arises along the way. Variation between study outcomes is thus the norm, and one-to-one replication is not possible.
Researchers should therefore not assume that the RCT method automatically produces valid causal results. That all trials face some degree of bias is simply the price of conducting studies in the real world: a number of things inevitably do not go as planned, given the multiple complex processes involved in carrying out trials over time. Once a study is completed, some biases will have arisen, and nothing can be done about a number of them. At the same time, the study aims to improve how RCTs are carried out by outlining how researchers can reduce some of these biases.
Are biased results in trials still good enough to inform our decisions in public health and social policy?
In many cases they are. But that judgement generally depends on how useful the results are in practice and how robust they are relative to other studies that use the same method or, at times, other methods. No single study, however, should be the sole and authoritative source used to inform policy and our decisions.
Some may respond: “Are RCTs not still more credible than other methods, even if they have biases?” For most questions we are interested in, RCTs cannot be more credible because they cannot be applied, for example to most complex phenomena we study, such as effective government institutions, long life expectancy, democracy, inequality, education systems, and psychological states. Other methods (such as observational studies) are needed for the many questions that are generally not amenable to randomisation, but also at times to help design trials, to interpret and validate their results, and to provide further insight into the broader conditions under which treatments may work, among other reasons discussed in the study. Different methods are thus complements, not rivals, in improving understanding.
Taken together, researchers, practitioners, and policymakers need to become better aware of the broader range of biases facing trials. Journals need to begin requiring researchers to outline in detail the assumptions, biases, and limitations of their studies. If researchers do not report this crucial information, practitioners and citizens will have to rely solely on the information and warning labels provided by the policymakers implementing the tested policies, the biopharmaceutical companies selling the tested treatments, and the like.
♣♣♣
Notes:
- This blog post appeared first under a different title on the Institute for New Economic Thinking blog, and also on LSE Impact. It is based on the author’s article, “Why all randomised controlled trials produce biased results”, published in the Annals of Medicine (DOI: 10.1080/07853890.2018.1453233).
- The post gives the views of its author, not the position of the Grantham Research Institute, LSE Business Review or the London School of Economics.
- Featured image credit: Image by OpenClipart-Vectors, under a CC0 licence.
- When you leave a comment, you’re agreeing to our Comment Policy.
Alexander Krauss is a Marie Curie Research Fellow at LSE’s Centre for Philosophy of Natural and Social Science (CPNSS), with his research financed by the European Commission. He has also taught at University College London. His postdoctoral research focuses largely on the limits of science and scientific methodology. He holds a Bachelor’s degree in philosophy, Master’s degrees in philosophy and in political science, and a PhD in development economics.
This appears to show no awareness of the fact that RCTs do not require prognostic factors to be balanced across groups, for the simple reason that the standard analysis makes an allowance for imbalance. A simple way to see this is to note that if a cross-over trial, for which all patient-level covariates are balanced, is analysed as if it were a parallel-group trial, the confidence intervals are wider. Why? To allow for the possible imbalance that would occur. See https://errorstatistics.com/2013/07/14/stephen-senn-indefinite-irrelevance-2/amp/ for a concrete illustration. To demonstrate that RCTs did not adequately deal with this problem, one would have to show, for example, that the confidence intervals did not have the correct coverage property. No such demonstration was attempted. For further discussion see: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5713
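For readers curious what such a coverage check would look like, here is a minimal simulation sketch (assumptions are illustrative: a normally distributed outcome, one prognostic covariate, and a standard two-sample analysis that ignores the covariate). If the standard analysis properly allows for chance imbalance, the nominal 95% interval should cover the true effect in roughly 95% of simulated trials:

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_patients, true_effect = 20_000, 50, 1.0

covered = 0
for _ in range(n_trials):
    x = rng.normal(0.0, 1.0, n_patients)    # prognostic covariate, imbalanced by chance
    arm = rng.permutation(n_patients) < n_patients // 2
    y = true_effect * arm + x + rng.normal(0.0, 1.0, n_patients)

    # Standard two-sample analysis that ignores the covariate entirely;
    # 1.96 is the normal approximation to the 95% critical value.
    diff = y[arm].mean() - y[~arm].mean()
    se = np.sqrt(y[arm].var(ddof=1) / arm.sum() + y[~arm].var(ddof=1) / (~arm).sum())
    covered += diff - 1.96 * se <= true_effect <= diff + 1.96 * se

print(f"empirical coverage of the nominal 95% interval: {covered / n_trials:.1%}")
```

In this toy setup the intervals are wide precisely because they allow for possible covariate imbalance, so the empirical coverage sits close to the nominal 95% even though no individual trial has perfectly balanced arms.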