Replication and closer scrutiny of published findings are generally welcome in the scientific community, but questions have been raised over how replication attempts are being reported. Whilst there are certainly arguments for a more friendly and cooperative tone in scientific debate, Dorothy Bishop welcomes this next chapter in rigorous debate. Reputation and career prospects will, at the end of the day, come down to the scientific rigour of a person’s research, not to whether a particular result did or did not cross a threshold of p < .05.
Some people are really uncomfortable with the idea that psychology studies should be replicated. The most striking example is Jason Mitchell, Professor at Harvard University, who famously remarked in an essay that “unsuccessful experiments have no meaningful scientific value”.
Hard on his heels now comes UCLA’s Matthew Lieberman, who has published a piece in Edge on the replication crisis. Lieberman is careful to point out that he thinks we need replication. Indeed, he thinks no initial study should be taken at face value – it is, according to him, just a scientific anecdote, and we’ll always need more data. He emphasises: “Anyone who says that replication isn’t absolutely essential to the success of science is pretty crazy on that issue, as far as I’m concerned.”
It seems that what he doesn’t like, though, is how people are reporting their replication attempts, especially when they fail to confirm the initial finding. “There’s a lot of stuff going on”, he complains, “where there’s now people making their careers out of trying to take down other people’s careers”. He goes on to say that replications aren’t unbiased, that people often go into them trying to shoot down the original findings, and that this can lead to bad science:
Making a public process of replication, and a group deciding who replicates what they replicate, only replicating the most counterintuitive findings, only replicating things that tend to be cheap and easy to replicate, tends to put a target on certain people’s heads and not others. I don’t think that’s very good science that we, as a group, should sanction.
It’s perhaps not surprising that a social neuroscientist should be interested in the social consequences of replication, but I would take issue with Lieberman’s analysis. His depiction of the power of the non-replicators seems misguided. You do a replication to move up in your career? Seriously? Has Lieberman ever come across anyone who was offered a job because they failed to replicate someone else? Has he ever tried to publish a replication in a high-impact outlet? Give it a try and you’ll soon be told it is not novel enough. Many of the most famous journals are notorious for turning down failures to replicate studies that they themselves published. Lieberman is correct in noting that failures to replicate can get a lot of attention on Twitter, but a strong Twitter following is not going to recommend you to a hiring committee (and, btw, that Kardashian index paper was a parody).
Lieberman makes much of the career penalty for those whose work is not replicated. But anyone who has been following the literature on replication will be aware of just how common non-replication is (see e.g. Ioannidis, 2005). There are various possible reasons for this, and nobody with any sense would count it against someone if they do a well-conducted and adequately powered study that does not replicate. What does count against them is if they start putting forward implausible reasons why the replication must be wrong and they must be right. If they can show the replicators did a bad job, their reputation can only be enhanced. But they’ll be in a weak position if their original study was not methodologically strong and should not have been submitted for publication without further evidence to support it. In other words, reputation and career prospects will, at the end of the day, come down to the scientific rigour of a person’s research, not to whether a particular result did or did not cross a threshold of p < .05.
The problem with failures to replicate is that they can arise for at least four reasons, and it can be hard to know which applies in an individual case. One reason, emphasised by Lieberman, is that the replicator may be incompetent or biased. But a positive feature of the group replication efforts that Lieberman so dislikes is that the methods and data are entirely open, allowing anyone who wishes to evaluate them to do so – see for instance this example. Others have challenged replication failures on the grounds that there are crucial aspects of the methodology that only the original experimenter knows about. To them I recommend making all aspects of their methods explicit.
A second possibility is that a scientist does a well-designed study whose results don’t replicate because all results are influenced by randomness – this could mean that the original effect was a false positive, or that the replication was a false negative. The truth of the matter will only be settled by more, rather than less, replication, but there is research showing that the odds are that an initial large effect will be smaller on replication, and may disappear altogether – the so-called Winner’s Curse (Button et al., 2012).
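To make the Winner’s Curse concrete, here is a rough simulation sketch. It is purely illustrative: the true effect size (d = 0.3), the group size of 20, and the number of simulated studies are arbitrary assumptions of mine, and the code simply uses Python with NumPy and SciPy.

```python
# Illustrative sketch only, not real data; all numbers are arbitrary assumptions.
# Idea: among small studies that happen to reach p < .05, the reported effect
# size systematically overestimates the true effect (the Winner's Curse).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2015)
true_d = 0.3          # assumed true standardised effect size
n_per_group = 20      # small groups, hence low statistical power
n_studies = 10_000    # number of simulated "original" studies

significant_ds = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    observed_d = (treatment.mean() - control.mean()) / pooled_sd
    if p < 0.05 and observed_d > 0:          # only these get "published"
        significant_ds.append(observed_d)

print(f"true effect size:             d = {true_d}")
print(f"mean 'published' effect size: d = {np.mean(significant_ds):.2f}")
# With these assumptions the 'published' effect is roughly two to three times
# the true one, so a careful replication will usually look disappointing even
# though nobody did anything wrong.
```

The exact figures depend on the assumed numbers, but the selection effect does not: conditioning publication on p < .05 guarantees that small ‘significant’ studies overstate the true effect, so replications will tend to come in smaller.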
The third reason why someone’s work doesn’t replicate is that they are a charlatan or fraudster who has learned that they can have a very successful career by telling lies. We all hope such people are very rare, and we all agree they should be stopped. Nobody would assume that someone falls into this category just because a study fails to replicate.
The fourth reason for lack of replication arises when researchers are badly trained and simply don’t understand probability theory, and so engage in various questionable research practices to tweak their data to arrive at something ‘significant’. Although they are innocent of bad intentions, they stifle scientific progress by cluttering the field with nonreplicable results. Unfortunately, such practices are common and often not recognised as a problem, though there is growing awareness of the need to tackle them.
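To illustrate just one such practice, optional stopping (adding participants and re-testing until the result crosses the threshold), here is a second illustrative sketch under equally arbitrary assumptions of mine: there is no true effect at all, groups start at 10 participants and can grow to 60, and a t-test is run after every added pair.

```python
# Illustrative sketch only; the stopping rule and sample sizes are arbitrary
# assumptions. Idea: with optional stopping, the false positive rate climbs
# well above the nominal 5% even though there is no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_simulations = 5_000
start_n, max_n = 10, 60
false_positives = 0

for _ in range(n_simulations):
    control = list(rng.normal(0.0, 1.0, start_n))
    treatment = list(rng.normal(0.0, 1.0, start_n))   # identical populations
    while True:
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:                 # looks 'significant', so stop and write up
            false_positives += 1
            break
        if len(control) >= max_n:    # ran out of participants: file drawer
            break
        control.append(rng.normal(0.0, 1.0))       # otherwise test a few more
        treatment.append(rng.normal(0.0, 1.0))

print("nominal false positive rate: 0.05")
print(f"actual false positive rate:  {false_positives / n_simulations:.2f}")
# With these assumptions the long-run false positive rate is several times the
# nominal 5%, even though every individual test used the conventional threshold.
```

No single test in this procedure looks improper on its own, which is precisely why the practice is so easy to fall into and so corrosive to the literature.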
There are repeated references in Lieberman’s article to people’s careers: not just the people who do the replications (“trying to create a career out of a failure to replicate someone”) but also the careers of those who aren’t replicated (“When I got into the field it didn’t seem like there were any career-threatening giant debates going on”). There is, however, another group whose careers we should consider: graduate students and postdocs who may try to build on published work only to find that the original results don’t stand up. Publication of non-replicable findings leads to enormous waste in science and demoralisation of the next generation. One reason why I take reproducibility initiatives seriously is that I’ve seen too many young people demoralised after finding that the exciting effect they want to investigate is actually an illusion.
While I can sympathise with Lieberman’s plea for a more friendly and cooperative tone to the debate, at the end of the day replication is now on the agenda and it is inevitable that there will be increasing numbers of cases of replication failure.
So suppose I conduct a methodologically sound study that fails to replicate a colleague’s work. Should I hide my study away for fear of rocking the boat or damaging someone’s career? Have a quiet word with the author of the original piece? Rather than holding back for fear of giving offence, it is vital that we make our data and methods public. For a great example of how to do this in a rigorous yet civilised fashion, I recommend this blogpost by Betsy Levy Paluck.
In short, we need to develop a more mature understanding that the move towards more replication is not about making or breaking careers: it is about providing an opportunity to move science forward, improve our methodology and establish which results are reliable (Ioannidis, 2012). And this can only help the careers of those who come after us.
This piece originally appeared on Dorothy Bishop’s personal blog and is reposted with permission.
Note: This article gives the views of the author, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.
Dorothy Bishop is Professor of Developmental Neuropsychology and a Wellcome Principal Research Fellow at the Department of Experimental Psychology in Oxford. The primary aim of her research is to increase understanding of why some children have specific language impairment (SLI). Dorothy blogs at BishopBlog and is on Twitter @deevybee.