Blog Admin

February 3rd, 2016

Putting hypotheses to the test: We must hold ourselves accountable to decisions made before we see the data.

5 comments | 2 shares

Estimated reading time: 5 minutes

In the daily practice of doing research, it is easy to lose track of whether a study is exploratory (hypothesis-generating) or confirmatory (hypothesis-testing). By defining how a hypothesis or research question will be tested at the outset of research, preregistration eliminates this ambiguity. David Mellor outlines the value of preregistration for determining statistical significance and introduces a new prize for scholars aimed at encouraging this practice.

We are giving $1,000 prizes to 1,000 scholars simply for making clear when data were used to generate or test a hypothesis. Science is the best tool we have for understanding the way the natural world works. Unfortunately, it is in our imperfect hands. Though scientists are curious and can be quite clever, we also fall victim to biases that can cloud our vision. We seek rewards from our community, we ignore information that contradicts what we believe, and we are capable of elaborate rationalizations for our decisions. We are masters of self-deception.

Yet we don’t want to be. Many scientists choose their career because they are curious and want to find real answers to meaningful questions. In its idealized form, science is a process of proposing explanations and then using data to expose their weaknesses and improve them. This process is both elegant and brutal. It is elegant when we find a new way to explain the world, a way that no one has thought of before. It is brutal in a way that is familiar to any graduate student who has proposed an experiment to a committee or to any researcher who has submitted a paper for peer-review. Logical errors, alternative explanations, and falsification are not just common – they are baked into the process.

Image credit: Winnowing Grain, Eastman Johnson, Museum of Fine Arts, Boston

Using data to generate potential discoveries and using data to subject those discoveries to tests are distinct processes. The first is known as exploratory (or hypothesis-generating) research and the second as confirmatory (or hypothesis-testing) research. In the daily practice of doing research, it is easy to confuse which one is being done. But there is a way to keep them apart: preregistration. Preregistration defines how a hypothesis or research question will be tested, that is, the methodology and analysis plan. It is written down in advance of looking at the data, and it maximizes the diagnosticity of the statistical inferences used to test the hypothesis. After the confirmatory test, the data can then be subjected to any exploratory analyses to identify new hypotheses that can be the focus of a new study. In this way, preregistration provides an unambiguous distinction between exploratory and confirmatory research.

The two actions, building and tearing down, are both crucial to advancing our knowledge. Building pushes our potential knowledge a bit further than it was before. Tearing down separates the wheat from the chaff. It exposes that new potential explanation to every conceivable test to see if it survives.

To illustrate how confirmatory and exploratory approaches can be easily confused, picture a path through a garden, forking at regular intervals, as it spreads out into a wide tree. Each split in this garden of forking paths is a decision that can be made when analysing a data set. Do you exclude these samples because they are too extreme? Do you control for income, age, height, or wealth? Do you use the mean or median of the measurements? Each decision can be perfectly justifiable and seem insignificant in the moment. After a few of these decisions there exists a surprisingly large number of reasonable analyses. One quickly reaches the point where there are so many of these reasonable analyses that at least one of them is likely to cross the traditional threshold of statistical significance, p < .05, or 1 in 20, by chance alone.
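To put rough numbers on the garden, here is a minimal sketch of the arithmetic. It assumes, unrealistically, that every analysis is an independent test of a true null hypothesis. Analyses of one dataset are correlated, so the real inflation is usually smaller than this, but the direction is the same. The function names and the option counts are illustrative assumptions, not figures from any study.

```python
def count_paths(options_per_decision):
    """Number of distinct analyses when each decision has the given number of options."""
    total = 1
    for options in options_per_decision:
        total *= options
    return total


def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of `num_analyses` independent tests
    gives p < alpha when there is no real effect (simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** num_analyses


if __name__ == "__main__":
    # Hypothetical forks: exclude extreme samples or not (2 options),
    # include or omit each of four covariates (2**4 = 16), mean or median (2).
    paths = count_paths([2, 16, 2])
    print(paths, "reasonable analyses")
    for k in (1, 5, 14, paths):
        print(k, "analyses:", round(chance_of_false_positive(k), 3))
```

Under the independence assumption, a single planned analysis has the advertised 1 in 20 chance of a false positive, while 14 analyses already give better than even odds of at least one.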

If we don’t have strong reasons to make these decisions ahead of time, we are simply exploring the dataset for the path that tells the most interesting story. Once we find that interesting story, bolstered by the weight of statistical significance, every decision on that path becomes even more justified, and all of the reasonable, alternative paths are forgotten. Without us realizing what we have done, the diagnosticity of our statistical inferences is gone. We have no idea if our significant result is a product of accumulated luck with random error in the data, or if it is revealing a truly unusual result worthy of interpretation.
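The loss of diagnosticity can also be seen in a small simulation. The sketch below generates studies in which there is truly no difference between two groups, then compares the false positive rate of one analysis fixed in advance with the rate obtained by reporting whichever of four defensible analyses looks best. Everything here is an assumption made for illustration: the four forks, the sample size, and the normal approximation used for the p-value are hypothetical choices, not a description of any real study.

```python
import math
import random
import statistics


def two_sided_p(group_a, group_b):
    """Approximate two-sided p-value for a difference in means
    (Welch-style statistic with a normal approximation)."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def drop_extremes(values, cutoff=2.0):
    """Hypothetical fork: exclude observations further than `cutoff` from zero."""
    return [v for v in values if abs(v) <= cutoff]


def forking_path_p_values(group_a, group_b):
    """p-values from four defensible analyses of the same data."""
    half = len(group_a) // 2
    return [
        two_sided_p(group_a, group_b),                # the analysis planned in advance
        two_sided_p(drop_extremes(group_a), drop_extremes(group_b)),
        two_sided_p(group_a[:half], group_b[:half]),  # first batch of samples only
        two_sided_p(group_a[half:], group_b[half:]),  # second batch of samples only
    ]


def simulate(num_studies=2000, n=40, alpha=0.05, seed=1):
    """Return (planned-analysis rate, best-of-four rate) of false positives."""
    rng = random.Random(seed)
    planned_hits = 0
    any_hits = 0
    for _ in range(num_studies):
        # Both groups come from the same distribution: no true effect exists.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_values = forking_path_p_values(a, b)
        planned_hits += p_values[0] < alpha
        any_hits += min(p_values) < alpha
    return planned_hits / num_studies, any_hits / num_studies


if __name__ == "__main__":
    planned, best_of_four = simulate()
    print("planned analysis only:", planned)
    print("best of four paths:   ", best_of_four)
```

Because the planned analysis is one of the four paths, the best-of-four rate can never be lower than the planned rate. The gap between the two is the part of the "significance" that comes from choosing the path after seeing the data.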

This is why we must hold ourselves accountable to decisions made before seeing the data. Without putting those reasons into a time-stamped, uneditable plan, it becomes nearly impossible to avoid making decisions that lead to the most interesting story. This is what preregistration does. Without preregistration, we effectively change our hypothesis as we make those decisions along the forking path. The work that we thought was confirmatory becomes exploratory without us even realizing it.

I am advocating for a way to make sure the data we use to create our explanations is separated from the data that we use to test those explanations. Preregistration does not put science in chains. Scientists should be free to explore the garden and to advance knowledge. Novelty, happenstance, and unexpected findings are core elements of discovery. However, when it comes time to put our new explanations to the test, we will make progress more efficiently and effectively by being as rigorous and as free from bias as possible.

Preregistration is effective. After the United States required that all clinical trials of new treatments on human subjects be preregistered, the rate of finding a significant effect on the primary outcome variable fell from 57% to just 8% within a group of 55 cardiovascular studies. This suggests that flexibility in analytical decisions had an enormous effect on the analysis and publication of these large studies. Preregistration is supported by journals and research funders. Taking this step will show that you are taking every reasonable precaution to reach the most robust conclusions possible, and will improve the weight of your assertions.

Most scientists, when testing a hypothesis, do not specify key analytical decisions prior to looking through a dataset. It’s not what we’re trained to do. We at the Center for Open Science want to change that. We will be giving 1,000 researchers $1,000 prizes for publishing the results of preregistered work. You can be one of them. Begin your preregistration by going to https://cos.io/prereg.

Note: This article gives the views of the author(s), and not the position of the LSE Impact blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

About the Author:

David Mellor is a Project Manager at the Center for Open Science and works to encourage preregistration. He received his PhD from Rutgers University in Ecology and Evolution and has been an active researcher in the behavioral ecology and citizen science communities.

Posted In: Academic publishing | Data science | Evidence-based research

5 Comments

Mel Bartley says:

February 3, 2016 at 1:35 pm

I strongly agree with almost all of this. One question, though. I sometimes take part in studies that use path models. It can happen that a referee suggests an additional pathway that makes sense to us. But this would not have been in the original specification of the model. Come to think of it this kind of thing must happen pretty often. How would you view that?

1. David Mellor (@EvoMellor) says:
  
  March 9, 2016 at 2:31 pm
  
  Hello Mel,
  
  That is a great point and is a very frequent occurrence. I think that the vast majority of papers come out of peer review with one or more changes in how the data are analyzed. The best way to handle that is with transparency: “The following additional paths (or tests, interactions, correlations, etc.) were conducted after data collection was complete…” The important distinction is to not present those new pathways as simply part of the a priori tests or to lump them with the analyses presented initially and planned ahead of time. This way, the reader will be able to properly situate those new tests in the complete body of evidence presented in the paper. After data collection and initial analysis, any new test has been influenced by the data and is, in essence, a new hypothesis that is now being tested with the same data that were used to create it. That new test can be confirmed with a later follow-up study using newly collected data.
  
Patrick Dunleavy says:

February 4, 2016 at 12:55 am

Doesn’t this just say – we can only be honest by being rigid? It carries hypothetico-deductive ‘logic’ to a silly extreme, ignoring the inherently iterative process of theorization, recognition of interesting phenomena, and data analysis. But, creative research is not like this. How can you formulate meaningful hypotheses without thinking about and recognizing patterning in the data – the two go hand in hand, and are not the same as simply ‘milking’ data for significant results.

Pingback: Testing a Hypothesis? Be Upfront About It and Win $1,000
David Mellor says:

February 6, 2016 at 4:46 am

Hi Patrick, Thank you for commenting. I very much agree that meaningful hypotheses cannot be made without recognizing patterns in the data. That may be the best way to make a reasonable hypothesis. However, the same data that are used to create the hypothesis cannot be used to test that same hypothesis, and preregistration is what keeps the two apart. It makes it clear to ourselves exactly what the hypothesis is before seeing the data, so that the data aren’t then used to subtly change or create a new hypothesis. If they are, fine, great! But that is hypothesis building, not hypothesis testing. That is exploratory work, not confirmatory work.
