If your research involves collecting data, somewhere along the process you might engage in statistical analysis of your data. My question to you is this: when do you decide what analyses you are going to run on your data? Is it once you’ve got your spreadsheet of data in front of you? Is it before you’ve even collected your data? Or somewhere in between?
As a PhD supervisor, I have always asked my Postgraduate Researchers (PGRs; formerly referred to as PhD students) to put together an analysis plan when they are designing their study. Whilst we would have already fleshed out the research questions in meetings, putting pen to paper made sure that we had a good idea of how to address these research questions statistically. The analysis plan was a list of the variables we were manipulating, the variables we were collecting, the statistical tests we would use, and how many participants we would need for these tests to have sufficient power. Over the years, I have placed more and more importance on this, for two reasons. First, I didn’t always practise what I preached in my own research, or we’d been a bit too vague in a PhD analysis plan. Too many times this led to spending weeks trying to derive the variables that would allow us to answer our research questions, when, if we’d fleshed things out earlier, we would have measured them slightly differently and saved ourselves time. Second, the reproducibility crisis. Something started to feel wrong about having the data right in front of me and then deciding how to analyse it. It was too easy to try a couple of ways of analysing the data to see what worked best – clearly questionable research practices!
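To make the power calculation part of an analysis plan concrete, here is a minimal sketch of the kind of a priori calculation it might contain, assuming an independent-samples t-test, an expected medium effect size (Cohen’s d = 0.5), and Python’s statsmodels library; the numbers are illustrative, not from any particular study.

```python
# A minimal a priori power calculation sketch (illustrative assumptions:
# independent-samples t-test, medium effect size, conventional thresholds).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # expected standardised mean difference (Cohen's d)
    alpha=0.05,       # two-tailed significance threshold
    power=0.80,       # desired probability of detecting a true effect
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64
```

Even a few lines like these, written at the design stage, force you to commit to an expected effect size and a sample size before any data exist.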
Questionable research practices are often not intentional, and we might not even perceive them as wrong. For example, we might be selective in the variables we report, choosing just the ones that support our hypothesis, or we might fit our hypothesis to the data (that’s what I predicted all along, wasn’t it?), known as HARKing (Hypothesising After the Results are Known). Equally, you might run lots of tests on the same data but only report the ones that fit your hypothesis (p-hacking). The outcome of these kinds of research practices? False positives and research that is not reproducible. To guard against this within my lab group, we now use pre-registration. This is a time-stamped record of your study design and analysis plans, which you upload to a repository before you start collecting your data. Once you’ve written up your study, you can make your pre-registration publicly available and provide the link to it in your write-up. That way, you can show that the analyses you ran were planned before you had the data in front of you. There’s nothing to stop you adding exploratory analyses to your write-up; you just mark them as additional to your planned analyses.
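To see why p-hacking inflates false positives, here is a toy simulation, assuming Python with NumPy and SciPy, of a ‘study’ that measures ten outcome variables containing nothing but noise and reports the first ‘significant’ test it finds.

```python
# Toy simulation of p-hacking: test many pure-noise outcomes, report the
# first p < .05 result. All parameters here are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_simulated_studies = 2000
n_outcomes = 10   # ten outcome variables, none with a real group difference
n_per_group = 30
hits = 0
for _ in range(n_simulated_studies):
    for _ in range(n_outcomes):
        group_a = rng.normal(size=n_per_group)
        group_b = rng.normal(size=n_per_group)
        if ttest_ind(group_a, group_b).pvalue < 0.05:
            hits += 1
            break  # report the first 'significant' result and stop looking
print(f"'Studies' reporting a false positive: {hits / n_simulated_studies:.0%}")
```

Even though no real effects exist, roughly 40% of these simulated studies turn up a ‘significant’ finding, rather than the 5% the significance threshold is supposed to guarantee.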
Personally, I have found that putting in the effort to write a pre-registration document makes me dedicate more time to making the study watertight. I think the PGRs in my lab would agree. Importantly, it has led to an improvement in the rigour of the study designs coming out of my lab. Pre-registration also means that once the data are in, you have a ‘recipe’ to follow. After all, you’ve already done the thinking for the planned analyses. This also prevents you from introducing unintentional biases into the study. If you haven’t tried pre-registration, have a look at the templates on the Open Science Framework or AsPredicted. Give it a go – using pre-registration can help you to build a reputation for openness and transparency.
Prof Emily Farran is the University’s Academic Lead for Research Culture and Integrity