Three years ago, an investigation conducted by the Open Science Collaboration made headlines. It showed that the findings of only a little more than a third of the 100 psychology experiments that it examined were reproducible, i.e. were able to stand up to scrutiny when they were retested in follow-up studies. Talk about a reproducibility crisis in science has been spreading at least since then. The crisis affects not just the field of psychology, but also concerns many other disciplines whose insights are based on experiments and statistics – fields ranging from economics to biomedicine.
Biostatistician Leonhard Held is intimately familiar with this issue. “It’s not that the quality of research has simply deteriorated,” he says, “because unlike today, research results in the past were often never retested at all.” Since that has changed, however, certain deficiencies have become visible (see box).
While many of the studies published today are impeccable in terms of quality, plenty of them do leave something to be desired. Lax methodological rigor is a recurring issue in empirical research, according to Held, who believes that one reason for this is that researchers sometimes know too little about the scientific principles of statistics. “There’s a lot of catching up to do here,” admonishes Held. To advance and improve the quality of empirical research conducted at UZH, the biostatistician has founded the Center for Reproducible Science (CRS), which recently took up operations.
The new center has an interdisciplinary setup involving the participation of the Faculty of Medicine, the Faculty of Arts and Social Sciences, the Faculty of Science and the Faculty of Business, Economics and Informatics. “Everyone we asked enthusiastically agreed to join,” says Held, “which indicates that the problems transcend individual disciplines.” The objective of the CRS is to improve the reproducibility of scientific studies conducted at UZH and to thus enhance research integrity.
Seal of approval for exemplary research
In pursuit of that goal, instructional materials for teaching staff are to be developed and made available online. The CRS also plans to hold workshops to impart the latest methodological knowledge to researchers and students. Such workshops are slated to be held, for example, on Reproducibility Day, which is scheduled to take place at UZH next February. In addition, the CRS envisages awarding a UZH in-house seal of approval for best-in-class research in the future. A reproducibility prize for especially exemplary studies is also under discussion.
“Research also has to become more transparent if we want to improve the reproducibility of studies,” believes Leonhard Held. For this reason, says the biostatistician, experiment data and the source code used to analyze it should both be openly accessible to the public (keyword: open science). This enables other researchers to replicate experiments using the same protocols and parameters.
Lowering the P value
Leonhard Held also has another idea on how to combat the reproducibility crisis. In a jointly authored journal article, he and a group of colleagues have proposed lowering the P-value threshold for claims of new scientific discoveries from the current 0.05 to 0.005. The P value is treated as the measure of all things in empirical research: it gauges how unlikely a result at least as extreme as the one observed would be if mere chance were at work. If the P value is low enough (below 0.05), the result is considered statistically significant and is deemed a new scientific finding. Such discoveries can reap researchers scientific accolades.
Lowering the P-value threshold and correspondingly increasing sample sizes by 70% can substantially reduce the risk of obtaining false-positive results, the authors stress.
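The effect of a stricter threshold can be illustrated with a small simulation (a hypothetical sketch, not taken from the article or the proposal it describes): we repeatedly run an experiment in which the null hypothesis is actually true and count how often each threshold wrongly declares the result significant.

```python
import math
import random

def p_value_two_sided(sample):
    """Two-sided one-sample z-test against mean 0, assuming unit variance
    (a simplification chosen purely for illustration)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Standard normal CDF expressed via the error function.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(alpha, n=30, trials=20000, seed=42):
    """Fraction of null experiments (true mean 0) declared 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        if p_value_two_sided(sample) < alpha:
            hits += 1
    return hits / trials

print(false_positive_rate(0.05))   # roughly 0.05
print(false_positive_rate(0.005))  # roughly 0.005
```

Because the P value is uniformly distributed when the null hypothesis is true, the false-positive rate tracks the threshold itself: tightening it from 0.05 to 0.005 cuts spurious "discoveries" by a factor of ten in this idealized setting.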
This idea has received a great deal of attention in the scientific community and has been contentiously debated since its publication. At any rate, a discussion about quality in empirical research has been ignited. The foundation of the CRS is further proof of this. The center is thus playing a vanguard role throughout Switzerland.
The four horsemen of the reproducibility apocalypse
Biostatistician Leonhard Held attributes the present crisis to four causes – the four horsemen of the reproducibility apocalypse, as Dorothy Bishop, a neuropsychologist at the University of Oxford, calls them:
1. In some cases, protocols that articulate the procedural methods and define the research questions of studies are missing. This can lead to hypotheses getting formulated retrospectively after the experiment data have already been collected, loosely along the lines of the motto “that’s exactly what I’ve always wanted to know.”
2. Oftentimes researchers “tweak the data until the P value is low enough,” says Leonhard Held. The P value gauges how unlikely a result at least as extreme as the one observed would be if mere chance were at work. If the P value is low enough (below 0.05), the result is considered statistically significant, in other words, it is deemed a new scientific finding.
3. Many studies are conducted using inadequate sample sizes or too few experiment subjects. This renders them less conclusive and increases the risk that a statistically significant result will turn out to be a false positive.
4. Scientific journals tend to publish only “sexy findings and stunning discoveries that reveal significant distinctions,” says Held. This ratchets up the pressure on researchers to obtain such results. Negative study findings, in contrast, tend not to get published.
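Point 3 in the list above can be made concrete with a small calculation based on Bayes' rule: the share of statistically significant results that reflect a real effect (the positive predictive value) depends on a study's power. The numbers below are hypothetical, chosen only to illustrate why underpowered studies produce less trustworthy "discoveries".

```python
def positive_predictive_value(power, alpha, prior):
    """Share of statistically significant results that reflect a true effect,
    given the test's power, the significance threshold alpha, and the prior
    probability that a tested hypothesis is true (all values hypothetical)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Suppose 1 in 10 tested hypotheses is actually true (prior = 0.1):
print(positive_predictive_value(0.8, 0.05, 0.1))  # well-powered study: 0.64
print(positive_predictive_value(0.2, 0.05, 0.1))  # underpowered study: ~0.31
```

Under these assumed numbers, a significant result from a well-powered study is true about two times out of three, while one from an underpowered study is more likely to be a false positive than a genuine finding.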