The importance that academia has placed on getting material published has led to bad science and even worse statistics.
The world of professional academics is unlike any other. With its relaxed environment and flexible schedules, academic culture is very different from typical corporate culture. Recently, one aspect of that culture has troubled many people. In 2014, Colleen Flaherty wrote a piece for Inside Higher Ed entitled "Evaluating Evaluations," discussing a study that tracks the changing importance of the different roles academics play. She reports, "The study, out in the American Association of University Professors' journal Academe, also suggests that collegiality as a criterion for tenure and promotion is on the decline, and that value increasingly is being placed on research and publication – even for professors at teaching-oriented liberal arts institutions." This comes as no surprise to most academics; they know that publishing is the most important part of their job. The phrase "publish or perish" has become a campus colloquialism because of this emphasis. The added pressure puts academics in a tough position, and it has had troubling consequences.
When an academic is faced with either publishing or perishing, they are left with few options. The first, and hopefully least utilized, is fraud: make up a study, fabricate data, and publish fake results. Most people agree that this kind of outright fraud is rare, thanks to the strict penalties imposed on those found guilty. So what are the other options? NPR's Planet Money explores what many academics are doing on Episode 677: The Experiment Experiment. They claim that, across a broad range of scientific fields, the push for publication has led many academics to cut corners and publish bad scientific findings – in particular, by teasing data into being significant enough to publish. A simple but very helpful example they give on the podcast is flipping a coin. The coin is totally ordinary, with a 50-50 chance of landing on either side, but for argument's sake pretend we are ignorant of that fact. We start by flipping the coin 10 times, and 7 flips come up heads. These results hint at a bias toward heads, but they fall short of statistical significance. At this point the academic has a choice: abandon the experiment and have nothing to show, or continue on and see what else can happen. Maybe after four more flips, all of which happen to come up heads, boom, there is a "significant" result! Now the not-so-scientific investigator has enough evidence to draw a conclusion about coin flipping – a conclusion that is in fact false.
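The optional-stopping strategy in the coin example can be simulated directly. Here is a minimal Python sketch (not from the podcast; the batch sizes and number of peeks are illustrative assumptions) comparing a researcher who checks for significance after every batch of flips against one who honestly tests only once at the end:

```python
import random
from math import comb

def binom_p_two_sided(heads, n):
    # exact two-sided binomial test against a fair coin (p = 0.5)
    dev = abs(heads - n / 2)
    return sum(comb(n, k) for k in range(n + 1)
               if abs(k - n / 2) >= dev) / 2 ** n

def peeking_experiment(rng, batches=5, batch_size=10, alpha=0.05):
    # flip a fair coin in batches, peeking at the p-value after each
    # batch and stopping the moment it dips below alpha -- the
    # "keep flipping until it works" strategy from the example
    heads = flips = 0
    for _ in range(batches):
        heads += sum(rng.random() < 0.5 for _ in range(batch_size))
        flips += batch_size
        if binom_p_two_sided(heads, flips) < alpha:
            return True  # a "significant" result on a fair coin
    return False

rng = random.Random(0)
trials = 10_000

# researcher who peeks after every batch of 10 flips
peek_rate = sum(peeking_experiment(rng) for _ in range(trials)) / trials

# honest control: one test after all 50 flips, no peeking
control_rate = sum(
    binom_p_two_sided(sum(rng.random() < 0.5 for _ in range(50)), 50) < 0.05
    for _ in range(trials)) / trials

print(f"false positive rate with peeking: {peek_rate:.3f}")
print(f"false positive rate without:      {control_rate:.3f}")
```

Even though every individual test uses the conventional 5% cutoff, the researcher who gets repeated chances to stop on a lucky streak "discovers" a bias in a fair coin far more often than the nominal rate suggests.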
These kinds of practices have corrupted science and led to many false findings. These findings then go on to have real-world implications, with potentially dangerous impacts on society. Brian Nosek, for example, pioneered the Reproducibility Project, which tested the validity of past successful psychology experiments by attempting to replicate them. Following the original procedures in each case, his team redid 100 experiments. Of those 100, they were able to reproduce the original findings only 36 times. Similar problems exist in economics as well.
What can be done? There are at least three schools of thought. The first is to force researchers to register each study in advance, reporting their intended methods of data collection and analysis. This forces them to stick to their original procedure rather than altering it until a "significant" result is found. The second idea is for institutions to place less emphasis on publication and more on other measures, such as student evaluations (which have problems of their own). The third is to increase the professional rewards for trying to replicate other scientists' results, so that genuine results are more often sorted out from spurious ones. What is clear is that something needs to be done.
Update: In a comment on Miles's Facebook post for this, Arthur Lewbel writes, "A much simpler way to address much of the problem: use 4 sigma instead of 2 sigma as the standard of significance." I (Miles) think that suggestion has a lot of merit. In any case, people should have to be much more apologetic about having a p-value (the probability of seeing a result at least as extreme by chance alone, if there wasn't any tinkering) as high as 5%, and should only feel good about a result if the computed p-value is more like .1%, so that even if there was some tinkering there is a half-decent chance the result is genuine.
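To put numbers on these thresholds, here is a quick sketch converting sigma levels into two-sided p-values under a normal approximation (the 3-sigma line is included just for comparison):

```python
from math import erfc, sqrt

def two_sided_p(z):
    # probability that a standard normal lands at least |z| sigma
    # from zero in either direction
    return erfc(z / sqrt(2))

print(f"2 sigma -> p = {two_sided_p(2):.6f}")  # ~0.0455, the usual 5% bar
print(f"3 sigma -> p = {two_sided_p(3):.6f}")  # ~0.0027
print(f"4 sigma -> p = {two_sided_p(4):.6f}")  # ~0.000063
```

The .1% standard suggested above corresponds to about 3.3 sigma, while Lewbel's 4-sigma rule is stricter still, at roughly 0.006%.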