December 2013

The reliability of scientific research

When I was going through our mail, the cover of the Oct. 19 issue of The Economist jumped out at me: “HOW SCIENCE GOES WRONG.” I thought “This is not good” and scanned the story, which highlights two studies that indicated that, when scientists from the pharmaceutical industry tried to replicate results from important papers in preclinical cancer research, only 10 percent to 25 percent of the key findings could be reproduced. The article proposes several explanations for the lack of replicability. The author’s hypotheses include the impact of the publish-or-perish culture (favoring rapid publication of new results with few incentives for replication or validation studies) and the incentives for cherry-picking data and exaggeration.
The issue also contains a second article, “Unreliable research: Trouble in the lab.” The briefing refers to a study published in 2005 by Stanford epidemiologist John Ioannidis, “Why most published research findings are false.” Rather than looking for cultural issues that may encourage publication of unreliable results, these articles instead examine the research process from a statistical point of view. More specifically, they use so-called Bayesian analysis to examine the problem.
To understand Bayesian analysis, consider the following. Suppose you have a diagnostic test for a disease. If the disease is present, the test is positive 95 percent of the time, meaning that it is quite sensitive. If the disease is absent, the test is negative 90 percent of the time, meaning that it is fairly specific. Given these parameters, it seems like a fairly reliable test. Suppose that 1 percent of the population has the disease. What is the likelihood that someone who tests positive for the disease actually has it?
Consider a population of 2,000. One percent, or 20 individuals, has the disease. Of these people, 95 percent, or 19 out of 20, are expected to test positive, and 1 is expected to test negative. The remaining 1,980 do not have the disease. Of these, 90 percent, or 1,782, are expected to test negative and 10 percent, or 198, are expected to test positive. Taken together, these data mean that 217 (19 + 198) individuals are expected to test positive, but only 19 actually have the disease. Thus, the likelihood that a person with a positive test actually has the disease is 19/217, or about 8.8 percent, a surprisingly low number.
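The head count above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `positive_test_breakdown` and its parameter names are assumptions made for this example, not anything taken from The Economist articles.

```python
def positive_test_breakdown(population, prevalence, sensitivity, specificity):
    """Return expected true positives, false positives, and the chance
    that a positive test reflects real disease."""
    diseased = population * prevalence
    healthy = population - diseased
    # Diseased people who test positive (sensitivity).
    true_positives = diseased * sensitivity
    # Healthy people who nonetheless test positive (1 - specificity).
    false_positives = healthy * (1 - specificity)
    chance_disease = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, chance_disease


# Numbers from the example: 2,000 people, 1 percent prevalence,
# 95 percent sensitivity, 90 percent specificity.
tp, fp, chance = positive_test_breakdown(2000, 0.01, 0.95, 0.90)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"chance a positive test is correct: {chance:.3f}")
```

With the inputs shown, the function returns the 19 true positives and 198 false positives counted above, so the final ratio is 19/217.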
Suppose the prevalence of the disease is much higher, say 30 percent. If you repeat the analysis above, the likelihood that a person with a positive test actually has the disease rises to 80 percent.
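The same calculation can be written as a single formula, which is Bayes' theorem applied to a diagnostic test. The symbols are introduced here for illustration only: $p$ is the prevalence, $s$ the sensitivity and $c$ the specificity.

```latex
P(\text{disease} \mid \text{positive})
  = \frac{s\,p}{s\,p + (1 - c)(1 - p)}
  = \frac{0.95 \times 0.30}{0.95 \times 0.30 + 0.10 \times 0.70}
  = \frac{0.285}{0.355}
  \approx 0.80
```

The test itself is unchanged; only the prevalence differs from the earlier example, and that alone moves the answer from under 10 percent to about 80 percent.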
How can Bayesian analysis be applied to scientific results? The article in The Economist assumes that scientific hypotheses have a false positive rate of 5 percent (based on the widespread use of a p value of 0.05 when testing statistical significance) and a false negative rate of 20 percent. To complete the analysis, the authors have to assume a value for the equivalent of the prevalence of the disease. This is referred to as the “prior probability” in the general case. The authors assume a value of 10 percent, meaning that 10 percent of the hypotheses deemed interesting enough to investigate are, in fact, correct. Based on these parameters, in a sample of 1,000 studies, the number of hypotheses that are true and that are found to be true is expected to be 80, while the number of hypotheses that are false but appear to be true will be 45. Thus, the percentage of hypotheses that appear to be true but are not will be 45/(80 + 45), or 36 percent. If one accepts all of the assumptions, this analysis provides an explanation for why a significant fraction of published papers cannot be replicated.
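The arithmetic in this paragraph follows the same pattern as the diagnostic test and can be sketched in Python. The function name `expected_study_outcomes` and its parameter names are hypothetical, chosen for this example; the input values are the ones the article in The Economist assumes.

```python
def expected_study_outcomes(n_studies, prior, false_positive_rate, false_negative_rate):
    """Return expected counts of true and false 'positive' findings and
    the fraction of positive findings that are actually false."""
    true_hypotheses = n_studies * prior
    false_hypotheses = n_studies - true_hypotheses
    # True hypotheses that a study correctly confirms.
    confirmed_true = true_hypotheses * (1 - false_negative_rate)
    # False hypotheses that pass the significance threshold anyway.
    apparent_true = false_hypotheses * false_positive_rate
    fraction_false = apparent_true / (confirmed_true + apparent_true)
    return confirmed_true, apparent_true, fraction_false


# Assumptions stated in the text: 1,000 studies, 10 percent prior probability,
# 5 percent false positive rate, 20 percent false negative rate.
confirmed, apparent, fraction = expected_study_outcomes(1000, 0.10, 0.05, 0.20)
print(f"true hypotheses found true:      {confirmed:.0f}")
print(f"false hypotheses appearing true: {apparent:.0f}")
print(f"share of positive findings that are false: {fraction:.0%}")
```

Changing the `prior` argument shows how strongly the conclusion depends on that single assumption, which is the least certain of the inputs.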
Given that both the empirical data and this statistical analysis suggest that the phenomenon of important studies that cannot be replicated is real, what should the scientific community do? First, we must take ownership of the issue. Denying that the lack of replicability is an issue, or claiming that it does not affect a particular field without compelling data to support that conclusion, is not an effective strategy and is likely to involve a substantial amount of wishful thinking or self-delusion.
Second, each researcher has a responsibility to ensure that his or her own published work is as reliable as possible within the limits imposed by resources and other constraints. In the Bayesian context, this will increase both sensitivity and specificity. Some of the published analyses include anecdotes in which investigators, when confronted with the lack of replicability of one of their published works, made comments indicating that the experiment “worked” only one out of 10 times but that the one successful result was the one they published. In addition, each researcher should make sure that the experimental sections of his or her papers are as complete as possible and highlight those details that are particularly important for obtaining the results described. The responsibility also falls on the reviewers and editors of manuscripts, who must do their parts to make sure that manuscripts do not contain clear flaws and that they include adequate information to allow experimental replication. The fact that most journals are now largely or wholly online facilitates the inclusion of adequate experimental details.
Third, the community should find effective mechanisms for sharing the results of replication experiments, both successful and unsuccessful. Some small-scale projects in this area already are underway, particularly in the area of post-publication review. For example, the new electronic journal eLife includes a comment section for each article, where, in principle, researchers can ask questions about procedures or describe their own experiences. The National Institutes of Health, through the National Center for Biotechnology Information, is experimenting with PubMed Commons, a vehicle to allow members of the scientific community to comment on papers within PubMed. PubMed Commons is in an invitation-only pilot phase now but will expand if the pilot is deemed a success.
In addition to these mechanisms, journals and funding agencies should consider carefully their policies with regard to the performance and publication of successful and unsuccessful replication experiments. Replication studies never will be as sexy as novel findings, but they are important for the scientific enterprise, and addressing some of the disincentives for performing or sharing these results could provide considerable benefit.
The imperative for taking on these issues is highlighted in articles that have appeared since The Economist articles. For example, the Los Angeles Times published an article titled “Science has lost its way, at a big cost to humanity.” It highlights some of the data discussed above as well as some of the potential responses. While we must be careful not to overreact and set up unwise or overly burdensome policies or waste valuable resources, we must keep in mind that the credibility of scientific results and the scientific process is one of the most valuable assets that we, as members of the scientific community, have. This is essential for our role as a largely publicly funded enterprise and, most importantly, for our ability to contribute to the solutions of important problems.

Jeremy Berg is the associate senior vice-chancellor for science strategy and planning in the health sciences and a professor in the computational and systems biology department at the University of Pittsburgh.
