Sir Ronald Aylmer Fisher (1890 - 1962). Everyone who has studied statistics is familiar with Fisher as the originator of ANOVA, the analysis of variance. (Via Wikimedia Commons.)
"No. If manuscripts pass the preliminary inspection, they will be sent out for review. But prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about 'significant' differences or lack thereof, and so on)."[1]The journal cited the p-test as an "important obstacle to creative thinking" that's dominated psychology for decades; and, it hoped that other journals would join in this ban on what's seen as an unneeded crutch.[1] Shortly thereafter, the American Statistical Association (ASA) posted a comment on its web site that it was wary that such a p-value ban might have its own negative consequences.[2] The ASA has formed a group of more than two-dozen "distinguished statistical professionals" to develop a statement on p-values.[2] Tom Siegfried, in his blog at Science News, quotes William Rozeboom, a philosopher of science, as saying that the p-test was "surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students."[3] A recent paper in PLoS Biology by biologists at the Australian National University (Canberra, Australia) and Macquarie University (New South Wales, Australia) concludes that scientists will sometimes "tweak" experiments and analysis methods to obtain a better p-value and thereby increase the likelihood of publication.[5] The authors call this technique, "p-hacking," and it appears to be common in the life sciences. This conclusion is based on an analysis of more than 100,000 research papers in such diverse scientific disciplines as medicine, biology and psychology.[5]
Megan Head, lead author of the p-hacking paper, in her evolutionary biology laboratory at the Australian National University. (Australian National University photo by Regina Vega-Trejo.)
"Many researchers are not aware that certain methods could make some results seem more important than they are. They are just genuinely excited about finding something new and interesting."[5]Typical research practices leading to p-hacking include doing analyses in the middle of an experiment to decide whether to continue the experiment; recording many variables, but deciding which are significant enough to report; dropping outliers; excluding, combining, or splitting groups after analysis; and stopping data taking once an analysis gives a significant p-value.[4] One reason for p-hacking is publication pressure. Prestigious journals accept papers that have statistically significant ("positive") results, and this appears to generate papers with false positive results that hinder scientific progress.[4-5] Early positive studies receive a lot of attention, while contradicting negative studies not as much.[4] In multiple studies on the effectiveness of a pharmaceutical drug, too many p-hacked findings would make the drug would look more effective than it is.[5]
Evidence for p-hacking in various scientific disciplines, based on p-values reported in paper abstracts. Engineering and chemistry appear to be honest disciplines, while other sciences have p-values clumped at the high end. See ref. 4 for details. (Fig. 3B of ref. 4, licensed under a Creative Commons Attribution License.)[4]
"This suggests that some scientists adjust their experimental design, datasets or statistical methods until they get a result that crosses the significance threshold... They might look at their results before an experiment is finished, or explore their data with lots of different statistical methods, without realizing that this can lead to bias."[5]Funding for this research was provided by the Australian Research Council.[4]
Randall Munroe weighed in on the p-value debate in his xkcd comic of January 26, 2015, licensed under the Creative Commons Attribution-NonCommercial 2.5 License. Click the image to view the comic on his web site.