Publication bias and the failure of replication in experimental psychology
 Gregory Francis
Abstract
Replication of empirical findings plays a fundamental role in science. Among experimental psychologists, successful replication enhances belief in a finding, while a failure to replicate is often interpreted to mean that one of the experiments is flawed. This view is wrong. Because experimental psychology uses statistics, empirical findings should appear with predictable probabilities. In a misguided effort to demonstrate successful replication of empirical findings and avoid failures to replicate, experimental psychologists sometimes report too many positive results. Rather than strengthen confidence in an effect, too much successful replication actually indicates publication bias, which invalidates entire sets of experimental findings. Researchers cannot judge the validity of a set of biased experiments because the experiment set may consist entirely of type I errors. This article shows how an investigation of the effect sizes from reported experiments can test for publication bias by looking for too much successful replication. Simulated experiments demonstrate that the publication bias test is able to discriminate biased experiment sets from unbiased experiment sets, but it is conservative about reporting bias. The test is then applied to several studies of prominent phenomena that highlight how publication bias contaminates some findings in experimental psychology. Additional simulated experiments demonstrate that using Bayesian methods of data analysis can reduce (and in some cases, eliminate) the occurrence of publication bias. Such methods should be part of a systematic process to remove publication bias from experimental psychology and reinstate the important role of replication as a final arbiter of scientific findings.
 Journal
 Psychonomic Bulletin & Review
 Volume 19, Issue 6, pp. 975–991
 Cover Date
 2012-12-01
 DOI
 10.3758/s13423-012-0322-y
 Print ISSN
 1069-9384
 Online ISSN
 1531-5320
 Publisher
 Springer-Verlag
 Keywords

 Bayesian methods
 Hypothesis testing
 Meta-analysis
 Publication bias
 Replication
 Authors

 Gregory Francis (1)
 Author Affiliations

 1. Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47906, USA