Estimating the Reproducibility of Experimental Philosophy

Abstract

Responding to recent concerns about the reliability of the published literature in psychology and other disciplines, we formed the X-Phi Replicability Project (XRP) to estimate the reproducibility of experimental philosophy (osf.io/dvkpr). Drawing on a representative sample of 40 x-phi studies published between 2003 and 2015, we enlisted 20 research teams across 8 countries to conduct a high-quality replication of each study in order to compare the results to the original published findings. We found that x-phi studies – as represented in our sample – successfully replicated about 70% of the time. We discuss possible reasons for this relatively high replication rate in the field of experimental philosophy and offer suggestions for best research practices going forward.



Change history

  • 10 August 2018

    Appendix 1 was incomplete in the initial online publication. The original article has been corrected.

Notes

  1.

    That is, the proportion of published studies that would successfully replicate if a high-quality replication study were carried out.

  2.

    http://experimental-philosophy.yale.edu/xphipage/Experimental%20Philosophy-Replications.html.

  3.

    In practice, it can be hard to determine whether the ‘sufficiently similar’ criterion has actually been fulfilled by the replication attempt, whether in its methods or in its results (Nakagawa and Parker 2015). It can therefore be challenging to interpret the results of replication studies, no matter which way these results turn out (Collins 1975; Earp and Trafimow 2015; Maxwell et al. 2015). Thus, our findings should be interpreted with care: they should be seen as a starting point for further research, not as a final statement about the existence or non-existence of any individual effect. For instance, we were not able to replicate Machery et al. (2004), but this study has been replicated on several other occasions, including in children (Li et al. 2018; for a review, see Machery, 2017a, chapter 2).

  4.

    Note that this page essentially mirrors the “Experimental philosophy” category of the PhilPapers database.

  5.

    Despite two important studies published in 2001 (Greene et al. 2001; Weinberg et al. 2001), no experimental philosophy paper is to be found for 2002.

  6.

    There was some initial debate about whether to include papers reporting negative results, that is, results that failed to reject the null hypothesis using NHST. We decided to include such papers when the negative results were used as the basis for a substantive claim. The reason for this was that negative results are sometimes treated as findings within experimental philosophy. For example, in experimental epistemology, the observation of negative results has led some to reach the substantive conclusion that practical stakes do not impact knowledge ascriptions (see for example Buckwalter 2010; Feltz and Zarpentine 2010; Rose et al. in press). Accordingly, papers reporting ‘substantive’ negative results were not excluded.

  7.

    Note, however, that the more ‘demanding’ paper that was originally selected was not discarded from our list, but remained there in case a research team with the required resources agreed to replicate it.

  8.

    It should be noted that two other papers were replaced during the replication process. For the year 2006, Malle (2006) was replaced with Nichols (2006), given that the original paper misreported both the results and statistical analyses, making comparison with replication impossible. For the same year, Cushman et al. (2006) proved to be too resource-demanding after all and was replaced by Nahmias et al. (2006).

  9.

    In this respect, our methodology differed from the OSC’s methodology, which instructed replication teams to focus on the papers’ last study.

  10.

    Ns were computed not from the total N recruited for the whole study but from the number of data points included in the relevant statistical analysis.

  11.

    For this analysis, studies for which power > 0.99 were counted as power = 0.99.
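    The capping rule can be illustrated with a small sketch. This is our own illustration, not the authors’ code (their analyses used the R {pwr} package); it uses the standard normal approximation for the power of a two-sided, two-sample t-test, and the function names are ours:

    ```python
    from statistics import NormalDist

    def two_sample_power(d, n_per_group, alpha=0.05):
        """Approximate power of a two-sided two-sample t-test for a true
        standardized effect d (Cohen's d), via the normal approximation."""
        z = NormalDist()
        z_crit = z.inv_cdf(1 - alpha / 2)        # e.g. 1.96 for alpha = .05
        ncp = d * (n_per_group / 2) ** 0.5       # approximate noncentrality
        return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

    def capped_power(d, n_per_group, alpha=0.05, cap=0.99):
        # Power values above .99 are counted as .99, per the note above.
        return min(two_sample_power(d, n_per_group, alpha), cap)
    ```

    For instance, a large effect (d = 0.8) with 100 participants per group yields power above 0.999, which this analysis would record as 0.99.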

  12.

    For studies reporting statistically significant results, we counted studies for which the original effect size was smaller than the replication 95% CI as successful replications, on the grounds that, given the studies’ original hypotheses, a greater effect size than originally expected constituted even more evidence in favor of these hypotheses. Of course, theoretically, this need not always be the case, for example if a given hypothesis makes precise predictions about the size of an effect. But for the studies we attempted to replicate, a greater effect size did indeed signal greater support for the hypothesis.

  13.

    As pointed out by a reviewer on this paper, this criterion might even be considered too stringent. This is because, in certain circumstances in which no prediction is made about the size of an effect, a replication for which the 95% CI falls below the original effect size might still be considered a successful replication, given that there is a significant effect in the predicted direction. Other ways of assessing replication success using effect sizes might include computing whether there is a statistical difference between the original and replication effect size (which would present the disadvantage of rewarding underpowered studies), or considering whether the replication effect size fell beyond the lower bound of the 95% CI of the original effect size (which returns a rate of 28 successful replications out of 34 original studies, i.e. 82.4%). Nevertheless, we decided to err on the side of stringency.
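    The effect-size criterion described in notes 12 and 13 can be summarized in a short sketch. The function name and the assumption that effects are oriented so the predicted direction is positive are ours; this is an illustration of the decision rule, not the authors’ analysis code:

    ```python
    def replication_outcome(d_original, rep_ci_low, rep_ci_high):
        """Classify a replication of an originally significant effect:
        success if the replication 95% CI contains the original effect size,
        or lies entirely above it (an even larger effect than expected);
        failure if the CI falls entirely below the original effect size.
        Effects are assumed oriented so the predicted direction is positive."""
        if rep_ci_high < d_original:
            return "failure"   # replication CI entirely below original effect
        return "success"       # CI contains the original effect, or exceeds it
    ```

    For example, an original d = 0.5 with a replication CI of [0.6, 0.9] counts as a success under note 12, while a CI of [0.1, 0.4] counts as a failure under the stringent rule of note 13.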

  14.

    This analysis was done on the basis of Google Scholar’s citation count (as of March 23rd, 2018).

  15.

    In a previous version of this manuscript, we reported 30 content-based studies and 5 demographic effects. However, helpful commentaries from readers, including Wesley Buckwalter, led us to revise our classification for Nichols (2004).

  16.

    A low replication rate for demographic-based effects should not be taken as direct evidence for the nonexistence of variations between demographic groups. Indeed, out of 3 demographic-based effects that failed to replicate, one was a null effect, meaning that the failed replication found an effect where there was none in the original study.

  17.

    Possible reasons for such transparency might be that (i) experimental philosophy is still a smaller academic community where individual researchers are likelier to be well known to each other and thus able and willing to hold each other accountable, and (ii) research resources (such as online survey accounts) used to be shared among researchers in the early days of the field, thus making questionable research practices more difficult to obscure (see Liao 2015).

  18.

    A more cynical explanation would simply be that experimental philosophers are less well versed in statistics, and that certain questionable research practices are only available to those who have sufficient skills in this area (i.e., the ability to take advantage of highly complex statistical models or approaches to produce ‘findings’ that are of questionable value).

  19.

    For example, as of November 2017, the Wikipedia page for “Experimental Philosophy” dedicates a large part of its “Criticisms” section to the “Problem of Reproducibility,” arguing that “a parallel with experimental psychology is likely.”

References

  1. Alfano, M. & Loeb, D. 2014. Experimental moral philosophy. In The Stanford Encyclopedia of Philosophy (Fall 2017 Edition), ed. E. N. Zalta. Retrieved from https://plato.stanford.edu/archives/fall2017/entries/experimental-moral/

  2. American Statistical Association. 2016. American Statistical Association statement on statistical significance and p-values. American Statistical Association. Retrieved from http://www.amstat.org/asa/files/pdfs/P-ValueStatement.pdf

  3. Amrhein, V., and S. Greenland. 2017. Remove, rather than redefine, statistical significance. Nature Human Behaviour. https://doi.org/10.1038/s41562-017-0224-0.

  4. Anderson, S.F., K. Kelley, and S.E. Maxwell. 2017. Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science 28 (11): 1547–1562. https://doi.org/10.1177/0956797617723724.


  5. Baker, M. 2016. Is there a reproducibility crisis? Nature 533 (1): 452–454.



  7. Benjamin, D.J., J.O. Berger, M. Johannesson, B.A. Nosek, E.-J. Wagenmakers, R. Berk, ... & D. Cesarini. 2018. Redefine statistical significance. Nature Human Behaviour 2 (1): 6.

  8. Boyle, G. J. (in press). Proving a negative? Methodological, statistical, and psychometric flaws in Ullmann et al. (2017) PTSD study. Journal of Clinical and Translational Research.

  9. Brandt, M.J., H. IJzerman, A. Dijksterhuis, F.J. Farach, J. Geller, R. Giner-Sorolla, … A. van ’t Veer. 2014. The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology 50 (Supplement C): 217–224. https://doi.org/10.1016/j.jesp.2013.10.005.

  10. Buckwalter, W. 2010. Knowledge isn’t closed on Saturday: A study in ordinary language. Review of Philosophy and Psychology 1 (3): 395–406. https://doi.org/10.1007/s13164-010-0030-3.


  11. Button, K.S., J.P. Ioannidis, C. Mokrysz, B.A. Nosek, J. Flint, E.S. Robinson, and M.R. Munafò. 2013. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14 (5): 365–376. https://doi.org/10.1038/nrn3475.


  12. Casler, K., L. Bickel, and E. Hackett. 2013. Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior 29 (6): 2156–2160. https://doi.org/10.1016/j.chb.2013.05.009.


  13. Cesario, J. 2014. Priming, replication, and the hardest science. Perspectives on Psychological Science 9 (1): 40–48. https://doi.org/10.1177/1745691613513470.


  14. Chambers, C., & Munafò, M. 2013. Trust in science would be improved by study pre-registration. The Guardian. Retrieved from http://www.theguardian.com/science/blog/2013/jun/05/trust-in-science-study-pre-registration

  15. Champely, S. 2018. Package ‘pwr’. Retrieved from http://cran.r-project.org/package=pwr

  16. Chang, A.C., and P. Li. 2015. Is economics research replicable? Sixty published papers from thirteen journals say “usually not”, Finance and Economics Discussion Series 2015–083. Washington, DC: Board of Governors of the Federal Reserve System.


  17. Clavien, C., C.J. Tanner, F. Clément, and M. Chapuisat. 2012. Choosy moral punishers. PLoS One 7 (6): e39002. https://doi.org/10.1371/journal.pone.0039002.


  18. Collins, H.M. 1975. The seven sexes: A study in the sociology of a phenomenon, or the replication of experiments in physics. Sociology 9 (2): 205–224. https://doi.org/10.1177/003803857500900202.


  19. Colombo, M., Duev, G., Nuijten, M. B., & Sprenger, J. 2017. Statistical reporting inconsistencies in experimental philosophy. Retrieved from https://osf.io/preprints/socarxiv/z65fv

  20. Cova, F. 2012. Qu’est-ce que la philosophie expérimentale ? In La Philosophie Expérimentale, ed. F. Cova, J. Dutant, E. Machery, J. Knobe, S. Nichols, and E. Nahmias. Paris: Vuibert.


  21. Cova, F. 2016. The folk concept of intentional action: Empirical approaches. In A Companion to Experimental Philosophy, ed. W. Buckwalter and J. Sytsma, 121–141. Wiley-Blackwell.


  22. Cova, F. 2017. What happened to the trolley problem? Journal of Indian Council of Philosophical Research 34 (3): 543–564.


  23. Crandall, C.S., and J.W. Sherman. 2016. On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology 66 (Supplement C): 93–99. https://doi.org/10.1016/j.jesp.2015.10.002.


  24. Cullen, S. 2010. Survey-driven romanticism. Review of Philosophy and Psychology 1 (2): 275–296. https://doi.org/10.1007/s13164-009-0016-1.


  25. Cumming, G. 2013. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.

  26. Cushman, F., Young, L., & Hauser, M. 2006. The role of conscious reasoning and intuition in moral judgment testing three principles of harm. Psychological Science 17 (12): 1082–1089.

  27. De Villiers, J., R.J. Stainton, and P. Szatmari. 2007. Pragmatic abilities in autism spectrum disorder: A case study in philosophy and the empirical. Midwest Studies in Philosophy 31 (1): 292–317. https://doi.org/10.1111/j.1475-4975.2007.00151.x.


  28. Del Re, A. C. 2015. Package “compute.es”. Available from https://cran.r-project.org/web/packages/compute.es/compute.es.pdf Accessed 08 Apr 2018.

  29. Doyen, S., O. Klein, D.J. Simons, and A. Cleeremans. 2014. On the other side of the mirror: Priming in cognitive and social psychology. Social Cognition 32 (Supplement): 12–32. https://doi.org/10.1521/soco.2014.32.supp.12.


  30. Dunaway, B., A. Edmonds, and D. Manley. 2013. The folk probably do think what you think they think. Australasian Journal of Philosophy 91 (3): 421–441.


  31. Earp, B.D. 2017. The need for reporting negative results – a 90 year update. Journal of Clinical and Translational Research 3 (S2): 1–4. https://doi.org/10.18053/jctres.03.2017S2.001.


  32. Earp, B.D. in press. Falsification: How does it relate to reproducibility? In Key concepts in research methods, ed. J.-F. Morin, C. Olsson, and E.O. Atikcan. Abingdon: Routledge.

  33. Earp, B.D., and D. Trafimow. 2015. Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology 6 (621): 1–11. https://doi.org/10.3389/fpsyg.2015.00621.


  34. Earp, B.D., and D. Wilkinson. 2017. The publication symmetry test: a simple editorial heuristic to combat publication bias. Journal of Clinical and Translational Research 3 (S2): 5–7. https://doi.org/10.18053/jctres.03.2017S2.002.


  35. Feltz, A., and F. Cova. 2014. Moral responsibility and free will: A meta-analysis. Consciousness and Cognition 30: 234–246. https://doi.org/10.1016/j.concog.2014.08.012.


  36. Feltz, A., and C. Zarpentine. 2010. Do you know more when it matters less? Philosophical Psychology 23 (5): 683–706. https://doi.org/10.1080/09515089.2010.514572.


  37. Fiedler, K., and N. Schwarz. 2016. Questionable research practices revisited. Social Psychological and Personality Science 7 (1): 45–52. https://doi.org/10.1177/1948550615612150.


  38. Findley, M.G., N.M. Jensen, E.J. Malesky, and T.B. Pepinsky. 2016. Can results-free review reduce publication bias? The results and implications of a pilot study. Comparative Political Studies 49 (13): 1667–1703. https://doi.org/10.1177/0010414016655539.


  39. Fraley, R.C., and S. Vazire. 2014. The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLoS One 9 (10): e109019. https://doi.org/10.1371/journal.pone.0109019.


  40. Franco, A., N. Malhotra, and G. Simonovits. 2014. Publication bias in the social sciences: Unlocking the file drawer. Science 345 (6203): 1502–1505. https://doi.org/10.1126/science.1255484.


  41. Gilbert, D.T., G. King, S. Pettigrew, and T.D. Wilson. 2016. Comment on “estimating the reproducibility of psychological science”. Science 351 (6277): 1037–1037. https://doi.org/10.1126/science.aad7243.


  42. Greene, J.D., R.B. Sommerville, L.E. Nystrom, J.M. Darley, and J.D. Cohen. 2001. An fMRI investigation of emotional engagement in moral judgment. Science 293 (5537): 2105–2108. https://doi.org/10.1126/science.1062872.


  43. Greene, J. D., Morelli, S.A., Lowenberg, K., Nystrom, L.E., & Cohen, J.D. 2008. Cognitive load selectively interferes with utilitarian moral judgment. Cognition 107 (3): 1144–1154.

  44. Grens, K. (2014). The rules of replication. Retrieved November 8, 2017, from http://www.the-scientist.com/?articles.view/articleNo/41265/title/The-Rules-of-Replication/

  45. Heine, S.J., D.R. Lehman, K. Peng, and J. Greenholtz. 2002. What's wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology 82 (6): 903–918. https://doi.org/10.1037//0022-3514.82.6.903.


  46. Hendrick, C. 1990. Replications, strict replications, and conceptual replications: Are they important? Journal of Social Behavior and Personality 5 (4): 41–49.


  47. Hitchcock, C., & Knobe, J. (2009). Cause and norm. The Journal of Philosophy 106 (11): 587–612.

  48. Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2 (8): e124. https://doi.org/10.1371/journal.pmed.0020124.


  49. John, L.K., G. Loewenstein, and D. Prelec. 2012. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science 23 (5): 524–532. https://doi.org/10.1177/0956797611430953.


  50. Knobe, J. 2016. Experimental philosophy is cognitive science. In A companion to experimental philosophy, ed. J. Sytsma and W. Buckwalter, 37–52. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118661666.ch3.

  51. Knobe, J. 2003a. Intentional action and side effects in ordinary language. Analysis 63 (279): 190–194.

  52. Knobe, J. 2003b. Intentional action in folk psychology: An experimental investigation. Philosophical Psychology 16 (2): 309–324.

  53. Knobe, J., & Burra, A. 2006. The folk concepts of intention and intentional action: A cross-cultural study. Journal of Cognition and Culture 6 (1): 113–132.

  54. Knobe, J. 2007. Experimental Philosophy. Philosophy Compass 2 (1): 81–92.

  55. Knobe, J., and S. Nichols. 2008. Experimental philosophy. Oxford University Press.

  56. Knobe, J., W. Buckwalter, S. Nichols, P. Robbins, H. Sarkissian, and T. Sommers. 2012. Experimental philosophy. Annual Review of Psychology 63 (1): 81–99. https://doi.org/10.1146/annurev-psych-120710-100350.


  57. Lakens, D. 2013. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology 4: 863.


  58. Lakens, D., F.G. Adolfi, C. Albers, F. Anvari, M.A.J. Apps, S.E. Argamon, et al. 2017. Justify your alpha: a response to “Redefine statistical significance”. PsyArXiv. https://doi.org/10.17605/OSF.IO/9S3Y6.

  59. Lam, B. 2010. Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics. Cognition 115 (2), 320–329.

  60. Lash, T.L., and J.P. Vandenbroucke. 2012. Should preregistration of epidemiologic study protocols become compulsory? Reflections and a counterproposal. Epidemiology 23 (2): 184–188. https://doi.org/10.1097/EDE.0b013e318245c05b.


  61. Li, J., L. Liu, E. Chalmers, and J. Snedeker. 2018. What is in a name?: The development of cross-cultural differences in referential intuitions. Cognition 171: 108–111. https://doi.org/10.1016/j.cognition.2017.10.022.


  62. Liao, S. 2015. The state of reproducibility in experimental philosophy. Retrieved from http://philosophycommons.typepad.com/xphi/2015/06/the-state-of-reproducibility-in-experimental-philosophy.html

  63. Locascio, J. 2017. Results blind science publishing. Basic and Applied Social Psychology 39 (5): 239–246. https://doi.org/10.1080/01973533.2017.1336093.


  64. Machery, E., Mallon, R., Nichols, S., & Stich, S. P. 2004. Semantics, cross-cultural style. Cognition 92 (3): B1–B12.

  65. Machery, E. 2017a. Philosophy within its proper bounds. Oxford: Oxford University Press.


  66. Machery, E. 2017b. What is a replication? Unpublished manuscript.

  67. Makel, M.C., & Plucker, J.A. 2014. Facts are more important than novelty: Replication in the education sciences. Educational Researcher 43 (6): 304–316.

  68. Malle, B. F. 2006. Intentionality, morality, and their relationship in human judgment. Journal of Cognition and Culture 6 (1), 87–112.

  69. Maxwell, S.E., M.Y. Lau, and G.S. Howard. 2015. Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? The American Psychologist 70 (6): 487–498. https://doi.org/10.1037/a0039400.


  70. McShane, B.B., Gal, D., Gelman, A., Robert, C., & Tackett, J L. (2017). Abandon Statistical Significance. arXiv preprint. arXiv:1709.07588.

  71. Munafò, M.R., B.A. Nosek, D.V.M. Bishop, K.S. Button, C.D. Chambers, N.P. du Sert, et al. 2017. A manifesto for reproducible science. Nature Human Behaviour 1 (21): 1–9. https://doi.org/10.1038/s41562-016-0021.


  72. Murtaugh, P.A. 2014. In defense of p-values. Ecology 95 (3): 611–617. https://doi.org/10.1890/13-0590.1.


  73. Nadelhoffer, T., & Feltz, A. 2008. The actor–observer bias and moral intuitions: adding fuel to Sinnott-Armstrong’s fire. Neuroethics 1 (2): 133–144.

  74. Nadelhoffer, T., Kvaran, T., & Nahmias, E. 2009. Temperament and intuition: A commentary on Feltz and Cokely. Consciousness and cognition, 18 (1): 351–355.

  75. Nakagawa, S., and T.H. Parker. 2015. Replicating research in ecology and evolution: Feasibility, incentives, and the cost-benefit conundrum. BMC Biology 13 (88): 1–6. https://doi.org/10.1186/s12915-015-0196-3.


  76. Nahmias, E., Morris, S.G., Nadelhoffer, T., & Turner, J. (2006). Is incompatibilism intuitive? Philosophy and Phenomenological Research 73 (1): 28–53.

  77. Nichols, S. 2004. After objectivity: An empirical study of moral judgment. Philosophical Psychology 17 (1): 3–26.

  78. Nichols, S. 2006. Folk intuitions on free will. Journal of Cognition and Culture 6 (1): 57–86.

  79. Nichols, S., & Knobe, J. 2007. Moral responsibility and determinism: The cognitive science of folk intuitions. Nous 41 (4): 663–685.

  80. Nosek, B.A., and T.M. Errington. 2017. Reproducibility in cancer biology: Making sense of replications. eLife 6: e23383. https://doi.org/10.7554/eLife.23383.


  81. Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (in press). The preregistration revolution. Proceedings of the National Academy of Sciences.

  82. O’Neill, E., and E. Machery. 2014. Experimental philosophy: What is it good for? In Current controversies in experimental philosophy, ed. E. Machery and E. O’Neill. New York: Routledge.


  83. Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.


  84. Reuter, K. 2011. Distinguishing the appearance from the reality of pain. Journal of Consciousness Studies 18 (9-10): 94–109.

  85. Rose, D., and D. Danks. 2013. In defense of a broad conception of experimental philosophy. Metaphilosophy 44 (4): 512–532. https://doi.org/10.1111/meta.12045.


  86. Rose, D., Machery, E., Stich, S., Alai, M., Angelucci, A., Berniūnas, R., … & Cohnitz, D. (in press). Nothing at stake in knowledge. Noûs.

  87. Rosenthal, R. 1979. The file drawer problem and tolerance for null results. Psychological Bulletin 86 (3): 638–641. https://doi.org/10.1037/0033-2909.86.3.638.


  88. Schmidt, S. 2009. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology 13 (2): 90–100. https://doi.org/10.1037/a0015108.


  89. Scott, S. 2013. Pre-registration would put science in chains. Retrieved July 29, 2017, from https://www.timeshighereducation.com/comment/opinion/pre-registration-would-put-science-in-chains/2005954.article

  90. Simmons, J.P., L.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22 (11): 1359–1366. https://doi.org/10.1177/0956797611417632.


  91. Simonsohn, U., L.D. Nelson, and J.P. Simmons. 2014. P-curve: A key to the file-drawer. Journal of Experimental Psychology: General 143 (2): 534.


  92. Sprouse, J., & Almeida, D. 2017. Setting the empirical record straight: Acceptability judgments appear to be reliable, robust, and replicable. Behavioral and Brain Sciences 40: e311.

  93. Stroebe, W., and F. Strack. 2014. The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science 9 (1): 59–71.


  94. Trafimow, D., and B.D. Earp. 2017. Null hypothesis significance testing and type I error: The domain problem. New Ideas in Psychology 45: 19–27.


  95. Weinberg, J.M., S. Nichols, and S. Stich. 2001. Normativity and epistemic intuitions. Philosophical Topics 29 (1/2): 429–460.


  96. Woolfolk, R.L. 2013. Experimental philosophy: A methodological critique. Metaphilosophy 44 (1–2): 79. https://doi.org/10.1111/meta.12016.


  97. Young, N.S., Ioannidis, J.P., & Al-Ubaydli, O. 2008. Why current publication practices may distort science. PLoS Medicine 5 (10): e201.

  98. Zalla, T., & Leboyer, M. 2011. Judgment of intentionality and moral evaluation in individuals with high functioning autism. Review of Philosophy and Psychology 2 (4), 681–698.


Acknowledgments

This project would not have been possible without the financial support of multiple organizations. Florian Cova’s work on this project was supported by a grant from the Cogito Foundation (Grant No. S-131/13, “Towards an Experimental Philosophy of Aesthetics”).

Brent Strickland’s work was supported by two grants from the Agence Nationale de la Recherche (Grants No. ANR-10-IDEX-0001-02 PSL*, ANR-10-LABX-0087 IEC).

Matteo Colombo, Noah van Dongen, Felipe Romero and Jan Sprenger’s work was supported by the European Research Council (ERC) through Starting Grant. No. 640638 (“Making Scientific Inferences More Objective”).

Rodrigo Diaz and Kevin Reuter would like to acknowledge funding from the Swiss National Science Foundation, Grant No. 100012_169484.

Antonio Gaitán Torres and Hugo Viciana benefited from funding from the Ministerio de Economía y Competitividad for the project “La constitución del sujeto en la interacción social” (Grant No. FFI2015-67569-C2-1-P & FFI2015-67569-C2-2-P).

José Hernández-Conde carried out his work as a Visiting Scholar at the University of Pittsburgh’s HPS Department. He was financially supported by a PhD scholarship and mobility grant from the University of the Basque Country, and by the Spanish Ministry of Economy and Competitiveness research project No. FFI2014-52196-P. His replication research was supported by the Pittsburgh Empirical Philosophy Lab.

Hanna Kim’s work was supported by the Pittsburgh Empirical Philosophy Lab.

Shen-yi Liao’s work was supported by the University of Puget Sound Start-up Funding.

Tania Moerenhout carried out her work as a Visiting Researcher at the Center for Bioethics and Health Law, University of Pittsburgh, PA (Aug 2016-July 2017).

Aurélien Allard, Miklos Kurthy, and Paulo Sousa are grateful to Rashmi Sharma for her help in the replication of Knobe & Burra (2006), in particular for her help in translating the demographic questions from English to Hindi.

Ivar Hannikainen and Florian Cova would like to thank Uri Simonsohn for his help in discussing the meaning and best interpretation of p-curves.

Finally, we would like to thank all the authors of the original studies who took the time to answer our questions, share their original materials and data, and discuss the results of our replication attempts with us.

Author information


Corresponding author

Correspondence to Florian Cova.

Additional information

OSF Repository

Details, methods and results for all replications can be found online at https://osf.io/dvkpr/

Software

Most of the analyses reported in this manuscript were conducted using the R packages {compute.es} (Del Re 2015) and {pwr} (Champely 2018). We are also indebted to Lakens’ R2D2 sheet (Lakens 2013).


Appendices

Appendix 1. List of Studies Selected for Replication

(Crossed-out studies are studies that were planned for replication but did not get replicated.)

2003

Most cited: Knobe, J. (2003a). Intentional action and side effects in ordinary language. Analysis, 63(279), 190–194. [Study 1] (Content-based, successful, osf.io/hdz5x/).

Random: Knobe, J. (2003b). Intentional action in folk psychology: An experimental investigation. Philosophical Psychology, 16(2), 309–324. [Study 1] (Content-based, successful, osf.io/78sqa/).

2004

Most cited: Machery, E., Mallon, R., Nichols, S., & Stich, S. P. (2004). Semantics, cross-cultural style. Cognition, 92(3), B1-B12. (Demographic effect, successful, osf.io/qdekc/)

  • Replacement: Knobe, J. (2004). Intention, intentional action and moral considerations. Analysis, 64(282), 181–187. [Study 1] (Content-based, successful, osf.io/ka5wv/)

Random 1: Nadelhoffer, T. (2004). Blame, Badness, and Intentional Action: A Reply to Knobe and Mendlow. Journal of Theoretical and Philosophical Psychology, 24(2), 259–269. (Content-based, unsuccessful, osf.io/w9bza/).

Random 2: Nichols, S. (2004). After objectivity: An empirical study of moral judgment. Philosophical Psychology, 17(1), 3–26. [Study 3] (Content-based, successful, osf.io/bv4ep/).

2005

Most cited: Nahmias, E., Morris, S., Nadelhoffer, T., & Turner, J. (2005). Surveying freedom: Folk intuitions about free will and moral responsibility. Philosophical Psychology, 18(5), 561–584. [Study 1] (Content-based, successful, osf.io/4gvd5/).

Random 1: McCann, H. J. (2005). Intentional action and intending: Recent empirical studies. Philosophical Psychology, 18(6), 737–748. [Study 1] (Context-based, null effect, successful, osf.io/jtsnn/).

Random 2: Nadelhoffer, T. (2005). Skill, luck, control, and intentional action. Philosophical Psychology, 18(3), 341–352. [Study 1] (Content-based, successful, osf.io/6ds5e/).

2006

Most cited: Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment testing three principles of harm. Psychological Science, 17(12), 1082–1089.

  • Replacement: Nahmias, E., Morris, S. G., Nadelhoffer, T., & Turner, J. (2006). Is incompatibilism intuitive? Philosophy and Phenomenological Research, 73(1), 28–53. [Study 2] (Content-based, unsuccessful, osf.io/m8t3k/)

Random 1: Knobe, J., & Burra, A. (2006). The folk concepts of intention and intentional action: A cross-cultural study. Journal of Cognition and Culture, 6(1), 113–132. (Content-based, successful, osf.io/p48sa/).

  • Replacement: Malle, B. F. (2006). Intentionality, morality, and their relationship in human judgment. Journal of Cognition and Culture, 6(1), 87–112.

  • Replacement: Nichols, S. (2006). Folk intuitions on free will. Journal of Cognition and Culture, 6(1), 57–86. [Study 2] (Content-based, successful, osf.io/8kf3p/).

Random 2: Nadelhoffer, T. (2006). Bad acts, blameworthy agents, and intentional actions: Some problems for juror impartiality. Philosophical Explorations, 9(2), 203–219. (Content-based, successful, osf.io/bv42c/).

*2007

Most cited: Nichols, S., & Knobe, J. (2007). Moral responsibility and determinism: The cognitive science of folk intuitions. Nous, 41(4), 663–685. [Study 1] (Content-based, successful, osf.io/stjwg/).

Random 1: Nahmias, E., Coates, D. J., & Kvaran, T. (2007). Free will, moral responsibility, and mechanism: Experiments on folk intuitions. Midwest Studies in Philosophy, 31(1), 214–242. (Content-based, successful, osf.io/pjdkg/).

Random 2: Livengood, J., & Machery, E. (2007). The folk probably don’t think what you think they think: Experiments on causation by absence. Midwest Studies in Philosophy, 31(1), 107–127. [Study 1] (Content-based, successful, osf.io/7er6r/).

*2008

Most cited: Greene, J. D., Morelli, S. A., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2008). Cognitive load selectively interferes with utilitarian moral judgment. Cognition, 107(3), 1144–1154. (Context-based, unsuccessful, but with deviations from the original procedure, see osf.io/yb38c).

Random 1: Gonnerman, C. (2008). Reading conflicted minds: An empirical follow-up to Knobe and Roedder. Philosophical Psychology, 21(2), 193–205. (Content-based, successful, osf.io/wy8ab/).

Random 2: Nadelhoffer, T., & Feltz, A. (2008). The actor–observer bias and moral intuitions: adding fuel to Sinnott-Armstrong’s fire. Neuroethics, 1(2), 133–144. (Context-based, unsuccessful, osf.io/jb8yp/).

*2009

Most cited: Hitchcock, C., & Knobe, J. (2009). Cause and norm. The Journal of Philosophy, 106(11), 587–612. (Content-based, successful, osf.io/ykt7z/).

Random 1: Roxborough, C., & Cumby, J. (2009). Folk psychological concepts: Causation. Philosophical Psychology, 22(2), 205–213. (Content-based, unsuccessful, osf.io/5eanz/).

Random 2: Nadelhoffer, T., Kvaran, T., & Nahmias, E. (2009). Temperament and intuition: A commentary on Feltz and Cokely. Consciousness and Cognition, 18(1), 351–355. (Demographic effect, null effect, unsuccessful, osf.io/txs86/).

*2010

Most cited: Beebe, J. R., & Buckwalter, W. (2010). The epistemic side-effect effect. Mind & Language, 25(4), 474–498. (Content-based, successful, osf.io/n6r3b/).

Random 1: Lam, B. (2010). Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics. Cognition, 115(2), 320–329.

  • Replacement: Sytsma, J., & Machery, E. (2010). Two conceptions of subjective experience. Philosophical Studies, 151(2), 299–327. [Study 1] (Demographic effect, successful, osf.io/z2fj8/).

Random 2: De Brigard, F. (2010). If you like it, does it matter if it’s real? Philosophical Psychology, 23(1), 43–57. (Content-based, successful, osf.io/cvuwy/).

*2011

Most cited: Alicke, M. D., Rose, D., & Bloom, D. (2011). Causation, norm violation, and culpable control. The Journal of Philosophy, 108(12), 670–696. [Study 1] (Content-based, unsuccessful, osf.io/4yuym/).

Random 1: Zalla, T., & Leboyer, M. (2011). Judgment of intentionality and moral evaluation in individuals with high functioning autism. Review of Philosophy and Psychology, 2(4), 681–698.

  • Replacement: Reuter, K. (2011). Distinguishing the appearance from the reality of pain. Journal of Consciousness Studies, 18(9–10), 94–109. (Observational data, successful, osf.io/3sn6j/).

Random 2: Sarkissian, H., Park, J., Tien, D., Wright, J. C., & Knobe, J. (2011). Folk moral relativism. Mind & Language, 26(4), 482–505. [Study 1] (Content-based, successful, osf.io/cy4b6/).

*2012

Most cited: Paxton, J. M., Ungar, L., & Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–177. [Study 1] (Context-based, unsuccessful, osf.io/ejmyw/).

Random 1: Schaffer, J., & Knobe, J. (2012). Contrastive knowledge surveyed. Noûs, 46(4), 675–708. [Study 1] (Content-based, successful, osf.io/z4e45/).

Random 2: May, J., & Holton, R. (2012). What in the world is weakness of will? Philosophical Studies, 157(3), 341–360. [Study 3] (Content-based, successful, osf.io/s37h6/).

*2013

Most cited: Nagel, J., San Juan, V., & Mar, R. A. (2013). Lay denial of knowledge for justified true beliefs. Cognition, 129(3), 652–661. (Content-based, successful, osf.io/6yfxz/).

Random 1: Beebe, J. R., & Shea, J. (2013). Gettierized Knobe effects. Episteme, 10(3), 219. (Content-based, successful, osf.io/k89fc/).

Random 2: Rose, D., & Nichols, S. (2013). The lesson of bypassing. Review of Philosophy and Psychology, 4(4), 599–619. [Study 1] (Content-based, null effect, successful, osf.io/ggw7c/).

*2014

Most cited: Murray, D., & Nahmias, E. (2014). Explaining away incompatibilist intuitions. Philosophy and Phenomenological Research, 88(2), 434–467. [Study 1] (Content-based, successful, osf.io/rpkjk/).

Random 1: Grau, C., & Pury, C. L. (2014). Attitudes towards reference and replaceability. Review of Philosophy and Psychology, 5(2), 155–168. (Demographic effect, unsuccessful, osf.io/xrhqe/).

Random 2: Liao, S., Strohminger, N., & Sripada, C. S. (2014). Empirically investigating imaginative resistance. The British Journal of Aesthetics, 54(3), 339–355. [Study 2] (Content-based, successful, osf.io/7e8hz/).

*2015

Most cited: Buckwalter, W., & Schaffer, J. (2015). Knowledge, stakes, and mistakes. Noûs, 49(2), 201–234. [Study 1] (Content-based, successful, osf.io/2ukpq/).

Random 1: Björnsson, G., Eriksson, J., Strandberg, C., Olinder, R. F., & Björklund, F. (2015). Motivational internalism and folk intuitions. Philosophical Psychology, 28(5), 715–734. [Study 2] (Content-based, successful, osf.io/d8uvg/).

Random 2: Kominsky, J. F., Phillips, J., Gerstenberg, T., Lagnado, D., & Knobe, J. (2015). Causal superseding. Cognition, 137, 196–209. [Study 1] (Content-based, successful, osf.io/f5svw/).

Appendix 2. Pre-replication form

Reference of the paper: ….

Replication team: ….

*Which study in the paper do you replicate? ….

*If it is not the first study, please explain your choice: ….

*In this study, what is the main result you will focus on during replication? Please give all relevant statistical details present in the paper: ….

*What is the corresponding hypothesis? ….

*What is the corresponding effect size? ….

*Was the original effect size:

  • Explicitly reported in the original paper

  • Not explicitly reported in the original paper, but inferable from other information present in the original paper

  • Not inferable from information present in the original paper.

*What is the corresponding confidence interval (if applicable)?

*Was the original confidence interval:

  • Explicitly reported in the original paper

  • Not explicitly reported in the original paper, but inferable from other information present in the original paper

  • Not inferable from information present in the original paper.

*From which population was the sample used in the original study drawn? (Which country, language, students/non-students, etc.)

*Was the nature of the original population:

  • Explicitly reported in the original paper

  • Not explicitly reported in the original paper, but inferable from other information present in the original paper

  • Not inferable from information present in the original paper.

*What was the original sample size (N): ….

*Was the original sample size:

  • Explicitly reported in the original paper

  • Not explicitly reported in the original paper, but inferable from other information present in the original paper

  • Not inferable from information present in the original paper.

*Does the study involve a selection procedure (e.g. comprehension checks)? (YES/NO).

*If YES, describe it briefly: ….

*Were all the steps of the selection procedure (including, e.g., comprehension checks):

  • Explicitly reported in the original paper

  • Not explicitly reported in the original paper, but inferable from other information present in the original paper

  • Not inferable from information present in the original paper.

*Overall, would you say that the original paper contained all the information necessary to properly conduct the replication? (YES/NO).

*If NO, explain what information was lacking: ….

Power analysis and required sample size:

(Please briefly describe the power analysis you conducted to determine the minimum required sample size. If the original effect is a null effect, just describe the required sample size you obtained by doubling the original sample size.)

Projected sample size:

(Please, describe the actual sample size you plan to use in the replication.)

Appendix 3. Post-replication form

Reference of the paper: ….

Replication team: ….

Methods

Power analysis and required sample size:

(Please briefly describe the power analysis you conducted to determine the minimum required sample size. If the original effect is a null effect, just describe the required sample size you obtained by doubling the original sample size.)

Actual sample size and population:

(Describe the number of participants you actually recruited, and the nature of the population they are drawn from. Indicate whether the number of participants you actually recruited matched the one you planned on the OSF pre-registration. Describe briefly any difference between the population you drew your sample from and the population the original study drew its sample from.)

Materials and Procedure:

(Describe the procedure you employed for the replication, as you would in the Methods section of a paper. At the end, indicate all important differences between the original study and the replication, e.g., language.)

Results

Data analysis - Target effect:

(Focusing on the effect you singled out as the target effect for replication, describe the results you obtained. Then describe the statistical analyses you performed, detailing the effect size, the significance of the effect and, when applicable, the confidence interval.)

Data analysis - Other effects:

(If the original study included other effects and you performed the corresponding analyses, please, describe them in this section.)

Data analysis - Exploratory Analysis:

(If you conducted additional analyses that were absent from the original study, feel free to report them here. Just indicate whether they were planned in the OSF pre-registration, or exploratory.)

Discussion

Success assessment:

(Did you succeed in replicating the original result? If applicable, does the original team agree with you?)


Cite this article

Cova, F., Strickland, B., Abatista, A., et al. (2018). Estimating the Reproducibility of Experimental Philosophy. Review of Philosophy and Psychology. https://doi.org/10.1007/s13164-018-0400-9