Addressing the theory crisis in psychology

  • Klaus Oberauer
  • Stephan Lewandowsky
Theoretical Review


A worrying number of psychological findings are not replicable. Diagnoses of the causes of this “replication crisis,” and recommendations to address it, have nearly exclusively focused on methods of data collection, analysis, and reporting. We argue that a further cause of poor replicability is the often weak logical link between theories and their empirical tests. We propose a distinction between discovery-oriented and theory-testing research. In discovery-oriented research, theories do not strongly imply hypotheses by which they can be tested, but rather define a search space for the discovery of effects that would support them. Failures to find these effects do not question the theory. This endeavor necessarily engenders a high risk of Type I errors—that is, publication of findings that will not replicate. Theory-testing research, by contrast, relies on theories that strongly imply hypotheses, such that disconfirmation of the hypothesis provides evidence against the theory. Theory-testing research engenders a smaller risk of Type I errors. A strong link between theories and hypotheses is best achieved by formalizing theories as computational models. We critically revisit recommendations for addressing the “replication crisis,” including the proposal to distinguish exploratory from confirmatory research, and the preregistration of hypotheses and analysis plans.
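The contrast in Type I error risk between the two research modes can be illustrated with a small calculation. The sketch below assumes that discovery-oriented research tests hypotheses with a low prior probability of being true, and that theory-testing research tests hypotheses with a higher one. The priors (0.1 and 0.5), the significance level (0.05), the power (0.8), and the function name are hypothetical values chosen for illustration, not figures taken from the article.

```python
def false_discovery_share(prior_true, alpha=0.05, power=0.8):
    """Share of significant results that are false positives.

    prior_true: assumed proportion of tested hypotheses that are true.
    alpha:      assumed Type I error rate of each test.
    power:      assumed probability of detecting a true effect.
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Hypothetical priors: a wide search space versus a strongly implied hypothesis.
discovery = false_discovery_share(prior_true=0.1)
theory_testing = false_discovery_share(prior_true=0.5)

print(f"discovery-oriented: {discovery:.3f}")
print(f"theory-testing:     {theory_testing:.3f}")
```

Under these assumed values, about 36% of significant findings in the low-prior setting are false positives, compared with about 6% in the high-prior setting. The numbers only illustrate the direction of the argument; the actual priors in any research area are unknown.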


Keywords: Replication, Scientific inference, Hypothesis testing, Computational modeling, Preregistration




Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Department of Psychology–Cognitive Psychology, University of Zurich, Zürich, Switzerland
  2. University of Bristol, Bristol, UK
  3. University of Western Australia, Crawley, Australia
