Discussions of model selection in the psychological literature typically frame the issue as a question of statistical inference, where the goal is to determine which model makes the best predictions about data. Within this setting, advocates of leave-one-out cross-validation and Bayes factors disagree on precisely which prediction problem model selection should aim to solve. In this comment, I discuss some of these issues from a scientific perspective. What goal does model selection serve when all models are known to be systematically wrong? How might “toy problems” tell a misleading story? How does the scientific goal of explanation align with (or differ from) traditional statistical concerns? I do not offer answers to these questions, but hope to highlight the reasons why psychological researchers cannot avoid asking them.
While there are many people who assert that “a single failure is enough to falsify a theory”, I confess I have not yet encountered anyone willing to truly follow this principle in real life.
For instance, Gelman et al. (2003, pp. 586–587) present an analogous convergence result for the posterior distribution P(𝜃|x) within a single model M. The result generalises to the Bayes factor by noting that the Bayes factor identifies a model with the prior predictive distribution P(x|M). Substituting P(x|M) for the role of P(x|𝜃) in their derivation produces the necessary result.
For the purposes of full disclosure, I should note that the precise situation from Lee and Navarro (2002) is quite a bit more complex than this description implies, and several details about how we had to adapt a model from one context to be applicable to the other have been omitted.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B.N. Petrov & F. Csaki (Eds.) Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Bernardo, J.M., & Smith, A.F.M. (2000). Bayesian theory, 2nd Edn. New York: John Wiley & Sons.
Box, G.E.P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Browne, M. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44, 108–132.
Devezer, B., Nardin, L.G., Baumgaertner, B., Buzbas, E. (under review). Discovery of truth is not implied by reproducibility but facilitated by innovation and epistemic diversity in a model-centric framework. Manuscript submitted for publication. arXiv:1803.10118.
Edwards, W., Lindman, H., Savage, L.J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (2003). Bayesian data analysis, 2nd Edn. Boca Raton: Chapman & Hall/CRC.
Grünwald, P. (2007). The minimum description length principle. Cambridge: MIT Press.
Gronau, Q., & Wagenmakers, E.J. (2018). Limitations of Bayesian leave-one-out cross-validation for model selection. Computational Brain and Behavior.
Hayes, B.K., Banner, S., Forrester, S., Navarro, D.J. (under review). Sampling frames and inductive inference with censored evidence. Manuscript submitted for publication. https://doi.org/10.17605/OSF.IO/2M83V.
Kamin, L.J. (1969). Predictability, surprise, attention, and conditioning. In B.A. Campbell & R.M. Church (Eds.) Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Kruschke, J.K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.
Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.
Lattal, K.M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and instrumental conditioning. Animal Learning & Behavior, 26(3), 351–360.
Lee, M.D. (2001a). On the complexity of additive clustering models. Journal of Mathematical Psychology, 45, 131–148.
Lee, M.D. (2001b). Determining the dimensionality of multidimensional scaling models for cognitive modeling. Journal of Mathematical Psychology, 45, 149–166.
Lee, M.D., & Navarro, D.J. (2002). Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin & Review, 9, 43–58.
Navarro, D.J. (2004). A note on the applied use of MDL approximations. Neural Computation, 16, 1763–1768.
Navarro, D.J., Dry, M.J., Lee, M.D. (2012). Sampling assumptions in inductive generalization. Cognitive Science, 36, 187–223.
Navarro, D.J., Pitt, M.A., Myung, I.J. (2004). Assessing the distinguishability of models and the informativeness of data. Cognitive Psychology, 49, 47–84.
Pavlov, I. (1927). Conditioned reflexes. London: Oxford University Press.
Pitt, M.A., Myung, I.J., Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Pitt, M.A., Kim, W., Navarro, D.J., Myung, J.I. (2006). Global model analysis by parameter space partitioning. Psychological Review, 113, 57–83.
Rescorla, R.A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5.
Rescorla, R.A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504–509.
Rescorla, R.A. (1971). Variations in the effectiveness of reinforcement following prior inhibitory conditioning. Learning and Motivation, 2, 113–123.
Rescorla, R.A., & Wagner, A.R. (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black & W.F. Prokasy (Eds.) Classical conditioning II: current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Ransom, K., Perfors, A., Navarro, D.J. (2016). Leaping to conclusions: why premise relevance affects argument strength. Cognitive Science, 40, 1775–1796.
Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
Schultz, W., Dayan, P., Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486–494.
Shiffrin, R.M., Borner, K., Stigler, S.M. (2018). Scientific progress despite irreproducibility: a seeming paradox. Proceedings of the National Academy of Sciences, USA, 115, 2632–2639.
Tenenbaum, J.B., & Griffiths, T.L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–640.
Vehtari, A., Simpson, D., Yao, Y., Gelman, A. (2018). Limitations of Bayesian leave-one-out cross-validation. Computational Brain and Behavior.
Vehtari, A., & Ojanen, J. (2012). A survey of Bayesian predictive methods for model assessment, selection and comparison. Statistics Surveys, 6, 142–228.
Voorspoels, W., Navarro, D.J., Perfors, A., Ransom, K., Storms, G. (2015). How do people learn from negative evidence? Non-monotonic generalizations and sampling assumptions in inductive reasoning. Cognitive Psychology, 81, 1–25.
Wagenmakers, E.J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
Wickelgren, W.A. (1972). Trace resistance and decay of long-term memory. Journal of Mathematical Psychology, 9, 418–455.
I am grateful to many people for helpful conversations and comments that shaped this paper, most notably Nancy Briggs, Berna Devezer, Chris Donkin, Olivia Guest, Daniel Simpson, Iris van Rooij and Fred Westbrook.
Cite this article
Navarro, D.J. Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection. Comput Brain Behav 2, 28–34 (2019). https://doi.org/10.1007/s42113-018-0019-z
- Model selection