Psychonomic Bulletin & Review, Volume 4, Issue 1, pp 79–95

Applying Occam’s razor in modeling cognition: A Bayesian approach

  • In Jae Myung
  • Mark A. Pitt


In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.
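As an illustrative sketch only (not taken from the paper), the role of functional form can be seen by comparing two models that have the same number of parameters and the same parameter ranges but combine those parameters differently. The example below approximates each model's marginal likelihood, that is, the likelihood averaged over a uniform prior, with a simple midpoint grid. The retention data, the two model equations, the parameter bounds, and all function names are hypothetical choices made for this illustration.

```python
import math

def log_binom_pmf(k, n, p):
    """Log binomial probability of k successes in n trials with success rate p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_likelihood(model, params, times, successes, n_trials):
    """Sum of binomial log-likelihoods across the observed time points."""
    return sum(log_binom_pmf(k, n_trials, model(t, *params))
               for t, k in zip(times, successes))

def log_marginal_likelihood(model, bounds, times, successes, n_trials, grid=200):
    """Midpoint-grid approximation of log p(data | model) for a two-parameter
    model under a uniform prior on the rectangle given by `bounds`.
    With a uniform prior, the marginal likelihood is the average likelihood."""
    (a_lo, a_hi), (b_lo, b_hi) = bounds
    logs = []
    for i in range(grid):
        a = a_lo + (i + 0.5) * (a_hi - a_lo) / grid
        for j in range(grid):
            b = b_lo + (j + 0.5) * (b_hi - b_lo) / grid
            logs.append(log_likelihood(model, (a, b), times, successes, n_trials))
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / len(logs))

def power_model(t, a, b):
    """Hypothetical power-law retention curve (assumes t >= 1)."""
    return a * t ** (-b)

def exponential_model(t, a, b):
    """Hypothetical exponential retention curve."""
    return a * math.exp(-b * t)

# Hypothetical data: items recalled out of 50 at each retention interval.
TIMES = [1, 2, 4, 8, 16]
RECALLED = [40, 34, 27, 20, 14]
N_TRIALS = 50
BOUNDS = ((0.0, 1.0), (0.0, 1.0))  # same parameter space for both models

log_m_power = log_marginal_likelihood(power_model, BOUNDS, TIMES, RECALLED, N_TRIALS)
log_m_exp = log_marginal_likelihood(exponential_model, BOUNDS, TIMES, RECALLED, N_TRIALS)
bayes_factor = math.exp(log_m_power - log_m_exp)

print(f"log marginal likelihood (power):       {log_m_power:.3f}")
print(f"log marginal likelihood (exponential): {log_m_exp:.3f}")
print(f"Bayes factor (power vs. exponential):  {bayes_factor:.3f}")
```

Because both models here have two parameters and identical uniform priors, any difference between the two marginal likelihoods reflects how well each functional form covers the data across its whole parameter space, not just at its best-fitting point. A grid is workable only for very low-dimensional models; higher-dimensional cases would need other numerical integration methods.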





Copyright information

© Psychonomic Society, Inc 1997

Authors and Affiliations

  1. Department of Psychology, Ohio State University, Columbus
