Applying Occam’s razor in modeling cognition: A Bayesian approach

Abstract

In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.
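As a rough illustration of how a Bayesian comparison can penalize unneeded flexibility, the sketch below compares two models of a coin-flip sequence by their marginal likelihoods. This toy example is not taken from the paper: the two models, the uniform prior, and all function names are assumptions chosen for illustration. The "simple" model fixes the success probability at 0.5, and the "flexible" model lets it range over [0, 1] with a uniform prior.

```python
from math import comb


def marginal_likelihood_fixed(k: int, n: int, p: float = 0.5) -> float:
    """Probability of one specific sequence with k successes in n trials
    under a model with no free parameters (p is fixed)."""
    return p**k * (1 - p) ** (n - k)


def marginal_likelihood_uniform(k: int, n: int) -> float:
    """Probability of the same sequence averaged over a uniform prior on p.

    The integral of p^k (1 - p)^(n - k) over [0, 1] is the Beta function
    B(k + 1, n - k + 1), which equals 1 / ((n + 1) * C(n, k)).
    """
    return 1.0 / ((n + 1) * comb(n, k))


def bayes_factor_simple_over_flexible(k: int, n: int) -> float:
    """Ratio of marginal likelihoods; values above 1 favor the simple model."""
    return marginal_likelihood_fixed(k, n) / marginal_likelihood_uniform(k, n)


if __name__ == "__main__":
    # Hypothetical data: 5 heads in 10 flips, then 9 heads in 10 flips.
    for k in (5, 9):
        bf = bayes_factor_simple_over_flexible(k, 10)
        print(f"k={k}, n=10: Bayes factor (simple / flexible) = {bf:.3f}")
```

With balanced data the simple model should come out ahead even though the flexible model can fit the data at least as well at its best parameter value, because the flexible model spreads its prior mass over many outcomes it did not need to predict. With lopsided data the ratio should drop below 1 and the flexible model is preferred.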

Author information

Corresponding author

Correspondence to In Jae Myung.

Additional information

A portion of this work was presented at the 27th annual meeting of the Society for Mathematical Psychology held at the University of California, Irvine, in August 1995. Many people provided very useful feedback on earlier versions of this paper. They include Greg Ashby, Michael Browne, Jerry Busemeyer, Dan Friedman, Lester Krueger, Duncan Luce, Robert MacCallum, Dominic Massaro, Richard Schweickert, James Townsend, Michael Wenger, and Patricia van Zandt. Greg Ashby and Lester Krueger were especially helpful in sharpening our thinking on model complexity. This research was supported in part by Ohio Supercomputer Center Grant PAS887-1.

About this article

Cite this article

Myung, I.J., Pitt, M.A. Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review 4, 79–95 (1997). https://doi.org/10.3758/BF03210778

  • DOI: https://doi.org/10.3758/BF03210778

Keywords

  • Bayesian Information Criterion
  • Bayesian Method
  • Marginal Likelihood
  • Bayesian Model Selection
  • Journal of the American Statistical Association