# Applying Occam’s razor in modeling cognition: A Bayesian approach

## Abstract

In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extent of the parameter space and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented, and implications of the results for evaluating models of cognition are discussed.
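The claim that a Bayesian criterion sees more than fit and parameter count can be illustrated with a toy binomial example. This is a minimal sketch, not the paper's own application. The models `linear` (p = θ) and `square` (p = θ²), the made-up data (52 successes in 100 trials), the uniform prior on θ ∈ [0, 1], and the midpoint-rule integration are all assumptions chosen for illustration. The sketch computes the marginal likelihood p(D | M) = ∫ p(D | θ, M) p(θ | M) dθ, which Bayesian model selection compares across models. It also computes the maximized likelihood, which is the only data-dependent term in AIC and BIC.

```python
import math

def log_lik(k, n, p):
    """Binomial log-likelihood of k successes in n trials with success probability p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0) at the boundaries
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

def theta_grid(grid):
    """Midpoints of `grid` equal-width cells covering the parameter range [0, 1]."""
    return [(j + 0.5) / grid for j in range(grid)]

def log_marginal(model, k, n, grid=20000):
    """log p(D|M): log of the integral over [0, 1] of p(D|theta, M) * p(theta|M),
    with an assumed uniform prior p(theta|M) = 1, via the midpoint rule and log-sum-exp."""
    logs = [log_lik(k, n, model(t)) for t in theta_grid(grid)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / grid)

def max_log_lik(model, k, n, grid=20000):
    """Best fit on the same grid: the only data-dependent term in AIC and BIC."""
    return max(log_lik(k, n, model(t)) for t in theta_grid(grid))

def linear(theta):
    return theta        # hypothetical model A: p = theta

def square(theta):
    return theta ** 2   # hypothetical model B: p = theta**2 (same single parameter)

if __name__ == "__main__":
    k, n = 52, 100  # made-up data: 52 "yes" responses in 100 trials

    # Nested case: M0 fixes p = 0.5 (no free parameter); M1 lets p vary (model A).
    log_m0 = log_lik(k, n, 0.5)          # nothing to integrate, so evidence = likelihood
    log_m1 = log_marginal(linear, k, n)  # closed form for this model: log(1 / (n + 1))
    print("max log-lik: M0 %.3f  M1 %.3f" % (log_m0, max_log_lik(linear, k, n)))
    print("Bayes factor M0:M1 = %.2f" % math.exp(log_m0 - log_m1))

    # Same parameter count and same best fit (hence equal AIC and BIC),
    # but different functional forms yield different marginal likelihoods.
    for name, model in (("p = theta", linear), ("p = theta^2", square)):
        print("%-12s max log-lik %.3f  log p(D|M) %.3f"
              % (name, max_log_lik(model, k, n), log_marginal(model, k, n)))
```

In the nested comparison, the model with a free parameter fits at least as well. Even so, the Bayes factor favors the fixed p = 0.5 model by a factor of roughly 7 for these data, which is Occam's razor at work. In the second comparison, both models reach the same best fit with one parameter each, so AIC and BIC cannot separate them. Their marginal likelihoods still differ, because under a uniform prior on θ, squaring θ concentrates the model's predictions near p = 0.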

## Keywords

Bayesian Information Criterion, Bayesian Method, Marginal Likelihood, Bayesian Model Selection, Journal of the American Statistical Association