Volume 84, Issue 1, pp 261–284

A Simple Method for Comparing Complex Models: Bayesian Model Comparison for Hierarchical Multinomial Processing Tree Models Using Warp-III Bridge Sampling

  • Quentin F. Gronau (corresponding author)
  • Eric-Jan Wagenmakers
  • Daniel W. Heck
  • Dora Matzke


Multinomial processing trees (MPTs) are a popular class of cognitive models for categorical data. Typically, researchers compare several MPTs, each equipped with many parameters, especially when the models are implemented in a hierarchical framework. A Bayesian solution is to compute posterior model probabilities and Bayes factors. Both quantities, however, rely on the marginal likelihood, a high-dimensional integral that cannot be evaluated analytically. In this case study, we show how Warp-III bridge sampling can be used to compute the marginal likelihood for hierarchical MPTs. We illustrate the procedure with two published data sets and demonstrate how Warp-III facilitates Bayesian model averaging.
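Once a log marginal likelihood is available for each candidate model (for example, from a bridge sampling estimator), the quantities named above follow from simple arithmetic. The sketch below is illustrative only: the function names, the three log marginal likelihood values, and the per-model parameter estimates are hypothetical and are not taken from the article. It assumes equal prior model probabilities unless others are supplied.

```python
import math


def posterior_model_probs(log_marglik, prior_probs=None):
    """Convert log marginal likelihoods into posterior model probabilities.

    Works on the log scale and subtracts the maximum before exponentiating,
    so very negative log marginal likelihoods do not underflow to zero.
    """
    n = len(log_marglik)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # assumption: equal prior model probabilities
    log_unnorm = [lm + math.log(p) for lm, p in zip(log_marglik, prior_probs)]
    m = max(log_unnorm)
    weights = [math.exp(v - m) for v in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]


def log_bayes_factor(log_marglik_a, log_marglik_b):
    """Log Bayes factor in favor of model A over model B."""
    return log_marglik_a - log_marglik_b


def model_average(probs, estimates):
    """Average per-model estimates, weighted by posterior model probability."""
    return sum(p * e for p, e in zip(probs, estimates))


# Hypothetical log marginal likelihoods for three models (illustration only).
log_ml = [-1000.0, -1002.0, -1005.0]
probs = posterior_model_probs(log_ml)
log_bf_12 = log_bayes_factor(log_ml[0], log_ml[1])

# Hypothetical per-model estimates of one shared parameter.
averaged = model_average(probs, [0.5, 0.6, 0.7])
```

With equal prior model probabilities, the ratio of two posterior model probabilities equals the corresponding Bayes factor, which is why both quantities depend only on the marginal likelihoods.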


Keywords: multinomial processing tree, Bayesian model comparison, Bayes factor, bridge sampling, Warp-III, posterior model probability, Bayesian model averaging

Supplementary material

Supplementary material 1: 11336_2018_9648_MOESM1_ESM.pdf (PDF, 897 KB)



Copyright information

© The Psychometric Society 2018

Authors and Affiliations

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. University of Mannheim, Mannheim, Germany
