Abstract
Bayesian theories of cognitive science hold that cognition is fundamentally probabilistic, but people’s explicit probability judgments often violate the laws of probability. Two recent proposals, the “Probability Theory plus Noise” (PT+N; Costello & Watts, Psychological Review, 121, 463–480, 2014) and “Bayesian Sampler” (Zhu et al., Psychological Review, 127, 719–748, 2020) theories of probability judgments, both seek to account for these biases while maintaining that mental credences are fundamentally probabilistic. These models differ in their averaged predictions about people’s conditional probability judgments and in their distributional predictions about overall patterns of judgments. In particular, the Bayesian Sampler’s Bayesian adjustment process predicts a truncated range of responses as well as a correlation between the average degree of bias and trial-to-trial variability. However, exploring these distributional predictions with participants’ raw responses requires a careful treatment of rounding errors and exogenous response processes. Here, I cast these theories into a Bayesian data analysis framework that supports the treatment of these issues along with principled model comparison using information criteria. Comparing the fits of both models on data collected by Zhu et al. (Psychological Review, 127(5), 719–748, 2020), I find these data are best explained by an account in which biases arise from “noise” in the sample-reading process, but in which conditional probability judgments are produced by conditioning within the mental model of the events, rather than by the two-stage mental sampling process proposed under the PT+N model.
Availability of Data and Materials
This paper presents secondary analyses of data. The datasets generated and/or analysed during the current study are available at https://osf.io/mgcxj/files/.
Code Availability
All analysis code is available at https://github.com/derekpowell/bayesian-sampler and at https://osf.io/bpkjf/.
Notes
It is worth noting that other non-sampling based approaches have been proposed to account for distortions in people’s use of explicit probabilities in decision-making (e.g. Zhang & Maloney, 2012, Zhang et al., 2020). Further theorizing might extend these accounts to also describe the generation of probability estimates, so that a probabilistic account of beliefs might not rest entirely on the assumption of sampling from mental models.
Rather than estimating model fit and then penalizing for model complexity, PSIS-LOO estimates out-of-sample prediction performance directly by estimating the expected log predictive density \(\widehat {\text {elpd}}\) of the model, or the expected probability of new unseen data (Gelman et al., 2014; Vehtari et al., 2017). From these calculations, an estimate of model complexity \(\hat {p}_{\text {LOO}}\) can also be derived. However, it is worth recognizing that formal measures of model complexity will not always track notions of simplicity or elegance in scientific explanation (for some related discussions, see Kuhn, 1977; Piantadosi, 2018; Sober, 2002).
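The logic of the \(\widehat {\text {elpd}}\) estimate can be sketched with a naive (unsmoothed) importance-sampling version of LOO. PSIS adds Pareto smoothing of the importance weights, so this toy estimator is illustrative only; all names and the simulated data here are hypothetical, not drawn from the analyses reported in the paper.

```python
import numpy as np

def elpd_is_loo(log_lik):
    """Naive importance-sampling estimate of the leave-one-out expected
    log pointwise predictive density (elpd).

    log_lik has shape (S, n): pointwise log-likelihoods for S posterior
    draws and n observations. Real analyses should use Pareto-smoothed
    importance sampling (PSIS; Vehtari et al., 2017), e.g. via the ArviZ
    or loo packages, as this raw estimator can be very noisy.
    """
    S = log_lik.shape[0]
    # elpd_i = -log((1/S) * sum_s exp(-log_lik[s, i])),
    # i.e., the log of the harmonic mean of the pointwise likelihoods.
    m = (-log_lik).max(axis=0)
    lse = m + np.log(np.exp(-log_lik - m).sum(axis=0))  # stable logsumexp
    return -(lse - np.log(S)).sum()

# Toy comparison: data simulated from N(0, 1); a model assuming the
# correct residual scale should earn a higher elpd than one with scale 3.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=200)
mu = rng.normal(0.0, 0.05, size=(2000, 1))  # fake posterior draws of mu
def loglik(scale):
    return (-0.5 * ((y - mu) / scale) ** 2
            - np.log(scale) - 0.5 * np.log(2 * np.pi))
print(elpd_is_loo(loglik(1.0)) > elpd_is_loo(loglik(3.0)))  # True
```

The harmonic-mean form follows from importance sampling with the full posterior as proposal for each leave-one-out posterior; the Pareto smoothing in PSIS exists precisely because these raw weights can have heavy tails.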
Uninformativeness was sought in order to reduce bias in the posterior parameter estimates. It should be acknowledged that a uniform prior does not exactly correspond to what the authors of the PT+N theory would predict, as they have frequently assumed d to be a fairly small value (e.g. Costello and Watts, 2017).
Strictly speaking, under the original form of the Bayesian sampler model, N and \(N^{\prime }\) are discrete parameters representing the number of distinct independent samples drawn. Given a particular implied d, this could create constraints on the possible values of \(d^{\prime }\), assuming β is held constant. However, Zhu et al. (2020) also consider the possibility that people draw non-independent mental samples, in which case N and \(N^{\prime }\) would represent the effective number of samples, accounting for their autocorrelation. In this case, we could treat this effective number of samples as a continuous quantity, and therefore imagine there are no clear constraints on d and \(d^{\prime }\) except the stipulation that d ≤ d′. These ideas will be developed further in the trial-level analyses.
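The correspondence this note relies on can be sketched concretely. With a symmetric Beta(β, β) prior updated by N mental samples, the Bayesian Sampler's expected judgment is linear in the true probability p, matching the PT+N form (1 − 2d)p + d with an implied d = β/(N + 2β) (Zhu et al., 2020). The helper names below are hypothetical; this is a sketch of that algebra, not the paper's model code.

```python
def bayesian_sampler_mean(p, n, beta):
    """Expected probability judgment under the Bayesian Sampler:
    posterior mean of a Beta(beta, beta) prior updated with n mental
    samples, a proportion p of which are successes (Zhu et al., 2020)."""
    return (n * p + beta) / (n + 2 * beta)

def implied_d(n, beta):
    """The PT+N-style distortion d implied by sampler parameters n, beta."""
    return beta / (n + 2 * beta)

# Treating n as a continuous "effective" number of samples, any d in
# (0, 1/2) can be attained, and n' <= n implies d <= d'.
p, n, beta = 0.3, 10.0, 1.0
d = implied_d(n, beta)
assert abs(bayesian_sampler_mean(p, n, beta) - ((1 - 2 * d) * p + d)) < 1e-12
```

Note that a smaller effective sample size yields a larger implied d, which is why the stipulation d ≤ d′ follows from N′ ≤ N.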
References
Anderson, J.R. (1991). The adaptive nature of human categorization. Psychological Review, 98 (3), 409–429. https://doi.org/10.1037/0033-295X.98.3.409.
Chater, N., Zhu, J.-Q., Spicer, J., Sundh, J., León-Villagrá, P., & Sanborn, A. (2020). Probabilistic biases meet the Bayesian brain. Current Directions in Psychological Science, 29(5), 506–512. https://doi.org/10.1177/0963721420954801.
Cook, J., & Lewandowsky, S. (2016). Rational irrationality: modeling climate change belief polarization using Bayesian networks. Topics in Cognitive Science, 8(1), 160–179. https://doi.org/10.1111/tops.12186.
Costello, F., & Watts, P. (2014). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, 121(3), 463–480. https://doi.org/10.1037/a0037010.
Costello, F., & Watts, P. (2016). People’s conditional probability judgments follow probability theory (plus noise). Cognitive Psychology, 89, 106–133. https://doi.org/10.1016/j.cogpsych.2016.06.006.
Costello, F., & Watts, P. (2017). Explaining high conjunction fallacy rates: The probability theory plus noise account. Journal of Behavioral Decision Making, 30(2), 304–321. https://doi.org/10.1002/bdm.1936.
Costello, F., & Watts, P. (2018). Invariants in probabilistic reasoning. Cognitive Psychology, 100, 1–16. https://doi.org/10.1016/j.cogpsych.2017.11.003.
Dasgupta, I., Schulz, E., & Gershman, S.J. (2017). Where do hypotheses come from? Cognitive Psychology, 96, 1–25. https://doi.org/10.1016/j.cogpsych.2017.05.001.
Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.) Formal representation of human judgment, pp 17–52, New York, Wiley.
Erev, I., Wallsten, T.S., & Budescu, D.V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101(3), 519–527. https://doi.org/10.1037/0033-295X.101.3.519.
Fennell, J., & Baddeley, R. (2012). Uncertainty plus prior equals rational bias: An intuitive Bayesian probability weighting function. Psychological Review, 119(4), 878–887. https://doi.org/10.1037/a0029346.
Franke, M., Dablander, F., Scholler, A., Bennett, E., Degen, J., Tessler, M.H., & Goodman, N.D. (2016). What does the crowd believe? A hierarchical approach to estimating subjective beliefs from empirical data, vol. 6.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., & Rubin, D.B. (2014). Bayesian data analysis (Third edn.). Boca Raton: CRC Press.
Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), 997–1016. https://doi.org/10.1007/s11222-013-9416-2.
Gershman, S.J., & Goodman, N.D. (2016). Amortized inference in probabilistic reasoning. vol. 7.
Griffiths, T.L., & Tenenbaum, J.B. (2006). Optimal predictions in everyday cognition. Psychological Science, 17(9), 767–773. https://doi.org/10.1111/j.1467-9280.2006.01780.x.
Howe, R., & Costello, F. (2020). Random variation and systematic biases in probability estimation. Cognitive Psychology, 123, 101306. https://doi.org/10.1016/j.cogpsych.2020.101306.
Jaynes, E.T. (2003). Probability theory: The logic of science (G. L. Bretthorst, Ed.) Cambridge, United Kingdom: Cambridge University Press.
Jern, A., Chang, K.K., & Kemp, C. (2014). Belief polarization is not always irrational. Psychological Review, 121(2), 206–224. https://doi.org/10.1037/a0035941.
Kahneman, D. (2013). Thinking, fast and slow (1st edn.) New York: Farrar, Straus and Giroux.
Kersten, D., Mamassian, P., & Yuille, A. (2004). Object perception as Bayesian inference. Annual Review of Psychology, 55(1), 271–304. https://doi.org/10.1146/annurev.psych.55.090902.142005.
Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in stan. In Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2015/hash/352fe25daf686bdb4edca223c921acea-Abstract.html.
Kuhn, T.S. (1977). The essential tension: Selected studies in scientific tradition and change. Chicago London: University of Chicago Press.
Lu, H., Chen, D., & Holyoak, K.J. (2012). Bayesian analogy with relational transformations. Psychological Review, 119(3), 617–648. https://doi.org/10.1037/a0028719.
Papaspiliopoulos, O., Roberts, G.O., & Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statistical Science, vol 22(1). https://doi.org/10.1214/088342307000000014.
Phan, D., Pradhan, N., & Jankowiak, M. (2019). Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv:1912.11554 [Cs, Stat].
Piantadosi, S.T. (2018). One parameter is always enough. AIP Advances, 8(9), 095118. https://doi.org/10.1063/1.5031956.
Powell, D. (2022). A descriptive Bayesian account of optimism in belief revision. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.) Proceedings of the 42nd annual conference of the cognitive science society.
Powell, D., Weisman, K., & Markman, E.M. (2018). Articulating lay theories through graphical models: A study of beliefs surrounding vaccination decisions. vol. 6.
Sanborn, A.N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Sciences, 20(12), 883–893. https://doi.org/10.1016/j.tics.2016.10.003.
Sivula, T., Magnusson, M., & Vehtari, A. (2020). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. arXiv:2008.10296 [Stat].
Sober, E. (2002). What is the problem of simplicity? In A. Zellner, H.A. Keuzenkamp, & M. McAleer (Eds.) Simplicity, inference and modelling (first, pp. 13–31). Cambridge University Press. https://doi.org/10.1017/CBO9780511493164.002.
Sundh, J., Zhu, J., Chater, N., & Sanborn, A. (2021). The mean-variance signature of Bayesian probability judgment. PsyArXiv. https://doi.org/10.31234/osf.io/yuhaz.
Tenenbaum, J.B., Kemp, C., Griffiths, T.L., & Goodman, N.D. (2011). How to grow a mind: statistics, structure, and abstraction. Science, 331(6022), 1279–1285. https://doi.org/10.1126/science.1192788.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293–315.
Tversky, A., & Koehler, D.J. (1994). Support theory: a nonextensional representation of subjective probability. Psychological Review, 101(4), 547–567. https://doi.org/10.1037/0033-295X.101.4.547.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4.
Vehtari, A., Simpson, D.P., Yao, Y., & Gelman, A. (2019). Limitations of “Limitations of Bayesian leave-one-out cross-validation for model selection”. Computational Brain & Behavior, 2(1), 22–27. https://doi.org/10.1007/s42113-018-0020-6.
Xu, F., & Tenenbaum, J.B. (2007). Word learning as Bayesian inference. Psychological Review, 114(2), 245–272. https://doi.org/10.1037/0033-295X.114.2.245.
Zhang, H., & Maloney, L.T. (2012). Ubiquitous log odds: A common representation of probability and frequency distortion in perception, action, and cognition. Frontiers in Neuroscience, vol. 6. https://doi.org/10.3389/fnins.2012.00001.
Zhang, H., Ren, X., & Maloney, L.T. (2020). The bounded rationality of probability distortion. Proceedings of the National Academy of Sciences, 117(36), 22024–22034. https://doi.org/10.1073/pnas.1922401117.
Zhu, J.-Q., Sanborn, A.N., & Chater, N. (2020). The Bayesian sampler: generic Bayesian inference causes incoherence in human probability judgments. Psychological Review, 127(5), 719–748. https://doi.org/10.1037/rev0000190.
Zhu, J.-Q., Sundh, J., Spicer, J., Chater, N., & Sanborn, A. (2021). The autocorrelated Bayesian sampler: A rational process for probability judgments, estimates, confidence intervals, choices, confidence judgments, and response times. PsyArXiv. https://doi.org/10.31234/osf.io/3qxf7.
Author information
Contributions
Derek Powell is the sole author of this manuscript.
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Competing Interests
The author declares no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix:
For the trial-level response models, participants’ rounded responses are modeled as discrete responses with a categorical (multinomial) distribution. For i ∈{0, 1,...,m}, where m = 20, define a set of cut points \(a_{i} = \frac {i}{m}-\frac {1}{2m}\) and \(b_{i} = \frac {i}{m}+\frac {1}{2m}\) around each of the m + 1 possible responses. Using x|[0,1] to denote that x is restricted to the domain [0, 1], the probability of each response given μ and N is:

\(p_{i} = B\left (b_{i}|_{[0,1]}, \alpha , \beta \right ) - B\left (a_{i}|_{[0,1]}, \alpha , \beta \right )\)
where B is the appropriate cumulative distribution function. To capture rounding to the nearest 10%, we define \(a_{i,10} = \frac {2i}{m}-\frac {1}{m}\) and \(b_{i,10} = \frac {2i}{m}+\frac {1}{m}\), so that the probability of each response is:

\(p_{i,10} = B\left (b_{i,10}|_{[0,1]}, \alpha , \beta \right ) - B\left (a_{i,10}|_{[0,1]}, \alpha , \beta \right )\)
Next, define a vector of mixture probabilities \(\overrightarrow {\phi }\), with the zeroth index indicating a “contaminant” process. Combining these response processes, we can define the marginal probability of each response as:
Responses themselves are then distributed as categorical:
For the noise-based model, B is the incomplete Beta function, the CDF of the Beta distribution. The computations for the Bayesian Sampler model response probabilities are identical, save that instead of B(x,α,β) we have \(B(f_{BS}^{-1}(x), \alpha , \beta )\) when computing the probability of each response pi, and we use N and \(N^{\prime }\) where appropriate. To see this, let X be the Beta-distributed success proportion from the mental sampling operations, ρ(A), and let Y be the distribution of resulting probabilities from the Bayesian Sampler model. Then Y = g(X), where g is the function defined in equation 13 from the manuscript.
Letting FX and FY be the CDFs of X and Y respectively, and noting that g is strictly increasing, we have that:

\(F_{Y}(y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_{X}(g^{-1}(y))\)
Putting this all together, define ZNB as the function which calculates the probability of each categorical response under the noise-based model given the inputs of \(\mu _{ijk}, d_{j}, d^{\prime }_{j}\) and ϕ. Here, \(\mu _{ijk} = f_{\text {NB}}(\overrightarrow {\theta _{jk}}, d_{j}, d^{\prime }_{j}, x_{ijk})\) computes the expected probability according to the PT+N theory except that it treats conditional probability judgments like simple probability judgments.
Finally, define ZBS as the function which calculates the probability of each categorical response under the Bayesian Sampler model. Note that, here, \(\mu _{ijk} = f_{0}(\overrightarrow {\theta _{jk}}, x_{ijk})\), where the value of μ depends only on the underlying probabilities and the query asked on a specific trial.
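As a concrete illustration, the discretized response distribution above can be sketched as follows. This is a simplified two-component version: it uses SciPy's Beta CDF for B, assumes a mean-precision parameterization (α = μN, β = (1 − μ)N) and a uniform contaminant process, and omits the round-to-10% component; these choices are assumptions of the sketch, not taken from the paper's model code.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def response_probs(mu, n, m=20):
    """P(rounded response = i/m) for i = 0..m when the latent judgment
    is Beta-distributed. Cut points a_i, b_i follow the appendix, with
    values restricted (clipped) to [0, 1]. The mean-precision shapes
    (mu * n, (1 - mu) * n) are an assumed parameterization."""
    i = np.arange(m + 1)
    a = np.clip(i / m - 1 / (2 * m), 0.0, 1.0)
    b = np.clip(i / m + 1 / (2 * m), 0.0, 1.0)
    cdf = lambda x: beta_dist.cdf(x, mu * n, (1 - mu) * n)
    # adjacent intervals tile [0, 1], so these probabilities sum to 1
    return cdf(b) - cdf(a)

def marginal_probs(mu, n, phi_contaminant, m=20):
    """Mixture with a uniform 'contaminant' response process (the
    paper's mixture also includes a round-to-10% component)."""
    p = response_probs(mu, n, m)
    return phi_contaminant / (m + 1) + (1 - phi_contaminant) * p

probs = marginal_probs(0.7, 10.0, 0.05)
assert np.isclose(probs.sum(), 1.0)
```

For the Bayesian Sampler variant, one would evaluate the same CDF at \(f_{BS}^{-1}(x)\) for each cut point, per the change-of-variables identity above.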
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Powell, D. Comparing Probabilistic Accounts of Probability Judgments. Comput Brain Behav 6, 228–245 (2023). https://doi.org/10.1007/s42113-022-00164-z