
Subjective Probability as Sampling Propensity


Abstract

Subjective probability plays an increasingly important role in many fields concerned with human cognition and behavior. Yet there have been significant criticisms of the idea that probabilities could actually be represented in the mind. This paper presents and elaborates a view of subjective probability as a kind of sampling propensity associated with internally represented generative models. The resulting view answers some of the best-known criticisms of subjective probability, and is also supported by empirical work in neuroscience and behavioral psychology. The repercussions of the view for how we conceive of many ordinary instances of subjective probability, and for how it relates to more traditional conceptions of subjective probability, are discussed in some detail.


Notes

  1. (C) is usually, but not always, made explicit in analyses of probability. As many have argued (e.g., James 1890; Ramsey 1931; Williamson 2015), a notion of idle belief, totally disconnected from any kind of choice or action, seems suspect. Some evidence in neuroscience suggests that, for low-level cognition, representation of utility and probabilistic expectation cannot be separated (Gershman and Daw 2012). Others insist that subjective probability is conceptually separable from any relation it might have to action or choice (Eriksson and Hájek 2007). Though we assume (C) as part of the concept, not much will hinge on whether probability and choice are really conceptually separable.

  2. Of course, the many problems with these proposals, even for the types of scenarios for which they are designed, are well known. See Zynda (2000) on representation theorems, and Eriksson and Hájek (2007) for many of the central puzzles and problems with all conceptual analyses considered in the literature.

  3. This is an example from Raiffa (1968).

  4. There are, of course, those who argue that intuition outstrips explicit calculation, particularly in cases like these (Dreyfus and Dreyfus 1986).

  5. This example is taken from Griffiths and Tenenbaum (2006). Examples 3 and 4 from above could be used to illustrate the same point.

  6. We will use this scenario, a simplified version of one from Rumelhart et al. (1986), as a running example through the next few sections.

  7. This is particularly bad news for anyone who wants to define subjective probability on the basis of representation theorems. There is obviously no probability function that will agree with such an ordering, since P(A & B)≤P(A) for any A and B.

  8. Cf. Vul (2010), who earlier used this illustration.

  9. Presumably, a more realistic probabilistic model of this domain would have to include higher-order correlations as well. For instance, if we added minibar to the network, then while sink and chair are anti-correlated in general, in the presence of minibar they might be correlated. This kind of structure can be captured by more general Markov random fields. See next section.

  10. Though this is a very different proposal from what Harman suggested (Harman 1986, Ch. 1). See Millgram (1991) for a response to Harman also drawing on (approximate inference for) graphical models. Millgram goes further, questioning whether any “hardness argument” could ever cast doubt on the idea that we ought to reason in a way consistent with numerical probability.

  11. Even within the class of Markov random fields, Boltzmann Machines are a small subclass, representing only distributions that can be written with pairwise potential functions.

  12. See also Stewart et al. (2006) on an application of this idea to decision making.

  13. This only works for relatively small discrete hypothesis spaces. For continuous spaces a natural alternative proposal would be to construct a density estimate and return the mean, for instance.

  14. For the Boltzmann Machine, which defines a biased sampler (see Sections 7 and 9, Appendix A), one must specify how many iterations to run before the current state is returned as a single, bona fide sample.

  15. It is also conceivable that the actions themselves could be incorporated into the generative process. See, e.g., Solway and Botvinick (2012) and the idea of planning as inference, where an action is inferred from a model by conditioning on the action having desirable consequences.

  16. See Marr (1982), and also Anderson (1990). Compare this with the following quotation from Churchland and Sejnowski (1994):

    In the Boltzmann Machine, matters of computation, of algorithm, and of implementation are not readily separable. It is the very physical configuration of the input that directly encodes the computational problem, and the algorithm is nothing other than the very process whereby the physical system settles into the solution. (92)

    For the Bayesian psychologist, the mere specification of the machine leaves out a crucial level of computational explanation: understanding what function the machine is computing, or “attempting” to compute (Griffiths et al. 2010).

  17. Some have pointed out that many Bayesian models in cognitive science (provably) cannot be tractably approximated, which has been taken to throw the rationality claim into question (Kwisthout et al. 2008). The general trend has been to back off to a notion of bounded rationality, though there remain difficult and interesting questions on this front (Griffiths et al. 2015; Icard 2014).

  18. Cf. Chater et al. (2006), Box 2, where a similar point is made.

  19. The results in the paper are stated in terms of the log likelihood ratio, also known as the weight of evidence (Good 1950). But since the two hypotheses have equal prior probability in these experiments, the posterior odds equal the likelihood ratio by Bayes’ Theorem, so nothing hinges on the difference.

  20. They build on a large literature on this topic, including earlier probabilistic and sampling-based analyses of multistability, especially Sundareswara and Schrater (2007). See Gershman et al. (2012) for other references. See also Moreno-Bote et al. (2011), cited above, for a different sampling analysis of multistability based on population coding, and Buesing et al. (2011) for extensions of Gershman et al.’s work, in a more neurally plausible setting, accounting for refractory periods.

  21. Disambiguating sentence structure is akin to disambiguation of word sense, as in one of our initial examples (4) from Section 2.

  22. Notably, particle filters have also been proposed as neurally plausible models of high-level vision. See, e.g., Lee and Mumford (2003).

  23. The literature following Tversky and Kahneman (1974) has investigated an ambiguity in the statement of the heuristic, as to whether probability judgments are based on the number of instances brought to mind, or the perceived ease with which examples are brought to mind. Experimental results by Schwarz et al. (1991) suggest it is the latter.

  24. It is also tempting to suggest that the conjunction fallacy itself can be explained in terms of such a method. Perhaps more representative (Tversky and Kahneman 1983) instances of a concept or event-type are indeed sampled more often and easily, just as we know more prototypical instances of a concept often come to mind most easily in experimental settings (Rosch 1975). While the method of separately sampling the two event types has the advantage of working in principle for any two event-types, it risks making systematic logical mistakes in cases like these. Needless to say, this vague claim would need to be systematized and tested. One might also worry, to the extent that this holds, whether the sampling propensities in question could still be said faithfully to represent the agent’s degree of uncertainty. Worries of this sort will be addressed in Section 11 below.

  25. See also Lieder et al. (2014) and Griffiths et al. (2015) for more work in this vein, including results relevant to the availability heuristic, specifically availability of “extreme” events (cf. Section 11 below).

  26. See Vulcan (2000) for a comprehensive discussion and review of work from the 20th century.

  27. As a number of authors have pointed out (e.g., Perfors 2012), demanding clarity about “where the priors come from”—what inductive biases people have—and “what the hypothesis space is”—what are the possible representations that could be used—ought to be seen as virtuous.

  28. Any concrete sampling algorithm would work as an illustration, but utility-weighted sampling illustrates the point especially vividly.

  29. To see what a difference this can make, Lieder et al. estimate that, in deciding whether to text and drive, if people were simply drawing samples in a way that mirrors the actual (objective) frequency, in order to have a 50% chance of conjuring to mind possible disaster scenarios, they would need to draw 700 million samples. By contrast, a single utility-weighted sample would have well over 99% chance of returning such a possibility.

  30. On the view sketched here, such sampling propensities might be seen as a quantitative version of Gendler’s aliefs, belief-like attitudes that elude rational control, but play an important role in quasi-automatic “reflex-like” behavior (Gendler 2008). Thus, their role can be overridden by conscious control, as, e.g., when the snake is behind a pane of glass in a museum.

  31. Channeling the attitude expressed by Good (recall Section 3), we might construe the purpose of inductive logic precisely as overcoming such biases in how our naïve probability judgments (now construed as sampling propensities) are formed, so that our considered judgments about likelihood can be more purely based in evidence.

  32. However, intriguingly, it has recently been argued that the mere consideration of a proposition might amount to raising one’s credence in it, if only very momentarily (Mandelbaum 2014).

  33. This is assuming we have only a single trial. Empirically, when subjects make multiple sequential guesses, they sometimes do probability match, even when the “objective” probabilities are completely clear, as in this case (Vulcan 2000).

    One might observe something closer to sampling behavior in a slightly more complex scenario, where, say, the subject must make a choice between receiving a moderate-sized cash reward for sure, or a gamble that returns nothing on red and an enormous cash reward on green. In fact, these types of problems have been analyzed by Lieder et al. (2014) using utility-weighted sampling. See also Stewart et al. (2006) for a closely related decision-making paradigm based on sampling from memory traces.

  34. A fourth, quite curious, example comes from the observation that in many cases where adult subjects do probability match, young children seem to maximize robustly. See Yurovsky et al. (2013) for an overview and discussion.

  35. Indeed, a number of theorists have claimed that numbers cannot be meaningfully assigned to such states. We already mentioned Harman. Keynes (1921) is another famous example, and of course there are many others, in philosophy, psychology, and the social sciences.

  36. In this sense, the Sampling Hypothesis seems to be complementary to a view expressed recently by Norby (2015), who claims that degrees of belief as traditionally conceived are inappropriate as descriptions of ordinary psychological states that feature in decision making. Interestingly, Norby comes close to expressing something in the neighborhood of the Sampling Hypothesis (see his Section 3 on “proto-credences”), based on some of the same empirical work (in particular Stewart et al. 2006). However, he does not take these states to be genuine subjective probabilities (87).

  37. As mentioned above (Section 9), it has been common in this literature to rationalize behavior at odds with logic or probability by invoking notions of boundedness or resource constraints.

  38. We ignore bias terms here to simplify the presentation. See Rumelhart et al. (1986) for the general formulation.

References

  • Anderson, J.R. 1990. The adaptive character of thought. Lawrence Erlbaum Associates, Inc.

  • Anscombe, F.J., and R.J. Aumann. 1963. A definition of subjective probability. The Annals of Mathematical Statistics 34(1): 199–205.

  • Arora, S., and B. Barak. 2009. Computational complexity: A modern approach. Cambridge University Press.

  • Barsalou, L.W. 1999. Perceptual symbol systems. Behavioral and Brain Sciences 22(4): 577–609.

  • Berkes, P., G. Orbán, M. Lengyel, and J. Fiser. 2011. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science 331: 83–87.

  • Buesing, L., J. Bill, B. Nessler, and W. Maass. 2011. Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology 7(11).

  • Carnap, R. 1947. On the application of inductive logic. Philosophy and Phenomenological Research 8: 133–148.

  • Chater, N., and C.D. Manning. 2006. Probabilistic models of language processing and acquisition. Trends in Cognitive Science 10(7): 335–344.

  • Chater, N., J.B. Tenenbaum, and A. Yuille. 2006. Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Science 10(7): 287–291.

  • Christensen, D. 1996. Dutch-book arguments depragmatized: Epistemic consistency for partial believers. Journal of Philosophy 93: 450–479.

  • Churchland, P.S., and T.J. Sejnowski. 1994. The computational brain. MIT Press.

  • Clark, A. 2013. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences 36: 181–253.

  • Craik, K. 1943. The nature of explanation. Cambridge University Press.

  • Davidson, D. 1975. Hempel on explaining action. Erkenntnis 10(3): 239–253.

  • de Finetti, B. 1974. Theory of probability, Vol. 1. New York: Wiley.

  • Denison, S., E. Bonawitz, A. Gopnik, and T.L. Griffiths. 2013. Rational variability in children’s causal inferences: The sampling hypothesis. Cognition 126: 285–300.

  • Dennett, D.C. 1981. Three kinds of intentional psychology. In Reduction, time, and reality, ed. R. Healey, 37–61. Cambridge University Press.

  • Dreyfus, H.L., and S.E. Dreyfus. 1986. Mind over machine. Free Press.

  • Eriksson, L., and A. Hájek. 2007. What are degrees of belief? Studia Logica 86(2): 183–213.

  • Fiser, J., P. Berkes, G. Orbán, and M. Lengyel. 2010. Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Science 14(3): 119–130.

  • Freer, C., D. Roy, and J. Tenenbaum. 2012. Towards common-sense reasoning via conditional simulation: Legacies of Turing in artificial intelligence. In Turing’s legacy, ed. R. Downey. ASL Lecture Notes in Logic.

  • Gaifman, H. 2004. Reasoning with limited resources and assigning probabilities to arithmetical statements. Synthese 140: 97–119.

  • Galton, F. 1889. Natural inheritance. MacMillan.

  • Gendler, T. 2008. Alief and belief. Journal of Philosophy 105(10): 634–663.

  • Gershman, S.J., and N.D. Daw. 2012. Perception, action, and utility: The tangled skein. In Principles of brain dynamics: Global state interactions, eds. M. Rabinovich, K. Friston, and P. Varona, 293–312. MIT Press.

  • Gershman, S.J., E. Vul, and J.B. Tenenbaum. 2012. Multistability and perceptual inference. Neural Computation 24: 1–24.

  • Gigerenzer, G., and D.G. Goldstein. 1996. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review 103(4): 650–699.

  • Good, I.J. 1950. Probability and the weighing of evidence. Charles Griffin.

  • Good, I.J. 1983. Good thinking: The foundations of probability and its applications. University of Minnesota Press.

  • Goodman, N.D., J.B. Tenenbaum, and T. Gerstenberg. 2014. Concepts in a probabilistic language of thought. In The conceptual mind: New directions in the study of concepts, eds. E. Margolis and S. Laurence. MIT Press.

  • Gopnik, A., C. Glymour, D. Sobel, L. Schulz, T. Kushnir, and D. Danks. 2004. A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review 111(1): 3–32.

  • Griffiths, T.L., N. Chater, C. Kemp, A. Perfors, and J.B. Tenenbaum. 2010. Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Science 14(8): 357–364.

  • Griffiths, T.L., F. Lieder, and N.D. Goodman. 2015. Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic. Topics in Cognitive Science 7(2): 217–229.

  • Griffiths, T.L., and J.B. Tenenbaum. 2006. Optimal predictions in everyday cognition. Psychological Science 17(9): 767–773.

  • Harman, G. 1986. Change in view. MIT Press.

  • Icard, T. 2013. The algorithmic mind: A study of inference in action. PhD thesis, Stanford University.

  • Icard, T.F. 2014. Toward boundedly rational analysis. In Proceedings of the 36th annual meeting of the cognitive science society, eds. P. Bello, M. Guarini, M. McShane, and B. Scassellati, 637–642.

  • Icard, T.F., and N.D. Goodman. 2015. A resource-rational approach to the causal frame problem. In Proceedings of the 37th annual conference of the cognitive science society, eds. D.C. Noelle, R. Dale, A.S. Warlaumont, J. Yoshimi, T. Matlock, C.D. Jennings, and P.P. Maglio.

  • James, W. 1890. The principles of psychology. Henry Holt & Co.

  • Jaynes, E.T. 2003. Probability theory: The logic of science. Cambridge University Press.

  • Johnson-Laird, P.N. 1983. Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge University Press.

  • Joyce, J.M. 1998. A nonpragmatic vindication of probabilism. Philosophy of Science 65: 575–603.

  • Keynes, J.M. 1921. A treatise on probability. Macmillan.

  • Knill, D.C., and A. Pouget. 2004. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences 27(12): 712–719.

  • Koller, D., and N. Friedman. 2009. Probabilistic graphical models: Principles and techniques. MIT Press.

  • Kruschke, J.K. 2006. Locally Bayesian learning with application to retrospective revaluation and highlighting. Psychological Review 113(4): 677–699.

  • Kwisthout, J., T. Wareham, and I. van Rooij. 2008. Bayesian intractability is not an ailment that approximation can cure. Cognitive Science 35: 779–784.

  • Lee, T.S., and D. Mumford. 2003. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A 20(7): 1434–1448.

  • Levy, R., F. Reali, and T.L. Griffiths. 2009. Modeling the effects of memory on human online sentence processing with particle filters. Advances in Neural Information Processing Systems 21: 937–944.

  • Lewis, D.K. 1974. Radical interpretation. Synthese 23: 331–344.

  • Lichtenstein, S., P. Slovic, B. Fischoff, M. Layman, and B. Combs. 1978. Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory 4(6).

  • Lieder, F., T.L. Griffiths, and N.D. Goodman. 2012. Burn-in, bias, and the rationality of anchoring. Advances in Neural Information Processing Systems 25: 2699–2707.

  • Lieder, F., M. Hsu, and T.L. Griffiths. 2014. The high availability of extreme events serves resource-rational decision-making. In Proceedings of the 36th annual meeting of the cognitive science society, eds. P. Bello, M. Guarini, M. McShane, and B. Scassellati.

  • Lochmann, T., and S. Deneve. 2011. Neural processing as causal inference. Current Opinion in Neurobiology 21: 774–781.

  • Luce, R.D. 1959. Individual choice behavior: A theoretical analysis. Wiley.

  • Luce, R.D., and P. Suppes. 1965. Preference, utility, and subjective probability. In Handbook of mathematical psychology, eds. R.D. Luce, R.R. Bush, and E.H. Galanter, 249–410. Wiley.

  • MacKay, D. 2003. Information theory, inference, and learning algorithms. Cambridge University Press.

  • Mandelbaum, E. 2014. Thinking is believing. Inquiry: An Interdisciplinary Journal of Philosophy 57(1): 55–96.

  • Marr, D. 1982. Vision. W.H. Freeman and Company.

  • McFadden, D.L. 1973. Conditional logit analysis of qualitative choice behavior. In Frontiers in econometrics, ed. P. Zarembka. Academic Press.

  • Millgram, E. 1991. Harman’s hardness arguments. Pacific Philosophical Quarterly 72(3): 181–202.

  • Moreno-Bote, R., D.C. Knill, and A. Pouget. 2011. Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences 108(30): 12491–12496.

  • Mozer, M.C., H. Pashler, and H. Homaei. 2008. Optimal predictions in everyday cognition: The wisdom of individuals or crowds? Cognitive Science 32: 1133–1147.

  • Norby, A. 2015. Uncertainty without all the doubt. Mind and Language 30(1): 70–94.

  • Pearl, J. 1988. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.

  • Perfors, A. 2012. Bayesian models of cognition: What’s built in after all? Philosophy Compass 7(2): 127–138.

  • Raiffa, H. 1968. Decision analysis. Addison-Wesley.

  • Ramsey, F.P. 1931. Truth and probability. In Foundations of mathematics and other logical essays, ed. R.B. Braithwaite. Martino Fine.

  • Rosch, E. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104(3): 192–233.

  • Rumelhart, D.E., J.L. McClelland, and The PDP Research Group. 1986. Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press.

  • Savage, L.J. 1954. The foundations of statistics. Wiley.

  • Schacter, D.L., D.R. Addis, and R.L. Buckner. 2008. Episodic simulation of future events: Concepts, data, and applications. Annals of the New York Academy of Sciences 1124: 39–60.

  • Schwarz, N., H. Bless, F. Strack, G. Klumpp, H. Rittenauer-Schatka, and A. Simons. 1991. Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology 61(2): 195–202.

  • Seth, A.K. 1999. Evolving behavioural choice: An exploration of Herrnstein’s matching law. In Proceedings of the 5th European conference on artificial life, eds. D. Floreano, J.-D. Nicoud, and F. Mondada, 225–236. Springer.

  • Shi, L., T.L. Griffiths, N.H. Feldman, and A.N. Sanborn. 2010. Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin and Review 17(4): 443–464.

  • Simon, H.A. 1976. From substantive to procedural rationality. In 25 years of economic theory, eds. T.J. Kastelein, S.K. Kuipers, W.A. Nijenhuis, and G.R. Wagenaar, 65–86. Springer.

  • Solway, A., and M.M. Botvinick. 2012. Goal-directed decision making as probabilistic inference. Psychological Review 119(1): 120–154.

  • Stewart, N., N. Chater, and G.D. Brown. 2006. Decision by sampling. Cognitive Psychology 53: 1–26.

  • Sundareswara, R., and P. Schrater. 2007. Perceptual multistability predicted by search model for Bayesian decisions. Journal of Vision 8(5): 1–19.

  • Suppes, P. 1974. The measurement of belief. The Journal of the Royal Statistical Society, Series B 36(2): 160–191.

  • Tenenbaum, J.B., C. Kemp, T.L. Griffiths, and N.D. Goodman. 2011. How to grow a mind: Statistics, structure, and abstraction. Science 331: 1279–1285.

  • Thurstone, L.L. 1927. A law of comparative judgment. Psychological Review 34(4): 273–286.

  • Tversky, A., and D. Kahneman. 1974. Judgment under uncertainty: Heuristics and biases. Science 185(4157): 1124–1131.

  • Tversky, A., and D. Kahneman. 1983. Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review 90(4): 293–315.

  • Vilares, I., and K.P. Kording. 2011. Bayesian models: The structure of the world, uncertainty, behavior, and the brain. Annals of the New York Academy of Sciences 1224: 22–39.

  • Vul, E. 2010. Sampling in human cognition. PhD thesis, MIT.

  • Vul, E., N.D. Goodman, T.L. Griffiths, and J.B. Tenenbaum. 2014. One and done? Optimal decisions from very few samples. Cognitive Science 38(4): 599–637.

  • Vul, E., and H. Pashler. 2008. Measuring the crowd within. Psychological Science 19(7): 645–647.

  • Vulcan, N. 2000. An economist’s perspective on probability matching. Journal of Economic Surveys 13(1): 101–118.

  • Walley, P. 1991. Statistical reasoning with imprecise probabilities. Chapman & Hall.

  • Williamson, T. 2015. Acting on knowledge. In Knowledge-first, eds. J.A. Carter, E. Gordon, and B. Jarvis. Oxford University Press.

  • Yang, T., and M.N. Shadlen. 2007. Probabilistic reasoning by neurons. Nature 447: 1075–1082.

  • Yurovsky, D., T.W. Boyer, L.B. Smith, and C. Yu. 2013. Probabilistic cue combination: Less is more. Developmental Science 16(2): 149–158.

  • Zynda, L. 2000. Representation theorems and realism about degrees of belief. Philosophy of Science 67(1): 45–69.


Acknowledgments

Thanks to the RoPP editor Paul Egré, to the journal reviewers, and to Falk Lieder for useful comments that helped improve the paper. Thanks also to Wesley Holliday, Shane Steinert-Threlkeld, and my dissertation committee (see Icard 2013) for helpful comments on an earlier version.

Author information

Correspondence to Thomas Icard.

Appendices

Appendix A: Boltzmann Machine as Gibbs Sampler

In this Appendix we explain the sense in which the activation rule for the Boltzmann Machine carries out Gibbs sampling on an underlying Markov random field. This fact is folklore in cognitive science.

Gibbs sampling is an instance of the Metropolis-Hastings algorithm, which in turn is a type of Markov chain Monte Carlo inference (MacKay 2003). Suppose we have a multivariate distribution \(P(X_{1},\dots,X_{n})\) and we want to draw samples from it. The Gibbs sampling algorithm is as follows:

  1. Specify some initial values for all the random variables \(\mathbf{y}^{(0)} = \left(y_{1}^{(0)},\dots,y_{n}^{(0)}\right)\).

  2. Given \(\mathbf{y}^{(r)}\), randomly choose a number i∈{1,…,n}, and let \(\mathbf{y}^{(r+1)}\) be exactly like \(\mathbf{y}^{(r)}\), except that \(y_{i}^{(r+1)}\) is redrawn from the conditional distribution \(P(X_{i} \mid \mathbf{y}^{(r)}_{-i})\), i.e., conditioned on the current values of all the other variables.

  3. At some stage R, return some subset of \(\{\mathbf{y}^{(0)},\dots,\mathbf{y}^{(R)}\}\) as samples. (A minimal code sketch of these three steps follows the list.)
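
For concreteness, here is a minimal Python sketch of steps 1–3. It is only an illustration: the function and parameter names (conditional_sample, burn_in, and so on) are ours, not from the text, and the conditional distributions must be supplied by the caller.

```python
import random

def gibbs(n, conditional_sample, init, num_steps, burn_in):
    """Generic Gibbs sampler, following steps 1-3 above.

    conditional_sample(i, y) must return a draw from P(X_i | y_{-i}),
    i.e., the i-th variable conditioned on current values of the others.
    """
    y = list(init)                       # step 1: initial values y^(0)
    samples = []
    for r in range(num_steps):
        i = random.randrange(n)          # step 2: pick a coordinate at random
        y[i] = conditional_sample(i, y)  #         and redraw it conditionally
        if r >= burn_in:                 # step 3: keep a subset of the chain
            samples.append(tuple(y))
    return samples
```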

Note that the sequence of value vectors \(\mathbf{y}^{(0)},\dots,\mathbf{y}^{(r)},\dots\) forms a Markov chain, because the next sample \(\mathbf{y}^{(r+1)}\) depends only on the previous sample \(\mathbf{y}^{(r)}\). Let \(q(\mathbf{y}\rightarrow\mathbf{y}^{\prime})\) be the probability of moving from \(\mathbf{y}\) at a given stage r to \(\mathbf{y}^{\prime}\) at stage r+1 (which is the same for all r, and equal to 0 when \(\mathbf{y}\) and \(\mathbf{y}^{\prime}\) differ in more than one coordinate). And let \(\pi^{r}(\mathbf{y})\) be the probability of being in state \(\mathbf{y}\) at stage r. Then we clearly have:

$$\pi^{r+1}(\mathbf{y}^{\prime}) = \sum\limits_{\mathbf{y}} \pi^{r}(\mathbf{y}) q(\mathbf{y}\rightarrow\mathbf{y}^{\prime})\;.$$

By general facts about Markov chains, it can be shown that this process converges to a unique stationary distribution: \(\pi^{r}\) approaches a fixed distribution \(\pi^{*}\) as r grows. This \(\pi^{*}\) is the unique distribution for which the following holds:

$$\pi^{*}(\mathbf{y}^{\prime}) = \sum\limits_{\mathbf{y}} \pi^{*}(\mathbf{y}) q(\mathbf{y}\rightarrow\mathbf{y}^{\prime})\;.$$

Since this equation also holds for P (each conditional-resampling step leaves P invariant), this shows the Markov chain converges to \(P = \pi^{*}\).
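
The fixed-point property can be checked directly for small state spaces: write down the transition matrix q induced by the Gibbs updates and verify that \(\pi^{*} = P\) solves the displayed equation. The following Python sketch does this for a toy joint distribution over two binary variables; the particular numbers are made up, and nothing hinges on them.

```python
import itertools
import numpy as np

# A toy joint distribution over two binary variables (illustrative numbers).
states = list(itertools.product([0, 1], repeat=2))
P = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def cond(i, y, v):
    """P(X_i = v | the other coordinate of y), computed from the joint."""
    y_v = list(y); y_v[i] = v
    y_w = list(y); y_w[i] = 1 - v
    return P[tuple(y_v)] / (P[tuple(y_v)] + P[tuple(y_w)])

# Transition matrix q: choose a coordinate i uniformly, redraw it conditionally.
Q = np.zeros((len(states), len(states)))
for a, y in enumerate(states):
    for i in range(2):
        for v in (0, 1):
            y2 = list(y); y2[i] = v
            Q[a, states.index(tuple(y2))] += 0.5 * cond(i, y, v)

pi = np.array([P[s] for s in states])
print(np.allclose(pi @ Q, pi))  # True: P is stationary under the Gibbs kernel
```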

Recall that the Boltzmann Machine is defined by a set of nodes and weights between those nodes (see note 38). If we think of the nodes as binary random variables \((X_{1},\dots,X_{n})\), taking on values {0,1}, then the weight matrix W gives us a natural distribution \(P(X_{1},\dots,X_{n})\) on the state space \(\{0,1\}^{n}\) as follows. First, define an energy function E on the state space:

$$E(\mathbf{y}) = -\frac{1}{2} \sum\limits_{i,j} W_{i,j} y_{i}y_{j}\;.$$

Then the resulting Boltzmann distribution is given by:

$$P(\mathbf{y}) = \frac{e^{-E(\mathbf{y})}}{\sum\limits_{\mathbf{y}^{\prime}}e^{-E(\mathbf{y}^{\prime})}}\;.$$

Note that the denominator becomes a very large sum very quickly. Recall from our earlier discussion that the size of the state space is 32 with 5 nodes, but over a trillion with 40 nodes. Suppose we apply the Gibbs sampling algorithm to this distribution. Then at a given stage, we randomly choose a node to update, and the new value is determined by the conditional probability as above. This step is equivalent to applying the activation function for the Boltzmann Machine (where, in the following, \(\mathbf{y}^{\prime}\) is just like \(\mathbf{y}\), except that \(y_{i}^{\prime} = 1\) while \(y_{i} = 0\)):

$$\begin{array}{@{}rcl@{}} P(X_{i}=1\; |\; \{y_{j}\}_{j\neq i}) & = & P(X_{i} = 1\;|\;\mathbf{y}\text{ or }\mathbf{y}^{\prime}) \\ & = & \frac{P(\mathbf{y}^{\prime})}{P(\mathbf{y}\text{ or }\mathbf{y}^{\prime})} \\ & = & \frac{e^{-E(\mathbf{y}^{\prime})}}{e^{-E(\mathbf{y})}+e^{-E(\mathbf{y}^{\prime})}} \\ & = & \frac{1}{1+e^{E(\mathbf{y}^{\prime}) -E(\mathbf{y})}} \\ & = & \frac{1}{1+ e^{-\sum\limits_{j} W_{i,j}y_{j}}}\\ & = & \frac{1}{1+ e^{-net_{i}}}\;. \end{array} $$

Thus, the above argument shows that the Boltzmann Machine (eventually) samples from the associated Boltzmann distribution. Incidentally, we can also see now why \(net_{i}\) is equivalent to the log odds ratio under the Boltzmann distribution (recall the discussion in Section 8 of the Yang and Shadlen 2007 experiment):

$$\begin{array}{@{}rcl@{}} \frac{P(\mathbf{y}^{\prime})}{P(\mathbf{y})} & = & \frac{e^{-E(\mathbf{y}^{\prime})}}{e^{-E(\mathbf{y})}} \\ & = & e^{E(\mathbf{y}) - E(\mathbf{y}^{\prime})} \\ & = & e^{\sum\limits_{j} W_{i,j}y_{j}} \\ & = & e^{net_{i}}\;. \end{array} $$

And thus,

$$\begin{array}{@{}rcl@{}} \log\;\frac{P(\mathbf{y}^{\prime})}{P(\mathbf{y})} & = & \log \left( e^{net_{i}}\right) \\ & = & net_{i}\;.\end{array} $$
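
The equivalence can also be checked numerically. The following Python sketch runs the logistic activation rule on a small machine with illustrative weights (chosen by us, not taken from the text) and compares the long-run frequencies of the global states against the exact Boltzmann distribution; after discarding burn-in iterations, the two agree to within sampling noise.

```python
import itertools
import math
import random

import numpy as np

rng = random.Random(0)
W = np.array([[ 0.0, 1.5, -1.0],
              [ 1.5, 0.0,  0.5],
              [-1.0, 0.5,  0.0]])  # symmetric weights, zero diagonal
n = len(W)

def energy(y):
    return -0.5 * sum(W[i][j] * y[i] * y[j] for i in range(n) for j in range(n))

# Exact Boltzmann distribution (feasible only because the state space is tiny).
states = list(itertools.product([0, 1], repeat=n))
Z = sum(math.exp(-energy(s)) for s in states)
exact = {s: math.exp(-energy(s)) / Z for s in states}

# Boltzmann Machine dynamics: the logistic activation rule, i.e., Gibbs updates.
y = [0] * n
counts = {s: 0 for s in states}
for step in range(200_000):
    i = rng.randrange(n)
    net_i = sum(W[i][j] * y[j] for j in range(n))      # diagonal is zero
    y[i] = 1 if rng.random() < 1 / (1 + math.exp(-net_i)) else 0
    if step >= 1_000:                                  # discard burn-in
        counts[tuple(y)] += 1

total = sum(counts.values())
for s in states:
    print(s, round(counts[s] / total, 3), round(exact[s], 3))  # should agree
```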

Appendix B: Softmax versus Sampling Rule

In this Appendix, we offer some observations about the relation between the softmax (or generalized Luce-Shepard) choice rule and the sampling-based decision rules discussed in this paper. Suppose we associate with a subject a probability function P(⋅) on sample space \(\mathcal{H} = \{H_{1},\dots,H_{n}\}\) and a utility function u defined on actions \(\mathcal{A} = \{A_{1},\dots,A_{m}\}\) paired with hypotheses.

The softmax rule says that the subject will give response A with probability

$$\frac{e^{v(A)/\beta}}{\sum\limits_{A^{\prime}\in\mathcal{A}} e^{v(A^{\prime})/\beta}} , $$

where v is some value function, which in this case we will assume is the log expected utility:

$$v(A) = \log \sum\limits_{H\in\mathcal{H}}P(H)\,u(A,H)\;.$$

As a representative example of a sampling-based rule, recall Decision Rule B:

Decision Rule B: Suppose we are given a generative model \(\mathcal{M}\) with \(\mathcal{V}\) as possible return values. To select an action, take R samples, \(H^{(1)},\dots,H^{(R)}\), using \(\mathcal{M}\), and let best be the set of actions that receive the largest summed utilities, i.e.,

$$\text{\textsc{best}} = \left\{A_{j}: \sum\limits_{i=1}^{R}u(A_{j},H^{(i)})\text{ is maximal}\right\}.$$

Take action \(A_{j} \in \textsc{best}\) with probability \(\frac{1}{\vert\textsc{best}\vert}\).

In the specific case of a certain kind of estimation problem—where \(\mathcal {H} = \mathcal {A}\) and the utility of an estimate is 1 if correct, 0 otherwise; thus expected utility and probability coincide—it is easy to see that the softmax rule with β = 1 and Decision Rule B with R = 1 (or R = 2) are equivalent. The probability of returning hypothesis H is just P(H), i.e., we have probability matching.
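
This equivalence is easy to simulate. In the Python sketch below, the hypothesis distribution is invented for illustration; both rules return each hypothesis H with frequency close to P(H), i.e., both probability match.

```python
import math
import random
from collections import Counter

rng = random.Random(1)
P = {"H1": 0.5, "H2": 0.3, "H3": 0.2}  # illustrative hypothesis distribution
hyps = list(P)

def softmax_choice(beta=1.0):
    # v(H) = log expected utility = log P(H), given the 0/1 estimation utilities
    weights = [math.exp(math.log(P[h]) / beta) for h in hyps]
    return rng.choices(hyps, weights=weights)[0]

def rule_b_choice():
    # Decision Rule B with R = 1: report the single sampled hypothesis
    return rng.choices(hyps, weights=[P[h] for h in hyps])[0]

N = 100_000
print(Counter(softmax_choice() for _ in range(N)))  # frequencies track P
print(Counter(rule_b_choice() for _ in range(N)))   # frequencies track P
```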

Unfortunately, even restricting to these simple estimation problems, the relation between the two rules as functions of β and R is analytically intractable and varies with the underlying distribution P(⋅), as Vul (2010) points out. Thus, beyond β = R = 1 it is hard to study their relationship. As mentioned in the text, both can be used to fit much of the psychological data, though one might suspect the sampling rule is more reasonable on the basis of computational considerations.

Interestingly, for more general classes of decision problems, these two classes of rules can be qualitatively distinguished. The Luce Choice Axiom, from which the Luce choice rule was originally derived (Luce 1959), gives a hint of how we might do this. Where \(P_{S}(T)\) is the probability of choosing an action from \(T\subseteq\mathcal{A}\) from among the options in \(S\subseteq\mathcal{A}\), the choice axiom states that for all R such that \(T \subseteq R \subseteq S\):

$$P_{S}(T) \;=\; P_{S}(R) \;P_{R}(T)\;.$$

It is easy to see the softmax rule satisfies the choice axiom for all values of β:

$$\begin{array}{@{}rcl@{}} P^{\text{softmax}}_{S}(T) & = & \frac{\sum\limits_{A\in T}e^{v(A)/\beta}}{\sum\limits_{A\in S}e^{v(A)/\beta}} \\ & = & \frac{\sum\limits_{A\in R}e^{v(A)/\beta}}{\sum\limits_{A\in S}e^{v(A)/\beta}} \;\cdot\;\frac{\sum\limits_{A\in T}e^{v(A)/\beta}}{\sum\limits_{A\in R}e^{v(A)/\beta}} \\ & = & P^{\text{softmax}}_{S}(R) \;P^{\text{softmax}}_{R}(T)\;. \end{array} $$

For estimation problems, Decision Rule B with R = 1 also satisfies this axiom. However, in more general contexts, even for the case of R = 1, it does not. Perhaps the simplest illustration of this is the decision problem in Table 1, where \(\mathcal {H} = \{H_{1},H_{2}\}\) and \(\mathcal {A} = \{A_{1},A_{2},A_{3}\}\), and 𝜖>0.

Table 1 Distinguishing softmax and sample-based decision rules

We clearly have \(P^{\textsc{Rule B}}_{\{A_{1},A_{2},A_{3}\}}(\{A_{1}\}) = 0\). Yet, as long as \(P(H_{1}), P(H_{2}) > 0\) and 𝜖 < 1, we have

$$P^{\text{\textsc{Rule B}}}_{\{A_{1},A_{2},A_{3}\}}(\{A_{1},A_{2}\})\;P^{\text{\textsc{Rule B}}}_{\{A_{1},A_{2}\}}(\{A_{1}\})>0\;,$$

showing the violation of the choice axiom. As the softmax rule satisfies the axiom, this gives us an instance where we would expect different behavior, depending on which rule (if either) a subject is using. When presented with such a problem, a softmax-based agent would sometimes choose action \(A_{1}\). In fact, the probability of choosing \(A_{1}\) can be nearly as high as 1/3, for example, when β = 1, 𝜖 is very small, and \(H_{1}\) and \(H_{2}\) are equiprobable. However, for any sequence of samples of any length R, one of \(A_{2}\) or \(A_{3}\) would always look preferable to a sampling agent. Thus, such an agent would never choose \(A_{1}\). It would be interesting to test this difference experimentally; doing so could offer important evidence about the tenability of the Sampling Hypothesis.
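
Since Table 1 itself is not reproduced above, the following Python sketch uses one payoff matrix consistent with the surrounding description: a safe action A1 paying 1−𝜖 in both states, and two risky actions each paying 2 in one state and 0 in the other (the published table may use different values). Under these assumed payoffs, the softmax agent chooses A1 roughly a third of the time, while the Rule B agent never chooses A1 from the full menu, yet does sometimes choose it from {A1, A2}, exhibiting the violation of the choice axiom derived above.

```python
import math
import random
from collections import Counter

rng = random.Random(2)
eps = 0.01
P = {"H1": 0.5, "H2": 0.5}
u = {"A1": {"H1": 1 - eps, "H2": 1 - eps},   # assumed payoffs, consistent with
     "A2": {"H1": 2.0,     "H2": 0.0},       # the text's description but not
     "A3": {"H1": 0.0,     "H2": 2.0}}       # taken from the published Table 1

def softmax_choice(options, beta=1.0):
    # softmax over v(A) = log expected utility
    ev = {a: sum(P[h] * u[a][h] for h in P) for a in options}
    weights = [math.exp(math.log(ev[a]) / beta) for a in options]
    return rng.choices(options, weights=weights)[0]

def rule_b_choice(options, R=3):
    # Decision Rule B: score actions by summed utility over R sampled states
    samples = rng.choices(list(P), weights=list(P.values()), k=R)
    scores = {a: sum(u[a][h] for h in samples) for a in options}
    best = [a for a in options if scores[a] == max(scores.values())]
    return rng.choice(best)

N = 50_000
print(Counter(softmax_choice(["A1", "A2", "A3"]) for _ in range(N)))  # A1 ~ 1/3
print(Counter(rule_b_choice(["A1", "A2", "A3"]) for _ in range(N)))   # A1 never
print(Counter(rule_b_choice(["A1", "A2"]) for _ in range(N)))         # A1 sometimes
```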

About this article

Cite this article

Icard, T. 2016. Subjective probability as sampling propensity. Review of Philosophy and Psychology 7: 863–903. https://doi.org/10.1007/s13164-015-0283-y
