
Probability and proximity in surprise


Abstract

This paper proposes an analysis of surprise formulated in terms of proximity to the truth, to replace the probabilistic account of surprise. It is common to link surprise to the low (prior) probability of the outcome. The idea seems sensible because an outcome with a low probability is unexpected, and an unexpected outcome often surprises us. However, the link between surprise and low probability is known to break down in some cases. There have been some attempts to modify the probabilistic account to deal with these cases, but as we shall see, they are still faced with problems. The new analysis of surprise I propose turns to accuracy (proximity to the truth) and identifies an unexpected degree of inaccuracy as the reason for surprise. The shift from probability to proximity allows us to solve puzzles that strain the probabilistic account of surprise.


Notes

  1. See, for example, recent quantitative formulations (McGrew 2003, Schupbach and Sprenger 2011, Crupi and Tentori 2012) of Peirce’s idea that explanation eliminates surprise (Peirce 1931–35, 5: p. 189). These authors all assume that the degree of surprise is an inverse function of the (prior) probability.

  2. See Tversky and Kahneman (1974) and the subsequent literature on “heuristics and biases”. See, in particular, Landy, Silbert and Goldin (2013) on heuristics for estimating large numbers, since the estimation of large numbers is often a factor in paradigmatic cases of surprise.

  3. Authors sympathetic to Horwich’s analysis include Good (1984), White (2000), Olsson (2002) and Manson and Thrush (2003). See Schlesinger (1991, pp. 99–102) and Harker (2012) for criticisms of Horwich’s analysis.

  4. See Schlesinger (1991, pp. 99–102) and Harker (2012). Baras (2019, p. 1409n) reports (based on his correspondence with Horwich) that Horwich did not intend the existence of a serious alternative hypothesis to be a necessary condition for surprise.

  5. See, again, Schlesinger (1991, pp. 99–102) and Harker (2012).

  6. See Harker (2012, p. 253) for a proposal along this line.

  7. The raffle tickets may have numerals on them, but they are assigned only for the purpose of identification. We can replace them by letters, colors, shapes, etc. that represent no quantities.

  8. If x is the greatest lower bound, no other quantity is a greatest lower bound; but if x is only a lower bound, any quantity smaller than x is also a lower bound.

  9. Miller (1974) and Tichý (1974) proved, independently, that the original definition by Popper (1963) is defective. See Oddie (2014) for a review of various attempts to overcome the difficulty, and for different conceptions of verisimilitude (truthlikeness).

  10. Note also that adding a known (new) truth to a theory does not necessarily increase its overall verisimilitude if the theory to which it is added is false (has false logical consequences). Adding a truth to a false theory may increase its falsity-content more than its truth-content. As a result, we cannot even make a comparative judgment that the addition of a truth increases the verisimilitude of the theory.

  11. See Gneiting and Raftery (2007) for a review of the literature, and Winkler and Jose (2010) for an accessible overview.

  12. I will not discuss a categorical statement (with no assignment of a probability) separately because a categorical statement can be considered a special case of a probability distribution where the entire probability is assigned to one member of the partition.

  13. When the members xi in the ranked partition are the values of a continuous variable, we need the Continuous Ranked Probability Score instead of the Ranked Probability Score, but I restrict the discussion to discrete cases here because an extension to the continuous case does not change the substance of my analysis. (The first code sketch following these notes illustrates the discrete Ranked Probability Score, along with the Brier and Logarithmic Scores of note 15.)

  14. See Roche and Shogenji (2018) for similar reasoning for Strict Propriety. Calling it “an epistemic reason” (p. 594), they argue that the prior probability distribution p(xi) is the most accurate in retrospect if and only if it is identical to the posterior (updated) probability distribution p(xi | y) given the new evidence y.

  15. See, for example, Levinstein (2012) and Roche and Shogenji (2018) for arguments against the Brier Score in favor of the Logarithmic Score SRL(p, i) = − log p(xi) for a non-ranked partition.

  16. The point holds regardless of the choice of the scoring rule. Since p is equiprobable over XRAF = {x1, …, x1000}, the degree of inaccuracy is the same no matter who is the winner, so that the actual inaccuracy is exactly the expected inaccuracy.

  17. I am using here the ratio of the actual degree of inaccuracy to the expected degree of inaccuracy (instead of the difference between them) to measure the unexpectedness of the actual degree of inaccuracy. This is because the degree of inaccuracy, as measured by SRB or SRRPS, is on a ratio scale with a unique and non-arbitrary zero value (reached when the probability distribution is completely accurate) and with no finite upper bound. (The second sketch following these notes illustrates this ratio, together with the equiprobable raffle case of note 16.)

  18. I am using the case of five out of five here, instead of four out of four used earlier, because the degree of inaccuracy for five out of five is closer (than four out of four) to the degree of inaccuracy for 53 out of 100. Suitable examples are different for comparing probabilities and comparing inaccuracies because the degree of inaccuracy (as measured by the Ranked Probability Score) is not a simple function of the probability assigned to the outcome. (The third sketch following these notes computes these degrees of inaccuracy.)

  19. The exact H-ratio of one half (more generally, the exact match between the probability and the observed frequency ratio) looks particularly suspect for two reasons. First, it makes the degree of accuracy not just better than expected, but the best possible. The second reason is the possible explanation of the exact match by manipulation. As Horwich points out (see Sect. 2 above), the recognition of a possible alternative explanation shakes our confidence in the default assumption. When the H-ratio is close to one half, but not exactly one half, the possible explanation of the departure from the expectation is less obvious.

  20. This point is consistent with the fact that the most probable outcome is unsurprising in many cases. It is true that the most probable member of a non-ranked partition (to set aside the additional complication for a ranked partition) makes the probability distribution the least inaccurate, and thus makes the degree of inaccuracy less than expected. Note, however, that the probability distribution plays two roles. It determines the degree of inaccuracy SR(p, i) for the outcome xi, but it also determines the weights for calculating the expected inaccuracy. Since the most probable member receives the greatest weight, the expected inaccuracy is pulled closer to its degree of inaccuracy. In order for the most probable outcome to be surprising, the partition must consist of many members and the outcome in question must not be so probable as to pull the expected inaccuracy close to its degree of inaccuracy.

  21. These are all cases of the ex post evaluation of the hypothesis based on the actual outcome, but we can extend the idea to ex ante evaluation: We may seek to minimize the estimated inaccuracy of the hypothesis in the ex ante selection of a probabilistic hypothesis. This is a familiar theme in statistical learning theory (see Burnham and Anderson 2002 for an overview).

  22. The “accuracy first” approach (Leitgeb and Pettigrew 2010a, b; Pettigrew 2016) proposes to use accuracy as the ground for the basic principles of the probability calculus, but what I have in mind here is the use of proximity to the truth in solving some puzzles in epistemology and philosophy of science. See Shogenji (2018) for an instance of such attempts.
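
The following is a minimal Python sketch of the scoring rules mentioned in notes 13 and 15: the Brier Score and the Logarithmic Score for a non-ranked partition, and the (discrete) Ranked Probability Score for a ranked partition. The function names and the unnormalized form of the Ranked Probability Score are choices made for illustration; conventions in the forecasting literature vary (e.g. some authors normalize by the number of categories).

import math

# Brier inaccuracy of distribution p (a list of probabilities over a
# non-ranked partition) when the member at index i turns out to be true.
def brier_score(p, i):
    return sum((pj - (1.0 if j == i else 0.0)) ** 2 for j, pj in enumerate(p))

# Logarithmic inaccuracy -log p(x_i); unbounded as p(x_i) approaches 0.
def log_score(p, i):
    return -math.log(p[i])

# Ranked Probability Score for an ordered partition: the sum of squared
# differences between the forecast CDF and the outcome's step-function CDF.
def ranked_probability_score(p, i):
    cdf, score = 0.0, 0.0
    for k, pk in enumerate(p):
        cdf += pk
        score += (cdf - (1.0 if k >= i else 0.0)) ** 2
    return score

# Example: an ordered three-member partition with forecast (0.2, 0.5, 0.3)
# and the middle member true.
p = [0.2, 0.5, 0.3]
print(brier_score(p, 1))               # 0.38
print(log_score(p, 1))                 # about 0.69
print(ranked_probability_score(p, 1))  # 0.13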
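
The next sketch illustrates notes 16 and 17, assuming the Brier Score as the measure of inaccuracy (the point is the same for the Ranked Probability Score). With an equiprobable distribution over 1,000 raffle tickets the degree of inaccuracy is the same whichever ticket wins, so the actual inaccuracy is exactly the expected inaccuracy and the ratio of note 17 equals one; the helper names are mine.

# Brier inaccuracy (as in the first sketch).
def brier_score(p, i):
    return sum((pj - (1.0 if j == i else 0.0)) ** 2 for j, pj in enumerate(p))

# Expected inaccuracy of p, with p itself supplying the weights.
def expected_inaccuracy(p):
    return sum(p[k] * brier_score(p, k) for k in range(len(p)))

# Ratio of actual to expected inaccuracy (the measure of unexpectedness in note 17).
def inaccuracy_ratio(p, i):
    return brier_score(p, i) / expected_inaccuracy(p)

# Equiprobable raffle over 1,000 tickets: the inaccuracy is the same
# whichever ticket wins, so the actual inaccuracy equals the expected one.
raffle = [1 / 1000] * 1000
print(brier_score(raffle, 0), expected_inaccuracy(raffle))  # 0.999 and 0.999
print(inaccuracy_ratio(raffle, 0))                          # 1.0: nothing unexpected

# A near-certain forecast that turns out wrong is far more inaccurate than
# expected (ratio well above 1); one that turns out right is more accurate
# than expected (ratio well below 1).
confident = [0.99, 0.01]
print(inaccuracy_ratio(confident, 1))  # 99.0
print(inaccuracy_ratio(confident, 0))  # about 0.01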
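
The last sketch illustrates note 18 on a coin-toss reading of the H-ratio examples (a fair-coin binomial forecast over the possible numbers of heads; this framing of the cases is an assumption made for illustration). The Ranked Probability Score for five heads out of five comes out much closer to the score for 53 heads out of 100 than the score for four heads out of four does.

from math import comb

# Ranked Probability Score (as in the first sketch).
def ranked_probability_score(p, i):
    cdf, score = 0.0, 0.0
    for k, pk in enumerate(p):
        cdf += pk
        score += (cdf - (1.0 if k >= i else 0.0)) ** 2
    return score

# Fair-coin forecast: Binomial(n, 1/2) over the possible numbers of heads 0..n.
def fair_coin_forecast(n):
    return [comb(n, k) / 2 ** n for k in range(n + 1)]

print(ranked_probability_score(fair_coin_forecast(4), 4))     # about 1.45 (4 heads of 4)
print(ranked_probability_score(fair_coin_forecast(5), 5))     # about 1.88 (5 heads of 5)
print(ranked_probability_score(fair_coin_forecast(100), 53))  # about 1.86 (53 heads of 100)

For comparing probabilities, by contrast, four out of four (1/16 ≈ 0.0625) is the closer match to exactly 53 out of 100 (≈ 0.067), which is why, as the note says, suitable examples differ for comparing probabilities and for comparing inaccuracies.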

References

  • Baras, D. (2019). Why do certain states of affairs call out for explanation? A critique of two Horwichian accounts. Philosophia, 47(5), 1405–1419.

  • Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. New York: Springer.

  • Crupi, V., & Tentori, K. (2012). A second look at the logic of explanatory power (with two novel representation theorems). Philosophy of Science, 79(3), 365–385.

  • Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.

  • Good, I. J. (1984). A Bayesian approach in the philosophy of inference. British Journal for the Philosophy of Science, 35(2), 161–173.

  • Harker, D. (2012). A surprise for Horwich (and some advocates of the fine-tuning argument (which does not include Horwich (as far as I know))). Philosophical Studies, 161(2), 247–261.

  • Horwich, P. (1982). Probability and evidence. Cambridge: Cambridge University Press.

  • Landy, D., Silbert, N., & Goldin, A. (2013). Estimating large numbers. Cognitive Science, 37(5), 775–799.

  • Leitgeb, H., & Pettigrew, R. (2010a). An objective justification of Bayesianism I: Measuring inaccuracy. Philosophy of Science, 77(2), 201–235.

  • Leitgeb, H., & Pettigrew, R. (2010b). An objective justification of Bayesianism II: The consequences of minimizing inaccuracy. Philosophy of Science, 77(2), 236–272.

  • Levinstein, B. A. (2012). Leitgeb and Pettigrew on accuracy and updating. Philosophy of Science, 79(3), 413–424.

  • Manson, N. A., & Thrush, M. J. (2003). Fine-tuning, multiple universes, and the “This Universe” objection. Pacific Philosophical Quarterly, 84(1), 67–83.

  • McGrew, T. (2003). Confirmation, heuristics, and explanatory reasoning. British Journal for the Philosophy of Science, 54(4), 553–567.

  • Miller, D. (1974). Popper’s qualitative theory of verisimilitude. British Journal for the Philosophy of Science, 25(2), 166–177.

  • Oddie, G. (2014). Truthlikeness. In Stanford Encyclopedia of Philosophy. Retrieved July 1, 2020, from https://plato.stanford.edu/entries/truthlikeness/

  • Olsson, E. (2002). Corroborating testimony, probability and surprise. British Journal for the Philosophy of Science, 53(2), 273–288.

  • Peirce, C. S. (1931–35). The collected papers of Charles Sanders Peirce (Vols. 1–6; C. Hartshorne & P. Weiss, Eds.). Cambridge, MA: Harvard University Press.

  • Pettigrew, R. (2016). Accuracy and the laws of credence. Oxford: Oxford University Press.

  • Popper, K. (1963). Conjectures and refutations. London: Routledge.

  • Roche, W., & Shogenji, T. (2018). Information and inaccuracy. British Journal for the Philosophy of Science, 69(2), 577–604.

  • Schlesinger, G. (1991). The sweep of probability. Notre Dame: University of Notre Dame Press.

  • Schupbach, J. N., & Sprenger, J. (2011). The logic of explanatory power. Philosophy of Science, 78(1), 105–127.

  • Shogenji, T. (2018). Formal epistemology and Cartesian skepticism: In defense of belief in the natural world. Abingdon: Routledge.

  • Tichý, P. (1974). On Popper’s definitions of verisimilitude. British Journal for the Philosophy of Science, 25, 155–160.

  • Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

  • White, R. (2000). Fine-tuning and multiple universes. Noûs, 34(2), 260–276.

  • Winkler, R. L., & Jose, V. R. (2010). Scoring rules. In J. J. Cochran (Ed.), Wiley encyclopedia of operations research and management science. Hoboken, NJ: Wiley.


Acknowledgements

Precursors of this paper were presented at the Chinese Academy of Social Sciences, Rhode Island College, the University of Turin, and the University of Groningen. I would like to thank the audiences at these institutions for their valuable comments. I would also like to thank Matt Duncan and William Roche for carefully reading an earlier version and making many suggestions for improvement.

Author information

Correspondence to Tomoji Shogenji.



Cite this article

Shogenji, T. Probability and proximity in surprise. Synthese 198, 10939–10957 (2021). https://doi.org/10.1007/s11229-020-02761-6
