Abstract
In attempting to form rational personal probabilities by direct inference, it is usually assumed that one should prefer frequency information concerning more specific reference classes. While this assumption is intuitively plausible, little effort has been spent explaining why it should be accepted. In the present article, I address this omission by showing that, among the principled policies that may be used in setting one’s personal probabilities, the policy of making direct inferences with a preference for frequency information for more specific reference classes yields personal probabilities whose accuracy is optimal, according to all proper scoring rules, in situations where all of the relevant frequency information is point-valued. Assuming that frequency information for narrower reference classes is preferred when the relevant frequency statements are point-valued, a dilemma arises in choosing whether to make a direct inference based upon (i) relatively precise-valued frequency information for a broad reference class, R, or upon (ii) relatively imprecise-valued frequency information for a more specific reference class, \(\hbox {R}^{\prime }\) (\(\hbox {R}^{\prime }\subset \hbox {R}\)). I address such cases by showing that it is often possible to make a precise-valued frequency judgment regarding \(\hbox {R}^{\prime }\) based on precise-valued frequency information for R, using standard principles of direct inference. Having made such a frequency judgment, the dilemma of choosing between (i) and (ii) is removed, and one may proceed by using the precise-valued frequency estimate for the more specific reference class as a premise for direct inference.
Notes
In fact, further qualifications are required in order to exclude degenerate direct inferences (cf. Pollock 1990; Kyburg and Teng 2001; Thorn 2012). The problem of excluding degenerate direct inferences does not arise within the simple sorts of population model considered in Sect. 2. In Sect. 3.5, I will say a little bit about the problem of excluding degenerate direct inferences.
Pollock (1990, p. 86) asserts that the preference for narrower reference classes is a ‘kind of’ total evidence requirement. This may be. However, there is no straightforward way to direct the force of arguments in support of Carnap’s Principle of Total Evidence in order to support a preference for narrower reference classes. Indeed, Carnap’s principle of total evidence (1962, p. 211) prescribes that one’s posterior probability for a proposition \(\alpha \) be identical to one’s prior probability for \(\alpha \) conditional on one’s complete body of evidence. So in a case where one’s complete body of evidence consists of \(\hbox {freq}(\hbox {T}|\hbox {R}) = 0.6\), \(\hbox {freq}(\hbox {T}|\hbox {R}^{\prime }) = 0.9, \hbox {R}^{\prime }\) \(\subseteq \) R, and c \(\in \hbox {R}^{\prime }\), Carnap’s principle prescribes that one’s posterior probability for \(\hbox {c} \in \hbox {T}\) be identical to one’s prior probability for \(\hbox {c} \in \hbox {T}\) conditional on \(\hbox {freq}(\hbox {T}|\hbox {R}) = 0.6 \wedge \hbox {freq}(\hbox {T}|\hbox {R}^{\prime }) = 0.9 \wedge \hbox {R}^{\prime } \subseteq \hbox {R} \wedge \hbox {c} \in \hbox {R}^{\prime }\). However, since one’s prior probability for \(\hbox {c} \in \hbox {T}\) conditional on \(\hbox {freq}(\hbox {T}|\hbox {R}) = 0.6 \wedge \hbox {freq}(\hbox {T}|\hbox {R}^{\prime }) = 0.9 \wedge \hbox {R}^{\prime } \subseteq \hbox {R} \wedge \hbox {c} \in \hbox {R}^{\prime }\) need not be 0.9, the preference for narrower reference classes does not follow from Carnap’s principle. Perhaps rational personal probabilities are structured in such a way as to generate a preference for narrower reference classes, when updating by conditionalization (cf. Thorn 2014). Whether this is the case is something that would need to be argued for, independently of the Principle of Total Evidence.
In the circumstance of making a judgment about the probability that an object c is a member of T, it is always possible to introduce a description of maximal specificity, which denotes the unit set consisting of the very object about which one is reasoning. It is reasonable to ignore such descriptions, in cases where we have no substantive information concerning the value of \(\hbox {freq}(\hbox {T}|\{\hbox {c}\})\). So, in the present section, I take the reasonable course of ignoring such descriptions. But see Sect. 3.3, which considers the proper treatment of reference classes for which one has no prior frequency information, and provides a more adequate treatment of the present issue.
Note that the present specification of categories represents a generalization of the case where F is the set of all subsets of U, which corresponds to the case where \(\Pi = \{\{x\} : x \in U\}\).
Easwaran (2013, p. 124) appeals to a similar condition in showing that updating by conditionalization maximizes expected accuracy, including the case of probability functions that are defined over infinite sets of possible worlds.
Note that the oracular policy, \(\nu \), will be principled in some population models, such as population models where \(\Pi = \{\{\hbox {x}\} : \hbox {x} \in \hbox {U}\}\). In all such cases, \(\delta (\hbox {x} \in \hbox {T}) = \nu (\hbox {x} \in \hbox {T})\), for all x in U.
For the sake of uniformity, negatively oriented scoring rules (such as Brier scoring) are treated as loss functions, where the scores corresponding to such loss functions are determined by multiplying the loss earned according to such a rule by \(-\)1.
The proof of Theorem 2 is identical to that of Theorem 1, where we replace instances of \(\pi \) and \(\Pi \) with f and F, and instances of x, \(\hbox {x}_i\), and U with \(\langle \hbox {x}, \hbox {f}\rangle \), \(\langle \hbox {x}_i, \hbox {f}\rangle \), and \(\hbox {U}^{\text {F}}\), excluding instances of x and \(\hbox {x}_i\) in the scope of \(\nu \).
Note that I have not argued here that all direct inferences based on reference classes that are partitions are degenerate. I only maintain that there is a preference for direct inferences that employ ‘standard’ reference classes versus their partitions.
The results also imply the expected optimality of \(\delta \), in the case where an agent makes inferences about uniformly randomly selected elements of the domain.
I will touch on this problem briefly in Sect. 3.5, in connection with the classical principle of indifference.
Both of the mentioned values are identical to: \(|\{\hbox {S} : \hbox {S} \subseteq \hbox {R} \wedge |\hbox {S}| \in \hbox {W} \wedge \hbox {freq}(\hbox {T}|\hbox {S}) = \hbox {v}_i\}| / |\{\hbox {S} : \hbox {S} \subseteq \hbox {R} \wedge |\hbox {S}| \in \hbox {W} \wedge \hbox {freq}(\hbox {T}|\hbox {S}) \in \hbox {V}\}|\).
A proof of Theorem 6 is given in the appendix.
In cases where the size of \(|\hbox {R}^{\prime }|\) is unknown, let \(\hbox {s}^{+}\) be the least upper bound that one is warranted in accepting regarding the size of \(|\hbox {R}^{\prime }|\). In Step 1, we then make one direct inference for each value of i in \(\{0, ..., \hbox {s}^+\}\).
Recall that \(\hbox {PROB}(\hbox {c} \in \hbox {T}) = \hbox {E}\left[ \hbox {freq}(\hbox {T}|\{\hbox {c}\})\right] \).
The restriction of the applicability of Theorem 7 to cases where one is not warranted in accepting that \(\hbox {freq}(\hbox {T}|\hbox {R}^{\prime }) \in \hbox {v}\), for any v such that \(\hbox {v} \subset \{0/|\hbox {R}^{\prime }|, 1/|\hbox {R}^{\prime }|, ..., |\hbox {R}^{\prime }|/|\hbox {R}^{\prime }|\}\), is also suggestive of where past accounts of direct inference (with the possible exception of Thorn 2012) go wrong in the face of Stone’s Ace Urn example (Stone 1987, p. 251).
The example of Bradley and Steele (2014) is meant to serve as a plausible example of credence dilation. If the present treatment of the example is correct, it cannot serve as an example of rational credence dilation.
Consider a regular direct inference of the form: From \(\hbox {c} \in \hbox {R}\) and \(\hbox {freq}(\hbox {T}|\hbox {R})\) = r infer that \(\hbox {PROB}(\hbox {c} \in \hbox {T})\) = r, where r = i/\(|\hbox {R}|\). To achieve such an emulation of this direct inference, proceed as follows: (i) Introduce a set of names {c\(_1\), ..., c\(_{\text {i}}\), c\(_{\text {i}+1}\), ..., c\(_{|\text {R}|}\)} for the elements of R, where c\(_1\) through c\(_{\text {i}}\) denote elements of T, and c\(_{\text {i}+1}\) through c\(_{|\text {R}|}\) do not. (ii) Form the reference class R\(_{\pi }\) = {c=c\(_1\), ..., c=c\(_{|\text {R}|}\)}. (iii) Where V is the set of all true propositions, make direct inferences of the form: From c=c\(_j\) \(\in \) R\(_{\pi }\) and freq(V\(|\hbox {R}_{\pi }\)) = 1/\(|\hbox {R}|\) infer that PROB(c=c\(_j\) \(\in \) V) = 1/\(|\hbox {R}|\), for each j in {1, ..., i}. (iv) Given the conclusions of the direct inferences made in step (iii), deduce that \(\hbox {PROB}(\hbox {c} \in \hbox {T})\) = i/\(|\hbox {R}|\).
The present issue obviously deserves a more detailed and careful treatment than is given here. For reasons of space, I leave this task to another occasion.
References
Bacchus, F. (1990). Representing and reasoning with probabilistic knowledge. Cambridge, MA: MIT Press.
Bradley, S., & Steele, K. (2014). Uncertainty, learning, and the “Problem” of dilation. Erkenntnis, 79(6), 1287–1303.
Brier, G. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.
Carnap, R. (1962). Logical foundations of probability. Chicago: University of Chicago Press.
de Finetti, B. (1974). Theory of probability (Vol. 1). New York: Wiley.
Easwaran, K. (2013). Expected accuracy supports conditionalization and conglomerability and reflection. Philosophy of Science, 80(1), 119–142.
Gould, H. (2010). Combinatorial identities: Table I: Intermediate techniques for summing finite series. In J. Quaintance (Ed.), http://www.math.wvu.edu/~gould/Vol.4.PDF. Accessed 3 Feb 2016.
Greaves, H., & Wallace, D. (2006). Justifying conditionalization: Conditionalization maximizes expected epistemic utility. Mind, 115, 607–632.
Joyce, J. (1998). A nonpragmatic vindication of probabilism. Philosophy of Science, 65(4), 575–603.
Kyburg, H. (1974). The logical foundations of statistical inference. Dordrecht: Reidel Publishing Company.
Kyburg, H., & Teng, C. (2001). Uncertain inference. Cambridge: Cambridge University Press.
Leitgeb, H., & Pettigrew, R. (2010a). An objective justification of Bayesianism I: Measuring inaccuracy. Philosophy of Science, 77(2), 201–235.
Leitgeb, H., & Pettigrew, R. (2010b). An objective justification of Bayesianism II: The consequences of minimizing inaccuracy. Philosophy of Science, 77(2), 236–272.
Levinstein, B. (2012). Leitgeb and Pettigrew on accuracy and updating. Philosophy of Science, 79(3), 413–424.
Pollock, J. (1990). Nomic probability and the foundations of induction. Oxford: Oxford University Press.
Reichenbach, H. (1949). The theory of probability. Berkeley: University of California Press.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43–62.
Stone, M. (1987). Kyburg, Levi, and Petersen. Philosophy of Science, 54(2), 244–255.
Thorn, P. (2012). Two problems of direct inference. Erkenntnis, 76, 299–318.
Thorn, P. (2014). Defeasible conditionalization. Journal of Philosophical Logic, 43, 283–302.
Venn, J. (1866). The logic of chance. New York: Chelsea Publishing Company.
White, R. (2009). Evidential symmetry and mushy credence. In T. Szabo Gendler & J. Hawthorne (Eds.), Oxford studies in epistemology (Vol. 3, pp. 161–186). Oxford: Oxford University Press.
Acknowledgments
Work on this paper was supported by DFG Grant SCHU1566/9-1 as part of the priority program “New Frameworks of Rationality” (SPP 1516). For helpful comments on a presentation of this paper, I am thankful to an audience at EPSA 2015. For helpful discussions, I am thankful to Ludwig Fahrbach, Gerhard Schurz, and Ioannis Votsis. Finally, I am especially thankful to two anonymous referees for Synthese who provided excellent comments and suggestions concerning an earlier draft of the paper.
Appendix
Theorem 1
\(\forall \hbox {M}, \chi \): if \(\chi \) is principled in M, then \(\forall \hbox {S}\):
(1) if S is a proper scoring rule, then \(\forall \pi \in \Pi \):
$$\begin{aligned} \Sigma _{\mathrm{x}\in \pi } \mathrm{S}(\delta (\mathrm{x} \in \mathrm{T}), \nu (\mathrm{x} \in \mathrm{T})) \ge \Sigma _{\mathrm{x}\in \pi } \mathrm{S}(\chi (\mathrm{x} \in \mathrm{T}), \nu (\mathrm{x} \in \mathrm{T})),\ \mathrm{and} \end{aligned}$$
(2) if S is a strictly proper scoring rule and \(\chi \ne \delta \), then \(\exists \pi \in \Pi \):
$$\begin{aligned} \Sigma _{\mathrm{x}\in \pi } \mathrm{S}(\delta (\mathrm{x} \in \mathrm{T}), \nu (\mathrm{x} \in \mathrm{T})) > \Sigma _{\mathrm{x}\in \pi } \mathrm{S}(\chi (\mathrm{x} \in \mathrm{T}), \nu (\mathrm{x} \in \mathrm{T})). \end{aligned}$$
Proof
Part (1): Consider an arbitrary \(\pi \) in \(\Pi \), and an arbitrary \(\hbox {x}_i\) in \(\pi \). We have \(\Sigma _{\text {x}\in \pi } \hbox {S}(\delta (\hbox {x} \in \hbox {T}), \nu (\hbox {x} \in \hbox {T})) = |\pi |\times [\hbox {S}(\delta (\hbox {x}_i \in \hbox {T}), 1)\times \delta (\hbox {x}_i\in \hbox {T}) + \hbox {S}(\delta (\hbox {x}_i \in \hbox {T}), 0)\times (1-\delta (\hbox {x}_i \in \hbox {T}))]\), and \(\Sigma _{\text {x}\in \pi } \hbox {S}(\chi (\hbox {x} \in \hbox {T}), \nu (\hbox {x} \in \hbox {T})) = |\pi |\times [\hbox {S}(\chi (\hbox {x}_i \in \hbox {T}), 1)\times \delta (\hbox {x}_i \in \hbox {T}) + \hbox {S}(\chi (\hbox {x}_i \in \hbox {T}), 0)\times (1-\delta (\hbox {x}_i \in \hbox {T}))]\) (since \(\delta \) and \(\chi \) are principled, so that each assigns a single value throughout \(\pi \), and a fraction \(\delta (\hbox {x}_i \in \hbox {T})\) of the elements of \(\pi \) are in T). Since S is proper, we have for all x: \(\hbox {S}(\delta (\hbox {x} \in \hbox {T}), 1)\times \delta (\hbox {x} \in \hbox {T}) + \hbox {S}(\delta (\hbox {x} \in \hbox {T}), 0)\times (1-\delta (\hbox {x} \in \hbox {T})) \ge \hbox {S}(\chi (\hbox {x} \in \hbox {T}), 1)\times \delta (\hbox {x} \in \hbox {T}) + \hbox {S}(\chi (\hbox {x} \in \hbox {T}), 0)\times (1-\delta (\hbox {x} \in \hbox {T}))\). Taking \(\hbox {x} = \hbox {x}_i\) and multiplying both sides by \(|\pi |\) yields the inequality in (1). \(\square \)
Part (2): For some \(\pi \), we have \(\delta (\hbox {x} \in \hbox {T}) \ne \chi (\hbox {x} \in \hbox {T})\), for all x in \(\pi \) (since \(\delta \) and \(\chi \) are principled and \(\delta \ne \chi \)). Consider such a \(\pi \). For such a \(\pi \), \(\hbox {S}(\delta (\hbox {x} \in \hbox {T}), 1)\times \delta (\hbox {x} \in \hbox {T}) + \hbox {S}(\delta (\hbox {x} \in \hbox {T}), 0)\times (1-\delta (\hbox {x} \in \hbox {T})) > \hbox {S}(\chi (\hbox {x} \in \hbox {T}), 1)\times \delta (\hbox {x} \in \hbox {T}) + \hbox {S}(\chi (\hbox {x} \in \hbox {T}), 0)\times (1-\delta (\hbox {x} \in \hbox {T}))\), for all x in \(\pi \), since S is strictly proper. Multiplying both sides by \(|\pi |\) and applying the two identities established in Part (1) yields the strict inequality in (2). \(\square \)
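The inequality of Theorem 1 can be checked numerically for a single cell. The following is a minimal Python sketch, not part of the original proof. It assumes the Brier rule, oriented positively by multiplying the quadratic loss by -1, and a hypothetical five-element cell of which three elements are in T; the names `brier_score`, `cell_score`, `cell`, and `T` are illustrative only. Since a principled policy assigns one value throughout a cell, it suffices to compare the summed score of the value assigned by \(\delta \), taken here to be the within-cell frequency of T, against other constant assignments.

```python
from fractions import Fraction


def brier_score(p, v):
    """Positively oriented Brier score: the quadratic loss multiplied by -1."""
    return -(v - p) ** 2


def cell_score(p, cell, T):
    """Summed score over a cell when every element is assigned probability p."""
    return sum(brier_score(p, 1 if x in T else 0) for x in cell)


# Hypothetical toy cell (an assumption for illustration): 3 of 5 elements are in T.
cell = ["a", "b", "c", "d", "e"]
T = {"a", "b", "c"}

# Value assigned by the direct-inference policy: the within-cell frequency of T.
delta = Fraction(sum(1 for x in cell if x in T), len(cell))

# Competing constant assignments on a grid 0, 1/20, ..., 1 (the grid contains 3/5).
candidates = [Fraction(k, 20) for k in range(21)]
scores = {p: cell_score(p, cell, T) for p in candidates}
best = max(scores, key=scores.get)

# Closed form used in the proof: |pi| * [S(d, 1) * d + S(d, 0) * (1 - d)].
closed_form = len(cell) * (
    brier_score(delta, 1) * delta + brier_score(delta, 0) * (1 - delta)
)

print(best, scores[best], closed_form)
```

On this toy cell the within-cell frequency 3/5 attains the highest summed score among the grid values, and the summed score agrees with the closed-form expression used in Part (1), which is what the theorem leads one to expect for a strictly proper rule.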
Theorem 6
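\(\forall \hbox {T}, \hbox {R}, \hbox {R}^{\prime }\): if \(\hbox {R}^{\prime } \subseteq \hbox {R}\) and \(\forall i\): \(\hbox {PROB}(\hbox {freq}(\hbox {T}|\hbox {R}^{\prime }) = i/|\hbox {R}^{\prime }|) = \hbox {freq}(\{\hbox {S} : \hbox {freq}(\hbox {T}|\hbox {S}) = i/|\hbox {R}^{\prime }|\}\,|\,\{\hbox {S} : \hbox {S} \subseteq \hbox {R} \wedge |\hbox {S}| = |\hbox {R}^{\prime }|\})\), then \(\hbox {E}[\hbox {freq}(\hbox {T}|\hbox {R}^{\prime })] = \hbox {freq}(\hbox {T}|\hbox {R})\).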
Proof
Let T, R, and \(\hbox {R}^{\prime }\) be arbitrary sets such that \(\hbox {R}^{\prime } \subseteq \hbox {R}\). Note that, for all i,
$$\begin{aligned} \mathrm{freq}(\{\mathrm{S} : \mathrm{freq}(\mathrm{T}|\mathrm{S}) = i/|\mathrm{R}^{\prime }|\}\,|\,\{\mathrm{S} : \mathrm{S} \subseteq \mathrm{R} \wedge |\mathrm{S}| = |\mathrm{R}^{\prime }|\}) = \binom{\mathrm{g}}{i}\times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}^{\prime }|-i} \Big/ \binom{|\mathrm{R}|}{|\mathrm{R}^{\prime }|}, \end{aligned}$$
where \(\hbox {g} = \hbox {freq}(\hbox {T}|\hbox {R})\times |\hbox {R}|\). So, for all i,
$$\begin{aligned} \mathrm{PROB}(\mathrm{freq}(\mathrm{T}|\mathrm{R}^{\prime }) = i/|\mathrm{R}^{\prime }|) = \binom{\mathrm{g}}{i}\times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}^{\prime }|-i} \Big/ \binom{|\mathrm{R}|}{|\mathrm{R}^{\prime }|}. \end{aligned}$$
So
$$\begin{aligned} \mathrm{E}[\mathrm{freq}(\mathrm{T}|\mathrm{R}^{\prime })]&= \Sigma _{i \in \{0, ..., |\mathrm{R}^{\prime }|\}}\, i/|\mathrm{R}^{\prime }| \times \mathrm{PROB}(\mathrm{freq}(\mathrm{T}|\mathrm{R}^{\prime }) = i/|\mathrm{R}^{\prime }|) \\&= \Sigma _{i \in \{0, ..., |\mathrm{R}^{\prime }|\}}\, i/|\mathrm{R}^{\prime }| \times \binom{\mathrm{g}}{i}\times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}^{\prime }|-i} \Big/ \binom{|\mathrm{R}|}{|\mathrm{R}^{\prime }|} \\&= 1/|\mathrm{R}^{\prime }| \times 1\Big/\binom{|\mathrm{R}|}{|\mathrm{R}^{\prime }|} \times \Sigma _{i \in \{0, ..., |\mathrm{R}^{\prime }|\}} \binom{\mathrm{g}}{i}\times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}^{\prime }|-i}\times \binom{i}{1} \\&= 1/|\mathrm{R}^{\prime }| \times 1\Big/\binom{|\mathrm{R}|}{|\mathrm{R}^{\prime }|} \times \binom{\mathrm{g}}{1}\times \binom{\mathrm{g}+|\mathrm{R}|-\mathrm{g}-1}{|\mathrm{R}^{\prime }|-1} \end{aligned}$$
[by Vandermonde’s Identity (cf. Gould (2010), 6.17)]
$$\begin{aligned}&= 1/|\mathrm{R}^{\prime }| \times \frac{|\mathrm{R}^{\prime }|!\times (|\mathrm{R}|-|\mathrm{R}^{\prime }|)!}{|\mathrm{R}|!} \times \mathrm{g} \times \frac{(|\mathrm{R}|-1)!}{(|\mathrm{R}^{\prime }|-1)!\times (|\mathrm{R}|-|\mathrm{R}^{\prime }|)!} \\&= \mathrm{g}/|\mathrm{R}| = \mathrm{freq}(\mathrm{T}|\mathrm{R}). \end{aligned}$$
\(\square \)
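Theorem 6 can also be spot-checked by direct computation. The sketch below is an illustration under stated assumptions rather than part of the proof. It takes \(|\hbox {R}|\), \(\hbox {g} = \hbox {freq}(\hbox {T}|\hbox {R})\times |\hbox {R}|\), and \(|\hbox {R}^{\prime }|\) as integer inputs, weights each possible value \(i/|\hbox {R}^{\prime }|\) by the combinatorial term used in the proof, and sums with exact rational arithmetic. The function name `expected_subset_frequency` and the sample values are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_subset_frequency(size_R, g, size_Rp):
    """E[freq(T|R')] when PROB(freq(T|R') = i/|R'|) equals the proportion of
    size-|R'| subsets of R that contain exactly i members of T.

    Assumes 0 <= g <= size_R and 1 <= size_Rp <= size_R.
    """
    total_subsets = comb(size_R, size_Rp)
    expectation = Fraction(0)
    for i in range(size_Rp + 1):
        # Number of size-|R'| subsets of R with exactly i members of T.
        count = comb(g, i) * comb(size_R - g, size_Rp - i)
        expectation += Fraction(i, size_Rp) * Fraction(count, total_subsets)
    return expectation


# Hypothetical sample values: |R| = 10, g = 6 (so freq(T|R) = 3/5), |R'| = 4.
print(expected_subset_frequency(10, 6, 4))  # 3/5
```

For these sample values the subset counts for i = 0, ..., 4 are 1, 24, 90, 80, and 15, which sum to 210, the number of four-element subsets of a ten-element set, and the weighted sum comes to 3/5 = g/|R|.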
Cite this article
Thorn, P.D. On the preference for more specific reference classes. Synthese 194, 2025–2051 (2017). https://doi.org/10.1007/s11229-016-1035-y
DOI: https://doi.org/10.1007/s11229-016-1035-y