On the preference for more specific reference classes

Thorn, Paul D.

doi:10.1007/s11229-016-1035-y

Published: 08 February 2016

Synthese, Volume 194, pages 2025–2051 (2017)

Paul D. Thorn¹

Abstract

In attempting to form rational personal probabilities by direct inference, it is usually assumed that one should prefer frequency information concerning more specific reference classes. While the preceding assumption is intuitively plausible, little energy has been expended in explaining why it should be accepted. In the present article, I address this omission by showing that, among the principled policies that may be used in setting one’s personal probabilities, the policy of making direct inferences with a preference for frequency information for more specific reference classes yields personal probabilities whose accuracy is optimal, according to all proper scoring rules, in situations where all of the relevant frequency information is point-valued. Assuming that frequency information for narrower reference classes is preferred, when the relevant frequency statements are point-valued, a dilemma arises when choosing whether to make a direct inference based upon (i) relatively precise-valued frequency information for a broad reference class, R, or upon (ii) relatively imprecise-valued frequency information for a more specific reference class, $\hbox {R}^{\prime }$ ($\hbox {R}^{\prime }\subset \hbox {R}$). I address such cases, by showing that it is often possible to make a precise-valued frequency judgment regarding $\hbox {R}^{\prime }$ based on precise-valued frequency information for R, using standard principles of direct inference. Having made such a frequency judgment, the dilemma of choosing between (i) and (ii) is removed, and one may proceed by using the precise-valued frequency estimate for the more specific reference class as a premise for direct inference.

Notes

In fact, further qualifications are required in order to exclude degenerate direct inferences (cf. Pollock 1990; Kyburg and Teng 2001; Thorn 2012). The problem of excluding degenerate direct inferences does not arise within the simple sorts of population model considered in Sect. 2. In Sect. 3.5, I will say a little bit about the problem of excluding degenerate direct inferences.
Pollock (1990, p. 86) asserts that the preference for narrower reference classes is a ‘kind of’ total evidence requirement. This may be. However, there is no straightforward way to direct the force of arguments in support of Carnap’s Principle of Total Evidence in order to support a preference for narrower reference classes. Indeed, Carnap’s Principle of Total Evidence (1962, p. 211) prescribes that one’s posterior probability for a proposition $\alpha$ be identical to one’s prior probability for $\alpha$ conditional on one’s complete body of evidence. So in a case where one’s complete body of evidence consists of $\mathrm{freq}(\mathrm{T}|\mathrm{R}) = 0.6$, $\mathrm{freq}(\mathrm{T}|\mathrm{R}') = 0.9$, $\mathrm{R}' \subseteq \mathrm{R}$, and $\mathrm{c} \in \mathrm{R}'$, Carnap’s principle prescribes that one’s posterior probability for $\mathrm{c} \in \mathrm{T}$ be identical to one’s prior probability for $\mathrm{c} \in \mathrm{T}$ conditional on $\mathrm{freq}(\mathrm{T}|\mathrm{R}) = 0.6 \wedge \mathrm{freq}(\mathrm{T}|\mathrm{R}') = 0.9 \wedge \mathrm{R}' \subseteq \mathrm{R} \wedge \mathrm{c} \in \mathrm{R}'$. However, since one’s prior probability for $\mathrm{c} \in \mathrm{T}$ conditional on $\mathrm{freq}(\mathrm{T}|\mathrm{R}) = 0.6 \wedge \mathrm{freq}(\mathrm{T}|\mathrm{R}') = 0.9 \wedge \mathrm{R}' \subseteq \mathrm{R} \wedge \mathrm{c} \in \mathrm{R}'$ need not be 0.9, the preference for narrower reference classes does not follow from Carnap’s principle. Perhaps rational personal probabilities are structured in such a way as to generate a preference for narrower reference classes, when updating by conditionalization (cf. Thorn 2014). Whether this is the case is something that would need to be argued for, independently of the Principle of Total Evidence.
In the circumstance of making a judgment about the probability that an object c is a member of T, it is always possible to introduce a description of maximal specificity, which denotes the unit set consisting of the very object about which one is reasoning. It is reasonable to ignore such descriptions, in cases where we have no substantive information concerning the value of $\mathrm{freq}(\mathrm{T}|\{\mathrm{c}\})$. So, in the present section, I take the reasonable course of ignoring such descriptions. But see Sect. 3.3, which considers the proper treatment of reference classes for which one has no prior frequency information, and provides a more adequate treatment of the present issue.
Note that the present specification of categories represents a generalization of the case where F is the set of all subsets of U, which corresponds to the case where $\Pi = \{\{\mathrm{x}\} : \mathrm{x} \in \mathrm{U}\}$.
Easwaran (2013, p. 124) appeals to a similar condition in showing that updating by conditionalization maximizes expected accuracy, including the case of probability functions that are defined over infinite sets of possible worlds.
Note that the oracular policy, $\nu$, will be principled in some population models, such as in population models where $\Pi = \{\{\mathrm{x}\} : \mathrm{x} \in \mathrm{U}\}$. In all such cases, $\delta(\mathrm{x} \in \mathrm{T}) = \nu(\mathrm{x} \in \mathrm{T})$, for all x in U.
For the sake of uniformity, negatively oriented scoring rules (such as Brier scoring) are treated as loss functions, where the scores corresponding to such loss functions are determined by multiplying the loss earned according to such a rule by $-1$.
The proof of Theorem 2 is identical to that of Theorem 1, where we replace instances of $\pi$ and $\Pi$ with f and F, and instances of x, $\mathrm{x}_i$, and U with $\langle \mathrm{x}, \mathrm{f}\rangle$, $\langle \mathrm{x}_i, \mathrm{f}\rangle$, and $\mathrm{U}^{\mathrm{F}}$, excluding instances of x and $\mathrm{x}_i$ in the scope of $\nu$.
Note that I have not argued here that all direct inferences based on reference classes that are partitions are degenerate. I only maintain that there is a preference for direct inferences that employ ‘standard’ reference classes versus their partitions.
The results also imply the expected optimality of $\delta $, in the case where an agent makes inferences about uniformly randomly selected elements of the domain.
For a survey of past approaches to the present problem, including those of Bacchus (1990), Pollock (1990), and Kyburg and Teng (2001), see Thorn (2012).
I will touch on this problem briefly in Sect. 3.5, in connection with the classical principle of indifference.
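Both of the mentioned values are identical to: $|\{\mathrm{S} : \mathrm{S} \subseteq \mathrm{R} \wedge |\mathrm{S}| \in \mathrm{W} \wedge \mathrm{freq}(\mathrm{T}|\mathrm{S}) = \mathrm{v}_i\}| \,/\, |\{\mathrm{S} : \mathrm{S} \subseteq \mathrm{R} \wedge |\mathrm{S}| \in \mathrm{W} \wedge \mathrm{freq}(\mathrm{T}|\mathrm{S}) \in \mathrm{V}\}|$.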
A proof of Theorem 6 is given in the appendix.
In cases where $|\mathrm{R}'|$ is unknown, let $\mathrm{s}^{+}$ be the least upper bound that one is warranted in accepting regarding $|\mathrm{R}'|$. In Step 1, we then make one direct inference for each value of i in $\{0, \ldots, \mathrm{s}^{+}\}$.
Recall that $\mathrm{PROB}(\mathrm{c} \in \mathrm{T}) = \mathrm{E}[\mathrm{freq}(\mathrm{T}|\{\mathrm{c}\})]$.
The restriction in the applicability of Theorem 7 to cases where one is not warranted in accepting that $\mathrm{freq}(\mathrm{T}|\mathrm{R}') \in \mathrm{v}$, for any v such that $\mathrm{v} \subset \{0/|\mathrm{R}'|, 1/|\mathrm{R}'|, \ldots, |\mathrm{R}'|/|\mathrm{R}'|\}$, is also suggestive of where past accounts of direct inference (with the possible exception of Thorn 2012) go wrong in the face of Stone’s Ace Urn example (Stone 1987, p. 251).
The example of Bradley and Steele (2014) is meant to serve as a plausible example of credence dilation. If the present treatment of the example is correct, it cannot serve as an example of rational credence dilation.
Consider a regular direct inference of the form: From $\mathrm{c} \in \mathrm{R}$ and $\mathrm{freq}(\mathrm{T}|\mathrm{R}) = \mathrm{r}$ infer that $\mathrm{PROB}(\mathrm{c} \in \mathrm{T}) = \mathrm{r}$, where $\mathrm{r} = i/|\mathrm{R}|$. To achieve such an emulation of this direct inference, proceed as follows: (i) Introduce a set of names $\{\mathrm{c}_1, \ldots, \mathrm{c}_i, \mathrm{c}_{i+1}, \ldots, \mathrm{c}_{|\mathrm{R}|}\}$ for the elements of R, where $\mathrm{c}_1$ through $\mathrm{c}_i$ denote elements of T, and $\mathrm{c}_{i+1}$ through $\mathrm{c}_{|\mathrm{R}|}$ do not. (ii) Form the reference class $\mathrm{R}_{\pi} = \{\mathrm{c}{=}\mathrm{c}_1, \ldots, \mathrm{c}{=}\mathrm{c}_{|\mathrm{R}|}\}$. (iii) Where V is the set of all true propositions, make direct inferences of the form: From $\mathrm{c}{=}\mathrm{c}_j \in \mathrm{R}_{\pi}$ and $\mathrm{freq}(\mathrm{V}|\mathrm{R}_{\pi}) = 1/|\mathrm{R}|$ infer that $\mathrm{PROB}(\mathrm{c}{=}\mathrm{c}_j \in \mathrm{V}) = 1/|\mathrm{R}|$, for each j in $\{1, \ldots, i\}$. (iv) Given the conclusions of the direct inferences made in step (iii), deduce that $\mathrm{PROB}(\mathrm{c} \in \mathrm{T}) = i/|\mathrm{R}|$.
The present issue obviously deserves a more detailed and careful treatment than is given here. For reasons of space, I leave this task to another occasion.
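
The sign convention described in the note on negatively oriented scoring rules can be illustrated with a minimal Python sketch. It treats the Brier rule as a loss, turns it into a score by multiplying by -1, and compares the expected score of two reports. The helper names (`brier_loss`, `as_score`, `expected_score`) and the numerical values are assumptions chosen for illustration; they do not appear in the article.

```python
from fractions import Fraction


def brier_loss(p, outcome):
    """Negatively oriented Brier rule: squared distance from the outcome (0 or 1)."""
    return (p - outcome) ** 2


def as_score(loss):
    """Turn a loss function into a score by multiplying the loss by -1."""
    def score(p, outcome):
        return -1 * loss(p, outcome)
    return score


brier_score = as_score(brier_loss)


def expected_score(score, p, q):
    """Expected score of reporting p when the outcome 1 has probability q."""
    return q * score(p, 1) + (1 - q) * score(p, 0)


# Illustrative values only: the outcome has probability 3/10.
q = Fraction(3, 10)
honest = expected_score(brier_score, q, q)               # report equals q
shaded = expected_score(brier_score, Fraction(1, 2), q)  # report differs from q
```

Under these assumptions `honest` exceeds `shaded`, which is the behaviour one expects of a strictly proper rule once it has been oriented so that higher scores are better.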

References

Bacchus, F. (1990). Representing and reasoning with probabilistic knowledge. Cambridge, MA: MIT Press.
Bradley, S., & Steele, K. (2014). Uncertainty, learning, and the “Problem” of dilation. Erkenntnis, 79(6), 1287–1303.
Brier, G. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.
Carnap, R. (1962). Logical foundations of probability. Chicago: University of Chicago Press.
de Finetti, B. (1974). Theory of probability (Vol. 1). New York: Wiley.
Easwaran, K. (2013). Expected accuracy supports conditionalization and conglomerability and reflection. Philosophy of Science, 80(1), 119–142.
Gould, H. (2010). Combinatorial identities: Table I: Intermediate techniques for summing finite series. In J. Quaintance (Ed.), http://www.math.wvu.edu/~gould/Vol.4.PDF. Accessed 3 Feb 2016.
Greaves, H., & Wallace, D. (2006). Justifying conditionalization: Conditionalization maximizes expected epistemic utility. Mind, 114, 607–632.
Joyce, J. (1998). A nonpragmatic vindication of probabilism. Philosophy of Science, 65(4), 575–603.
Kyburg, H. (1974). The logical foundations of statistical inference. Dordrecht: Reidel Publishing Company.
Kyburg, H., & Teng, C. (2001). Uncertain inference. Cambridge: Cambridge University Press.
Leitgeb, H., & Pettigrew, R. (2010a). An objective justification of Bayesianism I: Measuring inaccuracy. Philosophy of Science, 77(2), 201–235.
Leitgeb, H., & Pettigrew, R. (2010b). An objective justification of Bayesianism II: The consequences of minimizing inaccuracy. Philosophy of Science, 77(2), 236–272.
Levinstein, B. (2012). Leitgeb and Pettigrew on accuracy and updating. Philosophy of Science, 79(3), 413–424.
Pollock, J. (1990). Nomic probability and the foundations of induction. Oxford: Oxford University Press.
Reichenbach, H. (1949). A theory of probability. Berkeley: Berkeley University Press.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43–62.
Stone, M. (1987). Kyburg, Levi, and Petersen. Philosophy of Science, 54(2), 244–255.
Thorn, P. (2012). Two problems of direct inference. Erkenntnis, 76, 299–318.
Thorn, P. (2014). Defeasible conditionalization. Journal of Philosophical Logic, 43, 283–302.
Venn, J. (1866). The logic of chance. New York: Chelsea Publishing Company.
White, R. (2009). Evidential symmetry and mushy credence. In T. Szabo Gendler & J. Hawthorne (Eds.), Oxford studies in epistemology (Vol. 3, pp. 161–186). Oxford: Oxford University Press.

Acknowledgments

Work on this paper was supported by DFG Grant SCHU1566/9-1 as part of the priority program “New Frameworks of Rationality” (SPP 1516). For helpful comments on a presentation of this paper, I am thankful for an audience at EPSA 2015. For helpful discussions, I am thankful to Ludwig Fahrbach, Gerhard Schurz, and Ioannis Votsis. Finally, I am especially thankful to two anonymous referees for Synthese who provided excellent comments and suggestions concerning an earlier draft of the paper.

Author information

Authors and Affiliations

Department of Philosophy, University of Duesseldorf, Duesseldorf, Germany
Paul D. Thorn

Corresponding author

Correspondence to Paul D. Thorn.

Appendix

Theorem 1

$\forall \mathrm{M}, \chi$: if $\chi$ is principled in M, then $\forall \mathrm{S}$:

(1) if S is a proper scoring rule, then $\forall \pi \in \Pi$:
$$\begin{aligned} \sum_{\mathrm{x} \in \pi} \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})) \ge \sum_{\mathrm{x} \in \pi} \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})), \quad \text{and} \end{aligned}$$
(2) if S is a strictly proper scoring rule and $\chi \ne \delta$, then $\exists \pi \in \Pi$:
$$\begin{aligned} \sum_{\mathrm{x} \in \pi} \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})) > \sum_{\mathrm{x} \in \pi} \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})). \end{aligned}$$

Proof

Part (1): Consider an arbitrary $\pi$ in $\Pi$, and an arbitrary $\mathrm{x}_i$ in $\pi$. We have
$$\begin{aligned} \sum_{\mathrm{x} \in \pi} \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})) = |\pi| \times [\mathrm{S}(\delta(\mathrm{x}_i \in \mathrm{T}), 1) \times \delta(\mathrm{x}_i \in \mathrm{T}) + \mathrm{S}(\delta(\mathrm{x}_i \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x}_i \in \mathrm{T}))], \end{aligned}$$
and
$$\begin{aligned} \sum_{\mathrm{x} \in \pi} \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), \nu(\mathrm{x} \in \mathrm{T})) = |\pi| \times [\mathrm{S}(\chi(\mathrm{x}_i \in \mathrm{T}), 1) \times \delta(\mathrm{x}_i \in \mathrm{T}) + \mathrm{S}(\chi(\mathrm{x}_i \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x}_i \in \mathrm{T}))] \end{aligned}$$
(since $\delta$ and $\chi$ are principled). Since S is proper, we have for all x:
$$\begin{aligned} \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), 1) \times \delta(\mathrm{x} \in \mathrm{T}) + \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x} \in \mathrm{T})) \ge \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), 1) \times \delta(\mathrm{x} \in \mathrm{T}) + \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x} \in \mathrm{T})). \end{aligned}$$
$\square$

Part (2): For some $\pi$, we have $\delta(\mathrm{x} \in \mathrm{T}) \ne \chi(\mathrm{x} \in \mathrm{T})$, for all x in $\pi$ (since $\delta$ and $\chi$ are principled and $\delta \ne \chi$). Consider such a $\pi$. For such a $\pi$,
$$\begin{aligned} \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), 1) \times \delta(\mathrm{x} \in \mathrm{T}) + \mathrm{S}(\delta(\mathrm{x} \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x} \in \mathrm{T})) > \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), 1) \times \delta(\mathrm{x} \in \mathrm{T}) + \mathrm{S}(\chi(\mathrm{x} \in \mathrm{T}), 0) \times (1 - \delta(\mathrm{x} \in \mathrm{T})), \end{aligned}$$
for all x in $\pi$, since S is strictly proper. $\square$
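
The two parts of the proof can be checked numerically on a toy population model. The Python sketch below is only an illustration under stated assumptions: it reads "principled" as "assigns one value to every member of a cell $\pi$", takes $\delta$ to assign each member of a cell the frequency of T within that cell, takes $\nu$ to be the 0/1 truth value, and uses the Brier score (oriented so that higher is better) as the strictly proper rule. The population, the partition, and all function names are hypothetical.

```python
from fractions import Fraction


def brier_score(p, outcome):
    """Positively oriented Brier score: the quadratic loss multiplied by -1."""
    return -((p - outcome) ** 2)


def cell_total(value, cell, target):
    """Total score over a cell when every member receives the same value."""
    return sum(brier_score(value, 1 if x in target else 0) for x in cell)


def delta(cell, target):
    """Frequency of the target among the members of the cell."""
    return Fraction(len(cell & target), len(cell))


def delta_is_optimal(partition, target, candidates):
    """Check, cell by cell, that delta is never beaten and strictly beats every other value."""
    for cell in partition:
        d = delta(cell, target)
        best = cell_total(d, cell, target)
        for q in candidates:
            rival = cell_total(q, cell, target)
            if rival > best:
                return False  # would contradict part (1)
            if q != d and rival == best:
                return False  # would contradict the strict inequality used in part (2)
    return True


# Hypothetical model: U = {0, ..., 9}, three cells, five members of T.
partition = [frozenset({0, 1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
target = frozenset({0, 1, 2, 4, 9})
candidates = [Fraction(k, 20) for k in range(21)]  # rival values 0, 1/20, ..., 1
result = delta_is_optimal(partition, target, candidates)
```

The check uses exact rational arithmetic, so the comparisons are not affected by rounding. It only samples rival values on a grid, so it illustrates the theorem for this model rather than proving it.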

Theorem 6

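$\forall \mathrm{T}, \mathrm{R}, \mathrm{R}'$: if $\mathrm{R}' \subseteq \mathrm{R}$ and $\forall i$: $\mathrm{PROB}(\mathrm{freq}(\mathrm{T}|\mathrm{R}') = i/|\mathrm{R}'|) = \mathrm{freq}(\{\mathrm{S} : \mathrm{freq}(\mathrm{T}|\mathrm{S}) = i/|\mathrm{R}'|\} \mid \{\mathrm{S} : \mathrm{S} \subseteq \mathrm{R} \wedge |\mathrm{S}| = |\mathrm{R}'|\})$, then $\mathrm{E}[\mathrm{freq}(\mathrm{T}|\mathrm{R}')] = \mathrm{freq}(\mathrm{T}|\mathrm{R})$.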

Proof

Let T, R, and $\mathrm{R}'$ be arbitrary sets such that $\mathrm{R}' \subseteq \mathrm{R}$. Note that, for all $i$,
$$\begin{aligned} \mathrm{freq}(\{\mathrm{S} : \mathrm{freq}(\mathrm{T}|\mathrm{S}) = i/|\mathrm{R}'|\} \mid \{\mathrm{S} : \mathrm{S} \subseteq \mathrm{R} \wedge |\mathrm{S}| = |\mathrm{R}'|\}) = \binom{\mathrm{g}}{i} \times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}'|-i} \Big/ \binom{|\mathrm{R}|}{|\mathrm{R}'|}, \end{aligned}$$
where $\mathrm{g} = \mathrm{freq}(\mathrm{T}|\mathrm{R}) \times |\mathrm{R}|$. So, for all $i$, $\mathrm{PROB}(\mathrm{freq}(\mathrm{T}|\mathrm{R}') = i/|\mathrm{R}'|) = \binom{\mathrm{g}}{i} \times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}'|-i} \big/ \binom{|\mathrm{R}|}{|\mathrm{R}'|}$. So
$$\begin{aligned} \mathrm{E}[\mathrm{freq}(\mathrm{T}|\mathrm{R}')] &= \sum_{i \in \{0, \ldots, |\mathrm{R}'|\}} \frac{i}{|\mathrm{R}'|} \times \mathrm{PROB}(\mathrm{freq}(\mathrm{T}|\mathrm{R}') = i/|\mathrm{R}'|) \\ &= \sum_{i \in \{0, \ldots, |\mathrm{R}'|\}} \frac{i}{|\mathrm{R}'|} \times \binom{\mathrm{g}}{i} \times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}'|-i} \Big/ \binom{|\mathrm{R}|}{|\mathrm{R}'|} \\ &= \frac{1}{|\mathrm{R}'|} \times \frac{1}{\binom{|\mathrm{R}|}{|\mathrm{R}'|}} \times \sum_{i \in \{0, \ldots, |\mathrm{R}'|\}} \binom{\mathrm{g}}{i} \times \binom{|\mathrm{R}|-\mathrm{g}}{|\mathrm{R}'|-i} \times \binom{i}{1} \\ &= \frac{1}{|\mathrm{R}'|} \times \frac{1}{\binom{|\mathrm{R}|}{|\mathrm{R}'|}} \times \binom{\mathrm{g}}{1} \times \binom{\mathrm{g}+|\mathrm{R}|-\mathrm{g}-1}{|\mathrm{R}'|-1} \quad \text{[by Vandermonde’s Identity (cf. Gould (2010), 6.17)]} \\ &= \frac{1}{|\mathrm{R}'|} \times \frac{|\mathrm{R}'|! \times (|\mathrm{R}|-|\mathrm{R}'|)!}{|\mathrm{R}|!} \times \mathrm{g} \times \frac{(|\mathrm{R}|-1)!}{(|\mathrm{R}'|-1)! \times (|\mathrm{R}|-|\mathrm{R}'|)!} \\ &= \mathrm{g}/|\mathrm{R}| = \mathrm{freq}(\mathrm{T}|\mathrm{R}). \end{aligned}$$
$\square$
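
The identity established in this proof can also be checked by direct computation. The following Python sketch is an illustration rather than part of the article: it sums $i/|\mathrm{R}'|$ weighted by the subset-counting probabilities from the antecedent of Theorem 6, using exact rational arithmetic, and compares the result with $\mathrm{g}/|\mathrm{R}|$. The function name, parameter names, and the numbers in the example are assumptions made for the sketch.

```python
from fractions import Fraction
from math import comb


def expected_subset_frequency(size_r, g, size_r_prime):
    """E[freq(T|R')] when the probability that freq(T|R') = i/|R'| equals the share
    of size-|R'| subsets of R that contain exactly i members of T.

    size_r       -- |R|
    g            -- number of members of R that are in T, i.e. freq(T|R) * |R|
    size_r_prime -- |R'|, assumed to satisfy 1 <= |R'| <= |R|
    """
    total_subsets = comb(size_r, size_r_prime)
    expectation = Fraction(0)
    for i in range(size_r_prime + 1):
        # Subsets with exactly i members in T and |R'| - i members outside T.
        count = comb(g, i) * comb(size_r - g, size_r_prime - i)
        expectation += Fraction(i, size_r_prime) * Fraction(count, total_subsets)
    return expectation


# Illustrative values only: |R| = 10, g = 6 (so freq(T|R) = 0.6), |R'| = 4.
example = expected_subset_frequency(10, 6, 4)
```

For the illustrative values, `example` equals `Fraction(6, 10)`, matching $\mathrm{freq}(\mathrm{T}|\mathrm{R})$. The sketch relies on `math.comb` returning 0 when the lower argument exceeds the upper one, so impossible values of $i$ contribute nothing to the sum.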

About this article

Cite this article

Thorn, P.D. On the preference for more specific reference classes. Synthese 194, 2025–2051 (2017). https://doi.org/10.1007/s11229-016-1035-y

Received: 27 June 2015
Accepted: 25 January 2016
Published: 08 February 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11229-016-1035-y
