1 Introduction

In order to be sufficiently general, scientific theories have to make various simplifications and neglect (small) deviations from reality. So these theories are strictly speaking false. Nevertheless they are highly successful in their explanations and predictions, because what they assert is close to the truth, in the sense of being a good approximation to reality.

It is difficult to develop the notion of "closeness to the truth" in terms of probabilistic confirmation measures, since theories that are falsified by evidence have a conditional probability of zero. Because of this shortcoming of confirmation measures, Popper (1962, 1963) developed the account of truthlikeness, or verisimilitude. Within this account it is possible to attribute a high truthlikeness even to a false theory, provided its truth-content is high and its falsity-content is low.

Popper's original definition of truthlikeness had a technical defect, as demonstrated by Tichý (1974) and Miller (1974). Soon after this defect was detected, philosophers proposed revised accounts that avoid it. Two major families of approaches have been proposed for this purpose: conjunction-of-parts accounts and disjunction-of-possibilities accounts; their difference is explained in Sect. 1.1 below. Hereafter, three distinctions between kinds of truthlikeness measures (t-measures) are introduced: (i) comparative versus numeric t-measures, (ii) t-measures for qualitative versus quantitative theories, and (iii) t-measures for deterministic versus probabilistic truth. In Sects. 3–5 these kinds of truthlikeness are explicated and developed within a version of conjunction-of-parts accounts based on content elements. The focus lies on measures of probabilistic truthlikeness, which are divided into t-measures for statistical probabilities and single case probabilities (Sect. 3).

The logical notion of truthlikeness for probability statements (evaluated relative to true probabilistic laws) can be treated as a subcase of deterministic truthlikeness for quantitative theories (Sects. 4–6). In contrast, for the epistemic notions of truthlikeness (evaluated relative to a given set of empirical evidence), probabilistic truthlikeness creates genuinely new problems. This is especially true for hypotheses about single case probabilities, because they are not evaluated by comparison with observed frequencies (as statistical probabilities are), but by comparison with the truth values of single event statements (Sect. 6). In the final Sect. 7 the method of meta-induction is introduced as a means of aggregating competing theories about single case probabilities into a combined theory with optimal predictive success and epistemic truthlikeness.

1.1 Two families of truthlikeness accounts: conjunction-of-parts and disjunction-of-possibilities

Conjunction-of-parts accounts represent theories as conjunctions of (smallest) conjunctive parts. Since conjunctive parts are selected consequences of the theory, conjunction-of-parts accounts are also called consequence accounts (Schurz and Weingartner 2010; Oddie 2013). The truthlikeness of a conjunctively represented theory increases with the number and strength of its true parts and decreases with the number and strength of its false parts. Popper's original account of verisimilitude was a conjunction-of-parts account, in which conjunctive parts were understood as arbitrary logical consequences of the theory. The above-mentioned technical defect of Popper's original account resulted from the fact that the (classical) notion of logical consequence covers two sorts of unnatural consequences that should be disregarded in truthlikeness definitions, namely (i) redundant conjunctions p ∧ q of elementary consequences p and q of a theory A, and (ii) irrelevant disjunctive weakenings p ∨ x of a consequence p of A by arbitrary formulas x (cf., e.g., Schurz 2018, sec. 4). For example, the theory of Newtonian physics, N, entails the true prediction α that the planets move around the sun in elliptic orbits, and the false prediction β that the sun is the center of the universe. However, one should not count the conjunction α ∧ β as a third and false consequence of N, nor should one count the disjunction α ∨ γ for an arbitrary γ (e.g., γ = the moon is made of green cheese) as a third true consequence of N. Since redundant conjunctions and irrelevant weakenings are responsible for the technical defect of Popper's account, all conjunction-of-parts accounts exclude them as conjunctive parts. In the literature, conjunctive parts have been understood in four major ways:

  1. (i)

    As relevant elements, or content elements. This account has been developed by Schurz and Weingartner (1987); improved versions are given in Schurz and Weingartner (2010), Schippers and Schurz (2017, sec. 4), and in Schurz (2018), who refers to relevant elements as content elements. An advantage of this account is that it applies to theories of all logical formats, not only in propositional but also in predicate logic. Since the set of content elements of a theory is logically equivalent to the theory, no information gets lost by the content element representation.

  2. (ii)

    As content parts in the sense of Gemes (1994, 2007). Gemes' content-parts are similar to the relevant elements of Schurz and Weingartner, but less fine-grained, as was shown in Schurz (2005, sec. 6).

  3. (iii)

    As independent conjuncts. In propositional logic they have the form of unnegated or negated atomic sentences, so-called literals, abbreviated as ± pi. This account was anticipated in Kuipers' (1982) notion of "actual truthlikeness" and has been elegantly developed by Cevolani and Festa (2009), Cevolani et al. (2011) and Cevolani and Festa (2020). The account applies to so-called conjunctive theories, consisting of conjunctions of mutually independent elementary statements. The conjunctive format is a significant restriction, since it excludes theories containing disjunctions or implications such as p ∨ q, p → q or ∀x(Fx → Gx) (etc.). Cevolani and Festa (2020) have extended their account to include disjunctions of literals by assuming a notion of partial entailment between a disjunction of literals d and a literal l, based on the conditional logical probability of l given d.

  4. (iv)

    A special case of a conjunctive account is Kuipers' account of nomic truthlikeness based on conjunctions or sets of nomic constituents, or of corresponding set-theoretic structures (Kuipers 1982; 2000, sec. 7.2.2; 2019; Cevolani et al. 2013).

In contrast, disjunction-of-possibilities accounts represent theories disjunctively, either semantically as disjunctions of possible worlds (Hilpinen 1976; Oddie 1981) or syntactically as disjunctions of constituents, which are descriptions of possible worlds (Tichý 1974; Niiniluoto 1977, 1987). In a propositional language with n propositional variables p1,…,pn, constituents are given as complete conjunctions of the form ci = ± p1 ∧…∧ ± pn, where " ± " means "unnegated" or "negated" (± p ∈ {p,¬p}); there are 2ⁿ such constituents. In 1st order languages the number and size of constituents grows astronomically large with the number of individual constants and predicates, especially in relational languages (Niiniluoto 1987, p. 70f.). Almost all disjunction-of-possibilities accounts (with the exception of Miller's 1979) are based on a measure of similarity (or inverse distance) between possible worlds or constituents, respectively. The truthlikeness of a theory A, represented as a disjunction of constituents c1∨…∨ ck, is defined by the similarity between A's constituents ci and the true constituent cT. While most disjunction-of-possibilities accounts agree in their similarity measure (sim) between two constituents, there is disagreement about the right measure for the similarity between the set of A's constituents and the true constituent cT:

  1. (i)

    the average-measure (Tichý 1976; Oddie 1981) defines sim(A,cT) as the average of the distances between cT and A's constituents,

  2. (ii)

    the min–max-measure (Hilpinen 1976) defines sim(A,cT) as a weighted average of the minimum and the maximum of these distances, and

  3. (iii)

    the min-sum measure (Niiniluoto 1977; 1987, ch. 6) defines sim(A,cT) as a doubly weighted average of the minimum distance and the normalized sum of the distances between cT and A's constituents.
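The differences between these three proposals can be made concrete computationally. The following is a minimal sketch, not any author's official definition: constituents over n propositional variables are coded as 0/1-tuples, the distance between constituents is taken to be the normalized Hamming distance, and the weighting parameters gamma and gamma_prime are free parameters that the cited authors fix in different ways.

```python
from itertools import product

def constituents(n):
    """All 2**n constituents over atoms p1..pn, coded as tuples of 0/1."""
    return list(product([0, 1], repeat=n))

def dist(c1, c2):
    """Normalized Hamming distance between two constituents."""
    return sum(a != b for a, b in zip(c1, c2)) / len(c1)

def average_measure(A, c_true):
    """Tichy/Oddie-style: 1 minus the average distance of A's constituents to the true constituent."""
    return 1 - sum(dist(c, c_true) for c in A) / len(A)

def min_max_measure(A, c_true, gamma=0.5):
    """Hilpinen-style: 1 minus a weighted combination of the minimum and maximum distance."""
    ds = [dist(c, c_true) for c in A]
    return 1 - (gamma * min(ds) + (1 - gamma) * max(ds))

def min_sum_measure(A, c_true, gamma=0.5, gamma_prime=0.5):
    """Niiniluoto-style: 1 minus a weighted combination of the minimum distance and the normalized sum."""
    ds = [dist(c, c_true) for c in A]
    all_ds = [dist(c, c_true) for c in constituents(len(c_true))]
    return 1 - (gamma * min(ds) + gamma_prime * sum(ds) / sum(all_ds))

# Example: three atoms, true constituent p1 & p2 & p3; theory A = "p1" (all constituents with p1 true).
c_true = (1, 1, 1)
A = [c for c in constituents(3) if c[0] == 1]
print(average_measure(A, c_true), min_max_measure(A, c_true), min_sum_measure(A, c_true))
```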

In a variety of examples, the three disjunction-of-possibilities accounts, as well as the above-mentioned conjunction-of-parts accounts, lead to different truthlikeness orderings of theories (Niiniluoto 1987, 232–234). These differences concern mainly the truthlikeness of disjunctions or implications. Most accounts of truthlikeness agree on the truthlikeness ordering of purely conjunctive theories, as expressed by the following intuitions:

  • (1) Intuitions for truthlikeness of conjunctive theories:

  • Let p1, p2, … be true content elements and " > " denote "being more truthlike than".

  • (1.1) For true theories truthlikeness increases with logical strength: p1 ∧ p2 > p1 > p1 ∨ p2.

  • (1.2) True conjuncts are better than false conjuncts: ¬p1 ∧ p2 > ¬p1 ∧ ¬p2.

  • (1.3) The less false conjuncts, the better: p1 > p1 ∧ ¬p2 and ¬p1 > ¬p1 ∧ ¬p2.

  • (1.4) Contradictions are worst in truthlikeness: α > p ∧ ¬p for non-contradictory α.

The above intuitions are supported by various passages in Popper (1963), collected in Schurz and Weingartner (2010, sec. 2). Intuition (1.1) is accepted by most truthlikeness accounts with the exception of the average measure of Tichý (1974) and Oddie (1981) and the partial entailment measure of Cevolani and Festa (2020). Intuitions (1.2)–(1.4) are accepted by all accounts, with the exception of Miller (1979), who rejects these intuitions in order to make truthlikeness language-independent (cf. Schurz 2018, sec. 6). Intuition (1.3) is rejected in Niiniluoto's account (Niiniluoto 2020).

In the next sections we develop important kinds of truthlikeness, thereby focusing on the content element account of truthlikeness and its application to probabilistic truthlikeness. To make our ideas logically precise we introduce the following technical notions: \({\mathcal{L}}\) is our formal propositional or predicate language with the standard logical operators (¬, ∨, ∧, → for material implication, ⟷, ∀, ∃, = for identity, T for verum, ⊥ for falsum). p1, p2, … stand for atomic propositions, F, G, R, … for predicates, a, b, … for individual constants, x, y, … for individual variables, small letters α, β, … for arbitrary sentences (hi for hypotheses), capital letters A, B, … for theories, represented as arbitrary sets of sentences, and "|==" for "logical consequence". When it comes to probabilistic truth, we enrich \({\mathcal{L}}\) by lower-case p for an objective statistical probability function and upper-case P for an objective single case probability function.

2 Four kinds of truthlikeness

In this section we introduce three dichotomic distinctions between kinds of truthlikeness:

  1. (a)

    between comparative and numeric measures of truthlikeness,

  2. (b)

    between truthlikeness for qualitative versus quantitative theories, and

  3. (c)

    between truthlikeness in regard to deterministic versus probabilistic truth.

Although these distinctions are logically speaking independent, there are certain plausibility relations between them, with the effect that in the literature there are not \(2\cdot 2\cdot 2=8\) but merely four major kinds of truthlikeness: comparative qualitative deterministic, numeric qualitative deterministic, numeric quantitative deterministic and numeric quantitative probabilistic (see below). The plausibility relations are based on two facts: (1) for qualitative statements numeric t-measures involve difficulties that have motivated some authors to prefer comparative t-measures, but for quantitative statements they are straightforwardly definable, and (2) probabilistic truthlikeness applies to probability statements that are a subcase of quantitative statements. Of course, these plausibility relations do not exclude the development of other interesting kinds of t-measures, e.g. comparative for quantitative statements (etc.).

2.1 Comparative versus numerical notions of truthlikeness

Concerning the gradation type of the truthlikeness concept one can distinguish between comparative and numerical notions. Comparative notions of truthlikeness have the form "theory A is closer (or at least as close) to the truth than theory B", abbreviated as A >i (≥ i) B, where the index "i" ranges over different explications of this notion. Comparative notions do not establish a total but merely a partial truthlikeness ordering of theories, i.e., there exist theories A, B that are incomparable in truthlikeness.

Numerical notions explicate truthlikeness by a real-valued measure of the form "the truthlikeness of A is r", in short ti(A) = r. Numerical notions are more fine-grained than comparative notions; on the other hand, their definition involves more arbitrary conventions. Comparative and numerical notions must be ordinally equivalent in the sense that A ≥ i B iff ti(A) ≥ ti(B).

Popper's first concept of truthlikeness (1963) was comparative. Many conjunction-of-parts accounts followed Popper in this respect (Kuipers 1982, 2000, 2019; Schurz and Weingartner 1987; Gemes 2007; Cevolani and Festa 2009) and explicated comparative truthlikeness as follows:

  • (2) Comparative truthlikeness in conjunction-of-parts accounts:

  • A ≥i B iff At-parts |== Bt-parts and Bf-parts |== Af-parts

  • where At-parts (resp. Af-parts) stand for the set of A's true (resp. false) conjunctive parts. Formally, the two sets are defined as At-parts = Aparts ∩ T and Af-parts = Aparts ∩ F, where T and F are the sets of all true resp. false statements of \({\mathcal{L}}\) and Aparts is the set of A's conjunctive parts as explicated in "version i" of the conjunction-of-parts account.

Conjunctive parts in the sense of Schurz and Weingartner (1987) and Gemes (2007) are not closed under logical consequence, whence the superset relation " ⊇ " of Popper's original explication is replaced by the logical entailment relation |==. In contrast, for conjunctive parts in the sense of Cevolani and Festa (2009) and Kuipers (2000), logical entailment coincides with the superset relation, whence in their account "|==" is replaced by " ⊇ " in definition (2).
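For the conjunctive special case just mentioned, where conjunctive parts are independent literals and entailment between sets of parts reduces to the superset relation, definition (2) can be written down in a few lines. The encoding of literals as strings and the example theories are illustrative assumptions only.

```python
def parts(theory, truth):
    """Split a conjunctive theory (a set of literals) into its true and false parts."""
    return theory & truth, theory - truth

def at_least_as_truthlike(A, B, truth):
    """A >= B per definition (2), superset version: A's true parts include B's, B's false parts include A's."""
    A_true, A_false = parts(A, truth)
    B_true, B_false = parts(B, truth)
    return A_true >= B_true and B_false >= A_false

truth = {"p1", "p2", "p3"}          # the true literals
A = {"p1", "p2"}                    # two true conjuncts, no false ones
B = {"p1", "not-p2"}                # one true and one false conjunct
print(at_least_as_truthlike(A, B, truth))   # True
print(at_least_as_truthlike(B, A, truth))   # False
```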

Conjunction-of-parts accounts have also been developed as numeric measures (Cevolani and Festa 2009, Schurz and Weingartner 2010, Schurz 2018; already Popper 1963 proposed a numerical measure). Moreover, disjunction-of-possibility accounts are typically numeric.

2.2 Truthlikeness for qualitative versus quantitative theories

Concerning the theories whose truthlikeness is evaluated one can distinguish between qualitative and quantitative theories. The conjunctive parts of qualitative theories are basic qualitative assertions or disjunctions of them, without involving any quantitative magnitudes. The truthlikeness of such theories can be evaluated solely in terms of the numbers and logical strengths of their true versus false content parts, without involving any measure of similarity or distance between numerically graded assertions about magnitudes. Their definition within the content element approach is treated in Sect. 4.

Quantitative truthlikeness applies to theories that contain quantitative (metric) concepts, such as length, time or mass in physics. These concepts are called magnitudes and are expressed as functions m: D → R that map objects in a domain D (and possibly time points t ∈ T) to real numbers (r ∈ R), e.g., "m(a) = r" standing for "The length of this tower is 19.523 m". In what follows, we abbreviate physical magnitude functions as m1, m2, …, while f, g,… denote purely mathematical functions over numbers and p, p1,… stand for mixed physical–mathematical functions, e.g. p(x) = f(m1(x),m2(x)).

A singular quantitative statement expresses the value of a physical magnitude only with a certain accuracy. Thus it almost always involves an error (except when the true value is representable by a finite number of digits). This error corresponds inversely to its truthlikeness. Therefore truthlikeness measures of quantitative theories are typically numerical. They are based on a measure of the similarity (or inverse distance) between the value of a quantitative property m for an individual a that is predicted by a theory A, mA(a), and the true value of m for a, mT(a). In what follows we abbreviate this similarity measure as sim(mA(a),mT(a)).

Different similarity (or inverse distance) measures between quantitative statements are possible; for an excellent overview see Niiniluoto (1989, ch. 1). The simplest measure for sim(x,y) applied to elementary quantitative statements ("m(a) = x") is based on the absolute difference |x − y| and is inversely related to |x − y| (this measure has already been suggested by Niiniluoto 1982). One reason why |x − y| is arguably the most natural distance measure is that already in the metrization procedure of extensive magnitudes, the magnitude m (e.g., mass) of objects x, y, … is defined by the concatenation (o) of suitable copies of an m-unit u (e.g., n gram units put together on a scale), where the concatenation function "o" is additive (i.e., m(x o y) = m(x) + m(y)). It follows that m(x) is definable as a certain number or fraction of concatenated m-units (the axioms for Archimedean ordered semi-groups guarantee uniqueness of m(x) up to a constant factor; cf. Krantz et al. 1971, 45, 74). The additivity of o implies that the natural distance is the absolute distance (since m(x) − m(y) =def the number of m-units needed to expand y to the size of x). Of course, one may ask why the concatenation function is assumed to be additive instead of, for example, quadratic-euclidean (m(x o y) = \(\sqrt {\text{m(x)}^2 + \text{m(y)}^2}\)). A frequently given answer is that this would not incur a loss of information, but it would complicate the laws of nature; 'natural' measures should lead to the simplest laws of nature, laws that "cut nature at its joints". On the other hand, even if one has accepted an additive metric with a natural absolute distance measure, nonlinear distance functions can be useful for special purposes. One such special purpose is the similarity measure for single case probabilities, whose proper scoring requires a non-linear distance function, for example the quadratic function, for reasons to be explained in Sect. 5.3. Since truthlikeness for probabilistic laws is central to this paper, we focus in the following on absolute and quadratic measures for distance and similarity. A further non-linear distance function is the logarithmic distance, defined as log(m(x)/m(y)), and its Kullback–Leibler generalization; the disadvantage of logarithmic distance measures for truthlikeness applications is that they are not upper-bounded, but can approach infinity.

Certain objections have been raised against the inverse relation between numerical distance and truthlikeness. These objections have to do with problems of applying similarity measures directly to complex statements, i.e., to theories having many elementary consequences. Weston (1992, p. 64) and Liu (1999, p. 235) have argued that a small distance between a predicted value mA (predicted by theory A) and the true value mT may nevertheless have large effects − that is, may entail large distances for other physical parameters within the given theory A. In this case a small numerical m-difference implies a large distance of theory A from the truth. Therefore, the two authors conclude, the relation between numerical closeness and truthlikeness is intricate and not formally explicable. Within the content element account, however, there is no need to draw such a drastic conclusion, because the theory is split into its content elements, i.e. its elementary quantitative statements, before measures of numerical similarity are applied. Thus if a small deviation of mA from mT has large effects, this is cashed out in terms of other content elements of theory A whose distance from the truth is large, so that the overall truthlikeness of theory A (formed by adding up the truthlikeness contributions of its elementary consequences) will be very small or negative. In conclusion, Weston and Liu's problem does not arise for conjunctive accounts. In their second objection, Weston (1992, p. 60) and Liu (1999, p. 241) consider a true linear lawlike function (T) between two real-valued magnitudes, say position p and time t, T: p = a⋅t + b (where a, b are real-valued constants). They assume a false function A: p = a⋅t + b* that is also linear with the right slope (thus A is 'nomologically' correct), but involves a constant b* strongly different from the true constant b. They compare A with a false function C: p = fzig-zag(t) that runs in a complicated zig-zag pattern (fzig-zag) around T and is numerically closer to T than A is. In this case, the authors argue, although A is numerically farther from T than C, it is nevertheless more truthlike than C, because A has correctly captured the linear shape. Also this objection can be elegantly handled within the conjunctive approach, because the assertion that the function is linear, formally expressed by the existential quantification ∃c(p = a⋅t + c), is a true content element of A that is not entailed by C and that boosts A's truthlikeness over that of C.
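A small numerical sketch of this second objection (with invented values for the constants, the time interval and the zig-zag amplitude) makes the two distances explicit:

```python
import math

# True law T: p = a*t + b over the interval t in [0, 10]  (values invented for illustration)
a, b = 2.0, 1.0
T = lambda t: a * t + b

b_star = 50.0                                 # A: right slope, badly wrong constant b*
A = lambda t: a * t + b_star

C = lambda t: T(t) + 0.5 * math.sin(40 * t)   # C: zig-zags tightly around the true line

ts = [i / 100 for i in range(1001)]           # evaluation grid over [0, 10]
avg_dist = lambda f: sum(abs(f(t) - T(t)) for t in ts) / len(ts)

print(avg_dist(A))   # about 49: A is numerically far from T
print(avg_dist(C))   # about 0.3: C is numerically close to T
```

On the purely numerical score C wins, yet within the content element account A additionally obtains the true content element ∃c(p = a⋅t + c), which C lacks, and it is this element that boosts A's truthlikeness over that of C.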

After these fundamental considerations we continue the explication of truthlikeness, turning now to truthlikeness for universally quantified quantitative statements. As an example from classical mechanics consider the law hypothesis L: ∀x∀t(s(x,t) = f(t,m(x),s0(x))), where s is the position function, t the time variable and x a variable ranging over physical objects. L says that the position of all objects in an intended domain (e.g., all planets of our solar system) is a unique function f of time, the object's mass m(x) and the object's initial position s0(x). Also in this case we can distinguish between the function fA predicted by theory A and the true function fT. A natural measure of the truthlikeness of a universally quantified quantitative statement is the average of the similarities between the predicted and the true value of f, for all values of the universally quantified variables. For the continuous time variable t, this average is given as the integral of these similarities divided by the length of the considered time interval, as in (3c) below (Niiniluoto 1987, sec. 11.3). For the discrete individual variable x ranging over a domain of individuals D, this average is given by the limit of the averages for the first n individuals of D, for n → ∞, as in (3b) below. Equations (3b) and (3c) have to be applied iteratively to each universal quantifier of the law.

To integrate our truthlikeness measure (3) for elementary quantitative hypotheses into our generalized measure (8) for qualitative theories in Sect. 4, we normalize it to the interval between − 1 and + 1 as follows. We assume the magnitude m is measured within a maximal interval [m1,m2] of length ∆ (m1 ≤ m ≤ m2 and m2 − m1 = ∆). Since the truthlikeness of a basic content element (Sect. 4) is + 1 if it is true, − 1 if it is false, and 0 for a tautology, we integrate the truthlikeness of a quantitative hypothesis into this frame by assuming that its truthlikeness is + 1 if its distance to the truth is zero, − 1 if this distance is maximal, i.e., equal to ∆, and 0 if this distance is equal to the average distance of a random guess. Assuming a uniform probability distribution over the true and the predicted value, the average distance of a random guess is provably ∆/3. Therefore the similarity sim(x,y) is normalized as in condition (3d) below.
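The ∆/3 value can be checked by a short calculation: for two values x, y drawn independently and uniformly from an interval of length ∆, the expected absolute difference is

$$ E|x - y| \;=\; \frac{1}{\Delta^{2}}\int_{0}^{\Delta}\!\!\int_{0}^{\Delta} |x - y|\; dx\, dy \;=\; \frac{2}{\Delta^{2}}\int_{0}^{\Delta}\!\!\int_{0}^{y} (y - x)\; dx\, dy \;=\; \frac{2}{\Delta^{2}}\int_{0}^{\Delta} \frac{y^{2}}{2}\; dy \;=\; \frac{\Delta}{3}. $$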

  • (3) [Definition] Truthlikeness of elementary quantitative hypotheses

  • Let "t( −)" denote the numeric truthlikeness of an elementary quantitative statement mA(a) = r, asserted by theory A and "mT(a)" the true value of magnitude m for object a, where the values of m(x) range in the interval [m1,m2] of length ∆. Moreover, let "sim(x,y)" denote the similarity between two values x,y ∈ [m1,m2] based on the absolute distance (sim1) or the quadratic distance (sim2), as defined in (3d). Then:

  • (3a) For singular quantitative statements: t(m(a) = r) = sim(r,mT(a)).

  • For quantitative laws:

  • (3b) Discrete quantifiers: t(∀x(m(x) = p(x))) = limn→∞ (Σd∈Dn sim(p(d),mT(d)))/|Dn|, where Dn = {d1,…,dn} and D = {di: i ∈ N}.

  • (3c) Continuous quantifiers:

    t(∀t(m(t) = p(t))) = \(\left( \int_{t \in [t_1, t_2]} \mathrm{sim}(p(t), m_T(t))\, dt \right) / (t_2 - t_1)\).

  • (3d) The similarity measures sim1 and sim2 are defined as follows:

  • sim1(x,y) = (∆/3 −|x − y|)⋅n, where the normalization factor n satisfies:

  • (i) if |x − y|< ∆/3 (better than a random guess), n = 3/∆, and

  • (ii) if |x − y|≥ ∆/3 (not better than a random guess), n = 3/(2⋅∆).

  • sim2(x,y) = ((∆/3)² − (x − y)²)⋅n′, where n′ = n² in case (i) and n′ = n²/2 in case (ii), with n taken from the corresponding case.

  • Thus if |x − y| = 0, sim(x,y) = +1; if |x − y| = ∆/3, sim(x,y) = 0; and if |x − y| = ∆, sim(x,y) = −1; so sim(x,y) ranges between −1 and +1.

For theories consisting of logical combinations of quantitative hypotheses, the truthlikeness measure (3) has to be combined with the qualitative truthlikeness measure in definition (9) of Sect. 4.
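As a check on definition (3d), the following minimal sketch implements the two similarity measures and verifies the boundary values +1, 0 and −1. It reads the normalization factor n′ of sim2 as n² resp. n²/2 with the case-specific n, which is the reading under which the stated range [−1, +1] obtains.

```python
def sim1(x, y, delta):
    """Absolute-distance similarity of definition (3d), normalized to [-1, +1]."""
    d = abs(x - y)
    n = 3 / delta if d < delta / 3 else 3 / (2 * delta)     # case (i) vs. case (ii)
    return (delta / 3 - d) * n

def sim2(x, y, delta):
    """Quadratic-distance similarity of definition (3d), with n' = n**2 resp. n**2 / 2."""
    d = abs(x - y)
    if d < delta / 3:
        n_prime = (3 / delta) ** 2
    else:
        n_prime = (3 / (2 * delta)) ** 2 / 2
    return ((delta / 3) ** 2 - d ** 2) * n_prime

delta = 10.0
for d in (0.0, delta / 3, delta):
    print(sim1(0.0, d, delta), sim2(0.0, d, delta))         # both measures print +1, 0, -1
```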

2.3 Deterministic versus probabilistic truthlikeness

Concerning the nature of truth, we distinguish between deterministic and probabilistic truth. In almost all accounts of truthlikeness developed so far,Footnote 1 the complete truth T was assumed to be deterministic, i.e., to consist of true factual (singular or existential) statements together with strict (universally quantified) laws or theories that deductively entail observation statements. However, according to quantum physics (in its prevalent interpretation), microphysical reality is indeterministic, i.e., there are genuinely probabilistic laws and the future states of a closed physical system are not completely determined by its past states. Thus both current theories of physics and the objective truth T are probabilistic, containing probabilistic laws. So it makes sense to introduce and investigate the notion of probabilistic truthlikeness, measuring how close a probabilistic theory comes to the objective probabilistic truth.

Probabilistic truthlikeness is of particular relevance for nomic truthlikeness, as developed by Kuipers (2000, 2019). Here the objective truth T does not contain all true facts (expressible in the given language \({\mathcal{L}}\)), but only all true nomic facts, i.e. facts expressing general nomic possibilities and necessities. Logically speaking this means that T is no longer complete, i.e., T doesn't decide for every singular \({\mathcal{L}}\)-statement whether it is true or false. Nomic truthlikeness makes sense especially for theories of physics, because their hypotheses consist predominantly of nomic statements. In contrast, theories of biology or the social sciences also contain many contingent (non-nomic) generalizations.

Probabilistic truthlikeness is similar to nomic truthlikeness in two respects. First, in the context of probabilistic truthlikeness, the objective truth consists solely of probability statements, not of factual statements; thus T is incomplete. Second, the notion of probabilistic truthlikeness assumes that the probabilistic truth approximated by our theories is an objective one, not representing epistemic degrees of belief, but objective probabilistic dispositions (also called propensities). As argued below, all objective probability statements (be they generic or single case) are backed up by statistical probabilities (frequency limits) in given reference classes, expressing probabilistic laws or generalizations. Let us consider some examples. "50% of all Caesium-137 atoms (for any amount of the substance) will have decayed within 30 years" expresses an objective statistical law that is causally complete and thus invariant under conditionalization to any additional information, since radioactive decay is an objective random process in nature. In contrast, "80% of people in this room have brown hair" expresses an accidental frequency fact that can be broken any time; it does not express an objective probabilistic disposition. However, we do not assume that there is a strict borderline between nomic and non-nomic probabilistic generalizations. For example, the generalization "80% of all overweight persons have high blood pressure" is not causally complete, because high blood pressure depends on many other causes whose contingent distribution determines the probability value; yet the law is not accidental. For the application of probabilistic truthlikeness to 'special' sciences (medicine, biology, social sciences) the notion of an objective probabilistic law should not require that the law antecedent is causally complete.

Summarizing, in the context of probabilistic truthlikeness, the considered theories A are sets of probability statements, the objective truth T consists of all true probability statements expressible in \({\mathcal{L}}\), and truthlikeness measures the similarity between the probability assertions in A and T. Since probability assertions are a species of quantitative assertions, their truthlikeness measures are typically numeric. Of course, one may also combine deterministic and probabilistic truthlikeness (cf. Cevolani and Festa this volume). One possibility to cover this combination is to consider deterministic truth as the subcase of probabilistic truth with a probability of 1.

3 Probabilistic truthlikeness for elementary probability statements

We first consider elementary probability statements expressed by (closed or open) formulas α. Unconditional (elementary) probability statements have the form p(α) = r, saying that the probability of event α has the value r ∈ [0,1]; conditional probability statements have the form p(α|β) = r, saying that the probability of α in conditions where β occurred is r (where p(α|β) is standardly defined as p(α ∧ β)/p(β)).Footnote 2 For simplicity we assume that events are binary or discrete (thus we don't treat probability densities over continuous events). Binary events are expressed by simple statements α versus ¬α. Discrete events are expressed by functions X: D → ValX assigning to individuals in the domain D values out of a finite value space ValX = {v1,…,vk}, e.g., "the color of object a is green", C(a) = green, with ValC = {red, blue, green,…}. In statistics these functions are called random variables.

3.1 Probabilistic truthlikeness as subcase of truthlikeness for quantitative theories

Let h: p(α) = r be an elementary probabilistic hypothesis entailed by a theory and pT(α) the true probability of the event α (which is unknown to us, whence we don't write "pT(α) = q" but just "pT(α)"). Then the truthlikeness t(h) of h measures the similarity between r and pT(α) and is given as sim(r,pT(α)), as in definition (3a) of Sect. 2. If we consider universally quantified probability statements of the form ∀x(p(Fx) = r) (examples are given in Sect. 5), we use the limiting average of the similarities sim(r, pT(Fai)) \((\text{a}_\text{i} \in \text{D}_\text{n})\) for n → ∞, as in definition (3b) (for continuous quantifiers we use (3c)). The truthlikeness of logical combinations of probability statements requires the combination of the measures in (3) with the truthlikeness measure for qualitative theories, to be explained in Sect. 4.

Until now it seems that probabilistic truthlikeness is just a special case of truthlikeness for quantitative theories and requires no special treatment. In the next subsections we discuss specific subtleties of probabilistic truthlikeness.

3.2 Statistical versus single case probability

There are two kinds of objective probabilities: statistical or generic probabilities denoted by small p, and single case probabilities denoted by capital P.

Statistical probabilities apply to repeatable types of events, linguistically expressed by open formulas, e.g. Fx. The statistical probability of the event type Fx in a reference class or reference sequence Rx is understood as the disposition of R-events (or corresponding 'random experiments' of type R) to produce F-events with certain relative frequencies that converge towards a frequency limit whose value is denoted as p(Fx|Rx) (cf. Schurz 2014, sec. 3.9). If the class of R-events is finite, the relative frequency of F's among R's, f(Fx|Rx), is simply given as the ratio |F ∩ R|/|R|. If R is infinite, this frequency ratio is undefined; one rather refers to a random ordering of the individuals in R in the form of a so-called random sequence (a1, a2,…), and defines p(Fx|Rx) as the limit of the relative frequencies fn(Fx|Rx) in n-membered initial segments of this random sequence: p(Fx|Rx) = limn → ∞ fn(Fx|Rx). The frequency-limit understanding of statistical probabilities goes back to von Mises (1964). From a more contemporary perspective frequency limits are theoretical idealizations that are not observable, but only extrapolable from the observation of finite frequencies (cf. Howson and Urbach 1996; Gillies 2000; Schurz 2014, sec. 3.13).
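That frequency limits can only be extrapolated from finite frequencies is easy to visualize with a short simulation; the value p_true = 0.5 is an arbitrary stand-in for an objective statistical probability such as the Caesium decay law mentioned in Sect. 2.3.

```python
import random

random.seed(1)
p_true = 0.5                       # the objective statistical probability p(Fx|Rx)
hits, checkpoints = 0, []
for n in range(1, 100001):
    hits += random.random() < p_true          # one more R-event, possibly an F-event
    if n in (10, 100, 1000, 10000, 100000):
        checkpoints.append((n, hits / n))
print(checkpoints)                 # finite relative frequencies scatter around 0.5 and stabilize only slowly
```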

As explained, statistical probabilities refer to repeatable types of events or states of affairs in given reference classes or sequences (R), expressed by open formulas. For example, p(Fx|Rx) could express the limiting frequency of rainy days (Fx) among all days in Düsseldorf (Rx). If the reference class is left out, as in p(Fx), this merely means that it is specified elsewhere. Objective single case probabilities, on the other hand, refer to particular events or states of affairs expressed by singular sentences (closed formulas), e.g. Fa, without any explicit mentioning of a reference class. P(Fa), for example, could express the objective probability that it will rain in Düsseldorf tomorrow ("a" for "the day tomorrow in Düsseldorf"). Without an at least implicit connection to statistical probabilities it seems unclear how single case probabilities could be cognitively accessible (for criticisms cf. Gillies 2000, pp. 126–36; Schurz 2014, sec. 3.13.2). There is, however, a well known method to explain the objective probabilities of single case events by implicit statistical probabilities, namely in relation to their narrowest relevant reference class. In our example, if a meteorologist announces a probability of 3/4 that it will rain tomorrow in Düsseldorf, (s)he uses the weather conditions in the (say three) preceding days as the narrowest reference class; so her claim means that the statistical probability of rain in Düsseldorf on a day preceded by three days with similar weather patterns as in the last three days is 3/4.

In what follows, the narrowest relevant reference class of an event Fa is denoted as RFa. It is the narrowest reference class to which the individual event Fa belongs and which is statistically relevant for Fx. By "reference class" we mean not just any class of individuals in the set-theoretic sense, but a class defined by a conjunction of nomological predicates. Narrowest reference classes as a method of defining single case probabilities go back to Reichenbach (1949, §72) and to Hempel (1968). These two authors, however, understood narrowest reference classes in the epistemic sense, as the narrowest reference classes which we know to apply. For an objective notion of single case probabilities we need the notion of an objectively narrowest relevant reference class. This notion was first explicated by Salmon (1984), who called it the "broadest objectively homogeneous" reference class RFa: "objectively homogeneous" means "narrowest", i.e. no further strengthening of RFa changes the statistical probability p(Fx|RFax), and "broadest" means "relevant", i.e. RFa contains no superfluous irrelevant conditions that make RFa narrower than necessary.

In addition, Salmon requires that R contains only conditions that are causally relevant for Fa; thus R has to refer to conditions that temporally precede the event Fa. Following Salmon we define:

  • (4) [Definition] Objective single case probabilities

  • P(Fa) is defined as p(Fx|RFax), where RFax is the narrowest relevant reference class for the event Fa, defined as the strongest conjunction of conditions R1x ∧ … ∧ Rnx that refer to events preceding the event Fx,Footnote 3 such that each of the Rix is statistically relevant for Fx, i.e., the value of p(Fx|R1x ∧ … ∧ Rnx) changes when Rix is replaced by ¬Rix.
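A toy sketch of definition (4), echoing the meteorologist example above: the single case probability of rain tomorrow is identified with the relative frequency of rain in the narrowest relevant reference class that the available attributes allow us to form. The data table and the attributes are entirely invented for illustration.

```python
# Hypothetical records of past days: (warm, humid, windy, rain), each coded 1/0
days = [
    (1, 1, 0, 1), (1, 1, 1, 1), (1, 0, 0, 0), (0, 1, 0, 1),
    (1, 1, 0, 1), (0, 0, 1, 0), (1, 1, 1, 0), (1, 0, 1, 0),
]

def p_rain(condition):
    """Relative frequency of rain among the days satisfying the condition (a statistical probability)."""
    matching = [d for d in days if condition(d)]
    return sum(d[3] for d in matching) / len(matching)

# Single case: tomorrow is warm and humid. Its objective probability is estimated as the
# statistical probability in the narrowest relevant reference class we can specify:
print(p_rain(lambda d: True))                       # all days:            0.5
print(p_rain(lambda d: d[0] == 1))                  # warm days:           0.5
print(p_rain(lambda d: d[0] == 1 and d[1] == 1))    # warm and humid days: 0.75
```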

In a similar way, objective single case propensities have been explicated by Lewis (1980), who calls them chances. Lewis does not explicitly explain single case chances as statistical probabilities relative to objectively narrowest reference classes, but he says that they are conditionalized to the entire history of the world until the present time. This can be interpreted as an implicit statistical assertion, since if the world's history were repeated infinitely many times, the objective chance would coincide with the limiting frequency of the event in the sequence of these histories.

Coffa (1974) was the first to point out that if the world were deterministic, then all objective single case probabilities would be zeros or ones, i.e., they would coincide with truth values, because in a deterministic world the objectively narrowest relevant reference class for a future event determines its occurrence or non-occurrence. Thus non-trivial objective single case probabilities can only exist in indeterministic worlds.

3.3 Logical versus epistemic truthlikeness

At the end of Sect. 3.1 it seemed that probabilistic truthlikeness is just a special case of truthlikeness for quantitative theories and requires no special treatment. We think that this is indeed true for the logical notion of truthlikeness, but not for the epistemic notion of truthlikeness. This last distinction is explained in this section.

Logical truthlikeness means truthlikeness relative to the objective complete truth T (which for probabilistic truthlikeness is the set of all objective probabilistic truths). When we speak of "truthlikeness" simpliciter we always mean logical truthlikeness, because this is how truthlikeness is defined. However, the objective truth T is typically unknown. We need epistemic criteria to assess the truthlikeness of a theory in the light of our given set E of data or empirical evidence. The estimation of truthlikeness relative to given evidence is also called epistemic truthlikeness.

In epistemic truthlikeness, the role of the objective truth T is taken over by the evidence set E. This brings us to two important differences between the epistemic versions of deterministic and probabilistic truthlikeness, which are particularly important for conjunction-of-parts accounts of truthlikeness.Footnote 4 Thereby we assume a given distinction of the consequences of a theory into empirical consequences that are directly testable by observation and measurement, and theoretical consequences that postulate unobservable causes of observable effects and can only indirectly be tested via the empirical consequences (for a defense of this distinction against challenges cf. Schurz 2014, sec. 2.9).

Difference 1: Deterministic theories deductively entail empirical consequences. Thus they have empirical content elements whose truthlikeness can be directly assessed by comparison with the empirical data in E, in the same way as for objective truthlikeness. If one replaces T by E and the set of a theory's content elements by its empirical content elements, then one can apply the definition of logical truthlikeness analogously to epistemic truthlikeness. One obtains a simple version of epistemic truthlikeness that has been worked out in Schurz (2018, explication (12)). In contrast, probabilistic theories do not entail any empirical consequences; they imply them only with probability. For example, assume p is a statistical probability function over binary random events ± Fx in reference sequences of kind R whose repetitions are independently and identically distributed. Then the theory A = {hr}, with hr: p(Fx|Rx) = r, implies the probability statement:

$$ p\Big( f(F|R) = \tfrac{k}{n} \;\Big|\; h_r \Big) \;=\; \binom{n}{k} \cdot r^{k} \cdot (1 - r)^{\,n-k}, $$
(5)

the well-known binomial formula, where "f(F|R)" is the relative frequency of F's in a random sample of R's. Equation (5) specifies the probability of obtaining k F's among n randomly chosen R's (given that hr is true), but it doesn't entail their actual frequencies. We cannot directly compare the probability statement hr with the actually observed frequencies. Rather we have to compute the inductive probability of hr, given our frequentist evidence e (i.e., f(F|R) = \(\frac{{\text{k}}}{{\text{n}}}\)), according to the well-known Bayesian formula:

$$ D(h_r \mid e) \;=\; p(e \mid h_r) \cdot D(h_r) \Big/ \int_0^1 p(e \mid h_q) \cdot D(h_q)\, dq, $$
(6)

where D(hq) is a uniform prior probability density over all possible probability functions hq: p(Fx|Rx) = q, for q ∈ [0,1]. In practical applications r is replaced by an interval ∆r of real numbers and P(h|e) is the integral of D(hr|e) over this interval. The inductive probabilities of the content elements of statistical theories have to be inserted into the truthlikeness definition, as will be explained in Sect. 5.
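A minimal numerical sketch of formula (6): with a uniform prior, the posterior density over r given k F's among n R's is the Beta(k + 1, n − k + 1) density, and the probability assigned to an interval ∆r can be obtained by simple numerical integration. The sample (k = 7 out of n = 10) and the interval [0.6, 0.8] are invented for illustration.

```python
from math import comb

def posterior_density(r, k, n):
    """D(h_r | e) for a uniform prior: the binomial likelihood, normalized over q in [0, 1].
    The normalizing integral of comb(n, k) * q**k * (1-q)**(n-k) over [0, 1] equals 1 / (n + 1)."""
    return comb(n, k) * r**k * (1 - r)**(n - k) * (n + 1)

def posterior_prob(lo, hi, k, n, steps=10000):
    """Posterior probability that the true value lies in [lo, hi]: midpoint-rule integration of the density."""
    width = (hi - lo) / steps
    return sum(posterior_density(lo + (i + 0.5) * width, k, n) for i in range(steps)) * width

# Evidence e: 7 F's among 10 R's; hypothesis interval: the true probability lies in [0.6, 0.8]
print(posterior_prob(0.6, 0.8, k=7, n=10))
```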

Difference 2: Difference 1 applies to generic statistical probabilities, whose relevant evidence is expressed in terms of finite frequencies within explicitly given reference classes. Objective single case probabilities, however, are not explicitly relativized to any reference class. They are understood relative to objectively narrowest (relevant) reference classes: P(Fa) = p(Fx|RFax), as explained in the previous section. These objectively narrowest reference classes RFax are typically unknown and different probabilistic theories or methods diverge in their hypotheses about the relevant 'cues' whose entirety makes up these classes. Thus there are no conditional frequencies in relation to which the truthlikeness of hypotheses about objective single case probabilities could be assessed. Rather, their truthlikeness has to be assessed relative to the truth values of the event statements whose probabilities are predicted. This evaluation method is entirely different from the similarity measures for ordinary quantitative statements. How it works and why meta-inductive probability aggregation becomes important for it is explained in Sects. 5 and 6. Before we come to this, we explain the combination of numerical and quantitative with qualitative truthlikeness within the framework of the content element account. This is done in Sect. 4.
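Sections 5 and 6 develop this evaluation method in detail; as a preview, its basic idea is to score predicted single case probabilities against the realized truth values of the predicted events, for example by the quadratic (Brier) loss, in line with the quadratic distance function motivated in Sect. 2.2. The forecasts and outcomes below are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean quadratic loss of single case probability forecasts against realized truth values (1/0)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 1, 1, 0]                    # observed truth values of the predicted events
method_a = [0.9, 0.2, 0.7, 0.8, 0.1]          # sharp forecasts
method_b = [0.6, 0.5, 0.5, 0.6, 0.4]          # cautious forecasts close to 0.5
print(brier_score(method_a, outcomes))        # lower loss: predictions closer to the realized truth values
print(brier_score(method_b, outcomes))
```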

4 Truthlikeness for qualitative and quantitative theories based on content elements

The truthlikeness account of Schurz and Weingartner (e.g., Schurz and Weingartner 1987, 2010; Schurz 2018) is based on decomposing theories into their sets of relevant elements, or content elements, as Schurz later prefers to say. The definition for languages of first or higher order predicate logic is given in (7). It generalizes the definitions in Schurz (1991) and Schippers and Schurz (2017, def. 4.2) by using the notions of a 'quasi-clause' and a 'quasi-literal'. Every sentence can be L-equivalently transformed into a conjunction of quasi-clauses.

  • (7) [Definition] Content elements (relevant elements)

  • α is a content element of a theory (set of sentences) A iff

  • (a) α is a quasi-clause, also called an 'element', which is a disjunction of one or more elementary quasi-literals, in short eqls, defined as follows:

  • (a1) a quasi-literal is either a closed literal (an unnegated or negated atomic sentence) or a quantified sentence in negation-normal form (it begins with a quantifier and has all negation symbols in front of atomic formulas), expressed by means of the primitive symbols ¬, ∨, ∧, ∃, ∀; and

  • (a2) a quasi-literal q is elementary iff it is not L(ogically)-equivalent with a conjunction or a disjunction of n ≥ 1 quasi-literals each of which is shorter than q, where the length of a sentence is defined as the number of its primitive symbols;

  • (b) α is a relevant logical consequence of A in the sense that A |== α and no predicate in α is replaceable on some of its occurrences by any other predicate (of the same degree) salva validitate of A |== α (propositional atoms are predicates of degree 0); and

  • (c) α is the first among all statements L-equivalent with α and satisfying (a) and (b), according to a given enumeration of all statements of \({\mathcal{L}}\).

Examples of eql's and of elements according to def. 7a: p, ¬p are eql's; p ∨ q is an element; p ∧ p is not an eql, but p is one; p ∨ p is not an element; ¬¬p is not an eql, but p is one; p ∨ (q ∧ r) is not an element, but p ∨ q and p ∨ r are elements; ∃x(Fx ∨ Gx) is not an eql, but ∃xFx and ∃xGx are eqls and ∃xFx ∨ ∃xGx is an element; ∀x(Fx ∧ Gx) is not an eql, but ∀xFx and ∀xGx are eqls; ∃x(Fx ∧ Gx) is an eql and ∀x(Fx ∨ Gx) is an eql.

Examples of irrelevant consequences according to clause (b), each containing predicate occurrences that are replaceable salva validitate (e.g., q in the first example): p |== p ∨ q; p |== q → p; p |== (p ∨ q) ∧ (p ∨ ¬q); ∀x(Fx → Gx) |== ∀x(Fx → (Gx ∨ Hx)). Examples of relevant consequences: p ∧ q |== p; p → q, q → r |== p → r; ∀x(Fx → Gx), Fa |== Ga; ∀x(Fx → Gx), ∀x(Gx → Hx) |== ∀x(Fx → Hx).

Examples of content elements according to clause (c), where Ac is the set of content elements of theory A and \({\mathcal{K}}\) is the set of \({\mathcal{L}}\)'s individual constants: {p ∧ q}c = {p, q}, {p ∨ (q ∧ r)}c = {p ∨ q, p ∨ r}, {p ∨ ¬q, p ∨ q}c = {p}, {p → q, q → ¬r}c = {¬p ∨ q, ¬q ∨ ¬r, ¬p ∨ ¬r}, {p ∨ ¬p}c = {T}, {p ∧ ¬p}c = {⊥}, {∀xFx}c = {∀xFx, ∃xFx} ∪ {Fai: ai ∈ \({\mathcal{K}}\)}, {Fa}c = {Fa, ∃xFx}, {∀x((Fx ∨ Gx) → (Hx ∧ Qx))}c = {∀x(¬Fx ∨ Hx)}c ∪ {∀x(¬Fx ∨ Qx)}c ∪ {∀x(¬Gx ∨ Hx)}c ∪ {∀x(¬Gx ∨ Qx)}c.

The only content element of a logically true theory is the constant T and the only content element of a logically false theory is the constant ⊥.

Content elements within propositional logics are also called prime implicates (cf. Bienvenu 2009); the basic idea goes back to an old paper of Quine (1955). Schurz and Weingartner (2010, Lemma 5) prove that in propositional logic, definition (7) of relevant elements is equivalent with their definition in terms of strongest clauses, where clauses are disjunctions of literals ± pi ordered according to a fixed enumeration. This ensures that in propositional logic content elements are unique modulo L-equivalence. For predicate logic, this condition is not sufficient; uniqueness of content elements modulo L-equivalence has to be required by condition (c).
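For the propositional case, content elements (prime implicates) can be computed by brute force: enumerate all non-tautological clauses over the language's atoms, keep those entailed by the theory, and retain only the subset-minimal (i.e. strongest) ones. The sketch below reproduces the {p → q, q → ¬r} example given above; it is an exhaustive search over a three-atom language, not an efficient algorithm, and the encoding of clauses as sets of (atom, sign) pairs is an implementation choice made here.

```python
from itertools import combinations, product

ATOMS = ["p", "q", "r"]

def models(clauses):
    """All valuations satisfying every clause; a clause is a frozenset of (atom, sign) literals."""
    result = []
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(any(v[a] == s for a, s in cl) for cl in clauses):
            result.append(v)
    return result

def entails(theory, clause):
    """theory |== clause iff every model of the theory satisfies the clause."""
    return all(any(v[a] == s for a, s in clause) for v in models(theory))

def prime_implicates(theory):
    """Clauses entailed by the theory none of whose proper subclauses is entailed."""
    literals = [(a, s) for a in ATOMS for s in (True, False)]
    candidates = []
    for size in range(1, len(ATOMS) + 1):
        for lits in combinations(literals, size):
            if len({a for a, _ in lits}) < size:      # skip clauses with a repeated atom (incl. tautologies)
                continue
            candidates.append(frozenset(lits))
    entailed = [c for c in candidates if entails(theory, c)]
    return [c for c in entailed if not any(d < c for d in entailed)]

# Theory {p -> q, q -> not-r}, encoded as the clauses {not-p, q} and {not-q, not-r}
theory = [frozenset({("p", False), ("q", True)}), frozenset({("q", False), ("r", False)})]
for pi in prime_implicates(theory):
    print(sorted(("" if s else "not-") + a for a, s in pi))
# prints the three content elements not-p v q, not-q v not-r and not-p v not-r (in some order)
```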

The identification of content elements with relevant elementary quasi-clauses constrains the possible L-equivalent formulations of content elements significantly. In definition (a2) of elementary quasi-literals, universally or existentially quantified quasi-literals are split into conjunctions or disjunctions, respectively, of quasi-literals, as far as possible, as seen in the above examples.

A theory A is logically equivalent with the set Ac of its content elements; thus no information gets lost in the representation of theories by their content elements. A proof of this important fact for propositional logic is given in Schurz and Weingartner (1987). A general proof for predicate logic is so far only possible if content elements are allowed to contain second order quantifiers; whether the fact also holds without this assumption is still an open problem.

We let Ac, Atc = Ac ∩ T and Afc = Ac ∩ F be the sets of A's content elements, A's true content elements and A's false content elements, respectively. Based on these notions, the concept of comparative truthlikeness for qualitative theories is given by inserting in definition (2) the sets Atc and Afc for the sets At-parts and Af-parts, respectively, and likewise for theory B (cf. Schurz 1991).

Building up on the comparative notion of truthlikeness, Schurz and Weingartner (2010) proposed a content element-based definition of numeric truthlikeness, t(A), for propositional languages. Definition (8) below is a generalization of their definition to languages of predicate logic. The generalization to predicate logic is difficult and there are different approaches to this enterprise.Footnote 5

Our proposal is based on the finiteness assumption, i.e. the set of quasi-literals expressible in the language of the considered theories is finite; this assumption will be discussed below. Quasi-literals are denoted as q1, q2,….

  • (8) [Definition] Numeric truthlikeness for qualitative theories based on content elements:

  • Let Q = {q1,…,qn} be the set of all n quasi-literals expressible in \({\mathcal{L}}\), Q(α) the set of quasi-literals occurring in a clause α, kα = |Q(α)| their number, and Qt(α) = Q(α) ∩ T the subset of α's true quasi-literals. Then:

  • t(A) = Σ{t(α): α∈Ac}, where

  • (a) for α∈Atc: t(α) = \(\frac{{({\text{n}} - {\text{k}}_{\alpha} + 1)!}}{{{\text{n}}!}}\)\(\frac{{\sum _{{\text{q}} \in {\text{Q}_{\text{t}}}(\alpha ) \, } {\text{t}}\left( {\text{q}} \right)}}{{{\text{k}}_\alpha }}\),

  • (b) for α∈Afc: t(α) = \(\frac{{({\text{n}} - {\text{k}}_{\alpha} + 1)!}}{{{\text{n}}!}}\)\(\frac{{\sum _{{\text{q}} \in {\text{Q}}(\alpha ) \, } {\text{t}}\left( {\text{q}} \right)}}{{{\text{k}}_\alpha }}\) (since all quasi-literals of a false clause are false, this value is negative),

  • (c) for a quasi-literal q: t(q) =  ± 1, (+ 1 if true, − 1 if false)

The measure (8) is based on the following intuitions:

  1. (i)

    t(A) is defined as the sum of the truthlikeness of all content elements of A.

  2. (ii)

    True content elements have positive and false content elements have negative truthlikeness.

Concerning quasi-literals:

  1. (iii)

    If a quasi-literal q is a literal, we set t(q) = 1 if q is true and t(q) =  − 1 if q is false. This is intuitive and coincides with the propositional measures of Schurz and Weingartner (2010) and Cevolani and Festa (2009). We set t(T) = 0 and t(⊥) =  − (n − 1); thus t(A) = 0 for L-true A and t(A) is maximally negative for L-false A.

  2. (iv)

    If q is a universally quantified quasi-literal ∀xβx (for a possibly complex β), it is likewise reasonable to set t(q) =  ± 1, because the stronger content of ∀xβx compared to βai is cashed out in terms of ∀xβx's content elements, which are given by the set {∀xβx} ∪ {βai:1 ≤ i ≤ c} ∪ {∃xβx}. Thus if ∀xβx is true, then t({∀xβx}) is greater than the truthlikeness of ∀xβx's instances, namely equal to t({βai:1 ≤ i ≤ c}) + 1. This surplus truthlikeness is reasonable since not all individuals have names in \({\mathcal{L}}\).

  3. (v)

    A similar observation applies if q is an existential statement, ∃xβx. The absolute value of t(∃xβx), |t(∃xβx)|, must be smaller than the absolute value of the truthlikeness of q's instances, |t(βai)|, and even smaller than that of the disjunction of q's instances, |t(βa1 ∨…∨ βac)|, where c is the number of \({\mathcal{L}}\)'s individual constants. However, this will automatically be the case, since ∃xβx is a content element of βai and also one of βa1 ∨…∨ βac. Thus it is sufficient to set t(∃xβx) = t((¬)βai) = ± 1.

Concerning disjunctions of quasi-literals:

  1. (vii)

    If q1 ∨…∨ qk is true, its truthlikeness is positive and must decrease with decreasing logical content (increasing k). If q1 ∨…∨ qk is false, its truthlikeness is negative and its absolute value must likewise decrease with increasing k. This requirement is realized as follows. Consider the disjunction q1 ∨ qi of two true quasi-literals. The set of quasi-clauses {q1 ∨ qi: 2 ≤ i ≤ n} must be less truthlike than q1 because this set is logically weaker than q1. Thus if both q1 and qi are true, we multiply the truthlikeness of q1 ∨ qi by the factor 1/n (= (n − 1)!/n!). Since there are n − 1 two-element disjunctions of the form q1 ∨ qi, their sum is (n − 1)/n, which is slightly smaller than t(q1). The iterative application of this idea to q1 ∨ …∨ \(\text{q}_{\text{k}_{{\upalpha}}}\) leads to the factor \(\frac{{({\text{n}} - {\text{k}}_{\upalpha} + 1)!}}{{{\text{n}}!}}\) in def. (8), which we call the content-factor of the quasi-clause.

  2. (viii)

    The truthlikeness of a true quasi-clause α is given by multiplying the content-factor of α with the fraction of the (positive) truthlikeness of the true quasi-literals of α, \(\frac{{\sum _{{\text{q}} \in {\text{Q}_{\text{t}}}(\alpha ) \, } {\text{t}}\left( {\text{q}} \right)}}{{{\text{k}}_\alpha }}\). For false quasi-clauses, their content factor is multiplied with the (negative) average truthlikeness of all quasi-literals of the clause, which in this case are all false (Q(α) = Qf(α)). For qualitative theories t(q) is always ± 1; therefore the term \(\frac{{\sum _{{\text{q}} \in {\text{Q}_{\text{t}}}(\alpha ) \, } {\text{t}}\left( {\text{q}} \right)}}{{{\text{k}}_\alpha }}\) is equal to |Qt(α)|/kα and the term \(\frac{{\sum _{{\text{q}} \in {\text{Q}}(\alpha ) \, } {\text{t}}\left( {\text{q}} \right)}}{{{\text{k}}_\alpha }}\) is equal to − 1 (cf. def. 5 of Schurz and Weingartner 2010). The formulation in (8a,b) has the advantage of being generalizable to quantitative theories (or to refined t-measures with varying |t(qi)|).

Examples of truthlikeness according to def. 8, where positive quasi-literals are assumed to be true and negated ones false:

$$ \begin{gathered}
t(\{\exists x Fx\}) = t(\exists x Fx) = 1; \qquad t(\{Fa\}) = t(Fa) + t(\exists x Fx) = 2; \\
t(\{\forall x Fx\}) = t(\forall x Fx) + t(\exists x Fx) + \sum_{a_i \in \mathcal{K}} t(Fa_i) = 2 + c; \qquad t(\{\exists x Fx \vee \exists x Gx\}) = (n-1)!/n!; \\
t(\{\exists x Fx \vee \neg\exists x Gx\}) = ((n-1)!/n!) \cdot (1/2); \qquad t(\{\neg\exists x Fx \vee \neg\exists x Gx\}) = -(n-1)!/n!; \\
t(\{Fa \vee Ga\}) = t(Fa \vee Ga) + t(\exists x Fx \vee Ga) + t(Fa \vee \exists x Gx) + t(\exists x Fx \vee \exists x Gx) = 4 \cdot (n-1)!/n!; \\
t(\{\forall x(Fx \vee Gx)\}) = t(\forall x(Fx \vee Gx)) + \sum_{a_i \in \mathcal{K}} t(Fa_i \vee Ga_i) = 1 + c \cdot (n-1)!/n!; \\
t(\{\forall x(Fx \wedge Gx)\}) = t(\forall x Fx) + t(\exists x Fx) + t(\forall x Gx) + t(\exists x Gx) + \sum_{a_i \in \mathcal{K}} \big( t(Fa_i) + t(Ga_i) \big) = 4 + 2c; \\
t(\{\exists x \forall y Rxy\}) = t(\exists x \forall y Rxy) + t(\exists x \exists y Rxy) + t(\forall y \exists x Rxy) + \sum_{a_i \in \mathcal{K}} t(\exists x Rxa_i) = 3 + c.
\end{gathered} $$

Schurz and Weingartner (2010) show that numeric truthlikeness satisfies the intuitions in (1) and is ordinally equivalent with comparative truthlikeness over all pairs of ≥-comparable theories A, B, i.e. t(A) ≥ t(B) if A ≥ B. The advantage of t over comparative truthlikeness is that t allows reasonable truthlikeness-comparisons for ≥-incomparable statements. For example, if p and q are true, then p ∨ q and p ∨ ¬q are both true and ≥-incomparable, but t(p ∨ q) = 1/2 > t(p ∨ ¬q) = 1/4.

The numeric truthlikeness measure (8) is not normalized. It can be normalized by dividing by n; the truthlikeness of consistent theories then ranges between + 1 and − 1, and that of a contradiction is − n/(n + 1). For conjunctive propositional theories, the normalized truthlikeness measure (8) coincides with the unweighted numeric truthlikeness measure of Cevolani and Festa (2009).

In the transfer of the truthlikeness measure (8) to predicate logic we assumed that the number of quasi-literals expressible in \({\mathcal{L}}\) is finite. This restriction presupposes (i) that the depth of iterated quantifiers of the considered theories is bounded by a finite maximal depth d, (ii) that the number of predicate, relation and function symbols is finite, and (iii) that the number of individual constants is finite. The constituent account for first-order languages makes the same assumptions (Hintikka 1965; Niiniluoto 1987). We regard assumptions (i) and (ii) as unproblematic for applications to scientific theories. Assumption (iii) is more problematic, because theories often speak about infinitely many objects that are named by natural numbers. A possible way to get around assumption (iii) would be to consider, instead of content elements, equivalence classes [α] of content elements, where [α] collects all content elements arising from α by permutations of the individual constants and has finite truthlikeness; elaborations of this idea are left to future work.

Definitions (7) and (8) are restricted to qualitative theories. The extension to quantitative theories is possible by assuming that some or all quasi-literals of content elements are elementary quantitative statements. Thus they have the form m(a) = r or ∀x(m(x) = p(x)) (where a, x may be sequences of singular terms or variables). Their quantitative truthlikeness is measured according to definition (3) by a normalized measure that ranges between + 1 for "perfect approximation", 0 for random success, and − 1 for "maximally distant from the true value". To integrate them into definition (8), we introduce an approximate-truth dichotomy for quasi-literals and quasi-clauses. We call a quasi-literal q approximately true, in short "app-true", iff t(q) > 0, and app-false iff t(q) ≤ 0. A quasi-clause is app-true iff at least one of its quasi-literals is app-true, and app-false iff each of its quasi-literals is app-false. Aapp-tc and Aapp-fc denote the sets of a theory A's app-true and app-false content elements, respectively. For a quasi-clause α, Q(α), Qapp-t(α) and Qapp-f(α) are the sets of quasi-literals, app-true quasi-literals and app-false quasi-literals in α, respectively. Now we can define numeric-quantitative truthlikeness in complete analogy to definition (8):

  • (9) [Definition] Numeric truthlikeness for quantitative theories based on content elements:

  • Definition as in (8), except that Atc, Afc and Qt are replaced by Aapp-tc, Aapp-fc and Qapp-t, and if q is a quantitative quasi-literal, t(q) is computed according to definition (3).

For qualitative clauses α this condition leads precisely back to definition (8) (see footnote 6).
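To make the structure of definitions (8) and (9) concrete, the following Python sketch computes the contribution of single quasi-clauses and sums them over a theory's content elements. It is only an illustration: the content factors, the linear similarity standing in for definition (3), and the toy theory are invented assumptions, not part of the paper's formal apparatus.

```python
# Minimal sketch of definition (9): numeric truthlikeness for quantitative theories.
# Illustrative assumptions (not from the paper): the content factor cf(alpha) is
# passed in as a plain number, and the quantitative similarity of definition (3)
# is replaced by a simple linear stand-in ranging over [-1, +1].

def sim_quantitative(r, r_true, delta):
    """Stand-in for definition (3): +1 for perfect approximation,
    0 at half the interval length delta, -1 for maximal distance."""
    d = min(abs(r - r_true), delta) / delta      # normalized distance in [0, 1]
    return 1.0 - 2.0 * d

def clause_truthlikeness(literal_values, content_factor):
    """Contribution of one quasi-clause alpha:
    literal_values = list of t(q) for all k_alpha quasi-literals q of alpha.
    If alpha is app-true (some t(q) > 0), only the app-true literals are summed;
    if alpha is app-false (all t(q) <= 0), all (negative) values are summed."""
    k_alpha = len(literal_values)
    app_true = [t for t in literal_values if t > 0]
    relevant = app_true if app_true else literal_values
    return content_factor * sum(relevant) / k_alpha

def theory_truthlikeness(content_elements):
    """content_elements: list of (literal_values, content_factor) pairs."""
    return sum(clause_truthlikeness(vals, cf) for vals, cf in content_elements)

# Toy theory with two content elements (content factors chosen arbitrarily):
# an app-true quantitative quasi-literal m(a) = 2.0 against the true value 2.1,
# and an app-false two-literal quasi-clause.
A = [([sim_quantitative(2.0, 2.1, delta=1.0)], 1.0),
     ([sim_quantitative(5.0, 2.1, delta=1.0), -1.0], 0.5)]
print(theory_truthlikeness(A))   # 1.0*0.8 + 0.5*(-1.0) = 0.3
```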

5 Probabilistic truthlikeness: logical and epistemic

5.1 Logical truthlikeness for statistical and single case probabilities

As remarked in Sect. 3, the logical notion of probabilistic truthlikeness is a special case of numeric truthlikeness for quantitative theories. For statistical probabilities, the objective truth T consists of all true (and 'sufficiently lawlike') statistical probability assertions expressible in the language of the theory, such as pT(Fx) = r1 or pT(Fx|Gx) = r2 ("pT" for the true probability). The theories likewise consist of probability statements of this form, or of logical combinations of them. Both the truth T and the theories may in addition contain probabilistic independence assumptions, e.g. p(± Fx1 ∧ ± Fx2) = p(± Fx1)⋅p(± Fx2). The logical consequence operation applies also to probability assertions; it is now relativized to the basic axioms of probability, which are assumed as premises. For example, the theory {p(Fx|Rx) = r, p(Gx|Rx ∧ Fx) = q} logically implies the consequence p(Fx ∧ Gx|Rx) = r⋅q. In particular, every theory of the form {p(Fx|Rx) = r} together with a corresponding independence assumption implies the binomial formula in (5) for all k and n. The truthlikeness of an elementary probability assertion is measured by a suitable similarity measure defined as in (3), where the interval length ∆ is now 1. For example, t(p(Fx) = r1) = sim(r1, pT(Fx)) and t(p(Fx|Gx) = r2) = sim(r2, pT(Fx|Gx)). Quantifications over elementary probability assertions (e.g., ∀x(p(Rxy) = r)) are evaluated by definitions (3b) or (3c). Theories containing logical combinations of probability assertions are decomposed into the sets of their content elements according to definitions (8) and (9), which apply likewise to formulas containing probability operators.
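To illustrate, here is a small Python sketch (not part of the paper's formal apparatus) that evaluates elementary statistical probability assertions, and one of their logical consequences, against assumed true probabilities. The linear similarity with Δ = 1 is only a stand-in for definition (3), and the numerical values are invented for illustration.

```python
# Hedged sketch: logical truthlikeness of elementary statistical probability
# assertions, with a simple linear stand-in for definition (3) (Delta = 1).
# The "true" probabilities p_T are chosen arbitrarily (but coherently).

def sim(r, r_true):                      # range [-1, +1]; 0 at distance 0.5
    return 1.0 - 2.0 * min(abs(r - r_true), 1.0)

p_T = {"p(Fx|Rx)": 0.8, "p(Gx|Rx&Fx)": 0.5, "p(Fx&Gx|Rx)": 0.4}

# A theory asserting p(Fx|Rx) = 0.7 and p(Gx|Rx&Fx) = 0.6 ...
theory = {"p(Fx|Rx)": 0.7, "p(Gx|Rx&Fx)": 0.6}
# ... logically implies (relative to the probability axioms) p(Fx&Gx|Rx) = 0.7*0.6.
implied = {"p(Fx&Gx|Rx)": theory["p(Fx|Rx)"] * theory["p(Gx|Rx&Fx)"]}

for stmt, r in {**theory, **implied}.items():
    print(stmt, "asserted:", round(r, 2), " t =", round(sim(r, p_T[stmt]), 2))
```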

For objective single case probabilities, the objective truth consists of true unconditional single case probability assertions, for all singular statements expressible in \({\mathcal{L}}\), e.g., PT(Fa) = r1, PT(Fb) = r2, PT(Qab) = r3, etc. All of them are understood as statistical probabilities relative to narrowest relevant reference classes as defined in (4), i.e., PT(Fa) = pT(Fx|RFax), PT(Fb) = pT(Fx|RFbx), PT(Qab) = pT(Qxy|RQabxy), etc. Thus R(−) is a function that assigns to each closed formula its objective narrowest relevant reference class. The function R is typically unknown and we have only partial knowledge about it, i.e., we know some interesting superclasses of the narrowest reference classes. In general, the more the considered domain is governed by deterministic laws, the closer the objective single case probabilities of events are to their truth values (1 for true and 0 for false). Theories about single case probabilities consist of elementary single case probability assertions, or logical combinations of them. Their truthlikeness is again evaluated by our measure (3) for elementary P-assertions, e.g. t(P(Fa) = q1) = sim(q1, PT(Fa)). For quantified P-assertions such as ∀x(P(Fx) = q) we use measure (3b,d), and for logical combinations of P-assertions we use definition (9).

5.2 Epistemic truthlikeness for statistical probabilities

For epistemic truthlikeness, the content elements of a statistical theory A consist of statistical probability statements (or disjunctions of them) solely expressed in terms of empirical predicates. The objective truth T is replaced by the set of all available pieces of frequency information corresponding to A's probability statements. We let S be the set of reports about frequencies in random samples and s1, s2, … range over particular sample reports. Sample reports have either the unconditional form "f(Fx) = k/n", asserting that k out of n individuals drawn from the domain were Fs, or the conditional form "f(Fx|Gx) = k/n", saying that k out of n individuals drawn from the class of Gs were Fs. The epistemic truthlikeness of the elementary probability assertion h: p(Fx) = r1 is evaluated relative to the sample information s: f(Fx) = k1/n1 by means of the inductive probability P(h|s), computed according to formula (6) of Sect. 3. Likewise, if h is p(Fx|Gx) = r2, then s is f(Fx|Gx) = k2/n2 (etc.). Based on the inductive probability P(h|s) and the inductive prior probability P(h) we define a confirmation measure conf(h|s) that ranges between + 1 and − 1 and is + 1 if P(h|s) = 1, 0 if P(h|s) = P(h), and − 1 if P(h|s) = 0. This measure can be defined similarly to condition (3d); we dispense with stating its details. By inserting this confirmation measure instead of the truthlikeness of h into definition (9), we obtain the epistemic truthlikeness for statistical theories.

5.3 Epistemic truthlikeness for single case probabilities

Theories about single case probabilities attempt to predict unobserved events with a probability that comes as close as possible to their truth value. This is close to the Bayesian idea that probabilities are rational estimations of truth values.

The epistemic truthlikeness of single-case probability assertions, in short P-assertions, is closely related to probabilistic predictions. In the latter context, their distance from the truth is called their loss, their similarity to the truth their score and the epistemic truthlikeness of a theory corresponds to its predictive success (cf. Schurz 2019, ch. 5).

The objectively best possible prediction is the statistical probability of the predicted event relative to its true narrowest relevant reference class. Since this reference class is typically unknown, the method of the previous section cannot be applied: there are no observable sample frequencies corresponding to competing predictions of single case probabilities. Rather, the predicted probabilities have to be scored relative to the truth values of the event statements whose probabilities are predicted (which is also a standard method in Bayesianism). The truthlikeness of the P-assertion about a binary event, P(Fa) = r, is given by sim(r,v(Fa)), where v(Fa) ∈ {1,0} denotes Fa's truth value (similarly for the P-assertion of a discrete event).

The scoring of P-assertions relative to truth values has a peculiar feature. So far it did not matter whether a linear or a quadratic distance measure is used; now it becomes crucial. It is a well-known fact that if the similarity (or scoring) measure is based on the absolute (or some other linear) distance, then it is not optimal to predict probabilities, but to predict truth values: '1' if the predicted probability is greater than or equal to 0.5, and '0' if it is smaller than 0.5. This scoring rule is called the maximum rule (cf. Schurz 2019, sec. 5.9) and its proof is simple: assuming that the binary event Fa is independent of its prediction, the P-expected loss of the prediction P(Fa) = q is given as P(Fa)⋅(1 − q) + (1 − P(Fa))⋅q. When P(Fa) ≥ 0.5 this term is minimal if q = 1, and when P(Fa) < 0.5 if q = 0.
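The following small computation illustrates the maximum rule: for an illustrative true single case probability P(Fa) = 0.7, the P-expected linear loss is minimized by predicting 1, not by predicting the probability 0.7.

```python
# P-expected linear loss of predicting q, for a binary event with P(Fa) = 0.7
# (the value 0.7 is chosen only for illustration).

def expected_linear_loss(p_true, q):
    return p_true * (1 - q) + (1 - p_true) * q

p_true = 0.7
losses = {q / 10: expected_linear_loss(p_true, q / 10) for q in range(11)}
print(min(losses, key=losses.get))                        # 1.0: the rounded truth value wins
print(round(losses[0.7], 2), ">", round(losses[1.0], 2))  # 0.42 > 0.3
```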

What follows from this fact is not that linear scoring rules are inadequate, but that under a linear scoring, single case probabilities are not optimal estimations of truth values. Rather, their roundings to 1 or 0 are the optimal estimations. However, in the context of probabilistic predictions of events one wants a scoring relative to truth values whose P-expected value is maximized when the predicted value coincides with the probability P(Fa). Such scoring functions exist and are called proper scoring functions. They have the property that the P-expected score of a probabilistic prediction P(Fa) = q is maximal iff q equals P(Fa). According to a famous result of Brier (1950), the scoring function based on the quadratic loss function is proper (cf. Selten 1998). The proof is simple, by differentiating the P-expected quadratic loss w.r.t. q and setting the derivative to zero: d(P(Fa)⋅(1 − q)² + (1 − P(Fa))⋅q²)/dq = −2⋅P(Fa)⋅(1 − q) + 2⋅(1 − P(Fa))⋅q = 2⋅(q − P(Fa)), which vanishes exactly when q = P(Fa). There are also other proper scoring functions, e.g. logarithmic ones (cf. Fallis 2007).
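By contrast, the following sketch shows numerically that the P-expected quadratic (Brier) loss is minimized exactly at q = P(Fa), i.e., that the quadratic score is proper (again with the illustrative value P(Fa) = 0.7).

```python
# P-expected quadratic loss of predicting q; its derivative is 2*(q - P(Fa)),
# so the minimum lies at q = P(Fa): honest probability reporting is optimal.

def expected_quadratic_loss(p_true, q):
    return p_true * (1 - q) ** 2 + (1 - p_true) * q ** 2

p_true = 0.7
grid = [i / 100 for i in range(101)]
best_q = min(grid, key=lambda q: expected_quadratic_loss(p_true, q))
print(best_q)                                               # 0.7
print(round(expected_quadratic_loss(p_true, best_q), 2))    # 0.21 = 0.7 * 0.3
```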

Brier (1950) designed proper scoring rules in the context of probabilistic weather forecasting. Under a subjective perspective, a properly scored probabilistic forecaster believes that she will maximize her expected score if she predicts her probabilities. Under the objective truthlikeness perspective, the properly scored average truthlikeness of P-assertions increases the closer the predicted probabilities come to the true single case probabilities, defined as statistical probabilities in the narrowest relevant reference classes. This is precisely what we want.

In the context of probabilistic predictions, the loss and score functions of elementary probabilistic predictions q ∈ [0,1] relative to a truth value v ∈ {0,1} are normalized to the interval [0,1] and defined as loss(q,v) = (q − v)² and score(q,v) = 1 − loss(q,v). In the truthlikeness context, similarity measures are normalized to the interval [− 1, + 1] and the truthlikeness of a random guess should be 0. Since the average quadratic loss of a random guess is 0.5² = 0.25 and the average score is 0.75, the similarity measure is renormalized as follows (a code sketch implementing (10) is given after the definition):

  • (10) Score and epistemic truthlikeness of elementary single case probabilities

  • (10.1) Score, normalized to [0,1]:

  • s(P(Fa) = r) = 1 − (r − v(Fa))², where v(Fa) is Fa's truth value.

  • (10.2) Truthlikeness, normalized to [− 1, + 1]:

  • t(P(Fa) = r) = sim(r, v(Fa)) = (0.25 − (r − v(Fa))²)⋅n, where

  • − if |r − v(Fa)| < 0.5 (better than a random guess), n = 4, and

  • − if |r − v(Fa)| ≥ 0.5 (not better than a random guess), n = 4/3.

  • Thus if |r − v(Fa)| = 0, t(P(Fa) = r) = +1; if |r − v(Fa)| = 0.5, t(P(Fa) = r) = 0; and if |r − v(Fa)| = 1, t(P(Fa) = r) = −1.
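Definition (10) can be implemented directly; the following sketch reproduces the boundary cases just listed (the sample values of r are chosen arbitrarily).

```python
# Direct implementation of definition (10): score and epistemic truthlikeness of
# an elementary single case probability assertion P(Fa) = r, given v(Fa) = v.

def score(r, v):
    """(10.1): score, normalized to [0, 1]."""
    return 1.0 - (r - v) ** 2

def truthlikeness(r, v):
    """(10.2): truthlikeness, normalized to [-1, +1];
    a random guess (|r - v| = 0.5) gets truthlikeness 0."""
    n = 4.0 if abs(r - v) < 0.5 else 4.0 / 3.0
    return (0.25 - (r - v) ** 2) * n

for r in [1.0, 0.8, 0.5, 0.2, 0.0]:          # against the truth value v(Fa) = 1
    print(f"r = {r}: score = {score(r, 1):.2f}, t = {truthlikeness(r, 1):.2f}")
# t runs through +1.00, +0.84, 0.00, -0.52, -1.00
```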

Based on the epistemic truthlikeness measure (10.2), the truthlikeness of theories about objective single case probabilities can be evaluated relative to a given evidence set E, which now consists of singular statements that are accepted as true. Since E is incomplete, a theory may contain a content element h: P(Fa) = r without E containing information about Fa's truth value, i.e., Fa ∉ E and ¬Fa ∉ E. In this case, the epistemic truthlikeness of h is set to zero.

6 Meta-inductive optimization of the epistemic truthlikeness of single case probabilities

Theories about single case probabilities may be based on different prediction methods making different conjectures about the relevant cues (or reference classes). Since the objectively relevant cues may vary under different conditions of the environment, the success of theories will be environment-dependent, which implies that under changing environments the most successful theory will frequently change. Is it possible to design a meta-method that combines different probabilistic theories or methods into an aggregated method whose predictions are optimal? This is indeed possible, namely by means of the method of meta-induction, abbreviated as MI. Meta-induction is induction applied at the level of prediction methods or theories, as opposed to object-induction applied at the level of events. Meta-inductive methods have been developed in the context of Hume's problem of induction, as an attempt to provide a non-circular solution to this problem (Schurz 2019). The approach of meta-induction is compatible with Hume's diagnosis that one cannot demonstrate the universal reliability of induction. It attempts to show something weaker: the universal optimality of meta-induction. Optimality is weaker than reliability because even in an induction-hostile world (in which all prediction methods are unreliable), meta-induction can be optimal in the sense of 'being the best of a bad lot'.

Generally speaking, a meta-inductive method predicts a weighted average of the predictions of all accessible methods or theories, weighted according to their observed success. Based on deep theorems in mathematical learning theory (cf. Cesa-Bianchi and Lugosi 2006), one can design a meta-inductive prediction strategy whose predictive success is optimal among all prediction methods that are epistemically accessible. This optimality result holds in the long-run and is universal, i.e., it holds in all possible worlds, including paranormal worlds hosting clairvoyants or anti-inductivistic demons. In the short run, certain 'regrets' of the meta-inductive method compared to the best method are possible, but they are small and converge quickly to zero for an increasing number of predictions.

The universal optimality result provides a justification of meta-induction that is a priori, insofar as it does not make any inductive assumptions (it merely assumes logic and the possibility of observing the past). By itself this a priori justification of meta-induction does not imply the optimality of object-induction, since clairvoyants who predict better than object-inductive scientists are logically possible, and if they really existed, the meta-inductivist would favor them for his predictions. However, the a priori justification of meta-induction implies the following a posteriori justification of object-induction: so far, object-inductive methods have been (much) more successful than non-inductive methods of prediction; therefore it is meta-inductively justified to favor object-induction in the future. This argument is not circular, because of the independent justification of meta-induction.

Beyond its importance for epistemology, the meta-inductive optimality result can be applied in many other domains (Schurz 2019, ch. 10), e.g., to prediction and action tasks in cognitive science and computer science, to social learning and opinion aggregation in social epistemology and cultural evolution, and to probability aggregation in Bayesian epistemology. In this section we demonstrate the application of meta-induction to probabilistic truthlikeness—more precisely to the epistemic truthlikeness of single case probability assertions.

Note that meta-induction can also be applied to the epistemic truthlikeness of statistical probability assertions. For this application the question of finding an optimal theory is less pressing, because in the long-run this theory will be the one that is closest to the observed frequencies—although also in this domain there are different methods of frequentist estimation (e.g., Carnapian lambda rules) with different short-run success rates that can be meta-inductively evaluated (cf. Douven forthcoming). For single case probabilities, the question of epistemic optimality is more pressing, because the objectively narrowest reference class to which they are implicitly relativized is unknown and sensitive to changing conditions of the environment.

To apply the method of meta-induction to the prediction of single case probabilities, we assume the following ingredients:

  • The singular statements whose probabilities are predicted are assumed to be ordered into a potentially infinite sequence of binary events (e1, e2, …,); this sequence is called the event sequence. Thus en = Fnan (where an may also be a vector of individual constants, in which case Fn is a relational predicate).

  • A1,…,Am is a finite set of theories or methods predicting single case probabilities of the form Pj(en) = rn,j. Thus Pj is the single case probability function of the theory or method Aj.

  • MI is the meta-inductive method whose predictions are defined below. The pair ((e1, e2, …), {A1,…,Am, MI}) is called a probabilistic prediction game (Schurz 2019, sec. 7.1). The event-index n enumerates the rounds of the game.

  • In each round n, each theory or method Aj as well as MI delivers a prediction "Pj(en+1) = rn+1,j" of the next event's single case probability.

The success and truthlikeness of the theories Aj and of MI are evaluated as follows (where in this section "truthlikeness" always means "epistemic truthlikeness"):

  • score: s(Pj(en) = rn,j) is defined as in (10.1).

  • truthlikeness: t(Pj(en) = rn,j) is defined as in (10.2).

  • absolute success until round n: Sucn(Aj) = Aj's score-sum until round n.

  • absolute truthlikeness until round n: tn(Aj) = Aj's truthlikeness-sum until round n.

  • success rate at round n: sucn(Aj): = Sucn(Aj) / n.

  • truthlikeness rate at round n: trn(Aj): = tn(Aj) / n.

  • attractivity of Aj for MI at round n: atn(Aj) = sucn(Aj) − sucn(MI).

  • max-sucn is the maximum of the success rates sucn(Aj) for j∈{1,…,m}.

  • max-trn is the maximum of the truthlikeness rates trn(Aj) for j∈{1,…,m}.

  • (11) Probabilistic predictions of the meta-inductivist strategy MI:

  • (11.1) The weight of a theory or method Aj at time n is identified with the positive part of its attractivity at time n, i.e., wn(Aj) = max(atn(Aj), 0).

  • (11.2) For all times n ≥ 1 with \(\sum_{1 \le j \le m} w_n(A_j) > 0\):

  • \(P_{MI}(e_{n+1}) = \frac{\sum_{1 \le j \le m} w_n(A_j) \cdot P_j(e_{n+1})}{\sum_{1 \le j \le m} w_n(A_j)}\)

  • If n = 0 or \(\sum_{1 \le j \le m} w_n(A_j) = 0\), MI predicts according to a given fallback method, independently of its weight.

The attractivity of Aj for MI is also called the regret of MI w.r.t. Aj. If MI is predictively more successful than Aj at time n, then Aj's attractivity is negative. The crucial property of attractivity-based meta-induction is that MI's prediction ignores players whose attractivity is negative. The universal optimality of attractivity-based meta-induction is expressed in the following theorem:
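The following Python simulation is a minimal sketch of a probabilistic prediction game with attractivity-based meta-induction as defined in (11). The event sequence (independent binary events with true probability 0.7), the two competing methods, and the fallback choice (predicting 0.5 when all weights are zero) are invented for illustration and not prescribed by the text; the final lines compare the simulated regrets with the short-run bounds of theorem (12.1) below.

```python
# Sketch of a probabilistic prediction game with attractivity-based meta-induction.
import random

def score(r, v):                               # (10.1)
    return 1.0 - (r - v) ** 2

def truthlikeness(r, v):                       # (10.2)
    n = 4.0 if abs(r - v) < 0.5 else 4.0 / 3.0
    return (0.25 - (r - v) ** 2) * n

random.seed(1)
N = 2000                                                        # rounds
events = [1 if random.random() < 0.7 else 0 for _ in range(N)]  # true P = 0.7

# Two illustrative object-level methods: a well-calibrated and a badly calibrated one.
methods = {"A1": lambda n: 0.7, "A2": lambda n: 0.2}
m = len(methods)

Suc = {name: 0.0 for name in methods}          # absolute success Suc_n(A_j)
Tr = {name: 0.0 for name in methods}           # absolute truthlikeness t_n(A_j)
Suc_MI, Tr_MI = 0.0, 0.0

for n, e in enumerate(events):
    # (11.1): weights = positive part of the attractivities, based on rounds 0..n-1
    if n == 0:
        weights = {name: 0.0 for name in methods}
    else:
        weights = {name: max(Suc[name] / n - Suc_MI / n, 0.0) for name in methods}
    preds = {name: f(n) for name, f in methods.items()}
    total = sum(weights.values())
    if total > 0:                               # (11.2): attractivity-weighted average
        p_MI = sum(weights[name] * preds[name] for name in methods) / total
    else:                                       # fallback (assumed here: predict 0.5)
        p_MI = 0.5
    for name in methods:                        # bookkeeping
        Suc[name] += score(preds[name], e)
        Tr[name] += truthlikeness(preds[name], e)
    Suc_MI += score(p_MI, e)
    Tr_MI += truthlikeness(p_MI, e)

max_suc = max(Suc[name] / N for name in methods)
max_tr = max(Tr[name] / N for name in methods)
print("success rates:", {k: round(v / N, 3) for k, v in Suc.items()},
      "MI:", round(Suc_MI / N, 3))
print("truthlikeness rates:", {k: round(v / N, 3) for k, v in Tr.items()},
      "MI:", round(Tr_MI / N, 3))
print("regret in success rate:", round(max_suc - Suc_MI / N, 4),
      " bound sqrt(m/n):", round((m / N) ** 0.5, 4))
print("regret in truthlikeness rate:", round(max_tr - Tr_MI / N, 4),
      " bound 4*sqrt(m/n):", round(4 * (m / N) ** 0.5, 4))
```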

  • (12) [Theorem] Universal optimality of attractivity-based meta-induction

  • For every probabilistic prediction game ((e1,e2,…), {A1,…,Am, MI}) the following holds:

  • (12.1) Short run: (i) sucn(MI) ≥ max-sucn − \(\sqrt{m/n}\). (ii) trn(MI) ≥ max-trn − 4⋅\(\sqrt{m/n}\).

  • (12.2) Long run: MI's limiting success rate and truthlikeness rate are at least as great as the maximal success rate and truthlikeness rate of the competing theories:

  • (i) limn → ∞ (sucn(MI) − max-sucn) ≥ 0. (ii) limn → ∞ (trn(MI) − max-trn) ≥ 0.

Proof:

The proof of claims (i) is found in Schurz (2019, sec. 6.6.1), based on Cesa-Bianchi and Lugosi (2006, sec. 2.1). Claims (ii) follow from claims (i) by transforming predictive success into truthlikeness, as follows. Condition (10.2) implies that (a) the truthlikeness t of a probabilistic prediction with a score s > 0.75 is obtained from its score s by the transformation t = (s − 0.75)⋅4, while (b) the truthlikeness of a prediction with a score s ≤ 0.75 is obtained by the transformation t = (s − 0.75)⋅(4/3). To obtain a unique transformation we apply the transformation in (a) also to case (b) and obtain a stretched truthlikeness score t* that lies in the interval [− 3, + 1]. This has the consequence that the regret in terms of the t*-score, max-t* − t*(MI), gets stretched for predictions with s(MI) ≤ 0.75 (in both cases, max-s > 0.75 and max-s ≤ 0.75). Thus we have max-t − t(MI) ≤ max-t* − t*(MI), where max-t* − t*(MI) = (4⋅max-s − 3) − (4⋅s(MI) − 3) = 4⋅(max-s − s(MI)). By summing up and dividing by n we obtain max-trn − trn(MI) ≤ 4⋅(max-sucn − sucn(MI)), which gives us claims (12.1)(ii) and (12.2)(ii).

Theorem (12) asserts that the aggregated probability function of the meta-inductive strategy, PMI, is in the long run maximally successful among all accessible theories or methods, even when there is no unique best theory but the success rates of the competing theories are permanently changing. Beyond its importance for Bayesian epistemology, this result is important as a strategy of optimal probability aggregation (Feldbacher-Escamilla and Schurz 2020). There are also more complicated exponential versions of attractivity-based meta-induction, whose worst-case regret bounds are even better than \(\sqrt{m/n}\), namely proportional to \(\sqrt{\log(m)/n}\) (Schurz 2019, sec. 6.6.2), but in the present context the simplest version of meta-induction is sufficient.
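For comparison, the following lines sketch one standard exponential weighting scheme from the learning-theory literature (the exponentially weighted average forecaster of Cesa-Bianchi and Lugosi 2006, with a typical learning-rate choice). Whether this coincides exactly with the exponential variant of meta-induction referred to above is not asserted here; it merely indicates how exponential weights would replace the linear attractivity weights of (11.1).

```python
# One common exponential weighting of cumulative successes (an assumption for
# illustration, not necessarily the exact definition of exponential meta-induction).
import math

def exponential_weights(abs_successes, n, m):
    """abs_successes: cumulative scores Suc_n(A_j); returns normalized weights."""
    eta = math.sqrt(8 * math.log(m) / max(n, 1))   # a typical learning-rate choice
    raw = {name: math.exp(eta * s) for name, s in abs_successes.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

print(exponential_weights({"A1": 6.3, "A2": 4.3}, n=8, m=2))
```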

7 Conclusion

We started this paper with the distinction between conjunction-of-parts accounts and disjunction-of-possibilities accounts of truthlikeness, followed by three distinctions between kinds of truthlikeness measures (t-measures): comparative versus numeric t-measures, t-measures for qualitative versus quantitative theories, and t-measures for deterministic versus probabilistic truth. In Sects. 3–5 we developed these kinds of truthlikeness within the framework of conjunction-of-parts accounts based on content elements, with a focus on probabilistic truthlikeness. We distinguished between probabilistic t-measures for statistical probabilities and for single case probabilities (Sect. 3). For logical truthlikeness (t-measures relative to an assumed objective truth), probabilistic truthlikeness turns out to be a subcase of deterministic truthlikeness for quantitative theories (Sects. 3 and 5). In contrast, for epistemic truthlikeness (t-measures relative to a given set of empirical evidence), probabilistic truthlikeness creates genuinely new problems, especially for hypotheses about single case probabilities, which are evaluated not against observed frequencies (as statistical probabilities are), but against the truth values of single event statements. Finally, in Sect. 6, the method of meta-induction was applied to the epistemic truthlikeness of single case probabilities. Based on results about the universal predictive optimality of meta-induction, it was demonstrated how competing theories about single case probabilities can be meta-inductively combined into a theory with optimal predictive success and epistemic truthlikeness.