Approaching probabilistic laws

In the general problem of verisimilitude, we try to define the distance of a statement from a target, which is an informative truth about some domain of investigation. For example, the target can be a state description, a structure description, or a constituent of a first-order language (Sect. 1). In the problem of legisimilitude, the target is a deterministic or universal law, which can be expressed by a nomic constituent or a quantitative function involving the operators of physical necessity and possibility (Sect. 2). The special case of legisimilitude, where the target is a probabilistic law (Sect. 3), has been discussed by Roger Rosenkrantz (Synthese, 1980) and Ilkka Niiniluoto (Truthlikeness, 1987, Ch. 11.5). Their basic proposal is to measure the distance between two probabilistic laws by the Kullback–Leibler notion of divergence, which is a semimetric on the space of probability measures. This idea can be applied to probabilistic laws of coexistence and laws of succession, and the examples may involve discrete or continuous state spaces (Sect. 3). In this paper, these earlier studies are elaborated in four directions (Sect. 4). First, even though deterministic laws are limiting cases of probabilistic laws, the target-sensitivity of truthlikeness measures implies that the legisimilitude of probabilistic laws is not easily reducible to the deterministic case. Secondly, the Jensen-Shannon divergence is applied to mixed probabilistic laws which entail some universal laws. Thirdly, a new class of distance measures between probability distributions is proposed, so that their horizontal differences are taken into account in addition to vertical ones (Sect. 5). Fourthly, a solution is given for the epistemic problem of estimating degrees of probabilistic legisimilitude on the basis of empirical evidence (Sect. 6).


The similarity approach to truthlikeness
Karl Popper's (1963) original work on truthlikeness was based on the concepts of truth value (true or false) and logical deduction (entailment). Theories were represented as deductively closed sets of sentences in some language L, and the comparative notion "more truthlike" was characterized by set-theoretical comparisons of the truth content and falsity content of rival theories. The main lesson from the dramatic failure of Popper's definition in 1974 was the need to add the notion of similarity or resemblance to the logical toolbox. In the first formulations of the similarity approach, Risto Hilpinen (1976) represented theories as classes of possible worlds and employed spheres of similarity from David Lewis' approach to counterfactuals, while Pavel Tichý defined theories as disjunctions of propositional constituents and Ilkka Niiniluoto as disjunctions of monadic constituents, adding a function to measure the distance between constituents. Soon these notions were extended to full first-order languages by Tichý, Niiniluoto, Raimo Tuomela and Graham Oddie, with systematic summaries in Oddie's Likeness to Truth (1986), Niiniluoto's Truthlikeness (1987), and Theo Kuipers' edited volume What Is Closer-to-the-Truth? (1987).
In Hilpinen's treatment, the truthlikeness of a theory depends on its maximum and minimum distances from the actual world. Tichý and Oddie favor the average distance, while Niiniluoto combines the minimum distance with the normalized sum of all distances. In the linguistic formulations, the degree of truthlikeness Tr(H,C*) of a theory H in language L depends on the similarity of the disjuncts of H with the true constituent C* of L. Here the target C* is the most informative truth expressible in the conceptual framework L, and Tr(H,C*) is maximal when H is identical with this complete truth C*.¹ A successful theory H should give a full and correct description of a domain of investigation by its conceptual resources in language L. In other words, H should specify the L-structure of the actual world with respect to L, and the strongest theories are able to do this up to isomorphism. Here L may include qualitative or quantitative concepts. But the choice of the logical complexity of language L allows a finer discrimination: the target can be chosen for the purposes of the relevant cognitive problem, so that it may be a propositional constituent, a state description, a structure description, a monadic constituent, a polyadic constituent of depth d, or a complete first-order theory.² For each of these choices, the task is to define the distances of statements in L from the given target.
Following Popper, one should distinguish here two problems. In the logical problem of truthlikeness, we are given the true target C*, and we ask what it means to say that a theory is close to C* or closer to C* than another theory. In the epistemic problem of truthlikeness, the true target C* is unknown, and we ask how we can rationally claim or estimate on available evidence E that one theory is close to C* or closer to C* than another theory.
Synthese (2021) 199:10499-10519
To illustrate the similarity approach to these two problems, let L be a monadic first-order language with k one-place predicates, and let $\mathbf{Q} = \{Q_1, \ldots, Q_K\}$ be the set of Q-predicates of L ($K = 2^k$). The Q-predicates can be defined by the conjunction of negated or unnegated primitive predicates of L, so that there is a natural distance $\rho_{uv} = d(Q_u, Q_v)$ between them.³ The Q-predicates are the strongest predicates expressible in L, and they constitute a classification system of individuals in the domain of L. A state description in L locates each individual in one and only one "cell" defined by a Q-predicate, while a structure description specifies the proportions of individuals in these cells. A monadic constituent $C_i$ of L specifies which Q-predicates are empty and which are non-empty:
$$(\pm)(\exists x)Q_1(x) \;\&\; \ldots \;\&\; (\pm)(\exists x)Q_K(x), \tag{1}$$
where $(\pm)$ is replaced by negation or nothing. As an empty universe is excluded, the number of constituents is $q = 2^K - 1$. If $CT_i$ is the class of cells occupied according to $C_i$, then (1) can be rewritten in the following form:
$$\bigwedge_{Q_j \in CT_i} (\exists x)Q_j(x) \;\&\; (x)\Big[\bigvee_{Q_j \in CT_i} Q_j(x)\Big]. \tag{2}$$
If for the true constituent C* there are no empty cells, so that $CT_* = \mathbf{Q}$, the world is atomistic in the sense that there are no true universal generalizations. For example, the truth of the generalization (x)(Fx → Gx) means that the cell F & ~G is empty. A simple distance between monadic constituents is the Clifford-measure:
$$\Delta_C(C_i, C_j) = |CT_i \,\Delta\, CT_j| / K, \tag{3}$$
where Δ is the symmetric difference (see Fig. 1), so that (3) is the number of Q-predicates on which $C_i$ and $C_j$ disagree, divided by K. Variants of (3), which take into account distances between Q-predicates, have been considered by Tichý, Oddie, and Niiniluoto.⁴ Then the degree of truthlikeness Tr(H,C*) of a generalization H in L depends on the Clifford-distances (or their variants) of the disjuncts of H from the true constituent C*. A comparative notion "$H_1$ is closer to the truth than $H_2$" is explicated by the condition $Tr(H_1, C*) > Tr(H_2, C*)$.
If C* is unknown, but a rational epistemic probability measure P is defined over the class of constituents of L, then the unknown degree Tr(H,C*) can be estimated by its expected value on the basis of evidence E:
$$\mathrm{ver}(H/E) = \sum_{i} P(C_i/E)\, Tr(H, C_i), \tag{4}$$
where the sum goes over i = 1, …, q.⁵ For monadic languages, the relevant posterior probabilities $P(C_i/E)$ of constituents (i.e., degrees of belief in the truth of $C_i$ given E) are given by Jaakko Hintikka's system of inductive logic.⁶
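The Clifford measure and the expected-value estimate described above can be sketched in a few lines of Python. This is only an illustration: the function names, the encoding of a constituent as a set of cell indices, and the posterior and Tr values in the last step are hypothetical and are not taken from the text.

```python
def clifford_distance(ct_i, ct_j, K):
    """Clifford distance |CT_i Δ CT_j| / K between two monadic constituents.

    ct_i, ct_j: sets of indices of the Q-predicates claimed to be non-empty.
    K: total number of Q-predicates.
    """
    return len(set(ct_i) ^ set(ct_j)) / K


def expected_truthlikeness(tr_values, posterior):
    """Expected truthlikeness: sum over i of P(C_i/E) * Tr(H, C_i)."""
    return sum(p * tr for p, tr in zip(posterior, tr_values))


# Two primitive predicates give K = 4 Q-predicates.
K = 4
ct_true = {1, 2, 3, 4}  # every cell is occupied
ct_1 = {1, 3, 4}        # wrongly claims that cell 2 is empty
d = clifford_distance(ct_1, ct_true, K)  # one disagreement out of four cells

# Hypothetical posterior over two candidate constituents, with hypothetical
# degrees of truthlikeness of H relative to each of them.
estimate = expected_truthlikeness([1.0, 0.5], [0.5, 0.5])
```

The set-based encoding makes the symmetric difference a one-line operation; any real application would still need a definition of Tr for disjunctions of constituents, which is not attempted here.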

Verisimilitude vs. Legisimilitude
Following the difference between accidental and lawlike generalizations, a distinction between verisimilitude and legisimilitude has been proposed by L. J. Cohen. In the logical problem of legisimilitude, the target is not just the strongest true statement about the world (in a given language), but a genuine law of nature. A solution of this problem for universal or deterministic laws can be based on S. Uchii's notion of nomic constituent.⁷ Let L(□) be a modal monadic language with the operators of nomic necessity □ and nomic possibility ◊, satisfying the system S5. Then a nomic constituent tells which Q-predicates are possible and which are impossible:
$$(\pm)\Diamond(\exists x)Q_1(x) \;\&\; \ldots \;\&\; (\pm)\Diamond(\exists x)Q_K(x). \tag{5}$$
The number of nomic constituents in L(□) is again $q = 2^K - 1$. As actuality implies possibility, and impossibility implies non-actuality, nomic constituent (5) is partly weaker and partly stronger than constituent (1). Laws of nature are disjunctions of nomic constituents. For example, the law □(x)(Fx → Gx) is equivalent to the disjunction of all nomic constituents which state that the cell F & ~G is physically impossible. The distance $\Delta(B_1, B_2)$ between nomic constituents $B_1$ and $B_2$ can be defined by the Clifford-measure $|CT_1 \,\Delta\, CT_2| / K$ or its variants. The degree of legisimilitude of a law of nature H depends on its distance to the true nomic constituent B*:
$$\mathrm{leg}(H, B*) = 1 - \Delta(H, B*).$$
Alternatively, if the cognitive aim is to combine verisimilitude and legisimilitude, the target could be the conjunction B* & C*.⁸ Estimation of legisimilitude can again employ expected values based on inductive probabilities.⁹
Nomic constituents (5) represent laws of coexistence, i.e., lawlike connections between attributes or properties. To define laws of succession, introduce a discrete temporal index t to Q-predicates, so that a law of succession (6) states which transitions from a state $Q_i$ at time t to a state $Q_j$ at time t + 1 are nomically possible. Here T lists all possible transitions ⟨i,j⟩ between successive states. For deterministic laws, for each i there is only one j such that ⟨i,j⟩ ∈ T. Again the Clifford-measure can be applied to measure the distance between laws of succession: $|T_1 \,\Delta\, T_2| / K^2$.¹⁰
The class of Q-predicates of a monadic language can be generalized to a quantitative state space $\mathbf{Q} \subseteq \mathbb{R}^k$ generated by real-valued quantities $h_1, \ldots, h_k$.¹¹ In the simplest case, $\mathbf{Q}$ is the real line or its part with the geometrical distance between points $\rho(x, x') = |x - x'|$. More generally, $\mathbf{Q}$ is a k-dimensional metric space with the Euclidean metric. Here laws of coexistence specify regions $F \subseteq \mathbf{Q}$ of nomically possible states. The Clifford-distance between two such laws $F_1$ and $F_2$ is defined by the size of their symmetric difference $F_1 \,\Delta\, F_2$. An alternative approach to quantitative laws expresses how a function $h_k$ necessarily depends on $h_1, \ldots, h_{k-1}$ through a real-valued function f. The distances between two such real-valued functions can be defined by the Minkowski or $L_p$-metrics for functions:
$$\Delta_p(f, g) = \Big( \int |f(x) - g(x)|^p \, dx \Big)^{1/p}. \tag{7}$$
Here p = 1 is the Manhattan metric, p = 2 the Euclidean metric, and p = ∞ the Tchebycheff metric sup |f(x) − g(x)|.¹² The degree of legisimilitude of the law f then depends on its distance to the true law f*. Further, f is closer to the truth than g if and only if leg(f, f*) > leg(g, f*). Quantitative laws of succession can be formulated by relativizing the state of x to a time: h(x,t) = state of x at time t.
Then deterministic dynamical laws tell how the state depends on time t and some initial state at time $t_0$. A law of succession specifies nomically possible trajectories $F: \mathbb{R} \times \mathbf{Q} \to \mathbf{Q}$. The distance between such laws can be defined by taking for each $Q \in \mathbf{Q}$ the Minkowski distance between the trajectories $F_1(t, Q)$ and $F_2(t, Q)$, for $t \in \mathbb{R}$, and then summing over all possible initial states $Q \in \mathbf{Q}$.¹³
5 This solution to the epistemic problem was proposed by Niiniluoto in 1977 (cf. Niiniluoto, 1987).
6 For a survey, see Niiniluoto (2011).
7 See Niiniluoto (1987), pp. 91-98. Nomic constituents correspond to what Kuipers (1982) called "theoretical truth" (as opposed to "descriptive truth") and later "nomic truth" (see Kuipers, 2019).
8 See Niiniluoto (1987), p. 377.
9 For monadic nomic constituents, the relevant posterior probabilities are again obtained from Hintikka's inductive logic (see Niiniluoto, 1987, pp. 98-102).
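The Minkowski distances between quantitative laws mentioned above can be approximated numerically. The following sketch uses a simple midpoint rule on a bounded interval; the function names, the interval, and the two example laws are illustrative assumptions, not part of the text.

```python
def minkowski_distance(f, g, a, b, p=1.0, n=10_000):
    """Approximate the L_p distance between f and g on [a, b] (midpoint rule)."""
    h = (b - a) / n
    diffs = [abs(f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h)) for i in range(n)]
    if p == float("inf"):
        return max(diffs)  # sup-distance, approximated on the grid
    return (sum(d ** p for d in diffs) * h) ** (1.0 / p)


def target_law(x):
    """Hypothetical true law f*(x) = x."""
    return x


def rival_law(x):
    """Hypothetical rival law g(x) = 0."""
    return 0.0


d_manhattan = minkowski_distance(rival_law, target_law, 0.0, 1.0, p=1)
d_euclidean = minkowski_distance(rival_law, target_law, 0.0, 1.0, p=2)
d_sup = minkowski_distance(rival_law, target_law, 0.0, 1.0, p=float("inf"))
```

On an unbounded domain these integrals need not be finite, so restricting the comparison to an interval is a modelling choice made here for the sake of the example.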

Probabilistic laws
The notion of a universal or deterministic law introduced in Sect. 2 can be generalized to probabilistic laws, if an objective physical probability measure is available.¹⁴ Following Leibniz, such physical probabilities express "degrees of possibility". This can be understood in terms of single-case propensities: P(G/F) = r means that a physical set-up has a numerical disposition of strength r to produce an outcome of type G in each trial of type F. Thus, probability statements involve a dispositional modal operator, so that they differ from extensional statistical statements about actual relative frequencies of attributes in reference classes (i.e., structure descriptions). Universal laws of coexistence and deterministic laws of succession are limiting special cases of probabilistic laws (with propensities 0 and 1).¹⁵ Genuine probabilistic laws presuppose that the world is indeterministic, but in statistical modelling one may assign in some sense objective probabilities to random phenomena (e.g., coin tossing, roulette) even when the underlying reality is deterministic. For the task of defining approximation to such probabilities, the philosophical issue of indeterminism and determinism can be left open.
12 See Niiniluoto (1987), p. 385. The metric (7) is based on the differences between the values of two functions, but it does not reflect the similarity of their mathematical form (see Niiniluoto, 2019, p. 131). For a proposal to measure the distance between quantitative laws as a combination of accuracy and nomicity, see Garcia Lapeña (2021).
13 See Niiniluoto (1987), p. 393.
14 See Niiniluoto (1987), pp. 118-121.
15 For probabilistic laws with single-case propensities, see Fetzer (1981).
To define probabilistic constituents, replace in a nomic constituent (5) the operator of physical possibility ◊ with a probability measure P over the discrete state space $\mathbf{Q}$ of Q-predicates, now understood as the "sample space" or the class of outcomes of a trial x:
$$P(Q_1(x)) = p_1 \;\&\; \ldots \;\&\; P(Q_K(x)) = p_K, \quad \text{where } p_1 + \ldots + p_K = 1. \tag{8}$$
Probabilities (8) over $\mathbf{Q}$ define a multinomial context. Then $p_i = P(Q_i(x)) > 0$ if and only if $Q_i$ is physically possible, for all i = 1, …, K, so that here P is applied to the open formula $Q_i(x)$ instead of the existential statement in (5). Now a probabilistic constituent (8) is compatible with the nomic constituent $B_i$ with $CT_i$ if and only if it assigns a positive probability to the Q-predicates in $CT_i$ and zero probability to other Q-predicates. This means that typically a nomic constituent is an infinite disjunction of probabilistic constituents.
In many statistical applications, the trial x counts the number of successes in a repeated experiment (e.g., binomial and Poisson distributions), so that the state space $\mathbf{Q}$ is a subclass of the set $\mathbb{N}$ of natural numbers. Then the distance $\rho(x, x')$ between points in $\mathbf{Q}$ is their normalized arithmetical difference.
An important distinction can be made between pure and mixed probabilistic laws. A probabilistic constituent, where $CT_i$ is a proper subset of $\mathbf{Q}$, is a mixed law in the sense that it entails a universal law (cells in $\mathbf{Q} - CT_i$ are necessarily empty). Pure probabilistic laws have no such entailments: the world is atomistic in the sense that all Q-predicates in $\mathbf{Q}$ are nomically possible (so that no universal laws hold), and a positive probability is assigned to all Q-predicates.
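The distinction between pure and mixed laws, and the nomic constituent with which a probabilistic constituent is compatible, can be read off from the support of the distribution. A minimal sketch, assuming a probabilistic constituent is encoded as a list of cell probabilities (the helper names and the numbers are hypothetical):

```python
def support(p):
    """Indices of the Q-predicates with positive probability (the class CT)."""
    return {i for i, p_i in enumerate(p) if p_i > 0}


def is_pure(p):
    """A pure law assigns a positive probability to every Q-predicate."""
    return len(support(p)) == len(p)


def necessarily_empty_cells(p):
    """Cells that a mixed law declares nomically impossible."""
    return set(range(len(p))) - support(p)


mixed_law = [0.95, 0.0, 0.03, 0.02]  # hypothetical: second cell has probability 0
pure_law = [0.25, 0.25, 0.25, 0.25]  # hypothetical uniform pure law
```

Here `support(mixed_law)` plays the role of CT for the compatible nomic constituent, and each index returned by `necessarily_empty_cells` corresponds to one entailed universal law.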
To define probabilistic laws of succession, for a discrete space $\mathbf{Q}$ the set T of possible transitions between states is replaced by a matrix of transition probabilities
$$p_{j/i} = P\big(Q_j^{t+1}(x) \,/\, Q_i^{t}(x)\big), \quad i, j = 1, \ldots, K, \tag{9}$$
where $p_{1/i} + \ldots + p_{K/i} = 1$ for each i. This definition involves the Markov condition, i.e., the next state depends only on the present state. If transition probabilities are 0 or 1, this law reduces to the deterministic law (6).¹⁶ Equation (9) determines the n-step transition probabilities $p_{j/i}(n)$, and for an irreducible stationary Markov chain the limits of $p_{j/i}(n)$, for n → ∞, give a long-run probability distribution. These notions can be generalized to Markov processes with a continuous time.¹⁷ Special cases of probabilistic laws of succession can be formulated by quantitative dynamic laws like the law of radioactive decay
$$P(Q(x, t)) = 1 - e^{-\lambda t}, \tag{10}$$
where Q(x,t) states that atom x decays within the time-interval [0,t] and λ is a constant. Finally, for a quantitative state space $\mathbf{Q}$ a probability measure on $\mathbf{Q}^\infty$ (i.e., infinite sequences of successive states) assigns a physical probability to possible trajectories of a time-continuous stochastic process.
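The step from one-step to n-step transition probabilities, and the long-run distribution, can be illustrated with plain matrix powers. The sketch below assumes that row i of the matrix lists the probabilities of moving from state i to each state j; the two-state matrix is a hypothetical example that is irreducible and aperiodic, so that its rows converge.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step(p, n):
    """n-step transition matrix (n >= 1); row i holds the probabilities of
    being in each state j after n steps when starting from state i."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


# Hypothetical indeterministic law of succession on two states.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P50 = n_step(P, 50)  # both rows are close to the long-run distribution

# A deterministic law: all transition probabilities are 0 or 1.
D = [[0.0, 1.0],
     [1.0, 0.0]]
D2 = n_step(D, 2)    # two deterministic swaps return every state to itself
```

For the matrix P the long-run distribution is (5/6, 1/6), which can be checked by solving the stationarity condition by hand. The deterministic matrix D is periodic, so its powers oscillate instead of converging.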

Distance between probabilities
The problem of legisimilitude for probabilistic laws has not yet received much attention. The main focus in the literature has been on cases where the target is a universal or deterministic law, either qualitative or quantitative. The only detailed proposals have been given by Rosenkrantz (1980) and Niiniluoto (1987, pp. 403-405), who apply the Kullback-Leibler notion of divergence as a measure of distance from a probabilistic truth.
Mathematicians have suggested a great number of measures for distances between probability distributions. In a comprehensive survey, Cha (2007) lists 45 different measures,¹⁸ which have been used for various purposes. For example, the central limit theorem (the sum of n independent random variables approximates in the limit the normal distribution) and laws of large numbers (observed relative frequencies and predictive probabilities approach almost surely objective probabilities in a multinomial Bernoulli process) express distances between epistemic probabilities q and objective probabilities p by their geometrical distance |q − p|.¹⁹ This amounts to the Manhattan metric
$$\Delta_1(p, q) = \sum_i |p_i - q_i|.$$
For discrete probabilities, the squared Euclidean or quadratic metric
$$\Delta_2(p, q) = \sum_i (p_i - q_i)^2,$$
or its variant χ², is a standard way of measuring the fit between two distributions or structure descriptions.²⁰ In the special case of scoring, where $q_i$ are probabilistic estimates of the truth values $p_i$ of n rival exclusive hypotheses ($p_j = 1$ for the true hypothesis, otherwise 0), Glenn Brier's 1950 measure of inaccuracy is quadratic, i.e., $d(1, q) = (1 - q)^2$ and $d(0, q) = q^2$,²¹ while I. J. Good in 1952 favored the logarithmic measure $d(1, q) = -\ln q$, $d(0, q) = -\ln(1 - q)$.²² From these local measures the total scoring measure is obtained by summing the inaccuracies of all $q_i$. Some measures are based on the inner products $p_i q_i$. Hellinger's 1909 proposal was modified in 1946 in Bhattacharyya's dissimilarity coefficient:
$$d_B(p, q) = -\log \sum_i \sqrt{p_i q_i}.$$
The directed divergence of a discrete probability distribution p from another q was defined by Solomon Kullback and Richard Leibler in 1951 as the expected logarithmic difference between p and q with respect to p:
$$\mathrm{div}(p, q) = \sum_i p_i \log (p_i / q_i). \tag{11}$$
Here log can be taken to have the binary base 2, so that log 2 = 1.
17 See Parzen (1962), pp. 248, 277.
18 Cf. Niiniluoto (1987), pp. 7-8.
19 See Festa (1993).
20 See Niiniluoto (1987), pp. 15-16, pp. 302-303, pp. 321-322.
$$P(Q(x, t)) = 1 - e^{-\lambda t} \tag{10}$$
Formula (11) is also called the relative entropy of p with respect to q. This measure is only a semimetric: it is non-negative, and div(p, q) = 0 if and only if p = q, but it is not symmetric (usually div(p, q) ≠ div(q, p)), and the triangle inequality is not satisfied.
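The properties just listed are easy to observe numerically. The following sketch implements the directed divergence for discrete distributions with a configurable logarithm base; the function name and the two example distributions are hypothetical, and the convention of returning infinity when q gives zero probability to a cell that p allows is an assumption of this sketch.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Directed divergence: sum of p_i * log(p_i / q_i).

    Terms with p_i = 0 contribute nothing; if q_i = 0 while p_i > 0,
    the divergence is treated as infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i, base)
    return total


p = [0.5, 0.5]  # hypothetical fair coin
q = [0.9, 0.1]  # hypothetical biased coin

forward = kl_divergence(p, q)   # divergence of p from q
backward = kl_divergence(q, p)  # divergence of q from p, generally different
```

With these numbers `forward` is roughly 0.74 bits and `backward` roughly 0.53 bits, which shows the lack of symmetry on a concrete pair.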
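For continuous probability densities f and g on $\mathbb{R}$, Eq. (11) is replaced by
$$\mathrm{div}(f, g) = \int f(x) \log \big( f(x) / g(x) \big) \, dx.$$
The symmetric divergence between p and q is defined by
$$\mathrm{div}_s(p, q) = \mathrm{div}(p, q) + \mathrm{div}(q, p) = \sum_i (p_i - q_i) \log (p_i / q_i).$$
Other variants include the λ-divergence
$$\mathrm{div}_\lambda(p, q) = \lambda \, \mathrm{div}(p, \lambda p + (1 - \lambda) q) + (1 - \lambda) \, \mathrm{div}(q, \lambda p + (1 - \lambda) q).$$
For λ = ½, it gives the Jensen-Shannon divergence
$$\mathrm{div}_{JS}(p, q) = \tfrac{1}{2} \mathrm{div}\big(p, \tfrac{p + q}{2}\big) + \tfrac{1}{2} \mathrm{div}\big(q, \tfrac{p + q}{2}\big) = H\big(\tfrac{p + q}{2}\big) - \tfrac{1}{2} H(p) - \tfrac{1}{2} H(q), \tag{12}$$
where $H(p) = -\sum_i p_i \log p_i$ is the Shannon entropy of p. Formula (12) is non-negative and symmetric, and its square root is a metric. Rényi divergence is defined by
$$\mathrm{div}_\alpha(p, q) = \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha} q_i^{1 - \alpha}.$$
For α = 1, it gives in the limit the Kullback-Leibler divergence, and for α = ½ twice the Bhattacharyya distance.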
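The Jensen-Shannon divergence described above can be sketched as the average divergence of p and q from their midpoint. The code below uses binary logarithms, so that the maximal value is 1; the function names and the example distributions are hypothetical.

```python
import math


def kl_to_mixture(p, m):
    """Directed divergence of p from a mixture m that is positive wherever p is."""
    return sum(p_i * math.log2(p_i / m_i) for p_i, m_i in zip(p, m) if p_i > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence with binary logarithms."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_to_mixture(p, m) + 0.5 * kl_to_mixture(q, m)


# Hypothetical laws with zero probabilities in different cells.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.5, 0.5, 0.0]
d_overlap = js_divergence(p, q)

# Hypothetical laws with disjoint supports.
r = [1.0, 0.0]
s = [0.0, 1.0]
d_disjoint = js_divergence(r, s)
```

Because the midpoint is positive wherever either distribution is positive, no term divides by zero, and the value stays finite even when the supports differ.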
Divergence was originally intended as a tool in information theory.²³ In Bayesian statistics it has been used to measure the difference between prior and posterior distributions. It can also be used for assessing the similarity of a probabilistic model with some aspect of reality.²⁴ The first connection to the studies in truthlikeness was developed by Roger Rosenkrantz (1980). Inspired by I. J. Good's notion of the weight of evidence, Rosenkrantz formulated a comparative condition under which, for a random experiment x and truth h*, hypothesis h is more truthlike than hypothesis h′. This idea was connected to the similarity approach to truthlikeness by Niiniluoto (1987): the distance of a probabilistic hypothesis h from the probabilistic target h* is measured by div(h*, h). Thus, hypothesis h is more truthlike than h′ if and only if div(h*, h) < div(h*, h′). When the relevant hypotheses h, h′, and h* are specified by probabilities $p_i$, $q_i$, and $p_i^*$, this comparative condition holds if and only if $\sum_i p_i^* \log (q_i / p_i) < 0$. Generalization to probability density functions is immediate. More generally, Niiniluoto (1987) recommends divergence div as a solution to the problem of probabilistic legisimilitude:
• for probabilistic laws of coexistence (8) in qualitative conceptual spaces, the distance to the true probabilistic constituent
• for probabilistic laws of succession (9) in qualitative languages, the distance to the matrix of true transition probabilities
• for probabilistic laws of succession in the quantitative space $\mathbf{Q}^\infty$, the distance to the true probability on $\mathbf{Q}^\infty$.
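The comparative condition can be checked on a small example. The sketch below compares two hypotheses by their divergence from a target and confirms that the result agrees with the sign of the logarithmic sum; all three distributions are hypothetical pure laws (no zero probabilities), and the function names are not from the text.

```python
import math


def kl_divergence(p, q):
    """Directed divergence in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def more_truthlike(target, h, h_prime):
    """True if h diverges less from the target than h_prime does."""
    return kl_divergence(target, h) < kl_divergence(target, h_prime)


def log_ratio_sum(target, h, h_prime):
    """Sum of p*_i log(q_i / p_i), with p_i taken from h and q_i from h_prime."""
    return sum(t * math.log2(q_i / p_i)
               for t, p_i, q_i in zip(target, h, h_prime) if t > 0)


target = [0.95, 0.05]   # hypothetical true law
h = [0.90, 0.10]        # hypothetical rival close to the target
h_prime = [0.50, 0.50]  # hypothetical rival far from the target

closer = more_truthlike(target, h, h_prime)
criterion = log_ratio_sum(target, h, h_prime)
```

The sum equals div(h*, h) minus div(h*, h′), because the terms involving only the target cancel; this is why its sign decides the comparison.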
Alternative solutions could replace div by some other distance measure, e.g., Manhattan, Euclidean, or Bhattacharyya.
The Kullback-Leibler divergence div(p,q) has a limitation which is not noted in Niiniluoto (1987). Its definition (11) presupposes that p is absolutely continuous with respect to q, i.e., if q i = 0, then p i = 0. Further, when p i = 0, the factor 0log0 in the sum vanishes. The same condition is required for probability densities: if g(x) = 0, then f(x) = 0. This means that the KL-divergence can be applied only to pure probabilistic laws, since for mixed probabilistic constituents mistakes in the empty cells (or zero points in Q) in the target and hypothesis would not count at all. The same problem is faced by the Bhattacharyya distance, whose factors vanish as soon as p i or q i is 0, but not by the Minkowski metrics.
23 See Kullback (1959). 24 For example, Sober (2002) follows the statistician H. Akaike in measuring the distance between a fitted model (with fixed parameter values) and the truth by the Kullback-Leibler distance.
This problem with divergence is observed by Rosenkrantz (1980), who suggests that in the evaluation of div(P(x, h), P(x, h*)) zero probabilities are replaced by a slightly positive possibility of misclassification, but this ad hoc move is unsatisfactory. As a better solution one can recommend the use of the Kullback-Leibler directed divergence div for pure probabilistic laws, and the Jensen-Shannon divergence $\mathrm{div}_{JS}$ (instead of div) to measure the distance between mixed probability distributions over cells $\mathbf{Q}$ or in the transition matrix. The JS-divergence shares the good properties of the KL-divergence, but it has a finite value in all cases even when some of the probabilities are zero.²⁵ The following examples illustrate various possibilities in analyzing the approach to probabilistic laws.
Example 1. Let L be a monadic language with two primitive predicates Fx = x is a swan and Gx = x is white. Then there are four Q-predicates in L:
$$Q_1(x) = Fx \,\&\, Gx, \quad Q_2(x) = Fx \,\&\, {\sim}Gx, \quad Q_3(x) = {\sim}Fx \,\&\, Gx, \quad Q_4(x) = {\sim}Fx \,\&\, {\sim}Gx.$$
Then the true constituent C* in L states that all Q-predicates are instantiated. Let H be the false universal generalization "All swans are white". H states that $Q_2$ is empty and leaves other cells as question marks. Applying the min-sum definition with weights γ and γ′ for the min and sum factors, respectively, the degree of truthlikeness of H is²⁶
$$Tr(H, C*) = 1 - \gamma/4 - 5\gamma'/8.$$
Choosing γ = 2/3 and γ′ = 1/3, this is equal to 5/8. The degree of truthlikeness of the false constituent $C_1$ with $CT_1 = \{Q_1, Q_3, Q_4\}$ can be calculated in the same way.²⁷ The same numerical results hold for the nomic versions of C*, H, and $C_1$. In the probabilistic framework, H corresponds to the law P(Gx/Fx) = 1, but now the target is the true probability distribution P* over the cells $Q_1, \ldots, Q_4$, and H is a disjunction of probabilistic constituents with probability 0 for $Q_2$. As the number of black swans is small in comparison to white swans, the true probabilistic law is something like P(Gx/Fx) = 0.95. It follows, for any reasonable distance measure, that H has a relatively high degree of truthlikeness, and in any case higher than that of the law P(Gx/Fx) = 0.5. But if the cognitive interest of the investigator is to know both the nomic and actual features of birds, so that the target is the conjunction P* & C*, then H's overall truthlikeness is reduced, since it mistakenly excludes the cell $Q_2$, while laws of the form P(Gx/Fx) = r < 1 allow for the actual existence of black swans.
25 $\mathrm{div}_{JS}$ is absolutely continuous, since in (12) $(p_i + q_i)/2 = 0$ implies $p_i = 0$ and $q_i = 0$.
26 See formula (9.21) with b = b′ = 1 and q = 4 in Niiniluoto (1987), p. 338.
27 See formula (6.88) with |I| = 24 and av(*,B) = ½ in Niiniluoto (1987), p. 229.
Example 2. Already Example 1 illustrates the fact that the comparison of ordinary, nomic and probabilistic constituents is a complicated matter, as they involve different targets. For example, a probabilistic constituent P is equivalent to a single nomic constituent B only in the special case where just one cell $Q_i$ is physically possible and, hence, has probability 1. In other cases, the true nomic constituent B* is an infinite disjunction of probabilistic constituents, and the target-sensitivity does not allow a direct comparison of the degrees of truthlikeness of these different types of hypotheses. In particular, the atomistic nomic constituent, which states the possibility of all Q-predicates, is the disjunction of all pure probabilistic laws. This means that there is no connection between divergence and Clifford-distance for pure probabilistic laws. To see this, assume that $P_1$ and $P_2$ are two different laws, and $B_1$ and $B_2$ are the nomic constituents entailed by $P_1$ and $P_2$. If $P_1$ and $P_2$ are pure laws, then $CT_1 = CT_2 = \mathbf{Q}$ and $CT_1 \,\Delta\, CT_2 = \emptyset$, so that $\mathrm{div}(P_1, P_2) > 0$ but the Clifford-distance $\Delta_C(B_1, B_2) = 0$.²⁸
But some simple comparisons can be made for the special case of uniform mixed laws. Thus, suppose $C_1$ and $C_2$ are monadic nomic constituents in a language with K Q-predicates with $|CT_1 - CT_2| = A$, $|CT_2 - CT_1| = B$, and $|CT_1 \cap CT_2| = D$, so that the Clifford distance $\Delta_C$ between $C_1$ and $C_2$ is (A + B)/K (see (3)). Let $P_1$ and $P_2$ be probabilistic constituents which allocate probability uniformly to $CT_1$ and $CT_2$ (1/c and 1/c′, respectively), where c′ ≥ c. Now D = c − A = c′ − B. Then the Manhattan distance satisfies
$$\Delta_1(P_1, P_2) = A/c + B/c' + D\,(1/c - 1/c').$$
If c = c′, this value equals $K \Delta_C(C_1, C_2)/c$. For the squared Euclidean distance with c = c′ we have
$$\Delta_2(P_1, P_2) = (A + B)/c^2 = K \Delta_C(C_1, C_2)/c^2.$$
A similar connection to the Clifford measure holds for the Jensen-Shannon divergence with c = c′:
$$\mathrm{div}_{JS}(P_1, P_2) = (A + B) \log 2 / 2c = K \Delta_C(C_1, C_2) \log 2 / 2c.$$
However, such connections fail for non-uniform laws. For example, if nomic constituents $B_1$ and $B_2$ are otherwise almost equal, but $B_1$ makes correct possibility claims about cells $Q_i$ with high true probability $p_i^*$ while $B_2$ makes such claims about cells $Q_j$ with low probability $p_j^*$, then it may happen that the truthlikeness ordering is reversed when the target changes from B* to P*: $\Delta_C(B_1, B*) > \Delta_C(B_2, B*)$, but $d(B_1, P*) < d(B_2, P*)$.²⁹
28 Fig. 2 in Sect. 5 shows that symmetric difference still has an interesting connection to the distance between probability densities.
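The connections between uniform mixed laws and the Clifford measure in Example 2 can be checked numerically for the case c = c′. In the sketch below a constituent is encoded as a set of cell indices, and the expected values (A + B)/c, (A + B)/c² and (A + B)/2c are derived here from the definitions of the three measures with binary logarithms; the helper names and the particular sets are hypothetical.

```python
import math


def uniform_law(cells, K):
    """Probabilistic constituent that spreads probability evenly over `cells`."""
    return [1.0 / len(cells) if i in cells else 0.0 for i in range(K)]


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence with binary logarithms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def part(x):
        return sum(x_i * math.log2(x_i / m_i) for x_i, m_i in zip(x, m) if x_i > 0)
    return 0.5 * part(p) + 0.5 * part(q)


K = 6
ct_1, ct_2 = {0, 1, 2}, {1, 2, 3}  # A = B = 1, D = 2, c = c' = 3
A, B, c = len(ct_1 - ct_2), len(ct_2 - ct_1), len(ct_1)
clifford = len(ct_1 ^ ct_2) / K

p_1, p_2 = uniform_law(ct_1, K), uniform_law(ct_2, K)
d_manhattan = manhattan(p_1, p_2)
d_squared = squared_euclidean(p_1, p_2)
d_js = js_divergence(p_1, p_2)
```

All three values are simple multiples of the number of disagreements A + B, which is why they track the Clifford measure in the uniform case with equal support sizes.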

Fig. 2 Distance between probability densities f and g
29 Festa (2007) has proposed a way of measuring the distance between a monadic generalization and "the statistical truth" (i.e., true probabilistic constituent). The idea, roughly speaking, is to divide Q-predicates into "statistically common" and "rare" ones, and then demand that a truthlike generalization should make true existential claims about common predicates and false exclusion claims only about rare predicates.

Example 3. If p and q are disjoint mixed probabilistic laws (i.e., $CT_p \cap CT_q = \emptyset$), then
$$\Delta_1(p, q) = 2 \quad \text{and} \quad \mathrm{div}_{JS}(p, q) = \log 2,$$
which equals 1 with binary logarithms.
Example 4. The Poisson distribution p(i), i = 0, 1, 2, …, for a randomly occurring rare event with a constant mean λ is defined by
$$p(i) = e^{-\lambda} \lambda^i / i!.$$
The KL-divergence between two Poisson distributions $p_\lambda$ and $p_{\lambda'}$ with rates λ and λ′ (where λ′ > λ) is
$$\mathrm{div}(p_\lambda, p_{\lambda'}) = \lambda \log (\lambda / \lambda') + (\lambda' - \lambda) \log e.$$
The proof uses the Taylor series $e^{\lambda} = \sum_{i \ge 0} \lambda^i / i!$.
Example 5. The Manhattan difference between two exponential laws (10) with decay rates λ and λ′ (where λ′ > λ) can be calculated directly from (10).
Example 6. Let p and q be deterministic laws of succession such that $p_{2/1} = 1$, $p_{1/1} = 0$ and $q_{2/1} = 0$, $q_{1/1} = 1$. Then the distance between p and q can be compared with their distances from an indeterministic law r with $r_{2/1} = r_{1/1} = 1/2$.
These examples illustrate that several alternative distance measures give fairly similar comparative results. For uniform nomic constituents their results are related to the Clifford-measure between ordinary constituents, but this relation is not straightforward for non-uniform constituents and disappears for pure probabilistic laws. When it comes to measuring the distance between particular probability values, geometrical and quadratic differences seem simple and useful, but for the distance between whole probability distributions or densities divergence is a convenient choice. The applicability of the Kullback-Leibler divergence is restricted to pure probabilistic laws, so that the Jensen-Shannon divergence turns out to be a valuable complement which can be applied to mixed laws which assign zero probabilities to some Q-predicates or sample points.
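For the Poisson case of Example 4, the divergence can be computed by summing the series directly and compared with the standard closed form λ ln(λ/λ′) + λ′ − λ. The sketch assumes natural logarithms and the direction div(p_λ, p_λ′); the function names, the truncation point of the series, and the two rates are hypothetical choices.

```python
import math


def poisson_pmf(i, lam):
    """Poisson probability of i events, computed in log space for stability."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def kl_poisson_numeric(lam, lam_prime, terms=100):
    """Truncated series for the divergence of Poisson(lam) from Poisson(lam_prime)."""
    total = 0.0
    for i in range(terms):
        p = poisson_pmf(i, lam)
        q = poisson_pmf(i, lam_prime)
        if p > 0:
            total += p * math.log(p / q)
    return total


def kl_poisson_closed(lam, lam_prime):
    """Closed form in nats: lam * ln(lam / lam_prime) + lam_prime - lam."""
    return lam * math.log(lam / lam_prime) + lam_prime - lam


lam, lam_prime = 2.0, 3.0  # hypothetical rates with lam_prime > lam
numeric = kl_poisson_numeric(lam, lam_prime)
closed = kl_poisson_closed(lam, lam_prime)
```

Dividing either value by ln 2 converts it to bits, in line with the binary logarithm mentioned in connection with formula (11).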

Vertical versus horizontal distance measures
An important debate about the explication of truthlikeness for monadic languages concerned the question whether the distance between constituents should reflect distances between Q-predicates. The Clifford-measure $\Delta_C(C_1, C*)$ counts all errors of $C_1$ about the Q-predicates equally: mistaken existence claims in $CT_1 - CT_*$ and mistaken non-existence claims in $CT_* - CT_1$ have the same weight 1/K in (3). It is natural to consider also situations where the cognitive seriousness of errors in a false constituent is treated differently, so that the distance from the truth is not simply the cardinality of the symmetric difference. Niiniluoto proposed in 1976 two modifications of the Clifford-measure on the basis of the ρ-measure between Q-predicates.³⁰ In the Jyväskylä measure $d_J$ false existence claims are weighted by their distance to the nearest non-empty cell, and false non-existence claims by their distance to the nearest empty cell, while in the weighted symmetric difference $d_w$ the first condition holds, but false non-existence claims are weighted by the minimum distance to a really non-empty cell. Then $\Delta_C$ and $d_w$ (unlike $d_J$) are symmetric, and $\Delta_C$ and $d_J$ (unlike $d_w$) are specular, where a specular distance (in the sense of Festa, 1993) satisfies the condition that the maximally distant constituent from $C_i$ is its photographic negative (i.e., all positive claims are replaced by negative ones and vice versa).³¹ If the ρ-measure reflects resemblances between predicates in a family (e.g. colors), then for the Jyväskylä measure the generalization "All ravens are grey" is closer to the truth than "All ravens are white".³² Tichý's (1976) general definition of truthlikeness implies for the monadic case a distance measure between constituents which differs from the Clifford-measure $\Delta_C$ and its modifications $d_J$ and $d_w$.³³ A linkage η between sets $CT_i$ and $CT_j$ is a surjective mapping from the larger of the sets to the smaller one. The cardinality card(η) of η is then $\max\{|CT_i|, |CT_j|\}$, and the breadth of η is the average distance between the linked predicates:
$$\mathrm{breadth}(\eta) = \frac{1}{\mathrm{card}(\eta)} \sum_{\langle Q_u, Q_v \rangle \in \eta} \rho_{uv}. \tag{13}$$
The distance $d_T(C_i, C_j)$ between constituents $C_i$ and $C_j$ is then defined as the breadth of the narrowest linkage between $CT_i$ and $CT_j$.
30 See Niiniluoto (1978). See also the refined treatment of truth approximation by Kuipers (2019).
31 See Niiniluoto (1987), p. 319.
32 See Niiniluoto (1987).
33 See also Oddie (1986), pp. 91-99, who applies this method to depth-d constituents.
Niiniluoto (1987) rejects Tichý's proposal for several reasons. The use of average in (13) leads to unintuitive examples, and constituents should not be treated as if they consisted only of existence claims. Indeed, $d_T$ is not specular and does not reflect the cognitive goal of finding true universal generalizations. The most fundamental objection is that $d_T(C_i, C*)$ can be derived as the minimum distance between two state descriptions s and s′, where s entails the uniformly distributed infinite structure description entailing $C_i$ and s′ entails C*. Thus, Tichý is not defining the distance between $C_i$ and C* in terms of the counted or weighted differences in claims about the Q-predicates (and thereby the ability of $C_i$ to express true generalizations), but rather in terms of putting an infinite number of individuals in their right places in a classification system.³⁴ The latter problem should be solved by choosing the target as the true state description and by representing constituents (in a non ad hoc way) as disjunctions of state descriptions.
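The linkage construction can be made concrete by brute force for small languages. In the sketch below a Q-predicate is encoded as a tuple of truth values for the primitive predicates, and the distance between two Q-predicates is assumed to be the proportion of primitive predicates on which they differ; this encoding, the distance, and the function names are illustrative assumptions, and the exhaustive search is only feasible for a handful of cells.

```python
from itertools import product


def rho(u, v):
    """Assumed distance between Q-predicates: share of differing primitive predicates."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def linkage_distance(ct_i, ct_j):
    """Breadth of the narrowest linkage between two non-empty sets of Q-predicates."""
    big, small = sorted((list(ct_i), list(ct_j)), key=len, reverse=True)
    best = None
    # Enumerate every mapping from the larger set to the smaller one.
    for images in product(small, repeat=len(big)):
        if set(images) != set(small):
            continue  # not surjective, hence not a linkage
        breadth = sum(rho(u, v) for u, v in zip(big, images)) / len(big)
        if best is None or breadth < best:
            best = breadth
    return best


# Q-predicates for two primitive predicates F and G.
q1, q2, q3, q4 = (True, True), (True, False), (False, True), (False, False)
ct_true = {q1, q2, q3, q4}
ct_1 = {q1, q3, q4}  # omits the cell F & ~G

d_linkage = linkage_distance(ct_1, ct_true)
```

In this example the narrowest linkage maps three cells to themselves and sends the omitted cell to a neighbour at distance 1/2, so the breadth is 1/8, whereas the Clifford distance between the same constituents is 1/4.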
In spite of this criticism, Tichý's basic idea is interesting, since the notion of a linkage resembles metrics defined for trees in terms of the number of transformations needed to change one tree to another. 35 A linkage takes seriously (but perhaps in a wrong way) the demand that "horizontal" distances between Q-predicates are relevant. The goal of distributing an infinite number of individuals to their right places could be viewed as analogous to the task of distributing a probability mass (of measure 1) to its right place. Indeed, a discrete probabilistic constituent (8) allocates the probabilities to a finite number of points in the space Q of Q-predicates, and a continuous probability density f on a state space $Q \subset \mathbb{R}^n$ does the corresponding assignment to an infinite number of points. This can be illustrated by the simple case where Q is a subset of the real line $\mathbb{R}$ and $f: Q \to \mathbb{R}^+$. If we denote by $D_f$ the region between the curve $f(x) \ge 0$ and the real axis, i.e.,

$$D_f = \{\langle x, y \rangle : x \in Q,\ 0 \le y \le f(x)\},$$

then the density f gives the probability measure 1 to $D_f$. For two probability densities f and g, the symmetric difference $D_f \Delta D_g$ covers the region between the functions f(x) and g(x) (see Fig. 2). The Manhattan distance is simply the area of this region:

$$\Delta_1(f, g) = \int_Q |f(x) - g(x)| \, dx.$$

[…] p and q are maximal, and the distances $\Delta_1(p,q)$ and $\mathrm{div}_{JS}(p,q)$ have their maximal values quite independently of the location of p and q with respect to the space Q. A counterpart of this result for probability densities is the following observation: if $f_1$ and $f_2$ are geometrical distributions with the same shape but disjoint domains, then $\Delta_1(f_1,f_2)$, $\Delta_2(f_1,f_2)$, and $\mathrm{div}_{JS}(f_1,f_2)$ have their maximal values quite independently of the geometrical distance a between these densities (see Fig. 3). In fact, all distance measures surveyed by Cha (2007), which are applicable to mixed probabilistic laws, share this feature of verticality.
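The verticality of these measures can be checked numerically. The sketch below assumes the unnormalized Manhattan distance (the sum of absolute differences) and the Jensen-Shannon divergence with base-2 logarithms; the paper's own normalizations may differ, and the helper `point_mass` and the ten-point sample space are illustrative choices only.

```python
import math


def manhattan(p, q):
    """Sum of absolute (vertical) differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    """Average KL divergence of p and q from their midpoint mixture."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def point_mass(i, size=10):
    """Distribution that puts all probability on point i."""
    return [1.0 if k == i else 0.0 for k in range(size)]


p = point_mass(0)
near, far = point_mass(1), point_mass(9)

# Both pairs have disjoint supports, so both measures are at their maximum,
# whether the mass sits one step or nine steps away.
print(manhattan(p, near), manhattan(p, far))
print(jensen_shannon(p, near), jensen_shannon(p, far))
```

Neither measure distinguishes the nearby point mass from the distant one, which is the feature that the proposed horizontal measures are meant to correct.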
The observations above motivate the idea that one could try to find measures which in some way take into account the horizontal distances between probability distributions (in addition to their vertical ones). Then a modification of Tichý's linkages might be fruitful. The detailed development of this suggestion has to be left for another occasion, but a simple illustration of the idea can be given here. Consider again real-valued probability densities f which define regions $D_f$ in a subspace S of $\mathbb{R}^2$. Let $\beta: S \to S$ be an area-preserving function, so that β[A] has the same area as A for all subregions A of S. Thus, a suitable β maps $D_f$ onto $D_g$ by moving the whole probability mass from $D_f$ to $D_g$. 36 The length of the vector $(\langle x,y \rangle, \beta(x,y))$ is defined by the metric of S, and the breadth of β is defined as the sum (integral) of all these lengths for points $\langle x,y \rangle$ in $D_f$. Then the distance between probabilities f and g is the breadth of the narrowest transformation β between $D_f$ and $D_g$. For example, in Fig. 3 the mapping $\beta(x, y) = (x - a, y)$, i.e., a linear shift to the left, 37 gives a linkage between $f_2$ and $f_1$ whose breadth is a, since every point of $D_{f_2}$ is moved by the distance a and the area of $D_{f_2}$ is one:

$$\iint_{D_{f_2}} a \, dx \, dy = a.$$

For probabilities on the discrete sample space Q, which in effect define columns on the points of Q with the total length one, the corresponding idea is to measure the distance between p and q by looking for the shortest length-preserving transformation between p and q. Such a transformation divides the columns of q into pieces and moves them in order to reach a fit with p. If a part of a column $q_i$ is moved to $Q_j$, then the length of this part is multiplied by the distance $\rho(Q_i, Q_j)$. For example, let Q = {0, 1, 2}, ρ(0, 1) = ρ(1, 2) = 1/2, ρ(0, 2) = 1. Then p and q have the maximal distance 1, if p gives all probability to 0 and q to 2. If […], then the distance between p and q is […]. But if […], then the distance between p and q is ¼. These measures, which combine vertical and horizontal aspects, are applicable to both pure and mixed probabilistic laws.

36 A transformation $\beta(x,y) = \langle f(x,y), g(x,y) \rangle$, where f and g are linear functions, is area-preserving, if the absolute value of its Jacobian determinant is one at every point. The Jacobian is composed of the partial derivatives of f and g:

$$\begin{vmatrix} \partial f(x,y)/\partial x & \partial f(x,y)/\partial y \\ \partial g(x,y)/\partial x & \partial g(x,y)/\partial y \end{vmatrix}$$

37 Note that the Jacobian determinant of this transformation is

$$\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix}$$

so that its value is 1 × 1 − 0 × 0 = 1.
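The discrete proposal can be sketched as a transport computation. The following is a minimal illustration, not the paper's own algorithm: it assumes that the sample points lie on a line with equal spacing `step` (so that ρ is additive, as in ρ(0, 2) = ρ(0, 1) + ρ(1, 2) above), in which case the cheapest way of moving mass is obtained by summing, over each gap between neighbouring points, the amount of mass that has to cross that gap. The function name `transport_distance` and the second pair of distributions are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import accumulate


def transport_distance(p, q, step=Fraction(1, 2)):
    """Cheapest cost of moving mass q onto p for equally spaced points.

    The mass that must cross the gap after point k equals the absolute
    difference of the cumulative probabilities up to k; each crossing
    costs `step`, the assumed distance between neighbouring points.
    """
    cum_p = list(accumulate(p))
    cum_q = list(accumulate(q))
    # The last cumulative value is 1 for both, so it is left out.
    crossing = sum(abs(a - b) for a, b in zip(cum_p[:-1], cum_q[:-1]))
    return crossing * step


# Maximal case from the text: all mass on 0 versus all mass on 2.
print(transport_distance([1, 0, 0], [0, 0, 1]))  # 1

# Illustrative pair: a uniform p against a q peaked in the middle.
third, sixth = Fraction(1, 3), Fraction(1, 6)
print(transport_distance([third] * 3, [sixth, Fraction(2, 3), sixth]))  # 1/6
```

In the second call a piece of length 1/6 is moved from the middle column to each neighbour over the distance 1/2, which gives 1/6 in total. For spaces where ρ is not additive along a line, a general optimal-transport solver would be needed instead of this shortcut.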

Estimating distance from probabilistic truth
According to the similarity approach to the epistemic problem of truthlikeness, unknown degrees of truthlikeness can be estimated by their expected value (4) using a posterior probability distribution over constituents. The same idea can be applied for the estimation of unknown degrees of divergence, which measure distance from the true probabilistic law.
Example 7 38 If p is the true probability of success in a binomial model and q is our guessed value, then the divergence of q from p in a single trial is […]

$p_1 = p_2 = p_3 = 1/3$, $q_1 = 1/6$, $q_2 = 2/3$, $q_3 = 1/6$.

Example 8 40 Let $x_1, \ldots, x_n$ be independent measurements of an unknown real-valued quantity θ with a normal distribution $N(\theta, \sigma^2)$. Then their mean value $y = (x_1 + \cdots + x_n)/n$ is normally distributed $N(\theta, \sigma^2/n)$. If the prior probability of θ is a sufficiently flat normal distribution, then the posterior distribution g(θ/y) of θ is approximately $N(y, \sigma^2/n)$, where y is the observed mean. If f(x/θ) is the true distribution, and $f(x/\theta_0)$ is our guess, then their estimated directed divergence is […] Here the mean y as the best estimate agrees with the result of the Bayes rule of minimizing expected quadratic loss.
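The estimation idea in these examples can be sketched numerically. The code below rests on assumptions that go beyond the text: the directed divergence is taken to be the Kullback-Leibler divergence with natural logarithms, directed from the true distribution to the guess; for two normal distributions with the same variance this divergence is $(\theta - \theta_0)^2 / 2\sigma^2$; and its expected value under the posterior $N(y, \sigma^2/n)$ is then $[(y - \theta_0)^2 + \sigma^2/n] / 2\sigma^2$. All function names are hypothetical.

```python
import math


def bernoulli_divergence(p, q):
    """Assumed KL divergence of a guess q (0 < q < 1) from the true p."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # terms with zero true probability contribute nothing
            total += a * math.log(a / b)
    return total


def normal_divergence(theta, theta0, sigma):
    """KL divergence between N(theta, sigma^2) and N(theta0, sigma^2)."""
    return (theta - theta0) ** 2 / (2 * sigma ** 2)


def expected_normal_divergence(theta0, y, sigma, n):
    """Posterior expectation of normal_divergence when theta ~ N(y, sigma^2/n).

    Uses E[(theta - theta0)^2] = (y - theta0)^2 + sigma^2 / n.
    """
    return ((y - theta0) ** 2 + sigma ** 2 / n) / (2 * sigma ** 2)


# With observed mean y = 2.0, the guess with the least expected divergence
# on this grid is the observed mean itself.
guesses = [1.0, 1.5, 2.0, 2.5, 3.0]
best = min(guesses, key=lambda g: expected_normal_divergence(g, 2.0, 1.0, 4))
print(best)
```

Under these assumptions the expected divergence is a quadratic function of the guess with its minimum at the observed mean, which is the sense in which the estimate agrees with minimizing expected quadratic loss.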

Conclusion
We have seen in this paper that the basic idea of the similarity approach to truthlikeness can be extended from qualitative and quantitative first-order languages to cases where probabilistic statements (and their disjunctions) are compared with probabilistic targets. Sections 2 and 3 show how one can naturally proceed from universal and deterministic laws to probabilistic laws. Section 4 argues that the Kullback-Leibler divergence has to be supplemented by the Jensen-Shannon divergence as a measure between mixed probabilistic laws, i.e., laws which assign zero probabilities to some sample points and thereby entail some universal laws. Section 5 formulates a research program for studying a new class of measures which account for the horizontal differences between probability densities, based on distances between sample points. In this way the theory of probabilistic truth approximation not only borrows tools from probability calculus but may also suggest novel kinds of problems for mathematicians. Finally, Sect. 6 gives examples to show that the method of estimating degrees of legisimilitude by their expected value can be generalized from the case of deterministic laws to probabilistic laws.