Probability Propagation in Generalized Inference Forms

Probabilistic inference forms lead from point probabilities of the premises to interval probabilities of the conclusion. The probabilistic version of Modus Ponens, for example, licenses the inference from P(A) = α and P(B|A) = β to P(B) ∈ [αβ, αβ + 1 − α]. We study generalized inference forms with three or more premises. The generalized Modus Ponens, for example, leads from P(A_1) = α_1, ..., P(A_n) = α_n and P(B|A_1 ∧ ⋯ ∧ A_n) = β to an according interval for P(B).
We present the probability intervals for the conclusions of the generalized versions of Cut, Cautious Monotonicity, Modus Tollens, Bayes' Theorem, and some System O rules. Recently, Gilio has shown that generalized inference forms "degrade": more premises lead to less precise conclusions, i.e., to wider probability intervals of the conclusion. We also study Adams' probability preservation properties in generalized inference forms. Special attention is devoted to zero probabilities of the conditioning events. These zero probabilities often lead to different intervals in the coherence and the Kolmogorov approach.


Introduction
While logic studies the propagation of truth values from premises to conclusions, probability logic studies the propagation of probabilities from premises to conclusions. In probability logic, Modus Ponens, for example, has the form shown on the left-hand side of Table 1; the right-hand side shows the generalized probabilistic Modus Ponens. In probability logic a conditional A ⇒ B is represented by the "conditional event" B|A. The probabilities of the premises are point probabilities. This assessment is assumed to be coherent. Usually, the inferred probability of the conclusion is an interval probability.
Table 1 (left) shows probabilistic Modus Ponens: from the premises P(E) = α and P(H|E) = β infer P(H) ∈ [δ̲, δ̄]. Table 1 (right) shows its generalization: from P(E_1) = α_1, ..., P(E_n) = α_n and P(H|E_1 ∧ ⋯ ∧ E_n) = β_n infer P(H) ∈ [δ̲_n, δ̄_n]. Below we study, for different generalized inference forms, the behavior of the interval [δ̲_n, δ̄_n] for an increasing number n of premises. We review results recently obtained for generalized probabilistic inference forms [6,10,14,15]. In these inference forms a degradation is observed: the width of the probability interval of the conclusion increases as the number n of premises increases. Thus, more premises lead to less precise conclusions. Figure 1 shows a numerical example for the degradation of Modus Ponens. In most inference forms even an "ultimate" degradation occurs: already after the addition of a small number of premises, the interval of the conclusion becomes the non-informative interval [0, 1]. This is a consequence of the fact that already for small n the lower bound of the conjunction probability P(E_1 ∧ ⋯ ∧ E_n) may be zero.
Because the lower bound of the conjunction probability becomes zero even for a relatively small number of conjuncts, the conditioning events may have zero probabilities. In the Kolmogorov approach, however, conditional probability is undefined in this case. The Kolmogorov approach is therefore not appropriate for investigating generalized inference forms. The case where the conditioning event has zero probability can, however, be treated in the coherence approach of de Finetti [4]. As a consequence, the Kolmogorov and the coherence approach lead to different interval probabilities for the conclusion of generalized inference forms.
As already mentioned, in many generalized inference forms the interval for the conclusion gets wider as the number of premises increases, and the interval [0, 1] is obtained after a certain number of premises is added. In probabilistically valid inference forms, however, the probability of the premises is preserved to the conclusion. Are generalized inference forms consequently probabilistically invalid? Different inference forms preserve the probability of their premises to their conclusions to different degrees. Adams distinguished four preservation properties [1]. Each of these preservation properties determines a consequence operation. An inference form is valid with respect to such a consequence operation if and only if it satisfies the corresponding preservation property. We can establish whether an inference form satisfies a preservation property by considering the lower probability of its conclusion. Well-known examples are System P [11], which is associated with probability-one preservation, and System O [8,9], which is closely connected with minimum probability preservation. Modus Ponens, for instance, is probability-one preserving and consequently System P valid. This can immediately be seen by considering the lower bound of the interval of the conclusion of Modus Ponens: if P(A) = α = 1 and P(B|A) = β = 1, then P(B) ≥ αβ = 1. It is important to note that, since they yield different intervals for the conclusion of inference forms, the Kolmogorov and the coherence approach validate different inference forms.
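The Modus Ponens interval [αβ, αβ + 1 − α] and its probability-one preservation can be checked elementarily: the premises fix the probabilities of the constituents A ∧ B and A ∧ ¬B, while the mass 1 − α on ¬A may be split freely between ¬A ∧ B and ¬A ∧ ¬B. A sketch (function names are ours):

```python
def modus_ponens_interval(alpha, beta):
    """Interval for P(B) given P(A) = alpha and P(B|A) = beta."""
    return (alpha * beta, alpha * beta + 1 - alpha)

def attainable_p_b(alpha, beta, t):
    """P(B) when mass t in [0, 1 - alpha] is placed on the constituent ¬A ∧ B."""
    assert 0.0 <= t <= 1.0 - alpha
    return alpha * beta + t

alpha, beta = 0.8, 0.9
lo, hi = modus_ponens_interval(alpha, beta)
# every admissible split of the mass on ¬A yields a value inside the interval
for k in range(11):
    p_b = attainable_p_b(alpha, beta, (1 - alpha) * k / 10)
    assert lo - 1e-12 <= p_b <= hi + 1e-12

# probability-one preservation: with alpha = beta = 1 the interval is [1, 1]
assert modus_ponens_interval(1.0, 1.0) == (1.0, 1.0)
```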

Coherent Conditional Probability
For the treatment of conditioning events with zero probabilities in generalized inference forms, we employ the coherence approach to probability theory [3,4]. While in the Kolmogorov approach conditional probability is defined as a ratio of two (unconditional) probabilities, it is a primitive concept in the coherence approach. Let L be a Boolean algebra (i.e., L is closed under ¬, ∧, ∨), let |= denote the classical consequence operation, and let Ω denote the sure event.
There are several advantages of the coherence approach. First, in the Kolmogorov approach, P(A|B) is defined by the ratio P(A ∧ B)/P(B). Knowledge of the probabilities P(A ∧ B) and P(B), however, is not required to assess conditional probabilities in the coherence approach [3]. Second, while in the Kolmogorov approach P(A|B) is undefined if P(B) = 0, in the coherence approach conditioning on (consistent) events with zero probability is possible: P(·|B) is a one-place probability function even if P(B) = 0. As a consequence, for instance, probability one can be updated in the light of events with zero probability, i.e., it is not necessarily the case that P(A|B) = 1 if P(A) = 1 [3].
The interval of the coherent probability values for the conclusion of an inference form can be determined by solving sequences of linear systems. This is the content of Theorem 5 below, which provides an alternative characterization of coherence [3, p. 81]. Let P(E_1|H_1), ..., P(E_n|H_n) be a probability assessment. If H_i = Ω, then we write P(E_i) instead of P(E_i|H_i).
A possible outcome or a constituent is a logically consistent conjunction of the form ±E_1 ∧ ⋯ ∧ ±E_n ∧ ±H_1 ∧ ⋯ ∧ ±H_n, where ±A ∈ {A, ¬A} for all events A. If the 2n events are logically independent, then there are 2^{2n} constituents C_1, ..., C_{2^{2n}}. The probability of an event E is the sum of the probabilities of the constituents C_r verifying it, i.e., with C_r |= E. Table 2 shows our notation in the case of three events E, F, G.
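Enumerating the constituents is mechanical; a minimal sketch (the string representation of literals is ours):

```python
from itertools import product

def constituents(events):
    """All sign combinations ±E_1 ∧ ... ∧ ±E_m as tuples of literals."""
    return [tuple(e if sign else "¬" + e for e, sign in zip(events, signs))
            for signs in product((True, False), repeat=len(events))]

# For n = 2 conditional events E_i|H_i there are 2n = 4 events and,
# assuming logical independence, 2**(2*n) = 16 constituents.
cs = constituents(["E1", "H1", "E2", "H2"])
assert len(cs) == 16
assert ("E1", "H1", "E2", "H2") in cs      # the all-positive constituent
assert ("¬E1", "¬H1", "¬E2", "¬H2") in cs  # the all-negative constituent
```

In practice, logically impossible sign combinations (for events that are not logically independent) would still have to be filtered out.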

Theorem 5. (Coletti and Scozzafava [3, p. 81, Theorem 4]) An assessment
Let P be a coherent extension of the assessment P(E_1|H_1), ..., P(E_n|H_n). Then any given solution (x_r^α) of the system S_α can be interpreted as a coherent extension of the initial assessment.

Example 1. Consider for example Predictive Inference. If P(E ∧ F) = x_1 + x_2 > 0, then we obtain the lower (upper) bound for the predictive probability P(G|E ∧ F) by minimizing (maximizing) the objective function x_1/(x_1 + x_2) in the system S_0. Solving S_1 shows that, if P(E ∧ F) = 0, then P(G|E ∧ F) can attain any value in [0, 1]. Note that in the Kolmogorov approach no corresponding result is obtained, as in this case P(G|E ∧ F) is undefined.

Probability Intervals for Generalized Inference Forms
In this section, we collect results for probabilistic versions of important generalized inference forms [6,14]. We analyze these inference forms with respect to degradation. If some of the conditioning events have zero probability, we often obtain different intervals for the coherence and the Kolmogorov approach. In the coherence approach a proper treatment of this case is possible, so that the probability of the conclusion is always a closed interval. In the Kolmogorov approach, we obtain in many cases half-open intervals, open intervals, or no intervals at all.
For the remainder of the paper, we suppose that P is a coherent conditional probability.

Equality holds for lower bounds greater than zero if and only if
is 0. We shall soon see that these properties of the conjunction cause the degradation of many other inferences.

Cautious Monotonicity
The generalized version of the System P rule Cautious Monotonicity is given by the following theorem. Cautious Monotonicity degrades: as the number of premises increases, the width of the interval of the conclusion increases. Furthermore, if n ≥

Cut
The generalized version of the System P rule Cut is given by the following theorem. The interval of the conclusion strongly depends on the lower bound σ_n for the conjunction probability P(E_1 ∧ ⋯ ∧ E_n). Cut degrades: the width of the interval for P(H|E_0) increases as the number of premises increases. This follows from the facts that its width is 1 − σ_n and that σ_n is monotonically decreasing. Since this lower bound of the conjunction is zero if n ≥ α_1 + ⋯ + α_n + 1, the interval for P(H|E_0) is the unit interval if the number of premises is sufficiently high.

Bayes' Theorem
Suppose that the prior probability of a hypothesis, P(H) = δ, and the likelihoods of the data given the hypothesis H and given the alternative hypothesis ¬H, P(D|H) = β and P(D|¬H) = γ, are given. The posterior probability of the hypothesis H given the data D is obtained, if P(D) > 0, by Bayes' Theorem: P(H|D) = βδ/(βδ + γ(1 − δ)). The premises of the generalized Bayes' Theorem are P(H) = δ, P(E_1|H) = β_1, ..., P(E_n|H) = β_n, P(E_1|¬H) = γ_1, ..., P(E_n|¬H) = γ_n. In inferential statistics it is often assumed that the E_i's are independent and identically distributed. We require neither conditional independence of the E_i's given H nor that P(E_i|H) = P(E_j|H) for i ≠ j. The conclusion of the generalized Bayes' Theorem is P(H|E_1 ∧ ⋯ ∧ E_n).
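The point-probability version of Bayes' Theorem above can be sketched directly (function name and example values are ours):

```python
def posterior(delta, beta, gamma):
    """Bayes' Theorem: P(H|D) from the prior P(H) = delta and the
    likelihoods P(D|H) = beta and P(D|not H) = gamma, for P(D) > 0."""
    p_d = beta * delta + gamma * (1 - delta)  # total probability of the data
    if p_d == 0:
        raise ValueError("P(D) = 0: posterior undefined in the Kolmogorov approach")
    return beta * delta / p_d

# Example: uniform prior, likelihood ratio 4:1 in favour of H
assert abs(posterior(0.5, 0.8, 0.2) - 0.8) < 1e-12
```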

Modus Tollens
Modus Tollens is the inference from {¬B, A ⇒ B} to the conclusion ¬A.
The result for probabilistic Modus Tollens with two premises within the Kolmogorov approach has been derived in [13]. Generalized Modus Tollens is given by the following theorem.

Remark 14. If P is a Kolmogorov probability, then the upper bound 1 is never correct: if P(¬H) = 1, then P(H) = 0 and consequently P(E_1 ∧ E_2 ∧ ⋯ ∧ E_n|H) is undefined. Within the coherence approach, an assessment of probability 1 to both premises of Modus Tollens is perfectly fine and leads to probability 1 of the conclusion. A Kolmogorov probability such that P(¬B) = 1 and P(B|A) = 1, however, does not exist: if P(¬B) = 1 and P(B|A) = 1, then P(A) = 0, and hence P(B|A) is undefined, a contradiction.
Modus Tollens is special: if α* + β > 1, then the interval of its conclusion does not depend on the number n of premises; if α* + β ≤ 1, it does. Modus Tollens does not degrade. Moreover, contrary to the other inference forms considered so far, the unit interval is not necessarily obtained if the number of premises is large.
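For the two-premise case derived in [13], the bound can be reproduced by bookkeeping over the constituents of A and B: from P(¬B) = α and P(B|A) = β with 0 < β < 1 it follows that P(A) ≤ min{(1 − α)/β, α/(1 − β)}, and hence P(¬A) ≥ max{(α + β − 1)/β, (1 − α − β)/(1 − β)}. A sketch assuming this form of the bound (function name is ours):

```python
def modus_tollens_lower_bound(alpha, beta):
    """Lower bound for P(not A) given P(not B) = alpha, P(B|A) = beta, 0 < beta < 1.

    P(A and B) = beta * P(A) <= P(B) = 1 - alpha, and
    P(A and not B) = (1 - beta) * P(A) <= P(not B) = alpha, hence
    P(A) <= min((1 - alpha) / beta, alpha / (1 - beta))."""
    p_a_max = min((1 - alpha) / beta, alpha / (1 - beta))
    return max(0.0, 1.0 - p_a_max)

# alpha + beta > 1: the first term is active, bound (alpha + beta - 1)/beta
assert abs(modus_tollens_lower_bound(0.9, 0.9) - 8.0 / 9.0) < 1e-9
# alpha + beta <= 1: the second term can still be informative
assert abs(modus_tollens_lower_bound(0.1, 0.5) - 0.8) < 1e-9
```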

Exclusive-Or.
System O is weaker than System P [8,9]. It contains weaker forms of the rules And and Or, namely Weak-And (Wand) and Weak-Or (Wor). Within System O, Wor is equivalent to the rule Exclusive-Or (Xor).
Remark 16. In the Kolmogorov approach, if α_i ≠ α_j for some i, j, we obtain the open interval (min_i{α_i}, max_i{α_i}). In this case we cannot set P(E_i|E_1 ∨ ⋯ ∨ E_n) = 1 for some i, because then P(E_j) = 0 for all j ≠ i and consequently P(H|E_j) is undefined.
Xor does degrade. However, the interval [0, 1] is not necessarily obtained after addition of a certain number of premises.

Probabilistic Validity of Generalized Inference Forms
The key question of this section is whether a certain generalized inference form satisfies one of the probability preservation properties below. The question can be answered by considering the lower bound of the intervals obtained in Section 3. The Kolmogorov approach and the coherence approach often yield different lower bounds. As a consequence, an inference form may satisfy a preservation property relative to one of the approaches while it does not satisfy it with respect to the other approach.

1. Certainty-preservation
If the premises of an inference form have probability 1, then its conclusion has probability 1.

2. High probability-preservation
If the premises are highly probable, then the conclusion is highly probable, i.e., for all δ > 0 there exists ε > 0 such that: if for every premise A it holds that P(A) ≥ 1 − ε, then for the conclusion C it holds that P(C) ≥ δ.

3. Positive probability-preservation
If the premises have positive probability, then the conclusion has positive probability.

4. Minimum probability-preservation
The probability of the conclusion is at least as high as the minimum of the probabilities of the premises. Or equivalently: For all thresholds r: If the probability of each premise is at least r, then the probability of the conclusion is at least r.
Modus Ponens, for example, is not minimum probability preserving: in general, it is not the case that αβ ≥ min{α, β}.
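The failure of minimum preservation for Modus Ponens is purely arithmetical: for α, β < 1 we have αβ < min{α, β}. A one-line check:

```python
# Modus Ponens with P(A) = alpha, P(B|A) = beta has lower bound
# alpha * beta for P(B); for alpha, beta < 1 this is below min(alpha, beta).
alpha, beta = 0.9, 0.9
lower_bound = alpha * beta
assert lower_bound < min(alpha, beta)  # 0.81 < 0.9: minimum not preserved
```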

Certainty-Preservation and High Probability-Preservation
It is important to note that, while in the Kolmogorov approach high probability-preservation and certainty-preservation differ, they are, given the assumption of p-consistent premises, equivalent in the coherence approach [5,7]. We call a set of premises {A_1, ..., A_n} p-consistent iff the assessment P(A_1) = ⋯ = P(A_n) = 1 is coherent.
Theorem 17. Suppose that {A_1, ..., A_n} is p-consistent. Then the inference from {A_1, ..., A_n} to C is certainty preserving in the coherence approach iff it is high probability preserving.
Remark 18. In the coherence approach an inference form has been called System P valid iff its premises are p-consistent and it is high probability preserving [5]. Contrary to other approaches, System P validity therefore requires p-consistent premises. We mention three such approaches. Adams [1] works with the default assumption: If P (A) = 0, then P (B|A) = 1 for all B. Hawthorne uses Popper functions. With respect to Popper functions certainty and high probability-preservation are, even without the assumption of p-consistent premises, equivalent [8]. Hawthorne and Makinson [9] employ Kolmogorov probability functions. In Section 4.4, we discuss the inference form Weak-And. It is System P valid with respect to these approaches, but not with respect to the coherence approach.
The inference from B to A ⇒ B is, for example, relative to the Kolmogorov approach certainty preserving but not high probability preserving. In the coherence approach this inference form is not high probability preserving and therefore, because {B} is p-consistent, not certainty preserving.
The inference forms of Section 3 are certainty preserving relative to the coherence approach. This is immediately obtained by considering the lower bound of their conclusion. Consequently, if their premises are p-consistent, these inference forms are already known to be high probability preserving in the coherence approach.
To show that an inference form is high probability preserving with respect to the coherence approach, we can alternatively determine, for every probability δ of the conclusion, a "high" probability 1 − ε for the premises such that this probability assessment guarantees that the probability of the conclusion is at least δ. A suitable ε can be determined by considering the intervals given in Section 3. Consider, for example, Modus Tollens. Let δ > 0. In order that P(C) ≥ δ, the lower bound of P(C), i.e., 1 − (1 − α*)/β, may not be less than δ. Setting α* = β = 1 − ε and solving for ε, we obtain 1 − ε ≥ 1/(2 − δ). A suitable ε for the other inference forms can be determined by the same method. We have

Theorem 19. Let P be a coherent conditional probability. All inference forms of Section 3 with p-consistent premises are certainty preserving and (consequently) high probability preserving.
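The computation for Modus Tollens can be checked numerically: assessing both premises with probability 1 − ε = 1/(2 − δ) makes the lower bound 1 − (1 − α*)/β of the conclusion exactly δ. A sketch (function name is ours):

```python
def mt_conclusion_lower_bound(alpha_star, beta):
    """Lower bound 1 - (1 - alpha*)/beta of P(C) for Modus Tollens."""
    return 1.0 - (1.0 - alpha_star) / beta

for delta in [0.5, 0.9, 0.99]:
    premise = 1.0 / (2.0 - delta)  # the required premise probability 1 - eps
    # with both premises assessed at 1 - eps the conclusion reaches exactly delta
    assert abs(mt_conclusion_lower_bound(premise, premise) - delta) < 1e-12
```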
Remark 20. Although generalized inference forms remain high probability preserving, degradation has a striking consequence. In order to obtain that the probability of the conclusion is at least δ, the admissible ε clearly decreases with increasing n. Since the lower probability of the conclusion decreases as the number of premises increases, a higher probability of the premises is necessary with increasing n to obtain a high probability of the conclusion.

Positive Probability-Preservation
For n ≥ 2 premises, high probability preservation differs significantly from positive probability preservation. While all of the generalized inference forms considered in Section 3 are high probability preserving, none of them, with the exception of Xor, is positive probability preserving. As already pointed out, in the case of And, Cautious Monotonicity, Cut, and Bayes' Theorem the lower bound of the conclusion is zero if the number of premises n is sufficiently high.
Moreover, in contrast to their generalizations, some of the inference forms are positive probability preserving. Cut and Bayes' Theorem are positive probability preserving for n = 1. If the sum of the probabilities of its two premises is different from one, then Modus Tollens is also positive probability preserving.
Theorem 21. If P is a coherent conditional probability, then the generalizations of And, Cautious Monotonicity, Cut, Modus Tollens and Bayes' Theorem are not positive probability preserving. Generalized Xor is positive probability preserving.
The Kolmogorov and the coherence approach validate different inference forms. In the Kolmogorov approach generalized Cut, for instance, is positive probability preserving, while this is not the case in the coherence approach.

Minimum Probability-Preservation
Positive probability preservation and minimum probability preservation differ. Contrary to positive probability preservation, Cut with two premises, for example, is not minimum probability preserving. System O is closely connected with minimum probability preservation (for a description of System O see [8,9]). All its inference forms are minimum probability preserving. The converse, however, is not true.
The generalization of Xor is minimum probability preserving.
Theorem 22. Let P be a coherent or a Kolmogorov conditional probability. Then generalized Xor is minimum probability preserving.
The System O rule Weak-And (Wand) is given by the following inference form. Wand is central to System O and minimum preserving in the Kolmogorov approach. However, a positive probability assessment to A ∧ ¬B ⇒ B is incoherent. Hence, from the point of view of coherence, System O is not satisfactory.
Remark 24. The premises of Wand are not p-consistent. As a consequence, Wand is not System P valid in the coherence approach (compare Remark 18). In other approaches, p-consistency of the premises is not required for System P validity. Consequently, since And is System P valid and Wand is a special case of And, Wand is System P valid in these approaches.
If P(B|A ∧ ¬B) > 0, we can conclude in the Kolmogorov framework that P(A ∧ ¬B) = 0 and hence that P(B|A ∧ ¬B) is undefined, a contradiction. Hence, there is no Kolmogorov probability such that P(B|A ∧ ¬B) > 0.

Conclusions
We have seen that Cautious Monotonicity, Cut, and Exclusive-Or clearly degrade, and that Bayes' Theorem (with some exceptions) and Modus Tollens do not degrade. Moreover, in all the inference forms considered, with the exception of Modus Tollens and Exclusive-Or, the unit interval is obtained even with a "small" number of premises. Narrow intervals may be considered better than wide intervals; a more complete knowledge base may be considered better than a truncated one [12]. While in general the number of premises and the precision of the conclusion may conflict, in generalized inference forms they often must conflict.
Degradation does not conflict with the property of monotonicity, but its consequences for information seeking cannot be ignored. On the one hand, the principle of total evidence leads to the selection of the most "recent" interval, based on the most specific information. This yields wide intervals, and in many cases even the non-informative interval [0, 1]. On the other hand, a take-the-best strategy leads to the selection of the tightest interval. The corresponding interval is based on the seemingly most "relevant" information with n = 1. Since all additional premises are discarded, it would be counterproductive to seek further information, because it would simply be useless.
Degradation is neither "good" nor "bad". Solving the conflict between precision and specificity requires counterbalancing (i) the width of an interval, (ii) the amount of information it is based upon, and (iii) the position of the interval. The choice depends on pragmatic conditions. The answer to the question of which interval should rationally be selected seems to lie outside the domain of probability theory.
It might be supposed that degradation disappears if further constraints are added to the premises. In many cases stochastic independence, for example, leads to point probabilities of the conclusions. Though often presupposed, independence may be a constraint that is too strong. Exchangeability is a related but much weaker assumption. We have shown that in many generalized inference forms exchangeability does not prevent degradation [15].
In general, degradation does not make generalized inference forms probabilistically invalid. Each of the inference forms considered in this contribution is high probability preserving. As already pointed out, however, the lower probability of the conclusion is often zero if the number of premises is large. Therefore, none of the inference forms, with the exception of Exclusive-Or, is positive probability preserving.