1 Introduction, context and motivation

In this paper we discuss probabilistic interpretations of the infinite epistemic regress problem. One locus classicus of the epistemic regress problem is Sextus Empiricus’ Outlines of Pyrrhonism, where five “Modes” are distinguished, i.e. five lines of reasoning that sceptics have availed themselves of as a safeguard against dogmatism. The second of these modes has to do with an infinite regress. Says Sextus: “The Mode based upon regress ad infinitum is that whereby we assert that the thing adduced as a proof of the matter proposed needs a further proof, and this again another, and so on ad infinitum, so that the consequence is suspension [of assent], as we possess no starting-point for our argument.” Numerous formulations of this epistemic regress problem can be found in the literature; see van Woudenberg and Meester (2014) for a discussion, an overview, and references.

In Atkinson and Peijnenburg (2008, 2010), Peijnenburg (2007), and in the more recent book (Peijnenburg and Atkinson 2017), Peijnenburg and Atkinson interpret the epistemic regress problem probabilistically, as follows. Consider a proposition \(E_0\) and suppose that we are interested in showing whether or not belief in \(E_0\) is justified. Suppose further that there is a proposition \(E_1\) that makes \(E_0\) probable. Imagine that \(E_1\) is in turn made probable by \(E_2\), and that \(E_2\) is made probable by \(E_3\), and so on, ad infinitum. Atkinson and Peijnenburg argue that under these conditions, with a suitable interpretation of the “make probable” relation, the degree of justified belief in \(E_0\) is uniquely defined and can even be determined.

More precisely, let a be the probability of proposition \(E_n\) given that proposition \(E_{n+1}\) is true, and b be the probability of \(E_n\) given that \(E_{n+1}\) is false. In formulas, this reads

$$\begin{aligned} P(E_n \mid E_{n+1}) =a \text{ and } P(E_n \mid E_{n+1}^c)=b, \end{aligned}$$
(1.1)

for \(n=0,1,\ldots \). For simplicity, we only discuss the case in which neither of these two conditional probabilities depends on n.

Although (1.1) does not determine the full joint probability distribution of the propositions \(E_0, E_1, \ldots \), Peijnenburg and Atkinson show that the conditions in (1.1) nevertheless imply that \(P(E_0)\) is uniquely determined and equals

$$\begin{aligned} P(E_0)=\frac{b}{1-a + b}. \end{aligned}$$
(1.2)

This is the “completion” of the infinite regress that Peijnenburg and Atkinson claim: despite the fact that there is an infinite chain of conditional probabilities, the unconditional probability of \(E_0\) is well defined.

We do not address here whether or not such a “completion” can be viewed as an answer to the infinite epistemic regress problem, and we refer to van Woudenberg and Meester (2014) for a critical discussion of this issue.

We note that Atkinson and Peijnenburg assume that epistemic probability is appropriately axiomatized by the usual Kolmogorov axioms of probability. This assumption is mostly implicit, but it is used explicitly in their computations when they apply the law of total probability

$$\begin{aligned} P(A) = P(B)P(A {\mid } B) + P(B^c)P(A {\mid } B^c), \end{aligned}$$
(1.3)

which is, within the Kolmogorov framework, valid for all propositions A and B.
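To see how (1.3) produces the value in (1.2), one can apply it with \(A=E_n\) and \(B=E_{n+1}\), which gives the recursion \(P(E_n) = a\,P(E_{n+1}) + b\,(1-P(E_{n+1}))\). The following minimal sketch iterates this recursion from an arbitrary starting value at a finite depth. The function name `regress_probability`, the depth 40, and the values of a and b are illustrative assumptions, not part of the original argument, and the sketch assumes \(|a-b|<1\) so that the influence of the starting value dies out.

```python
from fractions import Fraction


def regress_probability(a, b, depth, start):
    """Iterate P(E_n) = a*P(E_{n+1}) + b*(1 - P(E_{n+1})) `depth` times.

    `start` plays the role of an arbitrary value for P(E_depth);
    the returned value approximates P(E_0).
    """
    p = start
    for _ in range(depth):
        p = a * p + b * (1 - p)
    return p


# Illustrative values (assumed for this sketch).
a, b = Fraction(1, 3), Fraction(1, 2)
limit = b / (1 - a + b)  # the closed form (1.2)

for start in (Fraction(0), Fraction(1)):
    approx = regress_probability(a, b, 40, start)
    # The two starting values lead to (numerically) the same P(E_0).
    print(float(approx), float(limit))
```

The error after n steps is \((a-b)^n\) times the initial error, which is why the starting value becomes irrelevant in the limit.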

The context of Atkinson and Peijnenburg is that of an epistemic interpretation of probability. In the current paper, we understand epistemic probability as dealing with the extent to which one is justified in believing things, rather than with how things are. Obtaining different pieces of information about a certain quantity may lead to different epistemic probability statements.

Several authors have formulated axioms for epistemic probability different from the classical ones: notably Cohen (1977), Walley (1991) and Shafer (1976, 1981, 2008); see also Haenni (2009), Kerkvliet and Meester (2017) and Kerkvliet and Meester (2019). Especially the approach of Shafer has been rather influential. Shafer used a different axiomatization for epistemic probability and ended up with an object which he called a belief function. Belief functions generalize probabilities: every probability distribution is a belief function but not vice versa and as such, belief functions are more flexible than probability distributions. Shafer argued that belief functions better capture the essentials of epistemic probabilities than classical probability distributions. We refer to the recent paper (Kerkvliet and Meester 2017) for a review and for further arguments supporting this claim.

In this paper we argue that if one uses belief functions instead of classical (sometimes also called ‘Bayesian’) probabilities, then the infinite epistemic regress problem does not have a unique solution determined by the values of a and b in (1.1). We will also argue by examples that this complies with rational reasoning. This goes against the conclusions of Atkinson and Peijnenburg in their context of classical probabilities.

In the next section we briefly recall belief functions and explain their relevance for the interpretation of epistemic probability. After that, we give our examples of infinite regress situations which show that solutions are not unique, and we argue that this non-uniqueness is indeed rational and understandable. Finally, in Sect. 3.1, we shed some light on the possible solutions of an infinite regress given the conditional beliefs.

2 Belief functions and epistemic probability

Consider some finite outcome space \(\Omega \) of an experiment. A probability distribution is determined by the probabilities it assigns to each outcome in \(\Omega \). In other words, a probability distribution p on \(\Omega \) is nothing but a collection of non-negative numbers \(p(\omega )\) so that \(\sum _{\omega \in \Omega } p(\omega )=1\). If we receive the information that a certain experiment is described by p then this means that we have full probabilistic knowledge about the experiment in the sense that we know, for all possible outcomes \(\omega \), the probability that \(\omega \) occurs.

Shafer argued that one might be in the epistemic situation that such full knowledge about the experiment is not available. Consider an experiment in which a number between 1 and 10 (inclusive) is drawn. One may receive different pieces of information about this experiment. We may receive the information that the number is drawn uniformly at random. This corresponds to a classical situation in which each outcome has probability 1/10, and this situation can be described by classical probabilities.

However, one can also think of a situation in which, say, we only receive the information that the probability of an even number is 1/2, and that the probability of an odd number is 1/2 as well. This information cannot be readily described by a classical probability distribution on the elements of the outcome space \(\Omega \). It can, however, be described by a probability distribution m, say, on the subsets of \(\Omega \), namely by m defined by \(m(\{1,3,5,7,9\})= m(\{2,4,6,8,10\})=1/2\), and \(m(A)=0\) for all other subsets of the outcome space.

An extreme situation is that one has no information at all about the experiment, other than that the outcome is in \(\Omega \). If that really is all one knows, then this information translates into m defined by \(m(\Omega )=1\) and \(m(E)=0\) for all strict subsets E of \(\Omega \). This m puts all probability mass on the set \(\Omega \) itself and describes complete ignorance.

The probability distribution m on the subsets of \(\Omega \) summarizes the knowledge one has about an experiment, and it is called a basic belief assignment in the literature. To each such m corresponds a belief function defined as follows:

Definition 2.1

The function \(\mathrm{Bel}\), defined on subsets of \(\Omega \) by

$$\begin{aligned} \mathrm{Bel}(E) = \sum _{C \subseteq E} m(C) \end{aligned}$$
(2.1)

is called the belief function corresponding to the basic belief assignment m, and \(\mathrm{Bel}(E)\) is called the belief in E.

In words, the belief in an event E is the epistemic probability, based on one’s knowledge about the experiment, that the outcome is in E. It is conservative in the sense that for \(\mathrm{Bel}(E)\), we do not count sets that only overlap E without being contained in it. Note that \(\mathrm{Bel}\) is a probability distribution if and only if m concentrates on singletons in the set \(\Omega \).

We can see how this works in the examples above of choosing a number between 1 and 10. If \(m(\{1, 3, 5, 7, 9\}) = m(\{2, 4, 6, 8, 10\}) = 1/2\) then we have, for instance, \(\mathrm{Bel}(\{1, 3, 5\})= \mathrm{Bel}(\{7, 9\}) =0\). These two sets are disjoint and their union is \(\{1,3,5,7,9\}\), yet \(\mathrm{Bel}(\{1,3,5,7,9\})=1/2\), so \(\mathrm{Bel}\) is not additive. When \(m(\Omega )=1\), the belief in any strict subset of \(\Omega \) is 0 but the belief in \(\Omega \) itself is 1, again showing non-additivity of belief functions.

As another basic example, we take \(\Omega =\{1,2,3\}\) and the basic belief assignment m given by \(m(\{1\}) = m(\{2\}) = 1/3\), \(m(\{3\})=1/6\), and \(m(\{1,3\})=1/6\). The corresponding belief function is then given by

$$\begin{array}{c|cccccccc} E & \emptyset & \{1\} & \{2\} & \{3\} & \{1,2\} & \{1,3\} & \{2,3\} & \{1,2,3\} \\ \hline \mathrm{Bel}(E) & 0 & 1/3 & 1/3 & 1/6 & 2/3 & 2/3 & 1/2 & 1 \end{array}$$

Indeed, this table is obtained by simply adding up the m-probabilities of the sets that are contained in the target sets in the first row.
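The computation behind the table can be sketched in a few lines. The function name `belief` and the dictionary representation of m (focal set mapped to its mass) are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import combinations


def belief(m, event):
    """Bel(E) as in (2.1): total mass of the focal sets contained in E."""
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


omega = frozenset({1, 2, 3})
# The basic belief assignment of the example on {1, 2, 3}.
m = {
    frozenset({1}): Fraction(1, 3),
    frozenset({2}): Fraction(1, 3),
    frozenset({3}): Fraction(1, 6),
    frozenset({1, 3}): Fraction(1, 6),
}

# Enumerate all subsets of omega, smallest first.
subsets = [frozenset(c)
           for r in range(len(omega) + 1)
           for c in combinations(sorted(omega), r)]
table = {s: belief(m, s) for s in subsets}

for s in subsets:
    print(sorted(s), table[s])
```

Note that the mass 1/6 on \(\{1,3\}\) contributes to \(\mathrm{Bel}(\{1,3\})\) and \(\mathrm{Bel}(\{1,2,3\})\) only, which is why \(\mathrm{Bel}(\{1,3\})\) exceeds \(\mathrm{Bel}(\{1\})+\mathrm{Bel}(\{3\})\).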

To be able to process new information, we need a notion of conditional belief. The literature contains a number of suggestions for a suitable definition of \(\mathrm{Bel}(E \mid H)\). In his original book Shafer (1976) suggested a notion based on the so-called Dempster rule of combination, but this proposal has received a lot of criticism, see e.g. Dubois and Prade (1992), Fagin and Halpern (1989, 1991), Kerkvliet and Meester (2019) and Pearl (1990). Instead, various authors (Dubois and Prade 1992; Fagin and Halpern 1989; Kerkvliet and Meester 2019) have arrived at another suggestion for conditional beliefs and this is the notion that we will use.

Definition 2.2

We define the conditional belief in E given H by

$$\begin{aligned} \mathrm{Bel}_{H}(E) := \frac{\mathrm{Bel}(E \cap H)}{\mathrm{Bel}(E \cap H) + 1- \mathrm{Bel}(E \cup H^c)}, \end{aligned}$$
(2.2)

whenever this is well defined.

The expression in (2.2) has a very intuitive rationale, as follows. If we were to repeat the experiment many times, then we can consider the relative frequency of E within the subsequence of outcomes in which H occurs. The expression in (2.2) turns out to be the minimal such relative frequency of E that we can be sure of, given the information contained in the basic belief assignment. Note that if m concentrates on singletons and hence \(\mathrm{Bel}\) is a probability distribution, then (2.2) reduces to the classical conditional probability of E given H, as it should. We refer to Kerkvliet and Meester (2019) for the details.
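A small sketch of (2.2) on the experiment of drawing a number between 1 and 10 may help. The helper names `belief` and `conditional_belief`, the choice of the event \(\{1,2,3,4\}\), and the convention of returning `None` when the denominator vanishes are assumptions of this illustration.

```python
from fractions import Fraction


def belief(m, event):
    """Bel(E): total mass of the focal sets contained in E."""
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


def conditional_belief(m, omega, event, given):
    """Bel_H(E) as in (2.2); None when the expression is not well defined."""
    num = belief(m, event & given)
    den = num + 1 - belief(m, event | (omega - given))
    return num / den if den != 0 else None


omega = frozenset(range(1, 11))
evens = frozenset({2, 4, 6, 8, 10})
odds = omega - evens

# Uniform draw: all mass on singletons, so Bel is a probability distribution.
uniform = {frozenset({w}): Fraction(1, 10) for w in omega}
# Only the parity information is available: mass 1/2 on each parity class.
parity = {odds: Fraction(1, 2), evens: Fraction(1, 2)}

low = frozenset({1, 2, 3, 4})
# With singleton masses, (2.2) agrees with the classical P(low | evens) = 2/5.
print(conditional_belief(uniform, omega, low, evens))
# With parity information only, nothing guarantees an outcome in `low`.
print(conditional_belief(parity, omega, low, evens))
```

In the second case the conditional belief is 0: the information in m is compatible with the even numbers always being 6, 8 or 10, so no positive relative frequency of `low` can be guaranteed.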

3 Belief functions and infinite regress

The most natural way to model an infinite regress is to consider the outcome space \(\Omega :=\{0,1\}^{\infty }\) and to set

$$\begin{aligned} E_i := \{ (\omega _0, \omega _1,\omega _2,\ldots ) \in \Omega \;:\; \omega _i = 1 \}. \end{aligned}$$
(3.1)

Now note that \(\Omega \) is an infinite space, and so far we have only defined belief functions on finite spaces. However, for our current purposes we need only two types of belief functions on infinite spaces, namely:

  1. Probability distributions;

  2. Belief functions defined as in Definition 2.1 with a basic belief assignment m which assigns positive mass to only finitely many sets.

For such belief functions, the definition for conditioning as given by Definition 2.2 can be inherited directly.

We next give some examples of basic belief assignments which show that the infinite regress problem has no unique solution when modeled with belief functions.

Example 3.1

Consider the following basic belief assignment on \(\Omega \):

$$\begin{aligned} m( E_0 \cap E_1^c \cap E_2 \cap E_3^c \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.2)
$$\begin{aligned} m( E_0^c \cap E_1 \cap E_2^c \cap E_3 \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.3)
$$\begin{aligned} m(E_0 \cap E_1 \cap E_2 \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.4)
$$\begin{aligned} m(\Omega )= & {} 1/4. \end{aligned}$$
(3.5)

In words, this basic belief assignment corresponds with the information that with probability 1/4, \(E_k\) is true precisely for all even values of k, with probability 1/4 \(E_k\) is true precisely for all odd values of k, with probability 1/4 all \(E_k\) are true, and finally with probability 1/4 one knows nothing at all.

Let us compute the relevant quantities for the corresponding belief function \(\mathrm{Bel}\). First of all, we have \(\mathrm{Bel}(E_0)=1/2\), since only the sets in (3.2) and (3.4) imply that \(E_0\) occurs. Furthermore,

$$\begin{aligned} \mathrm{Bel}(E_k \mid E_{k+1})= & {} \frac{\mathrm{Bel}(E_k \cap E_{k+1})}{\mathrm{Bel}(E_{k} \cap E_{k+1}) + 1- \mathrm{Bel}(E_k \cup E_{k+1}^c)} \\= & {} \frac{1/4}{1/4 + 1 - 1/2} = 1/3, \end{aligned}$$

as is easily checked by applying the definition of the belief function \(\mathrm{Bel}\) in terms of m. Also,

$$\begin{aligned} \mathrm{Bel}(E_k \mid E_{k+1}^c)= & {} \frac{\mathrm{Bel}(E_k \cap E_{k+1}^c)}{\mathrm{Bel}(E_k \cap E_{k+1}^c) + 1- \mathrm{Bel}(E_k \cup E_{k+1})} \\= & {} \frac{1/4}{1/4 + 1- 3/4} = 1/2. \end{aligned}$$

Hence in this example, \(a=1/3\) and \(b= 1/2\) so in a classical situation (1.2) tells us that the belief in \(E_0\) would be 3/7. However, our outcome is \(\mathrm{Bel}(E_0)=1/2\), so the conditional beliefs do not uniquely determine the belief in the truth of \(E_0\). \(\square \)
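The numbers in Example 3.1 can be checked mechanically. The sketch below works on a finite window of the first N coordinates instead of the infinite space; this truncation is an assumption of the illustration. It is harmless here because each event involved depends only on coordinates inside the window, so a focal set is contained in such an event exactly when its restriction to the window is. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 6  # window length (a truncation assumed for this sketch)
OMEGA = frozenset(product((0, 1), repeat=N))


def E(i):
    """Truncated version of E_i: sequences whose i-th coordinate is 1."""
    return frozenset(w for w in OMEGA if w[i] == 1)


def belief(m, event):
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


def conditional_belief(m, event, given):
    """Bel_H(E) as in (2.2), assuming the denominator is nonzero."""
    num = belief(m, event & given)
    return num / (num + 1 - belief(m, event | (OMEGA - given)))


quarter = Fraction(1, 4)
m = {
    frozenset({tuple(1 - i % 2 for i in range(N))}): quarter,  # (3.2): on at even times
    frozenset({tuple(i % 2 for i in range(N))}): quarter,      # (3.3): on at odd times
    frozenset({(1,) * N}): quarter,                            # (3.4): always on
    OMEGA: quarter,                                            # (3.5): no information
}

print(belief(m, E(0)))
for k in range(N - 1):
    a_k = conditional_belief(m, E(k), E(k + 1))
    b_k = conditional_belief(m, E(k), OMEGA - E(k + 1))
    print(k, a_k, b_k)
```

For every k in the window the two conditional beliefs come out as 1/3 and 1/2, while the belief in \(E_0\) is 1/2 rather than 3/7.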

It might be useful to point out that the outcomes of the (conditional) beliefs in Example 3.1 are the results of rational reasoning. We illustrate this reasoning in the form of a dialogue between T. and R. We think of \(E_i\) as the event that a certain lamp is on at time i.

  R: ‘Suppose I throw a fair tetrahedron whose sides are numbered 1, 2, 3 and 4. If I roll 1, I switch the lamp on at times \(0, 2, 4,\ldots \) and I switch it off at \(1, 3, 5,\ldots \). If I roll 2, I do the opposite: I switch the lamp off at \(0, 2, 4,\ldots \) and on at \(1, 3, 5,\ldots \). If I roll 3, I switch the lamp on at all times. Finally, if I roll 4 I do not give you any information at all about the lamp. To what degree would you believe that the lamp is on at time 0?’

  T: ‘Well, that’s easy. I know that the lamp is on at time 0 only if you roll 1 or 3. The lamp is off when you roll 2, and when you roll 4 I have no information, so rolling 4 contributes nothing to my confidence that the lamp is on. Hence my belief in the lamp being on at time 0 is 1/2.’

  R: ‘OK. But if I told you that the lamp is on at time \(k+1\), what would you believe about the lamp being on at time k? In other words, what is your conditional belief in the lamp being on at time k given that it is on at time \(k+1\)?’

  T: ‘Let’s be concrete and take \(k+1=3\), say. So you tell me the lamp is on at time 3. For one thing, I know you did not roll 1, since then the lamp would be off at time 3. So there are three options left. If you rolled 2 the lamp would be off at time 2, and if you rolled 3 it would be on. If you rolled 4 then it may happen that the lamp is on at time 3 but off at time 2; in fact, for all I know it may always be the case that if you roll 4 and the lamp is on at time 3, it is off at time 2. So when it comes to determining how frequently the lamp is on at time 2 given that it is on at time 3, the only thing I know for sure is that the relative frequency is at least 1/3. It could be higher, but I have no way of knowing that. So if you tell me the lamp is on at time 3, my belief that it is on at time 2 is 1/3.’

  R: ‘Fair enough, and I think this is true for all moments in time?’

  T: ‘Yes.’

  R: ‘And what if I tell you that the lamp is off at time 3? What do you believe about the status of the lamp at time 2?’

  T: ‘Then I know you rolled either 1 or 4. If you rolled 1 I know the lamp is on at time 2 and, for the same reason as above, I know nothing if you rolled 4. So, similarly to the reasoning above, this time my conditional belief in the lamp being on at time 2 is 1/2.’

We next present a second example.

Example 3.2

Consider the following basic belief assignment:

$$\begin{aligned} m( E_0 \cap E_1^c \cap E_2 \cap E_3^c \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.6)
$$\begin{aligned} m( E_0^c \cap E_1 \cap E_2^c \cap E_3 \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.7)
$$\begin{aligned} m(E_0^c \cap E_1^c \cap E_2^c \cap \cdots )= & {} 1/4, \end{aligned}$$
(3.8)
$$\begin{aligned} m(E_0 \cap E_2 \cap E_4 \cap \cdots )= & {} 1/4. \end{aligned}$$
(3.9)

For the corresponding belief function \(\mathrm{Bel}\) we have \(\mathrm{Bel}(E_0) =1/2\), coming from (3.6) and (3.9). Next we compute the conditional beliefs. No set with positive mass is contained in \(E_k \cap E_{k+1}\), so

$$\begin{aligned} \mathrm{Bel}(E_k \mid E_{k+1})=0, \end{aligned}$$
(3.10)

since \(\mathrm{Bel}(E_k \cap E_{k+1})=0\). Furthermore

$$\begin{aligned} \mathrm{Bel}(E_k \mid E_{k+1}^c)= & {} \frac{\mathrm{Bel}(E_k \cap E_{k+1}^c)}{\mathrm{Bel}(E_k \cap E_{k+1}^c) + 1- \mathrm{Bel}(E_k \cup E_{k+1})} \\= & {} \frac{1/4}{1/4 + 1- 3/4} = 1/2. \end{aligned}$$

Hence in this example we have \(a=0\) and \(b=1/2\). In the classical case this would lead to probability 1/3 (see (1.2)) that \(E_0\) is true, but we have obtained \(\mathrm{Bel}(E_0)=1/2.\)    \(\square \)
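Example 3.2 differs from Example 3.1 in that one focal set, the one in (3.9), leaves the odd coordinates unspecified. The following sketch checks the stated values on a finite window of N coordinates; the truncation (exact for events that only involve coordinates inside the window) and all names are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product

N = 6  # window length (assumed for this sketch)
OMEGA = frozenset(product((0, 1), repeat=N))


def E(i):
    return frozenset(w for w in OMEGA if w[i] == 1)


def belief(m, event):
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


def conditional_belief(m, event, given):
    """Bel_H(E) as in (2.2), assuming the denominator is nonzero."""
    num = belief(m, event & given)
    return num / (num + 1 - belief(m, event | (OMEGA - given)))


quarter = Fraction(1, 4)
# (3.9): E_0, E_2, E_4, ... hold; nothing is known about odd coordinates.
evens_on = frozenset(w for w in OMEGA if all(w[i] for i in range(0, N, 2)))
m = {
    frozenset({tuple(1 - i % 2 for i in range(N))}): quarter,  # (3.6)
    frozenset({tuple(i % 2 for i in range(N))}): quarter,      # (3.7)
    frozenset({(0,) * N}): quarter,                            # (3.8)
    evens_on: quarter,                                         # (3.9)
}

a_values = [conditional_belief(m, E(k), E(k + 1)) for k in range(N - 1)]
b_values = [conditional_belief(m, E(k), OMEGA - E(k + 1)) for k in range(N - 1)]
a, b = a_values[0], b_values[0]
classical = b / (1 - a + b)  # value that (1.2) would give
print(belief(m, E(0)), a_values, b_values, classical)
```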

One can argue in the same way as for the previous example that these numerical outcomes again follow from rational reasoning of an agent with the information corresponding to m. In the next subsection we will see that non-uniqueness is a generic property of infinite regresses, so that the examples we have seen so far are typical.

3.1 Bounds for the completion of an infinite regress

Since the completion of the infinite probabilistic regress is no longer unique when modeled with belief functions, it is natural to ask how much freedom the theory of belief functions actually gives under the assumption that the conditional beliefs of \(E_k\) given \(E_{k+1}\) and \(E_{k+1}^c\) are equal to a and b respectively. The following theorem sheds some light on this matter.

Theorem 3.3

Suppose that for all k we have

$$\begin{aligned} \mathrm{Bel}(E_k \mid E_{k+1}) = a \;\;\mathrm {and} \;\; \mathrm{Bel}(E_k \mid E_{k+1}^c) = b. \end{aligned}$$
(3.11)
  1. (a)

    We have

    $$\begin{aligned} \mathrm{Bel}(E_0) \ge \min \{a, b\}. \end{aligned}$$
    (3.12)

    Furthermore, if \(1 \not = a \le b\) then for every \(\epsilon >0\) there is a belief function \(\mathrm{Bel}\) satisfying (3.11) such that \(\mathrm{Bel}(E_0) \le \min \{a, b\} + \epsilon .\)

  2. (b)

    Let \(b>0\) and \(a<1\). For every \(\epsilon >0\) there is a belief function \(\mathrm{Bel}\) satisfying (3.11) such that

    $$\begin{aligned} \mathrm{Bel}(E_0) = \frac{1}{2-a} - \epsilon . \end{aligned}$$
    (3.13)

    Since \(1/(2-a) > b/(1-a+b)\) whenever \(b<1\) (the difference has the same sign as \((1-a)(1-b)\)), this implies that for fixed a and \(b<1\) we can always find a belief function so that the belief in \(E_0\) is strictly larger than the classical answer.

Proof

  1. (a)

    Write

    $$\begin{aligned} \begin{aligned} \Delta&:= \mathrm{Bel}(E_0) - \mathrm{Bel}(E_0 \cap E_1) - \mathrm{Bel}(E_0 \cap E_{1}^c), \\ P_1&:= 1 - \mathrm{Bel}(E_0 \cup E_1^c), \\ P_2&:= 1 - \mathrm{Bel}(E_0 \cup E_1). \\ \end{aligned} \end{aligned}$$
    (3.14)

    We have

    $$\begin{aligned} \begin{aligned} \mathrm{Bel}(E_0)&= \mathrm{Bel}(E_0 \cap E_1) + \mathrm{Bel}(E_0 \cap E_{1}^c) + \Delta \\&= a(\mathrm{Bel}(E_0 \cap E_1 ) + P_1) + b(\mathrm{Bel}(E_0 \cap E_1^c ) + P_2) + \Delta \\&\ge \min \{a , b \} ( \mathrm{Bel}(E_0 \cap E_1 ) + \mathrm{Bel}(E_0 \cap E_1^c )+ P_1 + P_2 + \Delta ) \\&= \min \{a , b \} ( \mathrm{Bel}(E_0) + P_1 + P_2 ) \\&\ge \min \{a , b \}. \end{aligned} \end{aligned}$$
    (3.15)
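The individual steps in (3.15) can be unpacked as follows. This is a sketch in terms of the basic belief assignment m, under the assumptions that \(\mathrm{Bel}\) is given by an m with finitely many focal sets and that both conditional beliefs in (3.11) are well defined for \(k=0\); it is meant as a reading aid rather than as the authors' own wording.

$$\begin{aligned} \mathrm{Bel}(E_0 \cap E_1)&= a\,(\mathrm{Bel}(E_0 \cap E_1) + P_1), \qquad \mathrm{Bel}(E_0 \cap E_1^c) = b\,(\mathrm{Bel}(E_0 \cap E_1^c) + P_2), \\ \Delta&= \sum _{C \subseteq E_0,\; C \not \subseteq E_0 \cap E_1,\; C \not \subseteq E_0 \cap E_1^c} m(C) \ge 0, \\ P_1 + P_2&= \sum _{C \not \subseteq E_0 \cup E_1^c} m(C) + \sum _{C \not \subseteq E_0 \cup E_1} m(C) \ge \sum _{C \not \subseteq E_0} m(C) = 1 - \mathrm{Bel}(E_0). \end{aligned}$$

The first line rearranges Definition 2.2 for \(k=0\) and gives the second equality in (3.15). The first inequality in (3.15) uses \(\Delta \ge 0\) together with \(\min \{a,b\} \le 1\). The last inequality in (3.15) uses the third line above: a set C that is not contained in \(E_0\) contains a sequence with \(\omega _0=0\), and depending on whether \(\omega _1=1\) or \(\omega _1=0\) for that sequence, C fails to be contained in \(E_0 \cup E_1^c\) or in \(E_0 \cup E_1\).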

Now we show that there are belief functions satisfying (3.11) for which \(\mathrm{Bel}(E_0)\) is arbitrarily close to this bound if \(1 \not =a \le b\). Let \(0<C<1\), set

$$\begin{aligned} Z = \frac{(1-C)b(1-a)}{1-ab+b} \end{aligned}$$
(3.16)

and consider \(\mathrm{Bel}\) given by

$$\begin{aligned} \begin{aligned} m( \Omega )&= 1 - C(1-a) - a - (2-a)Z, \\ m( (E_0 \cup E_1) \cap (E_1 \cup E_2) \cap \cdots )&= C(1-a), \\ m( E_0 \cap E_1 \cap E_2 \cap \cdots )&= a(1-Z), \\ m( E_0 \cap E_1^c \cap E_2 \cap E_3^c \cap \cdots )&= Z, \\ m( E_0^c \cap E_1 \cap E_2^c \cap E_3 \cap \cdots )&=Z. \\ \end{aligned} \end{aligned}$$
(3.17)

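Only the third and fourth sets in (3.17) are contained in \(E_0\), so \(\mathrm{Bel}(E_0) = a(1-Z) + Z = a + (1-a)Z\). Since Z tends to 0 as C tends to 1, we can get \(\mathrm{Bel}(E_0)\) arbitrarily close to a by choosing C close enough to 1.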
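The construction (3.17) can be checked numerically. The sketch below restricts every focal set to a finite window of N coordinates, which is an assumption of this illustration (it is exact for events that only involve coordinates in the window), and uses illustrative values of a, b and C; all names are hypothetical. In this sketch \(\mathrm{Bel}(E_0)\) equals \(a + (1-a)Z\), which approaches a as C approaches 1.

```python
from fractions import Fraction
from itertools import product

N = 6  # window length (assumed for this sketch)
OMEGA = frozenset(product((0, 1), repeat=N))


def E(i):
    return frozenset(w for w in OMEGA if w[i] == 1)


def belief(m, event):
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


def conditional_belief(m, event, given):
    """Bel_H(E) as in (2.2), assuming the denominator is nonzero."""
    num = belief(m, event & given)
    return num / (num + 1 - belief(m, event | (OMEGA - given)))


def assignment_3_17(a, b, C):
    """Masses of (3.17) with Z from (3.16), restricted to the window."""
    Z = (1 - C) * b * (1 - a) / (1 - a * b + b)
    # (E_0 u E_1) n (E_1 u E_2) n ...: no two consecutive zeros.
    no_double_zero = frozenset(
        w for w in OMEGA if all(w[i] or w[i + 1] for i in range(N - 1)))
    m = {
        OMEGA: 1 - C * (1 - a) - a - (2 - a) * Z,
        no_double_zero: C * (1 - a),
        frozenset({(1,) * N}): a * (1 - Z),
        frozenset({tuple(1 - i % 2 for i in range(N))}): Z,
        frozenset({tuple(i % 2 for i in range(N))}): Z,
    }
    return m, Z


a, b = Fraction(1, 4), Fraction(1, 2)  # illustrative values with a <= b, a != 1
for C in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    m, Z = assignment_3_17(a, b, C)
    assert sum(m.values()) == 1 and min(m.values()) >= 0
    for k in range(N - 1):
        assert conditional_belief(m, E(k), E(k + 1)) == a
        assert conditional_belief(m, E(k), OMEGA - E(k + 1)) == b
    print(C, belief(m, E(0)), a + (1 - a) * Z)
```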

  1. (b)

    Let \(b>0\) and \(a<1\). Let \(0< \delta <1\) and consider \(\mathrm{Bel}\) given by

    $$\begin{aligned} \begin{aligned} m( E_0 \cap E_1^c \cap E_2 \cap \cdots )&= \delta , \\ m( E_0^c \cap E_1 \cap E_2^c \cdots )&= \delta , \\ m( E_0^c \cap E_1^c \cap E_2^c \cap \cdots )&= \frac{1-b}{b}\delta , \\ m( E_0 \cap E_1 \cap E_2 \cap \cdots )&= \frac{a}{1-a}\delta + \frac{a}{1-a}Z, \\ m( E_0 \cap E_2 \cap E_4 \cap \cdots )&= Z, \\ m( E_1 \cap E_3 \cap E_5 \cap \cdots )&= Z, \\ \end{aligned} \end{aligned}$$
    (3.18)

    where

    $$\begin{aligned} Z = \frac{(1-a) (1 - \frac{1-b}{b}\delta )}{2-a}-\delta . \end{aligned}$$
    (3.19)

    We can get \(\mathrm{Bel}(E_0)\) arbitrarily close to \(\frac{1}{2-a}\) by choosing \(\delta \) close enough to 0.

\(\square \)
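The construction (3.18) for part (b) can be checked in the same spirit. The sketch below again works on a finite window of N coordinates (an assumption of this illustration, exact for events that only involve coordinates in the window) and uses illustrative values of a, b and \(\delta \); all names are hypothetical. In this sketch \(\mathrm{Bel}(E_0)\) equals \((\delta + Z)/(1-a)\), which tends to \(1/(2-a)\) as \(\delta \) tends to 0.

```python
from fractions import Fraction
from itertools import product

N = 6  # window length (assumed for this sketch)
OMEGA = frozenset(product((0, 1), repeat=N))


def E(i):
    return frozenset(w for w in OMEGA if w[i] == 1)


def belief(m, event):
    return sum((mass for focal, mass in m.items() if focal <= event), Fraction(0))


def conditional_belief(m, event, given):
    """Bel_H(E) as in (2.2), assuming the denominator is nonzero."""
    num = belief(m, event & given)
    return num / (num + 1 - belief(m, event | (OMEGA - given)))


def assignment_3_18(a, b, delta):
    """Masses of (3.18) with Z from (3.19), restricted to the window."""
    s = (1 - b) / b * delta
    Z = (1 - a) * (1 - s) / (2 - a) - delta
    evens_on = frozenset(w for w in OMEGA if all(w[i] for i in range(0, N, 2)))
    odds_on = frozenset(w for w in OMEGA if all(w[i] for i in range(1, N, 2)))
    m = {
        frozenset({tuple(1 - i % 2 for i in range(N))}): delta,
        frozenset({tuple(i % 2 for i in range(N))}): delta,
        frozenset({(0,) * N}): s,
        frozenset({(1,) * N}): a / (1 - a) * (delta + Z),
        evens_on: Z,
        odds_on: Z,
    }
    return m, Z


a, b = Fraction(1, 4), Fraction(1, 2)  # illustrative values with b > 0, a < 1
for delta in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    m, Z = assignment_3_18(a, b, delta)
    assert sum(m.values()) == 1 and min(m.values()) >= 0
    for k in range(N - 1):
        assert conditional_belief(m, E(k), E(k + 1)) == a
        assert conditional_belief(m, E(k), OMEGA - E(k + 1)) == b
    print(delta, belief(m, E(0)), 1 / (2 - a))
```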

4 Summary and conclusions

When one models epistemic probability with classical probability distributions, the infinite epistemic regress problem, interpreted probabilistically, has a unique solution, as was shown by Peijnenburg and Atkinson. Since the infinite regress is a genuine epistemic situation, we should interpret probability epistemically here.

Not all epistemologists would agree that epistemic probability can be described by the usual axioms of Kolmogorov. An influential alternative, the so-called belief functions, was put forward by Shafer in 1976. Belief functions are more flexible than probability distributions, and do not follow the classical axioms. In particular, they are not additive.

We have shown that with belief functions, the infinite epistemic regress problem does not have a unique solution. We also argued that this non-uniqueness complies with common sense.

From a mathematical point of view, the completion of Atkinson and Peijnenburg boils down to the statement that fixing certain conditional probabilities determines the marginal distributions of a stochastic process. The corresponding statement for belief functions is false, as our examples show. In each of our examples, a rational agent was able to determine his belief in \(E_0\), so in that sense completion is still possible. But the belief in \(E_0\) is not unique given the sequence of conditional beliefs. The actual completion depends on the detailed circumstances, and cannot be expressed in the conditional beliefs a and b.

In Sect. 3.1, we showed that the non-uniqueness of an infinite regress is generic in the sense that many such regress problems have multiple solutions. Our examples are, therefore, typical. Our approach sheds some new light on the probabilistic version of the infinite epistemic regress problem, and shows that uniqueness of the completion depends on which axioms of epistemic probability one is willing to adopt.