1 Introduction

Modern mainstream probability theory, be it aleatory or epistemic, is almost exclusively based on the axioms of Kolmogorov, and we start our exposition by quoting these axioms. We fix a finite set \(\Omega \) of outcomes.

Definition 1.1

(Kolmogorov axioms) A function \(P: 2^{\Omega } \rightarrow [0,1]\) is a probability distribution if it satisfies:

  1. (P1)

    \(P(\Omega )=1\) and \(P(\emptyset )=0\);

  2. (P2)

    For every \(A, B \subset \Omega \) we have

    $$\begin{aligned} P(A \cup B) = P(A) + P(B) - P(A \cap B). \end{aligned}$$
    (1.1)

From an aleatory point of view, the axioms are often justified via a frequentistic interpretation of probabilities. In such a frequentistic interpretation, we take relative frequencies in repeated experiments as the motivation and justification of the axioms.

From an epistemic point of view, the probability of an event A is often interpreted as the degree of belief an agent has in A. This degree of belief can be quantified as the price for which the agent is willing to both buy and sell a bet that pays out 1 if A turns out to be true. Using the Dutch Book argument, this interpretation also leads to the Kolmogorov axiomatization, as is well known.

However, degrees of belief cannot always be satisfactorily described with the classical axioms of Kolmogorov, something which has been recognized and confirmed by many researchers from such different disciplines as mathematics [1, 11,12,13, 16], legal science [1], and philosophy (see [3, 4, 6, 16] and references therein). These authors have argued that the classical axioms of probability are too restrictive for at least two reasons:

  1. (1)

    It is impossible for an agent to distinguish between disbelief in A, by which we mean \(P(A^c)\), and lack of belief, by which we mean \(1-P(A)\): under (P1) and (P2) these two quantities always coincide, since \(P(A)+P(A^c)=P(\Omega )=1\). Especially when we interpret degree of belief in A as the degree to which an agent has supporting evidence for A, lack of supporting evidence for A is not necessarily supporting evidence for \(A^c\).

  2. (2)

    It is impossible for an agent to suspend all judgment, that is, assigning \(P(A)=P(A^c)=0\) is impossible. But certainly it is possible to have no supporting evidence at all for either A or \(A^c\). As an example, consider a situation in which a man claims to be the father of a certain child. A DNA test may or may not rule out the man as a potential father of the child, but in any classical probabilistic computation one has to start with a prior probability for the man to be the father. Often one takes a fifty–fifty prior, but clearly this does not correspond to our knowledge. Since we have no evidence for either paternity or non-paternity, it would be more reasonable to assign zero prior belief to both possibilities, something which is impossible under the Kolmogorov axioms. (Uniform prior probabilities are also problematic from another point of view: lumping states together or changing the scale does not preserve uniformity, as is well known, see, e.g., [11].)

These and similar considerations motivated Glenn Shafer [11] to introduce belief functions, which were intended to better capture the nature of epistemic probability. To see how belief functions differ from classical probability distributions, recall that (P2) can be extended, for any collection of events \(A_1, \ldots , A_N\), to the inclusion–exclusion formula

$$\begin{aligned} P\left( \bigcup _{i=1}^N A_i \right) = \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset \end{array}} (-1)^{|I|+1}P \left( \bigcap _{i\in I} A_i \right) . \end{aligned}$$
(1.2)
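As a quick sanity check of (1.2), the following Python sketch evaluates both sides for a small distribution. The outcome labels, the point masses, and the helper names `P` and `inclusion_exclusion` are arbitrary illustrative choices, not part of the text; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative point masses on Omega = {a, b, c} (arbitrary choice, summing to 1).
point_mass = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    """Probability of an event: the sum of the point masses of its outcomes."""
    return sum((point_mass[w] for w in event), Fraction(0))

def inclusion_exclusion(events):
    """Right-hand side of (1.2): signed sum over all nonempty index sets I."""
    total = Fraction(0)
    n = len(events)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            intersection = frozenset.intersection(*(events[i] for i in idx))
            total += (-1) ** (k + 1) * P(intersection)
    return total

events = [frozenset("ab"), frozenset("bc"), frozenset("a")]
union = frozenset().union(*events)
print(P(union), inclusion_exclusion(events))  # prints: 1 1
```

For a probability distribution the two sides agree exactly; for a belief function only the inequality (1.3) below is required.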

In the Shafer axiomatization of belief functions, (1.2) is replaced by a corresponding inequality.

Definition 1.2

(Shafer axioms ([11])) A function \(\mathrm{Bel}: 2^{\Omega } \rightarrow [0,1]\) is a belief function provided that

  1. (B1)

    \(\mathrm{Bel}(\emptyset )=0\) and \(\mathrm{Bel}(\Omega )=1\);

  2. (B2)

    For all \(A_1,A_2,\ldots , A_N \subset \Omega \), we have

    $$\begin{aligned} \mathrm{Bel}\left( \bigcup _{i=1}^N A_i \right) \ge \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset \end{array}} (-1)^{|I|+1}\mathrm{Bel}\left( \bigcap _{i\in I} A_i \right) . \end{aligned}$$
    (1.3)

One may at this point already argue loosely that the Shafer axioms are suitable for describing degrees of belief. Taking \(N=2\) in (1.3) for simplicity gives \(\mathrm{Bel}(A \cup B) \ge \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B)\). If we have supporting evidence for either A or B, then of course we also have supporting evidence for \(A \cup B\). However, supporting evidence for both A and B is then counted twice and must be subtracted on the right-hand side. Finally, there may be supporting evidence for \(A \cup B\) but not for A or B individually, which accounts for the inequality in the formula.

Shafer showed that belief functions have the following characterization.

Definition 1.3

A function \(m: 2^{\Omega } \rightarrow [0,1]\) is called a basic belief assignment if \(m(\emptyset )=0\) and

$$\begin{aligned} \sum _{C \subseteq \Omega } m(C)=1. \end{aligned}$$
(1.4)

Theorem 1.4

A function \(\mathrm{Bel}: 2^\Omega \rightarrow [0,1]\) is a belief function if and only if there exists a basic belief assignment m such that

$$\begin{aligned} \mathrm{Bel}(A) = \sum _{C \subseteq A} m(C). \end{aligned}$$
(1.5)

There is a one-to-one correspondence between belief functions and basic belief assignments, and \(\mathrm{Bel}\) is a probability distribution if and only if m concentrates on singletons.
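The correspondence in Theorem 1.4 can be sketched in Python on the two-outcome frame of the paternity example. The labels `"f"` (father) and `"n"` (not the father) are hypothetical, and the inverse map uses the standard Möbius inversion \(m(A)=\sum _{B \subseteq A} (-1)^{|A \setminus B|}\mathrm{Bel}(B)\), which is not stated in the text above but is the usual way to recover m from \(\mathrm{Bel}\).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical two-outcome frame: "f" = the man is the father, "n" = he is not.
OMEGA = frozenset({"f", "n"})

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def bel_from_m(m):
    """Equation (1.5): Bel(A) is the total mass m(C) of all C contained in A."""
    return {A: sum((m.get(C, Fraction(0)) for C in subsets(A)), Fraction(0))
            for A in subsets(OMEGA)}

def m_from_bel(bel):
    """Moebius inversion: m(A) = sum over B contained in A of (-1)^|A - B| * Bel(B)."""
    return {A: sum(((-1) ** len(A - B) * bel[B] for B in subsets(A)), Fraction(0))
            for A in subsets(OMEGA)}

# Vacuous assignment: all mass on OMEGA, i.e. no evidence for either possibility.
m_vacuous = {OMEGA: Fraction(1)}
bel = bel_from_m(m_vacuous)
print(bel[frozenset({"f"})], bel[frozenset({"n"})], bel[OMEGA])  # prints: 0 0 1

# Round trip on a non-vacuous assignment recovers m, as Theorem 1.4 asserts.
m_mixed = {frozenset({"f"}): Fraction(1, 2), OMEGA: Fraction(1, 2)}
m_back = m_from_bel(bel_from_m(m_mixed))
assert all(m_back[A] == m_mixed.get(A, 0) for A in subsets(OMEGA))
```

The vacuous assignment gives zero belief to both paternity and non-paternity, which is exactly the suspension of judgment that the Kolmogorov axioms rule out.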

In many situations, there is a natural candidate for the basic belief assignment m, especially when the latter is obtained through a classical probabilistic experiment.

Although belief functions are used in certain applied settings (see [2] for a recent text), researchers in mainstream mathematics have essentially stayed away from them. There are, roughly speaking, two reasons for this. First, there were many critics (see the references in [12]) for whom a theory of uncertainty without a behavioral (betting) interpretation was unacceptable.

A second major point of concern is formulated in [10] and [5]. In both references, the main reason to reject Shafer’s belief functions is that the calculus of these belief functions, as put forward in the so-called Dempster rule of combination to combine two belief functions into a new one, would not be well founded or motivated and would lead to unacceptable results.

The main goal of the current paper is to show how belief functions arise from a natural betting interpretation, thereby removing the first major point of concern. We already mentioned above that classically, the degree of belief an agent has in an event A is interpreted as the price for which he or she is willing to both buy and sell a bet that pays out 1 if A turns out to be true. We argue, however, that the degree of belief an agent has in an event A should be interpreted as the maximum price for which he or she is willing to buy (not necessarily sell) a bet that pays out 1 if A turns out to be true. In our approach, the difference between buying and selling prices reflects the difference between disbelief and lack of belief mentioned above. The distinction between buying and selling prices also appears in the theory of Peter Walley [16], and below we comment on the relation between his general approach and our theory.

We do not further comment on Dempster’s rule of combination for the simple reason that we have no need for this rule, and we develop the theory without it.

Other characterizations of belief functions exist in the literature, for instance in the work of Smets [14]. We think our approach is more direct than his, and as a result our characterization is much more concise (his involves 10 requirements). Moreover, unlike Smets, we work in a betting interpretation, so that our result can be seen as an analogue of the Dutch Book Theorem.

The paper is organized as follows. In Sect. 2 we develop our betting theory, and in Sect. 3 we state and prove the betting interpretation of belief functions. This section contains our main result, Theorem 3.10.

2 Betting Functions and Degrees of Belief

We approach belief behaviorally, working under the assumption that the degree to which someone believes an event can be measured by his or her willingness to accept bets. To go toward a definition that we can work with mathematically, we introduce betting functions.

Definition 2.1

A bet (or gamble) on \(\Omega \) is a function \(X: \Omega \rightarrow \mathbb {R}\). We write \(\mathcal {X} = \mathbb {R}^{\Omega }\) for the collection of all bets on \(\Omega \). A betting function is a function \(R: \mathcal {X} \rightarrow \{0,1\}\) such that for each \(X \in \mathcal {X}\), there is an \(\alpha _0 \in \mathbb {R}\) such that \(R(X+\alpha )=0\) for \(\alpha <\alpha _0\) and \(R(X+\alpha )=1\) for \(\alpha \ge \alpha _0\).

We interpret R as indicating, for each bet \(X \in \mathcal {X}\), whether or not an agent is willing to accept X: \(R(X)=1\) means ‘willing to accept the bet’ and \(R(X)=0\) means ‘not willing to accept the bet.’ The threshold property in the definition of a betting function guarantees that the maximum and minimum in the following definition exist.

Definition 2.2

Let R be a betting function. The buy function \(\mathrm {Buy}_R: \mathcal {X} \rightarrow \mathbb {R}\) is given by

$$\begin{aligned} \mathrm {Buy}_R(X) := \max \{ \alpha \in \mathbb {R} \;:\; R(X - \alpha )=1 \}. \end{aligned}$$
(2.1)

This is the maximum price an agent is willing to pay for a bet which pays out X. The sell function \(\mathrm {Sell}_R: \mathcal {X} \rightarrow \mathbb {R}\) is given by

$$\begin{aligned} \mathrm {Sell}_R(X) := \min \{ \alpha \in \mathbb {R} \;:\; R(\alpha - X)=1 \}. \end{aligned}$$
(2.2)

This is the minimum price for which an agent is willing to sell the bet which pays out X.

The buy and sell functions are related as follows:

$$\begin{aligned} \begin{aligned} \mathrm{Buy}_R(X)&= \max \{ \alpha \in \mathbb {R} \;:\; R(X-\alpha )=1 \} \\&= \max \{ -\alpha \in \mathbb {R} \;:\; R(X+\alpha )=1 \} \\&= - \min \{ \alpha \in \mathbb {R} \;:\; R(\alpha -(-X))=1 \} \\&= - \mathrm{Sell}_R(-X). \end{aligned} \end{aligned}$$
(2.3)

This shows how buy and sell functions are dual in the sense that \(\mathrm{Buy}_R\) is completely determined by \(\mathrm{Sell}_R\) and vice versa. This relation between \(\mathrm{Buy}\) and \(\mathrm{Sell}\) is precisely the relation between lower and upper previsions in [16], but in our case the relation follows from the underlying concept of a betting function. Note that we can recover R from \(\mathrm{Buy}_R\), since

$$\begin{aligned} R(X) = \left\{ \begin{array}{ll} 1 &{} \text{ if } \mathrm{Buy}_R(X) \ge 0, \\ 0 &{} \text{ if } \mathrm{Buy}_R(X) < 0. \end{array} \right. \end{aligned}$$
(2.4)
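Definition 2.2 only uses the 0/1 answers of R, so the buy and sell prices can be approximated by bisection on the shift \(\alpha \). The Python sketch below assumes a hypothetical coherent R on three outcomes (accept a bet iff its lower expectation over two fixed probability vectors is nonnegative); the vectors and the helpers `shift`, `buy`, `sell` are illustrative assumptions. It also checks the duality (2.3) numerically.

```python
# Hypothetical credal set on a three-outcome Omega (an illustrative choice).
PRIORS = [(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)]

def lower_expectation(x):
    """Smallest expected payoff of the bet x over the priors."""
    return min(sum(p * v for p, v in zip(prior, x)) for prior in PRIORS)

def R(x):
    """Hypothetical coherent betting function: accept x iff its lower expectation is >= 0."""
    return 1 if lower_expectation(x) >= 0 else 0

def shift(x, c):
    """The bet x + c, i.e. the constant c added to every payoff."""
    return tuple(v + c for v in x)

def buy(R, x, iters=100):
    """Approximate (2.1), max{alpha : R(x - alpha) = 1}, by bisection.

    Only queries the 0/1 oracle R; assumes R is coherent, so the answer is in [min x, max x]."""
    lo, hi = min(x) - 1.0, max(x) + 1.0  # R(x - lo) = 1 and R(x - hi) = 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if R(shift(x, -mid)) == 1:
            lo = mid
        else:
            hi = mid
    return lo

def sell(R, x, iters=100):
    """Approximate (2.2), min{alpha : R(alpha - x) = 1}, by bisection."""
    neg = tuple(-v for v in x)
    lo, hi = min(x) - 1.0, max(x) + 1.0  # R(lo - x) = 0 and R(hi - x) = 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if R(shift(neg, mid)) == 1:
            hi = mid
        else:
            lo = mid
    return hi

x = (1.0, -2.0, 3.0)
print(round(buy(R, x), 6), round(sell(R, x), 6))  # prints: 0.5 1.6
```

The gap between the two printed prices is the kind of buy/sell spread that the classical (Dutch Book) setting rules out.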

Definition 2.3

Let \(R: \mathcal {X} \rightarrow \{0,1\}\) be a betting function. Then R is said to be coherent if

  1. (R1)

    If \(X>0\), then \(R(X)=1\);

  2. (R2)

    \(R(\lambda X) = R(X)\) for every \(\lambda >0\);

  3. (R3)

    If \(R(X)=R(Y)=1\), then \(R(X+Y)=1\).

These conditions do not necessarily capture everything a reasonable agent should adhere to, and, as we will see, rational behavior requires more. However, the conditions formulate the very basics. Condition (R1) says that agents should be willing to accept bets with guaranteed positive results. Condition (R2) says that willingness to accept a bet should depend only on the relative sizes of its results, not on their absolute sizes. Condition (R3) says that if an agent is willing to accept two bets, he/she should be willing to accept both simultaneously. We want to point out that both (R2) and (R3) have a debatable consequence: if an agent is willing to accept a bet in which he/she wins 1 euro if A is true and loses 1 euro if A is false, then he/she should be willing to accept a bet in which he/she wins 1 million euro if A is true and loses 1 million euro if A is false. In the real world, one could think of all kinds of reasons why an agent would not want to accept the second bet, even if he/she is willing to accept the first one. By imposing coherence, we thus consider highly idealized agents.

The following result says that \(\mathrm{Buy}_R\) is coherent as a lower prevision in the sense of Walley [16] if and only if R is coherent in the sense of Definition 2.3.

Theorem 2.4

Let \(R: \mathcal {X} \rightarrow \{0,1\}\) be a betting function. Then R is coherent if and only if

  • \(\mathrm{Buy}_R(X) \ge \min X\);

  • \(\mathrm{Buy}_R(\lambda X)= \lambda \mathrm{Buy}_R(X)\) for every \(\lambda >0\);

  • \(\mathrm{Buy}_R(X+Y) \ge \mathrm{Buy}_R(X) + \mathrm{Buy}_R(Y)\).

Proof

Suppose R is coherent. By (R1), we have that \(R(X - \min X + \epsilon )=1\) for every \(\epsilon >0\). Hence,

$$\begin{aligned} \mathrm{Buy}_R(X) = \max \{ \alpha \in \mathbb {R} \;:\; R(X-\alpha )=1 \} \ge \min X - \epsilon \end{aligned}$$
(2.5)

for every \(\epsilon >0\), so \(\mathrm{Buy}_R(X) \ge \min X\). By (R2), we have for every \(\lambda >0\) that

$$\begin{aligned} \begin{aligned} \mathrm{Buy}_R(\lambda X)&= \max \{ \alpha \in \mathbb {R} \;:\; R(\lambda X-\alpha )=1 \} \\&= \max \{ \alpha \in \mathbb {R} \;:\; R(X- \frac{\alpha }{\lambda })=1 \} \\&= \max \{ \lambda \alpha \in \mathbb {R} \;:\; R(X- \alpha )=1 \} \\&= \lambda \max \{ \alpha \in \mathbb {R} \;:\; R(X- \alpha )=1 \} \\&= \lambda \mathrm{Buy}_R(X). \end{aligned} \end{aligned}$$
(2.6)

Suppose that, for some \(\alpha ,\beta \in \mathbb {R}\), we have \(R(X-\alpha )=1\) and \(R(Y-\beta )=1\). Then, by (R3), we have \(R(X+Y-\alpha -\beta )=1\) and thus

$$\begin{aligned} \mathrm{Buy}_R(X+Y) = \max \{ \gamma \in \mathbb {R} \;:\; R(X+Y-\gamma )=1 \} \ge \alpha + \beta . \end{aligned}$$
(2.7)

It follows that

$$\begin{aligned} \begin{aligned} \mathrm{Buy}_R(X+Y)&\ge \max \{ \alpha \;:\; R(X-\alpha )=1 \} + \max \{ \beta \;:\; R(Y-\beta )=1 \} \\&= \mathrm{Buy}_R(X) + \mathrm{Buy}_R(Y). \end{aligned} \end{aligned}$$
(2.8)

For the converse, suppose that \(\mathrm{Buy}_R\) has the listed properties. We define \(\psi : \mathbb {R} \rightarrow \{0,1\}\) by

$$\begin{aligned} \psi (x) = \left\{ \begin{array}{ll} 1 &{} \text{ if } x \ge 0, \\ 0 &{} \text{ if } x < 0, \end{array} \right. \end{aligned}$$
(2.9)

and we note as before that

$$\begin{aligned} R(X) = \psi (\mathrm{Buy}_R(X)). \end{aligned}$$
(2.10)

If \(X>0\), then \(\mathrm{Buy}_R(X) \ge 0\) by the first property. Hence, \(R(X)= \psi (\mathrm{Buy}_R(X))=1\). So (R1) holds.

The second property tells us that, for every \(\lambda >0\), we have

$$\begin{aligned} R(\lambda X) = \psi (\mathrm{Buy}_R( \lambda X)) = \psi ( \lambda \mathrm{Buy}_R(X) ) = \psi ( \mathrm{Buy}_R(X) ) = R(X), \end{aligned}$$
(2.11)

so (R2) holds.

By the third property, we have

$$\begin{aligned} \begin{aligned} R(X+Y)&= \psi (\mathrm{Buy}_R(X+Y)) \\&\ge \psi (\mathrm{Buy}_R(X) + \mathrm{Buy}_R(Y)) \\&\ge \min \{ \psi (\mathrm{Buy}_R(X)), \psi (\mathrm{Buy}_R(Y)) \} \\&= \min \{ R(X), R(Y) \}. \end{aligned} \end{aligned}$$
(2.12)

Hence, if \(R(X)=R(Y)=1\), it follows that \(R(X+Y)=1\), so (R3) holds. \(\square \)
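The three properties of Theorem 2.4 can be spot-checked numerically. The Python sketch below assumes, for illustration only, that \(\mathrm{Buy}_R\) is the lower envelope of expectations over two fixed probability vectors (a hypothetical choice) and tests the properties on random bets; it then exhibits a case where superadditivity is strict.

```python
import random

# Hypothetical credal set on a three-outcome Omega; buy is its lower envelope.
PRIORS = [(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)]

def buy(x):
    """Lower envelope of expectations (a superlinear functional)."""
    return min(sum(p * v for p, v in zip(prior, x)) for prior in PRIORS)

def random_bet(rng):
    return tuple(rng.uniform(-5, 5) for _ in range(3))

rng = random.Random(0)
TOL = 1e-9
for _ in range(1000):
    x, y = random_bet(rng), random_bet(rng)
    lam = rng.uniform(0.1, 10)
    assert buy(x) >= min(x) - TOL                                          # first property
    assert abs(buy(tuple(lam * v for v in x)) - lam * buy(x)) < 1e-6       # homogeneity
    assert buy(tuple(a + b for a, b in zip(x, y))) >= buy(x) + buy(y) - TOL  # superadditivity

# Strict superadditivity: the two bets are priced worst-case by different priors.
print(buy((1, 0, 0)), buy((0, 0, 1)), buy((1, 0, 1)))
```

Here each single bet is worth 0.2, while the combined bet is worth 0.7, so buying the bets together is valued more than the sum of their separate buy prices.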

The second and third properties of Theorem 2.4 tell us that \(\mathrm{Buy}_R\) is a superlinear (positively homogeneous and superadditive) functional if R is coherent. Coherence of R can of course also be captured in terms of \(\mathrm{Sell}_R\):

Theorem 2.5

Let \(R: \mathcal {X} \rightarrow \{0,1\}\) be a betting function. Then R is coherent if and only if

  • \(\mathrm{Sell}_R(X) \le \max X\);

  • \(\mathrm{Sell}_R(\lambda X)= \lambda \mathrm{Sell}_R(X)\) for every \(\lambda >0\);

  • \(\mathrm{Sell}_R(X+Y) \le \mathrm{Sell}_R(X) + \mathrm{Sell}_R(Y)\).

Proof

This follows directly from Theorem 2.4 and the relation (2.3) between \(\mathrm{Buy}_R\) and \(\mathrm{Sell}_R\). \(\square \)

We want to measure the degree to which an agent believes \(A \subseteq \Omega \) by his or her willingness to accept a bet with a desirable result if A is true and an undesirable result if A is false. The actions with this property are buying the bet \(1_A\) and selling the bet \(1_{A^c}\). Willingness to buy \(1_A\) for a high price, or to sell \(1_{A^c}\) for a low price, shows a high degree of belief in A. This leads to our definition of degree of belief.

Definition 2.6

(Degree of belief) Let R be the coherent betting function of an agent. We define the degree to which this agent believes an event \(A \subseteq \Omega \) as \(\mathrm{Buy}_R(1_A)=1-\mathrm{Sell}_R(1_{A^c})\).
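To connect Definition 2.6 with the paternity example, the Python sketch below assumes (for illustration) an agent with no evidence at all, modeled by \(\mathrm{Buy}_R(X)=\min X\), which is coherent by Theorem 2.4. The outcome labels and helper names are hypothetical; \(\mathrm{Sell}_R\) is obtained from (2.3), and the code checks the identity \(\mathrm{Buy}_R(1_A)=1-\mathrm{Sell}_R(1_{A^c})\).

```python
OMEGA = ("father", "not_father")  # hypothetical labels for the paternity example

def indicator(A):
    """The bet 1_A as a tuple of payoffs, one per outcome."""
    return tuple(1 if w in A else 0 for w in OMEGA)

def buy(x):
    """Vacuous agent: pays only what the bet guarantees."""
    return min(x)

def sell(x):
    """Relation (2.3): Sell(X) = -Buy(-X)."""
    return -buy(tuple(-v for v in x))

def degree_of_belief(A):
    """Definition 2.6: the buy price of 1_A."""
    return buy(indicator(A))

for A in [set(), {"father"}, {"not_father"}, set(OMEGA)]:
    complement = set(OMEGA) - A
    assert degree_of_belief(A) == 1 - sell(indicator(complement))  # Definition 2.6
    print(sorted(A), degree_of_belief(A))
```

Both possibilities receive degree of belief 0, so disbelief in one (the belief in its complement, 0) and lack of belief in it (1 minus its belief, 1) now differ.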

3 Adding B-Consistency

In this section, we introduce an additional condition for betting functions and show that this constraint precisely leads to buy functions which are belief functions when restricted to bets of the form \(1_A\), with \(A \subset \Omega \). Before we do this, however, we will briefly discuss how our setup relates to the Dutch Book argument for probability distributions.

The Dutch Book argument is centered around the principle that betting behavior of agents should not lead to sure loss. The following theorem tells us that not having a sure loss is already implied by coherence. This theorem is well known within Walley’s theory, but we give our version of the proof as a service to the reader.

Theorem 3.1

If R is a coherent betting function, then

$$\begin{aligned} \max _{\omega \in \Omega } \left( \sum _{i=1}^N \left( X_i - \mathrm{Buy}_R(X_i)\right) + \sum _{j=1}^M \left( \mathrm{Sell}_R(Y_j) - Y_j\right) \right) \ge 0 \end{aligned}$$
(3.1)

for all \(X_1,\ldots ,X_N \in \mathcal {X}\) and \(Y_1,\ldots ,Y_M \in \mathcal {X}\).

Proof

We first show that \(\mathrm{Buy}_R(X) \le \max X\) for each \(X \in \mathcal {X}\). Suppose that \(\mathrm{Buy}_R(X) > \max X\). This means there is an \(\epsilon >0\) such that

$$\begin{aligned} R(X- \max X - \epsilon ) = 1. \end{aligned}$$
(3.2)

Note that for every \(Y \in \mathcal {X}\), there is a \(\lambda >0\) such that \(\lambda (X-\max X -\epsilon ) < Y\). Since \(R(\lambda (X- \max X - \epsilon )) = 1\) by (R2) and \(R(Y-\lambda (X-\max X -\epsilon ))=1\) by (R1), it follows with (R3) that \(R(Y)=1\). Hence, \(R(Y)=1\) for all \(Y \in \mathcal {X}\), but then there is no maximum \(\alpha \in \mathbb {R}\) such that \(R(Y-\alpha )=1\). This is a contradiction, and it follows that \(\mathrm{Buy}_R(X) \le \max X\).

Now let \(X_1,\ldots ,X_N \in \mathcal {X}\) and \(Y_1,\ldots ,Y_M \in \mathcal {X}\). By using Theorem 2.4 and the property we just proved, we find

$$\begin{aligned} \begin{aligned} \sum _{i=1}^N \mathrm{Buy}_R(X_i) - \sum _{j=1}^M \mathrm{Sell}_R(Y_j)&= \sum _{i=1}^N \mathrm{Buy}_R(X_i) + \sum _{j=1}^M \mathrm{Buy}_R(-Y_j) \\&\le \mathrm{Buy}_R \left( \sum _{i=1}^N X_i + \sum _{j=1}^M -Y_j \right) \\&\le \max \left( \sum _{i=1}^N X_i + \sum _{j=1}^M -Y_j \right) . \end{aligned} \end{aligned}$$
(3.3)

Rearranging this inequality yields (3.1). \(\square \)
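Theorem 3.1 can be probed with random portfolios: an agent who buys each \(X_i\) at \(\mathrm{Buy}_R(X_i)\) and sells each \(Y_j\) at \(\mathrm{Sell}_R(Y_j)\) should never face a sure loss. The Python sketch assumes the same kind of hypothetical lower-envelope \(\mathrm{Buy}_R\) as before; `max_net_gain` is an illustrative helper computing the left-hand side of (3.1).

```python
import random

# Hypothetical credal set on a three-outcome Omega; buy is its lower envelope.
PRIORS = [(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)]

def buy(x):
    """Lower expectation of x: coherent in the sense of Theorem 2.4."""
    return min(sum(p * v for p, v in zip(prior, x)) for prior in PRIORS)

def sell(x):
    """Sell price obtained from buy via the duality (2.3)."""
    return -buy(tuple(-v for v in x))

def max_net_gain(xs, ys):
    """Left-hand side of (3.1): buy every X_i at Buy(X_i), sell every Y_j at Sell(Y_j),
    and take the largest total net gain over the outcomes w."""
    return max(
        sum(x[w] - buy(x) for x in xs) + sum(sell(y) - y[w] for y in ys)
        for w in range(len(PRIORS[0]))
    )

rng = random.Random(1)
for _ in range(500):
    xs = [tuple(rng.uniform(-3, 3) for _ in range(3)) for _ in range(rng.randint(1, 4))]
    ys = [tuple(rng.uniform(-3, 3) for _ in range(3)) for _ in range(rng.randint(0, 4))]
    assert max_net_gain(xs, ys) >= -1e-9  # no sure loss, up to rounding
```

The small tolerance only absorbs floating-point rounding; the theorem itself gives the exact bound 0.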

Theorem 3.1 tells us that coherence is stronger than having no sure loss. At the same time, even coherence does not imply that \(A \mapsto \mathrm{Buy}_R(1_A)\) is a probability distribution: the collection of maps \(A \mapsto \mathrm{Buy}_R(1_A)\) for coherent R is much richer than the set of probability distributions. For instance, \(\mathrm{Buy}_R(X) := \min X\) defines a coherent R with \(\mathrm{Buy}_R(1_A)+\mathrm{Buy}_R(1_{A^c})=0\) for every nonempty \(A \ne \Omega \). This tells us that the property that \(\mathrm{Buy}_R(1_A) = \mathrm{Sell}_R(1_A)\) for every \(A \subseteq \Omega \), which is usually implicitly assumed when the Dutch Book argument is laid out, is crucial for the restriction to probability distributions. The next theorem confirms this.

Theorem 3.2

P is a probability distribution if and only if there exists a coherent betting function R such that \(P(A)=\mathrm{Buy}_R(1_A)=\mathrm{Sell}_R(1_A)\) for every \(A \subseteq \Omega \).

Proof

Suppose there is a coherent R such that \(P(A)=\mathrm{Buy}_R(1_A) = \mathrm{Sell}_R(1_A)\). We have \(P(\Omega )=\mathrm{Buy}_R(1_\Omega )=1\) and \(P(A)=\mathrm{Buy}_R(1_A) \ge 0\) by coherence. Then for disjoint \(A,B \subseteq \Omega \) we find

$$\begin{aligned} P(A \cup B) = \mathrm{Buy}_R(1_A + 1_B) \ge \mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_B) = P(A)+P(B) \end{aligned}$$
(3.4)

and

$$\begin{aligned} P(A \cup B) = \mathrm{Sell}_R(1_A + 1_B) \le \mathrm{Sell}_R(1_A) + \mathrm{Sell}_R(1_B) = P(A)+ P(B). \end{aligned}$$
(3.5)

Hence, P is additive.

For the converse, suppose that P is a probability distribution. Then define

$$\begin{aligned} \mathrm{Buy}_R(X) = \sum _{\omega \in \Omega } P(\{ \omega \}) X(\omega ). \end{aligned}$$
(3.6)

Clearly, R is coherent and we have \(\mathrm{Buy}_R(X)=\mathrm{Sell}_R(X)\). \(\square \)
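The construction (3.6) in the proof can be checked directly. The Python sketch below uses an arbitrary illustrative distribution on three outcomes and confirms that the linear buy function yields \(\mathrm{Buy}_R(1_A)=\mathrm{Sell}_R(1_A)=P(A)\) for every event, whereas the coherent but vacuous choice \(\mathrm{Buy}_R(X)=\min X\) leaves a spread between buy and sell prices. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = (0, 1, 2)
P_POINT = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # arbitrary distribution

def buy_linear(x):
    """Construction (3.6): Buy_R(X) = sum over w of P({w}) X(w)."""
    return sum((p * v for p, v in zip(P_POINT, x)), Fraction(0))

def sell_from_buy(buy, x):
    """Relation (2.3): Sell(X) = -Buy(-X)."""
    return -buy(tuple(-v for v in x))

def indicator(A):
    return tuple(1 if w in A else 0 for w in OMEGA)

events = [frozenset(c) for k in range(len(OMEGA) + 1) for c in combinations(OMEGA, k)]
for A in events:
    x = indicator(A)
    prob = sum((P_POINT[w] for w in A), Fraction(0))
    assert buy_linear(x) == sell_from_buy(buy_linear, x) == prob

# By contrast, the coherent functional Buy(X) = min X has Buy(1_A) < Sell(1_A)
# for the event A = {0}.
gap = sell_from_buy(min, indicator({0})) - min(indicator({0}))
print(gap)  # prints: 1
```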

As we already mentioned, the constraint that \(\mathrm{Buy}_R(1_A)=\mathrm{Sell}_R(1_A)\) for every \(A \subseteq \Omega \) is precisely one we do not want to impose. This property, however, is not easily weakened in a reasonable way. Therefore, we will work toward another characterization of probability distributions (Theorem 3.7), from which we can derive our constraint. We start with the definition of a belief valuation.

Definition 3.3

A function \(\mathcal{B}: 2^{\Omega } \rightarrow \{0,1\}\) is called a belief valuation provided that

  • \(\mathcal{B}(A^c)=0\) if \(\mathcal{B}(A)=1\);

  • If \(\mathcal{B}(A)=1\) and \(A \subseteq B\), then \(\mathcal{B}(B)=1\);

  • If \(\mathcal{B}(A)= 1 \) and \(\mathcal{B}(B)=1\), then \(\mathcal{B}(A \cap B)=1\);

  • \(\mathcal{B}(\Omega )=1\).

A belief valuation is also called a categorical belief function or a 0-1 necessity measure in the literature. In practice, we use the characterization that \(\mathcal {B}\) is a belief valuation if and only if there is a nonempty set \(E_\mathcal {B} \subseteq \Omega \) such that

$$\begin{aligned} \mathcal {B}(A) = \left\{ \begin{array}{ll} 1 &{} \text{ if } E_\mathcal {B} \subseteq A \\ 0 &{} \text{ otherwise } \end{array} \right. . \end{aligned}$$
(3.7)
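The characterization (3.7) can be verified exhaustively on a small frame. The Python sketch below takes \(\Omega = \{1,2,3\}\) (an illustrative choice), enumerates all \(2^8\) functions \(2^{\Omega } \rightarrow \{0,1\}\), keeps those satisfying the four conditions of Definition 3.3, and compares them with the functions built from a nonempty set \(E_\mathcal {B}\) (called `E` in the code). The helper names are hypothetical.

```python
from itertools import combinations, product

OMEGA = frozenset({1, 2, 3})

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

EVENTS = subsets(OMEGA)

def is_belief_valuation(B):
    """Check the four conditions of Definition 3.3 for B: dict event -> {0, 1}."""
    if B[OMEGA] != 1:
        return False
    for A in EVENTS:
        if B[A] == 1 and B[OMEGA - A] != 0:
            return False
        for C in EVENTS:
            if B[A] == 1 and A <= C and B[C] != 1:
                return False
            if B[A] == 1 and B[C] == 1 and B[A & C] != 1:
                return False
    return True

def from_core(E):
    """Characterization (3.7): B(A) = 1 iff E is a subset of A."""
    return {A: int(E <= A) for A in EVENTS}

# Brute force over all 2^8 = 256 functions from events to {0, 1}.
found = []
for values in product((0, 1), repeat=len(EVENTS)):
    B = dict(zip(EVENTS, values))
    if is_belief_valuation(B):
        found.append(B)

expected = [from_core(E) for E in EVENTS if E]
print(len(found))  # prints: 7 (one belief valuation per nonempty E)
```

The three valuations with a singleton `E` are exactly the truth valuations discussed below.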

We can also describe belief valuations in terms of filters, since \(\mathcal {B}\) is a belief valuation if and only if

$$\begin{aligned} \{ A \subseteq \Omega \;:\; \mathcal {B}(A)=1 \} \subseteq 2^\Omega \end{aligned}$$
(3.8)

is a proper filter of subsets of \(\Omega \). This filter can be interpreted as the collection of sets in which an agent has full belief. The next result makes this precise. For R a coherent betting function, we denote by \(\mathcal {B}_R: 2^{\Omega } \rightarrow \{0,1\}\) the function that satisfies \(\mathcal {B}_R(A)=1\) if and only if \(\mathrm{Buy}_R(1_A)=1\).

Theorem 3.4

A function \(\mathcal {B}: 2^{\Omega } \rightarrow \{0,1\}\) is a belief valuation if and only if there is a coherent betting function R such that \(\mathcal {B}=\mathcal {B}_R\).

Proof

Suppose first that \(\mathcal {B}\) is a belief valuation. We define R by

$$\begin{aligned} \mathrm{Buy}_R(X):= \min _{\omega \in E_\mathcal {B}} X(\omega ). \end{aligned}$$

Clearly, R is coherent and it follows from the definition of \(\mathcal {B}_R\) that \(\mathcal {B}_R= \mathcal {B}\).

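For the converse, suppose that R is a coherent betting function. We check that \(\mathcal {B}_R\) is a belief valuation, recalling from the proof of Theorem 3.1 that \(\mathrm{Buy}_R(X) \le \max X\), so that \(\mathrm{Buy}_R(1_A) \le 1\) for every A. Since \(\mathrm{Buy}_R(1_{\Omega })=1\), we have \(\mathcal {B}_R(\Omega )=1\). The second property in Definition 3.3 follows from the monotonicity of \(\mathrm{Buy}_R\): if \(X \le Y\), then Theorem 2.4 gives \(\mathrm{Buy}_R(Y) \ge \mathrm{Buy}_R(X) + \mathrm{Buy}_R(Y-X) \ge \mathrm{Buy}_R(X) + \min (Y-X) \ge \mathrm{Buy}_R(X)\). If \(\mathrm{Buy}_R(1_A)=\mathrm{Buy}_R(1_B)=1\), then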

$$\begin{aligned} \mathrm{Buy}_R(1_{A \cap B})= & {} \mathrm{Buy}_R(1_A + 1_B - 1_{A \cup B})\\\ge & {} \mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_B)+ \mathrm{Buy}_R(-1_{A \cup B}) \ge 1, \end{aligned}$$

so \(\mathrm{Buy}_R(1_{A \cap B})=1\). Finally, if \(\mathcal {B}_R(A)=1\), then since

$$\begin{aligned} 1= \mathrm{Buy}_R(1_A + 1_{A^c}) \ge \mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_{A^c}), \end{aligned}$$

it follows that \(\mathrm{Buy}_R(1_{A^c})=0\), and hence \(\mathcal {B}_R(A^c)=0\). \(\square \)

A belief valuation should be compared with the classical notion of a truth valuation. The definition of a truth valuation \(\mathcal {T}:2^{\Omega } \rightarrow \{0,1\}\) is similar to the definition of a belief valuation, the only difference being that in the first bullet, ‘if’ is replaced by ‘if and only if.’ Hence, a truth valuation is a special belief valuation, namely one that corresponds to a proper ultrafilter of sets. Truth valuations are precisely those belief valuations \(\mathcal {B}\) for which \(E_\mathcal {B}\) is a singleton. A major difference between truth and belief is that if an agent does not believe in A, this does not imply that he/she does believe in \(A^c\). As a result, the implication in the first bullet in the definition of a belief valuation goes in one direction only.

Given a belief valuation \(\mathcal {B}\) and a set \(S \subseteq \Omega \) for which \(\mathcal {B}(S)=1\), the agent fully believes that a bet \(X \in \mathcal {X}\) has a revenue of at least

$$\begin{aligned} \min _{\omega \in S} X(\omega ). \end{aligned}$$
(3.9)

Since this holds for all S for which \(\mathcal {B}(S)=1\), this leads to the definition of guaranteed revenue.

Definition 3.5

For any belief valuation \(\mathcal {B}\), the guaranteed revenue \(G_\mathcal{{B}}: \mathcal {X} \rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} G_\mathcal{{B}}(X)= \max _{A: \mathcal {B}(A) = 1} \min _{\omega \in A} X(\omega ). \end{aligned}$$

Since we have that \(\mathcal {B}(A)=1\) if and only if \(E_\mathcal {B} \subseteq A\), we can express the guaranteed revenue as

$$\begin{aligned} G_\mathcal{{B}}(X) = \min _{ \omega \in E_\mathcal {B}} X(\omega ). \end{aligned}$$
(3.10)
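The equality of Definition 3.5 and (3.10) can be confirmed by brute force. The Python sketch below, on \(\Omega = \{1,2,3,4\}\) with an arbitrary bet, computes the guaranteed revenue both as a max-min over all events A with \(\mathcal {B}(A)=1\) (using (3.7)) and as a minimum over \(E_\mathcal {B}\) (called `E` in the code), for every nonempty `E`. The function names are hypothetical.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def revenue_by_definition(E, x):
    """Definition 3.5, using (3.7): B(A) = 1 exactly when E is a subset of A."""
    return max(min(x[w] for w in A) for A in subsets(OMEGA) if E <= A)

def revenue_by_core(E, x):
    """Formula (3.10): the worst payoff over E."""
    return min(x[w] for w in E)

x = {1: 3.0, 2: -1.0, 3: 2.0, 4: 5.0}  # an arbitrary bet
for E in subsets(OMEGA):
    if E:  # belief valuations correspond to nonempty E
        assert revenue_by_definition(E, x) == revenue_by_core(E, x)
print(revenue_by_core(frozenset({1, 3}), x))  # prints: 2.0
```

The maximum in Definition 3.5 is attained at \(A = E_\mathcal {B}\) itself, because enlarging A can only lower the minimum.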

To benchmark and motivate our main result Theorem 3.10, we now first show how classical probability distributions can be characterized with the notion of guaranteed revenue.

Definition 3.6

A betting function R is P-consistent if and only if, for all \(X_1,\ldots ,X_N\) and \(Y_1,\ldots ,Y_M\) such that

$$\begin{aligned} G_\mathcal {B} \left( \sum _{i=1}^N X_i \right) \le G_\mathcal {B} \left( \sum _{j=1}^M Y_j \right) \end{aligned}$$
(3.11)

for every belief valuation \(\mathcal {B}\), we have

$$\begin{aligned} \sum _{i=1}^N \mathrm{Buy}_R(X_i) \le \sum _{j=1}^M \mathrm{Buy}_R(Y_j). \end{aligned}$$
(3.12)

Theorem 3.7

P is a probability distribution if and only if there exists a coherent and P-consistent R such that \(P(A)=\mathrm{Buy}_R(1_A)\).

In words, P-consistency says that if, under every belief valuation, the guaranteed revenue of the combined bets in one collection is at most that of the combined bets in a second collection, then the agent should be willing to pay at least as much in total for the second collection.

The proof below reveals that the statement of Theorem 3.7 is, strictly speaking, overly complicated. Indeed, if we left out \(G_{\mathcal {B}}\) everywhere, the ensuing result would still be true and probably easier to interpret: it would say that if one collection of bets always pays out at least as much as a second collection, an agent would be willing to pay at least as much for the first collection. We have chosen the current formulation because it points the way to the necessary changes.

Proof of Theorem 3.7

First suppose that R is coherent, P-consistent and that \(P(A)=\mathrm{Buy}_R(1_A)\). Suppose \(A \cap B = \emptyset \). Then

$$\begin{aligned} G_\mathcal {B}(1_A + 1_B) = G_\mathcal {B}(1_{A \cup B}) \end{aligned}$$
(3.13)

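for every \(\mathcal {B}\). Applying P-consistency twice, once with \(X_1=1_A\), \(X_2=1_B\), \(Y_1=1_{A \cup B}\) and once with the roles of the two collections exchanged, we obtain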

$$\begin{aligned} \mathrm{Buy}_R(1_A + 1_B) = \mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_B). \end{aligned}$$
(3.14)

Hence, \(P(A \cup B) = P(A) + P(B)\). Since R is coherent, we have \(P(\Omega )=\mathrm{Buy}_R(1_\Omega )=1\) and it follows that P is a probability distribution.

Now suppose that P is a probability distribution. We define R by

$$\begin{aligned} \mathrm{Buy}_R(X) := \sum _{\omega \in \Omega } X(\omega ) P(\{\omega \}). \end{aligned}$$
(3.15)

Clearly, \(\mathrm{Buy}_R(1_A)=P(A)\), and it follows from Theorem 2.4 that R is coherent. Now suppose that

$$\begin{aligned} G_\mathcal {B} \left( \sum _{i=1}^N X_i \right) \le G_\mathcal {B} \left( \sum _{j=1}^M Y_j \right) \end{aligned}$$
(3.16)

for every belief valuation \(\mathcal {B}\). For every \(\omega \in \Omega \), the map \(\mathcal {B}_\omega (A) := 1_A(\omega )\) is a belief valuation, with \(E_{\mathcal {B}_\omega } = \{\omega \}\). Note that \(G_{\mathcal {B}_\omega }(X) = X(\omega )\), so (3.16) implies that

$$\begin{aligned} \sum _{i=1}^N X_i \le \sum _{j=1}^M Y_j. \end{aligned}$$
(3.17)

Hence, by our definition of \(\mathrm{Buy}_R\):

$$\begin{aligned} \begin{aligned} \sum _{i=1}^N \mathrm{Buy}_R(X_i)&= \mathrm{Buy}_R \left( \sum _{i=1}^N X_i \right) \\&\le \mathrm{Buy}_R \left( \sum _{j=1}^M Y_j \right) \\&= \sum _{j=1}^M \mathrm{Buy}_R(Y_j). \end{aligned} \end{aligned}$$
(3.18)

So R is P-consistent. \(\square \)

Although we do not deny that the notion of P-consistency is in some sense a reasonable requirement for collections of bets, there is, from the point of view of epistemic probability, a problem with it. Whereas in (3.11) the guaranteed revenues of the combined bets are compared, in (3.12) the sums of the buy prices of the individual bets are compared. This observation goes to the heart of the problem with the use of classical probability distributions for epistemic purposes: the guaranteed revenue of the combined bets \(1_A\) and \(1_{A^c}\) is, of course, the same as the guaranteed revenue of the bet 1, under any belief valuation. But this should not have any direct implication for the maximum price for which an agent would be willing to buy \(1_A\) or \(1_{A^c}\) individually.
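The mismatch can be made concrete with a two-outcome sketch in Python (the labels are hypothetical). For each nonempty set \(E_\mathcal {B}\) (called `E` in the code), it prints the guaranteed revenue of the combined bet \(1_A + 1_{A^c} = 1\) next to the sum of the guaranteed revenues of \(1_A\) and \(1_{A^c}\) taken separately.

```python
OMEGA = frozenset({"f", "n"})  # hypothetical two-outcome example

def G(E, x):
    """Guaranteed revenue (3.10) under the belief valuation determined by E."""
    return min(x[w] for w in E)

def indicator(A):
    return {w: 1 if w in A else 0 for w in OMEGA}

A = frozenset({"f"})
Ac = OMEGA - A
combined = {w: indicator(A)[w] + indicator(Ac)[w] for w in OMEGA}  # the constant bet 1
for E in [frozenset({"f"}), frozenset({"n"}), OMEGA]:
    print(sorted(E), G(E, combined), G(E, indicator(A)) + G(E, indicator(Ac)))
```

The last line prints `['f', 'n'] 1 0`: when the agent's full belief is only in \(\Omega \), the combined bet is guaranteed to pay 1, while neither bet on its own is guaranteed to pay anything.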

This suggests that for epistemic purposes, we should change P-consistency in one of the following two ways: in (3.11) the sums of the guaranteed revenues of the individual bets should be compared, or in (3.12) the buy prices of the combined bets should be compared. The first option leads to our definition of B-consistency.

Definition 3.8

A betting function R is B-consistent if for all \(X_1,\ldots ,X_N\) and \(Y_1,\ldots ,Y_M\) such that

$$\begin{aligned} \sum _{i=1}^N G_\mathcal {B}(X_i) \le \sum _{j=1}^M G_\mathcal {B}(Y_j) \end{aligned}$$
(3.19)

for every belief valuation \(\mathcal {B}\), we have

$$\begin{aligned} \sum _{i=1}^N \mathrm{Buy}_R(X_i) \le \sum _{j=1}^M \mathrm{Buy}_R(Y_j). \end{aligned}$$
(3.20)

There is a simple reason for this choice. The second option would result in the constraint that if \(G_\mathcal {B}(\sum _i X_i) \le G_\mathcal {B}(\sum _j Y_j)\) for every belief valuation \(\mathcal {B}\), then \(\mathrm{Buy}_R(\sum _i X_i) \le \mathrm{Buy}_R(\sum _j Y_j)\). This constraint, however, is already contained in B-consistency: it is the special case \(N=M=1\), applied to the bets \(\sum _i X_i\) and \(\sum _j Y_j\).

Hence, the notion of B-consistency differs from P-consistency in the sense that we compare the correct quantities: we compare sums of guaranteed revenues of individual (collective) bets to sums of buy prices of individual (collective) bets.

The next example illustrates that not every coherent betting function is B-consistent.

Example 3.9

Suppose \(\Omega = \{1,2,3,4\}\) and that R is given by

$$\begin{aligned} \mathrm{Buy}_R(X) = \min \left\{ \frac{1}{2} \sum _{i=1}^2 X(i), \frac{1}{4} \sum _{i=1}^4 X(i) \right\} . \end{aligned}$$
(3.21)

It is easy to check that R is coherent, and that for all belief valuations \(\mathcal {B}\) we have

$$\begin{aligned} G_\mathcal {B}(1_{\{2,3,4\}}) + G_\mathcal {B}(1_{\{2\}}) \ge G_\mathcal {B}(1_{\{2,3\}}) + G_\mathcal {B}(1_{\{2,4\}}), \end{aligned}$$
(3.22)

but

$$\begin{aligned} \mathrm{Buy}_R(1_{\{2,3,4\}}) + \mathrm{Buy}_R(1_{\{2\}}) = \frac{3}{4} < 1 = \mathrm{Buy}_R(1_{\{2,3\}}) + \mathrm{Buy}_R(1_{\{2,4\}}). \end{aligned}$$
(3.23)
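The numbers in Example 3.9 can be reproduced exactly with fractions. The Python sketch below implements (3.21) for integer-valued bets, checks (3.22) over every nonempty set \(E_\mathcal {B}\) (called `E` in the code), and evaluates both sides of (3.23). Taking \(X_1=1_{\{2,3\}}\), \(X_2=1_{\{2,4\}}\), \(Y_1=1_{\{2,3,4\}}\), \(Y_2=1_{\{2\}}\) in Definition 3.8 shows that the hypothesis (3.19) holds while the conclusion (3.20) fails. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = (1, 2, 3, 4)

def buy(x):
    """Buy_R from (3.21); x maps each outcome to an integer payoff."""
    first_two = Fraction(x[1] + x[2], 2)
    all_four = Fraction(sum(x[w] for w in OMEGA), 4)
    return min(first_two, all_four)

def ind(*A):
    """The bet 1_A for A given as a list of outcomes."""
    return {w: int(w in A) for w in OMEGA}

def G(E, x):
    """Guaranteed revenue (3.10)."""
    return min(x[w] for w in E)

lhs_bets = [ind(2, 3, 4), ind(2)]
rhs_bets = [ind(2, 3), ind(2, 4)]

cores = [frozenset(c) for k in range(1, 5) for c in combinations(OMEGA, k)]
# (3.22) for every belief valuation, i.e. for every nonempty E.
assert all(sum(G(E, x) for x in lhs_bets) >= sum(G(E, y) for y in rhs_bets) for E in cores)

print(sum(buy(x) for x in lhs_bets), sum(buy(y) for y in rhs_bets))  # prints: 3/4 1
```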

With the notion of B-consistency we can now state and prove our main result. The following theorem constitutes our behavioral interpretation of belief functions. It shows that only B-consistency is needed to guarantee that a lower prevision in the sense of Walley [16] is in fact a belief function and that every belief function can be obtained this way. The result legitimizes the use of belief functions for modeling epistemic probability.

Theorem 3.10

\(\mathrm{Bel}\) is a belief function if and only if there exists a coherent and B-consistent R such that \(\mathrm{Bel}(A) = \mathrm{Buy}_R(1_A)\) for all \(A \subseteq \Omega \).

This result follows immediately from the following theorem, which is interesting in its own right and characterizes B-consistency for coherent betting functions.

Theorem 3.11

Let R be a coherent betting function. Then R is B-consistent if and only if there is a basic belief assignment m such that

$$\begin{aligned} \mathrm{Buy}_R(X)=\sum _{S \subseteq \Omega } m(S) \min _{\omega \in S} X(\omega ) \end{aligned}$$
(3.24)

for all \(X \in \mathcal {X}\).

The expression in (3.24) is known as the Choquet integral of X with respect to the belief function that corresponds to m (see for instance [7]), and is also called the lower expectation of X (see for instance [13]). Hence, another way of phrasing Theorem 3.11 is that R is B-consistent if and only if \(\mathrm{Bel}(A) := \mathrm{Buy}_R(1_A)\) is a belief function and \(\mathrm{Buy}_R(X)\) is given by the Choquet integral of X with respect to \(\mathrm{Bel}\). We now give the proof.
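As a small illustration of (3.24) and of the identity \(\mathrm{Buy}_R(1_A)=\mathrm{Bel}(A)\), with \(\mathrm{Bel}(A)=\sum _{S\subseteq A} m(S)\), here is a Python sketch. The basic belief assignment `m` on \(\Omega =\{1,2,3\}\) is hypothetical and chosen only for the example; focal sets are stored as frozensets and the helper names are our own.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3})

# Hypothetical basic belief assignment: masses sum to 1, no mass on the empty set.
m = {
    frozenset({1}): Fraction(1, 4),
    frozenset({2, 3}): Fraction(1, 4),
    OMEGA: Fraction(1, 2),
}

def buy(X, m):
    """Buy price (3.24): sum over focal sets S of m(S) * min_{w in S} X(w)."""
    return sum(mass * min(X[w] for w in S) for S, mass in m.items())

def bel(A, m):
    """Belief of A: total mass of the focal sets contained in A."""
    return sum(mass for S, mass in m.items() if S <= A)

def indicator(A):
    return {w: 1 if w in A else 0 for w in OMEGA}

def subsets(points):
    pts = sorted(points)
    for r in range(len(pts) + 1):
        for S in combinations(pts, r):
            yield frozenset(S)

# Buy_R(1_A) = Bel(A) for every event A, including the empty set.
assert all(buy(indicator(A), m) == bel(A, m) for A in subsets(OMEGA))

X = {1: 4, 2: 0, 3: 2}
print(buy(X, m))  # 1/4*4 + 1/4*0 + 1/2*0 = 1
```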

Proof of Theorem 3.11

First suppose that there is a basic belief assignment m such that

$$\begin{aligned} \mathrm{Buy}_R(X)=\sum _{S \subseteq \Omega } m(S) \min _{\omega \in S} X(\omega ) \end{aligned}$$

for all \(X \in \mathcal {X}\). Also suppose that for \(X_1,\ldots ,X_N,Y_1,\ldots ,Y_M \in \mathcal {X}\) we have

$$\begin{aligned} \sum _{i=1}^N \min _{\omega \in S} X_i(\omega ) \le \sum _{j=1}^M \min _{\omega \in S} Y_j(\omega ) \end{aligned}$$
(3.25)

for every nonempty \(S \subseteq \Omega \). Then

$$\begin{aligned} \begin{aligned} \sum _{i=1}^N \mathrm{Buy}_R(X_i)&= \sum _{i=1}^N \sum _{S \subseteq \Omega } m(S) \min _{\omega \in S} X_i(\omega ) \\&= \sum _{S \subseteq \Omega } m(S) \sum _{i=1}^N \min _{\omega \in S} X_i(\omega ) \\&\le \sum _{S \subseteq \Omega } m(S) \sum _{j=1}^M \min _{\omega \in S} Y_j(\omega ) \\&= \sum _{j=1}^M \mathrm{Buy}_R(Y_j). \end{aligned} \end{aligned}$$
(3.26)

Hence, R is B-consistent.
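The first half of the proof can be replayed numerically. The Python sketch below uses a hypothetical basic belief assignment on \(\Omega =\{1,2,3,4\}\); it tests condition (3.25) on every nonempty \(S\) for the two collections of indicator bets from Example 3.9 (the smaller side as the \(X_i\), the larger as the \(Y_j\)) and then compares the total buy prices given by (3.24). As (3.26) predicts, the inequality carries over. The helper `satisfies_3_25` is our own naming.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = (1, 2, 3, 4)

# Hypothetical basic belief assignment: masses sum to 1, no mass on the empty set.
m = {
    frozenset({2}): Fraction(1, 4),
    frozenset({3, 4}): Fraction(1, 4),
    frozenset({2, 3}): Fraction(1, 4),
    frozenset(OMEGA): Fraction(1, 4),
}

def buy(X):
    """Buy price of the form (3.24)."""
    return sum(mass * min(X[w] for w in S) for S, mass in m.items())

def indicator(A):
    return {w: 1 if w in A else 0 for w in OMEGA}

def nonempty_subsets(points):
    for r in range(1, len(points) + 1):
        yield from combinations(points, r)

def satisfies_3_25(Xs, Ys):
    """Condition (3.25): sum_i min_S X_i <= sum_j min_S Y_j for every nonempty S."""
    return all(
        sum(min(X[w] for w in S) for X in Xs) <= sum(min(Y[w] for w in S) for Y in Ys)
        for S in nonempty_subsets(OMEGA)
    )

Xs = [indicator({2, 3}), indicator({2, 4})]
Ys = [indicator({2, 3, 4}), indicator({2})]
total_X = sum(buy(X) for X in Xs)
total_Y = sum(buy(Y) for Y in Ys)
print(satisfies_3_25(Xs, Ys), total_X, total_Y)  # True 3/4 1
```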

For the converse of the theorem, suppose that R is B-consistent. First, we show that \(\mathrm{Bel}(A):=\mathrm{Buy}_R(1_A)\) is a belief function. Clearly, we have

$$\begin{aligned} G_\mathcal {B}(1_A) = \mathcal {B}(A), \end{aligned}$$
(3.27)

and since \(\mathcal {B}\) is a belief function, we have, for all \(N \ge 1\) and \(A_1,\ldots ,A_N \subseteq \Omega \),

$$\begin{aligned} \mathcal {B}\left( \bigcup _{i=1}^N A_i \right) + \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset , |I| \mathrm {\;even} \end{array}} \mathcal {B} \left( \bigcap _{i \in I} A_i \right) \ge \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset , |I| \mathrm {\;odd} \end{array}} \mathcal {B} \left( \bigcap _{i \in I} A_i \right) . \end{aligned}$$
(3.28)

So by B-consistency, we have

$$\begin{aligned} \mathrm{Bel}\left( \bigcup _{i=1}^N A_i \right) + \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset , |I| \mathrm {\;even} \end{array}} \mathrm{Bel}\left( \bigcap _{i \in I} A_i \right) \ge \sum _{\begin{array}{c} I \subseteq \{1,\ldots ,N\} \\ I \not = \emptyset , |I| \mathrm {\;odd} \end{array}} \mathrm{Bel}\left( \bigcap _{i \in I} A_i \right) , \end{aligned}$$
(3.29)

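and, since coherence of R also gives \(\mathrm{Bel}(\emptyset ) = \mathrm{Buy}_R(1_\emptyset ) = 0\) and \(\mathrm{Bel}(\Omega ) = \mathrm{Buy}_R(1_\Omega ) = 1\), it follows that \(\mathrm{Bel}\) is a belief function.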
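For a concrete feel for the inequality (3.29), the following Python sketch evaluates both of its sides for a belief function built from a hypothetical basic belief assignment via \(\mathrm{Bel}(A)=\sum _{S\subseteq A} m(S)\), and for three events \(A_1, A_2, A_3\) chosen only for illustration. The function name `monotonicity_sides` is our own.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})

# Hypothetical basic belief assignment: masses sum to 1, no mass on the empty set.
m = {
    frozenset({2}): Fraction(1, 4),
    frozenset({3, 4}): Fraction(1, 4),
    frozenset({2, 3}): Fraction(1, 4),
    OMEGA: Fraction(1, 4),
}

def bel(A):
    return sum(mass for S, mass in m.items() if S <= A)

def monotonicity_sides(sets):
    """Left- and right-hand sides of (3.29) for events A_1, ..., A_N."""
    n = len(sets)
    union = frozenset().union(*sets)
    even = odd = Fraction(0)
    for r in range(1, n + 1):
        for I in combinations(range(n), r):
            inter = frozenset.intersection(*(sets[i] for i in I))
            if r % 2 == 0:
                even += bel(inter)
            else:
                odd += bel(inter)
    return bel(union) + even, odd

events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
left, right = monotonicity_sides(events)
print(left, right)  # 5/4 1
```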

Since \(\mathrm{Bel}\) is a belief function, there is a basic belief assignment \(m: 2^\Omega \rightarrow [0,1]\) with \(m(\emptyset )=0\) such that

$$\begin{aligned} \mathrm{Bel}(A) = \sum _{S \subseteq A} m(S). \end{aligned}$$
(3.30)

Now let \(X \in \mathcal {X}\). We write \(X(\Omega ) = \{y_1,\ldots ,y_K\}\) where \(y_1< y_2< \ldots < y_K\). Set \(y_0:=0\). It is well known that the Choquet integral of X can be expressed as

$$\begin{aligned} \sum _{S \subseteq \Omega } m(S) \min _{\omega \in S} X(\omega ) = \sum _{j=1}^K (y_j - y_{j-1}) \mathrm{Bel}( \{X \ge y_j\}). \end{aligned}$$
(3.31)
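The identity (3.31) can be checked on small cases. In the Python sketch below the basic belief assignment on \(\Omega =\{1,2,3\}\) and the two bets are hypothetical; the second bet takes a negative value, which shows why the convention \(y_0 := 0\) makes the sum telescope correctly.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3})

# Hypothetical basic belief assignment: masses sum to 1, no mass on the empty set.
m = {frozenset({1}): Fraction(1, 4), frozenset({2, 3}): Fraction(1, 4), OMEGA: Fraction(1, 2)}

def bel(A):
    return sum(mass for S, mass in m.items() if S <= A)

def choquet_by_masses(X):
    """Left-hand side of (3.31): sum_S m(S) * min_{w in S} X(w)."""
    return sum(mass * min(X[w] for w in S) for S, mass in m.items())

def choquet_by_levels(X):
    """Right-hand side of (3.31): sum_j (y_j - y_{j-1}) Bel({X >= y_j}), with y_0 := 0."""
    total, prev = Fraction(0), 0
    for y in sorted(set(X.values())):
        total += (y - prev) * bel(frozenset(w for w in OMEGA if X[w] >= y))
        prev = y
    return total

for X in ({1: 4, 2: 0, 3: 2}, {1: -1, 2: 3, 3: 5}):
    print(choquet_by_masses(X), choquet_by_levels(X))
# 1 1
# 0 0
```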

For every nonempty \(S \subseteq \Omega \) we have

$$\begin{aligned} \begin{aligned} \sum _{k=1}^K \min _{\omega \in S} (y_k - y_{k-1}) 1_{\{X \ge y_k\}}(\omega )&= \sum _{k=1}^K (y_k - y_{k-1}) 1( S \subseteq \{X \ge y_k \}) \\&= \min _{\omega \in S} X(\omega ). \end{aligned} \end{aligned}$$
(3.32)
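Both equalities in (3.32) concern only the bet X and the set S, not the belief function. The Python sketch below checks them for every nonempty \(S\) on one hypothetical bet. Note that the first layer \(\{X \ge y_1\}\) is all of \(\Omega \), so its possibly negative coefficient \(y_1 - y_0\) causes no trouble, while every later coefficient is strictly positive. The helper names `layers` and `check_3_32` are our own.

```python
from itertools import combinations

def layers(X):
    """Pairs (y_k - y_{k-1}, {X >= y_k}) for k = 1, ..., K, with y_0 := 0."""
    out, prev = [], 0
    for y in sorted(set(X.values())):
        out.append((y - prev, frozenset(w for w in X if X[w] >= y)))
        prev = y
    return out

def check_3_32(X):
    """Check both equalities in (3.32) for every nonempty S."""
    lay = layers(X)
    outcomes = sorted(X)
    for r in range(1, len(outcomes) + 1):
        for S in map(frozenset, combinations(outcomes, r)):
            # sum_k min_{w in S} (y_k - y_{k-1}) 1_{X >= y_k}(w)
            first = sum(min(step * (w in level) for w in S) for step, level in lay)
            # sum_k (y_k - y_{k-1}) 1(S subset of {X >= y_k})
            middle = sum(step for step, level in lay if S <= level)
            if not (first == middle == min(X[w] for w in S)):
                return False
    return True

print(check_3_32({1: -1, 2: 3, 3: 5, 4: 3}))  # True
```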

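So by B-consistency, applied in both directions (once with the bets \((y_k - y_{k-1}) 1_{\{X \ge y_k\}}\) on the left-hand side and X on the right-hand side, and once the other way around), we find that

$$\begin{aligned} \sum _{k=1}^K \mathrm{Buy}_R((y_k - y_{k-1}) 1_{\{X \ge y_k\}}) = \mathrm{Buy}_R(X). \end{aligned}$$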
(3.33)

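By coherence, \(\mathrm{Buy}_R\) is positively homogeneous and satisfies \(\mathrm{Buy}_R(c)=c\) for every constant bet c; the latter handles the term \(k=1\), since \((y_1-y_0)1_{\{X \ge y_1\}}\) is the constant \(y_1\), which may be negative. Hence \(\mathrm{Buy}_R((y_k - y_{k-1}) 1_{\{X \ge y_k\}}) = (y_k - y_{k-1})\mathrm{Bel}(\{X \ge y_k\})\) for every k, and it follows from (3.31) and (3.33) that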

$$\begin{aligned} \mathrm{Buy}_R(X) = \sum _{S \subseteq \Omega } m(S) \min _{\omega \in S} X(\omega ). \end{aligned}$$
(3.34)

\(\square \)

4 Discussion

We have first argued that the classical axioms of Kolmogorov are not suitable for modeling epistemic probability. This has been known for a long time, and researchers such as Glenn Shafer and Peter Walley have developed, respectively, a general theory of belief functions and a general theory of lower previsions as alternatives to classical probability theory in an epistemological context.

The theory of belief functions has never been taken up in mainstream probability theory. One of the reasons for this is the lack of a clear behavioral interpretation of belief functions, analogous to the betting interpretation of probabilities. In this paper, we have developed such a behavioral interpretation. We have embedded belief functions in a betting context, and we have shown that belief functions arise precisely when we add B-consistency to the coherent lower previsions in the theory of Walley. In this way, not only do we provide a behavioral interpretation of belief functions, but we also make a natural connection between the theory of Walley and the theory of Shafer.

Of course, adding B-consistency calls for an argument why a rational agent should adhere to it. We think it is uncontroversial that avoiding guaranteed losses (Theorem 3.1) is the bottom line for any reasonable constraint. Beyond that, things are of course debatable. Our argument for restricting the lower previsions of Walley to B-consistent ones derives from the way we obtained B-consistency, namely by altering P-consistency in a very reasonable way. If a collection of bets is in some sense guaranteed to be better than another collection of bets, then its total price should not be lower. The question is how ‘better’ should be formalized. Compared to P-consistency, B-consistency allows for all the flexibility of epistemic probability that we asked for in the introduction. Indeed, it is possible to set \(m(\Omega )=1\), which corresponds to total ignorance apart from the fact that the outcome is in \(\Omega \). We think it is rational to compare collections of bets from the viewpoint of total guaranteed revenue, and to be willing to pay more when this quantity increases. As such, we think that B-consistency is a very reasonable constraint for a rational agent to adhere to.
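To make the total-ignorance case concrete, here is a minimal Python sketch on a three-point \(\Omega \) chosen only for illustration. With \(m(\Omega )=1\), every event other than \(\Omega \) gets belief 0, so disbelief in A, \(\mathrm{Bel}(A^c)\), and lack of belief, \(1-\mathrm{Bel}(A)\), come apart, and the buy price (3.24) of any bet reduces to its worst-case payoff. The bet `X` and the event `A` are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3})
vacuous = {OMEGA: Fraction(1)}  # m(Omega) = 1: total ignorance

def bel(A, m):
    """Total mass of the focal sets contained in A."""
    return sum(mass for S, mass in m.items() if S <= A)

def buy(X, m):
    """Buy price of the form (3.24)."""
    return sum(mass * min(X[w] for w in S) for S, mass in m.items())

A = frozenset({1})
disbelief = bel(OMEGA - A, vacuous)   # Bel(A^c)
lack_of_belief = 1 - bel(A, vacuous)  # 1 - Bel(A)
X = {1: 5, 2: -2, 3: 7}
print(disbelief, lack_of_belief, buy(X, vacuous))  # 0 1 -2
```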

But there is, of course, more to it than philosophy. We also want a theory that is practical and relatively easy to apply. Belief functions are close to classical probabilities since they are determined by basic belief assignments. As such, we think they are much more practical and easier to use than coherent lower previsions. Indeed, in practical situations, one does not directly use constraints such as coherence or B-consistency to construct a buy function. In the case of belief functions, one typically proceeds by constructing a suitable basic belief assignment; see for instance our paper [8], in which we apply belief functions in the classical forensic context of the so-called island problem, precisely by setting up appropriate basic belief assignments. This is very much akin to the classical situation: not many people use characterizations like P-consistency (or related characterizations) to set up a probability measure, but it is reassuring that such characterizations exist. Hence, we should view B-consistency as a theoretical justification for using belief functions, not as a practical tool for constructing them.

Last but not least, we mention conditional belief functions, a notion that we have not introduced in the current paper. Shafer originally based his notion of conditional belief on the so-called Dempster rule of combination, a rule that has been fiercely criticized; see, e.g., [4, 9]. In a forthcoming paper, we will discuss conditional beliefs from the perspective of the current betting interpretation. It turns out that there are various ways to do this, depending on the precise epistemological situation, and this adds to the suitability of belief functions for modeling epistemic uncertainty. Another natural line of research is to further develop the theory of belief functions on infinite spaces, an enterprise that has already received some attention in, e.g., [15].