A behavioral interpretation of belief functions

Shafer's belief functions were introduced in the 1970s as a mathematical tool for modeling epistemic probability. One of the reasons that they were not picked up by mainstream probability theory was the lack of a behavioral interpretation. In this paper we provide such a behavioral interpretation, and re-derive Shafer's belief functions via a betting interpretation reminiscent of the classical Dutch Book Theorem for probability distributions. We relate our betting interpretation of belief functions to the existing literature.


Introduction
Modern mainstream probability theory, be it aleatory or epistemic, is almost exclusively based on the axioms of Kolmogorov, and we start our exposition by quoting these axioms. We fix a finite set $\Omega$ of outcomes.

Definition 1.1. A function $P : 2^\Omega \to [0, 1]$ is a probability distribution if
(P1) $P(\Omega) = 1$;
(P2) $P(A \cup B) = P(A) + P(B)$ for all disjoint $A, B \subseteq \Omega$. (1.1)

From an aleatory point of view, the axioms are often justified via a frequentistic interpretation of probabilities. In such a frequentistic interpretation, we take relative frequencies in repeated experiments as the motivation and justification of the axioms.
From an epistemic point of view, the probability of an event $A$ is often interpreted as the degree of belief an agent has in $A$. This degree of belief can be quantified as the price for which the agent is willing to both buy and sell a bet that pays out 1 if $A$ turns out to be true. As is well known, the Dutch Book argument shows that this interpretation also leads to the Kolmogorov axiomatization.
However, degrees of belief cannot always be satisfactorily described with the classical axioms of Kolmogorov, something which has been recognized and confirmed by many researchers from such different disciplines as mathematics [11,12,13,1,16], legal science [1], and philosophy (see [6,3,4,16] and references therein). These authors have argued that the classical axioms of probability are too restrictive for at least two reasons:

1. It is impossible for an agent to distinguish between disbelief in $A$, by which we mean $P(A^c)$, and lack of belief, by which we mean $1 - P(A)$. Especially when we interpret degree of belief in $A$ as the degree to which an agent has supporting evidence for $A$, lack of supporting evidence for $A$ is not necessarily supporting evidence for $A^c$.
2. It is impossible for an agent to suspend all judgment, that is, assigning $P(A) = P(A^c) = 0$ is impossible. But certainly it is possible to have no supporting evidence at all for either $A$ or $A^c$.

As an example, consider a situation in which a man claims to be the father of a certain child. A DNA test may or may not rule out the man as a potential father of the child, but in any classical probabilistic computation one has to start with a prior probability for the man to be the father. Often one takes a fifty-fifty prior, but clearly this does not correspond to our knowledge. Since we have no evidence for either paternity or non-paternity, it would be more reasonable to assign zero prior belief to both possibilities, something which is impossible under the Kolmogorov axioms. (Uniform prior probabilities are also problematic from another point of view: lumping states together or changing the scale does not preserve uniformity, as is well known, see e.g. [11].)

These and similar considerations motivated Glenn Shafer [11] to introduce belief functions, which were supposed to better capture the nature of epistemic probability. To see how belief functions differ from classical probability distributions, we note that (P2) can be expanded, for any collection of events $A_1, \ldots, A_N$, into the well-known inclusion-exclusion formula

$$P\Big(\bigcup_{i=1}^{N} A_i\Big) = \sum_{\emptyset \neq I \subseteq \{1, \ldots, N\}} (-1)^{|I|+1} P\Big(\bigcap_{i \in I} A_i\Big). \quad (1.2)$$

In the Shafer axiomatization of belief functions, (1.2) is replaced by a corresponding inequality: a belief function is a function $\mathrm{Bel} : 2^\Omega \to [0, 1]$ with $\mathrm{Bel}(\emptyset) = 0$ and $\mathrm{Bel}(\Omega) = 1$ such that, for any collection of events $A_1, \ldots, A_N$,

$$\mathrm{Bel}\Big(\bigcup_{i=1}^{N} A_i\Big) \geq \sum_{\emptyset \neq I \subseteq \{1, \ldots, N\}} (-1)^{|I|+1} \mathrm{Bel}\Big(\bigcap_{i \in I} A_i\Big).$$
One may at this point already loosely argue that the Shafer axioms are indeed suitable for describing degrees of belief. Consider the case $N = 2$, in which the inequality reads $\mathrm{Bel}(A \cup B) \geq \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B)$. If we have supporting evidence for either $A$ or $B$, then of course we also have supporting evidence for $A \cup B$. However, supporting evidence for both $A$ and $B$ is then counted twice and must be subtracted on the right-hand side. Now notice that it is possible that there is supporting evidence for $A \cup B$, but not for $A$ or $B$ individually, leading to the inequality in the formula.
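Shafer showed that belief functions have the following characterization. A function $m : 2^\Omega \to [0, 1]$ is called a basic belief assignment if $m(\emptyset) = 0$ and $\sum_{C \subseteq \Omega} m(C) = 1$. A function $\mathrm{Bel}$ is a belief function if and only if there is a basic belief assignment $m$ such that $\mathrm{Bel}(A) = \sum_{C \subseteq A} m(C)$ for all $A \subseteq \Omega$. There is a one-to-one correspondence between belief functions and basic belief assignments, and $\mathrm{Bel}$ is a probability distribution if and only if $m$ concentrates on singletons.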
In many situations, there is a natural candidate for the basic belief assignment m, especially when the latter is obtained through a classical probabilistic experiment.
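As a minimal sketch of the correspondence between basic belief assignments and belief functions, the following Python snippet computes Bel from m via the standard formula Bel(A) = sum of m(C) over all C contained in A. The two-element frame echoes the paternity example above; the assignments `m_vacuous` and `m_uniform` and all function names are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def powerset(omega):
    """All subsets of omega as frozensets, including the empty set."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(m, a):
    """Bel(A): total mass of the sets C with C contained in A."""
    a = frozenset(a)
    return sum(mass for c, mass in m.items() if c <= a)

def is_probability(m):
    """Bel is a probability distribution iff all mass sits on singletons."""
    return all(len(c) == 1 for c, mass in m.items() if mass > 0)

def satisfies_pairwise_inequality(m, omega, tol=1e-12):
    """Check Bel(A u B) >= Bel(A) + Bel(B) - Bel(A n B) for all A, B."""
    subsets = powerset(omega)
    return all(
        bel(m, a | b) >= bel(m, a) + bel(m, b) - bel(m, a & b) - tol
        for a in subsets for b in subsets
    )

omega = frozenset({"father", "not_father"})

# Assumed example 1: no evidence either way, all mass on the whole frame.
m_vacuous = {omega: 1.0}
# Assumed example 2: a fifty-fifty prior, all mass on singletons.
m_uniform = {frozenset({"father"}): 0.5, frozenset({"not_father"}): 0.5}

for name, m in [("vacuous", m_vacuous), ("uniform", m_uniform)]:
    values = {tuple(sorted(a)): bel(m, a) for a in powerset(omega)}
    print(name, values, "probability:", is_probability(m))
```

Under the vacuous assignment both "father" and "not_father" receive belief zero while the whole frame receives belief one, which is exactly the suspension of judgment that a probability distribution cannot express.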
Although belief functions are used in certain applied settings (see [2] for a recent text), researchers in mainstream mathematics have essentially stayed away from them. There are, roughly speaking, two reasons for this. First, there were many critics (see the references in [12]) for whom a theory about uncertainty without a behavioral (betting) interpretation was unacceptable.
A second major point of concern is formulated in [10] and [5]. In both references, the main reason to reject Shafer's belief functions is that the calculus of these belief functions, as put forward in the so-called Dempster rule of combination to combine two belief functions into a new one, would not be well founded or motivated, and would lead to unacceptable results.
The main goal of the current paper is to show how belief functions arise from a natural betting interpretation, thereby taking away the first major point of concern. We already mentioned above that classically, the degree of belief an agent has in an event A is interpreted as the price for which he or she is willing to both buy and sell a bet that pays out 1 if A turns out to be true. We argue, however, that the degree of belief an agent has in an event A should be interpreted as the maximum price for which he or she is willing to buy (not necessarily sell) a bet that pays out 1 if A turns out to be true. In our approach, the difference between buying and selling is interpreted as the difference between disbelief and lack of belief that we mentioned above. The distinction between buying and selling prices also appears in the theory of Peter Walley [16], and below we comment on the relation between his general approach and our theory.
We do not further comment on Dempster's rule of combination for the simple reason that we have no need for this rule, and we develop the theory without it.
Other characterizations of belief functions exist in the literature, for instance in the work of Smets [14]. We think our approach is more direct than his, and as a result our characterization is much more concise (he has 10 requirements). Moreover, unlike Smets, we work in a betting interpretation, so that our result can be seen as an analogue of the Dutch Book Theorem.
The paper is organized as follows. In Section 2 we develop our betting theory, and in Section 3 we state and prove the betting interpretation of belief functions. Section 3 contains our main result, Theorem 3.10.

Betting functions and degrees of belief
We approach belief behaviorally, working under the assumption that the degree to which someone believes an event can be measured by his or her willingness to accept bets. To arrive at a definition that we can work with mathematically, we introduce betting functions.

Definition 2.1. A bet (or gamble) on $\Omega$ is a function $X : \Omega \to \mathbb{R}$. We write $\mathcal{X} = \mathbb{R}^\Omega$ for the collection of all bets on $\Omega$. A betting function is a function $R : \mathcal{X} \to \{0, 1\}$ such that for each $X \in \mathcal{X}$, there is an $\alpha_0 \in \mathbb{R}$ such that $R(X + \alpha) = 0$ for $\alpha < \alpha_0$ and $R(X + \alpha) = 1$ for $\alpha \geq \alpha_0$.
We interpret $R$ as a function that indicates, for each bet $X \in \mathcal{X}$, whether or not an agent is willing to accept $X$, where we interpret $R(X) = 1$ as 'willing to accept the bet' and $R(X) = 0$ as 'not willing to accept the bet'. The definition of a betting function justifies the following definition.

Definition 2.2. Let $R$ be a betting function. The buy function $\mathrm{Buy}_R : \mathcal{X} \to \mathbb{R}$ is given by

$$\mathrm{Buy}_R(X) := \max\{\alpha \in \mathbb{R} : R(X - \alpha) = 1\}.$$

This is the maximum price an agent is willing to pay for a bet which pays out $X$. The sell function $\mathrm{Sell}_R : \mathcal{X} \to \mathbb{R}$ is given by

$$\mathrm{Sell}_R(X) := \min\{\alpha \in \mathbb{R} : R(\alpha - X) = 1\}.$$

This is the minimum price for which an agent is willing to sell the bet which pays out $X$.
The buy and sell functions are related by

$$\mathrm{Sell}_R(X) = -\mathrm{Buy}_R(-X).$$

This shows how buy and sell functions are dual in the sense that $\mathrm{Buy}_R$ is completely determined by $\mathrm{Sell}_R$ and vice versa. This relation between Buy and Sell is precisely the relation between lower and upper previsions in [16], but in our case the relation follows from the underlying concept of a betting function. Note that we can recover $R$ from $\mathrm{Buy}_R$, since $R(X) = 1$ if and only if $\mathrm{Buy}_R(X) \geq 0$.

Definition 2.3. A betting function $R$ is coherent if it satisfies the following conditions:
(R1) if $X(\omega) > 0$ for all $\omega \in \Omega$, then $R(X) = 1$;
(R2) $R(\lambda X) = R(X)$ for all $X \in \mathcal{X}$ and all $\lambda > 0$;
(R3) if $R(X) = R(Y) = 1$, then $R(X + Y) = 1$.

These conditions do not necessarily capture everything a reasonable agent should adhere to, and for rational behavior more is needed, as we will see. However, the conditions formulate the very basics. Condition (R1) says that agents should be willing to accept bets with guaranteed positive results. Condition (R2) says that willingness to accept bets should only depend on the relative sizes of the results, but not on their absolute sizes. Condition (R3) says that if an agent is willing to accept two bets, (s)he should be willing to accept both bets simultaneously. We want to point out that both conditions (R2) and (R3) have a debatable consequence: if an agent is willing to accept a bet in which (s)he wins 1 euro if A is true and loses 1 euro if A is false, then (s)he should be willing to accept a bet in which (s)he wins 1 million euro if A is true and loses 1 million euro if A is false. In the real world, one could think of all kinds of reasons why an agent would not want to accept the second bet, even if (s)he is willing to accept the first one. By imposing coherence, we thus consider highly idealized agents.
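The three conditions can be spot-checked numerically for a concrete agent. The sketch below reads (R1) as "strictly positive bets are accepted", (R2) as invariance under positive scaling, and (R3) as closure under addition, following the verbal descriptions above. The agent itself is an assumption made for illustration: it accepts a bet when the mass-weighted worst-case payoff under a hypothetical basic belief assignment `M` is non-negative. Exact rational arithmetic is used so that no decision depends on rounding.

```python
import random
from fractions import Fraction

OMEGA = (1, 2, 3)

# Hypothetical basic belief assignment with exact rational masses.
M = {
    frozenset({1}): Fraction(1, 4),
    frozenset({2, 3}): Fraction(1, 4),
    frozenset(OMEGA): Fraction(1, 2),
}

def accepts(bet):
    """Assumed agent: accept iff the mass-weighted worst case is >= 0."""
    return sum(mass * min(bet[w] for w in s) for s, mass in M.items()) >= 0

def random_bet(rng):
    """A bet with integer payoffs between -5 and 5 on each outcome."""
    return {w: rng.randint(-5, 5) for w in OMEGA}

def spot_check(trials=2000, seed=0):
    """Return True if no violation of (R1)-(R3) is found on random bets."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = random_bet(rng), random_bet(rng)
        lam = rng.randint(1, 10)
        # (R1): bets with strictly positive payoffs are accepted.
        if min(x.values()) > 0 and not accepts(x):
            return False
        # (R2): scaling by lam > 0 does not change the decision.
        if accepts({w: lam * v for w, v in x.items()}) != accepts(x):
            return False
        # (R3): two accepted bets are also accepted together.
        both = {w: x[w] + y[w] for w in OMEGA}
        if accepts(x) and accepts(y) and not accepts(both):
            return False
    return True

print(spot_check())
```

A passing spot check is of course no proof of coherence; it only illustrates what each condition demands of the acceptance decisions.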
The following result says that Buy R is coherent as a lower prevision in the sense of Walley [16] if and only if R is coherent in the sense of Definition 2.3.
The second property tells us that, for every λ > 0, we have By the third property, we have The second and third property of Theorem 2.4 tell us that Buy R is a super-linear functional if R is coherent. Coherence of R can of course also be captured in terms of Sell R : Proof. This follows directly from Theorem 2.4 and the relation between Buy R and Sell R .
We want to measure the degree to which an agent believes $A \subseteq \Omega$ by the willingness to accept a bet with a desirable result if $A$ is true and an undesirable result if $A$ is false. The actions that have a desirable result if $A$ is true and an undesirable result if $A$ is false are buying the bet $1_A$ and selling the bet $1_{A^c}$. It is willingness to buy $1_A$ for a high price and willingness to sell $1_{A^c}$ for a low price that shows a high degree of belief in $A$. This leads to our definition of degree of belief.

Definition 2.6 (Degree of belief). Let $R$ be the coherent betting function of an agent. We define the degree to which this agent believes an event $A \subseteq \Omega$ as $\mathrm{Buy}_R(1_A) = 1 - \mathrm{Sell}_R(1_{A^c})$.
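To make the buy and sell prices concrete, the following sketch recovers them from accept/reject decisions alone, by bisection on the price. It assumes a hypothetical agent who accepts a bet when the mass-weighted worst-case payoff under an illustrative basic belief assignment `m` is non-negative; neither this agent nor the names `accepts`, `buy` and `sell` come from the paper. The sell price is computed through the duality Sell(X) = -Buy(-X).

```python
def accepts(bet, m):
    """Assumed agent: accept iff the mass-weighted worst case is >= 0."""
    return sum(mass * min(bet[w] for w in s) for s, mass in m.items()) >= 0

def buy(bet, m, tol=1e-9):
    """Approximate the largest price alpha such that bet - alpha is accepted."""
    lo = min(bet.values()) - 1.0   # bet - lo is strictly positive: accepted
    hi = max(bet.values()) + 1.0   # bet - hi is strictly negative: rejected
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if accepts({w: x - mid for w, x in bet.items()}, m):
            lo = mid
        else:
            hi = mid
    return lo

def sell(bet, m, tol=1e-9):
    """Sell price through the duality Sell(X) = -Buy(-X)."""
    return -buy({w: -x for w, x in bet.items()}, m, tol)

# Illustrative assignment: half of the mass supports {1}, half is uncommitted.
m = {frozenset({1}): 0.5, frozenset({1, 2, 3}): 0.5}

ind_a = {1: 1.0, 2: 0.0, 3: 0.0}    # indicator of A = {1}
ind_ac = {1: 0.0, 2: 1.0, 3: 1.0}   # indicator of the complement of A

print("Buy(1_A)    ~", round(buy(ind_a, m), 6))
print("Sell(1_A)   ~", round(sell(ind_a, m), 6))
print("Buy(1_A^c)  ~", round(buy(ind_ac, m), 6))
```

For this agent the buy price of the bet on A is about 0.5 while its sell price is about 1, and the degree of belief in the complement is about 0: a lack of belief in A that is not matched by belief in its complement.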

Adding B-consistency
In this section, we introduce an additional condition for betting functions and show that this constraint precisely leads to buy functions which are belief functions when restricted to bets of the form 1 A , with A ⊂ Ω. Before we do this, however, we will briefly discuss how our setup relates to the Dutch Book argument for probability distributions.
The Dutch Book argument is centered around the principle that betting behavior of agents should not lead to sure loss. The following theorem tells us that not having a sure loss is already implied by coherence. This theorem is well known within Walley's theory, but we give our version of the proof as a service to the reader.
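Proof. We first show that $\mathrm{Buy}_R(X) \leq \max X$ for each $X \in \mathcal{X}$. Suppose that $\mathrm{Buy}_R(X) > \max X$. This means there is an $\varepsilon > 0$ such that $R(X - \max X - \varepsilon) = 1$. Note that for every $Y \in \mathcal{X}$, there is a $\lambda > 0$ such that $\lambda(X - \max X - \varepsilon) < Y$. Since $R(\lambda(X - \max X - \varepsilon)) = 1$ by (R2) and $R(Y - \lambda(X - \max X - \varepsilon)) = 1$ by (R1), it follows with (R3) that $R(Y) = 1$. Hence $R(Y) = 1$ for all $Y \in \mathcal{X}$, but then there is no maximum $\alpha \in \mathbb{R}$ such that $R(Y - \alpha) = 1$. This is a contradiction, and it follows that $\mathrm{Buy}_R(X) \leq \max X$. Now let $X_1, \ldots, X_N \in \mathcal{X}$ and $Y_1, \ldots, Y_M \in \mathcal{X}$. By using Theorem 2.4 and the property we just proved, we find

Theorem 3.1 tells us that coherence is stronger than having no sure loss. At the same time, even coherence does not imply that $A \mapsto \mathrm{Buy}_R(1_A)$ is a probability distribution: it is clear that the collection of maps $A \mapsto \mathrm{Buy}_R(1_A)$ for coherent $R$ is much richer than only probability distributions. This tells us that the property that $\mathrm{Buy}_R(1_A) = \mathrm{Sell}_R(1_A)$ for every $A \subseteq \Omega$, which is usually implicitly assumed when the Dutch Book argument is laid out, is crucial for the restriction to probability distributions. The next theorem confirms this.

Theorem 3.2. $P$ is a probability distribution if and only if there is a coherent $R$ such that $P(A) = \mathrm{Buy}_R(1_A) = \mathrm{Sell}_R(1_A)$ for every $A \subseteq \Omega$.

Proof. Suppose there is a coherent $R$ such that $P(A) = \mathrm{Buy}_R(1_A) = \mathrm{Sell}_R(1_A)$. We have $P(\Omega) = \mathrm{Buy}_R(1_\Omega) = 1$ and $P(A) = \mathrm{Buy}_R(1_A) \geq 0$ by coherence. Then for disjoint $A, B \subseteq \Omega$ we find, using that $\mathrm{Buy}_R$ is super-additive and hence $\mathrm{Sell}_R$ is sub-additive,

$$P(A) + P(B) = \mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_B) \leq \mathrm{Buy}_R(1_A + 1_B) = P(A \cup B) = \mathrm{Sell}_R(1_A + 1_B) \leq \mathrm{Sell}_R(1_A) + \mathrm{Sell}_R(1_B) = P(A) + P(B).$$

Hence $P$ is additive.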
For the converse, suppose that $P$ is a probability distribution. Then define $\mathrm{Buy}_R(X) = \sum_{\omega \in \Omega} P(\{\omega\}) X(\omega)$. Clearly, $R$ is coherent and we have $\mathrm{Buy}_R(X) = \mathrm{Sell}_R(X)$.
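The role of the equality between buy and sell prices can be illustrated numerically. The sketch below assumes that the agent prices a bet by the mass-weighted worst case, that is, the sum over sets S of m(S) times the minimum of the bet on S; this pricing rule, the example distribution `p`, and the function names are assumptions chosen for illustration. When all mass sits on singletons the rule reduces to the expectation used in the proof, and buy and sell prices coincide; when the mass sits on larger sets they come apart.

```python
def buy(bet, m):
    """Assumed pricing rule: sum over sets S of m(S) * min of the bet on S."""
    return sum(mass * min(bet[w] for w in s) for s, mass in m.items())

def sell(bet, m):
    """Sell price through the duality Sell(X) = -Buy(-X)."""
    return -buy({w: -x for w, x in bet.items()}, m)

# Hypothetical probability distribution and an arbitrary bet.
p = {"a": 0.2, "b": 0.3, "c": 0.5}
bet = {"a": 10.0, "b": -4.0, "c": 1.0}

# All mass on singletons: the pricing rule is the ordinary expectation.
m_prob = {frozenset({w}): pw for w, pw in p.items()}
expectation = sum(p[w] * bet[w] for w in p)

# All mass on the whole outcome set: total ignorance.
m_vacuous = {frozenset(p): 1.0}

print("expectation:", expectation)
print("singletons :", buy(bet, m_prob), sell(bet, m_prob))
print("vacuous    :", buy(bet, m_vacuous), sell(bet, m_vacuous))
```

In the vacuous case the agent buys only at the worst payoff and sells only at the best payoff, so the gap between the two prices is as wide as coherence allows.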
As we already mentioned, the constraint that Buy R (1 A ) = Sell R (1 A ) for every A ⊆ Ω, is precisely one we do not want to impose. This property, however, is not easily weakened in a reasonable way. Therefore, we will work towards another characterization of probability distributions (Theorem 3.7), from which we can derive our constraint. We start with the definition of a belief valuation. A belief valuation is also called a categorical belief function or a 0-1 necessity measure in the literature. In practice, we use the characterization that B is a belief valuation if and only if there is a nonempty set E B ⊆ Ω such that We can also describe belief  Proof. Suppose first that B is a belief valuation. We define R by Clearly R is coherent and it follows from the definition of B R that B R = B.
For the converse, suppose that R is a coherent betting function. We check that B R is a belief valuation. Since Buy R (1 Ω ) = 1 we have B R (Ω) = 1. The second property in Definition 3.3 follows immediately from the monotonicity of Buy R . If Buy R (A) = Buy R (B) = 1, then A belief valuation should be compared with the classical notion of a truth valuation. The definition of a truth valuation T : 2 Ω → {0, 1} is similar to the definition of a belief valuation, the only difference being that in the first bullet, 'if' is replaced by 'if and only if'. Hence a truth valuation is a special belief valuation, namely one that corresponds to a proper ultrafilter of sets. Truth valuations are precisely those belief valuations B for which E B is a singleton. A major difference between truth and belief is that if an agent does not believe in A, this does not imply that (s)he does believe in A c .
As a result, the implication in the first bullet in the definition of a belief valuation goes in one direction only.
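Given a belief valuation $B$ and a set $S \subseteq \Omega$ for which $B(S) = 1$, the agent fully believes that a bet $X \in \mathcal{X}$ has a revenue of at least

$$\min_{\omega \in S} X(\omega). \quad (3.9)$$

Since this holds for all $S$ for which $B(S) = 1$, this leads to the definition of guaranteed revenue: the guaranteed revenue of a bet $X$ under $B$ is

$$G_B(X) := \max\Big\{\min_{\omega \in S} X(\omega) : S \subseteq \Omega,\ B(S) = 1\Big\}.$$

To benchmark and motivate our main result Theorem 3.10 below, we now first show how classical probability distributions can be characterized with the notion of guaranteed revenue.

Definition 3.6. A betting function $R$ is P-consistent if and only if, for all $X_1, \ldots, X_N$ and $Y_1, \ldots, Y_M$ such that

$$G_B\Big(\sum_{i=1}^{N} X_i\Big) \leq G_B\Big(\sum_{j=1}^{M} Y_j\Big) \quad (3.11)$$

for every belief valuation $B$, we have

$$\sum_{i=1}^{N} \mathrm{Buy}_R(X_i) \leq \sum_{j=1}^{M} \mathrm{Buy}_R(Y_j). \quad (3.12)$$

Theorem 3.7. $P$ is a probability distribution if and only if there exists a coherent and P-consistent $R$ such that $P(A) = \mathrm{Buy}_R(1_A)$.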
In words, P-consistency says that if the guaranteed revenue of one collection of bets is at least as large as the guaranteed revenue of a second collection of bets, then the agent should be willing to pay at least as much for the first collection.
The proof of Theorem 3.7 below reveals that the statement of the theorem is, strictly speaking, overly complicated. Indeed, if we left out $G_B$ everywhere, the ensuing result would still be true, and probably easier to interpret: the theorem would say that if one collection of bets always pays out more than a second collection, an agent would be willing to pay more for the first collection. We have chosen the current formulation since it points the way to the necessary changes.
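Proof. (of Theorem 3.7.) First suppose that $R$ is coherent, P-consistent and that $P(A) = \mathrm{Buy}_R(1_A)$. Suppose $A \cap B = \emptyset$. Then $G_{B'}(1_A + 1_B) = G_{B'}(1_{A \cup B})$ for every belief valuation $B'$, so by P-consistency, applied in both directions, we have

$$\mathrm{Buy}_R(1_A) + \mathrm{Buy}_R(1_B) = \mathrm{Buy}_R(1_{A \cup B}). \quad (3.14)$$

Hence $P(A \cup B) = P(A) + P(B)$. Since $R$ is coherent, we have $P(\Omega) = \mathrm{Buy}_R(1_\Omega) = 1$ and it follows that $P$ is a probability distribution.

Now suppose that $P$ is a probability distribution. We define $R$ by $R(X) = 1$ if and only if $\sum_{\omega \in \Omega} P(\{\omega\}) X(\omega) \geq 0$, so that $\mathrm{Buy}_R(X) = \sum_{\omega \in \Omega} P(\{\omega\}) X(\omega)$. Clearly $\mathrm{Buy}_R(1_A) = P(A)$, and it follows from Theorem 2.4 that $R$ is coherent. Now suppose that

$$G_B\Big(\sum_{i=1}^{N} X_i\Big) \leq G_B\Big(\sum_{j=1}^{M} Y_j\Big) \quad (3.16)$$

for every belief valuation $B$. For every $\omega \in \Omega$, the map $B_\omega(A) := 1_A(\omega)$ is a belief valuation. Note that $G_{B_\omega}(X) = X(\omega)$, so (3.16) implies that $\sum_{i=1}^{N} X_i(\omega) \leq \sum_{j=1}^{M} Y_j(\omega)$ for every $\omega \in \Omega$. Hence, by our definition of $\mathrm{Buy}_R$:

$$\sum_{i=1}^{N} \mathrm{Buy}_R(X_i) = \sum_{\omega \in \Omega} P(\{\omega\}) \sum_{i=1}^{N} X_i(\omega) \leq \sum_{\omega \in \Omega} P(\{\omega\}) \sum_{j=1}^{M} Y_j(\omega) = \sum_{j=1}^{M} \mathrm{Buy}_R(Y_j).$$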
Although we do not deny that the notion of P-consistency is in some sense a reasonable requirement for collections of bets, there is, from the point of view of epistemic probability, a problem with it. Whereas in (3.11) the guaranteed revenues of the combined bets are compared, in (3.12) the sums of the buy prices of the individual bets are compared. This observation goes back to the heart of the problem with the use of classical probability distributions for epistemic purposes: the guaranteed revenue of the combined bets $1_A$ and $1_{A^c}$ is, of course, the same as the guaranteed revenue of the bet 1, under any belief valuation. But this should not have any direct implication for the maximum price for which an agent would be willing to buy $1_A$ or $1_{A^c}$ individually.
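This suggests that for epistemic purposes, we should change P-consistency in one of the following two ways: in (3.11) the sums of the guaranteed revenues of the individual bets should be compared, or in (3.12) the buy prices of the combined bets should be compared. The first option leads to our definition of B-consistency.

Definition 3.8. A betting function $R$ is B-consistent if and only if, for all $X_1, \ldots, X_N$ and $Y_1, \ldots, Y_M$ such that $\sum_{i=1}^{N} G_B(X_i) \leq \sum_{j=1}^{M} G_B(Y_j)$ for every belief valuation $B$, we have $\sum_{i=1}^{N} \mathrm{Buy}_R(X_i) \leq \sum_{j=1}^{M} \mathrm{Buy}_R(Y_j)$.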
There is a simple reason for this choice. Indeed, the alternative would result in the constraint that if $G_B(\sum_i X_i) \leq G_B(\sum_j Y_j)$ for every belief valuation $B$, then $\mathrm{Buy}_R(\sum_i X_i) \leq \mathrm{Buy}_R(\sum_j Y_j)$. This constraint, however, is simply B-consistency for $N = M = 1$.
Hence, the notion of B-consistency differs from P-consistency in the sense that we compare the correct quantities: we compare sums of guaranteed revenues of individual (collective) bets to sums of buy prices of individual (collective) bets.
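The difference between the two hypotheses can be made concrete with the bets $1_A$, $1_{A^c}$ and the constant bet 1 discussed above. The sketch below uses the characterization that a belief valuation on a finite set corresponds to a nonempty set E, with B(S) = 1 exactly when E is contained in S, so that the guaranteed revenue of X is its minimum over E. The two-point outcome set and all names are illustrative assumptions.

```python
from itertools import combinations

OMEGA = (1, 2)

def nonempty_subsets(omega):
    """All nonempty subsets; each one determines a belief valuation."""
    return [frozenset(c) for r in range(1, len(omega) + 1)
            for c in combinations(omega, r)]

def guaranteed_revenue(bet, e):
    """Guaranteed revenue under the valuation with B(S) = 1 iff e <= S."""
    return min(bet[w] for w in e)

def total(bets):
    """Pointwise sum of a collection of bets (the combined bet)."""
    return {w: sum(b[w] for b in bets) for w in OMEGA}

ind_a = {1: 1, 2: 0}    # indicator of A = {1}
ind_ac = {1: 0, 2: 1}   # indicator of the complement of A
one = {1: 1, 2: 1}      # the constant bet 1

xs, ys = [one], [ind_a, ind_ac]

# Hypothesis of P-consistency: compare the combined bets.
combined_ok = all(
    guaranteed_revenue(total(xs), e) <= guaranteed_revenue(total(ys), e)
    for e in nonempty_subsets(OMEGA)
)

# Hypothesis of B-consistency: compare sums of individual guaranteed revenues.
individual_ok = all(
    sum(guaranteed_revenue(x, e) for x in xs)
    <= sum(guaranteed_revenue(y, e) for y in ys)
    for e in nonempty_subsets(OMEGA)
)

print("combined hypothesis holds  :", combined_ok)
print("individual hypothesis holds:", individual_ok)
```

The combined comparison holds, so P-consistency would force the buy prices of $1_A$ and $1_{A^c}$ to add up to at least 1. The individual comparison fails for the valuation with E equal to the whole outcome set, so B-consistency imposes no such requirement, and an agent without evidence may price both bets at zero.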
The next example illustrates that not every coherent betting function is B-consistent.
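Example 3.9. Suppose Ω = {1, 2, 3, 4} and that R is given by It is easy to check that R is coherent, and that for all belief valuations B we have

With the notion of B-consistency we can now state and prove our main result. The following theorem constitutes our behavioral interpretation of belief functions. It shows that only B-consistency is needed to guarantee that a lower prevision in the sense of Walley [16] is in fact a belief function and that every belief function can be obtained this way. The result legitimizes the use of belief functions for modelling epistemic probability.

Theorem 3.10. $\mathrm{Bel}$ is a belief function if and only if there exists a coherent and B-consistent $R$ such that $\mathrm{Bel}(A) = \mathrm{Buy}_R(1_A)$.

This result follows immediately from the following theorem, which is interesting in its own right and characterizes B-consistency for coherent betting functions.

Theorem 3.11. A coherent betting function $R$ is B-consistent if and only if there is a basic belief assignment $m$ such that

$$\mathrm{Buy}_R(X) = \sum_{\emptyset \neq S \subseteq \Omega} m(S) \min_{\omega \in S} X(\omega) \quad (3.24)$$

for all $X \in \mathcal{X}$.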
The expression in (3.24) is known as the Choquet integral of X with respect to the belief function that corresponds to m (see for instance [7]), and is also called the lower expectation of X (see for instance [13]). Hence, another way of phrasing Theorem 3.11 is that R is B-consistent if and only if Bel(A) := Buy R (1 A ) is a belief function and Buy R (X) is given by the Choquet integral of X with respect to Bel. We now give the proof.
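The agreement between the two descriptions of the buy function can be checked numerically. The sketch below computes the lower expectation as the sum over sets S of m(S) times the minimum of X on S, and compares it with the standard level-set form of the Choquet integral with respect to Bel, namely the sum over the sorted values y_1 < ... < y_K of X of (y_k - y_{k-1}) Bel({X >= y_k}) with y_0 = 0. The basic belief assignment and the bets are illustrative assumptions, as are all function names.

```python
def bel(m, a):
    """Bel(A): total mass of the sets contained in A."""
    return sum(mass for s, mass in m.items() if s <= a)

def lower_expectation(bet, m):
    """Sum over sets S of m(S) * min of the bet on S."""
    return sum(mass * min(bet[w] for w in s) for s, mass in m.items())

def choquet(bet, m):
    """Level-set form of the Choquet integral of the bet with respect to Bel."""
    total, previous = 0.0, 0.0
    for y in sorted(set(bet.values())):
        upper_set = frozenset(w for w, x in bet.items() if x >= y)
        total += (y - previous) * bel(m, upper_set)
        previous = y
    return total

# Illustrative basic belief assignment on the outcome set {1, 2, 3}.
m = {
    frozenset({1}): 0.2,
    frozenset({2, 3}): 0.3,
    frozenset({1, 2, 3}): 0.5,
}

bets = [
    {1: 4.0, 2: 1.0, 3: 2.0},
    {1: -2.0, 2: 3.0, 3: 0.0},
]

for bet in bets:
    print(lower_expectation(bet, m), choquet(bet, m))
```

Both computations give the same price for each bet, up to floating-point rounding, including for the second bet, which takes negative values.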
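Proof of Theorem 3.11. First suppose that there is a basic belief assignment $m$ such that

$$\mathrm{Buy}_R(X) = \sum_{\emptyset \neq S \subseteq \Omega} m(S) \min_{\omega \in S} X(\omega)$$

for all $X \in \mathcal{X}$. Also suppose that for $X_1, \ldots, X_N$ and $Y_1, \ldots, Y_M$,

$$\sum_{i=1}^{N} \min_{\omega \in S} X_i(\omega) \leq \sum_{j=1}^{M} \min_{\omega \in S} Y_j(\omega)$$

for every nonempty $S \subseteq \Omega$. Then

$$\sum_{i=1}^{N} \mathrm{Buy}_R(X_i) = \sum_{S} m(S) \sum_{i=1}^{N} \min_{\omega \in S} X_i(\omega) \leq \sum_{S} m(S) \sum_{j=1}^{M} \min_{\omega \in S} Y_j(\omega) = \sum_{j=1}^{M} \mathrm{Buy}_R(Y_j). \quad (3.26)$$

For the converse of the theorem, suppose that $R$ is B-consistent. First we show that $\mathrm{Bel}(A) := \mathrm{Buy}_R(1_A)$ is a belief function. Clearly we have Now let $X \in \mathcal{X}$. We write $X(\Omega) = \{y_1, \ldots, y_K\}$ where $y_1 < y_2 < \ldots < y_K$. Set $y_0 := 0$. It is well known that the Choquet integral of $X$ can be expressed as

$$\sum_{k=1}^{K} (y_k - y_{k-1}) \, \mathrm{Bel}(\{X \geq y_k\}).$$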

Discussion
We have first argued that the classical axioms of Kolmogorov are not suitable for modeling epistemic probability. This has been known for a long time, and researchers like Glenn Shafer and Peter Walley have developed a general theory of belief functions and of lower previsions, respectively, as an alternative to classical probability theory in an epistemological context. The theory of belief functions was never picked up in mainstream probability. One of the reasons for this was the lack of a clear behavioral interpretation of belief functions, analogous to the betting interpretation of probabilities. In this paper we have developed such a behavioral interpretation. We have embedded belief functions in a betting context, and we have shown that belief functions arise precisely when we add B-consistency to the coherent lower previsions in the theory of Walley. In this way, not only do we provide a behavioral interpretation of belief functions, but we also make a natural connection between the theory of Walley and the theory of Shafer.
Of course, adding B-consistency calls for an argument why a rational agent should adhere to it. We think it is not controversial to say that the absence of guaranteed losses (Theorem 3.1) is the bottom line for any reasonable constraint. Beyond that things are, of course, debatable. Our argument to restrict the lower previsions of Walley to B-consistency is derived from the way we obtained it, namely by altering P-consistency in a very reasonable way. If a collection of bets is in some sense guaranteed to be better than another collection of bets, then the total price should be higher. The point is now how "better" should be formulated. When compared to P-consistency, B-consistency allows for all the flexibility of epistemic probability that we asked for in the introduction. Indeed, note that it is possible to set m(Ω) = 1, which corresponds to total ignorance apart from the fact that the outcome is in Ω. We think it is rational to compare collections of bets from the viewpoint of total guaranteed revenue, and to be willing to pay more when this quantity increases. As such, we think that B-consistency is a very reasonable constraint for a rational agent to adhere to.
But there is more than philosophy, of course. We also want a theory that is practical and relatively easy to apply. Belief functions are close to classical probabilities since they are determined by basic belief assignments. As such, we think they are much more practical and easier to use than coherent lower previsions. Indeed, in practical situations, one does not directly use constraints such as coherence or B-consistency to construct a buy function. In the case of belief functions, one typically proceeds by constructing a suitable basic belief assignment; see for instance our paper [8], in which we apply belief functions in the classical forensic context of the so-called island problem, precisely by setting up appropriate basic belief assignments. This is very much akin to the classical situation: not many people use characterizations like P-consistency (or related characterizations) to set up a probability measure, but it is reassuring that such characterizations exist. Hence we should view B-consistency as a theoretical underpinning for why to use belief functions, but not as a tool that is used in practice to construct belief functions.
Last but not least, we mention conditional belief functions, a notion which we have not introduced in the current paper. Shafer originally based his notion of conditional belief on the so-called Dempster rule of combination, a rule that has been criticized fiercely, see e.g. [9,4]. In a forthcoming paper we will discuss conditional beliefs from the perspective of the current betting interpretation. It turns out that there are various ways to do this, depending on the precise epistemological situation, and this fact adds to the suitability of belief functions for modeling epistemic uncertainty. Another natural line of research is to further develop the theory of belief functions in infinite spaces, an enterprise that has already received some attention in e.g. [15].