1 Introduction

Ramsey’s groundbreaking work “Truth and Probability” (1926) established the decision theoretic approach to subjective probability, or, in his terminology, to degree of belief. Ramsey’s idea was to consider a person who has to choose between different practical options, where the outcome of the decision depends on unknown facts. One’s decision will be determined by (i) one’s probabilistic assessment of the facts, i.e., one’s degrees of belief in the truth of various propositions, and (ii) one’s personal benefits that are associated with the possible outcomes of the decision. Assuming that the person is a rational agent—whose decisions are determined by some assignment of degrees of belief to propositions and utility values to the outcomes—we should, in principle, be able to derive the person’s degrees of belief and utilities from the person’s decisions. Ramsey proposed a system for modeling the agent’s point of view in which this can be done. The goal of the project is a representation theorem, which shows that the rational agent’s decisions should be determined by the expected utility criterion.

The system proposed by Savage (1954, 1972) is the first decision-theoretic system that comes after Ramsey’s, but it is radically different from it, and it was Savage’s system that put the decision-theoretic approach on the map.Footnote 1 To be sure, in the intervening years a considerable body of research has been produced in subjective probability, notably by de Finetti (1937a, b), and by Koopman (1940a, b, 1941), whose works, among many others, are often mentioned by Savage. De Finetti also discusses problems related to expected utility. Yet these approaches were not of the decision-theoretic type: they did not aim at a unified account in which the subjective probability is derivable from decision making patterns. It might be worthwhile to devote a couple of pages to Ramsey’s proposal, for its own sake and also to put Savage’s work in perspective. We summarize and discuss Ramsey’s work in Appendix A.

The theory as presented in Savage (1954, 1972) has been known for its comprehensiveness and its clear and elegant structure. Some researchers have deemed it the best theory of its kind: Fishburn (1970) has praised it as “the most brilliant axiomatic theory of utility ever developed” and Kreps (1988) describes it as “the crowning glory of choice theory.”

The system is determined by (I) the formal structure, or the basic design, and (II) the axioms that the structure should satisfy, or, in Savage’s terminology, the postulates. Savage’s crucial choice of design is to base the model on two independent coordinates: (i) a set S of states (which correspond to what in other systems is the set of possible worlds) and (ii) a set of consequences, X, whose members represent the outcomes of one’s acts. The acts themselves, whose collection is denoted here as \({\mathcal A}\), constitute the third major component. They are construed as functions from S into X. The idea is simple: the consequence of one’s act depends on the state of the world. Therefore, the act itself can be represented as a function from the set of states into the set of consequences. Thus, we can use a heuristic visualization of two coordinates in a two-dimensional space.

S is provided with additional structure, namely, a Boolean algebra \({\mathcal B}\) of subsets of S, whose members are called events (which, in another terminology, are propositions). The agent’s subjective, or personal view is given by the fourth component of the system, which is a preference relation, \(\succcurlyeq \), defined over the acts. All in all, the structure is:

$$\begin{aligned} (S, X, {\mathcal A}, \succcurlyeq , {\mathcal B}) \end{aligned}$$

We shall refer to it as a Savage-type decision model, or, for short, decision model. Somewhat later in his book Savage introduces another important element: that of constant acts. It will be one of the focus points of our paper and we shall discuss it shortly. (For contrast, note that in Ramsey’s system the basic component consists of propositions and worlds, where the latter can be taken as maximally consistent sets of propositions. There is no independent component of “consequences.”)
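As a minimal sketch of the two-coordinate design, the following Python fragment builds a finite toy instance: states and consequences are plain strings, events are subsets of S, and acts are functions from S into X, stored as dictionaries. The helper names (`power_set`, `all_acts`) and the toy sets are illustrative assumptions, not part of Savage's text, and the preference relation is left out. A finite instance of this kind cannot satisfy all of Savage's postulates; it only illustrates the shape of the structure.

```python
from itertools import chain, combinations, product

def power_set(states):
    """All subsets of a finite state set: the largest Boolean algebra over S."""
    items = sorted(states)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def all_acts(states, consequences):
    """Every function from S into X, each stored as a dict state -> consequence."""
    ordered = sorted(states)
    return [
        dict(zip(ordered, values))
        for values in product(sorted(consequences), repeat=len(ordered))
    ]

# Hypothetical toy data (assumption): two states, three consequences.
S = {"s1", "s2"}
X = {"a", "b", "c"}
B = power_set(S)    # events
A = all_acts(S, X)  # acts
```

With two states and three consequences there are four events and nine acts; the constant acts are the three dictionaries that map both states to the same consequence.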

Savage’s notion of consequences corresponds to the “goods” in vNM—the system presented in von Neumann and Morgenstern (1944). Now vNM uses gambles that are based on an objective \(\sigma \)-additive probability distribution. Savage does not presuppose any probability but has to derive the subjective probability within his system. The most striking feature of that system is the elegant way of deriving—from his first six postulates—a (finitely additive) probability over the Boolean algebra of events. That probability is later used in defining the utility function, which assigns utilities to the consequences. The definition proceeds along the lines of vNM, but since the probability need not be \(\sigma \)-additive, Savage cannot apply directly the vNM construction. He has to add a seventh postulate and the derivation is somewhat involved.

We assume some familiarity with the Savage system. For the sake of completeness we include some additional definitions and a list of the postulates (stated in forms equivalent to the originals) in Appendix B.

As far as the postulates are concerned, Savage’s system constitutes a very successful decision theory, including a decision-based theory of subjective probability. Additional assumptions, which are not stated as axioms, are however required: (i) in Savage’s derivation of subjective probability, and (ii) in his derivation of personal utility. These assumptions are quite problematic and our goal here is to show how they can be eliminated and how the elimination yields a simpler and more realistic theory.

The first problematic assumption is the \(\sigma \)-algebra assumption: In deriving the subjective probability, Savage has to assume that the Boolean algebra, \({\mathcal B}\), over which the probability is to be defined is a \(\sigma \)-algebra (i.e., closed under countably infinite unions and intersections). Savage insists however that we should not require the subjective probability to be \(\sigma \)-additive.

He fully recognizes the importance of the mathematical theory, which is based on the Kolmogorov axioms according to which \({\mathcal B}\) is a \(\sigma \)-algebra and the probability is \(\sigma \)-additive; but he regards \(\sigma \)-additivity as a sophisticated mathematical concept, whose comprehension may lie beyond that of our rational agent. Rationality need not require having the abilities of a professional mathematician. In this Savage follows de Finetti (it should be noted that both made important mathematical contributions to the theory that is based on the Kolmogorov axioms). It is therefore odd that the Boolean algebra, over which the finitely additive probability is to be defined, is required to be a \(\sigma \)-algebra. Savage notes this oddity and justifies it on grounds of expediency: he sees no other way of deriving the quantitative probability that is needed for the purpose of defining expected utilities:

It may seem peculiar to insist on \(\sigma \)-algebra as opposed to finitely additive algebras even in a context where finitely additive measures are the central object, but countable unions do seem to be essential to some of the theorems of §3—for example, the terminal conclusions of Theorem 3.2 and Part 5 of Theorem 3.3.   (p. 43)

The theorems he refers to are the places where his proof relies on the \(\sigma \)-algebra assumption. The \(\sigma \)-algebra assumption is invoked by Savage in order to show that the satisfaction of some axioms regarding the qualitative probability implies that there is a unique finitely additive probability that agrees with the qualitative one. We eliminate it by showing that there is a way of defining the finitely additive numeric probability, which does not rely on that assumption. This is the hard technical core of the paper, which occupies almost a third of it. We develop for this purpose a new technique based on what we call tri-partition trees.

Now this derived finitely additive probability later serves in defining the expected utilities. Savage’s way of doing this requires that the probability should have a certain property, which we shall call “completeness” (Savage does not give it a name). He uses the \(\sigma \)-algebra assumption a second time in order to show that the probability that he defined is indeed complete. This second use of the \(\sigma \)-algebra assumption can be eliminated by showing that (i) without the \(\sigma \)-algebra assumption, the defined probability satisfies a certain weaker property “weak completeness” and (ii) weak completeness is sufficient for defining the expected utilities.

The second problematic assumption we address in this paper concerns constant acts. An act f is said to be constant if for some fixed consequence \(a\in X\), \(f(x) = a\), for all \(x\in S\).Footnote 2 Let \(\mathfrak {c}_a\) denote that act. Note that, in Savage’s framework, the utility-value of a consequence depends only on the consequence, not on the state in which it is obtained. Hence, the preorder among constant acts induces a preorder of the corresponding consequencesFootnote 3:

$$\begin{aligned} a\ge b \; \iff _\text {Df} \; \mathfrak {c}_a\succcurlyeq \mathfrak {c}_b \end{aligned}$$

where a, b range over all consequences for which \(\mathfrak {c}_a\) and \(\mathfrak {c}_b\) exist. The Constant Acts Assumption (CAA) is:

CAA:

For every consequence \(a\in X\) there exists a constant act \(\mathfrak {c}_a \in {\mathcal A}\).

Savage does not state CAA explicitly, but it is clearly implied by his discussion and it is needed in his proof of the representation theorem. Note that if CAA holds then the above induced preorder is a total preorder of X.

By a simple act we mean an act with a finite range of values. The term used by Savage (1972, p. 70) is ‘gamble’; he defines it as an act, f, such that, for some finite set, A, \(f^{-1}(A)\) has probability 1. It is easily seen that an act is a gamble iff it is equivalent to a simple act. ‘Gamble’ is also used in gambling situations, where one accepts or rejects bets. We shall use ‘simple act’ and ‘gamble’ interchangeably. Using the probability that has been obtained already, the following is derivable from the first six postulates and CAA.

Proposition 1.1

(Simple act utility) We can associate utilities with all consequences, so that, for all simple acts the preference is determined by the acts’ expected utilities.Footnote 4

CAA has however highly counterintuitive implications, a fact that has been observed by several writers.Footnote 5 The consequences of a person’s act depend, as a rule, on the state of the world. More often than not, a possible consequence in one state is impossible in another. Imagine that I have to travel to a nearby city and can do this either by plane or by train. At the last moment I opt for the plane, but when I arrive at the airport I find that the flight has been canceled. If a and b are respectively the states flight-as-usual and flight-canceled, then the consequence of my act in state a is something like ‘arrived at X by plane at time Y.’ This consequence is impossible—logically impossible, given the laws of physics—in state b. Yet CAA implies that this consequence, or something with the same utility-value, can be transferred to state b.Footnote 6 Our result shows that CAA can be avoided at some price, which—we later shall argue—is worth paying. To state the result, let us first define feasible consequences: A consequence a is feasible if there exists some act, \(f\in {\mathcal A}\), such that \(f^{-1}(a)\) is not a null event.Footnote 7 It is not difficult to see that the name is justified and that unfeasible consequences, while theoretically possible, are merely a pathological curiosity. Note that if we assume CAA then all consequences are trivially feasible. Let us replace CAA by the following much weaker assumption:

2CA:

There are two non-equivalent constant acts \(\mathfrak {c}_a\) and \(\mathfrak {c}_b\).

(Note that 2CA makes the same claim as postulate P5; but this is misleading: while P5 presupposes CAA, 2CA does not.) Having replaced CAA by 2CA we can prove the following:

Proposition 1.2

(Simple act utility*) We can associate utilities with all feasible consequences, so that, for all simple acts, the preference is determined by the acts’ expected utilities.

It is perhaps possible to extend this result to all acts whose consequences are feasible. This will require a modified form of P7. But our proposed modification of the system does not depend on there being such an extension. In our view the goal of a subjective decision theory is to handle all scenarios of having to choose from a finite number of options, involving altogether a finite number of consequences. Proposition 1.2 is therefore sufficient. The question of extending it to all feasible acts is intriguing because of its mathematical interest, but this is a different matter.

The rest of the paper is organized as follows. In what immediately follows we introduce some further concepts and notations that will be used throughout the paper. Section 2 is devoted to the analysis of the notion of idealized rational agents and what being “more realistic” about them entails. We argue that, when carried too far, the idealization voids the very idea underlying the concept of personal probability and utility; the framework then becomes, in the best case, a piece of abstract mathematics. Section 3 is devoted to the \(\sigma \)-algebra assumption. It consists of a short overview of Savage’s original proof followed by a presentation of the tri-partition trees and our proof, which is most of the section. In Sect. 3.3, we outline a construction by which, from a given finite decision model that satisfies P1–P5, we get a countably infinite decision model that satisfies P1–P6; this model is obtained as a direct limit of an ascending sequence of finite models. In Sect. 4, we take up the problem of CAA. We argue that, as far as realistic decision theory is concerned, we need to assign utilities only to simple acts. Then we indicate the proof of Proposition 1.2. To a large extent this material has been presented in Gaifman and Liu (2015), hence we content ourselves with a short sketch.

Some terminologies, notations, and constructions. Recall that ‘\(\succcurlyeq \)’ is used for the preference relation over the acts. \(f\succcurlyeq g\) says that f is equi-or-more preferable to g; \(\preccurlyeq \) is its converse. \(\succcurlyeq \) is a preorder, which means that it is a reflexive and transitive relation; it is also total, which means that for every f, g either \(f\succcurlyeq g\) or \(g\succcurlyeq f\). If \(f\succcurlyeq g\) and \(f\preccurlyeq g\) then the acts are said to be equivalent, and this is denoted as \(f\equiv g\). The strict preference is denoted as \(f\succ g\); it is defined as \( f\succcurlyeq g\ \text{ and }\ g \not \succcurlyeq f\), and its converse is denoted as \(\prec \).

Cut-and-Paste:

If f and g are acts and E is an event then we define

$$\begin{aligned} (f|E+g|\overline{E})(s) =_\text{ Df }{\left\{ \begin{array}{ll} f(s)&{}\text { if } s\in E\\ g(s)&{}\text { if } s\in \overline{E}, \end{array}\right. } \end{aligned}$$

where \(\overline{E}=S-E=\) the complement of E.Footnote 8

Note that \(f|E+g|\overline{E}\) is obtained by “cutting and pasting” parts of f and g, which results in the function that agrees with f on E, and with g on \(\overline{E}\). Savage takes it for granted that the acts are closed under cut-and-paste. Although the stipulation is never stated explicitly, it is obviously a property of \({\mathcal A}\). It is easily seen that by iterating the cut-and-paste operations just defined we get a cut-and-paste that involves any finite number of acts. It is of the form:

$$\begin{aligned} f_1|E_1 + f_2|E_2 + \ldots + f_n|E_n \end{aligned}$$

where \(\{E_1,\ldots , E_n\}\) is a partition of S.
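The finite cut-and-paste operation can be sketched in a few lines of Python. This is only an illustration under the assumption that S is finite and that acts are stored as dictionaries from states to consequences; the function name `cut_and_paste` and the toy data are hypothetical.

```python
def cut_and_paste(parts, states):
    """Build f_1|E_1 + ... + f_n|E_n.

    `parts` is a list of (act, event) pairs; the events must partition `states`.
    Acts are dicts state -> consequence, events are sets of states.
    """
    events = [set(event) for _, event in parts]
    union = set().union(*events)
    # The events cover S, and they are pairwise disjoint exactly when
    # their sizes add up to the size of their union.
    if union != set(states) or sum(len(e) for e in events) != len(union):
        raise ValueError("the events must form a partition of S")
    return {s: act[s] for act, event in parts for s in event}

# Hypothetical toy data (assumption): two constant acts over three states.
S = {"s1", "s2", "s3"}
f = {s: "a" for s in S}
g = {s: "b" for s in S}
E = {"s1", "s2"}
h = cut_and_paste([(f, E), (g, S - E)], S)  # agrees with f on E, with g elsewhere
```

The partition check matters: if the events overlapped, the result would depend on the order of the pairs, and if they failed to cover S the result would not be an act.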

Recall that, for any given consequence \(a\in X\), \(\mathfrak {c}_a\) is the constant act whose consequence is a for all states. This notation is employed under the assumption that such an act exists. If \(\mathfrak {c}_a \succcurlyeq \mathfrak {c}_b\) then we put: \(a\ge b\). Similarly for strict preference. Various symbols are used with systematic ambiguity, e.g., ‘\(\equiv \)’ for acts and for consequences, ‘\(\le \)’ and ‘<’ for consequences as well as for numbers. Later, when qualitative probabilities are introduced, we shall use \(\succeq \) and \(\preceq \), for the “greater-or-equal” relation (or “weakly more probable” relation) and its converse, and \(\succ \) and \(\prec \) for the strict inequalities. Note that, following Savage, we mean by a numeric probability a finitely additive probability function. If \(\sigma \)-additivity is intended it will be clearly indicated.

2 The logic of the system and the role of “idealized rational agents”

The decision theoretic approach construes a person’s subjective probability in terms of its function in determining the person’s decision under uncertainty. The uncertainty should however stem from lack of empirical knowledge, not from one’s limited deductive capacities. One could be uncertain because one fails to realize that such and such facts are logically deducible from other known facts. This type of uncertainty does not concern us in the context of subjective probability. Savage (1972, p. 7) therefore posits an idealized person, with unlimited deductive capacities in logic, and he notes (in a footnote on that page) that such a person should know the answers to all decidable mathematical propositions. By the same token, we should endow our idealized person with unlimited computational powers. This is of course unrealistic; if we do take into account the rational agent’s bounded deductive, or computational resources, we get a “more realistic” system. This is what Hacking (1967) meant in his “A slightly more realistic personal probability;” a more recent work on that subject is Gaifman (2004). But this is not the sense of “realistic” of the present paper. By “realistic” we mean conceptually realistic; that is, we ask for a more realistic (i.e., more limited) ability to conceive impossible fantasies and treat them as if they were real.

We indicated in the introduction that CAA may give rise to agents who have such extraordinary powers of conceiving. We shall elaborate on this sort of unrealistic abilities shortly. The \(\sigma \)-algebra assumption can lead to even more extreme cases in a different area: the foundation of set theory. We will not go into this here, since this would require too long a detour.

It goes without saying that the extreme conceptual unrealism, of the kind we are considering here, has to be distinguished from the use of hypothetical mundane scenarios—the bread-and-butter of every decision theory that contains more than experimental results. Most, if not all, of the scenarios treated in papers and books of decision theory are hypothetical, but sufficiently grounded in reality. The few examples Savage discusses in his book are of this kind. The trouble is that the solutions that he proposes require that the agent be able to assess the utilities of physical impossibilities and to weigh them on a par with everyday situations.

Let us consider a simple decision problem, an illustrative example proposed by Savage (1972, pp. 13–14), which will serve us for more than one purpose. We shall refer to it as Omelet. John (in Savage 1972 he is “you”) has to finish making an omelet started by his wife, who has already broken into a bowl five good eggs. A sixth unbroken egg is lying on the table, and it must be either used in making the omelet, or discarded. There are two states of the world \(\textit{good}\) (the sixth egg is good) and \(\textit{rotten}\) (the sixth egg is rotten). John considers three possible acts, \(f_1\): break the sixth egg into the bowl, \(f_2\): discard the sixth egg, \(f_3\): break the sixth egg into a saucer; add it to the five eggs if it is good, discard it if it is rotten. The consequences of the acts are as follows:

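[Figure a: table of the consequences of the acts \(f_1\), \(f_2\), \(f_3\) in the states \(\textit{good}\) and \(\textit{rotten}\)]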

Omelet is one of the many scenarios in which CAA is highly problematic. It requires the existence of an act by which a good six-egg omelet is made out of five good eggs and a rotten one.Footnote 9 Quite plausibly, John can imagine a miracle by which a six-egg omelet is produced from five good eggs and a rotten one; this lies within his conceptual capacity. But this would not be sufficient; he has to take the miracle seriously enough, so that he can rank it on a par with the other real possibilities, and eventually assign to it a utility value. This is what the transfer of six-egg omelet from good to rotten means. In another illustrative example (Savage 1972, p. 25), the result of such a miraculous transfer is that the person can enjoy a refreshing swim with her friends, while in fact she is “...sitting on a shadeless beach twiddling a brand-new tennis racket”—because she bought a tennis racket instead of a bathing suit—“while her friends swim.” CAA puts extremely high demands on what the agent, even an idealized one, should be able to conceive.

CAA is the price Savage has to pay for making the consequences completely independent of the states.Footnote 10 A concrete consequence is being abstracted so that only its personal value remains. These values can then be smoothly transferred from one state to another. Our suggestion for avoiding such smooth transfers is described in the introduction. In Sect. 4 we shall argue that the price one has to pay for this is worth paying.

Returning to Omelet, let us consider how John will decide. It would be wrong to describe him as appealing to some intuitions about his preference relation, or interrogating himself about it. John determines his preferences by appealing to his intuitions about the likeliness of the states and the personal benefits he might derive from the consequences.Footnote 11 If he thinks that good is very likely and washing the saucer, in the case of rotten, is rather bothersome, he will prefer \(f_1\) to the other acts; if washing the saucer is not much of a bother he might prefer \(f_3\); if wasting a good egg is no big deal, he might opt for \(f_2\).
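The reasoning just described can be made concrete with a small expected-utility calculation. The sketch below is illustrative only: the consequence labels are paraphrases inferred from the descriptions of the acts, and every probability and utility value is a hypothetical number chosen for the example, not something taken from Savage.

```python
# Consequence labels are paraphrases inferred from the act descriptions (assumption).
ACTS = {
    "f1": {"good": "six-egg omelet",
           "rotten": "no omelet, five good eggs ruined"},
    "f2": {"good": "five-egg omelet, good egg wasted",
           "rotten": "five-egg omelet"},
    "f3": {"good": "six-egg omelet, saucer to wash",
           "rotten": "five-egg omelet, saucer to wash"},
}

# Hypothetical utilities (assumption): John's personal benefits.
UTILITY = {
    "six-egg omelet": 10.0,
    "six-egg omelet, saucer to wash": 9.0,
    "five-egg omelet": 8.0,
    "five-egg omelet, good egg wasted": 7.5,
    "five-egg omelet, saucer to wash": 7.0,
    "no omelet, five good eggs ruined": 0.0,
}

def expected_utility(act, prob, utility):
    """Probability-weighted sum of the utilities of the act's consequences."""
    return sum(prob[state] * utility[act[state]] for state in prob)

def rank_acts(acts, prob, utility):
    """Act names ordered from most to least preferred by expected utility."""
    return sorted(acts,
                  key=lambda name: expected_utility(acts[name], prob, utility),
                  reverse=True)

confident = rank_acts(ACTS, {"good": 0.9, "rotten": 0.1}, UTILITY)
doubtful = rank_acts(ACTS, {"good": 0.5, "rotten": 0.5}, UTILITY)
```

With these made-up numbers John breaks the egg straight into the bowl when he is confident that it is good, and uses the saucer when he is in doubt; other utility assignments would produce other rankings.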

If our interpretation is right, then a person derives his or her preferences by combining subjective probabilities and utilities. On the other hand, the representation theorem goes in the opposite direction: from preference to probability and utility. As a formal structure, the preference relation is, in an obvious sense, more elementary than a real valued function. If it can be justified directly on rationality grounds, this will yield a normative justification for the use of probability and utility.

The Boolean algebra in Omelet is extremely simple; besides S and \(\varnothing \) it consists of two atoms. The preference relation implies certain constraints on the probabilities and the utility-values, but it does not determine them. This, as a rule, is the case whenever the Boolean algebra is finite.Footnote 12 Now the idea underlying the system is that if the preference relation is defined over a sufficiently rich set of acts (and if it satisfies certain plausible postulates) then both probabilities and utilities are derivable from it. As far as the probability is concerned, the consequences play a minor role. We need only two non-equivalent constant acts, say \(\mathfrak {c}_a, \mathfrak {c}_b\), and we need only the preferences over two-valued acts, in which the values are a or b. But \({\mathcal B}\) has to satisfy P6\('\), which implies that it must be infinite; moreover, in Savage’s system, which includes the \(\sigma \)-algebra assumption, the set of states, as well as the Boolean algebra, should have cardinalities that are \(2^{\aleph _0}\) at least. Our result makes it possible to get a countable Boolean algebra, \({\mathcal B}\), and a decision model \((S, X, {\mathcal A}, \succcurlyeq , {\mathcal B})\) which is a direct limit of an ascending sequence of substructures \((S_i, X, {{\mathcal A}}_i, {\succcurlyeq }_i, {\mathcal B}_i)\), where the \(S_i\)’s are finite, and where X is any fixed set of consequences containing two non-equivalent ones. This construction is described briefly at the end of the next section.

3 Eliminating the sigma-algebra assumption

3.1 Savage’s derivation of numeric probabilities

Savage’s derivation of a numeric probability comprises two stages. First, he defines, using P1–P4 and the assumption that there are two non-equivalent constant acts, a qualitative probability. This is a binary relation, \(\succeq \), defined over events, which satisfies the axioms proposed by de Finetti (1937a) for the notion of “X is weakly more probable than Y.” The second stage is devoted to showing that if a qualitative probability, \(\succeq \), satisfies certain additional assumptions, then there is a unique numeric probability, \(\mu \), that represents \(\succeq \); that is, for all events E, F:

$$\begin{aligned} E\succeq F\iff \mu (E)\ge \mu (F) \end{aligned}$$
(3.1)

Our improvement on Savage’s result concerns only the second stage. For the sake of completeness we include a short description of the first.

3.1.1 From preferences over acts to qualitative probabilities

The qualitative probability, \(\succeq \), is defined by:

Definition 3.1

For any events E, F, say that E is weakly more probable than F, written \(E\succeq F\) (or \(F\preceq E\)), if, for any \(\mathfrak {c}_a\) and \(\mathfrak {c}_b\) satisfying \(\mathfrak {c}_a\succ \mathfrak {c}_b\), we have

$$\begin{aligned} \mathfrak {c}_{a}|E+ \mathfrak {c}_{b}|\overline{E}\, \succcurlyeq \, \mathfrak {c}_{a}|F+ \mathfrak {c}_{b}|\overline{F}. \end{aligned}$$
(3.2)

E and F are said to be equally probable, in symbols \(E\equiv F\), if both \(E\succeq F\) and \(F\succeq E\).
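To illustrate how Definition 3.1 reads a comparison of events off the preference relation alone, the following sketch uses a preference oracle over two-valued acts. The oracle is generated here by hidden probabilities and utilities, which are hypothetical values introduced only to have something to query; the definition itself never looks at them. All names (`weakly_prefers`, `bet_on`, `weakly_more_probable`) are illustrative.

```python
from fractions import Fraction

# Hypothetical hidden parameters that generate the agent's preferences (assumption).
PROB = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}
UTILITY = {"a": 1, "b": 0}  # the constant act c_a is strictly preferred to c_b

def weakly_prefers(f, g):
    """Preference oracle over acts, which are dicts state -> consequence."""
    def eu(act):
        return sum(PROB[s] * UTILITY[act[s]] for s in PROB)
    return eu(f) >= eu(g)

def bet_on(event, win="a", lose="b"):
    """The two-valued act c_a|E + c_b|complement(E)."""
    return {s: (win if s in event else lose) for s in PROB}

def weakly_more_probable(E, F):
    """Definition 3.1, using only the preference oracle."""
    return weakly_prefers(bet_on(E), bet_on(F))

def equally_probable(E, F):
    """Both E is weakly more probable than F, and F than E."""
    return weakly_more_probable(E, F) and weakly_more_probable(F, E)
```

In this toy model the event {s1} comes out equally probable with {s2, s3}, and {s1} is strictly more probable than {s2}, matching the hidden probabilities.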

Savage’s P4 guarantees that the above concept is well defined, i.e., (3.2) does not depend on the choice of the pair of constant acts. The definition has a clear intuitive motivation and it is not difficult to show that \(\succeq \) is a qualitative probability, as defined by de Finetti (in an equivalent formulation used by Savage):

Definition 3.2

(Qualitative probability) A binary relation \(\succeq \) over \({\mathcal B}\) is said to be a qualitative probability if the following hold for all \(A,B,C\in {\mathcal B}\):

  i. \(\succeq \) is a total preorder,

  ii. \( A \succeq \varnothing \),

  iii. \( S \succ \varnothing \),

  iv. if \(A\cap C=B\cap C=\varnothing \) then

    $$\begin{aligned} A\succeq B\iff A\cup C\succeq B\cup C. \end{aligned}$$
    (3.3)
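For a finite algebra the four conditions of Definition 3.2 can be checked by brute force. The sketch below assumes that the algebra consists of all subsets of a small finite S and that the relation is given as a Python predicate `geq(A, B)`; the names and the sample measure are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def events_of(states):
    """All subsets of a finite set of states, as frozensets."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_qualitative_probability(states, geq):
    """Check conditions (i)-(iv) of Definition 3.2 over all subsets of S."""
    evs = events_of(states)
    empty, top = frozenset(), frozenset(states)
    # (i) total preorder: totality (which includes reflexivity) and transitivity
    total = all(geq(A, B) or geq(B, A) for A in evs for B in evs)
    transitive = all(geq(A, C) for A in evs for B in evs for C in evs
                     if geq(A, B) and geq(B, C))
    # (ii) every event is weakly more probable than the empty event
    nonnegative = all(geq(A, empty) for A in evs)
    # (iii) S is strictly more probable than the empty event
    nontrivial = geq(top, empty) and not geq(empty, top)
    # (iv) adding an event C disjoint from both A and B preserves the comparison
    additive = all(geq(A, B) == geq(A | C, B | C)
                   for A in evs for B in evs for C in evs
                   if not (A & C) and not (B & C))
    return total and transitive and nonnegative and nontrivial and additive

# Hypothetical example (assumption): the relation induced by a numeric probability.
MU = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}

def induced(A, B):
    return sum(MU[s] for s in A) >= sum(MU[s] for s in B)

def indifferent(A, B):
    return True  # ranks all events as equal, so condition (iii) fails
```

A relation induced by a numeric probability always passes the check, whereas the relation that ranks all events as equal fails condition (iii).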

For a given decision model, which satisfies P1–P4 and which has two non-equivalent constant acts, the qualitative probability of the model is the qualitative probability defined via Definition 3.1. If that qualitative probability is representable by a quantitative probability, and if moreover the representing probability is unique, then we get a single numeric probability and we are done.Footnote 13 The following postulate ascribes to the qualitative probability the property which Savage (1972, p. 38) suggests as the key for deriving numeric probabilities.

P6\('\):

For any events E, F, if \(E\succ F\), then there is a partition \(\{P_i\}_{i=1}^n\) of S such that \(E\succ F\cup P_i\) for all \(i=1,\ldots ,n\).
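A small finite experiment shows why P6\('\) is demanding. The sketch assumes a finite set of states and a relation induced by a numeric probability, both hypothetical. In that setting, refining a partition only shrinks its cells, so if any partition witnessed P6\('\) for a pair E, F, the partition into singletons would witness it as well. It is therefore enough to test the singleton partition.

```python
from fractions import Fraction
from itertools import combinations

def events_of(states):
    """All subsets of a finite set of states, as frozensets."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def p6_prime_counterexample(states, prob):
    """Return a pair (E, F), with E strictly more probable than F, for which
    some singleton cell {s} makes F ∪ {s} at least as probable as E.
    Return None if no such pair exists."""
    def mu(event):
        return sum(prob[s] for s in event)
    for E in events_of(states):
        for F in events_of(states):
            if mu(E) > mu(F) and any(mu(F | {s}) >= mu(E) for s in states):
                return E, F
    return None

# Hypothetical toy model (assumption): three equally likely states.
UNIFORM = {s: Fraction(1, 3) for s in ("s1", "s2", "s3")}
witness = p6_prime_counterexample(set(UNIFORM), UNIFORM)
```

For three equally likely states the search returns E = {s1} and F = ∅: the cell {s1} of the finest partition is already as probable as E, so no partition can satisfy the postulate for this pair.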

Note that P6\('\) is not stated in terms of a preference relation (\(\succcurlyeq \)) over acts. But, given the way in which the qualitative probability has been defined in terms of \(\succcurlyeq \), P6\('\) is obviously implied by P6 (see Appendix B). As Savage describes it, the motivation for P6 is its intuitive plausibility and its obvious relation to P6\('\).

Before proceeding to the technical details that occupy most of this section it would be useful to state for comparison the two theorems, Savage’s and ours, and pause on some details regarding the use of the probability function in the derivation of utilities.

3.1.2 Overview of the main results

We state the results as theorems about qualitative probabilities. The corresponding theorems within the Savage framework are obtained by replacing the qualitative probability \(\succeq \) by the preference relation over acts \(\succcurlyeq \), and P6\('\) by P6.

Theorem 3.3

(Savage) Let \(\succeq \) be a qualitative probability defined over the Boolean algebra \({\mathcal B}\). If (i) \(\succeq \) satisfies P6\('\) and (ii) \({\mathcal B}\) is a \(\sigma \)-algebra, then there is a unique numeric probability \(\mu \), defined over \({\mathcal B}\), which represents \(\succeq \). That probability has the following property:

\((\dagger )\) :

For any event A and any \(\rho \in (0,1)\), there exists an event \(B\subseteq A\) such that \(\mu (B) = \rho \cdot \mu (A)\).

Theorem 3.4

(Main theorem) Let \(\succeq \) be a qualitative probability defined over the Boolean algebra \({\mathcal B}\). If \(\succeq \) satisfies P6\('\),  then there is a unique numeric probability \(\mu \), defined over \({\mathcal B}\), which represents \(\succeq \). That probability has the following property:

\((\ddagger )\) :

For every event, A, every \(\rho \in (0,1)\), and every \(\epsilon >0\) there exists an event \(B\subseteq A\), such that \((\rho - \epsilon )\cdot \mu (A) \le \mu (B) \le \rho \cdot \mu (A)\).

Remark 3.5

  (1) Probabilities satisfying (\(\dagger \)) were called in Sect. 1 “complete” and those satisfying (\(\ddagger \)) were called “weakly complete.”

  (2) Given a numeric probability \(\mu \), let a \(\rho \)-portion of an event A be any event \(B \subseteq A\) such that \(\mu (B) = \rho \cdot \mu (A)\). Then (\(\dagger \)) means that, for every \(0< \rho < 1\), every event has a \(\rho \)-portion. (\(\ddagger \)) is a weaker condition: for every A, and for every \(\rho \in (0,1)\), there are \(\rho '\)-portions of A, where \(\rho '\) can be strictly smaller than \(\rho \) but arbitrarily close to it.

  (3) For the case \(A=S\), (\(\dagger \)) implies that the set of values of \(\mu \) is the full interval [0, 1]. But (\(\ddagger \)) only implies that the set of values is dense in [0, 1]. Obviously, the satisfaction of P6\('\) implies that the Boolean algebra is infinite, but, as indicated in Sect. 3.3, it can be countable, in which case (\(\dagger \)) must fail.

  (4) That the constructed probability is complete, i.e., satisfies (\(\dagger \)), is proven in Chapter 3 of Savage (1972), which is devoted to probabilities. This property is used much later in the derivation of expected utilities in Chapter 5. In Sect. 3.2.4 below we will show that the probability that is constructed without the \(\sigma \)-algebra assumption is weakly complete, and in Sect. 4 we will show that weak completeness is sufficient for assigning utilities to consequences. As remarked in (3), (\(\dagger \)) implies that the set of values of \(\mu \) is the real interval [0, 1], implying that the Boolean algebra must have the power of the continuum. There are however examples of countable models that satisfy all the required postulates of Savage (Theorem 3.3.5). Therefore, one cannot prove that the probability satisfies (\(\dagger \)), without the \(\sigma \)-algebra assumption.
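The gap between (\(\dagger \)) and (\(\ddagger \)) can be illustrated with plain arithmetic. Assume, purely hypothetically, that the proportions \(\mu (B)/\mu (A)\) available for events \(B\subseteq A\) are exactly the dyadic rationals in [0, 1]. Then no event is a 1/3-portion of A, so (\(\dagger \)) fails, while the function below finds a dyadic proportion within any \(\epsilon \) below \(\rho \), as (\(\ddagger \)) requires. The function name and the dyadic assumption are illustrative and are not claimed to describe the construction of Sect. 3.3.

```python
from fractions import Fraction
from math import floor

def dyadic_portion(rho, eps):
    """Return a dyadic rational q = k / 2**n with rho - eps <= q <= rho."""
    rho, eps = Fraction(rho), Fraction(eps)
    if not (0 < rho < 1) or eps <= 0:
        raise ValueError("need 0 < rho < 1 and eps > 0")
    n = 0
    while Fraction(1, 2 ** n) > eps:  # refine the grid until its step is at most eps
        n += 1
    k = floor(rho * 2 ** n)           # largest grid point not exceeding rho
    return Fraction(k, 2 ** n)

q = dyadic_portion(Fraction(1, 3), Fraction(1, 100))
```

Here the grid step 1/128 is the first that does not exceed 1/100, and the largest grid point below 1/3 is 42/128 = 21/64, which lies between 1/3 − 1/100 and 1/3.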

3.1.3 Savage’s original proof

The proof is given in the more technical part of the book (Savage 1972, pp. 34–38). The presentation seems to be based on working notes, reflecting a development that led Savage to P6\('\). Many proofs consist of numbered claims and sub-claims, whose proofs are left to the reader (some of these exercises are difficult). Some of the theorems are supposed to provide motivation for P6\('\), which is introduced (on p. 38) after the technical part: “In the light of Theorems 3 and 4, I tentatively propose the following postulate ....” Some of the concepts that Savage employs have only historical interest. While many of these concepts are dispensable if P6\('\) is presupposed, some remain useful for clarifying the picture and are therefore used in later textbooks (e.g., Kreps 1988, p. 123). We shall use them as well.

Definition 3.6

(Fineness) A qualitative probability is fine if for every \(E\succ \varnothing \) there is a partition \(\{P_i\}_{i=1}^n\) of S such that \(E\succ P_i\), for every \(i=1,\ldots ,n\).

Definition 3.7

(Tightness) A qualitative probability is tight, if whenever \(E\succ F\), there exists \(C\succ \varnothing \), such that \(E\succ F\cup C \succ F\).

Obviously the fineness property is a special case of P6\('\), where the smaller set is \(\varnothing \). It is easy to show that \(\text{ P6 }' \iff \text{ fineness } + \text{ tightness }\), and in this “decomposition,” tightness is “exactly” what is needed in order to pass from fineness to P6\('\).

Remark 3.8

  1. (1)

    Savage’s definition of “tightness” (p. 34) is different from the notion of tightness given above—it is more complicated and has only historical interest, although the two are equivalent if we presuppose fineness.

  2. (2)

    Let us say that the probability function \(\mu \) almost represents \(\succeq \) (in Savage’s terminology “almost agrees with” \(\succeq \)) if, for any E, F:

    $$\begin{aligned} E\succeq F\Longrightarrow \mu (E)\ge \mu (F). \end{aligned}$$
    (3.4)

    Since \(E\not \succeq F \Rightarrow F \succ E\) it is easily seen that if \(\mu \) almost represents \(\succeq \) then it represents \(\succeq \) iff

    $$\begin{aligned} E\succ F\Longrightarrow \mu (E) > \mu (F) \end{aligned}$$
    (3.5)

    Savage’s proof presupposes fineness, and its upshot is the existence of a unique \(\mu \) that almost represents \(\succeq \). Now fineness implies that if \(E \succ \varnothing \), then \(\mu (E) > 0\).Footnote 14 With tightness added, this implies (3.5). Hence, under P6\('\), \(\mu \) is the unique probability representing \(\succeq \).
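The gap between (3.4) and (3.5) can be seen in a small finite model. The following is only a sketch under stated assumptions: the three-state space, the uniform measure `MU`, and the tie-breaking weights `NU` are hypothetical and do not come from Savage's text. The relation compares events lexicographically, first by `MU` and then by `NU`; being finite, this model is not fine, so it falls outside the scope of the argument above.

```python
from fractions import Fraction
from itertools import combinations

STATES = ("a", "b", "c")
MU = {s: Fraction(1, 3) for s in STATES}  # hypothetical uniform probability
NU = {"a": 1, "b": 2, "c": 3}             # hypothetical tie-breaking weights


def events(states):
    """All subsets of `states`, as frozensets."""
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]


def mu(event):
    """Numeric probability of an event."""
    return sum((MU[s] for s in event), Fraction(0))


def key(event):
    """Lexicographic key: compare by mu first, then by the tie-breaker."""
    return (mu(event), sum(NU[s] for s in event))


def weakly_preferred(e, f):
    """E >= F in the toy qualitative ordering."""
    return key(e) >= key(f)


def almost_represents():
    """Condition (3.4): E >= F implies mu(E) >= mu(F)."""
    evs = events(STATES)
    return all(mu(e) >= mu(f)
               for e in evs for f in evs if weakly_preferred(e, f))


def represents():
    """Conditions (3.4) and (3.5) together; E > F means 'not F >= E'."""
    evs = events(STATES)
    strict_ok = all(mu(e) > mu(f)
                    for e in evs for f in evs if not weakly_preferred(f, e))
    return almost_represents() and strict_ok
```

In this toy model `mu` almost represents the ordering but does not represent it: the event `{b}` is strictly above `{a}` in the ordering, while both have probability 1/3.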

Savage’s proof can be divided into three parts. Part I introduces the concept of an almost uniform partition, which plays a central role in the whole proof, and proves the theorem that links almost uniform partitions to the existence of numeric probabilities. Before proceeding recall the following:

  1. 1.

    A partition of B is a collection of disjoint subsets of B, referred to as parts, whose union is B. We presuppose that the number of parts is \( > 1\) and is finite and that B is non-null, i.e., \(B\succ \varnothing \).

  2. 2.

    It is assumed that no part is a null-event, unless this is explicitly allowed.

  3. 3.

    By an n-partition we mean a partition into n parts (this is what Savage calls n-fold partition).

  4. 4.

    We adopt self-explanatory expressions, like “a partition \(A = A_1\cup \dots \cup A_n\)” which means that the sets on the right-hand side are a partition of A.

Definition 3.9

An almost uniform partition of an event B is a partition of B into a finite number of disjoint events, such that the union of any \(r+1\) parts is weakly more probable than the union of any r parts. An almost uniform n-partition of B is an n-partition of B which is almost uniform.
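For a numeric probability the condition can be tested directly, since the most probable union of r parts is the sum of the r largest masses and the least probable union of \(r+1\) parts is the sum of the \(r+1\) smallest. The sketch below assumes such a numeric reading; the function name `is_almost_uniform` and the representation of a partition as a list of masses are illustrative choices, not part of the qualitative definition.

```python
from fractions import Fraction


def is_almost_uniform(masses):
    """Numeric reading of Definition 3.9.

    `masses` lists the probabilities of the parts. The partition is almost
    uniform iff, for every r, the r+1 smallest parts together weigh at
    least as much as the r largest parts together.
    """
    parts = sorted(masses)
    n = len(parts)
    for r in range(n):
        largest_r = sum(parts[n - r:], Fraction(0))
        smallest_r_plus_1 = sum(parts[:r + 1], Fraction(0))
        if smallest_r_plus_1 < largest_r:
            return False
    return True
```

For example, four parts of mass 1/4 each pass the test, whereas masses 1/10, 2/10, 7/10 fail it, because the two smallest parts together are less probable than the largest one.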

The main result of Part I comprises what in Savage’s enumeration are Theorem 1 and its proof, and the first claim of Theorem 2 (on the bottom of p. 34), and its proof. The latter consists of steps 1–7 and ends in the middle of p. 36. All in all, the result in Part I is:

Theorem 3.10

If, for arbitrarily large numbers, n, there are almost uniform n-partitions of S, then there exists a unique numerical probability \(\mu \) which almost represents \(\succeq \).

The proof of this result consists mainly of direct computational/combinatorial arguments; it is given with sufficient details and does not use the \(\sigma \)-algebra assumption. We shall take the theorem and its proof for granted.

Part II consists in showing that fineness and the \(\sigma \)-algebra assumption imply that there exist almost uniform n-partitions for arbitrarily large numbers n (together with the theorems of Part I this yields a unique probability that almost represents the qualitative one). This part is done in Theorem 3. The latter consists of a sequence of claims, referred to as “parts,” in which later parts are to be derived from earlier ones. The arrangement is intended to help the reader find the proofs. For the more difficult parts, additional details are provided. Many claims are couched in terms that have only historical interest. For our purposes, we need only focus on a crucial construction that uses what we shall call “iterated 3-partitions” (cf. Sect. 3.1.4 below). This construction is described in the proof of Part 5 (on the top of p. 35). As a last step it involves the crucial use of the \(\sigma \)-algebra assumption; we shall return to this step shortly.

Part III of Savage’s proof consists in the second claim of the aforementioned Theorem 2. It asserts that the numeric probability, which is derivable from the existence of almost uniform n-partitions for arbitrarily large n’s, satisfies \((\dag )\). The proof consists of three claims, 8a, 8b, 8c, the last of which relies on the \(\sigma \)-algebra assumption. The parallel part of our proof is the derivation of \((\ddagger )\) without using the \(\sigma \)-algebra assumption. The proof is given in Sect. 3.2.4 below.

3.1.4 Savage’s method of iterated 3-partitions

In order to prove Part 5 of Theorem 3, Savage claims that the following is derivable from the laws of qualitative probabilities and fineness.

Theorem 3.11

(Savage) For any given \(B\succ \varnothing \) there exists an infinite sequence of 3-partitions of B: \(\{C_n, D_n, G_n\}_n\), which has the following properties:Footnote 15

  1. (1)

    \(C_n \cup G_n \succeq D_n\) and \(D_n \cup G_n \succeq C_n\)

  2. (2)

    \(C_n \subseteq C_{n+1}\) , \(D_n \subseteq D_{n+1}\), hence \(G_n \supseteq G_{n+1}\)

  3. (3)

    \(G_n-G_{n+1} \succeq G_{n+1}\)

    These properties imply that \(G_n\) becomes arbitrarily small as \(n\rightarrow \infty \), that is:

  4. (4)

    For any \(F \succ \varnothing \), there exists n such that \(G_m \prec F\) for all \(m\ge n\).

Note

Condition (3) in Theorem 3.11 means that \(G_n\) is a disjoint union of two subsets, \(G_n = G_{n+1} \cup (G_n - G_{n+1})\), each of which is \(\succeq G_{n+1}\). In this sense \(G_{n+1}\) is less than or equal to “half of \(G_n\)”. Had the probability been numeric we could have omitted the scare quotes; it would have implied that the probabilities of \(G_n\) tend to 0, as \(n \rightarrow \infty \). In the case of a qualitative probability the analogous conclusion is that the sets become arbitrarily small, in the non-numerical sense.

Savage provides an argument, based on fineness, which derives (4) from the previous properties. The argument is short and is worth repeating: Given any \(F\succ \varnothing \), we have to show that, for some n, \(G_n \prec F\). Assume, for contradiction, that this is not the case. Then \(F\preceq G_n\), for all ns. Now fineness implies that there is a partition \(S = P_1\cup \cdots \cup P_m\) such that \(P_i \preceq F\), for \(i = 1,\ldots ,m\). If \(F \preceq G_n\), then \(P_1 \preceq G_{n}\), hence \(P_1 \cup P_2 \preceq G_{n-1}\), hence \(P_1 \cup P_2 \cup P_3 \cup P_4 \preceq G_{n-2}\), and so on. Therefore, if n is chosen so that \(2^{n-1} \ge m\), then \(S \preceq G_1\), which is a contradiction.
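The doubling steps in this argument can be spelled out as follows. This is a sketch that assumes the standard consequence of the axioms of qualitative probability: if \(A\preceq B\), \(C\preceq D\), \(A\cap C=\varnothing \) and \(B\cap D=\varnothing \), then \(A\cup C\preceq B\cup D\). It also uses condition (3) in the form \(G_{j} \preceq G_{j-1}-G_{j}\).

$$\begin{aligned} P_1 \preceq G_n,\quad P_2 \preceq G_n \preceq G_{n-1}-G_n \;&\Longrightarrow \; P_1\cup P_2 \preceq G_n\cup (G_{n-1}-G_n) = G_{n-1},\\ P_1\cup P_2 \preceq G_{n-1},\quad P_3\cup P_4 \preceq G_{n-1} \preceq G_{n-2}-G_{n-1} \;&\Longrightarrow \; P_1\cup P_2\cup P_3\cup P_4 \preceq G_{n-2}. \end{aligned}$$

The bound \(P_3\cup P_4 \preceq G_{n-1}\) in the second line is obtained exactly as in the first line. Repeating the step, a union of \(2^j\) of the \(P_i\) is \(\preceq G_{n-j}\).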

Definition 3.12

Call an infinite sequence of 3-partitions of B, which satisfies conditions (1), (2), (3), a Savage chain for B. We say that the chain passes through a 3-partition of B, if the 3-partition occurs in the sequence.

We presented the theorem so as to conform with Savage’s notation and the capital letters he used. Later we shall change the notation. We shall use ordered triples for the 3-partition and place in the middle the sets that play the role of the \(G_n\)’s. The definition just given can be rephrased of course in terms of our later terminology.

Figure 1 is an illustration of a Savage chain. Presenting the Savage chain as a sequence of triples with the \(G_n\)s in the middle makes for a better pictorial representation, and it is essential when it comes to trees.

Fig. 1 Savage’s error reducing partitions

The fact that \( D_n \cup G_n \succeq C_n,\; C_n\cup G_n \succeq D_n\), and the fact that \(G_n\) becomes arbitrarily small suggest that \(G_n\) plays the role of a “margin of error” in a division of the set into two roughly equivalent parts. Although the error becomes arbitrarily small, there is no way of getting rid of it. At this point Savage uses the \(\sigma \)-algebra assumption; he puts:

$$\begin{aligned} B_1=\bigcup _n C_n \quad \quad \text {and}\quad \quad B_2=\Big (\bigcup _n D_n\Big )\cup \Big (\bigcap _n G_n\Big ). \end{aligned}$$
(3.6)

Remark 3.13

The rest of Savage’s proof is not relevant to our work. For the sake of completeness, here is a short account of it. \(B_1, B_2\) form a partition of B, and \(\bigcap _n G_n \equiv \varnothing \). Assuming P6\('\), one can show that \(B_1 \equiv B_2\); but Savage does not use P6\('\) (a postulate that is introduced after Theorem 3), hence he only deduces that \(B_1\) and \(B_2\) are what he calls “almost equivalent,” one of the concepts he used at the time, which we need not go into. By iterating this division he proves that, for every n, every non-null event can be partitioned into \(2^n\) almost equivalent events. At an earlier stage (Part 4) he states that every partition of S into almost equivalent events is almost uniform. Hence, there are almost uniform n-partitions of S for arbitrarily large ns. This together with the first claim of his Theorem 2 (Theorem 3.10 in our numbering) proves the existence of the required numeric probability.

We eliminate the \(\sigma \)-algebra assumption by avoiding the construction of (3.6). We develop, instead, a technique of using trees, which generates big partitions, and many “error parts,” which can be treated simultaneously. We use it in order to get almost uniform partitions.

3.2 Eliminating the \(\sigma \)-algebra assumption by using tripartition trees

So far, trying to follow faithfully the historical development of Savage’s system, we presupposed fineness rather than P6\('\). If we continue to do so the proof will be burdened by various small details, and we prefer to avoid this.Footnote 16 From now on we shall presuppose P6\('\).Footnote 17

First, we give the 3-partitions that figure in Savage’s construction a more suggestive form, suitable for our purposes:

Definition 3.14

(Tripartition). A Savage tripartition or, for short, a tripartition of a non-null event, B, is an ordered triple (C, E, D) of disjoint events such that:

  1. i.

    \(B=C\cup E\cup D\)

  2. ii.

    \(C,D\succ \varnothing \),

  3. iii.

    \( C\cup E\succeq D\) and \(E\cup D \succeq C\).

We refer to E as the error part, or simply error, and to \(C\ \text{ and }\ D\) as the regular parts.

We allow E to be a null-set, i.e., \(E\equiv \varnothing \), including \(E = \varnothing \). The case \(E=\varnothing \) constitutes the extreme case of a tripartition, where the error is \(\varnothing \). In diagrams, \(\varnothing \) serves in this case as a marker that separates the two parts.Footnote 18

3.2.1 Tripartition trees

Recall that a binary partition tree is a rooted ordered tree whose nodes are sets, such that each node that is not a leaf has two children that form a 2-partition of it. By analogy, a tripartition tree, \({\mathcal T}\), is a rooted ordered tree such that: (1) The nodes are sets, which are referred to as parts, and they are classified into regular parts, and error parts. (2) The root is a regular part. (3) Every regular part that is not a leaf has three children that constitute a tripartition of it. (4) Error-parts have no children.

Figure 2 provides an illustration of a tripartition tree, written top down, in which the root is the event A, and the error-parts are shaded.

Note

No set can occur twice in a partition tree. Hence we can simplify the structure by identifying the nodes with the sets; we do not have to construe it as a labeled tree. (In the special cases in which the error is empty, \(\varnothing \) can occur more than once, but this should not cause any confusion.)

Fig. 2 A tripartition tree \({\mathcal T}\) of A

3.2.2 Additional concepts, terminologies, and notations

  1. 1.

    The levels of a tripartition tree are defined as follows: (1) level 0 contains the root; (2) level \(n+1\) contains all the children of the regular nodes on level n; (3) level \(n+1\) contains all error nodes on level n.

  2. 2.

    Note that this means that, once an error-part appears on a certain level it keeps reappearing on all higher levels.

  3. 3.

    A tripartition tree is uniform if all the regular nodes that are leaves are on the same level. From now on we assume that the tripartition trees are uniform, unless indicated otherwise.

  4. 4.

    The height of a finite tree \({\mathcal T}\) is n, where n is the level of the leaves that are regular nodes. If the tree is infinite its height is \(\infty \).

  5. 5.

    A subtree of a tree is a tree consisting of some regular node (the root of the subtree) and all its descendants.

  6. 6.

    The truncation of a tree \({\mathcal T}\) at level m, is the tree consisting of all the nodes of \({\mathcal T}\) whose level is \(\le m\). (Note that if \(m\ge \) height of \({\mathcal T}\), then truncation at level m is the same as \({\mathcal T}\).)

  7. 7.

    Strictly speaking, the root by itself does not constitute a tripartition tree. But there is no harm in regarding it as the truncation at the 0 level, or as a tree of height 0.

Remark 3.15

  1. (1)

    An ordered tree is one in which the children of any node are ordered (an assignment, which assigns to every node an ordering of its children, is included in the structure). Sometimes the trees must be ordered, e.g., when they are used to model syntactic structures of sentences. But sometimes an ordering is imposed for convenience; it makes for an easy way of locating nodes and for a useful two-dimensional representation. In our case, the ordering makes it possible to locate the error-parts by their middle positions in the triple.Footnote 19

  2. (2)

    The main error part of a tree is the error part on level 1.

  3. (3)

    It is easily seen that on level k there are \(2^k\) regular parts and \(2^k - 1\) error-parts. We use binary strings of length k to index the regular parts, and binary strings of length \(k-1\) to index the error-parts, except for the main error-part. Figure 1 shows how this is done. The main error-part of that tree is E. We can regard the index of E as the empty binary sequence.

  4. (4)

    We let \({\mathcal T}\) range over tripartition trees and \({\mathcal T}_A\) over tripartition trees of A. We put \({\mathcal T}= {\mathcal T}_A\) in order to say that \({\mathcal T}\) is a tripartition tree with root A. To indicate the regular and error parts we put: \({\mathcal T}_A=(A_\sigma ,E_\sigma )\), where \(\sigma \) ranges over the binary sequences (it is understood that the subscript of E ranges over sequences of length smaller by 1 than the subscript of A). To indicate also the height k, we put: \({\mathcal T}_{A,k}=(A_\sigma ,E_\sigma )_k\). Various parameters will be omitted if they are understood from the context.
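The indexing scheme and the counts in item (3) can be checked mechanically. The following sketch enumerates the indices only, not the sets themselves; the function names are illustrative and the representation of binary sequences as Python strings is an assumption made for convenience.

```python
from itertools import product


def regular_indices(k):
    """Binary strings of length k: one per regular part on level k.

    For k = 0 this is the single empty string, which indexes the root.
    """
    return ["".join(bits) for bits in product("01", repeat=k)]


def error_indices(k):
    """Binary strings of length < k: one per error part on level k.

    The empty string indexes the main error part. An error part created
    on level j has an index of length j - 1 and reappears on every
    later level, so level k carries all indices of length 0, ..., k - 1.
    """
    return [s for j in range(k) for s in regular_indices(j)]
```

With this encoding, level 2 has the regular indices `00, 01, 10, 11` and three error indices: the empty string for the main error part, and `0` and `1` for the error parts of the two level-1 regular parts.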

Definition 3.16

(Total error) The total error of a tree \({\mathcal T}\), denoted \(E({\mathcal T})\), is the union of all error-parts of \({\mathcal T}\). That is to say, if \({\mathcal T}= {\mathcal T}_A =(A_\sigma ,E_\sigma )\), then \(E({\mathcal T})=_{\text{ Df }} \bigcup _\sigma E_\sigma \).

If \({\mathcal T}\) is of height k then \(E({\mathcal T})\) is the union of all error-parts on the k-level of \({\mathcal T}\). This is obvious, given that all error-parts of level j, where \(j<k\), reappear on level \(j+1\). For the same reason, if \(j<k\), then the total error of the truncated tree at level j is the union of all error-parts on level j.

Now recall that a Savage tripartition (C, E, D) has the property that \(C\cup E\succeq D\) and \(C\preceq E\cup D\) (cf. Definition 3.14). This property generalizes to tripartition trees:

Theorem 3.17

Let \({\mathcal T}_A\) be a tripartition tree of A of height k. Then, for any regular parts \(A_\sigma , A_{\sigma '}\) on the kth level, the following holds:

$$\begin{aligned} {\begin{matrix} A_\sigma \cup E({{\mathcal T}_A})&{}\succeq A_{\sigma '}\\ A_{\sigma '}\cup E({{\mathcal T}_A})&{}\succeq A_{\sigma } \end{matrix}} \end{aligned}$$
(3.7)

Proof

We prove the theorem by induction on k. For \(k=1\) the claim holds since \((A_0,E,A_1)\) is just a Savage tripartition. For \(k>1\), let \({\mathcal T}^*_A=(A_\sigma ,E_\sigma )_{k-1}\) be the truncated tree consisting of the first \(k\!-1\!\) levels of \({\mathcal T}_A\). By the inductive hypothesis, for all regular parts \(A_\tau , A_{\tau '}\) on the \(k-1\) level of \({\mathcal T}^*_A\),

$$\begin{aligned} \begin{aligned} A_\tau \cup E({{\mathcal T}^*_A})&\succeq A_{\tau '}\\ A_{\tau '}\cup E({{\mathcal T}^*_A})&\succeq A_{\tau } \end{aligned} \end{aligned}$$
(3.8)

Fig. 3 Claim in the proof of Theorem 3.17

The rest of the proof relies on the following claim.

Claim

Assume that the following holds (as illustrated in Fig. 3):

$$\begin{aligned} \begin{aligned}&A_1\cup E_1 \succeq B_1 \quad \quad B_1\cup E_1 \succeq A_1\\&A_2\cup E_2 \succeq B_2 \quad \quad B_2\cup E_2 \succeq A_2\\ \end{aligned} \end{aligned}$$
(3.9)

and

$$\begin{aligned} \begin{aligned}&\Big (A_1\cup E_1\cup B_1\Big )\cup E \succeq \Big (A_2\cup E_2\cup B_2 \Big )\\&\Big (A_2\cup E_2 \cup B_2\Big )\cup E \succeq \Big (A_1\cup E_1\cup B_1\Big ). \end{aligned} \end{aligned}$$
(3.10)

Then \(C_1\cup (E_1\cup E\cup E_2)\succeq C_2\), where \(C_1\) is either \(A_1\) or \(B_1\) and \(C_2\) is either \(A_2\) or \(B_2\). That is, the union of any regular part on one side of E with \(E_1\cup E \cup E_2\) is \(\succeq \) any regular part on the other side of E.

Proof of Claim

WLOG, it is sufficient to show this for the case \(C_1=A_1\) and \(C_2=A_2\). The other cases follow by symmetry. Thus, we have to prove:

$$\begin{aligned} A_1\cup (E_1\cup E\cup E_2)\succeq A_2. \end{aligned}$$
(3.11)

Now, consider the following two cases:

  1. (1)

    If \(B_1\succeq A_2\), then we have

    $$\begin{aligned} (A_1\cup E_1)\cup E\cup E_2\succeq B_1\cup E\cup E_2\succeq A_2 \cup E\cup E_2\succeq A_2. \end{aligned}$$
  2. (2)

    Otherwise \(B_1\prec A_2\). Suppose, to the contrary, that (3.11) fails, that is, \(A_2\succ A_1\cup (E_1\cup E\cup E_2)\). Since \(A_1,E_1,E,E_2,A_2,B_2\) are mutually exclusive, we have:

    $$\begin{aligned} \begin{aligned} A_2\cup B_2&\succ A_1\cup (E_1\cup E\cup E_2)\cup B_2\\&\succeq A_1\cup E_1\cup E\cup A_2\\&\succ A_1\cup E_1\cup E\cup B_1. \end{aligned} \end{aligned}$$

    The first inequality follows from the properties of qualitative probability. The second inequality holds because \(E_2\cup B_2 \succeq A_2\) in (3.9) and the third holds since we assume that \(A_2\succ B_1\). But, from (3.10), we have \(A_1\cup E_1\cup E\cup B_1\succeq A_2\cup E_2\cup B_2\succeq A_2\cup B_2\). Contradiction. This proves (3.11).

By symmetry, other cases hold as well. This completes the proof of the Claim.

Getting back to the proof of the theorem, assume WLOG that in (3.7) \(A_\sigma \) is to the left of \(A_{\sigma '}\). Now each of them is a regular part of a tripartition of a regular part on level \(k-1\). Consider the case in which \(A_\sigma \) appears in a tripartition of the form \((A_\sigma ,E_\lambda ,B_\sigma )\) and \(A_{\sigma '}\) appears in a tripartition of the form \((B_{\sigma '},E_{\lambda '},A_{\sigma '})\). There are other possible cases, but they follow from this case by symmetry arguments. In fact, using the “rotation automorphisms” described in footnote 19, they can be converted to each other. We get:

$$\begin{aligned} \begin{aligned} A_\sigma \cup E_\lambda \succeq B_\sigma \quad&\quad B_\sigma \cup E_\lambda \succeq A_\sigma \\ A_{\sigma '}\cup E_{\lambda '} \succeq B_{\sigma '} \quad&\quad B_{\sigma '}\cup E_{\lambda '} \succeq A_{\sigma '}. \end{aligned} \end{aligned}$$
(3.12)

Since \(A_\sigma \cup E_\lambda \cup B_\sigma \) and \(A_{\sigma '}\cup E_{\lambda '}\cup B_{\sigma '}\) are regular parts on the \(k-1\) level of \({\mathcal T}_A\), the inductive hypothesis (3.8) implies:

$$\begin{aligned} \begin{aligned} \Big (A_\sigma \cup E_\lambda \cup B_\sigma \Big )\cup E({{\mathcal T}^*_A})&\succeq \Big (A_{\sigma '}\cup E_{\lambda '}\cup B_{\sigma '}\Big )\\ \Big (A_{\sigma '}\cup E_{\lambda '}\cup B_{\sigma '}\Big )\cup E({{\mathcal T}^*_A})&\succeq \Big (A_\sigma \cup E_\lambda \cup B_\sigma \Big ). \end{aligned} \end{aligned}$$
(3.13)

Clearly, (3.12) and (3.13) are a substitution variant of (3.9) and (3.10). Therefore the Claim implies:

$$\begin{aligned} \begin{aligned} A_\sigma \cup \big (E({\mathcal T}^*)\cup E_\lambda \cup E_{\lambda '}\big )&\succeq A_{\sigma '}\\ A_{\sigma '} \cup \big (E({\mathcal T}^*)\cup E_\lambda \cup E_{\lambda '}\big )&\succeq A_{\sigma }. \end{aligned} \end{aligned}$$
(3.14)

Since \(E({\mathcal T})\) is disjoint from both \( A_\sigma \) and \(A_{\sigma '}\) and \(E({\mathcal T}^*)\cup E_\lambda \cup E_{\lambda '} \subseteq E({\mathcal T})\), we get (3.7). \(\square \)
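Theorem 3.17 can be sanity-checked in a numeric model, where each part is represented only by its probability mass. The sketch below is an illustration under that assumption and not a substitute for the proof: the helper names, the random splitting scheme, and the use of exact fractions are all hypothetical choices.

```python
from fractions import Fraction
import random


def random_tripartition(mass, rng):
    """Split `mass` into (c, e, d) with c, d > 0, c + e >= d and e + d >= c."""
    # Error share between 0% and 40% of the mass.
    e = mass * Fraction(rng.randint(0, 4), 10)
    rest = mass - e
    # Imbalance c - d, kept within the error and strictly below `rest`.
    bound = min(e, rest * Fraction(9, 10))
    delta = bound * Fraction(rng.randint(-10, 10), 10)
    return (rest + delta) / 2, e, (rest - delta) / 2


def build(mass, height, rng):
    """Return (masses of regular leaves, total error) of a random uniform tree."""
    if height == 0:
        return [mass], Fraction(0)
    c, e, d = random_tripartition(mass, rng)
    left_leaves, left_err = build(c, height - 1, rng)
    right_leaves, right_err = build(d, height - 1, rng)
    return left_leaves + right_leaves, left_err + e + right_err


def satisfies_3_7(leaves, total_error):
    """Numeric form of (3.7): any leaf plus the total error outweighs any leaf."""
    return all(a + total_error >= b for a in leaves for b in leaves)
```

Calling `build(Fraction(1), k, random.Random(0))` yields the \(2^k\) leaf masses and the total error of one random tree of height k; `satisfies_3_7` then checks the inequalities of (3.7) for every pair of leaves.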

3.2.3 The error reduction method for trees

Note that trees that have the same height are structurally isomorphic and there is a unique one-to-one correlation that correlates the parts of one with the parts of the other. We have adopted a notation that makes clear, for each part in one tree, the corresponding part in the other tree. This also holds if one tree is a truncation of the other. The indexing of the regular parts and the error parts in the truncated tree is the same as in the whole tree.

Definition 3.18

(Error reduction tree). Given a tree, \({\mathcal T}_A=(A_\sigma ,E_\sigma )_k\), an error-reduction of \({\mathcal T}\) is a tree with the same root and the same height, \({\mathcal T}'_A=(A'_\sigma ,E'_\sigma )_k\), such that for every \(\sigma \), \(A_\sigma \subseteq A'_\sigma \). We shall also say in that case that \({\mathcal T}'\) is obtained from \({\mathcal T}\) by error reduction.

Remark 3.19

  1. (1)

    A is the union of all the regular leaves and the total error, \(E({\mathcal T}_A)\). If every regular part weakly increases, it is obvious that the total error weakly decreases: \(E({\mathcal T}')\subseteq E({\mathcal T})\). Thus, the term ‘error-reduction’ is justified. The reverse implication is of course false in general. The crucial property of error-reducing is that, in the reduction of the total error, every regular part (weakly) increases as a set.

  2. (2)

    The reduction of \(E({\mathcal T})\) is in a weak sense, that is, \(E({\mathcal T}') \subseteq E({\mathcal T})\). The strong sense can be obtained by adding the condition \(E({\mathcal T}') \prec E({\mathcal T})\). But, in view of our main result, we do not need to add it explicitly as part of the definition.

  3. (3)

    Error reduction of Savage tripartitions (i.e., triples) is the simplest case of error reduction of trees: each of the two regular parts weakly increases and the error part weakly decreases; this is the error reduction in trees of height 1.

  4. (4)

    It is easily seen that if \({\mathcal T}'\) is an error-reduction of \({\mathcal T}\) and \({\mathcal T}''\) is an error-reduction of \({\mathcal T}'\), then \({\mathcal T}''\) is an error-reduction of \({\mathcal T}\).

Our central result is that, given any tripartition tree, there is an error-reduction of it in which the total error is arbitrarily small. That is, for every non-null set F, there is an error-reduction tree of total error \(\preceq F\). The proof uses a certain operation on tripartition trees, which is defined as follows.

Definition 3.20

(Mixed sum). Let \({\mathcal T}_A=(A_\sigma ,E_\sigma )_k\) and \({\mathcal T}'_{A'}=(A'_\sigma ,E'_\sigma )_k\) be two tripartition trees of two disjoint events (i.e., \(A\cap A'=\varnothing \)), of the same height, k. Then the mixed sum of \({\mathcal T}_A\) and \({\mathcal T}'_{A'}\), denoted \({\mathcal T}_{A}\oplus {\mathcal T}'_{A'}\), is the tree of height k, defined by:

$$\begin{aligned} {\mathcal T}_{A}\oplus {\mathcal T}'_{A'}= {( A_\sigma \cup A'_\sigma ,\, E_\sigma \cup E'_\sigma )}_k. \end{aligned}$$
(3.15)

The notation \({\mathcal T}_{A}\oplus {\mathcal T}'_{A'}\) is always used under the assumption that \(A\) and \(A'\) are disjoint and the trees are of the same height.
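As a set-level illustration of (3.15), a tree can be stored as a pair of dictionaries keyed by binary index strings. This encoding, and the names `mixed_sum` and `total_error`, are assumptions made for the sketch; the code only forms the unions of corresponding parts and does not verify that the inputs are tripartition trees with respect to some qualitative probability.

```python
def mixed_sum(tree_a, tree_b):
    """Mixed sum of two trees of the same height over disjoint roots.

    A tree is a pair (regular, errors): `regular` maps each binary index
    (the empty string for the root) to a frozenset, and `errors` maps the
    index of each tripartitioned regular part to its error part.
    """
    reg_a, err_a = tree_a
    reg_b, err_b = tree_b
    if reg_a.keys() != reg_b.keys() or err_a.keys() != err_b.keys():
        raise ValueError("trees must have the same height")
    if reg_a[""] & reg_b[""]:
        raise ValueError("roots must be disjoint")
    # Take the union of every pair of corresponding parts.
    regular = {s: reg_a[s] | reg_b[s] for s in reg_a}
    errors = {s: err_a[s] | err_b[s] for s in err_a}
    return regular, errors


def total_error(tree):
    """Union of all error parts of the tree (Definition 3.16)."""
    _, errors = tree
    return frozenset().union(*errors.values())
```

For two trees of height 1 with triples `({1}, {2}, {3})` and `({4, 5}, {}, {6, 7})`, the mixed sum has the triple `({1, 4, 5}, {2}, {3, 6, 7})`, and its total error is the union of the two total errors.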

Lemma 3.21

  1. (1)

    \({\mathcal T}_{A}\oplus {\mathcal T}'_{A'}\) is a tripartition tree of \(A\cup A'\) whose total error is \( E({{\mathcal T}_A})\cup E({\mathcal T}'_{A'})\).

  2. (2)

    If \({\mathcal T}_A^{*}\) and \({\mathcal T}_{A'}^{+}\) are, respectively, error reductions of \({\mathcal T}_{A}\) and \({\mathcal T}'_{A'}\), then \({\mathcal T}^{*}_A \oplus {\mathcal T}_{A'}^{+}\) is an error reduction of \({\mathcal T}_{A}\oplus {\mathcal T}'_{A'}\)

Proof

The operation \(\oplus \) consists in taking the union of every pair of corresponding parts, which belong to tripartitions of two given disjoint sets. Therefore, the first claim follows easily from the definitions of tripartition trees and the laws of qualitative probability (stated in Definition 3.2). For example, for every binary sequence, \(\sigma \), of length < height of the tree, we have \(A_{\sigma ,0}\cup E_\sigma \succeq A_{\sigma ,1}\) and \(A'_{\sigma ,0}\cup E'_\sigma \succeq A'_{\sigma ,1}\). In each inequality the sets are disjoint, and every set in the first inequality is disjoint from every set in the second inequality. Hence, by the axioms of qualitative probability we get:

$$\begin{aligned} \big (A_{\sigma ,0}\cup A'_{\sigma ,0}\big )\cup \big (E_\sigma \cup E'_\sigma \big )\succeq \big (A_{\sigma ,1}\cup A'_{\sigma ,1}\big ) \end{aligned}$$

The second claim follows as easily from the definition of error-reduction and the laws of Boolean algebras. \(\square \)

Theorem 3.22

(Error reduction) For any tripartition tree \({\mathcal T}_A\) and any non-null event F, there is an error-reduction \({\mathcal T}^*_A\) of \({\mathcal T}_A\) such that \(E({\mathcal T}^*_A)\preceq F\).

Proof

We prove the theorem by induction on k, where \(k =\) height of \({\mathcal T}_A\). If \(k=0\), then formally \({\mathcal T}_A\) consists of A only. Hence the base case is \(k=1\), and the only error part is on level 1. Let the tripartition on level 1 be \((A_0, E, A_1)\). We now apply the following result that is implied by Fishburn’s reconstruction of the proofs that Savage did not include in his bookFootnote 20:

Claim

Given any tripartition \((C_0, E_0, D_0)\), there is a sequence of tripartitions \((C_n, E_n , D_n)\), \(n = 1, 2, \ldots \), that constitute a Savage chain such that \((C_1, E_1, D_1)\) is an error reduction of \((C_0, E_0, D_0)\).

Applying this Claim to the case \((C_0, E_0, D_0)=(A_0, E, A_1)\) we get an infinite Savage chain that begins with \((A_0, E, A_1)\). For some n, \(E_n \preceq F\). This proves the base case.

Note

Before proceeding, observe that, for any integer \(m>1\), every non-null event F can be partitioned into m disjoint non-null events. This is an easy consequence of fineness.Footnote 21 In what follows we use a representation of ordered partition trees of the form:

$$\begin{aligned} \Big [{\mathcal T}_{B_1},\ldots , {\mathcal T}_{B_m}\Big ] \end{aligned}$$

where \(m>1\) and the \(B_i\)’s are disjoint non-null sets. This includes the possibility that some \({\mathcal T}_{B_i}\)’s are of height 0, in which case we can replace \({\mathcal T}_{B_i}\) by \(B_i\). The root of the tree is the union of the \(B_i\)’s, and the \(B_i\)’s are its children, ordered as indicated by the indexing. The whole tree is not necessarily a tripartition tree, but each of the m subtrees is. For example, \([B, B', {\mathcal T}_C, {\mathcal T}_D]\) denotes a partition tree in which \((B, B', C, D)\) is a 4-partition of the root, the root being the union of these sets, B and \(B'\) are leaves, and C and D are roots of the tripartition trees \({\mathcal T}_C\) and \({\mathcal T}_D\).

Now, for the inductive step, assume that the induction hypothesis holds for k and let \({\mathcal T}_A\) be a tripartition tree of height \(k+1\). Then \({\mathcal T}_A\) is of the form:

$$\begin{aligned} \Big [{\mathcal T}_{B_l},\ E,\ {\mathcal T}_{B_r}\Big ] \end{aligned}$$

where \({\mathcal T}_{B_l}\) and \({\mathcal T}_{B_r}\) are of height k. Given any \(F \succ \varnothing \), we have to construct an error-reduction of \({\mathcal T}_A\) of total error \(\preceq F\). Partition the given F into 5 non-null events: \(F_1,\, F_2,\, F_3,\, F_4,\, F_5\); as observed above, this is always possible.

If E is a null set, then we apply the induction hypotheses to each of \({\mathcal T}_{B_l}\) and \({\mathcal T}_{B_r}\), get error-reductions in which the total errors are, respectively, less-than-or-equal-to \(F_1\) and \(F_5\), and we are done. Otherwise we proceed as follows.

Using again the Claim from Fishburn’s reconstruction, we get a tripartition of E: \((C_l, E^*, C_r)\), where \(E^* \preceq F_3\). Ignoring for the moment the role of \(E^*\) as an error part, we get:

$$\begin{aligned} \Big [{\mathcal T}_{B_l},\ C_l,\ E^*,\ C_r,\ {\mathcal T}_{B_r}\Big ] \end{aligned}$$

Note that in this partition the root, which is A, is first partitioned into 5 events; \(B_l\) and \(B_r\) are roots of tripartition trees of height k, and \(C_l,\ E^*,\ \text{ and }\ C_r\) are leaves. Using the induction hypothesis, get an error-reduction \({\mathcal T}^*_{B_l}\) of \({\mathcal T}_{B_l}\) and an error-reduction \({\mathcal T}^*_{B_r}\) of \({\mathcal T}_{B_r}\), such that \(E({\mathcal T}^*_{B_l}) \preceq F_1\), and \(E({\mathcal T}^*_{B_r}) \preceq F_5\). Get an arbitrary tripartition tree \({\mathcal T}_{C_l}\) of \(C_l\), and an arbitrary tripartition tree \({\mathcal T}_{C_r}\) of \(C_r\), each of height k (every non-null set has a tripartition tree of any given height). Using again the inductive hypothesis, get error-reductions, \({\mathcal T}^*_{C_l}\) and \({\mathcal T}^*_{C_r}\), such that \(E({\mathcal T}^*_{C_l}) \preceq F_2\), and \(E({\mathcal T}^*_{C_r}) \preceq F_4\). This gives us the following partition of A:

$$\begin{aligned} \Big [{\mathcal T}^*_{B_l},\ {\mathcal T}^*_{C_l},\ E^*,\ {\mathcal T}^*_{C_r},\ {\mathcal T}^*_{B_r}\Big ]. \end{aligned}$$

Now, put \({\mathcal T}_{A_0} = {\mathcal T}^*_{B_l} \oplus {\mathcal T}^*_{C_l}\) and \({\mathcal T}_{A_1} = {\mathcal T}^*_{B_r} \oplus {\mathcal T}^*_{C_r}\), then

$$\begin{aligned} \Big [{\mathcal T}_{A_0},\ E^*,\ {\mathcal T}_{A_1}\Big ] \end{aligned}$$

is a tripartition tree of A of height \(k+1\). Call it \({\mathcal T}^*_A\). By Lemma 3.21, \(E({\mathcal T}_{A_0}) \preceq F_1\cup F_2\) and \(E( {\mathcal T}_{A_1}) \preceq F_4\cup F_5\). Since \(E^* \preceq F_3\), together we get: \(E({\mathcal T}^*_A) \preceq F\). \(\square \)
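The final estimate can be written out term by term. This is a sketch that uses Lemma 3.21 (1) for the total error of a mixed sum, and the fact that \(\preceq \) is preserved under unions of disjoint sets when the bounding sets \(F_1,\ldots ,F_5\) are themselves disjoint:

$$\begin{aligned} E({\mathcal T}_{A_0})&= E({\mathcal T}^*_{B_l})\cup E({\mathcal T}^*_{C_l}) \preceq F_1\cup F_2,\\ E({\mathcal T}_{A_1})&= E({\mathcal T}^*_{B_r})\cup E({\mathcal T}^*_{C_r}) \preceq F_4\cup F_5,\\ E({\mathcal T}^*_A)&= E({\mathcal T}_{A_0})\cup E^*\cup E({\mathcal T}_{A_1}) \preceq F_1\cup F_2\cup F_3\cup F_4\cup F_5 = F. \end{aligned}$$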

Theorem 3.22 is our main result and we shall refer to it as the error reduction theorem, or, for short, error reduction. We shall also use error reduction for the process in which we get tripartition trees in which the error is reduced.

Remark 3.23

In a way, this theorem generalizes the construction of a monotonically decreasing sequence of error-parts in Theorem 3.11. But, instead of reducing a single error-part (the shaded areas in Fig. 1), the method we use reduces simultaneously all error-parts in a tripartition tree.

3.2.4 Almost uniform partitions

Recall that a partition \(\{P_i\}_{i=1}^n\) of a non-null event A is almost uniform if the union of any r members of the partition is not more probable than the union of any \(r+1\) members. In Theorem 3.10 we rephrased a result by Savage, which claims that if, for arbitrarily large values of n, there are almost uniform n-partitions of S, then there is a unique numeric probability that almost represents the underlying qualitative one. We noted that Savage’s proof requires no further assumptions regarding the qualitative probability, and that if we assume P6\('\) then the probability (fully) represents the qualitative one (cf. Remark 3.8 above). Using repeated error reductions, we shall now show that for arbitrarily large ns there are almost uniform n-partitions of S.

Definition 3.24

Given \(C\succ \varnothing \), let us say that \(B\ll \frac{1}{n}C\) if there is a sequence \(C_1, C_2, \ldots , C_n\), of n mutually disjoint subsets of C, such that \(C_1\preceq C_2\preceq \cdots \preceq C_n\) and \(B\preceq C_1\).
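To get a feel for the notation, it may help to read Definition 3.24 numerically. The following is only a heuristic sketch: it assumes a finitely additive probability \(\mu \) that almost represents \(\preceq \) (so that \(X\preceq Y\) implies \(\mu (X)\le \mu (Y)\)), which at this stage of the argument has not yet been constructed. Under that assumption, since \(\mu (B)\le \mu (C_1)\le \mu (C_i)\) for every i, the relation \(B\ll \frac{1}{n}C\) gives

$$\begin{aligned} n\cdot \mu (B) \le \sum _{i=1}^n \mu (C_i) = \mu \Big (\bigcup _{i=1}^n C_i\Big ) \le \mu (C), \quad \text {hence}\quad \mu (B)\le \frac{1}{n}\,\mu (C). \end{aligned}$$

Informally, then, the relation says that B is at most an nth part of C.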

The following are some simple intuitive properties of \(\ll \). The first two are immediate from the definition, and in the sequel we shall need only the first.

Lemma 3.25

  1. (1)

    If \(B\ll \frac{1}{n}C\), and if \(A\preceq B\) and \(C\subseteq D\), then \(A\ll \frac{1}{n}D\).Footnote 22

  2. (2)

    If \(B\ll \frac{1}{n}C\) then \(B\ll \frac{1}{m}C\) for all \(m<n\).

  3. (3)

    For any \(C,D\succ \varnothing \), there exists n such that, for all B,

    $$\begin{aligned} B\ll \frac{1}{n}C \implies B\preceq D. \end{aligned}$$
    (3.16)

Lemma 3.26

Let \({\mathcal T}=(A_\sigma ,E_\sigma ) \) be a tripartition tree of height k, then, given any n and any regular part \(A_\sigma \) on the kth level of \({\mathcal T}\), there is an error reduction, \({\mathcal T}'\), of \({\mathcal T}\), such that

$$\begin{aligned} E({\mathcal T}'_A)\ll \frac{1}{n}A'_\sigma . \end{aligned}$$
(3.17)

Here \(A'_\sigma \) is the part that corresponds to \(A_\sigma \) under the structural isomorphism of the two trees.

Proof

Fix \(A_\sigma \) and let \(\{C_i\}_{i=1}^n\) be a disjoint sequence of events contained in it as subsets, such that \(C_1\preceq C_2\preceq \ldots \preceq C_n\). Using error reduction, get a tree \({\mathcal T}'\) such that \(E({\mathcal T}')\preceq C_1\). Consequently, \(E({\mathcal T}'_A) \ll \frac{1}{n}A_\sigma \). Since the parts are disjoint and under the error reduction each regular part in \({\mathcal T}\) is a subset of its corresponding part in \({\mathcal T}'\), \(A'_\sigma \) is the unique part containing \(A_\sigma \) as a subset, which implies (3.17). \(\square \)

Lemma 3.27

Given any tripartition tree \({\mathcal T}=(A_\sigma ,E_\sigma )\) of height k and given any n, there is an error reduction \({\mathcal T}'=(A'_\sigma ,E'_\sigma )\) of \({\mathcal T}\) such that, for every regular part \(A'_\sigma \) on the kth level, \(E({\mathcal T}')\ll \frac{1}{n}A'_\sigma \).

Proof

Apply Lemma 3.26 repeatedly \(2^k\) times, as \(\sigma \) ranges over all the binary sequences of length k. Since the regular parts can only expand and the total error can only contract, we get at the end an error reduction, \({\mathcal T}'\), such that \(E({\mathcal T}')\ll \frac{1}{n}A'_\sigma \), for all \(\sigma \). \(\square \)

Theorem 3.28

Let \({\mathcal T}\) be a tripartition tree of height k, then there is an error reduction \({\mathcal T}'\) of \({\mathcal T}\) such that the following holds: If \(\Xi _1\) and \(\Xi _2\) are any two sets of regular parts of \({\mathcal T}'\) of the kth level that are of equal cardinality r, where \(r < 2^{k-1}\), and if \(A_\tau \) is any regular part of \({\mathcal T}'\) on the kth level that is not in \(\Xi _1\cup \Xi _2\), then we have

$$\begin{aligned} \bigcup _{A_\sigma \in \Xi _1}A_\sigma \cup E({\mathcal T}') \preceq \bigcup _{A_\sigma \in \Xi _2}A_\sigma \cup A_\tau . \end{aligned}$$
(3.18)

Proof

Apply Lemma 3.27 for the case where \(n=2^{k-1}\) and get a reduction tree \({\mathcal T}'\) of \({\mathcal T}\) such that \(E({\mathcal T}')\ll \frac{1}{2^{k-1}}A_\sigma \) for all regular parts \(A_\sigma \) on the kth level of \({\mathcal T}'\). Let \(\Xi _1\) and \(\Xi _2\) and \(A_\tau \) be as in the statement of the theorem, then we have:

$$\begin{aligned} E({\mathcal T}') \ll \frac{1}{2^{k-1}}A_\tau . \end{aligned}$$

By Definition 3.24, this means that there is a sequence \(\{C_i\}\) of disjoint subsets of \(A_\tau \) of length \(2^{k-1}\) such that:

$$\begin{aligned} \begin{aligned} E({\mathcal T}')\preceq C_1\preceq C_2\preceq \cdots&\preceq C_r \preceq C_{r+1}\preceq \cdots \preceq C_{2^{k-1}} \\ \bigcup _{i=1}^{2^{k-1}} C_i&\subseteq A_\tau \end{aligned} \end{aligned}$$
(3.19)

where r is the cardinality of \(\Xi _1\) and \(\Xi _2\).

Let \(A_1, A_2, \ldots , A_r\) and \(B_1, B_2, \ldots , B_r\) be enumerations of the members of \(\Xi _1\) and \(\Xi _2\), respectively. Obviously, we have \(E({\mathcal T}')\preceq A_i\) and \(E({\mathcal T}')\preceq B_i \) for all \(i=1,\ldots , r\). Since \(E({\mathcal T}')\preceq C_1\), we get by Theorem 3.17:

$$\begin{aligned} A_i\preceq B_i\cup E({\mathcal T}')\preceq B_i\cup C_i\quad \quad \text {for all }i=1,\ldots ,r. \end{aligned}$$

Since all parts are disjoint, we get \(\bigcup _{i=1}^r A_i \preceq \bigcup _{i=1}^r B_i \cup \bigcup _{i=1}^r C_i\), that is:

$$\begin{aligned} \bigcup _{A_\sigma \in \Xi _1}A_\sigma \cup E({\mathcal T}') \preceq \bigcup _{A_\sigma \in \Xi _2}A_\sigma \cup \bigcup _{i=1}^r C_i \preceq \bigcup _{A_\sigma \in \Xi _2}A_\sigma \cup A_\tau \end{aligned}$$
(3.20)

which is what we want. \(\square \)

Remark 3.29

The last theorem claims that for every tripartition tree \({\mathcal T}\) of height k, there is an error reduction \({\mathcal T}'\) such that, for every two disjoint sets, \(\Xi _1\) and \(\Xi _2\) of regular leaves (parts on level k) of equal cardinality, \(r<2^{k-1}\), if A is a leaf that does not belong to \(\Xi _1\cup \Xi _2\), then \(\bigcup \Xi _1\cup E({\mathcal T}')\preceq \bigcup \Xi _2\cup A\). (Here \(\bigcup \Xi _1\) is the union of all members of \(\Xi _1\).) It is not difficult to see that if A is any regular part of \({\mathcal T}'\), and if \(A_i (i=1,\dots ,2^k-1)\) are the rest of the regular leaves, then the following collection of sets is an almost uniform partition of S:

$$\begin{aligned} \big \{A\cup E({\mathcal T}'), A_1,\ldots ,A_{2^k-1}\big \}. \end{aligned}$$

Note that in comparing the qualitative probabilities of unions of two subsets of the family, we can assume that they have no common members, because common members can be crossed out, via the qualitative-probability rules. This implies that we need to compare only the unions of r and of \(r+1\) members, where \(2r+1\le 2^k\), which implies \(r<2^{k-1}\). Hence, we can assume the restriction on r in the last theorem. All in all, the last theorem implies that there are almost uniform partitions of S of arbitrarily large sizes. This, as explained before, implies the existence of a unique finitely additive probability that represents the qualitative probability.
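The arithmetic behind the restriction on r can be spelled out as follows. The partition displayed above has \(2^k\) members, and two unions with no common members, one of r members and one of \(r+1\) members, use \(2r+1\) of them, so

$$\begin{aligned} 2r+1\le 2^k \;\implies \; 2r\le 2^k-1<2^k \;\implies \; r<2^{k-1}. \end{aligned}$$

For instance, with \(k=2\) the partition has four members and only \(r\in \{0,1\}\) needs checking.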

3.2.5 The proof of the \((\ddagger )\) condition

Next we demonstrate that the \((\ddagger )\) condition holds. As we shall show in Sect. 4, this property will play a crucial role in defining utilities for simple acts, without using the \(\sigma \)-algebra assumption.

Theorem 3.30

Let \(\mu \) be the probability that represents the qualitative probability \(\succeq \). Assume that P6\('\)  holds. Then, for every non-null event, A, every \(\rho \in (0,1)\) and every \(\epsilon >0\) there exists an event \(B\subseteq A\), such that \((\rho - \epsilon )\cdot \mu (A) \le \mu (B) \le \rho \cdot \mu (A)\).

Proof

As stated by Savage, there is a Savage chain for A, that is, an infinite sequence of 3-partitions of A: \((A'_n\ E_n\ A_n'')_n,\, n=1,2,\ldots \) such that:

  1. (i)

    \(A'_n \cup E_n \succeq A_n''\, \text{ and }\, A_n'' \cup E_n \succeq A'_n\)

  2. (ii)

    \(A'_{n+1}\supseteq A'_n,\; A_{n+1}''\supseteq A_n''\), hence \(E_{n+1}\ \subseteq E_n\ \)

  3. (iii)

    \(E_n - E_{n+1} \succeq E_{n+1}\).

This, as shown in Fishburn’s reconstruction, is provable without using the \(\sigma \)-algebra assumption. Consequently we get:

  1. (1)

    \(\mu (A'_n) + \mu (E_n) \ge \mu (A_n'')\) and \(\mu (A_n'') + \mu (E_n) \ge \mu (A'_n)\), which imply:

    1. (a)

      \(|\mu (A'_n) - \mu (A_n'')| \le \mu (E_n)\).

  2. (2)

    \(\mu (E_{n+1}) \le (1/2)\cdot \mu (E_n)\), which implies:

    1. (b)

      \(\mu (E_n) \le (1/2)^{n-1}\).

Since \(\mu (A) = \mu (A'_n) + \mu (E_n) + \mu (A_n'')\), we get from (a) and (b):

$$\begin{aligned} \mu (A'_n) \longrightarrow 1/2\cdot \mu (A), \quad \quad \mu (A_n'') \longrightarrow 1/2\cdot \mu (A). \end{aligned}$$
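These limits can be read off directly from (a) and (b). As a sketch, write \(d_n=\mu (A'_n)-\mu (A_n'')\) (a symbol introduced here only for illustration), so that \(|d_n|\le \mu (E_n)\) by (a). Solving \(\mu (A'_n)+\mu (A_n'')=\mu (A)-\mu (E_n)\) and \(\mu (A'_n)-\mu (A_n'')=d_n\) for \(\mu (A'_n)\) gives

$$\begin{aligned} \mu (A'_n) = \frac{\mu (A)-\mu (E_n)+d_n}{2}, \quad \quad \frac{1}{2}\mu (A)-\mu (E_n) \le \mu (A'_n) \le \frac{1}{2}\mu (A). \end{aligned}$$

By (b), \(\mu (E_n)\le (1/2)^{n-1}\longrightarrow 0\), which yields the stated limit; the same computation applies to \(A_n''\).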

Since both \(A'_n\) and \(A_n''\) are monotonically increasing as sets, \(\mu (A'_n)\) and \(\mu (A_n'')\) are monotonically increasing. Consequently, we get: \(\mu (A'_n) \le 1/2\cdot \mu (A) \) and \(\mu (A_n'') \le 1/2\cdot \mu (A) \). All these imply the following claim:

Claim 1

Let A be a non-null set. Then, for every \(\epsilon > 0\), there are two disjoint subsets of A, \(A_0\) and \(A_1\), such that, for \(i=0,1\):

$$\begin{aligned} 1/2 \cdot \mu (A) - \epsilon ~~\le ~~ \mu (A_i)~~ \le ~~ 1/2\cdot \mu (A). \end{aligned}$$

Call such a partition an \(\epsilon \)-bipartition of A. Call \(\epsilon \) the error-margin of the bipartition. We can now apply such a bipartition to each of the parts, and so on. By “applying the procedure” we mean applying it to all the non-null minimal sets that were obtained at the previous stages (the inductive definition should be obvious).

Claim 2

Let A be any non-null set. Then for every \(k>1\) and every \(\epsilon >0\), there are \(2^k\) disjoint subsets of A, \(A_i, i = 1,\ldots ,2^k\), such that:

$$\begin{aligned} 1/2^k \cdot \mu (A) - \epsilon ~~\le ~~ \mu (A_i)~~ \le ~~ 1/2^k\cdot \mu (A). \end{aligned}$$

(This claim is proved by considering k applications of the procedure above, where the error-margin is \(\epsilon /k\).) Note that since Claim 2 is made for any \(\epsilon >0\), and any \(k>1\), we can replace \(\epsilon \) by \(\epsilon /2^k\cdot \mu (A)\). Thus, the following holds:

  1. (+)

    For every \(\epsilon > 0\), \(k>1\), there are \(2^k\) disjoint subsets, \(A_i\), of A, such that:

    $$\begin{aligned}1/2^k\cdot \mu (A) - \epsilon /2^k\cdot \mu (A)\ ~~\le ~~ \mu (A_i)~~ \le ~~ 1/2^k\cdot \mu (A). \end{aligned}$$
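The bookkeeping behind Claim 2 can be sketched as follows, under the reading that each of the k rounds uses the error-margin \(\delta =\epsilon /k\). Here \(e_j\) is a symbol introduced only for this illustration: it bounds the shortfall of a level-j part P, in the sense that \(\mu (A)/2^j-e_j\le \mu (P)\le \mu (A)/2^j\), with \(e_0=0\). Applying Claim 1 to P gives parts \(P'\) with \(\mu (P)/2-\delta \le \mu (P')\le \mu (P)/2\), so the upper bound is halved at each round and the shortfalls satisfy

$$\begin{aligned} e_{j+1} = \frac{e_j}{2}+\delta , \quad \quad e_k = \delta \sum _{j=0}^{k-1}\frac{1}{2^j} = 2\delta \big (1-2^{-k}\big ) < \frac{2\epsilon }{k} \le \epsilon \quad \text {for } k\ge 2. \end{aligned}$$

Substituting \((\epsilon /2^k)\cdot \mu (A)\) for \(\epsilon \) in Claim 2 then yields (+).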

The following is a reformulation of (+):

(\(*\)):

For every \(\epsilon > 0\), \(k>1\), there are \(2^k\) disjoint subsets, \(A_i\), of A, such that:

$$\begin{aligned} \mu (A_i) \in \Big [1/2^k\cdot \big (\mu (A) - \epsilon \big ), ~~ 1/2^k\cdot \mu (A) \Big ]. \end{aligned}$$

Similarly, \((\ddagger )\) can be put in the form

(\(**\)):

Fix any non-null set A. Then for every \(\rho < 1\), and any \(\epsilon '>0\), there is a set \(B\subseteq A\), for which

$$\begin{aligned} \mu (B) \in \Big [\big (1-\epsilon '\big )\cdot \rho \mu (A) , ~ \rho \mu (A)\Big ]. \end{aligned}$$

All the subsets that are generated in the process above are subsets of A. Therefore A plays here the role of the “universe,” except that its probability, \(\mu (A)\), which must be non-zero, can be \(<1\). In order to simplify the formulas, we can assume that \(A =S \) (the universe). The argument for this case works in general, except that \(\mu (A)\) has to be added as a factor in the appropriate places. Thus the proof is reduced to proving that, of the following two conditions, \((\circ )\) implies \((\circ \circ )\).

\((\circ )\) :

Given any \(\epsilon > 0\) and any \(k > 1\), there are \(2^k\) disjoint subsets, \(A_i\), such that, for all i, \(\mu (A_i) \in \big [1/2^k\big (1 - \epsilon \big ), ~ 1/2^k \big ]. \)

\((\circ \circ )\) :

Given any \(0<\rho <1\) and any \(\epsilon '>0\), there is a set B such that \(\mu (B) \in \big [\rho \big (1-\epsilon '\big ), ~\rho \big ].\)

Now let \(\rho '=\rho \cdot (1-\epsilon ')\); then \((\circ \circ )\) means that given \(0<\rho '<\rho <1\), there is B such that \(\mu (B)\in [\rho ' , ~\rho ]\). Let \(\theta \) be such that \(\rho '<\theta <\rho \). Since \(\theta \) and \(\rho \) are infinite sums of binary fractions of the form \(1/2^k\), it is easily seen that there is a finite set of such fractions whose sum is in the interval \( [\theta ,~\rho ]\). Since \(1/2^m =2\big (1/2^{m+1}\big )\), it follows that there are k and \(l<2^k\), such that \(l/2^k \in [\theta ,~\rho ]\). Choose \(\epsilon >0\) small enough that \((1-\epsilon )\cdot \theta \ge \rho '\), let \(A_i\) be the disjoint sets that satisfy \((\circ )\) for this k and \(\epsilon \), and let B be the union of l of them. Then \(\rho '\le (1-\epsilon )\cdot l/2^k \le \mu (B)\le l/2^k\le \rho \), that is, \((1-\epsilon ')\cdot \rho \le \mu (B)\le \rho \). \(\square \)
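A small numerical instance may make the last step concrete. The values are chosen purely for illustration. Take \(\rho =1/3\) and \(\epsilon '=1/10\), so that \(\rho '=3/10\). The binary fraction \(5/16\) (that is, \(k=4\), \(l=5\)) lies between \(\rho '\) and \(\rho \). Applying \((\circ )\) with \(k=4\) and \(\epsilon =1/25\), and letting B be the union of five of the resulting sets, we get

$$\begin{aligned} \frac{3}{10} = 5\cdot \frac{1}{16}\Big (1-\frac{1}{25}\Big ) \le \mu (B) \le \frac{5}{16} \le \frac{1}{3}, \end{aligned}$$

so \(\mu (B)\in \big [\rho \big (1-\epsilon '\big ),~\rho \big ]\), as required by \((\circ \circ )\).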

Remark 3.31

It is worth repeating that (\(\ddagger \)) does not rely on the \(\sigma \)-algebra assumption, but (\(\dagger \)) does. That (\(\dagger \)) cannot be obtained without the \(\sigma \)-algebra assumption follows from the existence of countable models, established in §3.3.

3.3 Countable models

The \(\sigma \)-algebra assumption implies that the Boolean algebra of events has at least the cardinality of the continuum. Its elimination makes it possible to use a countable Boolean algebra. All that is needed is a qualitative probability, \(\succeq \), defined over a countable Boolean algebra, which satisfies P6\('\). There is more than one way to do this. Here is one type of what we shall call bottom up extension. In what follows, a qualitative probability space is a system of the form \((S, {\mathcal B}, \succeq )\), where \({\mathcal B}\) is a Boolean algebra of subsets of S and \(\succeq \) is a qualitative probability defined over \({\mathcal B}\).

Definition 3.32

Let \((S, {\mathcal B}, \succeq )\) be a qualitative probability space. Then a normal bottom up extension of \((S, {\mathcal B}, \succeq )\) is a pair consisting of a qualitative probability space \((S', {\mathcal B}', \succeq ')\) and a mapping \(h: S'\rightarrow S\), of \(S'\) onto S, such that for every \(A,B\in {\mathcal B}\), \(h^{-1}(B) \in {\mathcal B}'\) and \(A \succeq B \iff h^{-1}(A) \succeq ' h^{-1}(B)\).

Remark 3.33

The extension is obtained by, so to speak, splitting the atoms (the states in S) of the original algebra. This underlies the technique of getting models that satisfy P6\('\). In order to satisfy P6\('\) we have, given \(A\succ B\), to partition S into sufficiently fine parts, \(P_i, i=1, 2, \ldots ,n\), such that \(A\succ B\cup P_i\) for all \(i=1,\ldots ,n\). If we start with a finite Boolean algebra, the way to do it is to divide the atoms into smaller atoms. The intuitive idea is that our states do not reflect certain features of reality, and that, if we take into account such features, some states will split into smaller ones.

This picture should not imply that P6\('\), which is a technical condition, should be adopted. The intuitive justification of P6\('\), which has been pointed out by Savage, is different.
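As a toy illustration of Definition 3.32 and of the splitting of atoms, consider the following. The numbers and the names \(s_1, s_2, s_1', s_1''\) are hypothetical and are not taken from the construction used in the proofs. Let \(S=\{s_1,s_2\}\), let \({\mathcal B}\) be the power set of S, and let \(\succeq \) be represented by \(\mu (\{s_1\})=2/3\), \(\mu (\{s_2\})=1/3\). Split \(s_1\) into two states: put \(S'=\{s_1',s_1'',s_2\}\), let \({\mathcal B}'\) be the power set of \(S'\), let \(\succeq '\) be represented by the uniform probability \(\mu '\), and let h send \(s_1'\) and \(s_1''\) to \(s_1\) and \(s_2\) to itself. Then

$$\begin{aligned} \mu '\big (h^{-1}(\{s_1\})\big )=\mu '(\{s_1',s_1''\})=\frac{2}{3}=\mu (\{s_1\}), \quad \quad \mu '\big (h^{-1}(\{s_2\})\big )=\mu '(\{s_2\})=\frac{1}{3}=\mu (\{s_2\}), \end{aligned}$$

so \(A\succeq B \iff h^{-1}(A)\succeq ' h^{-1}(B)\) for all \(A,B\in {\mathcal B}\). The extended algebra also contains events, such as \(\{s_1'\}\), that are finer than any event of the original one.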

We have shown that, starting from a finite qualitative probability space we can, by an infinite sequence of normal extensions, get a countable space (that is, both S and \({\mathcal B}\) are countable) that satisfies P6\('\). We can also get models with other desired features. The proof of the following theorem, which is not included here, uses techniques of repeated extensions that are employed in set theory and in model theory.

Theorem 3.34

(Countable model theorem)

  1. (1)

    Let \((S_0, {\mathcal B}_0, \succeq _0)\) be a finite qualitative probability space and assume that the qualitative probability is representable by some numeric probability. Then there is an infinite countable model, \((S, {\mathcal B}, \succeq )\), which forms together with a mapping, \(h: S \rightarrow S_0\) , a normal extension of \((S_0, {\mathcal B}_0, \succeq _0)\), and which satisfies P6\('\).

  2. (2)

    Let \(\Xi \) be any countable subset of (0, 1) and let \(\mu \) be the numeric probability that represents \(\succeq \) (which exists by (1) and by our main result). Then we can construct the model \((S, {\mathcal B}, \succeq )\) in such a way that \(\mu (A)\notin \Xi \) for every \(A\in {\mathcal B}\).

This theorem implies, for example, that the model can be constructed so that no number of the form 1/n, where \(n>1\), and no number of the form \((1/2)^n\), where \(n>0\), is among the values of \(\mu \). Now de Finetti and Koopman proposed axiom systems for subjective probability that included an axiom stating that there are partitions of S into n equal parts for arbitrarily large n. Our theorem shows that, without the \(\sigma \)-algebra assumption, P6\('\) does not imply the existence of a probability that satisfies that axiom. Savage found P6\('\) more intuitive than their axiom (and indeed it is), but was somewhat puzzled by the fact that it implies that axiom. Our last theorem solves this puzzle: the implication depends on the \(\sigma \)-algebra assumption.

Remark 3.35

So far we have been dealing with the Boolean algebra only. But in order to state the results within the full perspective of Savage’s system, we shall state them as results about decision models, that is, about systems of the form \((S, X, {\mathcal A}, \succcurlyeq , {\mathcal B})\). This is done in the following theorem.

In what follows \(f \circ g\) is the composition of the functions f and g, defined by \((f\circ g)(x) = f(g(x))\). It is used under the assumption that the domain of f includes the range of g.

Theorem 3.36

Let \((S,X,{\mathcal A},\succcurlyeq ,{\mathcal B})\) be a decision model that satisfies P1–P5 (where P5 is interpreted as the existence of two non-equivalent constant acts, but without assuming CAA). Assume that S is finite and there is a probability over \({\mathcal B}\) that represents the qualitative probability. Then there is a Savage system, \((S^*,X^*,{\mathcal A}^*,\succcurlyeq ^*,{\mathcal B}^*)\), that satisfies P1–P6 and there is a function h that maps \(S^*\) onto S such that the following holds:

  1. (i)

    \(S^*\) and \({\mathcal B}^*\) are countable,

  2. (ii)

    for all \(A\in {\mathcal B}\), \(h^{-1}(A)\in {\mathcal B}^*\),

  3. (iii)

    \(X^*=X\),

  4. (iv)

    \(f\in {\mathcal A}^*\) iff \(f\circ h\in {\mathcal A}\),

  5. (v)

    \(f^*\succcurlyeq ^* g^*\) iff \(f\circ h\succcurlyeq g\circ h\).

As noted, the proofs of these theorems employ techniques of model theory and set theory. Here is a rough idea of one basic technique from set theory. At every stage of the repeated extensions we ensure that a particular instance of P6\('\) is satisfied. As the model grows, there are more cases to take care of, but we can arrange these tasks so that after the infinite sequence of extensions all are taken care of. We shall not go into more detail here.

4 A simpler utility function for simple acts

In discussing the possibilities of extending the utility to non-simple acts, Savage remarks:

The requirement that an act has only a finite number of consequences may seem, from a practical point of view, almost no requirement at all. To illustrate, the number of time intervals that might possibly be the duration of a human life can be regarded as finite, if you agree that the duration may as well be rounded to the nearest minute, or second, or microsecond, and that there is almost no possibility of its exceeding a thousand years. More generally, it is plausible that, no matter what set of consequences is envisaged, each consequence can be particularly identified with some element of a suitably chosen finite, though possibly enormous, subset. If that argument were valid, it could easily be extended to reach the conclusion that infinite sets are irrelevant to all practical affairs, and therefore to all parts of applied mathematics. (Savage 1972, p. 76–77)

In the last sentence Savage claims that the argument in favor of restricting ourselves to simple acts should be rejected; otherwise this argument would also imply that in applied mathematics we need not consider infinite sets. But the main goal of Savage’s system is to serve as a foundation for subjective (in his terminology, personal) probability, which is clearly a philosophical goal and which makes it a different kind of thing from a piece of applied mathematics. In applied mathematics one uses, as a rule, \(\sigma \)-additive probabilities, for reasons of convenience and efficiency; but Savage avoids, because of philosophical qualms, the adoption of \(\sigma \)-additivity as an axiom of his system.

In the continuation of the above quote Savage points out the very high benefits that accrue in mathematics from the use of infinite sets, which “can lead to great simplification of situations that could, in principle, but only with enormous difficulty, be treated in terms of finite sets.” Yet, his system cannot be treated merely as a piece of mathematics. As a mathematician, Savage is interested in generalizing various concepts and theorems, for the sake of the mathematical significance of the generalization. As we shall presently show, CAA can be avoided, if we limit ourselves to simple acts, and, as far as the philosophical goal of his system is concerned—this is all that matters. For, as we noted in the first two Sections, CAA implies a rather dubious notion of “rational agent.”

Note that it is known, and anyone who follows Savage’s derivation can easily check it, that in the Savage system all that is needed for defining the probabilities are two non-equivalent constant acts.Footnote 23 That is, instead of using CAA we posit 2CA, i.e., there are two non-equivalent constant acts. Assume that they are \(\mathfrak {c}_0\) and \(\mathfrak {c}_1\) and that their corresponding consequences are \(a_0\) and \(a_1\). Assume, WLOG, that \(\mathfrak {c}_0\prec \mathfrak {c}_1\). Savage’s technique of getting qualitative probabilities can now be applied, so that for every \(E, F \in {\mathcal B}\) we define:

$$\begin{aligned} E\succeq F \;\Leftrightarrow _\text {Df}\; \mathfrak {c}_1|E+ \mathfrak {c}_0|\overline{E} \succcurlyeq \mathfrak {c}_1|F+ \mathfrak {c}_0|\overline{F}. \end{aligned}$$
(4.1)

Then \(\succeq \) is a qualitative probability. We can now represent it by a uniquely determined numeric probability, \(\mu \). Under the \(\sigma \)-algebra assumption, Savage’s construction gives us a probability that satisfies \((\dagger )\). Without the \(\sigma \)-algebra assumption, our construction gives us a probability that satisfies \((\ddagger )\).
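A heuristic check of (4.1) may be useful here. It assumes the expected utility representation that the construction is aiming at, with the normalization \(u(a_0)=0\), \(u(a_1)=1\), and it writes \(U[\cdot ]\) for the expected utility of an act. The act \(\mathfrak {c}_1|E+\mathfrak {c}_0|\overline{E}\) yields \(a_1\) on E and \(a_0\) elsewhere, so its expected utility would be

$$\begin{aligned} U\big [\mathfrak {c}_1|E+\mathfrak {c}_0|\overline{E}\big ] = \mu (E)\cdot u(a_1)+\mu (\overline{E})\cdot u(a_0) = \mu (E). \end{aligned}$$

Comparing two such acts thus amounts to comparing \(\mu (E)\) with \(\mu (F)\), which is what (4.1) encodes. This is only a consistency check and not part of the derivation, since \(\mu \) and U are what is being constructed.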

Recall that a feasible consequence is a consequence, a, for which there is an act \(f\in {\mathcal A}\), such that \(f^{-1}(a)\) is not null. We shall now show how, using the probability \(\mu \), we can assign utility values to all feasible consequences, so as to get an expected utility function defined over all simple acts. This is done without assuming CAA. Let u(x) be the utility of consequence x. We start by putting \(u(a_0)=0\), \(u(a_1)=1\). This means that the acts \(\mathfrak {c}_0\) and \(\mathfrak {c}_1\) fix the basic utility scale. Without appealing to CAA we shall now assign utilities to all feasible consequences. To do this, we use the probability \(\mu \), which we have derived already. The definition is simpler if \(\mu \) satisfies \((\dagger )\). Therefore we shall provide this definition first, and then point out the modification that will give us the utility assignment, if the probability satisfies \((\ddagger )\). At the end we get a utility assignment for all simple acts, where neither the \(\sigma \)-algebra assumption nor CAA is assumed.

4.1 Constructing utilities under the \((\dagger )\) condition

Consider now any feasible consequence \(a\in X\) and let g be an act such that \(g^{-1}(a)\) is not null. Let \(A=g^{-1}(a)\) and let \(\mathfrak {c}^*_A=_\text {Df}g|A+\mathfrak {c}_0|\overline{A}.\) By definition, \(\mathfrak {c}^*_A\) yields a if \(s\in A\), and the status quo (i.e., 0) otherwise. Let U(f) be the utility value of the act f, which we have to define.

To define utilities, we compare \(\mathfrak {c}^*_A\) with \(\mathfrak {c}_0\). If \(\mathfrak {c}^*_A\equiv \mathfrak {c}_0\), we put \(u(a)=0\). Otherwise there are three possibilities:

$$\begin{aligned} \text {(i)} \quad \mathfrak {c}_1\succcurlyeq \mathfrak {c}^*_A \succ \mathfrak {c}_0 \quad \text {(ii)}\quad \mathfrak {c}^*_A \succ \mathfrak {c}_1 \quad \text {(iii)}\quad \mathfrak {c}_0\succ \mathfrak {c}^*_A \end{aligned}$$

In each one of these possibilities, the utility of \(\mathfrak {c}^*_A\) and that of a can be defined as follows. Let \(\mu \) be the numeric probability derived under the (\(\dagger \)) condition. Then for case (i), let

$$\begin{aligned} \rho =\sup \Big \{\mu (B)\;\Big |\; B\subseteq A \text { and }\mathfrak {c}^*_A \succcurlyeq \mathfrak {c}_1|B+\mathfrak {c}_0|\overline{B}\Big \}. \end{aligned}$$
(4.2)

Define

$$\begin{aligned} U[\mathfrak {c}^*_A]=\rho \quad \quad \text {and}\quad \quad u(a)=\frac{\rho }{ \mu (A)}. \end{aligned}$$
(4.3)
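For a concrete reading of (4.2) and (4.3) in case (i), suppose (the numbers are purely illustrative) that \(\mu (A)=1/2\) and that the agent values a at \(u(a)=1/2\) on the scale fixed by \(\mathfrak {c}_0\) and \(\mathfrak {c}_1\). Under expected utility, \(\mathfrak {c}^*_A\) is worth \(u(a)\cdot \mu (A)=1/4\) and \(\mathfrak {c}_1|B+\mathfrak {c}_0|\overline{B}\) is worth \(\mu (B)\), so \(\mathfrak {c}^*_A \succcurlyeq \mathfrak {c}_1|B+\mathfrak {c}_0|\overline{B}\) exactly when \(\mu (B)\le 1/4\). Assuming that \((\dagger )\) supplies subsets of A of every probability up to \(\mu (A)\), the supremum in (4.2) and the values in (4.3) come out as

$$\begin{aligned} \rho =\frac{1}{4}, \quad \quad U[\mathfrak {c}^*_A]=\rho =\frac{1}{4}, \quad \quad u(a)=\frac{\rho }{\mu (A)}=\frac{1/4}{1/2}=\frac{1}{2}, \end{aligned}$$

which recovers the value of \(u(a)\) we started from.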

For case (ii), let \(\rho =\sup \{\mu (B)\mid B\subseteq A \text { and } \mathfrak {c}_1 \succcurlyeq \mathfrak {c}^*_A|B+\mathfrak {c}_0|\overline{B} \}\); define \(U[\mathfrak {c}^*_A]=1/\rho \) and \(u(a)=1/[\rho \cdot \mu (A)]\). Case (iii), in which the utility comes out negative, is treated along similar lines and is left to the reader.

This assignment of utilities leads to a representation of the utility of any simple act, f, as the expected utility of the consequences that appear as values of the act, where, WLOG, we assume that each consequence a of f is feasible. As noted by Savage, this definition works if we assume that \(\mu \) satisfies \((\dagger )\). The proof is straightforward.

4.2 Constructing utilities under the \((\ddagger )\) condition

If \(\mu \) satisfies the weaker condition \((\ddagger )\) the definition has to be modified. Here is the modification for the case \(\mathfrak {c}_1\succcurlyeq \mathfrak {c}^*_A \succ \mathfrak {c}_0\). From this it is not difficult to see what the modification is in the other cases. Instead of (4.2), let

$$\begin{aligned} \rho&=\sup \Bigg \{ \mu (B) \;\Bigg |\; \forall \epsilon >0\; \exists B'\subseteq A \; \\&\quad \,\,\,\Big [\mu (B)-\epsilon \le \mu (B')\le \mu (B) \text { and } \mathfrak {c}^*_A \succcurlyeq \mathfrak {c}_1|B'+\mathfrak {c}_0|\overline{B'}\Big ]\Bigg \}. \end{aligned}$$

Defining the utilities of \(\mathfrak {c}^*_A\) and a as in (4.3), we are done.