Revising Probabilities and Full Beliefs

A new formal model of belief dynamics is proposed, in which the epistemic agent has both probabilistic beliefs and full beliefs. The agent has full belief in a proposition if and only if she considers the probability that it is false to be so close to zero that she chooses to disregard that probability. She treats such a proposition as having the probability 1, but, importantly, she is still willing and able to revise that probability assignment if she receives information that gives her sufficient reasons to do so. Such a proposition is (presently) undoubted, but not undoubtable (incorrigible). In the formal model it is assigned a probability 1 − δ, where δ is an infinitesimal number. The proposed model employs probabilistic belief states that contain several underlying probability functions representing alternative probabilistic states of the world. Furthermore, a distinction is made between update and revision, in the same way as in the literature on (dichotomous) belief change. The formal properties of the model are investigated, including properties relevant for learning from experience. The set of propositions whose probabilities are infinitesimally close to 1 forms a (logically closed) belief set. Operations that change the probabilistic belief state give rise to changes in this belief set, which have much in common with traditional operations of belief change.


Introduction
Formal models of belief states can employ either a dichotomous (all-or-nothing) or a more fine-graded representation of beliefs. The former approach is applied in models belonging to the tradition called belief change or belief revision [1,5,9,15,18,35]. In these models, an agent's current beliefs are represented by a consistent and logically closed set of propositions, called a "belief set". We will call these models dichotomous since they have only two degrees of belief; a proposition is either believed, in which case it is an element of the belief set, or it is not believed, and not an element of the belief set. The most important fine-graded approach is probability theory, in which a belief state is represented by a probability function with a dense range of infinitely many degrees of belief, between 0 and 1.
Both dichotomous and probabilistic representations are usually treated as parts of dynamic frameworks, which also contain operations of change. Such operations specify how the belief state will be modified in response to various inputs. In both approaches, the most important type of input is a proposition to be assimilated as a full belief. If the input is a proposition a, then in a dichotomous belief change model, a will be an element of the new belief set that results from the operation. In a probabilistic model, the new probability function will assign the probability 1 to a.
In standard probability theory, the new probability function p′ is commonly assumed to be completely determined, through conditionalization, by the input a and the previous probability function p: 1

p′(d) = p(d | a) = p(a & d)/p(a) for all d   (1)

In contrast, the previous belief set K in a dichotomous belief change model does not contain enough information for determining, in combination with the input a, the contents of the resulting new belief set. The major reason for this is that if a is inconsistent with K, then some elements of K will have to be removed in order to retain consistency when a is added. There are many ways to do this, and K alone does not tell us how to do it. Therefore, the belief state has to contain more information than what can be inferred from the belief set K; this additional information should tell us which elements to exclude from K in order to make room for a. Quite a few types of such additional components of the belief state ("selection mechanisms") have been proposed (Rott and Hansson [36]; Hansson [18], pp. 6-16). Some of them take the form of relations ordering the elements of K, or numbers assigned to its elements to signal their degrees of retractability. However, such numerical assignments do not denote degrees of belief, and they do not have the properties expected of plausible such degrees [9, pp. 87-88]. Another important difference between dichotomous and probabilistic models of belief change concerns the removability of full beliefs. In the common models of (dichotomous) belief change, a non-tautologous belief that is an element of the belief set can be removed in a subsequent operation of change. For instance, with * denoting the operation of change and a a non-tautologous input proposition, we have a ∈ K * a and a ∉ K * a * ¬a, according to the standard properties of such operations. 2 In contrast, as can be seen from Eq. 1, conditionalization by a proposition a that has non-zero probability results in a new probability function p′ with p′(a) = 1, and any series of further conditionalizations of p′ will result in a probability function that also assigns the value 1 to a.

Probabilistic and dichotomous belief representations capture different features of real-life epistemic change. Probabilistic models have the obvious advantage of allowing for smaller changes than a shift between full belief and complete lack of belief. On the other hand, the common probabilistic models are not well suited to represent the process of giving up a full belief. The gist of the problem is that we all have beliefs which we do not doubt, but which we will nevertheless give up if confronted with sufficiently convincing evidence that they are mistaken. Such a pattern is difficult to represent in probabilistic models, since they use the same means (probability 1) to represent that a belief is not doubted and that it cannot be rejected. 3 The loss of full beliefs is much more readily represented in a dichotomous belief change model. Indeed, operations that remove beliefs from the belief set (operations of contraction) are standard components of such models.

1 Unless p(a) = 0, in which case p′(d) is left undefined by Eq. 1. This equation is offered as a definition in Kolmogorov [27, p. 6], but as noted by Hájek [10, p. 274] it should preferably be called an analysis ("the ratio analysis"), rather than a definition. Kolmogorov also offered an account of conditional probability in terms of Markov chains (ibid., pp. 12-13), which allows conditional probabilities with a zero-probability antecedent to be well-defined. The problems with the ratio analysis are nicely summarized in Lyon [33]. Unfortunately, Kolmogorov's account in terms of Markov chains has its own problems (Seidenfeld et al. [39]; Seidenfeld [38]; Hájek [10], pp. 291, 301, and 319).

2 a ∉ K * a * ¬a follows from the AGM postulates success and consistency [1].
Several attempts have been made to reconcile these two types of models with each other. (For overviews, see Leitgeb [31], pp. 1-53 and Zhuang et al. [47].) In most of these proposals, an individual's probabilistic and all-or-nothing beliefs refer to the same propositions, so that each proposition has both a probability and a dichotomous belief status (believed/not believed). The two assignments of a belief status are usually closely connected to each other in these models; typically a proposition's dichotomous belief status can be inferred from the probability function. The most obvious such approach is to identify the set of full beliefs with the propositions that have probability 1. However, in combination with standard conditionalization, this construction has the rather devastating effect that once a full belief has been acquired, there is no way to give it up, whatever new information the agent receives. 4 One way to solve this would be to apply the so-called Lockean thesis [6], which identifies the full beliefs with the propositions that have a probability higher than some fixed probability limit below 1. However, no non-arbitrary such limit is within sight, and in addition, this construction is incompatible with the common assumption that what follows logically from two beliefs is also believed. 5 Recently, other more sophisticated constructions have gained interest, in which the criterion for status as full belief is not a fixed minimal level of probability but some comparison with the probabilities of propositions representing alternative beliefs on the same subject-matter.

3 To avoid this, we need a framework in which only undoubtable propositions have unit probability, or equivalently: only propositions with undoubtable negations have zero probability. This is close to requiring regularity, a property promoted most famously by David Lewis, who defined it as meaning that only "the empty proposition, true at no worlds" has zero probability [32, p. 88]. He noted that if the sample space is infinite, then regularity implies that many of its elements have infinitesimal probabilities. Brian Skyrms pointed out that in such a sample space, failure of regularity is accompanied by failure of Shimony's [40] strict coherence property; betting on the truth of a contingent proposition with probability 1 will be "a bet which we will consider fair even though we can possibly lose it but cannot possibly win it." [41, p. 74] Importantly, regularity fails in standard probability theory even if the sample space is finite. Common Bayesian revision of a probability function p by a contingent proposition e results in a new probability function p′ with p′(a) = p(a | e) for all a, and consequently p′(¬e) = 0 even though ¬e is contingent. With a revision function such as that introduced below in Definition 5, this will not be the case.

4 This leads to a major limitation in standard probabilistic models, namely that they cannot represent revision by a proposition previously believed to be false. In dichotomous belief change, such an operation is represented by a revision K * a for an a with ¬a ∈ K. The operations of revision studied in belief change theory determine what K * a will be with the help of some mechanism that selects those parts of K that will be retained in K * a. (Typically, the retained parts of K will not imply ¬a unless it is a tautology, and therefore a can be added without loss of consistency.) In standard probabilistic representation, revision by a sentence a corresponds to conditionalization by a. The outcome is a new probability function p′ that replaces the original probability function p, such that p′(d) = p(d | a) for all d. However, if p(¬a) = 1, then p(d | a) is undefined according to standard accounts of conditional probability. (See footnote 1.)
Such constructions are highly sensitive to the individuation of said alternatives. In Leitgeb's [31] proposal, which is arguably the best worked-out of these constructions, full belief will sometimes be assigned to propositions with a probability barely above 0.5, but on other occasions it will not be assigned to propositions with very high probabilities [19, pp. 278-279]. 6 In this article, a new approach will be presented. Its basic idea is that we have full belief in a proposition if and only if we consider the probability that it is false to be so close to zero that we choose to disregard that probability. 7 We therefore treat the proposition as if it had the probability 1, but, importantly, we are still willing and able to revise that probability assignment if we receive information that gives us sufficient reasons to do so. Another way to express this is that such a proposition is (presently) undoubted, but not undoubtable (incorrigible). As I have argued elsewhere, this is the common form of full beliefs, both in science and in everyday life [13], [20, pp. 68-71].
In order to represent undoubted but doubtable propositions, we need to enrich the standard representation of probabilities with some means to keep track of probabilities that are small enough to be currently neglected, but still capable of being raised to levels at which the propositions they refer to will be taken seriously.

5 If I believe that Manchester United will win the match they play on Sunday in London, and I also believe that Flamengo will win the match they are playing in Rio de Janeiro on the same day, then I will expectedly also believe that both teams are going to win their matches on that day. However, if the first two predictions are just above the probability limit for full belief, then the third, conjunctive statement can fall below it. It is commonly assumed that the set of beliefs held by a rational agent should be closed under logical consequence. Therefore, the observation that what follows from two propositions that are both above a certain probability limit can fall below that limit is commonly taken as a decisive argument against using a real-valued probability limit below 1 as a criterion of rational belief. This is highlighted in two of the most famous probability paradoxes, namely the lottery paradox [30, pp. 197-198] and the paradox of the preface [34].

6 See Schurz [37] for a series of impossibility results showing that the combination of the Lockean thesis (with a real-valued probability limit below 1) and logical closure of the set of beliefs can only obtain in models that are small in terms of the number of beliefs and/or the number of doxastic possibilities that are represented.

7 The probabilities referred to in this article are "objectivist credences", i.e. they will represent the agent's best estimate of objective probabilities.

For this purpose,
we will use hyperreal numbers to represent probabilities. 8 A (finite) hyperreal number is either a real number or the sum of a real number and a (positive or negative) infinitesimal number. (A positive infinitesimal is larger than 0 but smaller than all positive real numbers. See Appendix A for a brief introduction to hyperreal numbers.) The following notation will be used:

Definition 1

(1) Propositions are denoted by the letters a, b, c, d, and e.
(2) The letters s, t, u, v, x, y, and z represent hyperreal numbers (which may be real). The letter δ represents numbers that are either 0 or infinitesimal, and ε represents infinitesimal numbers.
(3) The standard (real) part of a finite hyperreal number s is denoted st(s).

Probabilities will be represented by hyperreal numbers in the closed interval [0, 1]. The agent has full belief in a proposition a if and only if st(p(a)) = 1. 9 If a is a tautology, then p(a) = 1, but we will leave it open whether there are non-tautologies with unit probability.

8 There is a considerable literature on infinitesimal probabilities. Most of it has employed infinitesimals to attack the problems arising when classical probability theory is applied to infinite domains (event spaces). In the words of Brian Skyrms, infinitesimal probabilities are needed since we have "fattened the domain of the probability function (the space of possibilities) without a concomitant fattening of the range" [41, p. 74]. For instance, a fair lottery with an infinite number of tickets cannot be modelled with real-valued probabilities, but it can be modelled by assigning the same, non-zero, infinitesimal probability to all tickets [2,44,45]. For a recent overview, see Benci et al. [3]. In this article, the domain of the probability function will be assumed to be finite, and therefore we will not need infinitesimal probabilities for their most common purpose, namely to ensure that all logically possible outcomes receive non-zero probability. Instead, we will use infinitesimals for two other purposes, both of which are known from previous literature: (1) we will identify the set of full beliefs with the set of propositions whose probability is infinitesimally close to 1 [44], and (2) we will use propositions with infinitesimal probabilities as "memory traces" of beliefs that have been given up [43]. Major new features in this article are the transfer of the distinction between update and revision from the belief change literature (Section 2.2), the combination of infinitesimals with an approach to probability revision that differs from traditional Bayesian conditionalization (Sections 2.3-2.4), and several adjustments that align revisions of probabilities with the operations on belief sets that are studied in the belief change literature.

9 This can be described as a non-standard version of the Lockean thesis. Essentially the same version of the thesis was applied by Wenmackers [44] in a study of fair lotteries with an infinite number of tickets. (However, she employed Hrbacek's [23] relative analysis, which mirrors infinitesimal numbers as a category of 'ultrasmall' real numbers.) Contrary to a limit for full belief at an (arbitrarily small) positive real-valued distance from 1, a limit infinitesimally smaller than 1 is not affected by Schurz's [37] impossibility theorems. To see why this is so, consider his result for the sufficiency part of the Lockean thesis. According to this part of the thesis, there is a real-valued number t with 1/2 < t ≤ 1, such that a proposition a is a full belief if t ≤ p(a). Suppose that there are n logically independent full beliefs a 1 , . . . , a n such that for all a i : t ≤ p(a i ) < u < 1, where u is an upper bound on their probabilities. Due to the logical closure of the set of full beliefs, t ≤ p(a 1 & . . . &a n ). Due to the probability axioms, p(a 1 & . . . &a n ) = p(a 1 ) × · · · × p(a n ) < u^n. Thus 1/2 < u^n. For instance, if u = 0.95, then n < 14 and if u = 0.98, then n < 35. (Schurz presents this result in a generalized form that does not presume logical independence.) However, in the hyperreal model, there is no justification for imposing an upper bound such as u. Even if we impose one, t ≤ u forces u to be infinitesimally close to 1, and then 1/2 < u^n for all n.
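The arithmetic behind the bound u^n > 1/2 in footnote 9 can be checked with a short computation; the function name below is a hypothetical helper for illustration, not part of the formal model:

```python
import math

def max_independent_full_beliefs(u: float) -> int:
    """Largest n with u**n > 1/2: the maximum number of logically
    independent full beliefs whose conjunction still clears the
    threshold 1/2, when each belief's probability is below u < 1.
    (Ignores the measure-zero case where the ratio is an exact integer.)"""
    # u**n > 1/2  iff  n < log(1/2)/log(u)  (dividing by log(u) < 0 flips the sign)
    return math.floor(math.log(0.5) / math.log(u))

print(max_independent_full_beliefs(0.95))  # 13, i.e. n < 14
print(max_independent_full_beliefs(0.98))  # 34, i.e. n < 35
```

This reproduces the two numerical examples in the footnote: with u = 0.95 at most 13 such beliefs are possible, and with u = 0.98 at most 34.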
In this article, it will be assumed that the domain (sample space) of the probability function is finite. Therefore, we can also assume that the standard axioms for probability, the Kolmogorov axioms, hold in spite of the codomain (range) of the probability function being extended from the closed real-valued interval [0, 1] to the closed hyperreal-valued interval [0, 1]. 10 Probabilistic revision will be performed with an operation of revision that does not in general coincide with standard (Bayesian) conditionalization.
Importantly, no metaphysical claims are made about infinitesimals. They are used here as a modelling tool since they have properties that are convenient for the representation of probabilities that are conceived as so small that they need not be taken into account. Other constructions, such as vectors of real numbers, could be used for the same purpose, but the hyperreal number system has the advantage of being well established and having well-known properties.
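As a concrete illustration of the "vectors of real numbers" alternative just mentioned, a finite hyperreal of the form r + c·ε can be modelled as a pair of reals, truncating at first order in ε. This is a modelling sketch under that simplifying assumption, not a construction of the full hyperreal field:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyper:
    """A finite hyperreal truncated at first order: re + inf*eps,
    for a fixed positive infinitesimal eps."""
    re: float          # standard (real) part
    inf: float = 0.0   # coefficient of eps

    def __add__(self, o): return Hyper(self.re + o.re, self.inf + o.inf)
    def __sub__(self, o): return Hyper(self.re - o.re, self.inf - o.inf)
    def __mul__(self, o):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps + bd*eps**2;
        # the eps**2 term lies below the truncation order and is dropped.
        return Hyper(self.re * o.re, self.re * o.inf + self.inf * o.re)

def st(x: Hyper) -> float:
    """Standard part of a finite hyperreal: st(r + c*eps) = r."""
    return x.re

delta = Hyper(0.0, 1.0)              # a positive infinitesimal
p_full_belief = Hyper(1.0) - delta   # probability 1 - delta
assert st(p_full_belief) == 1.0      # counts as a full belief
assert p_full_belief != Hyper(1.0)   # but differs from unit probability
```

The two assertions mirror the intended reading: a probability infinitesimally close to 1 has standard part 1 (full belief), yet remains distinguishable from probability 1 (undoubted, not undoubtable).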
The new model of probabilistic revision is introduced in Section 2. In Section 3 some of its properties are investigated. The set of propositions whose probability is infinitesimally close to 1 forms a (logically closed) belief set. Section 4 is devoted to the changes in that belief set that follow from operations of probabilistic change. All formal proofs are deferred to an appendix.

Constructing the Multistate Model of Probabilistic Revision
In order to construct a workable model of epistemic change, based on the principles introduced in the previous section, some additional adjustments of the traditional account of probabilistic revision are needed. These changes will bring the probabilistic model closer to the common models of dichotomous belief change. To begin with, we will adopt an iteration-friendly notation for revision from the dichotomous models. After that, we will transfer the distinction between update and revision from dichotomous modelling. With these adjustments in place, we are ready to introduce a new model of probabilistic belief states and their revision, to be called the multistate model.

An Iteration-Friendly Notation
The common way to express probabilistic change in formal language is to employ the notation for conditionalization. In standard (real-valued) probability theory, the outcome of updating a probability function p by a proposition a 1 to which p assigns a non-zero probability is a new probability function p′, which is usually written

p′(d) = p(d | a 1 ) for all d.

This notation is impractical for repeated operations of change. For instance, if p′ is in its turn revised by the proposition a 2 , then this results in a new probability function p″ such that p″(d) = p′(d | a 2 ) for all d, and in this notation the preceding revision by a 1 is lost out of sight. The common notation in the literature on dichotomous belief change is much more clear in this respect. The revision of a belief set K by a proposition a 1 is denoted K * a 1 , its revision by first a 1 and then a 2 is denoted K * a 1 * a 2 , etc. This notation can be used for probabilistic revision as well. Thus, the revision of a probability function p by a proposition a 1 will be written p * a 1 , its revision by first a 1 and then a 2 will be denoted p * a 1 * a 2 , etc. Hence, p * a 1 * a 2 (d) = 0.7 means that d is assigned the probability 0.7 by the probability function that is obtained by revising p by first a 1 and then a 2 . All inputs for the revision operation * will be propositions, and * a 1 represents the receipt of information indicating that a 1 is the case.

10 If the sample space is infinite, then the axiom of countable additivity cannot in general be retained; hyperreal-valued probability functions are then only finitely additive.
Formulas such as p * a 1 * a 2 (d) are somewhat difficult to parse. For clarity, we will introduce boldface brackets around subformulas denoting a composite probability function, thus writing (p * a 1 * a 2 )(d) instead of p * a 1 * a 2 (d).

Update and Revision
In a seminal article, Keller and Wilkins [26] observed that there are two types of reasons why we incorporate new information into a belief set. One is that the world has changed, and the other that we have received new information about it. They called the first type of belief change "change-recording" and the second "knowledge-adding". In the subsequent literature, another terminology has been established for this distinction. Belief changes in response to information about changes in the world are called "updates", whereas belief changes based on additional information about unchanged features of the world are called "revisions". Examples of the following type are commonly used to illustrate the difference between update and revision: Initially, the epistemic agent believes that Mr. Booker has placed either a comic book or a dictionary, but not both, on a certain, previously empty, table.
Case i: The agent receives the additional information that it was in fact a comic book that Mr. Booker put on the table. She now believes that there is a comic book on the table, but no dictionary. This is a revision. Letting c represent the proposition that there is a comic book on the table, the revision is denoted K * c.
Case ii: The agent receives the additional information that someone just put a comic book on the table. She now believes that there are either two comic books on the table, or a comic book and a dictionary. This is an update, denoted K ⋄ c.
To see that the two operations differ, let d denote that there is a dictionary on the table, and note that ¬d ∈ K * c whereas ¬d ∉ K ⋄ c. The following example shows how the distinction between update and revision can be applied to probabilistic belief states: An urn is placed on the table. We know that it contains 100 thoroughly mixed balls that all have equal size and weight. We also know that there are either 75 yellow and 25 black balls, or 75 black and 25 yellow ones, but we do not know which. We consider the two possible colour compositions to be equally probable.
The urn is shaken, and a ball is pulled out. It turns out to be yellow. Problem 1: What is now the probability that a yellow ball has been drawn? Problem 2: What is now the best estimate of what the probability was, just before the draw, that a yellow ball would be drawn?
In Problem 1, which concerns update, the probability asked for is 1. In formal terms: (p ⋄ a)(a) = p(a | a) = 1. Problem 2 concerns a revision. To solve it, we can assume that there are two possible states of the world, namely b 1 in which the urn contains 75 yellow and 25 black balls, and b 2 in which it contains 75 black and 25 yellow balls. Let a be the proposition that the drawn ball is yellow. We have

p(a | b 1 ) = 0.75, p(a | b 2 ) = 0.25, and p(b 1 ) = p(b 2 ) = 0.5, hence p(a) = 0.5, and by Bayes' theorem p(b 1 | a) = p(a | b 1 ) × p(b 1 )/p(a) = 0.75.

Thus, given our observation of a, the best estimate of the probability of b 1 is 0.75, thus (p * a)(b 1 ) = 0.75. By a similar reasoning, (p * a)(b 2 ) = 0.25. Our beliefs about the contents of the urn in each of the two states of the world have not changed, thus:

(p * a)(a | b 1 ) = p(a | b 1 ) = 0.75 and (p * a)(a | b 2 ) = p(a | b 2 ) = 0.25,

and consequently:

(p * a)(a) = (p * a)(b 1 ) × (p * a)(a | b 1 ) + (p * a)(b 2 ) × (p * a)(a | b 2 ) = 0.75 × 0.75 + 0.25 × 0.25 = 0.625.

Thus, (p * a)(a) = 0.625.
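The revision step of the urn example can be reproduced numerically; the state labels and dictionaries below are illustrative choices, not part of the formal model:

```python
# States: b1 = 75 yellow / 25 black balls, b2 = 75 black / 25 yellow balls.
prior = {"b1": 0.5, "b2": 0.5}
p_yellow_given = {"b1": 0.75, "b2": 0.25}

# Total probability of drawing a yellow ball before the draw:
p_yellow = sum(prior[b] * p_yellow_given[b] for b in prior)          # 0.5

# Revision: Bayes' theorem shifts probability between the states.
posterior = {b: prior[b] * p_yellow_given[b] / p_yellow for b in prior}

# The conditional probabilities p(a | b) stay fixed, so the revised
# probability of a yellow draw is the posterior-weighted sum:
revised = sum(posterior[b] * p_yellow_given[b] for b in posterior)   # 0.625
```

The posterior over the states is 0.75 for b1 and 0.25 for b2, and the revised probability of drawing a yellow ball is 0.625, exactly as in the derivation above.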

A Multistate Model for Real-Valued Probabilities
As the urn example shows, update corresponds to the standard account of how we "revise" (conditionalize) beliefs. Revision (in the sense used here) is of course fully in line with how problems like the urn problem above are standardly solved, but it has not been recognized as a separate operation of change. That is what we will do here. For that purpose, we need to collect the information about the alternative probabilistic states of the world (in the above examples, b 1 and b 2 ) into a single state, which will be the object to which the revision is applied. For clarity, the resulting model will first be constructed for real-valued probabilities, and then (in Section 2.4) generalized to accommodate hyperreal probabilities.
We will assume that the epistemic agent entertains an exhaustive set B of mutually exclusive possible states of the world, which are not directly observable. In formal terms:

Definition 2 A state catalogue for a probability function p is a non-empty set B = {b 1 , . . . , b n } of propositions within the domain of p, such that:
(1) p(b 1 ∨ . . . ∨ b n ) = 1, and
(2) p(b & b′) = 0 for all b, b′ ∈ B with b ≠ b′.

Notably, the state catalogue B is only a proper subset of the domain (universe, sample space) of p. Its elements are states of the world, but not in the usual, deterministic sense. Instead, each of its elements "assigns" probabilities to propositions concerning what can be observed in the world (in short: observables). These "assignments" will take the form of observation-independent probabilities conditional on each element of B.
We will assume for simplicity that B is finite. This should be a fairly unproblematic assumption, since B does not denote the set of possible states of the world, but rather a set of (probabilistic) states that are available for consideration by the epistemic agent. It may but need not be the case that b 1 ∨ . . . ∨ b n is a tautology.
The empirical propositions to which each element of B "assigns" probabilities form a set E, another subset of the domain of p, which has the following properties:

Definition 3 A set of observables for a probability function p is a non-empty set E of propositions within its domain, such that:
(1) if a ∈ E, then ¬a ∈ E, and
(2) if a 1 ∈ E and a 2 ∈ E, then a 1 &a 2 ∈ E.

Furthermore, E is logically disjoint from a state catalogue B if and only if no logically contingent truth-functional combination of elements of E follows logically from some logically contingent truth-functional combination of elements of B.
With the standard interdefinitions of truth-functional operations, (1) and (2) imply that E is closed under truth-functional combinations of its own elements.
As in the urn example, probability revision will take the form of changes in the probabilities we assign to possible probabilistic structures of the world. These alternative structures are represented by elements of B. Since only the probabilities, not the nature, of these states are subject to change, the conditional probabilities of observables, given each element of B, will not change. In other words, the following property will hold for all probability functions p under consideration, all b ∈ B and all a 1 , a 2 ∈ E:

(p * a 1 )(a 2 | b) = p(a 2 | b)

We can now introduce a model of probability revision that follows the pattern of reasoning introduced above in the discussion of the urn problem: 11

Definition 4 A multistate model of real-valued probability revision is a quadruple ⟨p, B, E, *⟩, in which p is a real-valued probability function, B a state catalogue for p, E a set of observables for p that is logically disjoint from B, and * an operation of revision such that for all a 1 , a 2 ∈ E and b ∈ B:

If p(a 1 ) ≠ 0, then:
(1) (p * a 1 )(b) = p(b | a 1 )
(2) (p * a 1 )(a 2 | b) = p(a 2 | b)
(3) (p * a 1 )(a 2 ) = Σ b∈B (p * a 1 )(b) × p(a 2 | b)

If p(a 1 ) = 0, then:
(0) p * a 1 = p
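A minimal sketch of how such a multistate revision can be computed, assuming (as in the urn discussion) that revision conditionalizes the state probabilities while the likelihoods p(a | b) stay fixed; the function and variable names are hypothetical:

```python
def revise(p_states, likelihood, a):
    """Revision over a state catalogue: reallocate probability among the
    states by conditionalization, leaving the likelihoods untouched.
    p_states maps each state b to p(b); likelihood maps (observable,
    state) pairs to the fixed conditional probability p(a | b)."""
    p_a = sum(p_states[b] * likelihood[(a, b)] for b in p_states)
    if p_a == 0:
        # Limiting clause: revision by a zero-probability input is vacuous.
        return dict(p_states)
    return {b: p_states[b] * likelihood[(a, b)] / p_a for b in p_states}

def prob(p_states, likelihood, a):
    """Probability of an observable: the state-weighted sum of its
    fixed conditional probabilities."""
    return sum(p_states[b] * likelihood[(a, b)] for b in p_states)

# Iterated revision on the urn example: p * a and p * a * a.
prior = {"b1": 0.5, "b2": 0.5}
like = {("yellow", "b1"): 0.75, ("yellow", "b2"): 0.25}
p1 = revise(prior, like, "yellow")   # state probabilities after one yellow draw
p2 = revise(p1, like, "yellow")      # ... after a second yellow draw
```

After one yellow draw the states have probabilities 0.75 and 0.25; after a second they shift further to 0.9 and 0.1, and the probability of drawing yellow rises to 0.7, illustrating how the iteration-friendly notation p * a 1 * a 2 is computed.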

A Multistate Model for Hyperreal Probabilities
Let us now adapt the model developed in the previous subsection to hyperreal numbers. Definition 2 does not need any change to allow for hyperreal probabilities. However, the possibility of including states with infinitesimal probability in the state catalogue B is crucial for the purpose of this model. These are the states that the epistemic agent does not currently consider to be serious possibilities, but which she may possibly, on some future occasion, upgrade to serious possibilities. To illustrate this, suppose in our urn example that we repeatedly pull out a ball, note its colour, put it back, thoroughly shake the urn, again pull out a ball, etc. If we have done this one hundred times, and pulled out a yellow ball in each of these draws, then we are justified in assigning a real-valued non-zero probability to some previously disregarded states, such as a state in which all balls in the urn are yellow. For this to be possible, the state catalogue must contain such a state, although it was previously not considered to be a serious possibility. In our model, such a belief state initially had infinitesimal probability, but it can be raised to real-valued probability, depending on the information received. 12 Just like Definition 2, Definition 3 can be used without changes in a model with hyperreal probabilities.

11 Two interesting comparisons can be drawn between this model and other accounts of epistemic change. First, the role of the state catalogue B can be compared to that of a belief base in dichotomous belief change. A belief base is a (usually finite) set of propositions whose logical closure is identical to the belief set. All changes are performed on the belief base, and the new belief set is obtained as the logical closure of the new belief base [11,15]. This is analogous to the role of the state catalogue B in the present model, if we replace logical inference by the method for inferring probabilities described in Definition 4. Secondly, the multistate model is closely related to models employing second-order probabilities [7,17,42]. In such models, there is a set P = {p 1 , p 2 , . . .} of first-order probability functions, all of which have the same domain, and a second-order probability function p̂ that assigns non-zero probabilities to the elements of P. The probability of a proposition a is equal to its p̂-weighted first-order probability, i.e.: p(a) = Σ p k ∈P p̂(p k ) × p k (a). This approach can be translated directly into the present framework. For that purpose, we construct a state catalogue B with a one-to-one correspondence between elements p k ∈ P and states b k ∈ B. Furthermore, we introduce a probability function p such that (1) p(b k ) = p̂(p k ) for each b k ∈ B, and (2) p(a | b k ) = p k (a) for each b k ∈ B and each a in the common domain of the elements of P.
Definition 4, however, requires a rather thorough adaptation. We are aiming at a model in which elements of B that are currently not considered to be serious possibilities can be retained (with infinitesimal probabilities) for possible future invigoration. To achieve that, we need to modify the inputs of revision, and adjust the operation of revision accordingly. For a simple example, let b ∈ B and p(a 1 | b) = 0. Then it follows from Definition 4 that (p * a 1 )(b) = 0 and consequently (p * a 1 * a 2 * · · · * a n )(b) = 0 for all series a 2 , . . . , a n of propositions, i.e. b is lost for ever, which is exactly what we wanted to avoid.
To solve this, we need to replace the operation of revision employed in Definition 4, which assigns probability 1 to the new full belief a 1 , with an operation that assigns to it a probability 1 − δ, where δ is either 0 or an infinitesimal number. This requires that we change the format of inputs. Instead of performing revision by a proposition a, we will follow Jeffrey [24, pp. 164-183] in using inputs of the form ⟨s, a⟩, where a is a proposition and s its new probability after the operation has been performed. For our present purposes, we only need to perform revisions by inputs ⟨s, a⟩ such that st(s) = 1, i.e. we revise by ⟨1 − δ, a⟩, where δ is either 0 or a positive infinitesimal number. Instead of using standard conditionalization as a suboperation of revision, as in Definition 4, we will use Jeffrey conditionalization, i.e.:

p(e 2 | ⟨s, e 1 ⟩) = s × p(e 2 | e 1 ) + (1 − s) × p(e 2 | ¬e 1 )

This has the important consequence that when we revise by a sentence a, then we do not, as in standard Bayesian conditionalization, have to remove all traces of ¬a. On the contrary, if δ ≠ 0, then ¬a is retained as an infinitesimal trace, along with infinitesimal traces of the conditional probabilities that have ¬a as their antecedent. They can all be recovered to "reality" (i.e. real-valued probabilities) when that is needed. 13 The following simplifying notation will be used, under the assumption that st(δ) = 0 ≤ δ:

p * δ a 1 = p * ⟨1 − δ, a 1 ⟩

12 Alternatively, we can model this process as the addition of a new element to the state catalogue. One way to do this was explored in considerable detail by Wenmackers and Romeijn [46]. A portion of the total probability is assigned to the "catch-all hypothesis" that none of the specific theories to which probability has been assigned is true. A new theory can then be introduced by receiving initial probability "shaven off" from the probability of the catch-all hypothesis.
In their formal development of this approach, Wenmackers and Romeijn assigned real-valued probability to the catch-all hypothesis, but they mentioned that it could instead be assigned infinitesimal probability [46,[1248][1249]. 13 This use of infinitesimals is one of the methods mentioned by Skyrms [43] to avoid the irreversibility of Bayesian updating and to retain memory of p(e | a) after revision by a.
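As an illustration, Jeffrey conditionalization on an input ⟨s, e⟩ can be sketched in a few lines of Python (hypothetical code, not part of the formal model; exact rational arithmetic stands in for hyperreals, with a very small rational δ playing the role of an infinitesimal):

```python
from fractions import Fraction

def jeffrey_update(p, event, s):
    """Jeffrey conditionalization: set the probability of `event` to s,
    rescaling the probabilities inside and outside the event proportionally.
    p: dict mapping outcomes to probabilities; event: set of outcomes."""
    p_e = sum(q for w, q in p.items() if w in event)
    assert 0 < p_e < 1, "event must have probability strictly between 0 and 1"
    return {w: q * (s / p_e if w in event else (1 - s) / (1 - p_e))
            for w, q in p.items()}

# A fair die; revise by <1 - delta, "even">, with a small rational delta
# standing in for a positive infinitesimal.
p = {i: Fraction(1, 6) for i in range(1, 7)}
delta = Fraction(1, 10**9)
q = jeffrey_update(p, {2, 4, 6}, 1 - delta)
```

After the update, the odd outcomes keep a trace of probability δ/3 each rather than being erased outright, which is the reversibility point made above.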
The limiting clause (0) of Definition 4 will still apply, but in a slightly adjusted form: If p(a_1) = 0, then p *_δ a_1 = p.
In other words, it is still impossible to revise by a sentence that has the antecedent probability 0. However, this limiting case will be less prominent in the hyperreal version of the model, since p *_δ a_1 does not have to be equal to p in the important case when st(p(a_1)) = 0 ≠ p(a_1).
Clause (1) of Definition 4 has to be adjusted to Jeffrey conditionalization, thus:

(1) (p *_δ a_1)(b) = (1 − δ) × p(b | a_1) + δ × p(b | ¬a_1)

Revision in the multistate model takes the form of changes in the probabilities assigned to elements of B, which are (unchangeable) probabilistic states. Clause (2) ensures the permanence of these states, encoded in the constancy of their degrees of probabilistic support of different observables. This is an essential property of the multistate model, and clause (2) is not in need of any other modification than replacement of the revision operation * by the more general *_δ. Clause (3) requires a change similar to that of clause (1):

(3) (p *_δ a_1)(a_2) = Σ_{b∈B} ((p *_δ a_1)(b) × p(a_2 | b))

Finally, as the reader may have noticed, our new versions of clauses (1) and (3) are undefined if p(a_1) = 1, since they have p(¬a_1) in a denominator. Therefore, this case has to be excluded from these clauses and dealt with by special postulation. In the real-valued model of Definition 4, it follows from clauses (1)-(3) that p * a_1 = p if p(a_1) = 1.14 This is a sensible solution, since a revision by a proposition that one is already unable to doubt should be vacuous and leave the belief state as it was. We can achieve this by extending clause (0) to this case as well:

If p(a_1) = 0 or p(a_1) = 1, then p *_δ a_1 = p

Summarizing all this, we obtain the following multistate model of revision with hyperreal probabilities:

14 To see this, note that in this case p(a_1 & b) = p(b). It then follows from clause (1) that (p * a_1)(b) = p(b) and from clause (3) that (p * a_1)(a_2) = Σ_{b∈B} p(a_2 & b) = p(a_2).
Definition 5 A multistate model of (hyperreal) probability revision is a quadruple ⟨p, B, E, *⟩, in which p is a hyperreal-valued probability function, B a state catalogue for p, E a set of observables for p that is logically disjoint from B, and * an operation of revision such that for all a_1, a_2 ∈ E and b ∈ B:

If p(a_1) = 0 or p(a_1) = 1, then:
(0) p *_δ a_1 = p

If 0 ≠ p(a_1) ≠ 1, then:
(1) (p *_δ a_1)(b) = (1 − δ) × p(b | a_1) + δ × p(b | ¬a_1)
(2) (p *_δ a_1)(a_2 | b) = p(a_2 | b)
(3) (p *_δ a_1)(a_2) = Σ_{b∈B} ((p *_δ a_1)(b) × p(a_2 | b))
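A numerical sketch of the revision operation just defined (hypothetical Python, not from the paper; floats approximate hyperreals, so an infinitesimal δ is mimicked by a small positive real; the two-state likelihood table is chosen to reproduce the urn example's numbers, where p(a) = 0.5 rises to 0.625):

```python
def revise(prior, lik, a, delta):
    """Multistate revision p *_delta a over a state catalogue B.
    prior: dict b -> p(b); lik: dict b -> dict of p(observable | b).
    Clause (2), the permanence of the states, is reflected in lik
    being left untouched."""
    p_a = sum(prior[b] * lik[b][a] for b in prior)
    if p_a == 0 or p_a == 1:                       # clause (0): vacuous revision
        return dict(prior)
    return {b: (1 - delta) * (prior[b] * lik[b][a] / p_a)           # clause (1)
               + delta * (prior[b] * (1 - lik[b][a]) / (1 - p_a))
            for b in prior}

def prob(state_probs, lik, a2):
    """Clause (3): probability of an observable after revision."""
    return sum(state_probs[b] * lik[b][a2] for b in state_probs)

# Urn-style example with two states; a has prior probability 0.5.
prior = {"b1": 0.5, "b2": 0.5}
lik = {"b1": {"a": 0.75}, "b2": {"a": 0.25}}
post = revise(prior, lik, "a", 0.0)
```

With δ = 0 this reduces to Bayesian conditionalization on the states, and prob(post, lik, "a") yields 0.625; with a small positive δ, the disfavoured state b2 keeps a nonzero trace instead of being driven to a point where it could never recover.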

Properties of Probability Revision in the Multistate Model
The following observation confirms that if p satisfies the Kolmogorov axioms, then so does every new probability function that is obtainable by revision of p: 15

Observation 2 Let ⟨p, B, E, *⟩ be a multistate model of probability revision.
(1) Let e_1, ..., e_n be mutually exclusive elements of E, let d ∈ E, and let each of δ_1, ..., δ_n and δ be either 0 or a positive infinitesimal. Then:

(2) Let a_1 and a_2 be elements of E, both with non-zero probabilities, and let each of δ_1 and δ_2 be either 0 or a positive infinitesimal. Then:

p(a_2) × (p *_{δ_2} a_2)(a_1) ≈ p(a_1) × (p *_{δ_1} a_1)(a_2)

These are properties shared by standard Bayesian conditionalization (which is of course a special case of the multistate model, in which B has exactly one element). The following conditions hold for Bayesian conditionalization, provided that e_1, ..., e_n are mutually exclusive:

The resources of this model allow us to divide sentences into an exhaustive set of five mutually exclusive epistemic classes:

- irreversible full belief (p(a) = 1)
- reversible full belief (p(a) ≈ 1 and p(a) ≠ 1)
- irresolute belief (0 ≉ p(a) ≉ 1)
- reversible full disbelief (p(a) ≈ 0 and p(a) ≠ 0)
- irreversible full disbelief (p(a) = 0)

Traditional (Bayesian) probability dynamics contains only the first, third, and fifth of these classes. Three of the classes have close analogues in dichotomous belief change, namely the second (a ∈ K), third (a ∉ K, ¬a ∉ K) and fourth (¬a ∈ K). The following two observations show what patterns of movement are possible among the five epistemic classes.

Fig. 1 To the left, the three belief classes in the traditional approach to real-valued probabilities, and the two types of possible movements between these classes. To the right, the five belief classes in the multistate hyperreal model, and the twelve types of possible movements between them

Learning that an event has taken place can induce us to increase our previous estimate of its (prior) probability. This was illustrated in the urn example discussed in Section 2.2, where we noted an increase in the assessed probability of an event a from p(a) = 0.5 to (p *_δ a)(a) = 0.625.
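The five epistemic classes distinguished above can be sketched with a toy representation in which a hyperreal probability is given as a pair (its standard part, the coefficient of one fixed infinitesimal ε); this is a hypothetical simplification that keeps only first-order infinitesimal terms:

```python
def classify(st_part, eps_coeff):
    """Map a probability st_part + eps_coeff * eps to one of the five
    epistemic classes of the multistate hyperreal model."""
    if st_part == 1:
        return "irreversible full belief" if eps_coeff == 0 else "reversible full belief"
    if st_part == 0:
        return "irreversible full disbelief" if eps_coeff == 0 else "reversible full disbelief"
    return "irresolute belief"
```

For instance, a probability of 1 − ε is classified as reversible full belief, and 0.5 as irresolute belief.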
Such changes appear to be common in practice, although they have sometimes been described as irrational [22]. They are difficult to represent in a standard Bayesian framework, since the most obvious representation p(a | a) does not work for the purpose. (The value of that expression is 1 whenever a has non-zero probability.) In the multistate model they are represented by expressions such as (p *_δ a)(a), and typically, (p *_δ a)(a) ≠ 1. The following observation reports some results on this type of epistemic change.

(1) It does not hold in general that if d ⊢ a, then (p *_0 d)(a) ≥ p(a).
(2) It does not hold in general that (p *_0 a)(a) ≥ (p *_0 d)(a).
It is essential for probabilistic learning that we can make several observations of what we consider to be the same phenomenon, such as several tosses of the same die or several accidents of the same type. Such "duplicates" can be represented as follows: We can use (p *_δ a^n)(b) as an abbreviated notation for (p *_δ a_1 *_δ . . . *_δ a_n)(b), where b ∈ B and {a_1, . . . , a_n} is a set of probabilistic duplicates of a.
Observation 8 Let ⟨p, B, E, *⟩ be a multistate model of probability revision. Let a ∈ E be an observable that has probabilistic duplicates, and let b′ be an element of B such that p(a | b′) > p(a | b) for all b ∈ B \ {b′}. Then st((p *_δ a^n)(b′)) converges to 1 with increasing n.
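Observation 8 can be checked numerically in the real-valued special case δ = 0, where repeated revision by duplicates reduces to iterated Bayesian updating of the state probabilities (hypothetical code; the three states and their likelihoods are made up for illustration):

```python
def revise_by_duplicates(prior, lik_a, n):
    """Revise n times by probabilistic duplicates of a with delta = 0:
    each step reweights every state b by its likelihood p(a | b)
    and renormalizes."""
    post = dict(prior)
    for _ in range(n):
        z = sum(post[b] * lik_a[b] for b in post)
        post = {b: post[b] * lik_a[b] / z for b in post}
    return post

prior = {"b1": 1 / 3, "b2": 1 / 3, "b3": 1 / 3}
lik_a = {"b1": 0.6, "b2": 0.5, "b3": 0.4}   # b1 supports a most strongly
post = revise_by_duplicates(prior, lik_a, 60)
```

After a few dozen rounds, nearly all probability mass sits on b1, the state that gives a the highest likelihood, in line with the convergence claim of the observation.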
For events whose combination has at most infinitesimal probability, the permutativity result of Theorem 1 does not hold in general:

Observation 9
Let ⟨p, B, E, *⟩ be a multistate model of probability revision.
In the example used in the proof of Observation 9, the iterated operation gives preference to the most recent information.16 The common models of dichotomous belief change have the same tendency, but to an even higher degree; they always give priority to the most recent input and delete whatever needs to be removed in order to accommodate it [16, 21]. This is a feature of the multistate model that seems to be worth a more thorough investigation.

A Derived Logic of Full Beliefs
One obvious way to connect probabilistic and dichotomous belief change models is to treat the "top" of a standard probability function, i.e. the set of sentences to which it assigns probability 1, as a (dichotomous) belief set. However, this approach has the serious drawback that in the standard model, this top will act as a black hole: a belief that has reached it can never be removed from it again. A much more reasonable model can be obtained by instead using the "soft top" of the probabilistic belief states in the (hyperreal) multistate model. The soft top consists of the propositions whose probability has the standard part 1. It corresponds to the two uppermost levels in the right-hand diagram in Fig. 1, but for most purposes only the lower of these two levels should be used for empirical propositions. Since different infinitesimal numbers, as well as 0, can serve as the index δ in the revision operation *_δ, different revisions by one and the same proposition are available.17

In this section, the focus will be on the derived system of dichotomous belief change that can be obtained from the multistate model in this way. To begin with, we need to confirm that the soft top is really a (logically closed) belief set.

The most influential approach to dichotomous belief change is the AGM model, which is characterized by a set of postulates. Although these postulates are far from uncontroversial [18, pp. 27-42] and [5, pp. 41-47], it is common practice to compare other dichotomous belief change models to them. The postulates come in two groups:18

The six basic AGM postulates:
K * a = Cn(K * a) (closure)
a ∈ K * a (success)
K * a ⊆ K + a (inclusion)
If ¬a ∉ K, then K + a ⊆ K * a. (vacuity)
If a is consistent, then so is K * a. (consistency)
If a ↔ a′ is a logical truth, then K * a = K * a′. (extensionality)

The two supplementary AGM postulates:
K * (a & a′) ⊆ (K * a) + a′ (superexpansion)
If ¬a′ ∉ K * a, then (K * a) + a′ ⊆ K * (a & a′). (subexpansion)

The following observation shows that four of the six basic AGM postulates hold for revision in the multistate model:

17 If the framework contains revisions by inputs ⟨s, a⟩ with s ≈ 1, then these operations can be seen from the dichotomous viewpoint as operations that strengthen or weaken the credibility of the proposition a without changing its (dichotomous) belief status. Such operations have been discussed in the belief revision literature, see [4, 28, 29].

18 K + a is an abbreviation of Cn(K ∪ {a}).

Observation 11
Let p be the probability function of a multistate model ⟨p, B, E, *⟩ of probability revision. Then:

(4) It does not hold in general that there is some δ with 0 ≤ δ ≈ 0 and a ∈ ⟨p *_δ a⟩. ("success")
(5) If a_1 ↔ a_2, then ⟨p *_δ a_1⟩ = ⟨p *_δ a_2⟩. (extensionality)
(6) ⊥ ∉ ⟨p *_δ a⟩. (strong consistency)19

The success postulate, which fails for this operation, is probably the most criticized among the AGM postulates for revision. Quite a few authors have found it utterly unrealistic that an operation of revision should accept any new information that is received. There is also a considerable literature on non-prioritized revision, by which is meant revision that does not always accept its inputs. (See Fermé and Hansson [5, pp. 65-68] for an overview.) Several weaker versions of the success postulate have been proposed. The following two are satisfied by several operations of non-prioritized revision:

Either p ∈ K * p or K * p = K. (relative success)
If p ∈ K * q, then p ∈ K * p. (regularity)

However, neither of these holds for the multistate revision operation:

Observation 12 Let p be the probability function of a multistate model ⟨p, B, E, *⟩ of probability revision, and let a ∈ E. Then:

(1) It does not hold in general that there is some δ such that a ∈ ⟨p *_δ a⟩ or ⟨p *_δ a⟩ = ⟨p⟩.
(2) It does not hold in general that there is some δ such that: if a ∈ ⟨p *_δ d⟩, then a ∈ ⟨p *_δ a⟩.

Strong consistency, which was shown in Observation 11 to hold in the multistate model, is significantly stronger than the common consistency postulate of the AGM framework (which would in our notation be expressed as: If a ⊬ ⊥, then ⊥ ∉ ⟨p *_δ a⟩).

19 Hansson [12, p. 344] and [14, p. 154].
The strong consistency postulate is incompatible with the success postulate, since the former forbids, but the latter requires, that ⊥ ∈ ⟨p *_δ ⊥⟩.
In the presence of closure, the vacuity postulate is equivalent with the conjunction of the following two conditions:

(i) If ¬a ∉ K, then a ∈ K * a.
(ii) If ¬a ∉ K, then K ⊆ K * a.

On closer inspection it turns out that the reason why vacuity does not hold for our probability-based operation of revision is that (i) does not hold. In contrast, (ii) holds:

Observation 13 Let p be the probability function of a multistate model ⟨p, B, E, *⟩ of probability revision. Then: If ¬a ∉ ⟨p⟩, then ⟨p⟩ ⊆ ⟨p *_δ a⟩. (preservation)

Since (i) follows from success, it would make no difference to replace vacuity by preservation (ii) in the list of basic AGM axioms. After such a modification, success would be the only one of these postulates not satisfied in the multistate model.
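Given that K * a is logically closed, the decomposition of vacuity can be spelled out in one chain (a sketch of the reasoning, using K + a = Cn(K ∪ {a}) and the monotony of Cn):

```latex
K + a \subseteq K * a
\iff \mathrm{Cn}(K \cup \{a\}) \subseteq K * a
\iff K \cup \{a\} \subseteq K * a
\iff K \subseteq K * a \;\text{ and }\; a \in K * a
```

Under the antecedent ¬a ∉ K, the last two conjuncts are exactly conditions (i) and (ii).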
Neither of the two supplementary postulates of the AGM model holds for the multistate operation.

Observation 14 Let ⟨p, B, E, *⟩ be a multistate model of probability revision.
(1) It does not hold in general that there are δ and δ′ such that ⟨p *_δ (a_1 & a_2)⟩ ⊆ Cn(⟨p *_{δ′} a_1⟩ ∪ {a_2}). ("superexpansion")
(2) It does not hold in general that if ¬a_2 ∉ ⟨p *_δ a_1⟩, then there is some δ′ such that Cn(⟨p *_δ a_1⟩ ∪ {a_2}) ⊆ ⟨p *_{δ′} (a_1 & a_2)⟩. ("subexpansion")

As emphasized in Section 1, a major purpose of the present framework is to allow for reversible movements between the set of irresolute beliefs and the set of full beliefs. We have investigated operations that transfer a proposition from the set of irresolute beliefs to that of full beliefs. For changes in the opposite direction we will need other types of operations, namely operations designed to decrease the probability of a given proposition. In the language of dichotomous belief change, such operations are called contractions. The construction of suitable such operations and the investigation of their relations to other operations of contraction are left for a later occasion.

Appendix A: A Brief Introduction to Hyperreal Numbers
Readers wanting an introduction to the arithmetic of hyperreals and infinitesimals are referred to Keisler [25].
The number system of hyperreal numbers (R * ) is an extension of the set R of real numbers. It consists of both finite and infinite numbers, but our focus will be on the finite numbers. The hyperreal finite numbers consist of (1) the real numbers and (2) other numbers on an extended number line, which are posited between the real numbers. The positive hyperreal numbers that are infinitely close to 0 (larger than 0 but smaller than all positive real numbers) are the positive infinitesimals. The negative infinitesimals are the numbers that are smaller than 0 but larger than all negative real numbers.
Two hyperreal numbers s and t are infinitesimally close (denoted s ≈ t) if and only if their difference s − t is infinitesimal. The relation ≈ is an equivalence relation. Each finite hyperreal number is infinitely close to exactly one real number, which is called its standard part. The standard part of the number t is denoted st(t). Standard parts satisfy the following rules for all finite hyperreal numbers s and t: st(s + t) = st(s) + st(t), st(s − t) = st(s) − st(t), st(s × t) = st(s) × st(t), and st(s/t) = st(s)/st(t) whenever st(t) ≠ 0.
The set of hyperreal numbers satisfies the same algebraic laws as the real numbers. Let δ and ε be infinitesimals, and let s and t be finite numbers that are not infinitesimals. Then:

The following are infinitesimals: δ + ε, δε, sδ and δ/s.
The following are finite and not infinitesimal: s + ε, st and s/t.
The following is finite and may or may not be infinitesimal: s + t.
The following may be infinite or finite, and in the latter case it may or may not be infinitesimal: δ/ε.
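These rules can be mimicked with a minimal dual-number-style Python class that tracks a real part and the first-order coefficient of one fixed infinitesimal ε (a hypothetical sketch; genuine hyperreals carry all higher powers of ε, which this truncation drops):

```python
class Hyper:
    """Truncated hyperreal a + b*eps; terms of order eps**2 are discarded."""
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps
    def __add__(self, other):
        return Hyper(self.real + other.real, self.eps + other.eps)
    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps + O(eps**2)
        return Hyper(self.real * other.real,
                     self.real * other.eps + self.eps * other.real)
    def st(self):
        """Standard part of a finite hyperreal."""
        return self.real
    def is_infinitesimal(self):
        return self.real == 0

delta = Hyper(0, 1)      # a positive infinitesimal
s = Hyper(2)             # a finite, non-infinitesimal number
```

Here (s * delta).is_infinitesimal() holds and (s + delta).st() == 2, matching the rules for sδ and s + ε above; note that a product of two infinitesimals comes out as exactly 0, an artifact of the first-order truncation.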

Proof of Observation 9
The following example serves to prove both parts of the observation: We then have:

Proof of Observation 10
It is sufficient to prove that (1) if a ∈ ⟨p⟩ and a ⊢ d, then d ∈ ⟨p⟩, and (2) if a_1 ∈ ⟨p⟩ and a_2 ∈ ⟨p⟩, then a_1 & a_2 ∈ ⟨p⟩.
For (1), let a ∈ ⟨p⟩. If a ⊢ d, then d is equivalent with a ∨ (d & ¬a), and it follows from the third Kolmogorov axiom that p(a) ≤ p(a ∨ (d & ¬a)) = p(d), hence st(p(d)) = 1.
Second step: We are going to show that for all b′ ∈ B, if p(b′) ≉ 0, then p(d | b′) ≈ 1. Let p(b′) ≉ 0. It follows from d ∈ ⟨p⟩ that Σ_{b∈B} (p(b) × p(d | b)) ≈ 1. We also have Σ_{b∈B} p(b) = 1 and p(d | b) ≤ 1 for all b ∈ B, from which the desired conclusion follows.