Progression and Verification of Situation Calculus Agents with Bounded Beliefs

We investigate agents that have incomplete information and make decisions based on their beliefs expressed as situation calculus bounded action theories. Such theories have an infinite object domain, but the number of objects that belong to fluents at each time point is bounded by a given constant. Recently, it has been shown that verifying temporal properties over such theories is decidable. We take a first-person view and use the theory to capture what the agent believes about the domain of interest and the actions affecting it. In this paper, we study verification of temporal properties over online executions. These are executions resulting from agents performing only actions that are feasible according to their beliefs. To do so, we first examine progression, which captures belief state update resulting from actions in the situation calculus. We show that, for bounded action theories, progression, and hence belief states, can always be represented as a bounded first-order logic theory. Then, based on this result, we prove decidability of temporal verification over online executions for bounded action theories.


Introduction
In this paper, we develop a computationally-grounded framework to model and verify agents that operate in infinite domains, have incomplete information and make decisions based on their beliefs, expressed as situation calculus bounded action theories [11]. The situation calculus [33,36] is a first-order logical framework for reasoning about action, where several issues have been addressed, such as the frame problem, time, continuous change, complex actions and processes, uncertainty, and many others. It is also the basis of the Golog family of agent programming languages [10,27] and has been used to develop rich theories of agent mental states and actions [44].
We use situation calculus action theories to express the mental model of an agent that can deliberate and act in the world. We take a first-person view and use the theory to capture what the agent believes about the domain of interest and the actions affecting it. In other words, the agent represents its beliefs about the world as a situation calculus theory and uses it to reason and deliberate about what to do. Once the agent has chosen an action, it executes it in the real world, and it tracks this execution in its mental model, constituted by the theory.
Essentially, the agent works in a sort of infinite loop in which, at each iteration, the agent (i) determines, through reasoning on the (incomplete) information formalized in the situation calculus theory, which actions are known to be executable in the current state, (ii) chooses one among them (using any suitable deliberation mechanism, which we do not model here), and (iii) executes it, advancing to the resulting state. This agent execution regime is known in the literature as online execution [10,16] and contrasts with so-called offline execution, in which the agent reasons about possible action executions but does not perform any actions in the real world.
The main goal of this paper is to analyze the agent's online execution capabilities through verification of temporal properties expressed in a first-order variant of the μ-calculus. For instance, we can easily express that there exists a sequence of actions known to be executable that reaches a state where a goal is true, even if the agent has incomplete information about the world (as represented by the action theory); hence we can check whether a conformant plan [5] exists through verification.
Specifically, we adopt bounded action theories [11], a particular class of action theories for which it was shown that verification (over offline executions) of a very expressive class of first-order μ-calculus temporal properties is decidable. Bounded action theories are basic action theories [36] that entail that, in all situations, the number of object tuples in the extension of each fluent is bounded by a given constant. In such theories, the object domain nonetheless remains infinite, as does the domain of situations. Boundedness can often be safely assumed, since in reality facts don't persist indefinitely: everything decays and changes. Moreover, agents often forget facts, either because they are not used or because they cannot be reconfirmed. Many examples of domains modeled as bounded action theories are reported in [11], which also identifies various ways to obtain boundedness: (i) by strengthening preconditions to block actions where the bound would be exceeded; (ii) by ensuring that actions are effect bounded and never make more fluent atoms true than they make false; and (iii) by using fading fluents whose strength fades over time unless they are reconfirmed.
Towards the goal of devising decidable techniques for verifying properties of online executions in the case of bounded situation calculus action theories, we first examine progression [29] for this kind of theory. By progressing the initial situation description over an action, we obtain a new situation description representing all that is known about the situation after the action is performed. More specifically, the fragment of the original theory that talks about the initial situation can be considered the initial belief state; similarly, the result of progressing this fragment through an action can be considered the belief state after the action, and so on. In this sense, progression can be thought of as capturing the belief state update that results from actions in the situation calculus. Unfortunately, in the general case, progression (and hence such belief states) can only be expressed in second-order logic [29,50]. Here, we show that for bounded action theories, progression, and thus belief states, can always be represented in first-order logic, and we discuss how a first-order progression can be constructed.
Often, belief states are a priori thought of as some sort of first-order theory about the current world state, whose models are the possible alternative world states that the agent thinks it may be in [39,40]. However, in the situation calculus, first-order belief states are not complete in general unless one additionally keeps a description of the past situations; to represent updated belief states without keeping such past information, second-order logic is needed [29,50]. When progression is first-order representable, such first-order belief states are indeed "complete" and no further information (apart from the specification of actions) is needed. Hence, for bounded theories, by iterating progression steps we obtain a "computationally grounded" model of agents [52], in the sense that such a model captures how the belief states of agents are generated and updated from the action theory, which describes what is true (according to the agent) and how this evolves as actions are performed.
With this result on progression in place, we investigate the verification of online executions of agents. We show that for a very rich class of temporal properties expressed in a first-order variant of the μ-calculus verification of online executions is decidable. This result complements the one in [11], which showed decidability of verification for offline executions.
The rest of this paper is organized as follows. In Sect. 2, we briefly review the situation calculus and the notion of online execution. Then in Sect. 3, we recall the definition of progression. In Sect. 4, we go over the notion of bounded action theory from [11] and give some examples. In Sect. 5, we show our first major result, i.e., that the progression of a bounded action theory can always be represented in first-order logic. Following that in Sect. 6, we introduce a language for expressing temporal properties of online executions (a first-order variant of the μ-calculus), and then show our main result, i.e., that verification of such properties over bounded action theories is decidable. Finally in Sect. 7, we discuss related work, while in Sect. 8, we summarize our contributions and discuss future work.

The Situation Calculus and Online Executions
The situation calculus [33,36] is a sorted predicate logic language for representing and reasoning about dynamically changing worlds. All changes to the world are the result of actions, which are terms in the language. We denote action variables by lower case letters a, action types by capital letters A, and action terms by α, possibly with subscripts. A possible world history is represented by a term called a situation. The constant S 0 is used to denote the initial situation where no actions have yet been performed. Sequences of actions are built using the function symbol do, where do(a, s) denotes the successor situation resulting from performing action a in situation s. Besides actions and situations, there is also the sort of objects for all other entities. Predicates and functions whose value varies from situation to situation are called fluents, and are denoted by symbols taking a situation term as their last argument (e.g., Holding(x, s)). For simplicity, and without loss of generality, we assume that there are no functions other than constants (and do) and no non-fluent predicates. We denote fluents by F and the finite set of primitive fluents by F. The arguments of fluents (apart from the last argument which is of sort situation) are assumed to be of sort object. A special predicate Poss(a, s) is used to state that action a is executable in situation s. The abbreviation Executable(s) means that every action performed in reaching situation s was possible in the situation in which it occurred.
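To fix intuitions, the term structure just described, with situations as action histories rooted at S 0 and do building successor situations, can be sketched in a few lines of code. This is only an illustrative encoding, not part of the formalism; the pickup/drop actions and the Holding fluent are hypothetical examples.

```python
# A minimal, hypothetical encoding of situation terms: a situation is the
# sequence of actions performed since the initial situation S0.

S0 = ()  # the initial situation: no actions performed yet

def do(action, situation):
    """Successor situation: perform `action` in `situation`."""
    return situation + (action,)

# A fluent's truth value varies from situation to situation; here the
# (hypothetical) Holding fluent is computed from the action history.
def holding(x, situation):
    held = False
    for a in situation:
        if a == ("pickup", x):
            held = True
        elif a == ("drop", x):
            held = False
    return held

s = do(("pickup", "Box1"), S0)
print(holding("Box1", s))   # True
print(holding("Box1", S0))  # False
```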
Within the language, one can formulate action theories that describe how the world changes as the result of the available actions. Here, we concentrate on basic action theories (BATs) as proposed in [35,36]. We also assume that there is a finite number of action types A. Moreover, we assume that the terms of object sort are in fact a countably infinite set N of standard names, for which we have the unique name assumption and domain closure. As a result, a basic action theory D is the union of the following disjoint sets: the foundational, domain-independent (second-order, or SO) axioms of the situation calculus (Σ); (first-order, or FO) action precondition axioms stating when actions can be legally performed and characterizing Poss (D ap ); (FO) successor state axioms describing how fluents change between situations (D ss ); (FO) unique name axioms for actions and (FO) domain closure axioms on action types (D una ); (SO) unique name and domain closure axioms for object constants (D coa ); and (FO) axioms describing the initial configuration of the world (D 0 ), which we assume finite. Note that successor state axioms encode the causal laws of the domain; they take the place of the so-called effect axioms and provide a solution to the frame problem.
We say that a formula φ(s) is uniform in a situation term s if s is the only situation term it contains, and we will use the term situation-suppressed to refer to the formula such that the situation argument in fluents is omitted (see [36] for a formal definition). Following standard terminology, sentences are closed formulas of the language with no free variables of any sort.
A basic action theory represents the conditions under which actions are executable, how the world state changes as a result of the actions that are possible, and what information the modeler has about the initial state. Typically, such theories are used to support reasoning about "offline executions", where the agent "thinks" about the executability of action sequences and what conditions would hold in the resulting states, without actually executing any action in the real world. In this way, the agent can understand the consequences of acting before actually performing any action. Situation calculus action theories can also be used to support reasoning about "online executions" [10,16], where the agent reasons on the theory to understand which actions are possible and what they bring about, so as to select one of them in an informed way and actually perform it in the real world. We can think of an agent operating online as executing the following procedure using its basic action theory D, starting with the current situation S now set to the initial situation: at each iteration, the agent selects one action among those that it knows/believes to be executable, executes it, updates S now to the resulting situation, and repeats.
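The online execution loop can be sketched as follows. This is a hedged illustration in which known_executable (returning the actions a such that the theory entails Poss(a, S now)), select (an arbitrary deliberation mechanism, which the paper does not model), and execute (the interface to the real world) are hypothetical stand-ins.

```python
# Hypothetical sketch of the online execution loop: at each iteration the
# agent (i) computes the actions known to be executable, (ii) selects one,
# and (iii) executes it, advancing its mental model of the situation.

def online_execution(D, known_executable, select, execute, steps=3):
    s_now = ()  # S_now starts at the initial situation S0
    for _ in range(steps):
        candidates = known_executable(D, s_now)  # (i) entailed-executable actions
        if not candidates:
            break
        a = select(candidates)                   # (ii) choose one of them
        execute(a)                               # (iii) perform it in the world
        s_now = s_now + (a,)                     # advance the mental model
    return s_now

# Toy usage: "move" is known executable for the first two steps only.
trace = online_execution(None,
                         lambda D, s: ["move"] if len(s) < 2 else [],
                         lambda c: c[0],
                         lambda a: None)
print(trace)  # ('move', 'move')
```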
The following simple example illustrates the difference between online and offline executions.
Example 1. Consider a domain in which actions α and β are available and are executable under the following precondition axioms:

Poss(α, s) ≡ P(s);    Poss(β, s) ≡ ¬P(s).

Further, let us assume that the following successor state axiom holds for fluent F:

F(do(a, s)) ≡ a = α ∨ a = β ∨ F(s),

and that the initial situation description is empty, so that we do not know whether P or ¬P holds initially. Then, by reasoning on offline executions, we can infer that D |= ∃a. Poss(a, S 0 ) ∧ F(do(a, S 0 )).
However, there are no online executions leading to F. Indeed, the agent cannot infer P and thus does not know whether α is in fact executable; similarly, it cannot infer ¬P and thus does not know whether β is executable. So it can select neither α nor β. In other words, while some executable action exists, the agent does not know exactly which one can be executed, i.e., no action is epistemically feasible for the agent [38].
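The case analysis of Example 1 can be replayed by brute force over the two worlds the agent considers possible, one where P holds initially and one where it does not (recall that α is executable exactly when P holds, β exactly when it does not, and F holds after either action):

```python
# The agent considers two possible worlds: P true and P false.
worlds = [{"P": True}, {"P": False}]

def poss(action, world):
    return world["P"] if action == "alpha" else not world["P"]

# Offline: in EVERY possible world, SOME action is executable (and leads
# to F), so D |= exists a. Poss(a, S0) & F(do(a, S0)).
offline_ok = all(any(poss(a, w) for a in ("alpha", "beta")) for w in worlds)

# Online: an action is KNOWN to be executable only if it is executable in
# ALL possible worlds; here neither action is, so the agent is stuck.
known_executable = [a for a in ("alpha", "beta")
                    if all(poss(a, w) for w in worlds)]

print(offline_ok)        # True
print(known_executable)  # []
```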
In this paper we are interested in the verification of temporal properties over online executions. In particular, the procedure above, if executed for all possible action choices, generates a sort of infinite tree representing all possible ways to execute actions online. We are interested in verifying temporal properties over such a tree.
Next we present a basic action theory that will be our running example.
Example 2. Consider a factory where items are moved by robots between the available working stations, may be painted when located at a particular station, and are shipped out of the factory when placed at the shipping dock. Items may be heavy or fragile, in which case a different type of robot is required for moving them. We introduce the following action theory, where fluents and actions have the intuitive meaning. (We omit leading universal quantifiers for readability. Also, for simplicity, we use non-fluent predicates, e.g., IsLoc; to conform to the assumptions of the previous section, such predicates can be modeled by fluents whose successor state axioms preserve their truth value in all situations.)

Action Precondition Axioms:

Successor State Axioms:

Initial State Axioms:

• IsRobot(r) ≡ r = R1 ∨ r = R2;
• HandlesHeavy(r) ≡ r = R1;
• HandlesFragile(r) ≡ r = R2;

The successor state axiom for At says that item x is at location l after action a is performed in situation s if and only if either a involved some robot r moving x to l, nothing was at l in s, and, if x is heavy or fragile, then r can handle it; or x was already at l in s, and a was neither shipping it from the shipping dock in s, nor a robot r moving it to a different location l′ where r can handle x. Note that each station may hold at most one item at any given time. Also, Shipped(x, s) holds if item x has been shipped in the last performed action, while Painted(x, s) keeps track of items that have been painted, until Shipped(x, s) becomes true. As regards item I3, there is incomplete information about whether it is heavy or fragile. Thus a conformant plan needs to be obtained for processing it.

Progression and Belief States
The progression of a basic action theory is the problem of updating the initial description of the world in D 0 so that it reflects the current state of the world after some actions have been performed. In other words, a one-step progression of D with respect to a ground action α is obtained by replacing the initial knowledge base D 0 in D by a suitable set D α of sentences, so that the original theory D and the theory (D − D 0 ) ∪ D α are equivalent with respect to how they describe the situation do(α, S 0 ) and the situations in the possible futures of do(α, S 0 ). In a seminal paper, Lin and Reiter [29] gave a definition of the progression D α of D 0 with respect to α and D, as follows. Denote by S α the situation term do(α, S 0 ) and let M and M′ be structures with the same domains for sorts action and object. We write M ∼ S α M′ if: (i) M and M′ have the same interpretation of all situation-independent predicate and function symbols (in our case there are no situation-independent predicates, and the only functions we consider are constants); and (ii) M and M′ agree on all fluents at S α , that is, for every relational fluent F and every variable assignment μ, M, μ |= F(x̄, S α ) if and only if M′, μ |= F(x̄, S α ). Then, for D α a set of (possibly second-order) sentences uniform in S α , we say that D α is a progression of D 0 with respect to α and D if, for any structure M , M is a model of D α if and only if there is a model M′ of D such that M ∼ S α M′. This definition requires, for the two theories D and (D − D 0 ) ∪ D α , that any model of one is indistinguishable from some model of the other with respect to how they interpret the situations in S α and the future of S α . One technical detail is that, according to this definition, some of the situation-independent properties of D are incorporated into the updated version of the initial knowledge base D α . In particular, D una ∪ D coa (which is already present in D − D 0 ) needs to be included in D α in order to comply with the definition.

We will see later how we can focus on the part of D α that does not include D una ∪ D coa , in particular when this is finite and can be constructed by operating on D 0 (a discussion on the need for D una in D α , and a slightly more involved definition that separates D una from the progression, can be found in [50, Definition 6, Appendix A]). Now, observe that we can take progression as a way of characterizing the belief state of an agent in a particular situation, i.e., what the agent believes about the current situation and what may happen in the future. In the context of the situation calculus, the various models of a basic action theory can be seen as characterizing the possible actual states the agent may be in, while the notion of belief state can be captured by everything that is entailed by the theory in a particular situation (accounts based on epistemic variants of the situation calculus have also been studied [25,42], but here we appeal to an interpretation directly based on entailment). The progressed knowledge base D α essentially represents this.

It has been shown that in the general case progression, and hence this form of belief state, can only be captured in second-order logic [29,50]. Nonetheless there are cases, such as the so-called relatively complete theories [29], for which a first-order progression can always be obtained (an analysis of all the classes known to date can be found in [51]). Next, we proceed to show that for bounded action theories too, a first-order progression can always be constructed.
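For intuition on what a one-step progression computes, consider the simplest case of a complete belief state, i.e., a single finite model: there, progressing amounts to recomputing each fluent's extension via its successor state axiom F(x̄, do(a, s)) ≡ γ⁺(x̄, a, s) ∨ (F(x̄, s) ∧ ¬γ⁻(x̄, a, s)). A minimal sketch, with hypothetical γ⁺/γ⁻ conditions passed in as functions:

```python
# One-step progression of a COMPLETE belief state (a single finite model):
# the new extension of each fluent is computed from its successor state
# axiom. For simplicity, fluents here take a single object argument.

def progress(state, action, gamma_plus, gamma_minus, universe):
    """state: fluent name -> set of objects currently satisfying it."""
    return {
        fluent: {
            x for x in universe
            if gamma_plus(fluent, x, action, state)                       # effect
            or (x in ext and not gamma_minus(fluent, x, action, state))  # frame
        }
        for fluent, ext in state.items()
    }

# Hypothetical toy domain: On(x) is made true by turn_on(x), false by turn_off(x).
gamma_plus = lambda f, x, a, st: a == ("turn_on", x)
gamma_minus = lambda f, x, a, st: a == ("turn_off", x)

new = progress({"On": {"b"}}, ("turn_on", "a"), gamma_plus, gamma_minus, {"a", "b"})
print(sorted(new["On"]))  # ['a', 'b']
```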

Bounded Action Theories
Let b be some natural number. We use the notation |{x̄ | φ(x̄)}| ≥ b to stand for the FO formula

∃x̄ 1 , . . . , x̄ b . φ(x̄ 1 ) ∧ · · · ∧ φ(x̄ b ) ∧ ⋀ 1≤i<j≤b x̄ i ≠ x̄ j .

Using this, De Giacomo et al. [11] define the notion of a fluent F(x̄, s) being bounded by a natural number b in situation s as

Bounded F,b (s) ≐ |{x̄ | F(x̄, s)}| < b,

and the notion of situation s being bounded by b as

Bounded b (s) ≐ ⋀ F ∈F Bounded F,b (s).

An action theory D is then bounded by b if it entails ∀s. Executable(s) ⊃ Bounded b (s). De Giacomo et al. [11] show that for bounded action theories, verification of sophisticated temporal properties is decidable, and identify interesting classes of such theories.
Example 2 (continued). It is not difficult to show that the basic action theory we introduced in Example 2 is in fact bounded by 5. First note that there are 5 locations initially, and as IsLoc is a non-fluent predicate this always remains true (and similarly for the other non-fluent predicates). The fluent At is initially bounded by 3, and the action theory maintains this bound, since moving an item replaces one atom of At by another, shipping removes one, and painting has no effect on At. Note that we do not model the arrival of new items here (we will do so in the next example); however, since there can be at most one item in each location, even then At would remain bounded by 5. The fluent Shipped is bounded by 1, as initially it is an empty relation and the theory ensures that the ship action leaves at most one atom true in each situation, namely for the item that was just shipped. Finally, the fluent Painted is bounded by 3, as no new items can arrive and only those present in the factory can be painted.
The case of item I3 is interesting as it illustrates how incomplete knowledge affects planning. The above action theory entails that a plan exists such that I3 is eventually shipped. In this conformant plan, both robots will attempt to move item I3 to the shipping dock in sequence with exactly one of them successfully moving it (depending on whether it is fragile or heavy), and then it will be shipped.
On the other hand, for either robot r the theory does not entail that r can successfully move I3 to the shipping dock: if I3 is heavy, only R1 can move it, while if it is fragile only R2 can move it. As a result, the plan of robot R1 moving I3 to the shipping dock and then shipping it is not feasible because the agent in control does not know that I3 will be at the shipping dock after R1 tries to move it, and as a result the ship action will not be known to be executable (and similarly for a plan that only involves R2).
Finally, the theory entails that there exists a plan such that eventually all objects are painted and shipped. We discuss how such statements can be specified and verified in the remainder.
In the above example, it is straightforward to satisfy the boundedness assumption as the domain of objects that may be affected in any future situation is in fact limited to the objects mentioned in the description of the initial state. Nonetheless, we can easily extend it to the case where arbitrary items may be introduced through an arrive action that brings new objects to the factory.
Example 3. We adapt the theory of Example 2 so that it also includes action arrive(x), where item x is placed in the shipping dock provided that the dock is free and the item is not already in the factory. This can be seen as an exogenous action that is invoked periodically when new items arrive and need to be processed.
The new theory is the same as before, except that an action precondition axiom for arrive is added, and γ⁺(x, l, a, s) in the successor state axiom for At(x, l, s) is extended accordingly. First, note that since there are infinitely many constants, which are standard names, an unbounded number of items may effectively be handled by subsequent arrive, move, and ship actions. Observe, though, that since there is only a fixed number of stations in the factory, in any given situation the number of items present in the factory remains bounded, in fact by the same number as before. We can reason about this in a similar way as in the previous example.
As before, At is initially bounded by 3 but now the action theory ensures that it remains bounded by 5. This is because new items may arrive at the shipping dock only when the shipping dock is empty. As moving an item replaces one atom of At by another, and shipping an item removes one atom, there can be at most 5 items in the factory, one in each of the 5 available stations. Consequently, Painted is also bounded by 5 as only those items present in the factory can be painted, and Shipped is bounded by 1 as before.
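The counting argument above can be exercised with a small random simulation (an illustration, not a proof): arrivals happen only when the dock is free, moves go only to unoccupied stations, and shipping removes an item, so the number of At atoms never exceeds the 5 stations. The names below are taken from the running example; the simulation policy itself is an assumption.

```python
# Random simulation of the bound argument: |At| never exceeds the 5 stations.
import random

random.seed(0)
stations = ["Hold1", "Hold2", "Hold3", "PaintStn", "ShipDock"]
at = {"I1": "Hold1", "I2": "Hold2", "I3": "Hold3"}  # item -> location
fresh = 4  # index for naming newly arrived items

for _ in range(1000):
    free = [l for l in stations if l not in at.values()]
    kind = random.choice(["arrive", "move", "ship"])
    if kind == "arrive" and "ShipDock" in free:
        at["I%d" % fresh] = "ShipDock"    # new item placed at the free dock
        fresh += 1
    elif kind == "move" and at and free:
        x = random.choice(sorted(at))
        at[x] = random.choice(free)       # move to an unoccupied station
    elif kind == "ship":
        docked = [x for x, l in at.items() if l == "ShipDock"]
        if docked:
            del at[docked[0]]             # shipping removes one At atom
    assert len(at) <= 5                   # At stays bounded by 5

print(len(at) <= 5)  # True
```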
In the previous examples, all individuals that may appear in the extensions of fluents are "standard names" mentioned in the description of the initial knowledge base or in the arguments of subsequent actions. Nonetheless, this is not necessary for boundedness. In the following example we adapt the theory to state that initially there is an item at station Hold3 whose identity is not known, and which is either fragile or heavy.
Example 4. Consider again Example 2 and assume that the identity of the object at station Hold3 is unknown. The new theory is the same as before, except for the initial state axioms for At, Heavy, and Fragile, which are now combined into a single axiom asserting that some (unnamed) object is at Hold3 and is either fragile or heavy. This shows that the boundedness condition does not require the identity of the individuals involved in the relations to be known. This allows for representing rich scenarios where initially it is only specified that a bounded number of objects will be in the extension of some property, and where the identity of these objects may be discovered later. For example, in a university admissions scenario we may know that at most ten new doctoral students will be admitted, and only later learn who these new students are.

Progressing Bounded Theories
We start by showing general results about progression. First we show that we can remove D una ∪ D coa from D when finding a progression for D 0 , and then add them back to get a correct progression with respect to the original theory.
Lemma 1. Let D * = D − (D una ∪ D coa ). If D * α is a progression of D 0 with respect to α and D * , then D * α ∪ D una ∪ D coa is a progression of D with respect to α and D.
Proof. Assume that the hypothesis of the lemma holds, i.e., that D * α is a progression of D 0 with respect to α and D * . We show that the conclusion holds by considering the definition of progression for one direction of the model correspondence; the other direction is similar.
As we are interested in a knowledge base that remains finite as we progress, this lemma allows us to focus on the part of D α that can be maintained finite (note that D una ∪ D coa is infinite, because D coa is), and then reason with D α under the assumption of uniqueness of names for actions and objects. In particular, Lemma 1 allows us to look into "composing" a progression from various parts of D 0 that are progressed separately, as shown in the next result.

Lemma 2. Suppose that D 0 is logically equivalent to ϕ 1 ∨ · · · ∨ ϕ n , where the ϕ i are (possibly second-order) sentences uniform in S 0 , and that each ϕ i has a progression ϕ i,α with respect to α and D * . Then ϕ 1,α ∨ · · · ∨ ϕ n,α is a progression of D 0 with respect to α and D * .
Proof. (⇐): Suppose not. Then there exists a model M witnessing the failure of the progression condition, from which a contradiction follows by the definition of progression. The other direction is similar.
Lemma 2 says that one can obtain a progression of a disjunctive D 0 by progressing separately all of its disjuncts with respect to D * , and then adding the (infinite) set D una ∪ D coa by means of Lemma 1. For bounded action theories, the interpretation of the fluents at S 0 can be captured (up to object renaming) by a characteristic sentence, i.e., a sentence of the form

∃w 1 , . . . , w k . AllDist(w 1 , . . . , w k ) ∧ ⋀ i=1,...,n ∀x̄ i . (F i (x̄ i , S 0 ) ≡ φ i (x̄ i , w 1 , . . . , w k )),

where: AllDist(w 1 , . . . , w k ) is a conjunction of inequalities stating that w 1 , . . . , w k have distinct values, also distinct from any constant in D 0 ; and each φ i (x̄ i , w 1 , . . . , w k ) is a quantifier-free formula built from atoms of the form (x̄ i • t), with • ∈ {=, ≠} and t containing only the variables w i and constants from D 0 . Such a sentence represents, up to object renaming, the extension of the set of fluents {F 1 , . . . , F n } of the language at S 0 , and the interpretation of the constants occurring in such extensions. We observe that k is not the bound b. Indeed, b is a bound on the (maximum) number of tuples contained in the extensions of the fluents, while k is a bound on the number of distinct elements that can occur in such extensions. Naturally, k can be derived from b, through the (maximum) arity of the fluents, and vice versa. Note that characteristic sentences are FO and uniform in S 0 , and that the sets of models of non-equivalent characteristic sentences are disjoint.
Example 5. Consider a characteristic sentence over a constant c and a fluent F. It captures the models that agree, up to object renaming, on the interpretation of c and F at S 0 ; in particular, it captures all models whose extension of F at S 0 exhibits the stated pattern of tuples, for any choice of distinct values for the existentially quantified variables. By this, the next result easily follows, which states that the initial situation description D 0 can be rewritten as a disjunction of characteristic sentences.
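The "up to object renaming" reading of characteristic sentences can be made concrete: two finite interpretations of a fluent satisfy the same characteristic sentence exactly when one is obtained from the other by renaming the unnamed (existentially quantified) values, keeping named constants fixed. A brute-force check of this relation, with hypothetical extensions:

```python
# Brute-force check that two finite fluent extensions agree up to a
# renaming of their unnamed values (named constants must stay fixed).
from itertools import permutations

def same_up_to_renaming(ext1, ext2, named):
    vals1 = {v for tup in ext1 for v in tup} - named
    vals2 = {v for tup in ext2 for v in tup} - named
    if len(vals1) != len(vals2):
        return False
    for perm in permutations(vals2):
        ren = dict(zip(sorted(vals1), perm))   # candidate renaming
        ren.update({c: c for c in named})      # constants map to themselves
        if {tuple(ren[v] for v in tup) for tup in ext1} == ext2:
            return True
    return False

named = {"Hold3"}
e1 = {("u1", "Hold3")}   # some unknown item u1 is at Hold3
e2 = {("u7", "Hold3")}   # the same picture with the unknown renamed
e3 = {("Hold3", "u7")}   # a genuinely different interpretation
print(same_up_to_renaming(e1, e2, named))  # True
print(same_up_to_renaming(e1, e3, named))  # False
```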

Theorem 2. Any bounded action theory D is logically equivalent to the theory (D − D 0 ) ∪ {Φ 1 ∨ · · · ∨ Φ m }, where Φ 1 , . . . , Φ m are the characteristic sentences of the cells {M 1 , . . . , M m } of M 0 as defined above.

This result shows the existence of a particular first-order sentence Φ 0 = Φ 1 ∨ · · · ∨ Φ m , uniform in S 0 , that can replace D 0 in D, but it does not provide a constructive way to obtain Φ 0 . The following result implies that, for a bounded action theory, such a sentence is in fact computable.

Theorem. For a bounded action theory D, a finite set of characteristic sentences Φ 1 , . . . , Φ m such that D is logically equivalent to (D − D 0 ) ∪ {Φ 1 ∨ · · · ∨ Φ m } can be effectively computed.

Proof. Since D is bounded, one can compute a natural number B such that the fluent extensions in any executable situation, including S 0 , contain at most B distinct values. Thus, the characteristic sentences associated with the cells of M 0 use at most B distinct variables, and so we can take as candidate characteristic sentences those in which at most B distinct variables occur, which are finitely many (once a suitable normal form is fixed).
By Theorem 2 and the fact that non-equivalent characteristic sentences have disjoint sets of models, it follows that, for a candidate characteristic sentence Φ, either Φ |= D 0 or Φ is inconsistent with D 0 . Thus, a way to obtain the desired characteristic sentences is to take all the candidate sentences Φ such that Φ |= D 0 . To this end, one can observe that, since the models of each Φ have the same (finite) fluent extensions (up to renaming), they satisfy the same FO domain-independent sentences. Therefore, to check whether Φ |= D 0 , one can take any model M (satisfying D coa ) such that M |= Φ, and check whether M |= D 0 . In our case, the fluent extensions of a model satisfying a characteristic sentence can be obtained from the characteristic sentence itself.
Next, we observe that the syntactic form of the characteristic sentence of each cell is essentially the same as that of the relatively complete initial knowledge bases with bounded unknowns defined in [51] (cf. Definition 3 there), for which a FO progression (expressed again as a relatively complete sentence with bounded unknowns) always exists (cf. Theorem 1 in [51]). Therefore, given D, one can apply Theorem 2 to rewrite D 0 as a disjunction of (finitely many) characteristic sentences, progress each of them separately using the method of [51], and combine the results (by Lemmas 1 and 2) to compute a progression of D. Note that, while progression applies to a single action, since the resulting progression is still a disjunction of characteristic sentences, we can apply it iteratively to deal with arbitrary (finite) action sequences.
So, with Lemma 2 we have shown that we can split the progression of a disjunctive D 0 into the disjunction of separate progressions, and with Theorem 2 that we can rewrite any D 0 (of a bounded action theory) into an appropriate disjunctive form, each disjunct of which we can progress separately using Theorem 1 in [51]. Theorem 3 below essentially shows how the main idea behind the progression mechanism of [51] can be extended to progress initial knowledge bases that are disjunctions of relatively complete initial knowledge bases with bounded unknowns.
Theorem 3. All bounded action theories are iteratively first-order progressable.
This view of the knowledge base as a disjunction of a finite set of characteristic sentences provides a practical abstraction, based on the boundedness assumption, that also illustrates how the knowledge base can be updated. We next show one step of progression for Example 4.
Example 6. The initial knowledge base can be logically equivalently expressed as the disjunction of two characteristic sentences, ψ 1 ∨ ψ 2 , the first of which begins as follows: ∃w. AllDist(w, R1, R2, Hold1, Hold2, Hold3, PaintStn, . . .). The second sentence, ψ 2 , is the same as ψ 1 except for the last two conjuncts, in which item w is characterized as heavy instead of fragile. Now consider action move(R1, I1, ShipDock). Using the progression method of [51] for each characteristic sentence, we obtain a progressed version of the knowledge base, ψ′ 1 ∨ ψ′ 2 , which is identical to ψ 1 ∨ ψ 2 except that the location of I1 is updated in the specification of At: in ψ′ 1 , the conjunct specifying the location of I1 is replaced by one placing it at ShipDock, and ψ′ 2 is obtained from ψ 2 in the same way.
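One step of this progression can be mirrored on a concrete representation where a cell is given by finite fluent extensions. The particular tuples below, e.g., I1 at Hold1, are illustrative assumptions, and w stands for the existentially quantified unknown item:

```python
# A cell of the belief state, represented (up to renaming) by finite fluent
# extensions; "w" stands for the existentially quantified unknown item.
cell_psi1 = {
    "At":      {("I1", "Hold1"), ("I2", "Hold2"), ("w", "Hold3")},
    "Fragile": {("w",)},
}

def progress_move(cell, robot, item, dest):
    # Executability (destination free, robot able to handle the item) is
    # assumed to have been established separately, by entailment.
    at = {(x, l) for (x, l) in cell["At"] if x != item} | {(item, dest)}
    return {**cell, "At": at}

# move(R1, I1, ShipDock) only rewrites the At atom for I1.
cell_after = progress_move(cell_psi1, "R1", "I1", "ShipDock")
print(sorted(cell_after["At"]))  # [('I1', 'ShipDock'), ('I2', 'Hold2'), ('w', 'Hold3')]
```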

Verifying Online Executions
To express properties over online executions of BATs, we introduce a specific first-order variant of the μ-calculus [21,45]. The main characteristic of μ-calculus is its ability to express directly least and greatest fixpoints of (predicate-transformer) operators formed using formulae relating the current state to the next one. By using such fixpoint constructs one can easily express sophisticated properties defined by induction or co-induction. The μ-calculus is known to be one of the most powerful temporal logics, subsuming both linear time logics, such as LTL, and branching time logics such as CTL and CTL* [3].
Our variant of the μ-calculus, called μLO, is able to express first-order properties logically implied in a situation (as opposed to expressing first-order properties true in a situation as in [11]). This is needed since online executions depend on what is logically implied, which, in a first-person view, corresponds to what the agent believes. To do so, the "atomic" μLO formulas have the form holds(ϕ) and express that the FO closed formula ϕ is logically implied by the BAT in the current situation. The syntax of μLO is as follows:

Φ ::= holds(ϕ) | Z | ¬Φ | Φ 1 ∧ Φ 2 | ⟨−⟩Φ | μZ.Φ

where ϕ is an arbitrary closed situation-suppressed (i.e., with all situation arguments in fluents suppressed) situation calculus FO formula, whose constants must appear in D \ (D una ∪ D coa), and Z is an SO (0-ary) predicate variable. We use the following standard abbreviations: Φ 1 ∨ Φ 2 = ¬(¬Φ 1 ∧ ¬Φ 2), [−]Φ = ¬⟨−⟩¬Φ, and νZ.Φ = ¬μZ.¬Φ[Z/¬Z]. Intuitively, ⟨−⟩Φ holds in a situation if there exists an action (that is known to be executable) after which Φ holds, and [−]Φ holds in a situation if for all actions (that are known to be executable), Φ holds afterwards. As usual in the μ-calculus, formulae of the form μZ.Φ (and νZ.Φ) must satisfy syntactic monotonicity of Φ with respect to Z, which requires that every occurrence of the variable Z in Φ be within the scope of an even number of negation symbols. The fixpoint formulas μZ.Φ and νZ.Φ denote respectively the least and the greatest fixpoint of the formula Φ, seen as a predicate transformer λZ.Φ (their existence is guaranteed by the syntactic monotonicity of Φ). We can express arbitrary temporal/dynamic properties using least and greatest fixpoint constructions. For instance, to say that it is possible to eventually achieve ϕ, where ϕ is a closed situation-suppressed formula, we use the least fixpoint formula μZ. holds(ϕ) ∨ ⟨−⟩Z. Similarly, we can use the greatest fixpoint formula νZ. holds(ϕ) ∧ [−]Z to express that ϕ always holds.
Note that our μLO language does not allow for quantification across situations. However, one can mitigate this limitation by using fluents to refer to objects across situations.
As to the semantics, since μLO contains formulae with free predicate variables, given an action theory D, we introduce a predicate variable valuation V, i.e., a mapping from predicate variables Z to sets of ground situation terms. Then, we assign semantics to formulae by associating with D and V an extension function (·) D V which maps μLO formulae to subsets of ground situation terms. We denote by Γ the set of ground executable situation terms, inductively defined as follows:
• S 0 ∈ Γ;
• if σ ∈ Γ, A is an action type with parameters  x, n ∈ N is a vector of names such that | n| = | x|, and D |= Poss(A( n), σ), then do(A( n), σ) ∈ Γ.
The extension function is defined inductively as follows:
• (holds(ϕ)) D V = {σ ∈ Γ | D |= ϕ[σ]};
• (Z) D V = V(Z);
• (¬Φ) D V = Γ \ (Φ) D V;
• (Φ 1 ∧ Φ 2) D V = (Φ 1) D V ∩ (Φ 2) D V;
• (⟨−⟩Φ) D V = {σ ∈ Γ | there exists σ′ = do(A( n), σ) ∈ Γ such that σ′ ∈ (Φ) D V};
• (μZ.Φ) D V = ⋂{E ⊆ Γ | (Φ) D V[Z/E] ⊆ E}.
With a slight abuse of notation, given a closed situation-suppressed formula ϕ, we denote by ϕ[σ] the formula ϕ with the situation argument reintroduced and assigned to σ. Also, given a valuation V, a predicate variable Z, and a set E of situation terms, we denote by V[Z/E] the valuation obtained from V by changing the value of Z to E. Notice also that when a μLO formula Φ is closed (with respect to predicate variables), its extension (Φ) D V does not depend on the predicate valuation V. The only formulas of interest in verification are closed ones. We say that a theory D entails a closed μLO formula Φ, written D |= Φ, if S 0 ∈ (Φ) D V (for any valuation V, which is in fact irrelevant for closed formulas).
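To illustrate the fixpoint machinery, here is a minimal sketch of an extension function over a *finite* transition system. Formulas are nested tuples; holds(ϕ) is reduced to membership of a ground atom in the state label, a deliberately crude stand-in for first-order entailment; least fixpoints are computed by Kleene iteration, which converges on a finite state space for syntactically monotone formulas. All encodings and data are illustrative assumptions.

```python
# Formulas: ("holds", atom) | ("var", Z) | ("not", F) | ("and", F, G)
#         | ("or", F, G) | ("dia", F) | ("mu", Z, F)

def ext(phi, states, trans, label, val):
    op = phi[0]
    if op == "holds":   # toy stand-in for entailment by the state label
        return {q for q in states if phi[1] in label[q]}
    if op == "var":
        return set(val[phi[1]])
    if op == "not":
        return set(states) - ext(phi[1], states, trans, label, val)
    if op == "and":
        return (ext(phi[1], states, trans, label, val)
                & ext(phi[2], states, trans, label, val))
    if op == "or":
        return (ext(phi[1], states, trans, label, val)
                | ext(phi[2], states, trans, label, val))
    if op == "dia":     # <->F: some successor satisfies F
        s = ext(phi[1], states, trans, label, val)
        return {q for q in states if trans.get(q, set()) & s}
    if op == "mu":      # least fixpoint, Kleene iteration from the empty set
        z, body = phi[1], phi[2]
        e = set()
        while True:
            e2 = ext(body, states, trans, label, {**val, z: e})
            if e2 == e:
                return e
            e = e2
    raise ValueError(op)

# A 3-state chain where "p" holds only at the end; mu Z. p or <->Z
# computes the states from which p is reachable (i.e., EF p).
states = {0, 1, 2}
trans = {0: {1}, 1: {2}, 2: set()}
label = {0: set(), 1: set(), 2: {"p"}}
ef_p = ("mu", "Z", ("or", ("holds", "p"), ("dia", ("var", "Z"))))
reach = ext(ef_p, states, trans, label, {})
```

The iteration from the empty set mirrors the definition of μZ.Φ as the intersection of all prefixpoints.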
We next show some examples. For simplicity, we adopt the notation of the well-known logic CTL*, which can be thought of as a fragment of μLO.
In particular, EΨ and AΨ respectively express that there exists an (infinite) path (of action executions) satisfying Ψ and that all paths satisfy Ψ; and GΦ and FΦ respectively express that Φ always holds along a path and that Φ eventually holds along a path.
Example 7. For the agent in Example 2, one property of interest to verify is whether it is possible for the agent to eventually know that it has shipped all items that were in the factory. This can be expressed as the least fixpoint formula μZ. holds(¬∃x∃l. At(x, l)) ∨ ⟨−⟩Z or, in CTL*, EF holds(¬∃x∃l. At(x, l)).
In the above, we rely on the fact that if there are no items left in the factory, then all items that were there must have been shipped. It is easy to check that the theory of Example 2, D 2, entails this formula. More generally, a formula EF ϕ represents an instance of a conformant planning problem: it is satisfied by a theory if there exists an executable sequence of actions such that afterwards the agent knows that ϕ holds. In fact, we can also show that the above property can always be achieved: D 2 |= AGEF holds(¬∃x∃l. At(x, l)). Another property that can be shown to hold for this domain is that it is possible for the agent to eventually know that it has shipped all items that were in the factory, and that every shipped item was painted. We express this as: E(F holds(¬∃x∃l. At(x, l)) ∧ G holds(∀x(Shipped(x) ⊃ Painted(x)))).
Example 8. For the agent in Example 3, with associated theory D 3, we can show that: D 3 |= EF (holds(∀l¬∃x. At(x, l)) ∧ F holds(∀l. IsLoc(l) ⊃ ∃x. At(x, l))), i.e., it is possible to eventually have all items shipped out of the factory and then later to eventually have all locations filled with items. Moreover, we can also show that always, if an item is at the shipping dock, it can be shipped out next: D 3 |= AG(holds(∃x. At(x, ShipDock)) ⊃ ⟨−⟩ holds(¬∃x. At(x, ShipDock))). However, this is not the case for other locations, e.g., Hold1, as it is possible for all locations to become occupied, at which point the agent must ship the item at the shipping dock before it can move the item at any other location: D 3 |= ¬AG(holds(∃x. At(x, Hold1)) ⊃ ⟨−⟩ holds(¬∃x. At(x, Hold1))).
We observe that because Γ and N (and thus the object sort) are infinite, one cannot check whether D |= Φ using an exhaustive search procedure, as typically done, e.g., in standard μ-calculus model checking [21]. This is true also for bounded theories, which can be infinite-state. However, under the assumption of boundedness, a finite structure can be constructed and used to carry out the verification task. The construction is based on an abstraction process that clusters all situations whose corresponding states are isomorphic, so as to generate a finite-state transition system over which the verification can be performed. The following result is proven in the rest of the section.
Theorem 4. Let D be a BAT bounded by b and Φ a closed μLO formula. Then, checking whether D |= Φ is decidable.
The proof involves two steps. First, we provide an alternative semantics for μLO formulas, equivalent to the one above. This semantics is defined on top of a transition system T D derived from D, which we call progression-based and which captures the evolution of the domain according to D. The second step exploits our results about the progression of bounded theories to show that a finite-state transition system T F can be effectively constructed that is equivalent, for the purpose of verification, to T D. Since standard model checking algorithms can be executed on T F, this implies that the verification of μLO formulas is decidable.
We start by introducing the notion of transition system (TS). An (online-execution) transition system is a tuple T = ⟨Q, q 0, λ, →⟩, where:
• Q is the set of possible states;
• q 0 ∈ Q is the initial state;
• λ : Q → 2^L is the labeling function, associating each state q with a set D q of situation-suppressed sentences over N (L denotes the set of situation calculus situation-suppressed sentences);
• → ⊆ Q × Q is the transition relation.
As can be seen, this is a special case of a standard labelled TS, where states are labelled by (possibly non-first-order) logical theories. We call this class of TSs online-execution to stress that they can accommodate all the information relevant to online executions.
The semantics of a μLO formula Φ over a TS T = ⟨Q, q 0, λ, →⟩, under a valuation V, is given by the extension function (·) T V, defined as follows:
• (holds(ϕ)) T V = {q ∈ Q | λ(q) |= ϕ};
• (Z) T V = V(Z);
• (¬Φ) T V = Q \ (Φ) T V;
• (Φ 1 ∧ Φ 2) T V = (Φ 1) T V ∩ (Φ 2) T V;
• (⟨−⟩Φ) T V = {q ∈ Q | there exists q′ such that q → q′ and q′ ∈ (Φ) T V};
• (μZ.Φ) T V = ⋂{E ⊆ Q | (Φ) T V[Z/E] ⊆ E}.
We say that T entails a closed μLO formula Φ, written T |= Φ, if q 0 ∈ (Φ) T V (for any valuation V, which is irrelevant). This is essentially the standard semantics of the μ-calculus [21], with satisfaction replaced by entailment on state labels.
Every action theory D induces a (family of) so-called progression-based TS T D = ⟨Q, q 0, λ, →⟩, defined as follows:
• Q = Γ, the set of ground executable situation terms;
• q 0 = S 0;
• σ → do(A( n), σ) if and only if (D − D 0) ∪ λ(σ) |= Poss(A( n)), for some ground action A( n) (with Poss situation-suppressed);
• λ is inductively defined by setting λ(S 0) = D 0 (situation-suppressed) and by taking λ(do(A( n), σ)) to be a progression of (D − D 0) ∪ λ(σ) with respect to A( n).
Intuitively, T D is the (infinite) situation tree of the theory with each situation labelled by a progression of D with respect to some ground action term, taking the preceding situation as initial situation. As can be shown, T D retains all the information entailed by D at every situation, and can thus be used to interpret μLO formulae.
We say that two theories D and D′ are logically equivalent modulo renaming if there exists a bijection h : N → N such that D and h(D′) are logically equivalent, where h(D′) stands for the theory obtained from D′ by replacing every constant n it mentions by h(n). We say that h preserves a set of constants C ⊆ N if h(n) = n, for every n ∈ C. We write D ∼ C D′ to denote that D and D′ are logically equivalent modulo a renaming preserving C.
Logical equivalence modulo renaming formalizes the intuition that D and D′ have exactly the same models, modulo renaming of constants. In particular, we have the following.
Corollary 1. If two theories are logically equivalent modulo renaming, then they entail the same closed formulas, modulo renaming of constants.
Proof. Immediate consequence of the definition of logical equivalence modulo renaming.
In particular, this result implies that formulae mentioning only preserved constants can be left unchanged. Thus, to check whether a closed formula ϕ is entailed by a class of logically equivalent (modulo renaming) theories, it is sufficient to evaluate the formula against an arbitrary representative of the class.
Given two TSs T 1 = ⟨Q 1, q 10, λ 1, → 1⟩ and T 2 = ⟨Q 2, q 20, λ 2, → 2⟩ and a set of constants C ⊆ N, a relation B ⊆ Q 1 × Q 2 is an (online-execution) bisimulation preserving C if ⟨q 1, q 2⟩ ∈ B implies that: (i) λ 1(q 1) ∼ C λ 2(q 2); (ii) for every transition q 1 → 1 q′ 1 there exists a transition q 2 → 2 q′ 2 such that ⟨q′ 1, q′ 2⟩ ∈ B; and (iii) for every transition q 2 → 2 q′ 2 there exists a transition q 1 → 1 q′ 1 such that ⟨q′ 1, q′ 2⟩ ∈ B. T 1 and T 2 are said to be (online-execution) bisimilar with respect to C, written T 1 ≈ C T 2, if ⟨q 10, q 20⟩ ∈ B, for some bisimulation B preserving C. As usual, bisimilarity is an equivalence relation.
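On finite TSs, bisimilarity can be decided by a greatest-fixpoint computation: start from all pairs of states with equivalent labels and repeatedly discard pairs violating the back-and-forth conditions. In the illustrative sketch below, plain equality of labels stands in for ∼ C, and all state names and labels are toy data.

```python
def bisimulation(states1, trans1, lab1, states2, trans2, lab2):
    # Start from all pairs with matching labels, then discard pairs
    # violating the back-and-forth conditions until a fixpoint is reached.
    b = {(q1, q2) for q1 in states1 for q2 in states2
         if lab1[q1] == lab2[q2]}
    while True:
        b2 = set()
        for (q1, q2) in b:
            forth = all(any((s1, s2) in b for s2 in trans2.get(q2, ()))
                        for s1 in trans1.get(q1, ()))
            back = all(any((s1, s2) in b for s1 in trans1.get(q1, ()))
                       for s2 in trans2.get(q2, ()))
            if forth and back:
                b2.add((q1, q2))
        if b2 == b:
            return b
        b = b2

# A one-state self-loop and a two-state loop with identical labels
# are bisimilar: this is how a finite TS can stand in for an infinite one.
b = bisimulation({0}, {0: {0}}, {0: "p"},
                 {0, 1}, {0: {1}, 1: {0}}, {0: "p", 1: "p"})
```

The refinement loop is itself a greatest-fixpoint computation, dual to the least-fixpoint iteration used for μ-formulas.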
A notable property of online-execution bisimulations is that they preserve entailment of μLO formulas.
Theorem 7. Given two TSs T 1 and T 2, and a set of constants C ⊆ N, if T 1 ≈ C T 2, then, for every closed μLO formula Φ mentioning only constants in C, T 1 |= Φ if and only if T 2 |= Φ.
Proof. The proof is by induction on the structure of Φ and is essentially analogous to that of the bisimulation-invariance theorem for the standard μ-calculus [21]. In fact, the only differences are in the cases of atomic formulae (Φ = holds(ϕ)) and of the next operator (Φ = ⟨−⟩Φ′). For the former, the thesis is a direct consequence of Corollary 1 as, by the definition of online-execution bisimulation, λ(q 10) ∼ C λ(q 20). For the latter, we first observe that, by the definition of online-execution bisimulation, there exists a transition q 10 → 1 q′ 1 if and only if there exists a transition q 20 → 2 q′ 2 such that ⟨q′ 1, q′ 2⟩ ∈ B. Now, one can see that q′ 1 ∈ (Φ′) T 1 V if and only if the TS T′ 1 = ⟨Q 1, q′ 1, λ 1, → 1⟩, obtained from T 1 by setting the initial state to q′ 1, is such that T′ 1 |= Φ′, and analogously for q′ 2 and T′ 2 = ⟨Q 2, q′ 2, λ 2, → 2⟩. Also, it is immediate to see that because T 1 ≈ C T 2, we have that T′ 1 ≈ C T′ 2. But then, by the induction hypothesis, T′ 1 |= Φ′ if and only if T′ 2 |= Φ′, and the thesis follows.
Observe that this result holds even between an infinite-state TS, say T D, and a finite-state TS, say T F. When this is the case, the verification can be performed on T F by adapting standard μ-calculus model checking techniques, which essentially perform fixpoint computations on a finite state space. Unfortunately, two major obstacles prevent this approach from being effective at this stage. Firstly, Theorem 7 applies only provided that T F is available, while thus far we have no guarantee that it is actually computable.
Secondly, μ-calculus model checking requires a procedure to check whether state labelings of T F , which are essentially FO theories, entail atomic subformulas of Φ, a problem that is, in general, undecidable.
For the former problem, we next describe a procedure for the construction of a finite-state TS T F that we then prove to be online-execution bisimilar to T D. The latter problem can be overcome by resorting to Theorem 15 of [11], which states that the verification problem for bounded theories is decidable for a variant of μLO that covers the case that concerns us.
We construct T F using Algorithm 1. The algorithm takes a basic action theory D bounded by b as input and returns a finite-state TS T F = ⟨U, u 0, →, λ⟩ bisimilar to T D. T F is obtained by iteratively progressing D, starting from the initial situation and expanding the frontier states in F (lines 5–17). However, at each step, the theory is not progressed in all possible ways, that is, with respect to all the infinitely many executable ground action terms; instead, only a finite subset of such terms is considered (lines 6–7). Notice that, by the boundedness assumption and Theorem 3, the progression D α of (D − D 0) ∪ λ(u) with respect to a ground action α = A(v( x)) is computable (D una and D coa need not be explicitly represented but can be assumed, and reasoning can be performed under these assumptions), which is clearly a necessary condition for the algorithm to terminate (line 9 could not be completed otherwise). In addition, for the algorithm to be well-defined, it is required that testing the condition of the if statement (line 10) be decidable. This fact is a consequence of the next result, once it is observed that, D being b-bounded, so are the models of the labels of the states in U.
For i = 1, 2, given a theory D 0i and a finite set of constants C, let Φ i = ∃ x. D 0i [ c/ x], where  c contains all the constants in D 0i not occurring in C,  x is a fresh set of variables such that | c| = | x|, and, by a slight abuse of notation, D 0i [ c/ x] stands for the conjunction of the formulas in D 0i, with the constants in  c syntactically replaced by the variables in  x. It can then be seen that checking whether D 01 ∼ C D 02 is equivalent to checking whether Φ 1 and Φ 2 are logically equivalent (according to the standard definition). Notice that each Φ i imposes constraints only on the objects occurring in the extension of some fluent, while it does not constrain the remaining objects. In particular, it leaves the object sort free. Moreover, such formulas constrain the fluent extensions to be bounded. As a result, to check whether Φ 1 ≡ Φ 2, it is sufficient to check whether the finite models of the two formulas in which the object sort contains only the values occurring in some fluent coincide. But since such models are finite and, up to object renaming, finitely many, this is decidable.
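The decidable test can be illustrated on a simplified representation in which each label is a finite set of ground atoms rather than a full FO theory: two labels are then equivalent modulo a renaming preserving C iff some bijection between their non-preserved constants maps one set of atoms onto the other. The naive sketch below simply enumerates bijections; atoms are tuples (predicate followed by constants), and all names are illustrative assumptions.

```python
from itertools import permutations

def equiv_mod_renaming(atoms1, atoms2, preserved):
    """Is there a bijection, identity on `preserved`, mapping atoms1 to atoms2?"""
    c1 = sorted({c for atom in atoms1 for c in atom[1:]} - preserved)
    c2 = sorted({c for atom in atoms2 for c in atom[1:]} - preserved)
    if len(c1) != len(c2):
        return False
    for perm in permutations(c2):                 # enumerate candidate bijections
        h = dict(zip(c1, perm))
        h.update({c: c for c in preserved})       # h preserves C
        renamed = {(a[0],) + tuple(h[c] for c in a[1:]) for a in atoms1}
        if renamed == set(atoms2):
            return True
    return False

# I1 and I2 can be renamed into each other while fixing Hold1;
# additionally preserving I1 and I2 blocks the renaming.
ok = equiv_mod_renaming({("At", "I1", "Hold1")},
                        {("At", "I2", "Hold1")}, {"Hold1"})
```

Boundedness is what keeps this brute-force enumeration finite: each label mentions only boundedly many constants.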
Termination of the algorithm is guaranteed by the following result.
Theorem 9. If D is a BAT bounded by b, then Algorithm 1 terminates and produces a TS T F with a finite number of states in U .
Proof. We observe that, under boundedness, there exist only finitely many equivalence classes of theories with respect to logical equivalence modulo renaming. This holds, in particular, for theories containing situation-suppressed formulas only, such as those labeling the states of T F. Termination is a consequence of this observation and of the fact that, by the if statement, a new state u′ (to expand) is added to F only if no other state u is present in U with a labelling that is logically equivalent modulo renaming to that of u′.
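The loop structure of Algorithm 1, and the reason its state space stays finite, can be sketched abstractly: expand a frontier of states, and add a freshly progressed label as a new state only if it is not equivalent (modulo renaming) to the label of an existing state. In this sketch, `progress` and `equivalent` are toy stand-ins (a counter modulo 3 and plain equality), used only to exercise the loop; they are not the paper's constructions.

```python
def build_finite_ts(d0, actions, progress, equivalent):
    """Frontier-expansion construction of a finite TS (states = labels)."""
    states = [d0]      # U: labels of discovered states; index 0 is u0
    edges = set()      # transition relation on state indices
    frontier = [0]     # F: states still to be expanded
    while frontier:
        u = frontier.pop()
        for a in actions(states[u]):          # finitely many action terms
            lab = progress(states[u], a)
            for i, other in enumerate(states):
                if equivalent(lab, other):    # reuse an existing state
                    edges.add((u, i))
                    break
            else:                             # genuinely new state
                states.append(lab)
                edges.add((u, len(states) - 1))
                frontier.append(len(states) - 1)
    return states, edges

# Toy run: labels are counters mod 3, so only 3 equivalence classes exist
# and the construction closes into a finite cycle.
states, edges = build_finite_ts(
    0,
    lambda lab: ["inc"],
    lambda lab, a: (lab + 1) % 3,
    lambda l1, l2: l1 == l2,
)
```

Termination hinges exactly on the argument in the proof above: finitely many equivalence classes of labels means the `else` branch can fire only finitely often.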
Finally, we can prove that Algorithm 1 returns a TS bisimilar to T D.
Theorem 10. If D is a BAT bounded by b, then Algorithm 1 returns a TS T F such that T D ≈ C T F.
Proof. Define B ⊆ Q × U such that ⟨q, u⟩ ∈ B if and only if λ(q) ∼ C λ(u). Obviously, ⟨q 0, u 0⟩ ∈ B, as λ(q 0) = λ(u 0), by the definition of T D and the construction of T F. We show that B is an online-execution bisimulation. To this end, consider a pair ⟨q, u⟩ ∈ B and observe that, by the definition of T D, a transition q → q′ exists in T D if and only if, for some ground action A( n), λ(q) |= Poss(A( n)) (with Poss situation-suppressed). Similarly, by the way T F is constructed, a transition u → u′ exists in T F if and only if λ(u) |= Poss(A( m)), for some ground action A( m).
Assume q → q′, thus λ(q) |= Poss(A( n)), for some A( n). We next prove that, for any choice of O in Algorithm 1, there exists an action term A(v( x)) such that λ(u) |= Poss(A(v( x))). To see this, observe that, by the definition of ∼ C, there exists a bijection h witnessing λ(q) ∼ C λ(u). Assume first that h maps the objects of  n into objects of the set O ∪ C ∪ C λ(u). In this case, the fact that λ(u) |= Poss(A(v( x))) is implied by Theorem 6, for v( x) = h( n). Otherwise, there exists an element n of  n such that h(n) ∉ O ∪ C ∪ C λ(u). In this case, we modify h into a bijection h′, analogous to h except that the value h(n) is swapped with some value h(n′) ∈ O such that n′ does not occur in  n. This is always possible because | x| = |O| and O ∩ (C ∪ C λ(u)) = ∅. It can be checked that the so-obtained h′ is also a witness of λ(q) ∼ C λ(u). In addition, by the constraints on the choice of O, one can iterate these changes to finally obtain an h′ such that h′( n) ∈ O ∪ C ∪ C λ(u), for which the previous case applies. Thus, there exists a transition u → u′ in T F.
The fact that λ(q′) ∼ C λ(u′) follows from the observation that when we progress theories that are logically equivalent modulo renaming through the same ground actions (modulo the same renaming), we end up with progressed theories that are logically equivalent modulo the same renaming (it is easy to show that, otherwise, Theorem 6 would be violated). In particular, λ(q′) and λ(u′) are obtained as progressions of, respectively, (D − D 0) ∪ λ(q) and (D − D 0) ∪ λ(u), which are logically equivalent modulo renaming, as so are λ(q) and λ(u).
Proving the other requirement of the bisimulation relation is simpler, as any ground action term considered in the construction of T F has a corresponding term in T D , and no surgery is required on h.
Once T F is obtained, a variant of the standard algorithm for μ-calculus model checking [21] can be applied to check whether T F |= Φ. This observation, together with Theorems 10 and 7, implies the desired result, i.e., Theorem 4.

Related Work
There has been growing interest in reasoning about and verifying agent programs. Most work on verification of agent systems/programs uses propositional modal logics and model checking techniques [3,32]. These include [6,18,47], and [53], which all focus on model-checking of BDI programs. Model checking (and satisfiability) in these propositional modal logics is decidable. But such logics can only represent finite domains and finite state systems.
There is also some work that uses first-order logical formalisms and can deal with infinite domains. This includes work that uses theorem proving techniques, such as Shapiro et al.'s CASLve verification environment [43,44] for multi-agent ConGolog programs based on an extended version of the situation calculus with knowledge and goal fluents. Another approach first developed by [8] uses fixpoint approximation techniques reminiscent of model checking, in combination with "characteristic graphs", which can finitely represent a Golog program's configuration graph. De Giacomo et al. [15] and Sardina and De Giacomo [37] also use these techniques. But note that for these first-order formalisms, verification is undecidable in general. So these approaches have no termination guarantees and are hence sound but not complete.
Formalisms for first-order reasoning about action, such as the situation calculus, are very general and expressive. So, until recently, decidability results for reasoning in the situation calculus had been few, e.g., [46] for a fragment with argumentless fluents, and [23] for a description-logic-like two-variable fragment. Situation calculus basic action theories do support regression, which reduces reasoning about a given future situation to reasoning about the initial situation [36], and generalizations of this result, such as just-in-time histories [17], can also be exploited. However, these techniques cannot be used to verify general temporal properties.
A significant advance was [11], where the class of situation calculus bounded action theories is identified, for which verification of temporal properties is decidable. In such theories, the number of object tuples that belong to the extension of fluents is bounded in every situation. But the object domain remains infinite, and an infinite run may involve an unbounded number of objects. Claßen et al. [9] also identify cases where verification of ConGolog programs is decidable. In both of these works, properties are verified over offline executions. In this paper, we essentially extend such approaches, in particular [11], to verify temporal properties over online executions. Note also that in [14] it is shown that if the initial situation description is bounded, then one can use the techniques of [11] to verify that an action theory remains bounded in all executable situations.
Verification of infinite-state systems is of interest not only in AI, but also in other areas of computer science, and there is substantial work applying model checking techniques to infinite-state systems. However, in most of this work the emphasis is on studying recursive control rather than on a rich data-oriented state description; typically, data are either ignored or finitely abstracted, see, e.g., [7]. Recently, the fields of business processes and services have paid some attention to including data in the analysis of processes [20,22,24]. Interestingly, while we have verification tools that are quite good at dealing with data and processes separately, when we consider them together, we get infinite-state transition systems, which resist classical model checking approaches to verification. Lately, there has been some work on developing verification techniques that can deal with such infinite-state processes [1,2,4,19]. In particular, [2,4] bring forth the idea of exploiting state boundedness to obtain decidability of verification for infinite-state data-aware systems.
Note that in this paper, we take a first-person view of the action theory as representing the agent's beliefs, so the notion of belief is metatheoretic. There has also been work on versions of the situation calculus that incorporate an additional knowledge/belief modality, thus taking a third-person view of knowledge/belief. This can be done by adapting the possible worlds model of knowledge to the situation calculus, as first proposed by [34]. Scherl and Levesque [41,42] formalized this approach in the context of Reiter's basic action theories [36] and showed that regression could be used to answer epistemic queries about a given ground situation. Lakemeyer and Levesque [25,26] also developed a first-order modal version of the epistemic situation calculus.
The approach in this paper is related to but quite different from that in [12], which builds on a version of the situation calculus with a knowledge modality [41]. Such an approach is especially interesting when there are several agents, each working from its own first-person account simultaneously, and we can consider their relationship to a third-person (modeler) account [43]. However, De Giacomo et al. [12] use a notion of bounded epistemic action theory that is more restrictive than ours, in that it requires that the number of object tuples that the agent thinks may belong to any given fluent in a situation be bounded; in other words, the total number of tuples, summed over all epistemic alternatives in the situation, is required to be bounded. Here, instead, we only require that in any possible world (i.e., model of the theory), the number of distinct tuples in the extension of any fluent in any given situation is bounded.
There is also a substantial body of work on progression. As mentioned in Sect. 5, the notion of progression for basic action theories was first introduced by Lin and Reiter [29]. They were also the first to investigate restrictions that guarantee a first-order progression: a restriction on the form of D 0, namely relatively complete databases, and a restriction on the type of available actions, namely the context-free assumption, each of which guarantees that a first-order progression can always be found. There is much related work in this direction. Liu and Levesque [31] introduced the local-effect assumption for actions, which was later shown by Vassos et al. [49] to be a sufficient condition for ensuring that a first-order progression can be found, while Liu and Lakemeyer [30] extended this further to so-called normal actions. Among the most relevant to this work is the recent result about progressing a D 0 that is relatively complete with bounded unknowns [51], where essentially D 0 is similar to a database with named nulls and constraints on their values. In that work, Vassos and Patrizi [51] also give a classification of all known restrictions (i.e., classes of basic action theories) for which a first-order progression can always be effectively computed.
Finally, note that in [13] the framework presented in this paper for verifying online executions of bounded action theories is adapted to handle sensing actions. There, a first-order variant of linear time logic [48] is used to specify the properties to be verified. It is also shown that one can always obtain a first-order progression in the presence of sensing actions.

Conclusion
We have proposed a decidable framework for verifying agents with bounded beliefs operating in infinite-state domains. The agent has bounded beliefs if the action theory that models the agent's beliefs and deliberation process entails that the number of tuples that belong to any fluent in any situation is bounded by a constant. We have shown that this boundedness condition is sufficient to ensure that the agent's belief state in any situation can be progressed and remains first-order representable. The framework allows complex subjective temporal properties to be specified and verified over online executions of the agent, i.e., executions where the agent only performs actions that it knows are feasible. We have assumed that the object domain is isomorphic to an infinite set of standard names. Since we are concerned with online executions, it would be strange to allow for the occurrence of actions that the agent cannot even name. But note that all the results we have shown hold even if we drop domain closure for objects.
In the case where the initial situation description is in the form studied in [51], computing progression becomes particularly easy. This simplifies verification by making it simple to compute the finite transition system on which the model checking algorithm is applied.
In future work, we want to extend our online executions verification framework to deal with partially observable actions and forgetting (which helps to maintain boundedness); this will require changes to the specification language as it introduces forms of nondeterminism that are not under the agent's control. We also want to allow some forms of quantification across situations in the specification language. Finally, we want to extend the framework to support the verification of agent programs.