What do you really want to do? Towards a Theory of Intentions for Human-Robot Collaboration

The architecture described in this paper encodes a theory of intentions based on the key principles of non-procrastination, persistence, and automatically limiting reasoning to relevant knowledge and observations. The architecture reasons with transition diagrams of any given domain at two different resolutions, with the fine-resolution description defined as a refinement of, and hence tightly-coupled to, a coarse-resolution description. For any given goal, nonmonotonic logical reasoning with the coarse-resolution description computes an activity, i.e., a plan, comprising a sequence of abstract actions to be executed to achieve the goal. Each abstract action is implemented as a sequence of concrete actions by automatically zooming to and reasoning with the part of the fine-resolution transition diagram relevant to the current coarse-resolution transition and the goal. Each concrete action in this sequence is executed using probabilistic models of the uncertainty in sensing and actuation, and the corresponding fine-resolution outcomes are used to infer coarse-resolution observations that are added to the coarse-resolution history. The architecture’s capabilities are evaluated in the context of a simulated robot assisting humans in an office domain, on a physical robot (Baxter) manipulating tabletop objects, and on a wheeled robot (Turtlebot) moving objects to particular places or people. The experimental results indicate improvements in reliability and computational efficiency compared with an architecture that does not include the theory of intentions, and an architecture that does not include zooming for fine-resolution reasoning.


Introduction
Consider a wheeled robot delivering objects to particular places or people, or a robot with manipulators stacking objects in particular configurations on a tabletop, as shown in Fig. 1. Such robots that are deployed to assist humans in dynamic domains have to reason with different descriptions of uncertainty and incomplete domain knowledge. Information about the domain often includes commonsense knowledge, especially default knowledge that holds in all but a few exceptional circumstances. For instance, the robot may be told that "books are usually in the library, but cookbooks may be in the kitchen". The robot also extracts information from sensor inputs using algorithms that quantify uncertainty probabilistically, e.g., "I am 95% certain the robotics book is on the table". Although it is difficult to equip robots with comprehensive domain knowledge or provide elaborate supervision, reasoning with incomplete or incorrect information can lead to incorrect or suboptimal outcomes, especially when the robot is faced with unexpected success or failure. For example, a robot may be asked to move two books from the office to the library in a domain with four rooms. If this robot can only grasp one object at a time, it will plan to move one book at a time from the office to the library. After moving the first book, if the robot observes the second book in the library, or in another room on the way back to the office, it should stop executing the current plan because this plan will no longer achieve the desired goal. Instead, it should reason about this unexpected observation and compute a new plan if necessary. One way to achieve this behavior with a traditional planning system is to reason about all observations of domain objects and events during plan execution, but this approach is computationally unfeasible in complex domains. 
The architecture described in this paper, on the other hand, achieves the desired behavior by equipping a robot pursuing a particular goal with an adapted theory of intentions. This theory builds on the fundamental principles of non-procrastination and persistence in the pursuit of a desired goal. It enables the robot to reason about mental actions and states, automatically identifying and considering the domain observations relevant to the current action and the goal during planning and execution. We refer to actions in such plans as intentional actions. We describe the following characteristics of our architecture:
- The domain's transition diagrams at two different resolutions are described in an action language, with the fine-resolution transition diagram defined as a refinement of the coarse-resolution transition diagram. At the coarse resolution, non-monotonic logical reasoning with incomplete commonsense domain knowledge, which includes a theory of intentions, produces a sequence of intentional abstract actions for any given goal.
- Each intentional abstract action is implemented as a sequence of concrete actions by automatically zooming to and reasoning with the part of the fine-resolution system description relevant to the current coarse-resolution transition and the goal. Each concrete action in this sequence is executed using probabilistic models of uncertainty, and the observed and inferred outcomes are added to the appropriate coarse/fine-resolution history.
Action languages are formalisms that are used to model domain dynamics (i.e., action effects). We chose to use an extension to action language AL_d [13], which we introduced in prior work to model non-Boolean fluents and non-deterministic causal laws [26], because it provides the desired expressive power for robotics domains. Also, we chose to translate our action language descriptions to programs in CR-Prolog [2], an extension of Answer Set Prolog (ASP) [14], because it supports non-monotonic logical reasoning with incomplete commonsense knowledge in dynamic domains, which is a key desired capability in robotics. Furthermore, for the execution of each concrete action, we use existing algorithms that include probabilistic models of the uncertainty in perception and actuation. Our architecture builds on the complementary strengths of prior work on an architecture that used declarative programming to reason about intended actions to achieve a given goal [5], and an architecture that introduced step-wise refinement of tightly-coupled transition diagrams at two different resolutions to support non-monotonic logical reasoning and probabilistic reasoning for planning and diagnostics [26]. Prior work on the refinement-based architecture did not include a theory of intentions. Also, prior work on the theory of intentions did not consider the uncertainty in sensing and actuation, and did not scale to complex domains.
The key contributions of our architecture are thus to:
- enable planning with intentional abstract actions, and the associated mental states, actions, and beliefs, in the presence of incomplete domain knowledge, partial observability, and non-deterministic action outcomes; and
- support scalability to larger domains by automatically restricting fine-resolution reasoning to knowledge and observations relevant to the goal or the coarse-resolution abstract action at hand, and by using probabilistic models of the uncertainty in sensing and actuation only when executing concrete actions.
We demonstrate the applicability of our architecture in the context of a: (i) simulated robot assisting humans in an office domain; (ii) physical robot (Baxter) manipulating objects on a tabletop; and (iii) wheeled robot (Turtlebot) moving target objects to desired locations or people in an office domain. We show that our architecture improves reliability and computational efficiency in comparison with a baseline architecture that does not reason about intentional actions and beliefs at different resolutions, and with a baseline architecture that does not limit reasoning to the relevant part of the domain.
The remainder of the paper is organized as follows. First, Section 2 reviews some related work to motivate the need for our architecture. Section 3 then describes the knowledge representation and reasoning architecture. The results of evaluating the capabilities of this architecture are described in Section 4, followed by a description of the conclusions and future work in Section 5.

Related work
There is much work on modeling, recognizing, and reasoning about intentions. For instance, Belief-desire-intention (BDI) architectures model the intentions of reasoning agents and use these models to eliminate choices that are inconsistent with the agent's current intentions [6,20]. However, these approaches do not learn from experience, are unable to adapt to new situations, and make it difficult (by themselves) to explicitly represent or reason about goals (e.g., for planning). There has been work in developing probabilistic graphical models that enable a robot to reason with encoded domain knowledge and learned models to recognize a human participant's intentions [16,17]. These approaches assume that the structure of the models used to represent knowledge is known a priori (e.g., the nodes and links of a hidden Markov model), and use prior (observed) data to estimate the model parameters, e.g., the probabilities of particular state transitions, and of obtaining particular observations. Reasoning about intent, and identifying discrepancies between expectations and observations, has also been modeled as a component of architectures for agents that perform goal-directed reasoning. For instance, a recent architecture models metacognitive expectations by allowing agents to reason about their cognition [7]. This meta-reasoning is achieved by introducing different levels in the architecture, along with distinct mechanisms at each level, to represent and reason about the domain knowledge and the beliefs of the associated agent. Having such separate levels that are not tightly coupled limits generalization, and the smooth transfer of control and information between the levels.
Initial work on formalizing intentions based on declarative programming introduced an action language and two fundamental principles: (i) non-procrastination, i.e., intended actions are executed as soon as possible; and (ii) persistence, i.e., unfulfilled intentions persist [3]. This architecture did not model agents with specific goals, but it was used to enable an observer to recognize an agent's activity and intention [12]. The Theory of Intentions (TI) extended this work to goal-driven agents by expanding transition diagrams with physical states and physically executable actions to include mental fluents and actions [4,5]. It associated a sequence of agent actions (called an "activity") with the goal it intended to achieve, and the intentional agent only performed activities needed to achieve the goal. This theory has been used to understand narratives of restaurant scenarios [30], and to model goal-driven agents in dynamic domains [23]. A requirement of such theories is that the domain knowledge, including the preconditions and effects of actions and goals, be encoded in advance, which is difficult to do in robot domains. Also, the set of states (and actions) can be large in robot domains, making efficient reasoning a challenging task. Recent work attempted to improve computational efficiency of reasoning with such theories by clustering indistinguishable states [24], but this approach required the clusters to be encoded in advance [30]. Furthermore, these approaches do not consider the uncertainty in sensing and actuation, which is the primary source of error in robotics.
Logic-based methods have been used widely in robotics, including those that also support probabilistic reasoning [15,31]. Methods based on classical first-order logic do not support non-monotonic logical reasoning or the desired expressiveness, e.g., it is not always meaningful to express degrees of belief by attaching probabilities to logic statements. Logics such as ASP support non-monotonic logical reasoning and have been used in cognitive robotics [10] and many other applications [9]. However, classical ASP formulations do not support probabilistic models of uncertainty, and such models are used widely to model the uncertainty in sensing and actuation in robotics. Approaches based on logic programming also do not support one or more of the desired capabilities such as reasoning with large probabilistic components, or incremental addition of probabilistic information and variables to reason about open worlds. As a step towards addressing these challenges, our prior refinement-based architecture reasoned with tightly-coupled transition diagrams at two resolutions [26]. For any given goal, each abstract action in a coarse-resolution plan, computed using ASP-based reasoning with commonsense knowledge, was executed as a sequence of concrete actions computed by probabilistic reasoning over the relevant part of the fine-resolution diagram using partially observable Markov decision processes. In this paper, we explore the combination of the principles of step-wise refinement with those of TI. In comparison with prior work, the architecture described in this paper supports reasoning about intentional actions and beliefs in the presence of incomplete domain knowledge, partial observability, and non-deterministic action outcomes, and it incorporates a more efficient approach for fine-resolution reasoning to support scalability to larger domains. Figure 2 is a simplified block diagram of the overall architecture.
Similar to prior work [26], this architecture may be viewed as comprising three tightly-coupled components: a controller, a logician, and an executor; the significant differences in comparison with prior work are described later in this section. The controller maintains the overall beliefs regarding the state of the domain, and transfers control and information between the components. Reasoning is based on transition diagrams of the domain at two different resolutions, with a fine-resolution representation defined as a refinement of a coarse-resolution representation of the domain. For any given goal, the logician performs non-monotonic logical reasoning  with the coarse-resolution representation of commonsense domain knowledge to generate an activity, i.e., a sequence of intentional abstract actions to achieve the goal. To implement each such intentional abstract action, the controller automatically zooms to the part of the fine-resolution representation that is relevant to the desired abstract transition and the goal. Reasoning with this relevant part provides a plan of concrete actions; each such concrete action is executed by the executor using probabilistic models of the uncertainty in sensing and actuation. The observed and inferred outcomes of executing a concrete action, along with any other relevant observations, are communicated to the controller and added to the coarse-resolution history. The logician reasons with this history and continues with the current activity of intentional abstract actions only if it will achieve the desired goal. If, on the other hand, the logician finds that pursuing the current activity will not achieve the desired goal, a new activity is computed and implemented. We use CR-Prolog to represent and reason with the coarse-resolution and fine-resolution representations. We use existing implementations of probabilistic algorithms for executing concrete actions. 
The following running example will be used to describe the components of the architecture, along with differences from prior work.
- Sorts such as place, thing, robot, object, and book, arranged hierarchically, e.g., object and robot are subsorts of thing. Sort names and constants are in lower-case, and variable names are in uppercase.
- Places: {office_1, office_2, kitchen, library} with a door between neighboring places (see Fig. 3); only the door between kitchen and library can be locked.
- Instances of sorts, e.g., rob_1, book_1, book_2.

Knowledge representation and reasoning architecture
- Static attributes such as color, size, and different parts (e.g., base and handle) associated with objects.
- Other agents that may influence the domain, e.g., move a book or lock a door. These agents are not modeled explicitly; only the potential execution of exogenous actions by these agents is used to explain unexpected observations.

Action Language and Domain Representation
We first describe the action language encoding of the dynamics of the domain, and the translation of this encoding to CR-Prolog programs for knowledge representation and reasoning.

Action Language AL_d
Action languages are formal models of parts of natural language used for describing transition diagrams of dynamic systems. We use an extension of the action language AL_d [13] that supports non-Boolean fluents and non-deterministic causal laws [26] to describe the transition diagrams of our domain at different resolutions. AL_d has a sorted signature with actions, i.e., a set of elementary operations; statics, i.e., domain attributes whose values cannot be changed by actions; and fluents, i.e., attributes whose values can be changed by actions. Basic fluents obey laws of inertia and can be changed by actions, whereas defined fluents do not obey laws of inertia and are not changed directly by actions. AL_d allows three types of statements, (i) causal law; (ii) state constraint; and (iii) executability condition:
    a causes l_b if p_0, ..., p_m
    l if p_0, ..., p_m
    impossible a_0, ..., a_k if p_0, ..., p_m
where a is an action, l is a literal (i.e., a domain attribute or its negation), l_b is a basic literal, and p_0, ..., p_m are domain literals. The causal law implies that if action a is executed in a state satisfying p_0, ..., p_m, the literal l_b will be true in the resulting state. The state constraint implies that literal l is true in a state satisfying p_0, ..., p_m. The executability condition implies that it is impossible to execute actions a_0, ..., a_k in a state satisfying domain literals p_0, ..., p_m. For more details about the syntax and semantics of AL_d, please see [13], and for details about the extension of AL_d to support non-Boolean fluents and non-deterministic causal laws, please see [26].

Coarse-resolution knowledge representation
The coarse-resolution domain representation consists of system description D_c, which is a collection of statements of AL_d, and history H_c. System description D_c has a sorted signature Σ_c and axioms that describe the corresponding transition diagram τ_c. The signature Σ_c defines the basic sorts, domain attributes and actions. In addition to the basic sorts and ground instances introduced in Example 1, Σ_c for the RA domain includes sort step for temporal reasoning. Domain attributes (i.e., statics and fluents) and actions are described in terms of their arguments' sorts. In the RA domain, coarse-resolution statics include relations such as next_to(place, place), which describes the relative arrangement of places in the domain; and relations modeling object attributes, e.g., we may represent an object's color as obj_color(object, color). Fluents of the coarse-resolution representation of the RA domain include loc(thing, place), which denotes the location of the robot or other domain objects; in_hand(robot, object), which denotes whether a particular object is in the robot's hand; and locked(place), which implies that a particular place is locked. The locations of other agents, if any, are not changed by the robot's actions; these locations are inferred from observations obtained from other sensors. Next, Σ_c for the RA domain includes actions such as move(robot, place), pickup(robot, object), putdown(robot, object), and unlock(robot, place); we also consider exogenous actions exo_move(object, place) and exo_lock(place) for diagnostic reasoning, e.g., for explaining unexpected observations. Finally, Σ_c also includes the relation holds(fluent, step) to imply that a particular fluent is true at a particular time step. Note that it is possible to consider domain attributes and actions as functions and use the corresponding notation, e.g., loc : thing → place, in_hand : robot × object → bool, and move : robot × place → action.
We use the predicate notation for simplicity, ease of understanding, and to be consistent with the notation used in other parts of this paper.
Axioms in the coarse-resolution representation of the RA domain include causal laws, state constraints, and executability conditions that describe the dynamics of the domain. For instance, Statement 1(a) implies that executing action move(rob_1, library) causes loc(rob_1, library) to be true in the resultant state, Statement 1(c) implies that any object can only be in one location at a time, and Statement 1(e) implies that the robot cannot pick an object up unless the object is in the same location as the robot. These axioms are used for inference, planning, and diagnostics, as described later in Section 3.1.3.
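The three kinds of axioms described above can be illustrated with a minimal Python sketch. This is not the AL_d or CR-Prolog encoding used in the architecture; the function name step, the tuple encoding of actions, the rule that a held object moves with the robot, and the empty-hand requirement for pickup are assumptions made for illustration. Storing locations in a dictionary keyed by thing enforces the "one location at a time" constraint by construction.

```python
from typing import Dict, Optional, Tuple


def step(loc: Dict[str, str], in_hand: Optional[str],
         action: Tuple[str, ...]) -> Tuple[Dict[str, str], Optional[str]]:
    """Return the (loc, in_hand) pair that results from executing `action`.

    Raises ValueError when an executability condition is violated.
    """
    loc = dict(loc)  # copy, so the previous state is left untouched
    kind = action[0]
    if kind == "move":
        _, robot, place = action
        loc[robot] = place  # causal law: move(robot, place) causes loc(robot, place)
        if in_hand is not None:
            loc[in_hand] = place  # assumption: a held object moves with the robot
    elif kind == "pickup":
        _, robot, obj = action
        if loc[obj] != loc[robot]:
            # executability condition: robot and object must share a location
            raise ValueError("cannot pick up an object in a different place")
        if in_hand is not None:
            # assumption: the robot can hold only one object at a time
            raise ValueError("hand is not empty")
        in_hand = obj  # causal law: pickup causes in_hand(robot, obj)
    else:
        raise ValueError(f"unknown action: {kind}")
    return loc, in_hand
```

For example, calling step with rob_1 in the kitchen and the action ("pickup", "rob_1", "book_1") while book_1 is in the library raises ValueError, mirroring the executability condition in Statement 1(e).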
The history H_c of a dynamic domain is usually a record of statements of the form: (i) obs(fluent, boolean, step) implying that particular fluents were observed to be true or false at a particular time step; and (ii) hpd(action, step) implying that particular actions happened at a particular time step. In [26], this notion was expanded to represent defaults describing the values of fluents in the initial state. For instance, in the coarse-resolution history H_c of the RA domain, the statement "a book is usually in the library and if it is not there, it is normally in the office" is encoded as a pair of prioritized defaults. We can also encode exceptions, e.g., "cookbooks are in the kitchen"; for more information, please see [14]. Notice that this representation does not assign numerical values to degrees of belief associated with these defaults, but supports elegant reasoning with generic defaults and their specific exceptions (if any).
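The behavior of prioritized defaults with exceptions can be sketched procedurally as follows. This is only an illustration of the intended conclusions, not the declarative encoding in H_c; the function name default_location, the kind argument, and the ruled_out set (places where the object is known not to be) are hypothetical.

```python
from typing import FrozenSet, Optional


def default_location(kind: str,
                     ruled_out: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the believed initial location of an object, or None if unknown.

    Candidates are tried in priority order; a place is skipped when the
    robot already knows the object is not there.
    """
    if kind == "cookbook":
        candidates = ["kitchen"]            # exception to the generic default
    else:
        candidates = ["library", "office"]  # prioritized defaults for books
    for place in candidates:
        if place not in ruled_out:
            return place
    return None  # no default applies; the location stays unknown
```

With no other information a book is believed to be in the library; once the library is ruled out, the lower-priority default places it in the office. No numerical degree of belief is involved at any point.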

Reasoning with Knowledge
Key tasks of an agent equipped with a system description and history include reasoning with this domain representation for planning and diagnostics. In our architecture, these tasks are accomplished by translating the domain representation to a program in CR-Prolog, a variant of ASP that incorporates consistency restoring (CR) rules [2]. An independent group of researchers have developed (and will be releasing) software to automate the translation between a description in AL_d and the corresponding description in CR-Prolog. In our case, we build on previous work that specified steps for this translation [26], and either perform this translation manually or use a script that automates this translation.
ASP is based on stable model semantics and supports concepts such as default negation and epistemic disjunction, e.g., unlike "¬a" that states a is believed to be false, "not a" only implies a is not believed to be true, and unlike "p ∨ ¬p" in propositional logic, "p or ¬p" is not tautologous. In other words, each literal can be true, false or just "unknown", and an agent associated with an ASP program only believes that which it is forced to believe. ASP can also represent recursive definitions and constructs that are difficult to express in classical logic formalisms, and it supports non-monotonic logical reasoning, i.e., it is able to revise previously held conclusions based on new evidence. The CR-Prolog program Π(D_c, H_c) for the coarse-resolution representation of the RA domain includes the signature and axioms of D_c, inertia axioms, reality checks, and closed world assumptions (CWAs) for defined fluents and actions. For instance, in Π(D_c, H_c), Statements 3(a)-(b) are inertia axioms for basic fluents, Statements 3(c)-(d) are reality check axioms implying that any mismatch between observations and expectations based on current beliefs results in an inconsistency, and Statement 3(e) is the CWA for actions. Program Π(D_c, H_c) also includes observations, actions, and defaults from H_c. Every default also has a CR rule that allows the robot to assume the default's conclusion is false to restore consistency under exceptional circumstances. For instance, one such CR rule considers the rare event of a book not being in the library. This axiom is only used under exceptional circumstances to restore consistency in the presence of an unexpected observation, e.g., a book that is expected to be in the library is later found to be in office_2.
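The roles of the inertia axioms and reality checks can be illustrated with a small Python sketch. It is a procedural approximation under stated assumptions, not the ASP semantics: beliefs are a dictionary from fluent names to truth values, a fluent absent from the dictionary is treated as unknown, and the function names next_beliefs and reality_check are hypothetical.

```python
from typing import Dict, List


def next_beliefs(beliefs: Dict[str, bool],
                 effects: Dict[str, bool]) -> Dict[str, bool]:
    """Inertia: every fluent keeps its value unless an action effect changes it."""
    updated = dict(beliefs)  # carry all current beliefs forward
    updated.update(effects)  # overwrite only the fluents affected by the action
    return updated


def reality_check(beliefs: Dict[str, bool],
                  observations: Dict[str, bool]) -> List[str]:
    """Return the fluents whose observed value contradicts the current beliefs.

    A non-empty result corresponds to an inconsistency that has to be
    resolved, e.g., by an exception to a default or an exogenous action.
    Observations of unknown fluents never cause a mismatch.
    """
    return [fluent for fluent, value in observations.items()
            if fluent in beliefs and beliefs[fluent] != value]
```

In this sketch, observing that a book believed to be in the library is not there yields a mismatch, whereas an observation about a fluent the robot has no belief about is simply new information.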
Each answer set of an ASP program, typically computed by applying a SAT (i.e., satisfiability) solver to the ASP program, represents the set of beliefs of an agent associated with the program. Algorithms for computing entailment, and for planning and diagnostics, reduce these tasks to computing answer sets of CR-Prolog programs. We compute answer sets of CR-Prolog programs using the system called SPARC [1]. An illustrative version of the coarse-resolution CR-Prolog program for the RA domain (written using SPARC) is available in our open-source software repository [25].

Adapted Theory of Intention
For any given goal, a robot reasoning with domain knowledge (as described above) will compute a plan and execute the actions in the plan until either the goal is achieved or an action in the plan has an unexpected outcome. In the latter case, the robot will attempt to explain the unexpected outcome (i.e., perform diagnostics) and compute a new plan if necessary.
To motivate the need for a different approach in dynamic domains, consider the following five scenarios in which the goal is to move book_1 and book_2 to the library; these scenarios have been adapted from scenarios considered in prior work [5]:
- Scenario 1 (planning): Robot rob_1 is in the kitchen holding book_1, and believes book_2 is in the kitchen and that the library is unlocked. The robot computes a plan to move both books to the library.
- Scenario 2 (unexpected success): Assume that rob_1 in Scenario 1 has moved to the library and put book_1 down, and observes book_2 there. The robot should be able to explain this observation (e.g., book_2 was moved there as a result of an exogenous action) and realize that the goal has been achieved.
- Scenario 3 (not expected to achieve goal, diagnose and replan, case 1): Assume rob_1 in Scenario 1 starts moving book_1 to the library, but observes book_2 is not in the kitchen. The robot should realize the plan will fail to achieve the overall goal, explain the unexpected observation, and compute a new plan.
- Scenario 4 (not expected to achieve goal, diagnose and replan, case 2): Assume rob_1 is in the kitchen holding book_1, and believes that book_2 is in office_2 and the library is unlocked. The robot plans to put book_1 in the library before fetching book_2 from office_2. Before rob_1 moves to the library, it unexpectedly observes book_2 in the kitchen. The robot should realize that its current plan will fail, explain the unexpected observation, and compute a new plan.
- Scenario 5 (failure to achieve the goal, diagnose and replan): Assume robot rob_1 in Scenario 1 is putting book_2 in the library, after having put book_1 in the library earlier, and observes that book_1 is no longer there. The robot's intention should persist; it should explain the unexpected observation, replan if necessary, and execute actions until the goal is achieved (i.e., both books are in the library).
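To make Scenario 1 concrete, the following Python sketch computes a shortest plan by breadth-first search over a toy version of the domain. It stands in for, and is much simpler than, the ASP-based planning used in the architecture. The two-room adjacency map, the state encoding, and the rule that a held book moves with the robot are assumptions made for illustration.

```python
from collections import deque

# Assumption: only the two rooms needed for Scenario 1 are modeled.
NEXT_TO = {"kitchen": ["library"], "library": ["kitchen"]}
BOOKS = ("book_1", "book_2")


def successors(state):
    """Yield (action, next_state) pairs; state = (robot_loc, book_locs, held)."""
    rob, books, held = state
    for place in NEXT_TO[rob]:
        # a held book moves with the robot; the other books stay put
        moved = tuple(place if b == held else l for b, l in zip(BOOKS, books))
        yield ("move", place), (place, moved, held)
    if held is None:
        for book, where in zip(BOOKS, books):
            if where == rob:  # pickup needs robot and book in the same place
                yield ("pickup", book), (rob, books, book)
    else:
        yield ("putdown", held), (rob, books, None)


def plan(start, goal):
    """Breadth-first search; returns a shortest action sequence or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal(state):
            return actions
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None


def both_in_library(state):
    _, books, held = state
    return held is None and all(l == "library" for l in books)


# Scenario 1: rob_1 in the kitchen holding book_1, book_2 in the kitchen.
scenario_1 = ("kitchen", ("kitchen", "kitchen"), "book_1")
print(plan(scenario_1, both_in_library))
```

Under these assumptions the search returns a six-step plan: deliver book_1 to the library, return to the kitchen for book_2, and deliver it as well.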
One way to support the desired behavior in such scenarios is to reason with all observations of domain objects and events, e.g., observations of all objects in the field of view of the robot's (or the domain's) sensors, during plan execution. Such an approach would be computationally infeasible in complex domains in which there may be many new observations and events at each time step. Also, only a small number of these observations and events may be relevant to the task at hand. We thus pursue a different approach in our architecture; our adapted theory of intention builds on the principles of non-procrastination and persistence, and extends the ideas from TI. Specifically, our architecture enables the robot to automatically compute actions that are intended for the given goal and current beliefs. As the robot attempts to implement each such action, the robot automatically identifies and considers those observations that are "relevant" to this action or the goal. The robot adds these observations to the recorded history, and uses them to reason about mental states and actions, to determine if and when it should replan rather than follow the existing plan. We will henceforth use ATI to refer to this adapted theory of intention; it expands both the system description D_c and history H_c in the original program Π(D_c, H_c) to reason about intentional actions and beliefs. Below, we describe the steps of this expansion along with some examples, and provide a link to an illustrative program that is obtained by applying these steps in the RA domain.
First, the signature Σ_c is expanded to represent an activity as a triplet comprising a goal, a plan to achieve the goal, and a specific name for the activity. We do so by introducing (in Σ_c) relations such as:
    activity(name), activity_goal(name, goal),
    activity_length(name, length), activity_component(name, number, action)    (5)
which represent each named activity, the goal and length of each activity, and the actions that are the components of the activity. Note that these relations are not ground initially because the specific activities and goals are constructed or defined as needed. However, once they are ground, the corresponding terms behave as statics.
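The activity relations above map naturally onto a small immutable record. The following Python sketch is one possible representation, not the one used in the architecture; the class name Activity, the method relations, and the choice to number components from 1 are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: once constructed, an activity behaves like a static
class Activity:
    name: int
    goal: Tuple[str, ...]                    # fluent literals that must hold
    components: Tuple[Tuple[str, ...], ...]  # ordered actions of the plan

    @property
    def length(self) -> int:
        return len(self.components)

    def relations(self) -> List[tuple]:
        """Ground instances of the four activity relations for this activity."""
        facts: List[tuple] = [
            ("activity", self.name),
            ("activity_goal", self.name, self.goal),
            ("activity_length", self.name, self.length),
        ]
        facts += [("activity_component", self.name, number, action)
                  for number, action in enumerate(self.components, start=1)]
        return facts
```

An activity with two actions therefore produces five ground facts: one each for the name, goal, and length, and one per component.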
Next, the existing fluents of c are considered to be physical fluents and the set of fluents is expanded to include mental fluents such as:

select(goal), abandon(goal)
where the first two mental actions are used by the controller to start or stop a particular activity, and the other two actions represent exogenous actions executed (e.g., by a human or an external system) to select or abandon a goal. In addition to Σ_c, the domain's history H_c is expanded to include relations which denote that a particular action was attempted at a particular time step, and that a particular action did not happen (i.e., was not executed successfully) at a particular time step. Note that it is straightforward for the robot to figure out when an action was attempted, but figuring out when an action was actually executed (or not executed) requires external (e.g., sensor) input and reasoning, e.g., diagnostic reasoning with observations to determine whether an action had the intended outcome(s). In our control loop and experimental trials, we use ASP to reason with the observations to determine whether an action was actually executed, and then use this information for subsequent reasoning.
The expansion of the signature and the history makes it necessary to expand the description of the domain dynamics. To do so, we introduce new axioms in D_c. This includes axioms that represent the effects of the physical and mental actions on the physical and mental fluents, e.g., starting (stopping) an activity makes it active (inactive), and executing an action in an activity keeps the current activity active. The new axioms include state constraints, e.g., to describe conditions under which any particular activity or goal is active, and executability conditions, e.g., it is not possible for the robot to simultaneously execute two mental actions or to start an activity when another activity is active and still valid. In addition, axioms are introduced to generate intentional actions, build a consistent model of the domain history, and perform diagnostics. For example, a set of axioms (Statement 9) is related to finding the next intended action given an activity and a goal. Statement 9(a) implies that if the next agent action AA in the current activity is not impossible, it is expected to occur in a subsequent time step. Statement 9(b) implies that if the goal holds in a future step, given that actions of the current activity AN occur as planned, the activity has a projected success. Statement 9(c) implies that if we do not have a projected success, it must have been because one of our actions cannot occur, or our current activity AN does not reach the goal. Finally, Statement 9(d) implies that if our current activity has a projected success, the activity's next action will be the next intended action. As another example, a set of axioms of D_c (Statement 10) defines an activity as being futile. Statement 10(a) introduces an inconsistency if there is an active activity AN that does not have projected success and has not been defined as being futile.
Then, Statement 10(b) is a CR rule that provides a path out of such an inconsistency by defining the activity as being futile in these exceptional circumstances. Finally, Statement 10(c) implies that if an activity has been defined as being futile, the next intentional action of the robot will be to stop this activity and plan a different activity. As described in Section 3.1.3, we use a script to automatically translate the revised system description D c and history H c to a CR-Prolog program (D c , H c ) that is solved for planning or diagnostics. However, recall that CR-rules are used to build a consistent model of history, which involves reasoning about potential exceptions to defaults and the execution of exogenous actions, and to generate minimal plans of intentional actions. This reasoning is challenging because we need to encode some preferences between the different CR rules; unexpected observations could potentially be explained using exceptions to defaults or using exogenous actions, e.g., a book may be observed in the kitchen because it is an exception to the corresponding default, or because it was moved there by someone. Our preference is based on the following key postulate: Unexpected observations are more likely to be due to exceptions to defaults than due to exogenous actions. This is a reasonable claim for many robotics domains, and is translated to the following preference: First try to explain unexpected observations by considering exceptions to defaults; if that does not suffice, consider exogenous actions to generate explanations.
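The preference ordering over explanations can be illustrated with a minimal Python sketch. This is not the CR-Prolog encoding used in the architecture; the function `explain`, the dictionary-based candidate pools, and the fluent strings are hypothetical names introduced only for illustration, and the sketch makes the simplifying assumption that an explanation is drawn from a single pool at a time.

```python
from itertools import combinations


def explain(unexpected, default_exceptions, exogenous_actions):
    """Return (kind, candidates) for the first candidate set covering all
    unexpected observations, or None if no such set exists.

    Each pool maps a candidate name to the set of observations it would
    account for. Exceptions to defaults are tried before exogenous actions,
    and smaller candidate sets are tried before larger ones.
    """
    pools = (
        ("default_exception", default_exceptions),
        ("exogenous_action", exogenous_actions),
    )
    for kind, pool in pools:
        for size in range(1, len(pool) + 1):
            for subset in combinations(sorted(pool), size):
                covered = set().union(*(pool[name] for name in subset))
                if unexpected <= covered:
                    return kind, subset
    return None


# Hypothetical candidates for a book observed away from its default location.
defaults = {"ab_default_loc(book1)": {"loc(book1, kitchen)"}}
exogenous = {
    "exo_move(book1, kitchen)": {"loc(book1, kitchen)"},
    "exo_move(book2, library)": {"loc(book2, library)"},
}
```

In this toy setting, an observation of `book1` in the kitchen is attributed to an exception to the default even though an exogenous action could also account for it, whereas an observation that no default exception covers falls through to the exogenous actions.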
Even with this preference, the agent will have to use CR rules for both diagnostics and planning. Exploring all possible combinations of such rules can become computationally expensive in complex domains. To ensure efficient and correct reasoning while still encoding the desired preference, we modify the axioms (by adding suitable flags) such that coarse-resolution reasoning with AT I is performed in two phases. The robot first computes a consistent model of history without considering axioms for plan generation, and then uses this model to guide the computation of plan(s) without considering the axioms for diagnostics. A CR-Prolog program illustrating this process for the RA domain, written in SPARC with explanatory comments, is available in ToI planning.sp in the folder simulation/ASP f iles/ in our open-source software repository [25].
The following are the key differences that distinguish AT I from the prior work on T I and the prior work on coarse-resolution reasoning in the refinement-based architecture [26]:
1. T I becomes computationally expensive, especially as the size of the plan or domain history increases. Reasoning with T I performs diagnostics and planning jointly, which allows it to consider different explanations during planning but makes computation unfeasible in all but the very simple domains. On the other hand, reasoning with AT I, as stated above, first builds a consistent model of history by considering different explanations, and uses the chosen model to guide planning, significantly improving computational efficiency and supporting scalability in complex domains.
2. T I assumes complete knowledge of the state of other agents (e.g., humans or other robots) that perform exogenous actions. In most robotics domains, this assumption is unrealistic; these domains typically only afford partial observability. AT I instead makes the more realistic assumption that the robot can only make unreliable observations of its domain through its sensors and infer exogenous actions by reasoning about and trying to explain these observations.
3. AT I does not include the notion of sub-goals and sub-activities (and associated relations) from T I, as they are not necessary. Also, these sub-activities and sub-goals need to be encoded in advance to use T I, which is difficult to do in practical (robotics) domains. Furthermore, even if this knowledge is encoded, it will make reasoning (e.g., for planning or diagnostics) significantly more computationally expensive if the robot has to repeatedly examine if one of the many stored activities provides a minimal and correct path to the desired goal.
4. Coarse-resolution reasoning in the prior work on the refinement-based architecture did not (a) reason about intentional actions; or (b) reason about exogenous actions in addition to initial state defaults. These limitations are relaxed in the architecture described in this paper. A consistent model of history is constructed with defaults and exogenous actions at the coarse resolution, and reasoning with intentional actions supports reasoning in the presence of unexpected successes and failures.
Any architecture with AT I, the original T I, or a different reasoning component based on non-monotonic logics or classical first-order logic, will have two key limitations that have not been discussed so far. First, reasoning does not scale well to the finer resolution at which actions will often have to be executed to perform various tasks in robotics domains. For instance, the coarse-resolution representation discussed so far is not sufficient if the robot has to grasp and pick up a particular cup from a particular table, or deliver the cup to a particular person. Also, using logics to reason with a sufficiently fine-grained domain representation (e.g., to perform the grasping task) will be computationally expensive. Second, we have not yet modeled the actual sensor observations of the robot or the uncertainty in sensing and actuation. This uncertainty is the primary source of error on robots, and many existing algorithms use probabilities to model this uncertainty quantitatively. Section 2 discusses additional limitations of approaches based on logical and probabilistic reasoning for robotics domains. Our architecture addresses these limitations by combining AT I with ideas that build on prior work on a refinement-based architecture [26], as described below.

Refinement, Zooming and Execution
Consider a coarse-resolution system description D c of transition diagram τ c that includes AT I. For any given goal, reasoning with (D c , H c ) will provide an activity, i.e., a sequence of abstract intentional actions. In our architecture, the execution of the coarse-resolution transition corresponding to each such abstract action is based on a fine-resolution system description D f of transition diagram τ f that is a refinement of, and is tightly coupled to, D c . We can imagine refinement as taking a closer look at the domain through a magnifying lens, potentially leading to the discovery of concrete structures that were previously abstracted away by the designer, e.g., for efficient reasoning with rich commonsense knowledge. Our architecture builds on the general design methodology described in prior work [26] to construct D f using D c and some domain-specific information provided by the designer. This approach includes a weak refinement that temporarily limits the robot's ability to observe the value of fluents (through sensors), and a theory of observation that leads to the definition of strong refinement by relaxing this limitation. The coarse-resolution transition is then implemented by automatically zooming to and reasoning with the part of D f relevant to this transition and the coarse-resolution goal. We describe the steps of this process and highlight key differences between our current approach and prior work [26].
where the superscript "*" represents fine-resolution counterparts of the sorts in D c that are magnified by refinement. Also, {c 1 , . . . , c m } are the grid cells that are the components of the original set of places, and any cup has a base and handle as components (i.e., parts); a book, on the other hand, is not magnified and has no components. The sort hierarchy is also suitably modified, e.g., cup and cup * are siblings that are children of sort object. Also, for each domain attribute of c magnified by the increase in resolution, we introduce appropriate fine-resolution counterparts in f . For instance, in the RA domain, f includes domain attributes such as:

loc(thing, place), next_to(place, place)
loc*(thing*, place*), next_to*(place*, place*)    (12)

where relations with and without the "*" superscript represent the fine-resolution counterparts and their coarse-resolution versions respectively. The specific relations listed above describe the location of each thing at two different resolutions, and describe two places or cells that are next to each other. The signature f will also include actions that are copies of those in c and those with magnified sorts. For instance, f for the RA domain includes: where C, C 1 and C 2 are elements of sort place * (i.e., grid cells in places), and Opart is an element of sort cup * , i.e., cup 1 base or cup 1 handle. We also include bridge axioms that relate coarse-resolution domain attributes to their fine-resolution counterparts. For instance: where Statement 15(a) implies that any object that is in a particular cell within a particular room is also within that room, and Statement 15(b) implies that if the robot has some part of an object in its grasp then the entire object is also in its grasp. Note that the refinement process does not inherit any of the relations or axioms that were introduced in D c to reason about intentional actions.
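The two bridge axioms described above (Statements 15(a) and 15(b)) can be illustrated with a short Python sketch. It is only an approximation of the logical encoding: the component mappings `cell_to_place` and `part_to_object`, and names such as `cup1_handle`, are hypothetical stand-ins for the component relations defined by the designer.

```python
def coarse_locations(fine_loc, cell_to_place):
    """Bridge axiom in the spirit of Statement 15(a): a thing located in a
    cell is also located in the place that the cell is a component of."""
    return {thing: cell_to_place[cell] for thing, cell in fine_loc.items()}


def coarse_in_hand(fine_in_hand, part_to_object):
    """Bridge axiom in the spirit of Statement 15(b): grasping any part of
    an object means the whole object is in the robot's grasp. Things that
    have no parts (e.g., a book) map to themselves."""
    return {(rob, part_to_object.get(part, part)) for rob, part in fine_in_hand}


# Hypothetical component relations for a small instance of the domain.
cell_to_place = {"c1": "kitchen", "c2": "kitchen", "c3": "library"}
part_to_object = {"cup1_base": "cup1", "cup1_handle": "cup1"}

fine_loc = {"rob1": "c1", "book1": "c3"}
fine_in_hand = {("rob1", "cup1_handle")}

coarse_loc = coarse_locations(fine_loc, cell_to_place)
coarse_grasp = coarse_in_hand(fine_in_hand, part_to_object)
```

Here the coarse-resolution facts are derived from the fine-resolution ones, which mirrors the direction in which the bridge axioms are used.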
Next, to support the observation of the values of fluents, the signature f is expanded to include knowledge-producing action test(robot, fluent) that checks the value of a fluent in a given state, and only changes the value of appropriate (fine-resolution) knowledge fluents. We also introduce knowledge fluents to describe observations of the environment, e.g., basic fluents to describe the direct (sensor-based) observation of the values of the fine-resolution fluents, and defined domain-dependent fluents that determine when the value of a particular fluent can be tested. Note that the value of any concrete fluent or static in f is directly observable, e.g., the grid cell location of the robot, whereas any abstract fluent or static in f is only indirectly observable, e.g., the place location of an object cannot be observed directly. The axioms of D f are then expanded to include (a) causal laws describing the effect of the test action on the corresponding fine-resolution basic knowledge fluents; (b) executability conditions for these test actions; (c) axioms that describe the robot's ability to sense the values of directly and indirectly observable fluents; and (d) auxiliary axioms for indirect observation of fluents. For example: where can test (rob 1 , F ) is a domain-dependent defined fluent that encodes the information about when the robot can test the value of a particular fluent, and observed(rob 1 , F ) is a knowledge fluent that encodes that the robot has observed a particular value for a particular fluent directly, e.g., Statement 16(a), or indirectly, e.g., Statement 16(d). Prior work has shown that if certain conditions are met by the definition of D f and D c , then for each transition in τ c between coarse-resolution states σ 1 and σ 2 , there exists a path in τ f between some refinement of σ 1 and some refinement of σ 2 -see [26] for related definitions and proofs.
Although the D c described in this paper also includes AT I, recall that the design of our architecture includes the key decision of confining the representation and reasoning methods associated with this theory to the coarse resolution. In other words, although the transition diagrams for D c and D f , i.e., τ c and τ f , are tightly-coupled, the components of the signature and the axioms added to D c for AT I are not refined or included in D f . Our design choice thus enables us to include the additional theory while ensuring that the result from [26] about the correspondence of paths in τ c and τ f holds for the coarse and fine resolution descriptions in this paper. While the tight coupling established by refinement between the coarse resolution and fine resolution descriptions is appealing, reasoning at fine resolution using D f becomes computationally unfeasible for complex domains. Also, the refined description does not (so far) consider probabilistic models of the uncertainty in sensing and actuation. We address the computational complexity problem through a key expansion to the principle of zooming introduced in [26]. Specifically, for each abstract transition T to be implemented (i.e., executed) at fine resolution, the previous definition of zooming determined D f (T ), the part of the system description D f relevant to transition T ; it did so by determining the object constants of f relevant to T and restricting D f to these object constants. Here, we extend this definition of zooming to identify D f (T , G), the part of system description D f relevant to the transition or the overall goal. To identify this part, we first make some key changes to the definition of relevance in [26] as follows. 4. If the body of an executability condition of a H contains a term f (x 1 , . . . , x n , y) that is in σ 1 , the constants x 1 , . . . , x n , y are in relCon c (T , G); 5. If f (x 1 , . . . , x n , y) belongs to G, then x 1 , . . . 
, x n , y are in relCon c (T , G).

Constants from relCon c (T , G) are said to be relevant to T or G.
Note that unlike prior work, this definition of relevance considers the coarse-resolution goal when identifying the object constants relevant to a particular coarse-resolution transition. Consider a scenario in our RA domain, in which the goal is to take the book tb 1 , which is known to be in office 1 , to the library, with the robot rob 1 being in the kitchen. For the first transition T = ⟨σ 1 , move(rob 1 , office 1 ), σ 2 ⟩ in the activity for this goal, Once the relevant coarse-resolution system description has been identified, the zoomed system description can be constructed as follows.  (rob 1 , c i ), which moves the robot to a particular cell, test(rob 1 , loc * (rob 1 , c i )), which checks whether rob 1 is in a particular cell location, and observed(rob 1 , loc(tb 1 , c j )), which represents the observation of tb 1 in a particular cell. Also, restricting axioms of D f to the signature f (T , G) removes causal laws for pickup and putdown, and irrelevant state constraints and executability conditions; the variables in the remaining axioms are restricted to object constants in f (T , G). It can be shown that for any given transition ⟨σ 1 (T , G), a H , σ 2 (T , G)⟩ in the coarse-resolution transition diagram of D c (T , G), there exists a path between a refinement of σ 1 (T , G) and a refinement of σ 2 (T , G). This result can be established by following steps similar to those in the proof provided in prior work [26]. The key differences are the revised definitions of relevance and zooming, as provided above, which will require suitable revisions in the proof.
Once the relevant fine-resolution description has been identified, prior work achieved fine-resolution implementations of any desired coarse resolution transition T by (a) mapping D f (T ) and estimated probabilities of state transitions to a partially observable Markov decision process (POMDP); and (b) using an approximate solver to solve each such POMDP and obtain a policy that maps belief states to actions. Although the POMDP that is constructed and solved only focuses on the relevant part of the fine-resolution description, this approach can become computationally expensive in complex domains. Instead, to implement transition T in our architecture, ASP-based reasoning with (D f (T , G), H f ) is used to compute a sequence of concrete (i.e., fine-resolution) actions, with the goal being a fine-resolution counterpart of the resultant state of the coarse-resolution transition T . In what follows, we use "refinement and zooming" to refer to the use of both refinement and zooming as described above. The execution of each fine-resolution concrete action is then based on existing implementations of algorithms for common robotics tasks such as navigation, mapping, object recognition, localization, and grasping-see Section 4 for more details. These algorithms provide probabilistic measures of certainty about their decisions, e.g., about the presence or absence of target objects in an image of the scene. When the robot makes decisions at the fine resolution, the high-probability outcomes of each concrete action's execution get elevated to statements associated with complete certainty in H f and used for subsequent reasoning; this approach may result in incorrect commitments but the non-monotonic logical reasoning capability helps the robot identify and recover from such errors. The coarse-resolution outcomes of such fine-resolution reasoning are added to the coarse-resolution H c for subsequent reasoning using AT I. 
The CR-Prolog programs for fine-resolution reasoning in the RA domain (i.e., with the refined and zoomed system description), and the program for the overall control loop of the architecture, are available in our online repository [25].
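The elevation of high-probability outcomes to statements in the fine-resolution history can be sketched as a simple thresholding step. This Python sketch is only illustrative: the function `elevate_outcomes`, the tuple layout, and the threshold value of 0.9 are assumptions made here, not values taken from the architecture.

```python
def elevate_outcomes(outcomes, step, threshold=0.9):
    """Turn probabilistic outcomes into history statements.

    `outcomes` is a list of (fluent, value, probability) triples reported by
    a perception or actuation algorithm. Outcomes whose probability reaches
    the (assumed) threshold are recorded as if they were certain; the rest
    are dropped, so a later test action would be needed to resolve them.
    """
    history = []
    for fluent, value, probability in outcomes:
        if probability >= threshold:
            history.append(("obs", fluent, value, step))
    return history


# Hypothetical outcomes after a test action at time step 3.
outcomes = [
    ("loc*(rob1, c2)", True, 0.95),
    ("loc*(book1, c2)", True, 0.62),
]
new_statements = elevate_outcomes(outcomes, step=3)
```

Because the recorded statements are treated as certain, an occasional wrong commitment is possible; in the architecture this is handled by the non-monotonic reasoning over the history rather than by the thresholding step itself.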
The following are the key differences that distinguish fine-resolution reasoning in our architecture from that in prior work on the refinement-based architecture [26]:
1. Prior work did not maintain a history and perform logical reasoning at the fine-resolution; as stated earlier, a POMDP-based approach was used, which becomes computationally expensive in complex domains. Also, prior work assumed limited dynamic changes in the domain during planning and execution. These limitations are relaxed in the architecture described in this paper. Fine-resolution reasoning builds a consistent model of history, and considers the relevant fine-resolution observations to compute and add appropriate statements to the coarse-resolution history. Furthermore, the tight coupling between the system descriptions and the separation of concerns, with AT I only included in the coarse resolution, helps establish the desirable properties of prior work, e.g., about the existence of paths in the fine-resolution transition diagram for any given transition in the coarse-resolution diagram.
2. Zooming is a key requirement for the desired reasoning capabilities and for computational efficiency. Prior work on zooming automatically extracted the part of the fine-resolution system description relevant to the implementation of any given transition at the coarse resolution. The architecture described in this paper, on the other hand, automatically identifies and reasons about the part of the fine-resolution system description relevant to the coarse-resolution transition and the goal under consideration. As a result, reasoning and plan execution are reliable and efficient in the presence of dynamic (and unexpected) changes in the domain.
3. Prior work used a POMDP to reason probabilistically over the zoomed fine-resolution system description D f (T ) for any coarse-resolution transition T . This is a computationally expensive process, especially when domain changes prevent reuse of POMDP policies [26]. In this paper, CR-Prolog is used to compute a plan of concrete actions from D f (T , G). Each concrete action is then executed using algorithms that incorporate probabilistic models of uncertainty, significantly reducing the computational cost of fine-resolution reasoning and execution. In addition, the algorithms for the individual concrete actions can be implemented, revised, and replaced without requiring any further changes in the other components of the architecture.
As we show below, these differences help improve the reliability and computational efficiency of reasoning.

Experimental setup and results
This section reports the results of experimentally evaluating the capabilities of our architecture in different scenarios. We evaluated the following hypotheses:
-H1: using AT I improves the computational efficiency in comparison with not using it, especially in scenarios with unexpected success.
-H2: using AT I improves the accuracy in comparison with not using it, especially in scenarios with unexpected goal-relevant observations.
-H3: the architecture that combines AT I with refinement and zooming supports reliable and efficient operation in complex (robot) domains.
We report results of evaluating these hypotheses experimentally: (a) in a simulated domain based on Example 1; (b) on a Baxter manipulating objects on a tabletop; and (c) on a Turtlebot finding and moving objects to particular places in an indoor domain. We also provide some execution traces as illustrative examples of the working of the architecture. To evaluate the ability to scale to more complex domains, we defined variants of the RA domain at eight different complexity levels. The key components of each complexity level are as follows:
-L1: one object with one fine-resolution part, i.e., no new parts considered after refinement; two rooms with two cells in each room.
-L2: two objects, each with two refined parts; three rooms with two cells in each room.
-L3: three objects, each with three fine-resolution parts (e.g., base and handle of cup); four rooms with four cells in each room.
-L4: four objects, each with four refined parts; five rooms with five cells in each room.
-L5: eight objects, each with two refined parts; five rooms with nine cells in each room.
-L6: eight objects, each with two fine-resolution parts, and four objects, each with one fine-resolution part; five rooms with twelve cells in each room.
-L7: eight objects, each with two fine-resolution parts, and four objects, each with one fine-resolution part; five rooms with sixteen cells in each room.
-L8: sixteen objects, each with two fine-resolution parts, and eight objects, each with one fine-resolution part; five rooms with sixteen cells in each room.
where the number of objects, number of object parts, number of rooms, and the number of cells in each room, typically increase between consecutive complexity levels. There are some exceptions, e.g., between L5 − L6 and L6 − L7, introduced to isolate and study the effects of a change in the value of one of these parameters.
In each experimental trial, the robot's goal was to find and move one or more objects to particular locations. As a baseline for comparison for hypotheses H 1 and H 2, we used an ASP-based reasoner that does not include AT I-we refer to this as the "traditional planning" (T P) approach. The term "traditional" implies that the planner only monitors the effects of the action being executed; it does not identify and monitor observations related to the current transition and the goal. We do not use T I as the baseline for comparison because it includes components that make it much more computationally expensive than AT I. Also, T I does not support reasoning with incomplete knowledge, non-determinism, and partial observability, capabilities that are often needed in robotics domains-see Section 3.2 for a related discussion. In the T P approach, the robot uses ASP to reason with incomplete domain knowledge, and only monitors the outcome(s) of the action currently being executed. Recall that AT I is introduced in the coarse resolution; to thoroughly examine the effect of this theory, we first compare AT I with T P in the coarse resolution, i.e., without any refinement, zooming, or fine-resolution reasoning. We then separately examine the effect of refinement, zooming, and probabilistic models of the uncertainty in sensing and actuation, in the context of evaluating hypothesis H 3. We do so by combining refinement and zooming with AT I; the baseline for comparison was a system that did not use zooming as part of fine-resolution reasoning-we refer to this as the "non-zooming" approach that still includes AT I (at the coarse resolution) and reasoning with the refined description. We also combine AT I with refinement and zooming to run experiments on robots. Although we do not do so in this paper, our architecture's components for fine-resolution reasoning can also be combined with T P (if needed).
As stated in Section 3.3, we use existing implementations of suitable algorithms for executing the concrete actions, e.g., for navigation, object recognition, obstacle avoidance, and manipulation. These algorithms internally model and estimate the uncertainty in sensing and actuation probabilistically. Some of these algorithms operate continuously (e.g., for obstacle avoidance), while others (e.g., object recognition) are selected and used as needed. When we run experiments in simulation (see Section 4.2 below), we use statistics obtained from executing the concrete actions on robots to simulate the probabilistic models of uncertainty, e.g., the robot moves to the desired grid cell in 85% of the trials and recognizes an object correctly in 90% of the trials. When we run experiments on robots (see Section 4.3 below), we use existing implementations of algorithms developed by us and other researchers based on the Robot Operating System (ROS). For instance, whenever we use our architecture in a domain where the robot can move, we use the particle filter-based algorithm in ROS for simultaneous localization and mapping [8]. This algorithm enables the robot to periodically, simultaneously, and probabilistically track multiple hypotheses, each of which represents a pose sequence and a map of the domain. For visual object recognition, we use an algorithm developed by others in our research group. This algorithm is used when needed by executing a suitable knowledge-producing (e.g., test) action, and is based on learned models that characterize each object using color, shape, and local gradient features [18]. We also use an existing implementation in ROS for local obstacle avoidance. These algorithms associate probabilities with outcomes, e.g., a probabilistic measure of certainty is computed and provided with the robot's estimate of its pose, or its estimate of the class label assigned to domain objects observed in camera images.
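The use of execution statistics to simulate uncertain outcomes can be sketched in a few lines of Python. The success rates are the two examples quoted above; the function name `simulate_outcome`, the action labels, and the use of a seeded generator are assumptions made for this illustration, not details of the simulator used in the experiments.

```python
import random

# Success rates taken from the examples in the text; the keys are hypothetical labels.
SUCCESS_RATE = {"move": 0.85, "recognize": 0.90}


def simulate_outcome(action, rng):
    """Return True if the simulated concrete action has its intended outcome."""
    return rng.random() < SUCCESS_RATE[action]


def empirical_rate(action, trials, seed=0):
    """Fraction of simulated trials in which the action succeeds."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    successes = sum(simulate_outcome(action, rng) for _ in range(trials))
    return successes / trials


move_rate = empirical_rate("move", trials=10000)
recognize_rate = empirical_rate("recognize", trials=10000)
```

Over many trials the empirical rates approach the configured statistics, which is the property a simulator of this kind relies on.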
We used one or more of the following performance measures to evaluate the hypotheses: (i) total (planning and execution) time; (ii) number of plans computed; (iii) planning time; (iv) execution time; (v) number of actions executed; and (vi) accuracy. Note that a plan is considered to be correct if it is minimal and results (on execution) in the achievement of the goal. We begin with execution traces demonstrating the working of the architecture.

Execution traces
The following execution traces illustrate the differences in the decisions made by a robot using AT I in comparison with a robot using T P, focusing primarily on coarse-resolution reasoning. These traces correspond to scenarios drawn from the RA domain; we focus on scenarios in which the robot has to respond to unexpected observed effects (successes and failures) caused by exogenous actions.

Execution Example 1 [Example of Scenario-2]
Assume that robot rob 1 is in the kitchen initially, holding book 1 in its hand, and believes that book 2 is in off ice 2 and the library is unlocked.
-The goal is to have book 1
-Assume that as the robot is putting book 1 down in the library, book 2 has been moved (e.g., by a human or other external agent) to the library.
-With AT I, the robot observes book 2 in the library, reasons and explains the observation as the result of an exogenous action, realizes the goal has been achieved and stops further planning and execution.
-With T P, the robot does not observe or does not use the information encoded in the observation of book 2 . It will thus waste time executing subsequent steps of the plan until it is unable to find or pick up book 2 in the library. It will then replan (potentially including prior observation of book 2 ) and eventually achieve the desired goal. It may also compute and pursue plans assuming book 2 is in different places, and take more time to achieve the goal.

Execution Example 2 [Example of Scenario-5]
Assume that robot rob 1 is in the kitchen initially, holding book 1 in its hand, and believes that book 2 is in the kitchen and the library is unlocked.
-The goal is to have book 1 and book 2 in the library. The computed plan is the same for AT I and T P, and consists of the actions:
-Assume the robot is in the act of putting book 2 in the library, after having already put down book 1 in the library earlier. However, book 1 is unexpectedly moved from the library (e.g., to the kitchen, unknown to the robot) while the robot is moving book 2 .
-With AT I, the robot observes book 1 is not in the library, realizes the goal has not been achieved although the computed plan has been completed, computes a new plan, and executes this plan until it finds book 1 and moves it to the library.
-With T P, the robot puts book 2 in the library and stops execution because it believes it has achieved the desired goal. In other words, it does not realize that the goal has not been achieved.
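The behavior of the robot using AT I in the two traces can be approximated by a short Python sketch. It is a strong simplification of the axioms for intended actions: it checks the goal directly against the current beliefs instead of projecting the remaining actions, and the names `goal_holds`, `next_step`, and the fluent and action strings are hypothetical.

```python
def goal_holds(goal, beliefs):
    """True if every goal fluent has the desired value in the current beliefs."""
    return all(beliefs.get(fluent) == value for fluent, value in goal.items())


def next_step(remaining_plan, goal, beliefs):
    """Choose what to do next, given goal-relevant observations.

    - "stop" if the goal already holds (possibly due to unexpected success);
    - "replan" if the activity is finished but the goal does not hold
      (unexpected failure);
    - otherwise the next action of the current activity.
    """
    if goal_holds(goal, beliefs):
        return "stop"
    if not remaining_plan:
        return "replan"
    return remaining_plan[0]


goal = {"loc(book1)": "library", "loc(book2)": "library"}

# Execution Example 1: book2 is unexpectedly observed in the library.
example1 = next_step(
    ["move(rob1, office2)", "pickup(rob1, book2)"],
    goal,
    {"loc(book1)": "library", "loc(book2)": "library"},
)

# Execution Example 2: the plan is finished, but book1 is no longer in the library.
example2 = next_step([], goal, {"loc(book1)": "kitchen", "loc(book2)": "library"})
```

A planner that only monitors the effects of the action being executed would not perform either of the first two checks, which corresponds to the T P behavior in the traces.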

Experimental results in simulation
We evaluated hypotheses H 1 and H 2 extensively in a simulated world that mimics Example 1, with four places and different objects. Please note the following:
-As stated earlier, we first compared AT I with T P in the context of the coarse-resolution domain representation, i.e., these trials did not include refinement, zooming or fine-resolution reasoning. We also temporarily abstracted away uncertainty in perception and actuation.
-We conducted paired trials and compared the results obtained using T P with those obtained using AT I for the same initial conditions and for the same dynamic domain changes (when appropriate), e.g., a book is moved unknown to the robot and the robot obtains an unexpected observation.
-When we included fine-resolution reasoning in simulation, we assumed a fixed execution time for each concrete action to measure execution time, e.g., 15 units for moving from a room to the neighboring room, 5 units to pick up an object or put it down; and 5 units to open a door.
-Ground truth (e.g., minimal plan) was provided by a separate component that reasons with complete domain knowledge.
Table 1 summarizes the results of ≈ 800 paired trials in each of the five scenarios described in Section 3.2. Also, all claims made below were tested for statistical significance. The initial conditions, e.g., starting location of the robot and objects' locations, and the goal, were set randomly in each paired trial. However, before choosing a particular instance of a scenario defined by a particular initial condition, the simulator does use ground truth knowledge (not available to the robot) to verify that the chosen goal is reachable from the chosen initial conditions. Also, in suitable scenarios, a randomly-chosen, valid (unexpected) domain change is introduced in each paired trial.
Given the significant differences that may exist between two paired trials, averaging the measured time or plan length across different trials does not provide any useful information about the performance of the two approaches being compared. In each paired trial, the value of each performance measure (except accuracy) obtained with T P is thus expressed as a fraction of the value of the same performance measure obtained with AT I; each value reported in Table 1 is the average of these computed ratios. We highlight some key findings below. Scenario-1 represents a standard planning task with no unexpected domain changes. In this scenario, both T P and AT I provide the same accuracy (100%) and compute essentially the same plan, but computing an activity comprising intentional actions and repeatedly checking the validity of this activity takes longer. This explains the reported average values of 0.45 and 0.81 for planning time and total time (for T P) in Table 1 above.
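The ratio-based aggregation used for the entries of Table 1 can be sketched as follows. The numbers in the example are hypothetical and chosen only to show how the average of per-trial ratios differs from the ratio of totals; they are not experimental results.

```python
def mean_paired_ratio(tp_values, ati_values):
    """Average, over paired trials, of the T P value divided by the AT I value."""
    ratios = [tp / ati for tp, ati in zip(tp_values, ati_values)]
    return sum(ratios) / len(ratios)


# Hypothetical planning times (arbitrary units) for two paired trials.
tp_times = [2.0, 9.0]
ati_times = [4.0, 10.0]

average_of_ratios = mean_paired_ratio(tp_times, ati_times)  # (0.5 + 0.9) / 2
ratio_of_totals = sum(tp_times) / sum(ati_times)            # 11 / 14
```

With this convention, a value below 1 indicates that T P needed less of the measured quantity than AT I on average across the paired trials, and a value above 1 indicates the opposite. The ratio of totals is dominated by the longer trial, which is why the per-trial ratio is averaged instead.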
In Scenario-2 (unexpected success), both T P and AT I achieve 100% accuracy. Here, AT I stops reasoning and execution once it realizes the desired goal has been achieved unexpectedly. However, T P does not realize this because it does not consider observations not directly related to the action being executed; it keeps trying to find the objects of interest in different places. This explains why T P has a higher planning time and execution time, computes more plans, and executes more actions (i.e., plan steps) than AT I. Scenarios 3-5 correspond to different kinds of unexpected failures. In each trial for these scenarios, AT I leads to a successful achievement of the goal, whereas there are many instances in which T P is unable to recover from the unexpected observations and achieve the goal. For instance, if the goal is to move two books to the library, and one of the books is moved to an unexpected location when it is no longer part of an un-executed action in the robot's plan, the robot may not reason about this unexpected occurrence and the desired goal may not be achieved. This phenomenon is especially pronounced in Scenario-5 that represents an extreme case in which the robot using T P is never able to achieve the assigned goal because it never realizes that it has failed to achieve the goal. Notice that in the trials corresponding to all three scenarios, AT I takes more time than T P to plan and execute the plans for any given goal, but this increase in time is justified given the high accuracy and the desired behavior that the robot is able to achieve in these scenarios using AT I.
The row labeled "All" in Table 1 shows the average of the results obtained in the different scenarios. The subsequent three rows in Table 1 summarize results after removing from consideration trials in which T P fails to achieve the assigned goal. We then notice that AT I is at least as fast as T P and is often faster, i.e., it takes less time (overall) to plan and execute actions to achieve the desired goal. In summary, T P may result in faster planning in well-defined domains with little or no dynamic changes, but it results in lower accuracy and higher execution time than AT I in dynamic domains, especially in the presence of unexpected successes and failures that are common in dynamic domains. The results in Table 1 provide evidence in support of hypotheses H 1 and H 2. The subsequent analysis of the fine-resolution components of our architecture was thus performed by combining them with AT I and not with T P.
Next, to evaluate hypothesis H 3, we ran experiments in the eight complexity levels listed in Section 4, with and without including zooming. All trials included AT I for coarse-resolution reasoning with the adapted theory of intentions, and the refined domain representation for fine-resolution reasoning. Recall that the robot cannot actually execute the coarse-resolution actions. As before, the goal in each experimental trial was to find and move a target object to a target location. Similar to the experiments used to evaluate H 1 and H 2, the values of performance measures without zooming are, wherever appropriate, expressed as a fraction of the values with zooming. Tables 2 and 3 summarize the corresponding results, and we make the following observations:
-When AT I was used with zooming, all trials in all complexity levels terminated successfully, i.e., the assigned goal was always achieved (see Table 2). Without zooming, the goal was achieved in all trials in complexity levels L1 and L2, in only 65% of the trials in complexity level L3, and in none of the trials in complexity levels L4 − L8. The observed failures in complexity levels L3 − L8 were due to the existence of too many options (i.e., paths in the transition diagram) to consider during fine-resolution reasoning in the absence of zooming. In such cases, fine-resolution planning terminated unexpectedly, i.e., before the goal was achieved. Thus, Tables 2 and 3 do not consider paired trials at or above complexity level L4; at these complexity levels, we only report results of trials that included zooming in fine-resolution reasoning.
-The coarse-resolution reasoning time, i.e., the time for coarse-resolution planning and diagnostics, increases gradually (as expected) with the increase in the complexity level. In general, the time taken for coarse-resolution reasoning is much smaller than the fine-resolution reasoning time in complex domains.
-The fine-resolution reasoning time, i.e., the time for planning at the fine resolution and for inferring coarse-resolution observations based on fine-resolution outcomes, also increases with the increase in the complexity level. With zooming included in the fine-resolution reasoning, the reasoning time scales well with the increase in the complexity level. However, in the absence of zooming, the increase in reasoning time is much more pronounced, e.g., fine-resolution reasoning at complexity level L3 without zooming takes (on average) 55 times as much time as when zooming is used.
-Note that reasoning can imply multiple instances of planning and diagnostics for a particular goal. When zooming is used, the average time spent computing each refined plan scales well with the increase in the level of complexity. When zooming is not included in the fine-resolution reasoning, the average time spent computing each refined plan increases dramatically, e.g., even at complexity level L3, each refined plan without zooming takes (on average) 85 times as much time as with zooming.
-The results with complexity levels L7 and L8 present an interesting comparison, and further indicate the benefits of zooming. Complexity level L8 has the same number of rooms and cells in each room as L7, but it has twice as many objects as L7. This increase would typically have caused a significant increase in the reasoning time, especially when we consider the parts of the different objects in the fine resolution. However, zooming enables the robot to limit its attention to only the objects and object parts relevant to any given task; we only observe a small increase in the coarse-resolution reasoning time, with hardly any change in the fine-resolution reasoning time.
Overall, Tables 2 and 3 indicate that zooming supports scalable fine-resolution reasoning with the increase in complexity. When used in conjunction with the AT I at the coarse resolution, we obtain reliable and efficient performance in dynamic domains. These results thus support hypothesis H 3.
Values of reasoning times without zooming are expressed as a fraction of the values with zooming. We only compute the ratio of reasoning times in trials that resulted in successful achievement of the assigned goal; this considers all the trials for complexity levels L1 − L2, but only 65% of the trials for complexity level L3.
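The scaling effect of zooming reported above can be illustrated with a rough counting sketch. It is not the formal definition of zooming used in the architecture, which operates on the fine-resolution system description; the room names, the helper functions `refine_rooms`, `zoom`, and `count_states`, and the state-counting rule are all assumptions made for illustration. The sketch counts fine-resolution location assignments with and without restricting attention to the rooms and objects relevant to one coarse-resolution transition.

```python
def refine_rooms(rooms, cells_per_room):
    """Magnify each room into grid cells (hypothetical naming scheme)."""
    return {room: [f"{room}_c{i}" for i in range(cells_per_room)] for room in rooms}

def count_states(cells, objects):
    """Crude size estimate: the robot and every object can each be in any cell."""
    return len(cells) ** (1 + len(objects))

def zoom(cells_by_room, objects, transition_rooms, relevant_objects):
    """Keep only the cells of rooms in the coarse transition and the relevant objects."""
    cells = [c for room in transition_rooms for c in cells_by_room[room]]
    kept = [o for o in objects if o in relevant_objects]
    return cells, kept

rooms = ["office", "kitchen", "library", "study"]
cells_by_room = refine_rooms(rooms, cells_per_room=4)
all_cells = [c for cells in cells_by_room.values() for c in cells]

# Assumed coarse-resolution transition: carry book1 from the office to the kitchen.
for objects in (["book1", "book2", "cup"],
                ["book1", "book2", "cup", "pen", "mug", "box"]):
    zoomed_cells, zoomed_objects = zoom(cells_by_room, objects,
                                        ["office", "kitchen"], {"book1"})
    print(count_states(all_cells, objects), count_states(zoomed_cells, zoomed_objects))
# 65536 64
# 268435456 64
```

Doubling the number of objects multiplies the unzoomed count many times over while the zoomed count is unchanged, which mirrors the comparison between complexity levels L7 and L8 in a very simplified form.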

Experimental results on physical robots
We also ran experimental trials with the combined architecture, i.e., AT I with refinement and zooming, on two different robot platforms. These trials represented instances of the different scenarios (in Section 3.2) in domains that are variants of the RA domain in Example 1. First, consider the experiments with the Baxter manipulating objects on a tabletop as shown in Fig. 1. This domain is characterized by the following:
-The goal is to move particular objects between different "zones" (instead of places), or between particular cell locations within the zones, on a tabletop.
-After refinement, each zone is magnified to obtain grid cells. Also, each object is magnified into parts such as base and top after refinement.
-Objects are characterized by the attributes color and size.
-The robot does not have a mobile base but it uses its arm to move objects between cells or zones.
Next, consider the experiments with the Turtlebot robot operating in an indoor domain as shown in Fig. 1. This domain is characterized by the following details:
-The goal is to find and move particular objects between places in an indoor domain.
-The robot does not have a manipulator arm. It solicits help from a human to pick up the desired object when it has reached the location of the target object, and to put the object down when it has reached the location to which it has to move the object.
-Objects are characterized by the attributes color and type.
-After refinement, each place or zone is magnified to obtain grid cells. Also, each object is magnified into parts such as base and handle after refinement.
Although the two domains differ significantly, e.g., in terms of the domain attributes, actions and complexity, no change is required in the architecture or the underlying methodology.
Other than providing the domain-specific information, no human supervision is necessary; most of the other steps are automated. Similar to the experiments in simulation, we used accuracy (of task completion) and time (for planning and execution) as the performance measures, expressing the values of relevant measures (e.g., planning time) for the baseline implementation as a fraction of the values with our architecture. In ≈ 50 experimental trials in each domain, the robot using the combined architecture is able to successfully achieve the assigned goal. The performance is similar to that observed in the simulation trials. For instance, if we do not include AT I, i.e., use T P with refinement and zooming, the accuracy with which the goal is achieved reduces from 100% to ≈ 60%, and it takes ≈ 2 − 3 times as much time to achieve the goal, especially in the presence of unexpected success or failure. In other scenarios, the performance with AT I is at least as good as that with T P. Also, if we do not include zooming, the robot takes significantly longer to plan and execute concrete actions. In fact, as the domain becomes more complex, i.e., with an increase in the number of domain objects and the length of the plan required to achieve the desired goal, planning starts becoming computationally expensive and (often) computationally unfeasible without zooming. These results support the three hypotheses listed in Section 4. Videos of the trials on the Baxter and the Turtlebot corresponding to different scenarios can be viewed online [29].
For instance, in one trial involving the Turtlebot, the goal is to have both a cup and a bottle in the library, and these objects and the robot are initially in office_2. The computed plan has the robot pick up the bottle, move to the kitchen, move to the library, put the bottle down, move back to the kitchen and then to office_2, pick up the cup, move to the library through the kitchen, and put the cup down. When the Turtlebot is moving to the library holding the bottle, someone moves the cup to the library. With AT I, the robot uses the observation of the cup (as it is putting the bottle down in the library) to infer that the goal has been achieved and to terminate plan execution early. Without AT I, i.e., with T P, the robot continues with its initial plan and realizes that there is a problem (unexpected observation of the cup in the library) only when it goes back to office_2 and does not find the cup there.
Similarly, in one trial with the Baxter, the goal is to have blue blocks and green blocks in zone Y (yellow zone on the right side of the screen) and these blocks are initially in zone R (red zone on the left side of the screen). The computed plan has the Baxter move its arm to zone R, pick up a block, move to zone G (green zone in the center) and then to zone Y to put the block down, and repeat this process until it has moved all the blue blocks and green blocks. When the Baxter has moved one block and is moving back to pick up the second block from zone R, an exogenous action puts the first block in zone G. With AT I, as the Baxter is moving over zone G on the way to zone R, it observes the block (that it has previously placed in zone Y), performs diagnostics, and realizes that its current activity will not achieve the goal. It then stops executing its current activity, computes a new activity of intentional actions, and succeeds in moving both blocks to zone Y. With T P, the robot is not able to use the observation of the first block in zone G, continues with the initial plan, and never realizes that the goal has not been achieved.
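The difference in behavior in the Turtlebot trial can be reproduced with a toy execution loop. This is only a sketch under strong simplifying assumptions: the architecture in this paper reasons with ASP-based descriptions and performs diagnostics, whereas here monitoring is reduced to re-checking the goal after each executed step, and all names (`apply_action`, `run`, `EXOGENOUS`) are hypothetical.

```python
def initial_state():
    return {"robot": "office_2", "holding": None, "bottle": "office_2", "cup": "office_2"}

GOAL = {"bottle": "library", "cup": "library"}

PLAN = [
    ("pickup", "bottle"), ("move", "kitchen"), ("move", "library"), ("putdown", "bottle"),
    ("move", "kitchen"), ("move", "office_2"), ("pickup", "cup"),
    ("move", "kitchen"), ("move", "library"), ("putdown", "cup"),
]

# Exogenous change applied just before plan step 2: someone moves the cup to the library.
EXOGENOUS = {2: [("cup", "library")]}

def goal_achieved(state, goal):
    """The goal holds when every object is at its target place and nothing is held."""
    return state["holding"] is None and all(state[o] == p for o, p in goal.items())

def apply_action(action, state):
    """Update the state in place; return False if the action cannot be executed."""
    kind, arg = action
    if kind == "move":
        state["robot"] = arg
        if state["holding"]:
            state[state["holding"]] = arg  # a held object moves with the robot
    elif kind == "pickup":
        if state[arg] != state["robot"] or state["holding"]:
            return False
        state["holding"] = arg
    elif kind == "putdown":
        if state["holding"] != arg:
            return False
        state["holding"] = None
    return True

def run(plan, state, goal, exogenous, monitor):
    """Execute the plan; with monitor=True, stop as soon as the goal is observed to hold."""
    executed = 0
    for i, action in enumerate(plan):
        for obj, place in exogenous.get(i, []):
            state[obj] = place
        if not apply_action(action, state):
            return "action failed", executed
        executed += 1
        if monitor and goal_achieved(state, goal):
            return "goal achieved early", executed
    return ("goal achieved" if goal_achieved(state, goal) else "goal not achieved"), executed

print(run(PLAN, initial_state(), GOAL, EXOGENOUS, monitor=True))   # ('goal achieved early', 4)
print(run(PLAN, initial_state(), GOAL, EXOGENOUS, monitor=False))  # ('action failed', 6)
```

With monitoring, execution stops after the bottle is put down in the library; without it, the robot returns to office_2 and only then fails to pick up the cup, as in the trial described above.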

Discussion and future work
In this paper we presented a general architecture that represents and reasons with intentional actions. The architecture represents and reasons with domain knowledge and beliefs encoded as tightly-coupled transition diagrams at two different resolutions, with the fine-resolution description defined as a refinement of the coarse-resolution description. For any given goal, non-monotonic logical reasoning with the coarse-resolution domain representation containing commonsense domain knowledge is used to provide a plan of intentional abstract actions. The coarse-resolution transition corresponding to each such abstract intentional action is implemented as a sequence of concrete actions by automatically identifying and reasoning with the part of the fine-resolution representation relevant to the coarse-resolution transition and the coarse-resolution goal. The execution of each concrete action uses probabilistic models of the uncertainty in sensing and actuation, and any associated outcomes are added to the coarse-resolution history. Experimental results in simulation and on different robot platforms, as summarized above, indicate that this architecture improves the accuracy and computational efficiency of decision making in comparison with an architecture that does not reason with intentional actions. It also significantly improves the computational efficiency of decision making in comparison with an architecture that does not support zooming in the fine resolution.
This architecture opens up multiple directions for future research that build on the capabilities of the current architecture. First, although the current architecture builds on key results of the coupling between the transition diagrams, it will be interesting to formally establish the relationship between the different transition diagrams in this architecture, along the lines of the analysis provided in [26]. This will enable any designer using our architecture for a particular robotics domain to establish correctness of the algorithms and build trust in the resultant behavior of the robot. Second, the results reported in this paper are based on experimental trials in variants of a particular (RA) domain. However, the underlying capability of modeling and reasoning about intentional actions is relevant to other problems and applications characterized by dynamic changes. For instance, other work within our research group has combined the reasoning capabilities of our architecture with inductive learning of domain constraints to guide the construction of deep networks that have been used for estimating the occlusion and stability of object structures [19] and for answering explanatory questions about images [21,22]; other research groups have explored the combination of ASP-based knowledge representation with low-level perceptual processing for explaining spatial relations in videos [28]. Future research can adapt our architecture to such problems in more complex domains to demonstrate the scalability and wider applicability of our architecture. Third, the relational representation and reasoning capabilities supported by our architecture can be used to provide explanations of the decisions made, the underlying beliefs, and the experiences that informed these beliefs. 
Currently, our architecture only reasons with representations at two different resolutions, but proof-of-concept work indicates that it is possible to introduce a theory of explanations and expand the notion of refinement to interactively provide explanations at different levels of abstraction [27]. Fourth, the architecture has only considered a single robot representing and reasoning with intentional actions. There is considerable research on teams of robots working with humans, including approaches based on logic programming [11]. Future work can extend our architecture to a team of robots collaborating with humans in dynamic application domains such as disaster rescue, surveillance, and healthcare. The long-term goal is to enable a team of robots collaborating with humans in complex domains to represent and reason reliably and efficiently with different descriptions of incomplete domain knowledge and uncertainty.