Abstract
Goal recognisers attempt to infer an agent’s intentions from a sequence of observed actions. This is an important component of intelligent systems that aim to assist or thwart actors; however, there are many challenges to overcome. For example, the initial state of the environment could be partially unknown, agents can act suboptimally, and observations could be missing. Approaches that adapt classical planning techniques to goal recognition have previously been proposed, but, generally, they assume the initial world state is accurately defined. In this paper, a state is inaccurate if any fluent’s value is unknown or incorrect. Our aim is to develop a goal recognition approach that is as accurate as the current state-of-the-art algorithms and whose accuracy does not deteriorate when the initial state is inaccurately defined. To cope with this complication, we propose solving goal recognition problems by means of an Action Graph. An Action Graph models the dependencies, i.e. order constraints, between all actions rather than just actions within a plan. Leaf nodes correspond to actions and are connected to their dependencies via operator nodes. After generating an Action Graph, the graph’s nodes are labelled with their distance from each hypothesis goal. This distance is based on the number and type of nodes traversed to reach the node in question from an action node that results in the goal state being reached. For each observation, the goal probabilities are then updated based on either the distance the observed action’s node is from each goal or the change in distance. Our experimental results, for 15 different domains, demonstrate that our approach is robust to inaccuracies within the defined initial state.
1 Introduction
By observing the behaviour of an agent, artificially intelligent systems can attempt to determine the agent’s intentions. Knowledge of an agent’s intentions is essential in numerous application areas. These include computer games in which non-playable characters must adapt to players’ actions [9]; intelligent user help for human–computer interaction scenarios [25, 26]; offering humans energy saving advice [37]; robot sports playing (e.g. table tennis [63]); inferring (and thus preventing) the intentions of computer network intruders [14, 42]; determining the location a human is navigating to (e.g. for airport security) [38]; and enabling proactive robot assistance [4, 12, 20, 34, 35, 56]. Rather than developing domain-specific intention recognition algorithms, a symbolic representation of the world and agents’ actions can be provided as input to non-domain-specific algorithms [53, 62].
Intention recognition can be split into several categories, namely, activity recognition [18, 36, 60, 65], plan recognition [14, 41, 58], and goal recognition [39, 44, 49, 55, 64]. Our work falls under the category of goal recognition (GR), in which the aim is to label a sequence of observations (e.g. actions) with which goal the observee is attempting to reach. For instance, when provided with a sequence of move actions, GR methods will attempt to select (from a predefined list) which location the agent is intending to reach. For the Kitchen domain by Ramírez and Geffner [49], the sequence of observed actions includes taking different items and using appliances (e.g. a toaster), and the returned classification indicates if the observee is likely to be making breakfast, dinner or a packed lunch. Goal and plan recognisers operate on discrete observations/actions, and thus, assume that data streams have been preprocessed, e.g. sensor data have been processed by activity recognisers.
Our GR method aims to overcome several challenges. First, the defined initial world state could be inaccurate; for instance, if an item or agent (e.g. cup or human) is occluded, its location is indeterminable, and thus, possibly defined incorrectly. Second, the observed agent could act suboptimally [49]; therefore, all plans (including suboptimal plans) are represented within the underlying structure generated by our approach. Third, actions could be missing from the sequence of observations [46], e.g. due to sensor malfunction or occlusions. Finally, an observation should be rapidly processed, so there is little delay in determining the observee’s goal. The cited GR works have investigated handling suboptimal observation sequences and handling missing observations, but they do not consider inaccurate initial states.
We define the term inaccurate initial state as an initial state containing fluents (i.e. non-static variables) whose value is unknown (i.e. undefined) and/or incorrect (i.e. set to the wrong value). Inaccurate initial states have been handled by task planners [6, 43]. Moreover, GR with probabilistic, partially observable state knowledge and stochastic action outcomes has previously been investigated [29, 50, 68]; however, these systems require the probability of each state and action outcome to be known (and thus defined within the GR problem). GR with incomplete domain models, i.e. problems containing actions with incomplete preconditions and effects, has also been considered [45], but the initial state was assumed to be accurately represented. Our system makes no assumptions about the correctness of the initial value assigned to a fluent.
In this paper, we aim to answer two research questions. (i) Can a structure similar to those created by library-based approaches be generated from a PDDL defined GR problem? (ii) When the initial state is inaccurately defined, how can a goal recognition approach be prevented from suffering a major loss of accuracy?
To answer these questions, we develop a novel technique for transforming a GR problem into an Action Graph, a structure inspired by AND/OR trees. Leaf nodes correspond to actions and are connected to their dependencies via operator nodes. Operator nodes include DEP (short for dependencies), ORDERED/AND, UNORDERED/AND and OR nodes. After transforming the action definitions and world model into an Action Graph, the Action Graph’s nodes are labelled with their distance from each hypothesis goal, i.e. each goal the observee could be intending to achieve. Both these processes are performed offline. For each observation, the online process updates the goal probabilities based on either the distance the observed action’s node is from each goal or the change in distance. Our distance measure is based on the number and type of nodes traversed to reach the node in question from an action node that results in the goal state being reached. The goal(s) with the highest probability are returned as the set of candidate, i.e. predicted, goals. An Action Graph does not contain a perfect representation of all plans; as mentioned by Pereira et al. [46], unlike task planning, this is not a requirement of GR. A conceptual overview of our system is provided in Fig. 1.
Our previous work on GR employed an acyclic (rather than cyclic) Action Graph [21], and thus, for many domains did not achieve a high accuracy. Moreover, our previous method cannot handle an inaccurate initial state. Action Graphs have also been applied to goal recognition design, in which the aim is to reduce the number of observations required to determine an agent’s goal [19]. This paper introduces a novel method for inserting the actions into an Action Graph and presents an alternative approach for updating the goals’ probability.
The remainder of this paper is structured as follows. Section 2 presents some background information. A formal definition of our Action Graph structure is provided in Sect. 3, and Sect. 4 describes the algorithm that generates the Action Graph. Section 5 introduces our distance measure and how the nodes are labelled with their distance from each goal. The different goal probability update rules, that are executed when an observation is received, are described in Sect. 6. Our experimental results, discussed in Sect. 7, show that our GR method is unaffected by inaccuracies in the initial state.
2 Background
Symbolic task planning and goal recognition problems are often defined in the Planning Domain Definition Language (PDDL) [40], a popular domain-independent language for modelling the behaviour of deterministic agents. A PDDL defined problem includes action definitions, objects, predicates and an initial state; an example of each is provided in Listing 1. Our GR approach transforms a PDDL problem into a multi-valued problem by running the converter of [22]. The multi-valued problem is then transformed into an Action Graph. The goal of this section is to provide readers unfamiliar with PDDL and goal recognition with the background information required to understand the data provided as input and the notation used. This section first describes why a multi-valued representation is used. The task planning and GR (also known as inverse planning) problem definitions are provided, in the subsections, from a multi-valued problem perspective.
To create a concise, grounded representation of a problem, a PDDL defined problem is often converted into a multi-valued representation [22, 23]. This representation uses finite variables rather than Boolean propositions. For example, rather than a move(1_1 1_2) action (which symbolises an agent moving from grid position 1_1 to 1_2) removing the proposition (at 1_1) from the current state and inserting the proposition (at 1_2), a variable, i.e. fluent, that represents the agent’s location is changed from (at 1_1) to (at 1_2). This enables a more concise representation of the problem to be produced, from which the relations between the different propositions can be extracted. Moreover, a (grounded) action is only created, from an action definition, if its static preconditions appear in the PDDL defined initial world state. For example, to create the move(1_1 1_2) action, positions 1_1 and 1_2 must be adjacent. Which locations are adjacent can be statically defined; in other words, no action modifies which locations are adjacent. Further details on the benefits of this representation are given in Helmert [23].
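The difference between the two representations can be sketched in a few lines of Python. The fluent name agent-at and the helper apply_move below are invented for illustration; they are not part of the converter of [22].

```python
# Propositional (STRIPS-style) view: one Boolean per grounded atom.
prop_state = {"(at 1_1)": True, "(at 1_2)": False}

# Multi-valued (finite-domain) view: a single fluent whose domain is the
# set of grid positions.
mv_state = {"agent-at": "1_1"}

def apply_move(state, src, dst):
    """Applying move(src dst) simply reassigns the one location fluent,
    rather than deleting one atom and adding another."""
    assert state["agent-at"] == src, "precondition: agent must be at src"
    state = dict(state)          # leave the input state unmodified
    state["agent-at"] = dst
    return state

print(apply_move(mv_state, "1_1", "1_2"))  # {'agent-at': '1_2'}
```

Because the fluent can hold only one value at a time, mutually exclusive atoms such as (at 1_1) and (at 1_2) are encoded implicitly, which is what makes the representation more concise.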
2.1 Symbolic task planning
In symbolic task planning, a problem contains a single goal state, and task planners, e.g. Fast Downward [22], find the appropriate set of actions, i.e. a task plan, that can transform the initial world state into the desired goal state. Definitions for states, actions and planning problems are provided below.
Definition 1
Planning Problem A planning problem P can be defined as \(P = (F,I,A,G)\), where F is a set of fluents, I is the initial state, G is a goal state, and A is a set of actions [13, 16].
Definition 2
Fluent A fluent (\(f\in F\)) is a state variable.
When assigned a value, a fluent can be represented by a grounded predicate. Grounded predicates are also called atoms. For instance, (at 1_2) is an atom which denotes that the observed agent is at the position 1_2.
Definition 3
State A state contains all fluents, each of which is assigned a value.
The initial state (I) contains all fluents, whereas the goal (G) could be a partial state, containing a subset of fluents. To transition between states, the values of fluents are altered by actions. An action is formally defined as follows:
Definition 4
Action An action (a) is comprised of a name, a set of objects, a set of preconditions (\(a_{pre}\)) and a set of effects (\(a_{eff}\)). Preconditions and effects are composed of a set of valued fluents. Preconditions can contain or and and statements.
Action a is applicable to state s if the state is consistent with the action’s preconditions. Applying action a to state s will result in state \(s'\), where \(a_{eff}\subseteq s'\) and \(\forall (f \in s', f\notin a_{eff}): (f \in s)\).
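The applicability and application semantics above can be sketched over multi-valued states represented as fluent-to-value dicts. The function names and the door fluent are illustrative, not part of the paper’s implementation.

```python
def applicable(state, pre):
    """Action a is applicable to state s if s is consistent with a_pre."""
    return all(state.get(f) == v for f, v in pre.items())

def apply_action(state, pre, eff):
    """Applying a to s yields s' where a_eff holds and every fluent not
    mentioned in a_eff keeps its value from s (the frame condition)."""
    if not applicable(state, pre):
        raise ValueError("preconditions not satisfied")
    s2 = dict(state)
    s2.update(eff)               # effects overwrite their fluents
    return s2

s = {"agent-at": "1_1", "door": "locked"}
print(apply_action(s, {"agent-at": "1_1"}, {"agent-at": "1_2"}))
# the agent moves; the untouched door fluent persists as 'locked'
```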
Definition 5
Planning Problem Solution A solution to a planning problem is a sequence of actions \(\pi =(a_1, a_2,\ldots , a_i)\), where each \(a_j \in A\), such that applying each action in turn starting from state I results in a state (\(s^i\)) that is consistent with the (partial) goal state G, i.e. \(s^i \supseteq G\).
Planners search for the optimal solution to a planning problem. An optimal solution is the solution with the lowest possible cost. In our work, the cost of an action is 1, and thus, the cost of a plan is equivalent to its length.
2.2 Goal recognition
Goal recognition is often viewed as the inverse of planning, as the aim is to label a sequence of observations with the goal the observed agent is attempting to reach. This section provides the formal definition of a GR problem and describes the observation sequences and output of our GR approach.
Definition 6
Goal Recognition Problem A GR problem is defined as \(T = (F,I,A, O,{\mathcal {G}})\), where \({\mathcal {G}}\) is the set of all possible (hypothesis) goals and O is a sequence of observations [49].
Definition 7
Observations O is a sequence of observed actions (observations), i.e. \(O=(a_1, a_2,\ldots , a_i)\), where each \(a_j \in A\).
A complete sequence of observations with no missing actions can be applied to an initial state I to reach a goal state \(G\in {\mathcal {G}}\). This sequence can also be incomplete, have missing observations and/or be suboptimal. An incomplete sequence of observations contains the first N actions that are required to reach a goal; in other words, the goal has not yet been reached. An action could be missing from anywhere within an (incomplete or complete) sequence of observations. Observations are suboptimal if any number of additional, unnecessary actions have been performed to reach the goal.
GR approaches attempt to select the real goal from the set of hypothesis goals \({\mathcal {G}}\). Our GR approach produces a probability distribution over the hypothesis goals, i.e. \(\sum _{i=1}^{|{\mathcal {G}}|}P({G_i}|O) = 1\). In other words, we aim to find the likelihood of a given observation sequence O under the assumption that the observee is pursuing a goal \(G_i\), i.e. \(P(O|G_i)\). The goal(s) with the highest probability are returned as the set of candidate goals \({\mathcal {C}}\). As goals can be equally probable, there can be multiple candidate goals, i.e. \(|{\mathcal {C}}| \ge 1\). Nevertheless, we assume that there is only a single real goal. Note, our evaluation metrics (see Section 2.7.1) take into account that multiple goals could be returned.
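Because several goals can share the maximum probability, the candidate set is the arg-max set rather than a single goal. A minimal sketch (the function name is invented):

```python
def candidate_goals(probs):
    """Return every goal whose probability equals the maximum, so that
    |C| >= 1 and ties are preserved rather than broken arbitrarily."""
    best = max(probs.values())
    return {g for g, p in probs.items() if p == best}

print(candidate_goals({"G1": 0.4, "G2": 0.4, "G3": 0.2}))  # {'G1', 'G2'}
```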
3 Action Graph structure and formal definitions
Action Graphs model the possible order constraints between actions by linking actions (dependants) to their dependencies. They are constructed of action nodes and operator nodes, namely, DEP (short for dependencies), ORDERED/AND, UNORDERED/AND and OR nodes. Action nodes are always leaf nodes, and their dependencies are conveyed through their connections (via operator nodes) to other actions. This section defines dependencies, provides a definition of an Action Graph, describes how the Action Graph structure links actions to their dependencies and briefly mentions related structures.
Definition 8
Action’s Dependencies The set of dependencies of action \(a\in {\mathcal {A}}\) is formally defined as: \(D(a) = \{ a' \mid (a'_{eff} \cap a_{pre}) \ne \emptyset \} \).
Definition 9
Action’s Dependant Action a is a dependant of action \(a'\) if \(a' \in D(a)\).
In other words, action \(a'\) is a dependency of action a if at least one effect of \(a'\) fulfils at least one of a’s preconditions, i.e. \(a'\in D(a)\) if \(a'_{eff} \cap a_{pre} \ne \emptyset \). In that case, action a is called the dependant of the dependency \(a'\). The order in which dependencies are likely to be observed can be conveyed by the nodes of an Action Graph.
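Definition 8 can be computed directly by intersecting valued fluents. In this sketch actions are dicts with "pre"/"eff" maps; the move actions are illustrative groundings, and note that two move actions that undo each other are each other’s dependencies, which is exactly how cyclic dependencies arise.

```python
def dependencies(a, actions):
    """D(a): every action with at least one effect (valued fluent) that
    fulfils one of a's preconditions."""
    return [d for d in actions
            if set(d["eff"].items()) & set(a["pre"].items())]

move_1_2 = {"name": "move(1_1 1_2)",
            "pre": {"agent-at": "1_1"}, "eff": {"agent-at": "1_2"}}
move_2_1 = {"name": "move(1_2 1_1)",
            "pre": {"agent-at": "1_2"}, "eff": {"agent-at": "1_1"}}

deps = dependencies(move_1_2, [move_1_2, move_2_1])
print([d["name"] for d in deps])  # ['move(1_2 1_1)'] -- a cyclic pair
```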
Definition 10
Action Graph \(AG=(N^O,N^A,E)\), where \(N^O\) is a set of operator nodes, \(N^A\) are action nodes and E are edges. Operator nodes are of type DEP, UNORDERED/AND, OR and ORDERED/AND nodes, i.e. \(N^O=(N^{OR}, N^{DEP}, N^{O\text {-}AND}, N^{U\text {-}AND})\). The root node is of type OR. All nodes (except the root) have a set of parents. All operator nodes (\(N^O\)) have a set of children, and those children can be operator nodes or action nodes. \(N^A\) are leaf nodes.
The operator node types are described in the list below and depicted in Fig. 2. The precedes operator \(\prec \) denotes that the list of actions on the left precedes (are dependencies of) the action on the right. Standard maths notation is used to denote if a set of actions is unordered or ordered, that is, curly brackets denote the actions are unordered and angle brackets show the actions are ordered [52]. Moreover, rather than writing or constraints as two statements, e.g. \(a4 \prec a1 \text { OR } a5 \prec a1\), a shortened form is given, e.g. \(or(a4,a5) \prec a1\).
- DEP nodes indicate that an action’s dependencies are performed before the action itself, e.g. \( D(a1) \prec a1\). The second (i.e. last) child of a DEP node is the action node itself; the first child could be of any type.
- UNORDERED/AND nodes denote that different dependencies set different preconditions (and there are no order constraints on the dependencies), e.g. if \(a2 \in D(a1)\), \(a3 \in D(a1)\) and \(\left( a1_{pre}\cap a2_{eff}\right) \ne \left( a1_{pre}\cap a3_{eff} \right) \), then \( (a2 \wedge a3) \prec a1\).
- OR nodes express the multiple (alternative) ways a precondition can be reached, e.g. if \(a4 \in D(a1)\), \(a5 \in D(a1)\) and \(\left( a1_{pre} \cap a4_{eff}\right) = \left( a1_{pre} \cap a5_{eff}\right) \) then \(or(a4,a5) \prec a1\).
- ORDERED/AND nodes indicate there are order constraints between an action’s dependencies. Such constraints are required when executing one dependency could unset the preconditions of another. For example, if \(a6 \in D(a1)\), \(a7 \in D(a1)\) and both \(a6_{pre}\) and \(a7_{eff}\) contain the same fluent but with different values, then a6 is performed before a7, i.e. \(\langle a6,a7\rangle \prec a1\). This is because the effects (fluents) of a7 are preconditions of a1 but to perform a6 those fluents must be assigned a different value. If these constraints are cyclic, e.g. \( \{\langle a6, a7 \rangle , \langle a7, a6 \rangle \} \prec a1\), then the constraint is ignored; in other words, the dependencies are considered to be unordered.
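The ORDERED/AND check in the last bullet can be sketched as a pairwise test between dependencies; the function names and the example fluents are invented for illustration.

```python
def must_precede(d1, d2):
    """d1 must be performed before d2 when d2's effects reassign a fluent
    that d1's preconditions require at a different value (i.e. executing
    d2 first would unset d1's preconditions ... and d1 needs to run)."""
    return any(f in d2["eff"] and d2["eff"][f] != v
               for f, v in d1["pre"].items())

def order(d1, d2):
    """Return the ordered pair, or None when the dependencies are treated
    as unordered (no constraint, or a cyclic/bidirectional constraint)."""
    a, b = must_precede(d1, d2), must_precede(d2, d1)
    if a and not b:
        return (d1, d2)
    if b and not a:
        return (d2, d1)
    return None  # unordered: either no constraint, or cyclic -> ignored

a6 = {"pre": {"agent-at": "1_1"}, "eff": {"door": "open"}}
a7 = {"pre": {}, "eff": {"agent-at": "2_1"}}
print(order(a6, a7) == (a6, a7))  # True: a6 precedes a7
```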
Dependencies are actions, and thus, they can also have dependencies. For example, a8 could depend on a9, which depends on a10, i.e. \(a10 \prec a9 \prec a8\). If an action has dependencies, its only parent is the DEP node linking it to its dependencies (and dependants). Thus, continuing with the example, the left child of a8’s parent DEP node is a9’s parent DEP node.
Cyclic dependencies can also occur, e.g. a1 could depend on a2 which depends on a1 (i.e. \(...a2 \prec a1 \prec a2...\)). This causes cycles to appear within the Action Graph. These cycles can also be caused by indirect dependencies, e.g. \(...a2 \prec a3 \prec a1 \prec a2...\) in which a1 is a dependency of a2 and a2 is an indirect dependency of a1.
Our Action Graph structure does not contain states. An Action Graph only captures information about which actions fulfil each action’s preconditions. This is similar to the structures of library-based intention recognition and planning approaches [24, 31, 54, 59]. For instance, Goal-Plan trees contain (sub)goals with plans that can contain subgoals [54, 59]. Goal-Plan trees do not contain knowledge of the environment’s current state. In particular, the Action Graph structure was inspired by the work of Holtzen et al. [24], who represented a library of plans as AND/OR trees. Differently, our approach takes PDDL rather than a library of plans as input and enables suboptimal and cyclic dependencies to be represented.
Moreover, the definition of a dependency is similar to causal links from Partial-Order Causal Link (POCL) planning [17]. Like dependencies, a causal link expresses that an action’s preconditions are contained within another action’s effects. Differently, POCL structures represent complete plans (to reach a single goal state from the initial state), edges rather than nodes are used to denote the order constraints and they can contain ungrounded actions. As GR does not require a completely valid plan, Action Graphs are simpler to construct than POCL structures.
4 Cyclic Action Graph creation
Our goal recognition method creates an Action Graph, labels the nodes with their distance from each goal, then for each observation updates the goals’ probability. This section describes how an Action Graph is generated from a GR problem. The modifications to the preprocessing step, that transforms a PDDL problem into a multi-valued problem, are described. Subsequently, the action insertion algorithm is detailed, followed by an example.
4.1 Preprocessing: multi-valued problem generation
This paper only provides the details of the transformation, from a PDDL defined problem to a multi-valued problem, that are key to understanding our approach and that differ from [22]. A single goal statement is required by the converter of Helmert [22]; therefore, prior to calling the converter, a goal statement is created by placing all hypothesis goals (\({\mathcal {G}}\)) into an or statement, i.e. \(G=or(G_1, G_2,...,G_{|{\mathcal {G}}|} \in {\mathcal {G}})\).
The converter’s parameter to keep all unreachable states is set to true and, after parsing the PDDL, all groundings of the actions’ effects are inserted into the initial state (I). This forces actions, and all fluents’ values, to be inserted into the resulting representation even if the actions’ fluent preconditions, and thus, possibly the goals, are unreachable from the defined (original) initial state. For instance, if an agent’s location is missing from I, e.g. because it is unknown, and no transition between an unknown and known location exists, then move actions would not be inserted as their preconditions can never be met. To prevent this, all (at ?location) groundings are inserted into I. Additional static atoms are not inserted into I; thus, continuing with our example, move(1_1 1_2) is only appended to the set of actions A if (adjacent 1_1 1_2) is declared in the defined initial state.
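The effect-insertion step can be sketched as a simple union over the actions’ effect atoms. This is a hedged illustration of the idea, not the converter’s actual code; static atoms never appear here because, by definition, no action’s effects modify them.

```python
def relax_initial_state(initial_atoms, actions):
    """Union the defined initial state with every grounding that appears
    in some action's effects, so that actions (and hence goals) remain
    representable even when fluents are missing or wrong in I."""
    relaxed = set(initial_atoms)
    for a in actions:
        relaxed |= set(a["eff"])   # effects are sets of grounded atoms here
    return relaxed

I = {"(adjacent 1_1 1_2)"}                       # agent location unknown
moves = [{"name": "move(1_1 1_2)", "eff": ["(at 1_2)"]},
         {"name": "move(1_2 1_1)", "eff": ["(at 1_1)"]}]
print(sorted(relax_initial_state(I, moves)))
# both (at ...) groundings are now present alongside the static atom
```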
4.2 Inserting actions into an Action Graph
An Action Graph is initialised with an OR node as the root; then, each action (\(a \in A\)) is inserted into the graph in turn by connecting it to its dependencies. Actions can be inserted in any order. Finally, the graph is adjusted so only the Goal Actions’ parent DEP nodes are connected to the root. This process is detailed below, and the pseudocode is provided in Appendix A.
If an action has no dependencies, because either there are no actions that fulfil its preconditions or it has no preconditions, it is simply appended to the root’s children. In all other cases, the root is linked to a new DEP node. The DEP node’s two children are set to an UNORDERED/AND node, followed by the action node itself. If this action node was already created, because it is a dependency of an already processed action, the action node’s prior parents are moved to be the DEP node’s parents.
The UNORDERED/AND node’s children are set to one or more of the following: the action nodes (or parents) of the dependencies, OR nodes if there are multiple ways in which a precondition can be met, and/or ORDERED/AND nodes. OR nodes are inserted by setting their children to the action nodes of the dependencies that set the same precondition(s). If a dependency has dependencies, the corresponding child becomes the dependency’s parent. This is because actions that have dependencies can only ever have a single parent, of type DEP. Note: if an operator would only have one child, the operator node is not inserted.
ORDERED/AND nodes indicate that there are order constraints on the dependencies themselves. This is detected by checking if a fluent has a value in one dependency’s preconditions that is different in another dependency’s effects (see Sect. 3); and thus, the former dependency must be performed first. If this constraint is bidirectional/cyclic, the ORDERED/AND node is not inserted; instead, the dependencies become the children of the UNORDERED/AND node. Only the preconditions/effects of direct dependencies are checked; the algorithm does not check if a dependency’s own dependency could undo/unset another dependency’s precondition. Performing this check would be computationally complex, and a perfect representation of the plans to reach the actions’ effects is not required.
The ORDERED/AND node’s children could also be of type UNORDERED/AND or OR. When multiple dependencies could unset a dependency’s preconditions, an UNORDERED/AND is inserted as the child of the ORDERED/AND node’s right branch. Moreover, the dependencies that set the same precondition(s) of the dependant are grouped together as the children of an OR node. Therefore, if one of these dependencies is affected by (or affects) another dependency, the OR node becomes the ORDERED/AND (or UNORDERED/AND) node’s corresponding child. Without this feature, the graph’s structure would become more complex, i.e. be of greater depth and/or breadth.
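The grouping of dependencies under OR nodes can be sketched by keying each dependency on the subset of the dependant’s preconditions it fulfils. The representation (dicts with "pre"/"eff" maps) and the action names are illustrative.

```python
from collections import defaultdict

def group_by_precondition(a, deps):
    """Dependencies that fulfil the same precondition(s) of `a` become the
    children of one OR node; distinct groups become separate children of
    the UNORDERED/AND (or ORDERED/AND) node above them."""
    groups = defaultdict(list)
    for d in deps:
        met = frozenset(set(d["eff"].items()) & set(a["pre"].items()))
        groups[met].append(d["name"])
    return dict(groups)

a1 = {"name": "a1", "pre": {"agent-at": "1_0", "key1": "held"}}
m1 = {"name": "m1", "eff": {"agent-at": "1_0"}}
m2 = {"name": "m2", "eff": {"agent-at": "1_0"}}
t1 = {"name": "t1", "eff": {"key1": "held"}}

print(sorted(group_by_precondition(a1, [m1, m2, t1]).values()))
# [['m1', 'm2'], ['t1']]: m1/m2 share an OR node; t1 stands alone
```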
4.3 Identifying goal actions
An action is a Goal Action if its effects fulfil a goal’s atoms, i.e. \(a_{eff} \supseteq G\), where \(G \in {\mathcal {G}}\). After all actions have been inserted, the root node’s children are modified so that only the Goal Actions are attached to the root. If multiple actions are required to fulfil a goal, e.g. \((a1_{eff} \cup a2_{eff}) \supseteq G\), then an auxiliary Goal Action (\(a^x\)) is created. Auxiliary Goal Actions are linked to the multiple actions that fulfil the goal via a DEP node, e.g. \(\{a1,a2\} \prec a^x\). They are connected to their dependencies, i.e. the goal’s dependencies, in the same way as all other actions are.
Identifying and creating Goal Actions simplifies traversing the graph to find all nodes belonging to a single goal. All children, including indirect children, of a Goal Action’s parent DEP node could appear in a plan (from any initial state) to reach the goal the Goal Action fulfils. Therefore, the graph can be traversed, in a depth-first or breadth-first manner, to find all the nodes, and thus actions, belonging to a goal.
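The Goal Action test \(a_{eff} \supseteq G\) reduces to a subset check over valued fluents. A minimal sketch (the goal and action contents are invented):

```python
def is_goal_action(a, G):
    """Goal Action: the action's effects entail every atom of the
    (possibly partial) goal state G."""
    return set(G.items()) <= set(a["eff"].items())

G = {"agent-at": "1_0"}
move = {"name": "move(2_0 1_0)", "eff": {"agent-at": "1_0"}}
stay = {"name": "move(1_0 0_0)", "eff": {"agent-at": "0_0"}}
print([a["name"] for a in (move, stay) if is_goal_action(a, G)])
# ['move(2_0 1_0)']
```

When no single action entails G, an auxiliary Goal Action whose dependencies are the jointly sufficient actions plays the same role, as described above.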
4.4 Example
An example is provided, in this section, to demonstrate how our creation algorithm works for the grid-based navigation problem depicted in Fig. 3. Figure 4 shows the Action Graph after each action, and its dependencies, have been inserted. The four insertions, detailed below, were selected to show the different structural features of an Action Graph. A figure with all actions inserted into the graph would be unreadable, and thus is not provided.
The example starts by inserting the goal action, move(2_0 1_0). The preconditions of move(2_0 1_0) are met by executing one of two possible actions, i.e. \(or(\texttt {{move(1\_0 2\_0)}}, \texttt {{move(2\_1 2\_0)}}) \prec \texttt {{move(2\_0 1\_0)}}\); therefore, it is inserted by connecting it to its dependencies via a DEP node and an OR node (see Fig. 4a). Likewise, when move(1_0 0_0), whose preconditions are reached by one of three actions (i.e. \(or(\texttt {{move(2\_0 1\_0)}}, \texttt {{move(1\_1 1\_0)}}, \texttt {{move}}\texttt {{(0\_0 1\_0)}}) \prec \texttt {{move(1\_0 0\_0)}}\)), is inserted, an OR node is created. As one of its dependencies has already been inserted, the appropriate child of the OR node is set to move(2_0 1_0)’s parent DEP node. This is shown in Fig. 4b.
Inserting move(0_0 1_0) causes the graph to become cyclic (Fig. 4c) because it depends on one of its dependants, i.e. \(\texttt {{move(1\_0 0\_0)}} \prec \texttt {{move(0\_0 1\_0)}}\). Figure 4d displays the graph after move(1_1 2_1) has been inserted. This action requires location 2_1 to be unlocked with key1, and thus, its dependencies include unlock actions. As unlock actions’ preconditions contain the location of the agent, they must be performed prior to the move actions required by the dependant. Therefore, an ORDERED/AND node is created during the insertion of move(1_1 2_1), i.e. \( \langle or(\texttt {{unlock(2\_1 2\_0 key1)}}, \texttt {{unlock(2\_1 1\_1 key1)}}), or(\texttt {{move(0\_1 1\_1)}}, \texttt {{move(2\_1 1\_1)}}, \texttt {{move(1\_0 1\_1)}})\rangle \prec \texttt {{move(1\_1 2\_1)}}\).
5 Node distance initialisation
Each node has a set of distances associated with it, which indicate how far the node is from each goal, i.e. the number of DEP and ORDERED/AND nodes that must be traversed to get from the Goal Action’s parent to the node in question. These distances are set by means of a breadth-first traversal (BFT). A BFT was implemented because it visits the nodes in the order in which they are likely to be performed. An explanation of this algorithm is provided, followed by an example. The pseudocode can be found in Appendix B.
5.1 Node value initialisation algorithm
During the BFTs, that start from each Goal Action’s parent node, the current node’s distance is set, the count (i.e. distance measure) is increased if the node is of type DEP or ORDERED/AND, and each of the node’s children is pushed onto the BFT-queue. This distance measure provides an indication of how far each node is from each goal whilst attempting to minimise favouring shorter plans (see Sect. 6 for the calculations of the goal probabilities). The same node could be visited multiple times during a BFT; however, if the current distance/count is greater than or equal to the node’s already assigned distance, it is not reprocessed. As well as allowing the shortest distance to be assigned to each node, this prevents an endless loop from occurring when two actions depend on each other (e.g. \(...a2 \prec a1 \prec a2...\)).
As an action could appear in a plan multiple times, some nodes require multiple distances for the same goal; this is the case for the descendants of ORDERED/AND nodes’ right branch. Therefore, a node contains a map for each goal, from the last traversed ORDERED/AND to the node’s distance from the goal via the ORDERED/AND node. When the right branch of the ORDERED/AND node has been fully observed, the distance of the node, returned when calling a get distance method, will be the distance associated with that ORDERED/AND node. As the initial state is unknown and plans are not perfectly represented, the distances assigned to the left branch of ORDERED/AND nodes are not based on the depth of the right branch.
Labelling nodes with multiple distances per goal increases the worst case time complexity from \(O(n^2)\) to \(O(n^3)\), with respect to the number of actions. This is because each action in the graph could be a dependency of all other actions; thus, for all actions all other actions could be visited. When labelling the nodes with multiple values this process could be repeated n times. Therefore, to help minimise the number of nodes the BFTs traverse, when an UNORDERED/AND node is reached, its children’s (including indirect children’s) distance is not associated with the prior ORDERED/AND node(s). Developing this component greatly reduced (\(\approx \)halved) the run time of our experiments (Sect. 7.3.2) and had negligible impact on the accuracy of our approach.
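The core of the BFT labelling can be sketched as follows. This is a simplified version that keeps a single distance per goal per node (the paper additionally maps distances per traversed ORDERED/AND node, which is not reproduced here); nodes are plain dicts, and the graph shape is illustrative.

```python
from collections import deque

def label_distances(start, goal):
    """BFT from a Goal Action's parent DEP node: set each node's distance,
    incrementing the count only when a DEP or ORDERED/AND node is
    traversed. A node is reprocessed only if reached with a strictly
    smaller count, which both keeps the shortest distance and terminates
    cycles caused by mutually dependent actions."""
    queue = deque([(start, 0)])
    while queue:
        node, count = queue.popleft()
        if node["dist"].get(goal, float("inf")) <= count:
            continue                       # already reached at least this cheaply
        node["dist"][goal] = count
        step = 1 if node["type"] in ("DEP", "ORDERED/AND") else 0
        for child in node["children"]:
            queue.append((child, count + step))

# Tiny graph: a Goal Action's parent DEP node over an OR of one dependency.
act_goal = {"type": "ACTION", "dist": {}, "children": []}
act_dep  = {"type": "ACTION", "dist": {}, "children": []}
or_node  = {"type": "OR", "dist": {}, "children": [act_dep]}
dep_node = {"type": "DEP", "dist": {}, "children": [or_node, act_goal]}

label_distances(dep_node, "g1")
print(act_goal["dist"], act_dep["dist"])  # both reached at distance 1
```

Note the OR node does not increase the count, so alternatives (e.g. plans of different lengths through different branches) are not penalised merely for sitting under more OR nodes.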
The offline component of our system finishes by setting the prior probability of each goal. We chose to use a uniform prior probability as, since no actions have been observed, all goals are assumed to be equally likely.
5.2 Example
The Action Graph depicted in Fig. 5 shows the resulting action nodes’ distance from each goal for a simplified version of the Kitchen domain by Ramírez and Geffner [48]. In this example, there are two Goal Action nodes, namely, pack/lunch() and make/dinner(). By executing the BFTs described above, each node is labelled with their distance from each goal. This example will be used in Sect. 6, to demonstrate how an observation affects the goals’ probability, and thus, why this node value initialisation procedure has been implemented.
6 Updating the goal probabilities
When an action is observed, the probability associated with each goal is updated based on either its distance from the observed action or the difference between its distance from the prior observation and the current observation. These two update rules are described in turn along with their advantages and disadvantages. The experiments section presents results for both these update rules separately, as well as combined. The pseudocode, for the rules combined, is provided in Algorithm 1.
6.1 Update rule 1: distance from observed action
Each goal's probability is updated based on how close the goal is to the observed action and how unique the observation is to the goal. The probabilities of the goals closest to the observation are increased, whilst those furthest from the observation are decreased. If an observation only belongs to a single goal, that goal's probability is increased and all other probabilities are decreased. This is performed by multiplying each goal's probability by its distance from the observed action's node divided by the sum of all goals' distances (lines 7-10); then normalising the resulting values (line 12). Note that if the observation is not within a plan to reach the goal G, 0 is returned by the getDisFromGoal method (line 8) so that \(c(G)=0\) and, so long as another goal's plan contains the action, its probability is reduced.
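Read this way, the rule can be sketched as follows. The function and argument names are our own, `closeness` stands in for the getDisFromGoal score (0 when the action is in no plan for the goal), and the exact scoring used in Algorithm 1 (not reproduced here) may differ; this is an illustrative reading that follows the \(1+c(G)\) form described in Sect. 6.3.

```python
def update_rule_1(probs, closeness):
    """Sketch of update rule 1.

    probs: goal -> prior probability.
    closeness: goal -> score of the observed action for that goal
               (0 if the action is in no plan to reach the goal).
    Each probability is multiplied by 1 + c(G), where c(G) is the goal's
    share of the total score, then the result is normalised.
    """
    total = sum(closeness.values())
    if total == 0:
        return dict(probs)  # observation belongs to no goal's plan
    scaled = {g: p * (1 + closeness[g] / total) for g, p in probs.items()}
    norm = sum(scaled.values())
    return {g: v / norm for g, v in scaled.items()}
```

With equal priors of 0.5 and an observation that scores only for the lunch goal, this sketch yields probabilities of 0.67 and 0.33, matching the figures quoted for the take(knife) observation in the Kitchen example.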
For the example shown in Fig. 5, there are two goals, both with a prior probability of 0.5. When the take(plate) action is observed, the resulting probabilities are unaltered as its node's distance to each goal is equal. More nodes must be traversed to reach take(plate) from pack/lunch() than from make/dinner(). Nevertheless, the goal with the shorter plan was not favoured, as the distance counter (see Sect. 5) was only increased when a DEP or ORDERED-AND node was traversed. If take(knife) is observed, the probability of making a pack lunch is increased, i.e. \(P(\texttt {{(made-lunch)}})=0.67\) and \(P(\texttt {{(make-dinner)}})=0.33\), as the observed action is unique to this goal.
The main disadvantage of this approach is that the probabilities of goals with shorter, strongly ordered, plans are increased more than those with longer plans. Therefore, the list of returned candidate goals \({\mathcal {C}}\) often contains the goal(s) with a shorter plan. For instance, if an incomplete sequence of observations contains actions that approach both G1 and G2, whichever of these two goals has the shorter plan will be returned as a candidate goal; the other will not be. The subsequent update rule aims to mitigate this disadvantage.
6.2 Update rule 2: change in distance from the observed actions
If the previous observation (\(o^{t-1}\)) and the current observation (\(o^t\)) are connected via a DEP or ORDERED-AND node, the goal probabilities are updated based on the change in distance, i.e. the difference between the goal’s distance from the previous and current observations (lines 2-5 of Algorithm 1). To check if the observations are connected, an upwards traversal (in a depth-first manner) is performed, starting from the action node of \(o^{t-1}\), to find a DEP or ORDERED-AND node whose right branch’s child is the action node of \(o^t\).
If the list of observations is not missing any actions, the change in distance will always be 1, 0 or -1. As an observation could be missed (e.g. due to sensor failure), our algorithm needs to account for the difference being within a wider range of values. A negative difference indicates the observee moved further from the goal, whereas a positive difference indicates they moved closer. The sigmoid function converts the difference into a value between 0 and 1 (i.e. \(\sigma \) from line 3); the goal's value is multiplied by this (line 4) and then normalised (line 12). If either observation does not belong to the goal, the value of v(G) is equivalent to setting the result of the sigmoid function to 0; in other words, the difference is \(-\infty \). Consequently, the goal the observee is moving towards at the highest rate receives the largest probability increase.
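A sketch of this rule, under the same caveats as before (names are our own, and the exact form in Algorithm 1 is not reproduced here), again using the \(1+c(G)\) pattern from Sect. 6.3:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_rule_2(probs, prev_dist, curr_dist):
    """Sketch of update rule 2.

    prev_dist / curr_dist: goal -> distance of the previous / current
    observed action from the goal, or None if that observation is in no
    plan for the goal (treated as a difference of -infinity, i.e. a
    sigmoid value of 0).
    """
    scaled = {}
    for g, p in probs.items():
        if prev_dist.get(g) is None or curr_dist.get(g) is None:
            v = 0.0
        else:
            # A positive difference means the observee moved closer.
            v = sigmoid(prev_dist[g] - curr_dist[g])
        scaled[g] = p * (1 + v)
    norm = sum(scaled.values())
    return {g: x / norm for g, x in scaled.items()}
```

Under this reading, a step that takes the observee one cell closer to G1 and one cell further from G2, starting from equal priors, gives \(P(G1) \approx 0.58\), consistent with the Easy-IPC-Grid example in Fig. 6.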
When this rule is used independently from the first rule, if the previous observation is null, or the current and previous observations are not connected via a DEP or ORDERED/AND node (as detailed above), then \(c(G)=0.5\) for the goals dependent (or indirectly dependent) on the current observation and \(c(G)=0\) for all other goals. This prevents the goals that have shorter plans from being favoured; however, goals for which the observation appears in a (very) suboptimal plan are treated equally to those for which an optimal plan contains the observation.
In the example depicted in Fig. 6, if the observee moves from position 2_1 to 1_1 then to 0_1, the goal probabilities remain equal. As the distance to both goals is reduced at the same rate, the real goal is indiscernible. If the observee were to move vertically, and thus, step towards one goal (and away from the other), the corresponding goal’s probability is increased, e.g. observing move(2_1 1_1) then move(1_1 1_0) results in \(P(G1)=0.58\) and \(P(G2)=0.42\).
6.3 Processes common to update rules 1 and 2
In both update rules, \(1+c(G)\) is calculated, rather than just c(G), so that a goal's probability is never set to 0. If the probability of a goal were set to 0, it could never be increased again; thus, the heuristic would be unable to recover from receiving an incorrect (noisy) observation. These update rules, along with the graph's structure, enable our system to handle noisy observations as well as suboptimal plans and missing observations.
When an action is observed, the completion status of operator nodes is updated by traversing up the graph, in a depth-first manner, from the observed action's node (line 13). OR nodes are marked completed if one of their children has been completed, DEP nodes are completed once the child connected to their right branch has been observed, and UNORDERED/AND nodes are marked completed when all of their children are. If a node is not marked completed, its parents are not traversed. When an ORDERED/AND node's left branch has been completed, the nodes attached to its right branch are informed, so that their distance associated with that ORDERED/AND node is used when the next observation is received (as described in Sect. 5).
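These completion rules can be sketched as an upward traversal. The `Node` class and its fields are illustrative only, and the ORDERED/AND handling (informing the right branch) is omitted for brevity.

```python
class Node:
    """Illustrative graph node; kind is 'ACTION', 'OR', 'DEP' or 'UNORDERED_AND'."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.parents = []
        self.completed = False
        for c in self.children:
            c.parents.append(self)


def observe(action_node):
    """Mark an observed action complete and propagate upwards."""
    action_node.completed = True
    _propagate(action_node)


def _propagate(node):
    for parent in node.parents:
        if parent.kind == "OR":
            # OR: completed once any child is completed.
            done = any(c.completed for c in parent.children)
        elif parent.kind == "DEP":
            # DEP: completed once its right-branch child is observed
            # (here the last child stands in for the right branch).
            done = parent.children[-1].completed
        elif parent.kind == "UNORDERED_AND":
            # UNORDERED/AND: completed once all children are completed.
            done = all(c.completed for c in parent.children)
        else:
            done = False  # other kinds (e.g. ORDERED/AND) handled elsewhere
        if done and not parent.completed:
            parent.completed = True
            _propagate(parent)  # only completed nodes propagate upwards
```

Because propagation stops at nodes that are not completed, an observation only touches the portion of the graph it can actually complete.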
7 Experiments
Through experiments we aim to demonstrate the accuracy of our GR approach, after 10, 30, 50, 70 and 100% of actions in a plan have been observed, on 15 different domains. This section describes the evaluation metrics, followed by the setup and results of the different experiments. A comparison between our different update rules and the goal completion heuristic h\(_{\text {gc}}\) of Pereira et al. [44, 46] is provided on problems for which differing percentages of fluents have been set to incorrect values. Our method is then compared to h\(_{\text {gc}}\) on a dataset containing GR problems with a known, and thus correctly defined, initial world state.
Pereira et al. [44, 46] recently improved the accuracy and computational time of GR by finding landmarks, i.e. states that must be reached to achieve a particular goal. After processing the observations, the resulting value of each goal (\(G \in {\mathcal {G}}\)) is based on the percentage of its landmarks that have been completed. h\(_{\text {gc}}\) takes a threshold value as a parameter. Any goals whose value is greater than or equal to the most likely goal's value minus the threshold are included in \({\mathcal {C}}\). When the threshold is 0.0, like our approach, only the most likely goal(s) are included in \({\mathcal {C}}\). Therefore, we present the results of their approach for 0.0 as the threshold. The compiled version of h\(_{\text {gc}}\), provided by Pereira et al. [44], was run during the experiments.Footnote 3 We provide a detailed comparison with h\(_{\text {gc}}\) as it has been very recently developed and was shown to outperform alternative methods; other approaches to GR will be discussed in the related work section (Sect. 8).
The dataset created by Ramírez and Geffner [49] and Pereira et al. [44]Footnote 4 forms the basis for our experiments. Details on generating the lists of observations and the inaccurate initial states are provided in the setup sections, specific to each experiment. A brief description of each domain is supplied in Appendix C. Experiments were run on a server with 16 GB of RAM and an Intel Xeon 3.10 GHz processor.
7.1 Evaluation metrics
Our approach is evaluated on the number of returned candidate goals (i.e. \(|{\mathcal {C}}|\)) and standard classification metrics, namely, accuracy (sometimes referred to as quality), precision, recall and F1-Score [11, 27]. A definition for each of these is provided below. Subsequently, performance profiles, which are provided to show a comparison of the approaches’ run-times, are introduced.
7.1.1 Classification metrics applied to goal recognition
Accuracy (\(Q_{P,S}\)), precision (\(M_{P,S}\)), recall (\(R_{P,S}\)) and F1-Score (\(F1_{P,S}\)) are provided in Eqs. 1, 2, 3 and 4, respectively. The definitions of TP, FP, FN and TN are provided, from a GR perspective, in Table 1. In these definitions, \(G_P\) is the actual goal (ground truth) for problem P, \({\mathcal {C}}_{P,S}\) is the set of candidate goals returned by solution/approach S and \({\mathcal {G}}_{P}\) is the set of hypothesis goals. TP is 1 if the true goal is in the set of candidates or 0 if it is not; FN is the inverse of TP; FP is the number of returned candidates that are not the real goal, and TN is the number of goals correctly identified as not the real goal. For each metric, the average over all problems per domain is displayed in the results.
Prior goal recognition papers [8, 44, 49] defined the accuracy/quality as the number of times the actual goal appeared in the set of candidate goals, i.e. they did not take \(|{\mathcal {C}}|\) into consideration when calculating accuracy. This resulted in approaches being reported as 100% accurate, even when more than one candidate goal was returned. In our paper, this is equivalent to recall (\(R_{P,S}\)). By using the definitions provided in this paper, an approach can only have an accuracy of 1 (i.e. 100%) if it always returns one candidate goal, i.e. the real goal.
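A sketch of these per-problem counts and metrics, consistent with the Table 1 definitions described above (the function and argument names are our own; Eqs. 1-4 themselves are not reproduced):

```python
def gr_metrics(true_goal, candidates, hypothesis_goals):
    """Per-problem GR classification metrics.

    true_goal: the actual goal G_P.
    candidates: the set of candidate goals C returned by an approach.
    hypothesis_goals: the full set of hypothesis goals G.
    """
    tp = 1 if true_goal in candidates else 0   # real goal returned?
    fn = 1 - tp                                # inverse of TP
    fp = len(candidates) - tp                  # wrong candidates returned
    # Goals correctly identified as not the real goal.
    tn = len(hypothesis_goals) - len(candidates) - fn
    accuracy = (tp + tn) / len(hypothesis_goals)
    precision = tp / len(candidates) if candidates else 0.0
    recall = tp  # tp / (tp + fn), which reduces to tp here
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

For example, returning two candidates (including the real goal) out of four hypothesis goals gives an accuracy of 0.75 and a precision of 0.5, even though the recall, the "accuracy" of prior work, is 1.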
7.1.2 Performance profiles
The computation times (T) are presented in performance profiles, as suggested by Dolan and Moré [7]. This enables the results to be presented in a more readable format, and all datasets can be grouped into a single result to prevent a small number of problems from dominating the discussion. To produce the performance profile of an approach (\(S\in {\mathcal {S}}\)), i.e. our Action Graph approach or h\(_{\text {gc}}\), the ratio between its run-time (\(T_{P,S}\)) and the quickest run-time for a problem (\(P \in {\mathcal {P}}\)) is calculated, as shown in Eq. 5. Equation 6 calculates the percentage of problems an approach solved when the ratio is less than a given threshold, \(\tau \). When \(\tau =0\), the resulting \(P_{S}(\tau )\) of an approach is the percentage of problems it solved quicker than the other approach. How much \(\tau \) must be increased before an approach solves 100% of problems indicates how far that approach is from the best one.
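The profile computation can be sketched as follows. Since Eqs. 5 and 6 are not reproduced here, one detail is an assumption on our part: the ratio is compared on a log2 scale, which is common in Dolan-Moré profiles and makes \(\tau =0\) correspond to "fastest on this problem", matching the description above.

```python
import math


def performance_profile(times, taus):
    """Sketch of a Dolan-More performance profile.

    times: approach -> {problem: run-time in seconds}.
    taus: thresholds to evaluate.
    Returns approach -> [fraction of problems whose log2 run-time ratio
    to the per-problem best is <= tau, for each tau].
    """
    problems = list(next(iter(times.values())))
    # Quickest run-time for each problem across all approaches.
    best = {p: min(times[s][p] for s in times) for p in problems}
    return {
        s: [sum(math.log2(times[s][p] / best[p]) <= t for p in problems)
            / len(problems)
            for t in taus]
        for s in times
    }
```

At \(\tau =0\) each approach's value is the fraction of problems it solved at least as quickly as the other; as \(\tau \) grows, every approach eventually reaches 1.0.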
7.2 Goal recognition with an inaccurate initial state
The main aim of our approach is to be able to perform GR when the initial state of the environment is defined inaccurately. A fluent's value could be incorrect because it is unknown, and thus incorrectly guessed, or because an error was made while determining the environment's state. Therefore, a dataset containing differing percentages, i.e. 10, 20, 40, 60, 80 and 100%, of fluents set to incorrect values was produced. How this dataset was generated is discussed, followed by the results produced by our Action Graph approach and h\(_{\text {gc}}\).
7.2.1 Setup
A dataset containing problems with varying amounts of fluents set to incorrect values was generated from the dataset containing the first N % of observations, i.e. that was used in the experiments of Sect. 7.3.3. For each problem, contained in the aforementioned dataset, 10, 20, 40, 60, 80 and 100% of fluents were chosen at random and their value set to a randomly selected incorrect value. As there are elements of randomness, for each percentage of fluents, 5 problems were created. The changes that can be made to the initial state I were (manually) defined based on the actions’ effects.
For instance, in a Zeno-Travel problem, containing 5 cities, 4 people, 3 aircraft and 2 fuel-levels, there are 10 fluents whose initial value can be altered. Each person can be at a city or in an aircraft; thus, a fluent indicating a person’s location can be changed to one of the 7 alternative (incorrect) values. Each aircraft has two fluents associated with it, i.e. is in a city and has a fuel level; both of these can be changed to an alternative value.
These state changes could cause some (or all) goals to be unreachable from the defined initial state (e.g. in the Sokoban domain, the robot could be unable to navigate to a location from which one of the boxes can be pushed) and the initial state itself could be invalid/contradictory (e.g. in the Blocks-World domain, blockA’s fluent could express the block is on the table and the gripper’s fluent could indicate it is holding blockA). The changes made to the initial state are outlined in Appendix C, and a detailed table of possible changes can be found at https://doi.org/10.5281/zenodo.3621275. No modifications were made to the lists of observations.
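The perturbation procedure above can be sketched as follows. The representation of fluents and of their manually defined alternative values is hypothetical; only the sampling scheme (choose N% of mutable fluents, set each to a random incorrect value) follows the setup described here.

```python
import random


def corrupt_initial_state(state, alternatives, percent, rng=None):
    """Sketch of the dataset-generation perturbation.

    state: fluent -> current value (the accurate initial state I).
    alternatives: fluent -> list of values the fluent may legally take
                  (the manually defined, effect-derived change sets).
    percent: percentage of mutable fluents to corrupt.
    Returns a corrupted copy of the state.
    """
    rng = rng or random.Random()
    corrupted = dict(state)
    # Only fluents with at least one alternative value can be corrupted.
    mutable = [f for f in state if len(alternatives.get(f, [])) > 1]
    for f in rng.sample(mutable, round(len(mutable) * percent / 100)):
        wrong = [v for v in alternatives[f] if v != state[f]]
        corrupted[f] = rng.choice(wrong)
    return corrupted
```

Seeding the generator makes each of the 5 problems per percentage reproducible; note that, as discussed above, nothing here checks that the corrupted state remains valid or that the goals remain reachable.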
7.2.2 Results discussion
The accuracy of our approach was not affected by setting fluents in the initial state to incorrect values, whereas the accuracy of h\(_{\text {gc}}\) greatly reduced (see Fig. 7 and Appendix D). When building the Action Graph, the initial values of fluents are ignored. As a result, no matter what the initial values of fluents are defined as being, the same Action Graph is produced. The initialising of the nodes' values uses the goal states, but not the initial state. Therefore, the output of our goal recognition approach is unaffected by incorrectly initialised fluents. Approaches that make use of the initial state will, as a result, see their accuracy deteriorate. When 20% of the fluents were set to incorrect values, a large decrease in the accuracy of h\(_{\text {gc}}\) was observed, and as this percentage was increased, the accuracy further reduced. For several domains, i.e. Kitchen, Rovers and Intrusion-Detection, the resulting M and R of h\(_{\text {gc}}\) rose when 100% of fluents (rather than 80%) were incorrect. This is because at 100%, for these domains, all goals were contained in the set of candidate goals, and thus, the real goal was contained within \({\mathcal {C}}\). Other approaches to GR are also unable to handle inaccurate initial states because they attempt to find the plans/states that reach each goal from the defined initial world state. These are discussed further in the related work section.
7.3 Goal recognition with a known initial state
After describing the experiment setup, this section compares the computational time of our Action Graph approach to h\(_{\text {gc}}\). The accuracy of these approaches, when run on problems with accurate initial states, is then discussed. This includes GR problems where the observations are the first N % of actions in a plan and ones for which the observations are a random N % of actions in a plan (i.e. with missing observations).
7.3.1 Setup
The GR problems in the original dataset\(^{4}\) contain 10/30/50/70/100% of observable actions in the plan to reach a goal; these observations/actions were selected at random. Therefore, for each of the original problems that contain 100% of the observations, we generated GR problems by selecting the first 10, 30, 50, 70 and 100% of observations. As a task planner (which is not guaranteed to find an optimal plan) was run to create the problems produced by Pereira et al. [44], some observation sequences are suboptimal.
The accuracy results for our two goal probability update rules (described in Sect. 6), when ran independently and combined, are presented. In the results table, these are named AG1 (i.e. the first update rule), AG2 (the second update rule) and AG3, which is the combination of the two rules (i.e. Algorithm 1).
7.3.2 Run-times
The Action Graph heuristic took an average of 0.02 s to process all observations, whereas h\(_{\text {gc}}\) took 0.66 s. Labelling the nodes with their distance from the goals is computationally expensive; therefore, when the offline processing times (which include the PDDL to Action Graph transformation steps) are included, Action Graphs took an average of 2.38 s per problem. The performance profiles are displayed in Fig. 8, and the results per domain are shown in Table 2. Table 3 provides the average size of the Action Graph for each domain. These run-times were produced while processing the GR dataset containing the first N % of observations, which contains 2705 problems.
When the whole process is included in the run-times, our GR approach outperformed h\(_{\text {gc}}\) on 64% of problems; however, the difference in run-time was greater on the problems our solution was slower at (than on the problems h\(_{\text {gc}}\) was slower on). This is indicated by how much \(\tau \) must be increased before 100% of problems were solved. At \(\tau =17.45\), h\(_{\text {gc}}\) solved all problems, and at \(\tau =100.00\), all problems were solved by our approach. If only the online process is included, our approach solves 100% of problems quicker than h\(_{\text {gc}}\), and \(\tau \) must reach 20426 before h\(_{\text {gc}}\) solves 100%. Note: for h\(_{\text {gc}}\), the landmarks could be discovered offline, and thus the online computational time reduced. This is only possible if the initial value of each fluent is known in advance.
Both the size and the structure of an Action Graph impact the run-times of our approach. Domains with relatively few actions have much shorter initialisation times (e.g. Kitchen, Intrusion-Detection and Campus). For larger domains, the structure of the graph had a greater impact, as the run-time was affected by the number of times each node was visited. Our node labelling algorithm, which performs BFT, does not visit nodes if their already assigned distance is lower than the current distance. Moreover, an UNORDERED/AND node's children are not associated with the prior ORDERED/AND node, and thus are visited fewer times than if no UNORDERED/AND node is traversed. As a result, for example, although the Action Graph of the Zeno-Travel domain contains more nodes than Driverlog's, its run-time is shorter.
We have identified two ways in which the total processing time of our approach could be reduced. First, rather than calculating all the nodes' distances from the goals upfront, this process could be performed for just the observed actions; however, observations would then be processed at a reduced rate. Second, the nodes' distances for each goal could be computed in parallel; as we envision this process being performed offline (thus, its computational time is of lesser importance) and the performance gain would be hardware dependent, this was not implemented.
7.3.3 Results after processing the first N % of observations
Our Action Graph approach outperformed h\(_{\text {gc}}\) when 10, 30 and 50% of observations had been received; at 70 and 100%, h\(_{\text {gc}}\) slightly outperformed our approach. As described in [46], the lower the number of observations, the less likely it is that a landmark is observed; therefore, h\(_{\text {gc}}\) cannot disambiguate the goals. Figure 9 displays the average F1-Score, produced by AG3 and h\(_{\text {gc}}\), at each per cent of observations; Table 4 shows the results per domain for AG1, AG2, AG3 and h\(_{\text {gc}}\). Each result is the average over all domains, rather than over all problems, so that the result is not weighted towards the domains with the most problems.
Action Graphs have a low precision and recall for the Sokoban domain. GR problems for the Sokoban domain contain observations to navigate to and push two boxes to different locations. The actions for collecting and pushing one box were observed, before the second box was acted on. Whilst observing the actions to push the first box to its goal location, our Action Graph approach increased the probability of the appropriate goals (i.e. the goals the box was becoming closer to). When the observed agent started to navigate to the second box, the aforementioned goals’ probability was decreased. Therefore, when the second box is pushed, any of the goals containing a location it is being pushed towards could appear in the set of candidate goals. In other words, the goal probabilities lose information about the first goal atom to be achieved. We considered increasing the probability of the goals with fully observed atoms; however, in problems from other domains some goals’ atoms are subgoals of another goal.
For the Kitchen domain, our Action Graph approach reduced the number of candidate goals significantly more than h\(_{\text {gc}}\), as few landmarks were observed. On the other hand, due to the structure of the produced graph, Action Graphs produced a low R and M for the Depots and Blocks-World domains. The plans for these domains are highly state dependent, which is not captured by the Action Graph structure. For instance, in a Blocks-World problem, picking up blockA requires the gripper to be empty by putting down all blocks (including blockA). The graph structure captures the dependencies of actions; however, it does not account for the prior state(s) of the environment, e.g. the gripper could already be empty.
AG2 only outperformed AG1 on the Easy-IPC-Grid domain. In this domain, there are strong constraints on the order in which actions are performed, and a false goal could be traversed en route to the real goal. Therefore, for this domain, update rule 2 prevented the shortest plan from being favoured and successfully increased the probabilities of the goals the observed agent was navigating towards. Nevertheless, this update rule could not determine the real goal for the majority of domains. This is because all goals whose plans contain the observed action were multiplied (increased) by the same amount when the current and previous observations were not connected via a DEP (or ORDERED/AND) node. All suboptimal plans are encoded in an Action Graph's structure; therefore, for many domains, all actions are included within a plan to reach any of the goals. Combining AG2 with AG1 improved the results for the Easy-IPC-Grid domain without greatly affecting the results produced for the other domains. The subsequent sections just show the results of AG3.
7.3.4 Missing observations results discussion
Our Action Graph approach and h\(_{\text {gc}}\) were also run on the 6313 GR problems\(^{4}\) produced by Pereira et al. [44] and Ramírez and Geffner [49], that contain missing observations. These problems contain a random 10, 30, 50, 70 and 100% of observations. The F1-Scores are depicted in Fig. 10, and a table containing the results per domain can be found at https://doi.org/10.5281/zenodo.3621275.
These results show a similar trend to the previous experiment, i.e. our approach produced a higher F1-Score than h\(_{\text {gc}}\) at 10, 30 and 50% of observations (and vice versa after 70 and 100% of observations). Both approaches perform better on the dataset containing missing observations, than they did in the previous experiment, as each GR problem could contain observations that are close to the goal (due to random actions in a plan having been selected).
8 Related work
Methods for intention recognition can be broadly categorised as data-driven and knowledge-driven (i.e. symbolic) methods [47, 67]. Data-driven approaches train a recognition model from a large dataset [1, 3, 33, 57, 67]. The main disadvantages of this method are that often a large amount of labelled training data is required and the produced models often only work on data similar to the training set [51, 66]. Since our work belongs to the category of knowledge-driven methods, data-driven methods are not further discussed.
Knowledge-driven approaches rely on a logical description of the actions agents can perform. They can be further divided into approaches that parse a library of plans (also known as "recognition as parsing"), and approaches that solve recognition problems, defined in languages usually associated with planning, i.e. "recognition as planning" [32]. Our GR approach derives a graph structure, similar to those used by some recognition as parsing methods, from a PDDL defined (planning-based) GR problem. Recognition as planning is often viewed as more flexible and general because a library of plans is not required and cyclic plans are difficult to compile into a library [49]. We chose to transform a PDDL planning problem into an Action Graph to enable the goal probabilities to be updated quickly, all plans (including suboptimal plans) to be represented, cyclic plans to be expressed and inaccurate initial states to be handled. Our approach takes advantage of the fact that a perfect/complete representation of plans is not required to perform GR. In this section, recognition as parsing and recognition as planning approaches are discussed in turn.
8.1 Recognition as parsing
In recognition as parsing, hierarchical structures are usually developed which include abstract actions along with how they are decomposed to concrete (observable) actions [31]. Several prior approaches have represented these hierarchical structures as AND/OR trees [15, 24]. As previously mentioned, our graph structure was inspired by these works. The recognition as parsing approaches mentioned in this section enable both the goal and the plan of the observed agent to be recognised, but do not address handling invalid initial states or suboptimal plans.
Kautz et al. [30, 31] introduce a language to describe a hierarchy of actions. Based on which low level actions are observed, the higher level task(s) an agent is attempting to achieve is inferred. Their paper presents one of the earliest formal theories of plan/goal recognition, which aimed to handle simultaneous action execution, multi-plan recognition and missing observations.
A set of action sequence graphs is derived from a library of plans in [61]. This set is compared to an action sequence graph, created from a sequence of observations, to find the plan most similar to the observation sequence. Their approach was shown to perform well on misclassified (incorrect) sensor observations and missing actions; however, a planner is called to generate the library of plans, and thus a known initial state is required.
8.2 Recognition as planning
Recognition as planning is a more recently proposed approach, in which languages normally associated with task planning, such as STRIPS [10] and PDDL [40], define the actions agents can perform (along with their preconditions and effects) and world states. In recognition as parsing there are usually only action definitions, whereas planning-based approaches allow for the inclusion of state knowledge, such as what objects are found within the environment and their locations.
In [48, 49], it was proposed to view goal recognition as the inverse of planning. To find the difference in the cost of the plan to reach the goal with and without taking the observations into consideration, a planner is called twice for every possible goal. Therefore, the performance would greatly deteriorate when exposed to inaccurate initial states. In [5], the work from [49] was extended, to find the joint probability of pairs of goals rather than a single goal. Their work aimed to handle multiple interleaving goals. Although initial approaches were computationally expensive as they required a task planner to be called multiple times [5, 48, 49], the latest advances in recognition as planning algorithms have greatly improved this [8, 44, 55].
Plan graphs were proposed in [8]. A plan graph, which contains actions and propositions labelled as either true, false or unknown, is built from a planning problem and updated based on the observations. Rather than calling a planner, the graph is used to calculate the cost of reaching the goals. Our Action Graph structure differs greatly from a plan graph, as Action Graphs only contain actions and the constraints between those actions.
More recently, Pereira et al. [44, 46] significantly reduced the recognition time by finding landmarks. Our experiments show a comparison with this approach. This work has been expanded to handle incomplete domain models [45], i.e. GR problems with incomplete preconditions and effects, and its accuracy has been very recently improved by Wilken et al. [64]. In future work, we will explore applying our work to incomplete domain models and comparing to the work of Wilken et al. [64].
9 Conclusion
Our novel approach to goal recognition aims to handle problems in which the defined initial state is inaccurate. An inaccurate initial state contains fluents whose value is unknown and/or incorrect. For instance, if an item or agent (e.g. cup or human) is occluded, its location is indeterminable, and thus possibly defined incorrectly. Our approach transforms a PDDL defined GR problem into an Action Graph, which models the order constraints between actions. Each node is labelled with the minimum number of DEP and ORDERED/AND nodes traversed to reach it from each goal. When an action is observed, the goals' probabilities are updated based on either the distance the action's associated node is from the goals or, if the current and prior observation are connected via a DEP or ORDERED/AND node, the change in distance. Experiments demonstrated that when fluents have incorrect values in the initial state, e.g. because they are unknown or sensed/determined incorrectly, the performance of our approach is unaffected.
In future work, we intend to apply our Action Graph method to further challenges associated with symbolic GR. As well as the defined initial state being inaccurate, the domain model (i.e. action definitions) could be incorrect [45]. Therefore, we will experiment with adapting the Action Graph structure based on the order observations are received. To create a more compact structure, and thus reduce the computational time of this, we will investigate grouping related actions into a single node. For instance, in the Sokoban domain, the same actions can be performed on both box1 and box2; therefore, actions, such as push(box1 loc1 loc2) and push(box2 loc1 loc2), can be grouped into a single node. Moreover, we intend to apply our GR approach to problems in which either the observed agent has multiple goals, or multiple agents have individual or joint goals [28].
Another direction for future research is to modify our approach so that the Action Graph structure expands over time. As more observations are made, new actions could be inserted and the links between actions could be adjusted. This could enable actions and sequences to be learnt. The performance of our current method could be compared to this and to recurrent neural network (RNN) based approaches [3] using real-world data.
As developing the PDDL can be time consuming and challenging, researchers have attempted to replace this manual process with deep learning methods [1, 2]. We will explore the potential of learning the Action Graph structure from pairs of images, and then converting the Action Graph into a PDDL defined domain, which could subsequently be provided as input to task planners as well as goal recognisers.
Notes
\(N^x\) denotes a set of nodes that are of a specific type (x).
References
Amado L, Pereira RF, Aires J, et al. (2018) Goal recognition in latent space. In: International joint conference on neural networks. IEEE, IJCNN, pp. 1–8, https://doi.org/10.1109/IJCNN.2018.8489653
Asai M (2019) Unsupervised grounding of plannable first-order logic representation from images. In: Proceedings of the twenty-ninth international conference on automated planning and scheduling. AAAI Press, ICAPS’19, pp. 583–591
Bisson F, Larochelle H, Kabanza F (2015) Using a recursive neural network to learn an agent’s decision model for plan recognition. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence. AAAI Press, IJCAI’15, pp. 918–924, http://dl.acm.org/citation.cfm?id=2832249.2832376
Buyukgoz S, Grosinger J, Chetouani M et al (2022) Two ways to make your robot proactive: reasoning about human intentions or reasoning about possible futures. Front Robot AI. https://doi.org/10.3389/frobt.2022.929267
Chen J, Chen Y, Xu Y et al (2013) A planning approach to the recognition of multiple goals. Int J Intell Syst 28(3):203–216. https://doi.org/10.1002/int.21565
Cimatti A, Roveri M (2000) Conformant planning via symbolic model checking. J Artif Intell Res 13:305–338
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213. https://doi.org/10.1007/s101070100263
E-Martin Y, R-Moreno MD, Smith DE (2015) A fast goal recognition technique based on interaction estimates. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence. AAAI Press, Buenos Aires, IJCAI’15
Fagan M, Cunningham P (2003) Case-based plan recognition in computer games. In: Ashley KD, Bridge DG (eds) International conference on case-based reasoning research and development. Springer, Berlin, Heidelberg, pp. 161–170, https://doi.org/10.1007/3-540-45006-8_15
Fikes RE, Nilsson NJ (1971) STRIPS: a new approach to the application of theorem proving to problem solving. Artif Intell 2(3):189–208. https://doi.org/10.1016/0004-3702(71)90010-5
Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
Freedman RG, Zilberstein S (2017) Integration of planning with recognition for responsive interaction using classical planners. In: Proceedings of the thirty-first aaai conference on artificial intelligence. AAAI Press, AAAI’17, pp. 4581–4588, http://dl.acm.org/citation.cfm?id=3298023.3298233
Geffner H, Bonet B (2013) A concise introduction to models and methods for automated planning: synthesis lectures on artificial intelligence and machine learning, 1st edn. Morgan & Claypool Publishers
Geib CW, Goldman RP (2001) Plan recognition in intrusion detection systems. In: Proceedings DARPA information survivability conference and exposition II. DISCEX’01, vol 1, pp 46–55. IEEE https://doi.org/10.1109/DISCEX.2001.932191
Geib CW, Goldman RP (2009) A probabilistic plan recognition algorithm based on plan tree grammars. Artif Intell 173(11):1101–1132. https://doi.org/10.1016/j.artint.2009.01.003
Ghallab M, Nau D, Traverso P (2004) Automated planning: theory and practice. Elsevier, San Francisco
Ghallab M, Nau D, Traverso P (2016) Automated planning and acting. Cambridge University Press, Cambridge
Gupta N, Gupta SK, Pathak RK et al (2022) Human activity recognition in artificial intelligence framework: a narrative review. Artif Intell Rev 55(6):4755–4808
Harman H, Simoens P (2019) Action graphs for performing goal recognition design on human-inhabited environments. Sensors 19(12):2741. https://doi.org/10.3390/s19122741
Harman H, Simoens P (2020) Action graphs for proactive robot assistance in smart environments. J Ambient Intell Smart Environ 12(2):79–99
Harman H, Chintamani K, Simoens P (2018) Action trees for scalable goal recognition in robotic applications. In: Proceedings of the sixth workshop on planning and robotics (PlanRob), pp. 90–94
Helmert M (2006) The fast downward planning system. J Artif Intell Res 26:191–246. https://doi.org/10.1613/jair.1705
Helmert M (2009) Concise finite-domain representations for PDDL planning tasks. Artif Intell 173(5):503–535. https://doi.org/10.1016/j.artint.2008.10.013
Holtzen S, Zhao Y, Gao T, et al (2016) Inferring human intent from video by sampling hierarchical plans. In: IEEE/RSJ international conference on intelligent robots and systems. IEEE, IROS, pp. 1489–1496 https://doi.org/10.1109/IROS.2016.7759242
Hong J (2001) Goal recognition through goal graph analysis. J Artif Intell Res 15:1–30. https://doi.org/10.1613/jair.830
Horvitz E, Breese J, Heckerman D, et al (1998) The lumière project: Bayesian user modeling for inferring the goals and needs of software users. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., San Francisco, UAI’98, pp 256–265, http://dl.acm.org/citation.cfm?id=2074094.2074124
Hossin M, Sulaiman M (2015) A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manag Process 5(2):1
Hu DH, Yang Q (2008) CIGAR: concurrent and interleaving goal and activity recognition. In: Proceedings of the twenty-third national conference on artificial intelligence—Volume 3. AAAI Press, AAAI’08, pp. 1363–1368 http://dl.acm.org/citation.cfm?id=1620270.1620286
Jiao P, Xu K, Yue S et al (2017) A decentralized partially observable Markov decision model with action duration for goal recognition in real time strategy games. Discrete Dyn Nat Soc. https://doi.org/10.1155/2017/4580206
Kautz HA (1987) A formal theory of plan recognition. PhD thesis, University of Rochester. Department of Computer Science
Kautz HA, Allen JF (1986) Generalized plan recognition. In: Proceedings of the fifth AAAI national conference on artificial intelligence. AAAI Press, AAAI’86, pp. 32–37
Keren S, Mirsky R, Geib C (2019) Plan activity and intent recognition tutorial. Retrieved from http://www.planrec.org/Tutorial/Resources_files/pair-tutorial.pdf
Khan IU, Afzal S, Lee JW (2022) Human activity recognition via hybrid deep learning based model. Sensors. https://doi.org/10.3390/s22010323
Lemaignan S, Warnier M, Sisbot EA et al (2017) Artificial cognition for social human-robot interaction: an implementation. Artif Intell 247:45–69. https://doi.org/10.1016/j.artint.2016.07.002
Levine SJ, Williams BC (2018) Watching and acting together: concurrent plan recognition and adaptation for human-robot teams. J Artif Intell Res 63:281–359. https://doi.org/10.1613/jair.1.11243
Liao L, Fox D, Kautz H (2007) Extracting places and activities from gps traces using hierarchical conditional random fields. Int J Robot Res 26(1):119–134. https://doi.org/10.1177/0278364907073775
Lima WS, Souto E, Rocha T, et al (2015) User activity recognition for energy saving in smart home environment. In: IEEE symposium on computers and communication (ISCC), pp. 751–757. IEEE, https://doi.org/10.1109/ISCC.2015.7405604
Masters P, Sardina S (2019) Cost-based goal recognition in navigational domains. J Artif Intell Res 64:197–242. https://doi.org/10.1613/jair.1.11343
Masters P, Kirley M, Smith W (2021) Extended goal recognition: A planning-based model for strategic deception. In: Proceedings of the 20th international conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS ’21, pp. 871-879
McDermott D (2000) The 1998 AI planning system competition. AI Mag 21(2):35. https://doi.org/10.1609/aimag.v21i2.1506
Mirsky R, Stern R, Gal K et al (2018) Sequential plan recognition: an iterative approach to disambiguating between hypotheses. Artif Intell 260:51–73. https://doi.org/10.1016/j.artint.2018.03.006
Mirsky R, Shalom Y, Majadly A et al (2019) New goal recognition algorithms using attack graphs. In: Dolev S, Hendler D, Lodha S et al (eds) Cyber security cryptography and machine learning. Springer International Publishing, Cham, pp 260–278
Palacios H, Geffner H (2009) Compiling uncertainty away in conformant planning problems with bounded width. J Artif Intell Res 35:623–675
Pereira RF, Oren N, Meneguzzi F (2017) Landmark-based heuristics for goal recognition. In: Proceedings of the thirty-first AAAI conference on artificial intelligence. AAAI Press, AAAI’17, pp 3622–3628, http://dl.acm.org/citation.cfm?id=3298023.3298094
Pereira RF, Pereira AG, Meneguzzi F (2019) Landmark-enhanced heuristics for goal recognition in incomplete domain models. In: Proceedings of the twenty-ninth international conference on automated planning and scheduling. AAAI Press, ICAPS’19, pp. 329–337
Pereira RF, Oren N, Meneguzzi F (2020) Landmark-based approaches for goal recognition as planning. Artif Intell 279:103217. https://doi.org/10.1016/j.artint.2019.103217
Rafferty J, Nugent CD, Liu J et al (2017) From activity recognition to intention recognition for assisted living within smart homes. IEEE Trans Hum-Mach Syst 47(3):368–379. https://doi.org/10.1109/THMS.2016.2641388
Ramírez M, Geffner H (2009) Plan recognition as planning. In: Proceedings of the twenty-first international joint conference on artifical intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI’09, pp. 1778–1783, http://dl.acm.org/citation.cfm?id=1661445.1661731
Ramírez M, Geffner H (2010) Probabilistic plan recognition using off-the-shelf classical planners. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence. AAAI Press, AAAI’10, pp. 1121–1126
Ramírez M, Geffner H (2011) Goal recognition over POMDPs: Inferring the intention of a POMDP agent. In: Proceedings of the twenty-second international joint conference on artificial intelligence. AAAI Press, IJCAI’11, pp 2009–2014, https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-335
Roy PC, Giroux S, Bouchard B et al (2011) A possibilistic approach for activity recognition in smart homes for cognitive assistance to Alzheimer’s patients. In: Chen L, Nugent CD, Biswas J, Hoey J (eds) Activity recognition in pervasive intelligent environments, vol 4. Atlantis Press, Paris, pp 33–58. https://doi.org/10.2991/978-94-91216-05-3_2
Rubin JE (1967) Set theory for the mathematician. Holden-Day, San Francisco
Schmidt C, Sridharan N, Goodson J (1978) The plan recognition problem: an intersection of psychology and artificial intelligence. Artif Intell 11(1):45–83. https://doi.org/10.1016/0004-3702(78)90012-7
Shaw PH, Farwer B, Bordini RH (2008) Theoretical and experimental results on the goal-plan tree problem. In: Proceedings of the seventh international joint conference on autonomous agents and multiagent systems, AAMAS’08, vol 3, pp 1379-1382. International Foundation for Autonomous Agents and Multiagent Systems, Richland
Shvo M, McIlraith SA (2020) Active goal recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 34(06):9957–9966. https://doi.org/10.1609/aaai.v34i06.6551, https://ojs.aaai.org/index.php/AAAI/article/view/6551
Shvo M, Hari R, O’Reilly Z, et al (2022) Proactive robotic assistance via theory of mind. In: 2022 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 9148–9155, https://doi.org/10.1109/IROS47612.2022.9981627
Singla G, Cook DJ, Schmitter-Edgecombe M (2010) Recognizing independent and joint activities among multiple residents in smart environments. J Ambient Intell Humaniz Comput 1(1):57–63. https://doi.org/10.1007/s12652-009-0007-1
Sohrabi S, Riabov AV, Udrea O (2016) Plan recognition as planning revisited. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence. AAAI Press, IJCAI’16, pp 3258–3264, http://dl.acm.org/citation.cfm?id=3061053.3061077
Thangarajah J, Padgham L, Winikoff M (2003) Detecting & exploiting positive goal interaction in intelligent agents. In: Proceedings of the second international joint conference on autonomous agents and multiagent systems. Association for Computing Machinery, New York, AAMAS’03, pp 401–408, https://doi.org/10.1145/860575.860640
Tremblay S, Fortin-Simard D, Blackburn-Verreault E, et al (2015) Exploiting environmental sounds for activity recognition in smart homes. In: AAAI workshop: artificial intelligence applied to assistive technologies and smart environments. AAAI-ATSE
Vattam SS, Aha DW, Floyd M (2014) Case-based plan recognition using action sequence graphs. In: Lamontagne L, Plaza E (eds) Case-based reasoning research and development. Springer International Publishing, Cham, pp 495–510. https://doi.org/10.1007/978-3-319-11209-1_35
Vilain M (1990) Getting serious about parsing plans: a grammatical analysis of plan recognition. In: Proceedings of the eighth national conference on artificial intelligence. AAAI Press, AAAI’90, pp 190–197, http://dl.acm.org/citation.cfm?id=1865499.1865528
Wang Z, Boularias A, Mülling K et al (2017) Anticipatory action selection for human-robot table tennis. Artif Intell 247:399–414. https://doi.org/10.1016/j.artint.2014.11.007
Wilken N, Cohausz L, Bartelt C, et al (2023) Planning landmark based goal recognition revisited: does using initial state landmarks make sense? arXiv preprint arXiv:2306.15362 [cs.AI]
Wu J, Osuntogun A, Choudhury T, et al (2007) A scalable approach to activity recognition based on object use. In: IEEE eleventh international conference on computer vision. IEEE, ICCV, pp 1–8, https://doi.org/10.1109/ICCV.2007.4408865
Yordanova K, Krüger F, Kirste T (2012) Context aware approach for activity recognition based on precondition-effect rules. In: IEEE international conference on pervasive computing and communications workshops. IEEE, PerCom Workshops, pp 602–607, https://doi.org/10.1109/PerComW.2012.6197586
Yordanova K, Lüdtke S, Whitehouse S et al (2019) Analysing cooking behaviour in home settings: towards health monitoring. Sensors 19(3):55. https://doi.org/10.3390/s19030646
Yue S, Yordanova K, Krüger F et al (2016) A decentralized partially observable decision model for recognizing the multiagent goal in simulation systems. Discrete Dyn Nat Soc. https://doi.org/10.1155/2016/5323121
Acknowledgements
The research leading to this article was funded through an SB Fellowship of the FWO-Vlaanderen (project no. 1S40217N). Helen Harman is currently affiliated with the University of Lincoln.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Action Graph creation algorithm
The pseudocode for our Action Graph creation method, described in Sect. 4, is provided in Algorithm 2.
Appendix B Algorithm to compute the nodes’ distance from each goal
The pseudocode for the BFT that labels the nodes with their distance from each goal, described in Sect. 5, is provided in Algorithm 3.
Appendix C Domains
The list below describes each domain in turn, along with the fluents whose initial values were modified to generate the dataset for the experiments described in Sect. 7.2. Further details on the modified fluents are provided at https://doi.org/10.5281/zenodo.3621275. These GR domains were produced by Ramírez and Geffner [44, 49]\(^{4}\), based on the work of Wu et al. [65] and the IPC domains\(^{1}\).
- Blocks-World: A hand stacks blocks on top of one another to create a tower.
  - A block can be placed on the table or on another block, a block can become unclear, and the hand can be empty or holding a block.
- Campus: In the campus domain, a university student navigates to different locations (e.g. the library, a cafe, etc.) to perform different activities.
  - The student’s location can change, and whether an activity has been performed (or not) can be modified.
- Depots: In this domain, crates are relocated by trucks and hoists.
  - Crates can change location, a crate can be on another crate or in a truck, and a hoist can be lifting a crate or available.
- Driverlog: Trucks are driven by different drivers so that objects can be relocated.
  - Trucks, drivers and objects can change location, an object can be in a truck, and a driver can be driving a truck.
- DWR: Robots and cranes relocate containers, which are piled on top of each other.
  - The location of a robot, which pile a container is in (and/or on top of), whether a container has been loaded onto a robot, and whether a crane is holding a container can be changed.
- Easy-IPC-Grid: A robot navigates a grid to reach a goal location and, along the way, must collect keys to unlock locations.
  - A location can be unlocked, and the location of the robot and whether a key is being carried can be modified.
- Ferry: A ferry transports cars to different locations.
  - The ferry’s location, and whether a car is at a location or on the ferry, can be changed.
- Intrusion-Detection: An intruder accesses, modifies and downloads information from a computer system.
  - What information has been accessed, modified or downloaded can be changed.
- Kitchen: The observed agent takes different items and performs kitchen-based activities (e.g. makes toast). Note that “activities” are not included in the list of observations.
  - Which items have been taken, what equipment has been used and what activities have been performed can be changed.
- Logistics: Packages are transported to different locations and airports by airplanes and trucks.
  - Which airport, location, truck or airplane a package is at/in, a truck’s airport/location, and an airplane’s airport can be changed.
- Miconic: A lift takes different passengers to their desired floors.
  - The floor the lift starts on, which passengers are in the lift and which passengers have been served can be changed.
- Rovers: A rover navigates a planet collecting rock samples, soil samples and images.
  - The location of the rover, whether a camera has been calibrated, whether the rover’s store is empty/full, what data have been collected, which data have been communicated and whether a channel is free can be modified.
- Satellite: In the satellite domain, satellites take images in different modes and directions.
  - Which direction a satellite is pointing in, whether power is available, whether the power is on, whether an instrument is calibrated and whether an image has been taken can be modified.
- Sokoban: A robot must push two boxes to different locations.
  - The locations of the robot and the boxes can be modified.
- Zeno-travel: People travel on aircraft to reach different cities.
  - Which city/aircraft a person is at/in, which city an aircraft is in and an aircraft’s fuel level can be changed.
Appendix D Inaccurate initial state detailed experimental results
As described in Sect. 7.2, experiments were performed in which varying numbers of fluents were set to incorrect values. A breakdown of the results, per domain, for these experiments is provided in Tables 5 and 6.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Harman, H., Simoens, P. Cyclic Action Graphs for goal recognition problems with inaccurately initialised fluents. Knowl Inf Syst 66, 1257–1300 (2024). https://doi.org/10.1007/s10115-023-01976-6