1 Introduction

By observing the behaviour of an agent, artificially intelligent systems can attempt to determine the agent's intentions. Knowledge of an agent's intentions is essential in numerous application areas. These include computer games in which non-playable characters must adapt to players' actions [9]; intelligent user help for human–computer interaction scenarios [25, 26]; offering humans energy saving advice [37]; robot sports playing (e.g. table tennis [63]); inferring (and thus preventing) the intentions of computer network intruders [14, 42]; determining the location a human is navigating to (e.g. for airport security) [38]; and enabling proactive robot assistance [4, 12, 20, 34, 35, 56]. Rather than developing domain-specific intention recognition algorithms, a symbolic representation of the world and agents' actions can be provided as input to domain-independent algorithms [53, 62].

Intention recognition can be split into several categories, namely, activity recognition [18, 36, 60, 65], plan recognition [14, 41, 58], and goal recognition [39, 44, 49, 55, 64]. Our work falls under the category of goal recognition (GR), in which the aim is to label a sequence of observations (e.g. actions) with the goal the observee is attempting to reach. For instance, when provided with a sequence of move actions, GR methods will attempt to select (from a predefined list) which location the agent is intending to reach. For the Kitchen domain by Ramírez and Geffner [49], the sequence of observed actions includes taking different items and using appliances (e.g. a toaster), and the returned classification indicates whether the observee is likely to be making breakfast, dinner or a packed lunch. Goal and plan recognisers operate on discrete observations/actions, and thus, assume that data streams have been preprocessed, e.g. sensor data have been processed by activity recognisers.

Our GR method aims to overcome several challenges. First, the defined initial world state could be inaccurate; for instance, if an item or agent (e.g. cup or human) is occluded, its location is indeterminable, and thus, possibly defined incorrectly. Second, the observed agent could act suboptimally [49]; therefore, all plans (including suboptimal plans) are represented within the underlying structure generated by our approach. Third, actions could be missing from the sequence of observations [46], e.g. due to sensor malfunction or occlusions. Finally, an observation should be rapidly processed, so there is little delay in determining the observee’s goal. The cited GR works have investigated handling suboptimal observation sequences and handling missing observations, but they do not consider inaccurate initial states.

We define the term inaccurate initial state as an initial state containing fluents (i.e. non-static variables) whose value is unknown (i.e. undefined) and/or incorrect (i.e. set to the wrong value). Inaccurate initial states have been handled by task planners [6, 43]. Moreover, GR with probabilistic, partially observable state knowledge and stochastic action outcomes has previously been investigated [29, 50, 68]; however, these systems require the probability of each state and action outcome to be known (and thus defined within the GR problem). GR with incomplete domain models, i.e. problems containing actions with incomplete preconditions and effects, has also been considered [45], but the initial state was assumed to be accurately represented. Our system makes no assumptions about the correctness of the initial value assigned to a fluent.

In this paper, we aim to answer two research questions. (i) Can a structure similar to those created by library-based approaches be generated from a PDDL defined GR problem? (ii) When the initial state is inaccurately defined, how can a goal recognition approach be prevented from suffering a major loss of accuracy?

To answer these questions, we develop a novel technique for transforming GR problems into an Action Graph, a structure inspired by AND/OR trees. Leaf nodes correspond to actions and are connected to their dependencies via operator nodes. Operator nodes include DEP (short for dependencies), ORDERED/AND, UNORDERED/AND and OR nodes. After transforming the action definitions and world model into an Action Graph, the Action Graph’s nodes are labelled with their distance from each hypothesis goal, i.e. each goal the observee could be intending to achieve. Both these processes are performed offline. For each observation, the online process updates the goal probabilities based on either the distance the observed action’s node is from each goal or the change in distance. Our distance measure is based on the number and type of nodes traversed to reach the node in question from an action node that results in the goal state being reached. The goal(s) with the highest probability are returned as the set of candidate, i.e. predicted, goals. An Action Graph does not contain a perfect representation of all plans; as mentioned by Pereira et al. [46], unlike task planning, this is not a requirement of GR. A conceptual overview of our system is provided in Fig. 1.

Fig. 1

Conceptual overview of the goal recognition process described in this paper. As indicated by the grey boxes, our approach contains three main processes: (1) create the Action Graph (see Sect. 4), (2) label the Action Graph’s nodes with how many steps away from each goal state they are (see Sect. 5), and (3) use these labels/distances to update the goals’ probabilities when an action is observed (see Sect. 6)

Our previous work on GR employed an acyclic (rather than cyclic) Action Graph [21], and thus, for many domains did not achieve a high accuracy. Moreover, our previous method cannot handle an inaccurate initial state. Action Graphs have also been applied to goal recognition design, in which the aim is to reduce the number of observations required to determine an agent’s goal [19]. This paper introduces a novel method for inserting the actions into an Action Graph and presents an alternative approach for updating the goals’ probability.

The remainder of this paper is structured as follows. Section 2 presents some background information. A formal definition of our Action Graph structure is provided in Sect. 3, and Sect. 4 describes the algorithm that generates the Action Graph. Section 5 introduces our distance measure and how the nodes are labelled with their distance from each goal. The different goal probability update rules, which are executed when an observation is received, are described in Sect. 6. Our experimental results, discussed in Sect. 7, show that our GR method is unaffected by inaccuracies in the initial state.

2 Background

Symbolic task planning and goal recognition problems are often defined in Planning Domain Definition Language (PDDL) [40], a popular domain-independent language for modelling the behaviour of deterministic agents. A PDDL defined problem includes action definitions, objects, predicates and an initial state; an example of each is provided in Listing 1. Our GR approach transforms a PDDL problem into a multi-valued problem by running the converter of Helmert [22]. The multi-valued problem is then transformed into an Action Graph. The goal of this section is to provide readers unfamiliar with PDDL and goal recognition with the background information required to understand the data provided as input and the notations used. This section first describes why a multi-valued representation is used. The task planning and GR (also known as inverse planning) problem definitions are provided, in the subsections, from a multi-valued problem perspective.

Listing 1: Example PDDL action definition, objects, predicates and initial state (not reproduced here)

To create a concise, grounded representation of a problem, a PDDL defined problem is often converted into a multi-valued representation [22, 23]. This representation uses finite variables rather than Boolean propositions. For example, rather than a move(1_1 1_2) action (which symbolises an agent moving from grid position 1_1 to 1_2) removing the proposition (at 1_1) from the current state and inserting the proposition (at 1_2), a variable, i.e. fluent, that represents the agent’s location is changed from (at 1_1) to (at 1_2). This enables a more concise representation of the problem to be produced, from which the relations between the different propositions can be extracted. Moreover, a (grounded) action is only created, from an action definition, if its static preconditions appear in the PDDL defined initial world state. For example, to create the move(1_1 1_2) action, positions 1_1 and 1_2 must be adjacent. Which locations are adjacent can be statically defined; in other words, no action modifies which locations are adjacent. Further details on the benefits of this representation are given in Helmert [23].
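To make the contrast concrete, the following Python sketch (hypothetical data structures, not the output of the converter) shows the move example in both encodings.

```python
# A minimal sketch (hypothetical data structures, not Helmert's converter output)
# contrasting a propositional state with a multi-valued (finite-domain) state
# for the move(1_1 1_2) example.

# Propositional encoding: each grounded atom is a Boolean.
prop_state = {"(at 1_1)": True, "(at 1_2)": False}

def apply_move_propositional(state, frm, to):
    """Delete (at frm) and add (at to), as a STRIPS-style move action would."""
    new_state = dict(state)
    new_state[f"(at {frm})"] = False
    new_state[f"(at {to})"] = True
    return new_state

# Multi-valued encoding: a single fluent holds the agent's location,
# so mutually exclusive atoms are grouped into one variable.
mv_state = {"agent-location": "1_1"}

def apply_move_multivalued(state, frm, to):
    """Reassign the location fluent from frm to to."""
    assert state["agent-location"] == frm  # precondition
    new_state = dict(state)
    new_state["agent-location"] = to
    return new_state

print(apply_move_propositional(prop_state, "1_1", "1_2"))
print(apply_move_multivalued(mv_state, "1_1", "1_2"))
```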

2.1 Symbolic task planning

In symbolic task planning, a problem contains a single goal state, and task planners, e.g. Fast Downward [22], find the appropriate set of actions, i.e. a task plan, that can transform the initial world state into the desired goal state. Definitions for states, actions and planning problems are provided below.

Definition 1

Planning Problem A planning problem P can be defined as \(P = (F,I,A,G)\), where F is a set of fluents, I is the initial state, G is a goal state, and A is a set of actions [13, 16].

Definition 2

Fluent A fluent (\(f\in F\)) is a state variable.

When assigned a value, a fluent can be represented by a grounded predicate. Grounded predicates are also called atoms. For instance, (at 1_2) is an atom which denotes that the observed agent is at the position 1_2.

Definition 3

State A state contains all fluents, each of which is assigned a value.

The initial state (I) contains all fluents, whereas the goal (G) could be a partial state, containing a subset of fluents. To transition between states, the values of fluents are altered by actions. An action is formally defined as follows:

Definition 4

Action An action (a) is comprised of a name, a set of objects, a set of preconditions (\(a_{pre}\)) and a set of effects (\(a_{eff}\)). Preconditions and effects are composed of a set of valued fluents. Preconditions can contain or and and statements.

Action a is applicable to state s if the state is consistent with the action’s preconditions. Applying action a to state s will result in state \(s'\), where \(a_{eff}\subseteq s'\) and \(\forall (f \in s', f\notin a_{eff}): (f \in s)\).
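A minimal Python sketch of the applicability and state-transition semantics above, assuming preconditions are a simple conjunction of fluent–value pairs (or-preconditions are omitted):

```python
# Minimal sketch of action applicability and application (Definition 4),
# assuming preconditions are a simple conjunction of fluent/value pairs.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    pre: dict = field(default_factory=dict)   # fluent -> required value
    eff: dict = field(default_factory=dict)   # fluent -> resulting value

def applicable(state: dict, action: Action) -> bool:
    """An action is applicable if the state agrees with all its preconditions."""
    return all(state.get(f) == v for f, v in action.pre.items())

def apply(state: dict, action: Action) -> dict:
    """Effects overwrite the affected fluents; all other fluents are unchanged."""
    new_state = dict(state)
    new_state.update(action.eff)
    return new_state

move = Action("move(1_1 1_2)", pre={"agent-location": "1_1"},
              eff={"agent-location": "1_2"})
state = {"agent-location": "1_1", "holding-key": False}
if applicable(state, move):
    state = apply(state, move)
print(state)  # {'agent-location': '1_2', 'holding-key': False}
```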

Definition 5

Planning Problem Solution A solution to a planning problem is a sequence of actions \(\pi = (a_0, a_1, \ldots , a_i)\), with each \(a_j \in A\), such that applying each action in turn starting from state I results in a state (\(s^i\)) that is consistent with the (partial) goal state G, i.e. \(s^i \supseteq G\).

Planners search for the optimal solution to a planning problem. An optimal solution is the solution with the lowest possible cost. In our work, the cost of an action is 1, and thus, the cost of a plan is equivalent to its length.

2.2 Goal recognition

Goal recognition is often viewed as the inverse of planning, as the aim is to label a sequence of observations with the goal the observed agent is attempting to reach. This section provides the formal definition of a GR problem and describes the observation sequences and output of our GR approach.

Definition 6

Goal Recognition Problem A GR problem is defined as \(T = (F,I,A, O,{\mathcal {G}})\), where \({\mathcal {G}}\) is the set of all possible (hypothesis) goals and O is a sequence of observations  [49].

Definition 7

Observations O is a sequence of observed actions (observations), i.e. \(O = (a_1, a_2, \ldots , a_i)\), where each \(a_j \in A\).

A completed sequence of observations with no missing actions can be applied to an initial state I to reach a goal state \(G\in {\mathcal {G}}\). This sequence can also be incomplete, have missing observations and/or be suboptimal. An incomplete sequence of observations contains the first N actions that are required to reach a goal; in other words, the goal has not yet been reached. An action could be missing from anywhere within an (incomplete or complete) sequence of observations. Observations are suboptimal if any number of additional, unnecessary actions have been performed to reach the goal.

GR approaches attempt to select the real goal from the set of hypothesis goals \({\mathcal {G}}\). Our GR approach produces a probability distribution over the hypothesis goals, i.e. \(\sum _{i=1}^{|{\mathcal {G}}|}P({G_i}|O) = 1\). In other words, we aim to find the likelihood of a given observation sequence O under the assumption that the observee is pursuing a goal \(G_i\), i.e. \(P(O|G_i)\). The goal(s) with the highest probability are returned as the set of candidate goals \({\mathcal {C}}\). As goals can be equally probable, there can be multiple candidate goals, i.e. \(|{\mathcal {C}}| \ge 1\). Nevertheless, we assume that there is only a single real goal. Note, our evaluation metrics (see Sect. 7.1.1) take into account that multiple goals could be returned.
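For illustration, candidate-goal selection from the final probability distribution can be sketched as follows (a hypothetical helper; all goals tied for the highest probability are returned):

```python
# Sketch of candidate-goal selection: all goals tied for the highest
# probability are returned, so |C| >= 1.
def candidate_goals(probabilities: dict, tolerance: float = 1e-9) -> set:
    best = max(probabilities.values())
    return {g for g, p in probabilities.items() if abs(p - best) <= tolerance}

print(candidate_goals({"G1": 0.4, "G2": 0.4, "G3": 0.2}))  # {'G1', 'G2'}
```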

3 Action Graph structure and formal definitions

Action Graphs model the possible order constraints between actions by linking actions (dependants) to their dependencies. They are constructed of action nodes and operator nodes, namely, DEP (short for dependencies), ORDERED/AND, UNORDERED/AND and OR nodes. Action nodes are always leaf nodes, and their dependencies are conveyed through their connections (via operator nodes) to other actions. This section defines dependencies, provides a definition of an Action Graph, describes how the Action Graph structure links actions to their dependencies and briefly mentions related structures.

Definition 8

Action’s Dependencies The set of dependencies of action \(a\in {\mathcal {A}}\) is formally defined as: \(D(a) = \{ a' \mid (a'_{eff} \cap a_{pre}) \ne \emptyset \} \).

Definition 9

Action’s Dependant Action a is a dependant of action \(a'\) if \(a' \in D(a)\).

In other words, action \(a'\) is a dependency of action a if at least one effect of \(a'\) fulfils at least one of a’s preconditions, i.e. \(a'\in D(a)\) if \(a'_{eff} \cap a_{pre} \ne \emptyset \). In that case, action a is called the dependant of the dependency \(a'\). The order in which dependencies are likely to be observed can be conveyed by the nodes of an Action Graph.
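A minimal sketch of Definition 8, assuming each action is represented as a map of valued preconditions and effects (a hypothetical representation, not the paper's implementation):

```python
# Minimal sketch of Definition 8 (an action's dependencies), assuming each
# action is a dict with 'pre' and 'eff' maps from fluents to values.
def dependencies(action, all_actions):
    """Return the actions whose effects fulfil at least one precondition of `action`."""
    return [a2 for a2 in all_actions
            if any(a2["eff"].get(f) == v for f, v in action["pre"].items())]

actions = [
    {"name": "move(1_0 2_0)", "pre": {"loc": "1_0"}, "eff": {"loc": "2_0"}},
    {"name": "move(2_1 2_0)", "pre": {"loc": "2_1"}, "eff": {"loc": "2_0"}},
    {"name": "move(2_0 1_0)", "pre": {"loc": "2_0"}, "eff": {"loc": "1_0"}},
]
deps = dependencies(actions[2], actions)
print([a["name"] for a in deps])  # ['move(1_0 2_0)', 'move(2_1 2_0)']
```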

Definition 10

Action Graph \(AG=(N^O,N^A,E)\), where \(N^O\) is a set of operator nodes, \(N^A\) are action nodes and E are edges.\(^{2}\) Operator nodes are of type DEP, UNORDERED/AND, OR and ORDERED/AND nodes, i.e. \(N^O=(N^{OR}, N^{DEP}, N^{O\text {-}AND}, N^{U\text {-}AND})\). The root node is of type OR. All nodes (except the root) have a set of parents. All operator nodes (\(N^O\)) have a set of children, and those children can be operator nodes or action nodes. \(N^A\) are leaf nodes.

The operator node types are described in the list below and depicted in Fig. 2. The precedes operator \(\prec \) denotes that the actions on the left precede (are dependencies of) the action on the right. Standard maths notation is used to denote if a set of actions is unordered or ordered, that is, curly brackets denote the actions are unordered and angle brackets show the actions are ordered [52]. Moreover, rather than writing or constraints as two statements, e.g. \(a4 \prec a1 \text { OR } a5 \prec a1\), a shortened form is given, e.g. \(or(a4,a5) \prec a1\).

Fig. 2

The different types of order constraints on actions that achieve a1’s preconditions, i.e. \(\{a2,a3,or(a4,a5),\langle a6,a7\rangle \} \prec a1\). Solid arrows point to the dependant, and dashed arrows point to the dependencies. UNORDERED-AND is shortened to U-AND and ORDERED-AND to O-AND

  • DEP nodes indicate that an action’s dependencies are performed before the action itself, e.g. \( D(a1) \prec a1\). The second (i.e. last) child of a DEP node is the action node itself; the first child could be of any type.

  • UNORDERED-AND nodes denote that different dependencies set different preconditions (and there are no order constraints on the dependencies), e.g. if \(a2 \in D(a1)\), \(a3 \in D(a1)\) and \(\left( a1_{pre}\cap a2_{eff}\right) \ne \left( a1_{pre}\cap a3_{eff} \right) \), then \( (a2 \wedge a3) \prec a1\).

  • OR nodes express the multiple (alternative) ways a precondition can be reached, e.g. if \(a4 \in D(a1)\), \(a5 \in D(a1)\) and \(\left( a1_{pre} \cap a4_{eff}\right) = \left( a1_{pre} \cap a5_{eff}\right) \) then \(or(a4,a5) \prec a1\).

  • ORDERED-AND nodes indicate there are order constraints between an action’s dependencies. Such constraints are required when executing one dependency could unset the preconditions of another. For example, if \(a6 \in D(a1)\), \(a7 \in D(a1)\) and both \(a6_{pre}\) and \(a7_{eff}\) contain the same fluent but with different values, then a6 is performed before a7, i.e. \(\langle a6,a7\rangle \prec a1\). This is because the effects (fluents) of a7 are preconditions of a1 but to perform a6 those fluents must be assigned a different value. If these constraints are cyclic, e.g. \( \{\langle a6, a7 \rangle , \langle a7, a6 \rangle \} \prec a1\), then the constraint is ignored; in other words, the dependencies are considered to be unordered.

Dependencies are actions, and thus, they can also have dependencies. For example, a8 could depend on a9, which depends on a10, i.e. \(a10 \prec a9 \prec a8\). If an action has dependencies, its only parent is the DEP node linking it to its dependencies (and dependants). Thus, continuing with the example, the left child of a8’s parent DEP node is a9’s parent DEP node.
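The node types above can be sketched with the following hypothetical Python classes; this is an illustrative data-structure sketch rather than the authors' implementation.

```python
# A minimal sketch (hypothetical class names) of the Action Graph node types
# from Definition 10. Action nodes are leaves; operator nodes (OR, DEP,
# ORDERED-AND, UNORDERED-AND) carry ordered lists of children.
class Node:
    def __init__(self):
        self.parents = []

class ActionNode(Node):
    def __init__(self, action_name):
        super().__init__()
        self.action = action_name
        # per-goal distances are assigned later (see Sect. 5)
        self.distances = {}

class OperatorNode(Node):
    def __init__(self, kind, children=()):
        super().__init__()
        assert kind in {"OR", "DEP", "ORDERED-AND", "UNORDERED-AND"}
        self.kind = kind
        self.children = []
        for child in children:
            self.add_child(child)

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

# Example: or(a4, a5) ≺ a1 becomes DEP( OR(a4, a5), a1 ).
a1, a4, a5 = ActionNode("a1"), ActionNode("a4"), ActionNode("a5")
dep_a1 = OperatorNode("DEP", [OperatorNode("OR", [a4, a5]), a1])
root = OperatorNode("OR", [dep_a1])
print(dep_a1.kind, [getattr(c, "kind", getattr(c, "action", None)) for c in dep_a1.children])  # DEP ['OR', 'a1']
```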

Cyclic dependencies can also occur, e.g. a1 could depend on a2 which depends on a1 (i.e. \(...a2 \prec a1 \prec a2...\)). This causes cycles to appear within the Action Graph. These cycles can also be caused by indirect dependencies, e.g. \(...a2 \prec a3 \prec a1 \prec a2...\) in which a1 is a dependency of a2 and a2 is an indirect dependency of a1.

Our Action Graph structure does not contain states. An Action Graph only captures information about which actions fulfil each action’s preconditions. This is similar to the structures of library-based intention recognition and planning approaches [24, 31, 54, 59]. For instance, Goal-Plan trees contain (sub)goals with plans that can contain subgoals [54, 59]. Goal-Plan trees do not contain knowledge of the environment’s current state. In particular, the Action Graph structure was inspired by the work of Holtzen et al. [24], who represented a library of plans as AND/OR trees. Differently, our approach takes PDDL rather than a library of plans as input and enables suboptimal and cyclic dependencies to be represented.

Moreover, the definition of a dependency is similar to causal links from Partial-Order Causal Link (POCL) planning [17]. Like dependencies, a causal link expresses that an action’s preconditions are contained within another action’s effects. Differently, POCL structures represent complete plans (to reach a single goal state from the initial state), use edges rather than nodes to denote the order constraints and can contain ungrounded actions. As GR does not require a completely valid plan, Action Graphs are simpler to construct than POCL structures.

4 Cyclic Action Graph creation

Our goal recognition method creates an Action Graph, labels the nodes with their distance from each goal, then for each observation updates the goals’ probability. This section describes how an Action Graph is generated from a GR problem. The modifications to the preprocessing step, which transforms a PDDL problem into a multi-valued problem, are described. Subsequently, the action insertion algorithm is detailed, followed by an example.

4.1 Preprocessing: multi-valued problem generation

This paper only provides the details of the transformation, from a PDDL defined problem to a multi-valued problem, that are key to understanding our approach and that differ from [22]. A single goal statement is required by the converter of Helmert [22]; therefore, prior to calling the converter, a goal statement is created by placing all hypothesis goals (\({\mathcal {G}}\)) into an or statement, i.e. \(G=or(G_1, G_2,...,G_{|{\mathcal {G}}|} \in {\mathcal {G}})\).

The converter’s parameter, to keep all unreachable states, is set to true and, after parsing the PDDL, all groundings of the actions’ effects are inserted into the initial state (I). This forces actions, and all fluents’ values, to be inserted into the resulting representation even if the actions’ fluent preconditions, and thus, possibly the goals, are unreachable from the defined (original) initial state. For instance, if an agent’s location is missing from I, e.g. because it is unknown, and no transition between an unknown and known location exists, then move actions would not be inserted as their preconditions can never be met. To prevent this, all (at ?location) groundings are inserted into I. Additional static atoms are not inserted into I; thus, continuing with our example, move(1_1 1_2) is only appended to the set of actions A if (adjacent 1_1 1_2) is declared in the defined initial state.

4.2 Inserting actions into an Action Graph

An Action Graph is initialised with an OR node as the root; then, each action (\(a \in A\)) is inserted into the graph in turn by connecting it to its dependencies. Actions can be inserted in any order. Finally, the graph is adjusted so only the Goal Actions’ parent DEP nodes are connected to the root. This process is detailed below, and the pseudocode is provided in Appendix A.

If an action has no dependencies, because either there are no actions that fulfil its preconditions or it has no preconditions, it is simply appended to the root’s children. In all other cases, the root is linked to a new DEP node. The DEP node’s two children are set to an UNORDERED-AND node, followed by the action node itself. If this action node was already created, because it is a dependency of an already processed action, the action node’s prior parents are moved to be the DEP node’s parents.

The UNORDERED/AND node’s children are set to one or more of the following: the action nodes (or parents) of the dependencies, OR nodes if there are multiple ways in which a precondition can be met, and/or ORDERED/AND nodes. OR nodes are inserted by setting their children to the action nodes of the dependencies that set the same precondition(s). If a dependency has dependencies, the corresponding child becomes the dependency’s parent. This is because actions that have dependencies can only ever have a single parent, of type DEP. Note: if an operator would only have one child, the operator node is not inserted.

ORDERED/AND nodes indicate that there are order constraints on the dependencies themselves. This is detected by checking if a fluent has a value in a dependency’s preconditions which is different in another dependency’s effects (see Sect. 3); and thus, the former dependency must be performed first. If this constraint is bidirectional/cyclic, the ORDERED/AND node is not inserted; instead, the dependencies become the children of the UNORDERED/AND node. Only the preconditions/effects of direct dependencies are checked; the algorithm does not check if a dependency’s dependency could undo/unset another dependency’s precondition. Performing this check would be computationally expensive, and a perfect representation of the plans to reach the actions’ effects is not required.

Fig. 3

Example problem from the Easy-IPC-Grid domain. Before an agent can move to position (2,1), that position must be unlocked. The lettered arrows indicate the actions inserted to produce the sub-figures of Fig. 4. Based on the GR Easy-IPC-Grid problems developed by Ramírez and Geffner [48], which are in turn based on a domain from the official IPC\(^{1}\)

The ORDERED/AND node’s children could also be of type UNORDERED/AND or OR. When multiple dependencies could unset a dependency’s preconditions, an UNORDERED/AND is inserted as the child of the ORDERED/AND node’s right branch. Moreover, the dependencies that set the same precondition(s) of the dependant are grouped together as the children of an OR node. Therefore, if one of these dependencies is affected by (or affects) another dependency, the OR node becomes the ORDERED/AND (or UNORDERED/AND) node’s corresponding child. Without this feature, the graph’s structure would become more complex, i.e. be of greater depth and/or breadth.
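The core of the insertion logic described in this subsection can be sketched as follows. This is a deliberately simplified, self-contained illustration (hypothetical names; order-constraint detection, parent rewiring and graph bookkeeping are omitted); the full pseudocode is given in Appendix A.

```python
# Simplified, self-contained sketch of linking one action to its dependencies.
# Hypothetical helper names; ORDERED-AND detection and parent rewiring are
# omitted -- see Appendix A of the paper for the full version.
class Node:
    def __init__(self, kind, name=None, children=()):
        self.kind, self.name, self.children = kind, name, list(children)

def link_dependencies(action_node, deps_by_precondition):
    """deps_by_precondition maps each precondition atom to the action nodes
    that achieve it. Alternatives become OR children; different preconditions
    are combined under an UNORDERED-AND; the DEP node pairs the dependency
    structure with the action itself."""
    group_nodes = []
    for achievers in deps_by_precondition.values():
        # Several actions achieving the same precondition -> OR node.
        group_nodes.append(achievers[0] if len(achievers) == 1
                           else Node("OR", children=achievers))
    if not group_nodes:                       # no dependencies at all
        return action_node
    left = (group_nodes[0] if len(group_nodes) == 1
            else Node("UNORDERED-AND", children=group_nodes))
    return Node("DEP", children=[left, action_node])

# Example from Fig. 4a: or(move(1_0 2_0), move(2_1 2_0)) ≺ move(2_0 1_0)
a = Node("ACTION", "move(2_0 1_0)")
deps = {"(at 2_0)": [Node("ACTION", "move(1_0 2_0)"),
                     Node("ACTION", "move(2_1 2_0)")]}
dep_node = link_dependencies(a, deps)
print(dep_node.kind, [c.kind for c in dep_node.children])  # DEP ['OR', 'ACTION']
```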

4.3 Identifying goal actions

An action is a Goal Action if its effects fulfil a goal’s atoms, i.e. \(a_{eff} \supseteq G\), where \(G \in {\mathcal {G}}\). After all actions have been inserted, the root node’s children are modified so that only the Goal Actions are attached to the root. If multiple actions are required to fulfil a goal, e.g. \((a1_{eff} \cup a2_{eff}) \supseteq G\), then an auxiliary Goal Action (\(a^x\)) is created. Auxiliary Goal Actions are linked to the multiple actions that fulfil the goal via a DEP node, e.g. \(\{a1,a2\} \prec a^x\). They are connected to their dependencies, i.e. the goal’s dependencies, in the same way as all other actions are.

Identifying and creating Goal Actions simplifies traversing the graph to find all nodes belonging to a single goal. All children, including indirect children, of a Goal Action’s parent DEP node could appear in a plan (from any initial state) to reach the goal the Goal Action fulfils. Therefore, the graph can be traversed, in a depth-first or breadth-first manner, to find all the nodes, and thus actions, belonging to a goal.
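A minimal sketch of the Goal Action test, assuming goals and effects are represented as maps from fluents to values:

```python
# Sketch of Goal Action identification: an action is a Goal Action for G if its
# effects fulfil all of G's atoms (both represented here as fluent->value maps).
def is_goal_action(action_effects: dict, goal: dict) -> bool:
    return all(action_effects.get(f) == v for f, v in goal.items())

goal = {"agent-location": "1_0"}
print(is_goal_action({"agent-location": "1_0"}, goal))                  # True
print(is_goal_action({"agent-location": "2_0", "has-key": True}, goal)) # False
```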

Fig. 4

Example of the steps taken to insert four actions, and their dependencies, into an Action Graph. This example has been simplified, i.e. in the original Easy-IPC-Grid problems, keys have different shapes. Solid pale arrows show the root node’s connections (prior to the Goal Actions being discovered)

4.4 Example

An example is provided, in this section, to demonstrate how our creation algorithm works for the grid-based navigation problem depicted in Fig. 3. Figure 4 shows the Action Graph after each action, and its dependencies, have been inserted. The four insertions, detailed below, were selected to show the different structural features of an Action Graph. A figure with all actions inserted into the graph would be unreadable, and thus is not provided.

The example starts by inserting the goal action, move(2_0 1_0). The preconditions of move(2_0 1_0) are met by executing one of two possible actions, i.e. \(or(\texttt {{move(1\_0 2\_0)}}, \texttt {{move(2\_1 2\_0)}}) \prec \texttt {{move(2\_0 1\_0)}}\); therefore, it is inserted by connecting it to its dependencies via a DEP node and an OR node (see Fig. 4a). Likewise, when move(1_0 0_0), whose preconditions are reached by one of three actions (i.e. \(or(\texttt {{move(2\_0 1\_0)}}, \texttt {{move(1\_1 1\_0)}}, \texttt {{move}}\texttt {{(0\_0 1\_0)}}) \prec \texttt {{move(1\_0 0\_0)}}\)), is inserted, an OR node is created. As one of its dependencies has already been inserted, the appropriate child of the OR node is set to move(2_0 1_0)’s parent DEP node. This is shown in Fig. 4b.

Inserting move(0_0 1_0) causes the graph to become cyclic (Fig. 4c) because it depends on one of its dependants, i.e. \(\texttt {{move(1\_0 0\_0)}} \prec \texttt {{move(0\_0 1\_0)}}\). Figure 4d displays the graph after move(1_1 2_1) has been inserted. This action requires location 2_1 to be unlocked with key1, and thus, its dependencies include unlock actions. As unlock actions’ preconditions contain the location of the agent, they must be performed prior to the move actions required by the dependant. Therefore, an ORDERED/AND node is created during the insertion of move(1_1 2_1), i.e. \( \langle or(\texttt {{unlock(2\_1 2\_0 key1)}}, \texttt {{unlock(2\_1 1\_1 key1)}}), or(\texttt {{move(0\_1 1\_1)}}, \texttt {{move(2\_1 1\_1)}}, \texttt {{move(1\_0 1\_1)}})\rangle \prec \texttt {{move(1\_1 2\_1)}}\).

5 Node distance initialisation

Each node has a set of distances associated with it, which indicate how far the node is from each goal, i.e. the number of DEP and ORDERED/AND nodes that must be traversed to get from the Goal Action’s parent to the node in question. These distances are set by means of a breadth-first traversal (BFT). A BFT was implemented because it results in the nodes being visited in the order they are likely to be performed. An explanation of this algorithm is provided, followed by an example. The pseudocode can be found in Appendix B.

5.1 Node value initialisation algorithm

During the BFTs, which start from each Goal Action’s parent node, the current node’s distance is set, the count (i.e. distance measure) is increased if the node is of type DEP or ORDERED/AND, and each of the node’s children is pushed onto the BFT-queue. This distance measure provides an indication of how far each node is from each goal whilst attempting to minimise favouring shorter plans (see Sect. 6 for the calculations of the goal probabilities). The same node could be visited multiple times during a BFT; however, if the current distance/count is greater than or equal to the node’s already assigned distance, it is not reprocessed. As well as allowing the shortest distance to be assigned to each node, this prevents an endless loop from occurring when two actions depend on each other (e.g. \(...a2 \prec a1 \prec a2...\)).
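A simplified sketch of this labelling traversal for a single goal is given below. It assumes one distance per node and goal; the refinement that stores multiple distances per traversed ORDERED/AND node, described next, is omitted.

```python
# Simplified sketch of the distance-labelling BFT for one goal. The count is
# only incremented when a DEP or ORDERED-AND node is traversed, and a node is
# not reprocessed unless a strictly smaller distance is found (which also
# terminates cycles). The per-ORDERED-AND multiple-distance refinement
# described in the text is omitted for brevity.
from collections import deque

def label_distances(goal_dep_node, goal_id, distances):
    """distances: dict mapping (node id, goal_id) -> shortest distance found."""
    queue = deque([(goal_dep_node, 0)])
    while queue:
        node, count = queue.popleft()
        key = (id(node), goal_id)
        if key in distances and distances[key] <= count:
            continue                      # already reached via a shorter path
        distances[key] = count
        step = 1 if node.kind in {"DEP", "ORDERED-AND"} else 0
        for child in node.children:
            queue.append((child, count + step))
    return distances

class N:
    def __init__(self, kind, children=()):
        self.kind, self.children = kind, children

# or(a4, a5) ≺ a1, i.e. DEP( OR(a4, a5), a1 )
a1, a4, a5 = N("ACTION"), N("ACTION"), N("ACTION")
goal_dep = N("DEP", (N("OR", (a4, a5)), a1))
d = label_distances(goal_dep, "G1", {})
print(d[(id(a1), "G1")], d[(id(a4), "G1")])  # 1 1
```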

As an action could appear in a plan multiple times, some nodes require multiple distances for the same goal; this is the case for the descendants of ORDERED/AND nodes’ right branch. Therefore, a node contains a map for each goal, from the last traversed ORDERED/AND to the node’s distance from the goal via the ORDERED/AND node. When the right branch of the ORDERED/AND node has been fully observed, the distance of the node, returned when calling a get distance method, will be the distance associated with that ORDERED/AND node. As the initial state is unknown and plans are not perfectly represented, the distances assigned to the left branch of ORDERED/AND nodes are not based on the depth of the right branch.

Labelling nodes with multiple distances per goal increases the worst case time complexity from \(O(n^2)\) to \(O(n^3)\), with respect to the number of actions. This is because each action in the graph could be a dependency of all other actions; thus, for all actions all other actions could be visited. When labelling the nodes with multiple values this process could be repeated n times. Therefore, to help minimise the number of nodes the BFTs traverse, when an UNORDERED/AND node is reached, its children’s (including indirect children’s) distance is not associated with the prior ORDERED/AND node(s). Developing this component greatly reduced (\(\approx \)halved) the run time of our experiments (Sect. 7.3.2) and had negligible impact on the accuracy of our approach.

The offline component of our system finishes by setting the prior probability of each goal. We chose to use a uniform prior probability as, since no actions have been observed, all goals are assumed to be equally likely.

5.2 Example

The Action Graph depicted in Fig. 5 shows the resulting action nodes’ distance from each goal for a simplified version of the Kitchen domain by Ramírez and Geffner [48]. In this example, there are two Goal Action nodes, namely, pack-lunch() and make-dinner(). By executing the BFTs described above, each node is labelled with its distance from each goal. This example will be used in Sect. 6, to demonstrate how an observation affects the goals’ probability, and thus, why this node value initialisation procedure has been implemented.

Fig. 5

Example of an Action Graph with the action nodes labelled with their distance from each goal. G1 represents the goal (made_lunch), which is reached by performing the pack-lunch() action and G2 represents (made_dinner), which is reached by performing the make-dinner() action. This example is based on the Kitchen domain developed by Ramírez and Geffner [49] based on the work of Wu et al. [65]. To make this figure readable, it has been simplified, i.e. many nodes and edges are not included

6 Updating the goal probabilities

When an action is observed, the probability associated with each goal is updated based on either its distance from the observed action or the difference between its distance from the prior observation and the current observation. These two update rules are described in turn along with their advantages and disadvantages. The experiments section presents results for both these update rules separately, as well as combined. The pseudocode, for the rules combined, is provided in Algorithm 1.

Algorithm 1: Combined goal probability update rules (pseudocode not reproduced here)

6.1 Update rule 1: distance from observed action

Each goal’s probability is updated based on how close the goal is to the observed action and how unique the observation is to the goal. The probabilities of the goals closest to the observation are increased, whilst those furthest from the observation are decreased. If an observation only belongs to a single goal, that goal’s probability is increased and all other probabilities are decreased. This is performed by multiplying each goal’s probability by its distance from the observed action’s node divided by the sum of all goals’ distances (lines 7-10); then normalising the resulting values (line 12). Note, if the observation is not within a plan to reach the goal G, 0 is returned by the getDisFromGoal method (line 8) so that \(c(G)=0\) and, so long as another goal’s plan contains the action, its probability is reduced.

For the example shown in Fig. 5, there are two goals, both with a prior probability of 0.5. When the take(plate) action is observed, the resulting probabilities are unaltered as its node’s distance to each goal is equal. More nodes must be traversed to reach take(plate) from pack-lunch() than from make-dinner(). Nevertheless, the goal with a shorter plan was not favoured as the distance counter (see Sect. 5) was only increased when a DEP or ORDERED-AND node was traversed. If take(knife) is observed, the probability of making a packed lunch is increased, i.e. \(P(\texttt {{(made-lunch)}})=0.67\) and \(P(\texttt {{(make-dinner)}})=0.33\), as the observed action is unique to this goal.

The main disadvantage of this approach is that the probabilities of goals with shorter, strongly ordered, plans are increased more than those with longer plans. Therefore, the list of returned candidate goals \({\mathcal {C}}\) often contains the goal(s) with a shorter plan. For instance, if an incomplete sequence of observations contains actions that approach both G1 and G2, whichever of these two goals has the shorter plan will be returned as a candidate goal; the other will not be. The subsequent update rule aims to mitigate this disadvantage.

6.2 Update rule 2: change in distance from the observed actions

If the previous observation (\(o^{t-1}\)) and the current observation (\(o^t\)) are connected via a DEP or ORDERED-AND node, the goal probabilities are updated based on the change in distance, i.e. the difference between the goal’s distance from the previous and current observations (lines 2-5 of Algorithm 1). To check if the observations are connected, an upwards traversal (in a depth-first manner) is performed, starting from the action node of \(o^{t-1}\), to find a DEP or ORDERED-AND node whose right branch’s child is the action node of \(o^t\).

If the list of observations is not missing any actions, the change in distance will always be 1, 0 or -1. As an observation could be missed (e.g. due to sensor failure), our algorithm needs to account for the difference being within a wider range of values. A negative difference indicates the observee moved further from the goal, whereas a positive difference indicates they moved closer. The sigmoid function converts the difference into a value between 0 and 1 (i.e. \(\sigma \) from line 3); the goal’s value is multiplied by this (line 4) and then normalised (line 12). If either observation does not belong to the goal, the value of v(G) is equivalent to setting the result of the sigmoid function to 0; in other words, the difference is \(-\infty \). This update rule results in the probability of the goal the observee is moving towards at the highest rate being increased the most.

When this rule is used independently from the first rule, if the previous observation is null, or the current and previous observations are not connected via a DEP or ORDERED/AND node (as detailed above), then \(c(G)=0.5\) for the goals dependent (or indirectly dependent) on the current observation and \(c(G)=0\) for all other goals. This prevents the goals that have shorter plans from being favoured; however, goals for which the observation appears in a (very) suboptimal plan are treated equally to those for which an optimal plan contains the observation.

In the example depicted in Fig. 6, if the observee moves from position 2_1 to 1_1 then to 0_1, the goal probabilities remain equal. As the distance to both goals is reduced at the same rate, the real goal is indiscernible. If the observee were to move vertically, and thus, step towards one goal (and away from the other), the corresponding goal’s probability is increased, e.g. observing move(2_1 1_1) then move(1_1 1_0) results in \(P(G1)=0.58\) and \(P(G2)=0.42\).

Fig. 6

Action nodes’ distance from each goal, represented on a depiction of a grid-based navigation environment. G1 and G2 are goals and arrows represent move actions

6.3 Processes common to update rules 1 and 2

In both update rules, \(1+c(G)\) is calculated, rather than just c(G), so that a goal’s probability is never set to 0. If the probability of a goal were to be set to 0, it cannot be increased; thus, the heuristic would not be able to recover from receiving an incorrect (noisy) observation. These update rules, along with the graph’s structure, enable our system to handle noisy observations as well as suboptimal plans and missing observations.

When an action has been observed, the record of which operator nodes have been completed is updated by traversing up the graph, in a depth-first manner, from the observed action’s node (line 13). OR nodes are set as completed if one of their children has been completed, DEP nodes are complete if the child connected to their right branch has been observed and UNORDERED/AND nodes are set as completed if all their children are. If a node is not set to completed, its parents are not traversed. When an ORDERED/AND node’s left branch has been completed, the nodes attached to its right branch are informed, so that their distance associated with that ORDERED/AND node is used when the next observation is received (as described in Sect. 5).
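The update machinery common to both rules can be sketched as follows. The closeness scores are simplified readings of the two rules (the exact values come from the Action Graph distances and Algorithm 1), so this is an illustrative sketch rather than the paper's implementation; under these assumptions it reproduces the worked examples from Sects. 6.1 and 6.2.

```python
# Sketch of the goal probability update (Sect. 6). Both rules multiply each
# goal's probability by 1 + c(G) -- so a probability is never driven to 0 --
# and then renormalise. The closeness scores below are simplified readings of
# the two rules; the exact values come from the Action Graph distances and
# Algorithm 1.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def update(probabilities: dict, closeness: dict) -> dict:
    unnormalised = {g: p * (1.0 + closeness.get(g, 0.0))
                    for g, p in probabilities.items()}
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}

def closeness_rule1(node_distances: dict) -> dict:
    """Rule 1: each goal's score is the observed node's distance for that goal
    divided by the sum over all goals (0 when the goal's plans do not contain
    the observation)."""
    total = sum(node_distances.values())
    return {g: (d / total if total else 0.0) for g, d in node_distances.items()}

def closeness_rule2(prev_dist: dict, cur_dist: dict) -> dict:
    """Rule 2: sigmoid of the change in distance (positive = moved closer);
    0 if either observation does not belong to the goal (distance None)."""
    return {g: (sigmoid(prev_dist[g] - cur_dist[g])
                if prev_dist.get(g) is not None and cur_dist.get(g) is not None
                else 0.0)
            for g in cur_dist}

priors = {"G1": 0.5, "G2": 0.5}
# Rule 1, Kitchen example (Sect. 6.1): take(knife) appears only in G1's plans.
print(update(priors, closeness_rule1({"G1": 2, "G2": 0})))   # ~{G1: 0.67, G2: 0.33}
# Rule 2, grid example (Sect. 6.2): one step towards G1 and away from G2.
print(update(priors, closeness_rule2({"G1": 2, "G2": 2}, {"G1": 1, "G2": 3})))
# ~{G1: 0.58, G2: 0.42}
```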

7 Experiments

Through experiments we aim to demonstrate the accuracy of our GR approach, after 10, 30, 50, 70 and 100% of actions in a plan have been observed, on 15 different domains. This section describes the evaluation metrics, followed by the setup and results of the different experiments. A comparison between our different update rules and the goal completion heuristic, namely, h\(_{\text {gc}}\), by Pereira et al. [44, 46] is provided, on problems for which differing percentages of fluents have been set to incorrect values. Our method is then compared to h\(_{\text {gc}}\) on a dataset containing GR problems with a known, and thus, correctly defined, initial world state.

Pereira et al. [44, 46] recently improved the accuracy and computational time of GR by finding landmarks, i.e. states that must be reached to achieve a particular goal. After processing the observations, the resulting value of each goal (\(G \in {\mathcal {G}}\)) is based on the percentage of its landmarks that have been completed. h\(_{\text {gc}}\) takes a threshold value as a parameter. Any goals whose value is greater than or equal to the most likely goal’s value minus the threshold are included in \({\mathcal {C}}\). When the threshold is 0.0, like our approach, only the most likely goal(s) are included in \({\mathcal {C}}\). Therefore, we present the results of their approach for 0.0 as the threshold. The compiled version of h\(_{\text {gc}}\), provided by Pereira et al. [44], was run during the experiments.\(^{3}\) We provide a detailed comparison with h\(_{\text {gc}}\) as it has been very recently developed and was shown to outperform alternative methods; other approaches to GR will be discussed in the related work section (Sect. 8).

The dataset created by Ramírez and Geffner [49] and Pereira et al. [44]\(^{4}\) forms the basis for our experiments. Details on generating the lists of observations and the inaccurate initial states are provided in the setup sections, specific to each experiment. A brief description of each domain is supplied in Appendix C. Experiments were run on a server with 16GB of RAM and an Intel Xeon 3.10GHz processor.

7.1 Evaluation metrics

Our approach is evaluated on the number of returned candidate goals (i.e. \(|{\mathcal {C}}|\)) and standard classification metrics, namely, accuracy (sometimes referred to as quality), precision, recall and F1-Score  [11, 27]. A definition for each of these is provided below. Subsequently, performance profiles, which are provided to show a comparison of the approaches’ run-times, are introduced.

7.1.1 Classification metrics applied to goal recognition

Accuracy (\(Q_{P,S}\)), precision (\(M_{P,S}\)), recall (\(R_{P,S}\)) and F1-Score (\(F1_{P,S}\)) are provided in Eqs. 1, 2, 3 and 4, respectively. The definitions of TP, FP, FN and TN are provided, from a GR perspective, in Table 1. In these definitions, \(G_P\) is the actual goal (ground truth) for problem P, \({\mathcal {C}}_{P,S}\) is the set of candidate goals returned by solution/approach S and \({\mathcal {G}}_{P}\) is the set of hypothesis goals. TP is 1 if the true goal is in the set of candidates or 0 if it is not; FN is the inverse of TP; FP is the number of returned candidates that are not the real goal, and TN is the number of goals correctly identified as not the real goal. For each metric, the average over all problems per domain is displayed in the results.

$$Q_{P,S} = \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} \qquad (1)$$

$$M_{P,S} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \qquad (2)$$

$$R_{P,S} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \qquad (3)$$

$$F1_{P,S} = \begin{cases} 2 \cdot \frac{M_{P,S} \cdot R_{P,S}}{M_{P,S}+R_{P,S}}, & \text{if } G_P \in \mathcal{C}_{P,S} \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
Table 1 Definitions of True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) results of solution/approach (S) on a goal recognition problem (P)

Prior goal recognition papers [8, 44, 49] defined the accuracy/quality as the number of times the actual goal appeared in the set of candidate goals, i.e. they did not take \(|{\mathcal {C}}|\) into consideration when calculating accuracy. This resulted in approaches being reported as 100% accurate, even when more than one candidate goal was returned. In our paper, this is equivalent to recall (\(R_{P,S}\)). By using the definitions provided in this paper, an approach can only have an accuracy of 1 (i.e. 100%) if it always returns one candidate goal, i.e. the real goal.
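For illustration, the per-problem metrics can be computed from the candidate set, the real goal and the hypothesis goals as sketched below (a hypothetical helper following Table 1 and Eqs. 1–4):

```python
# Sketch of the GR classification metrics (Eqs. 1-4) for a single problem,
# computed from the candidate set C, the real goal, and the hypothesis goals G.
def gr_metrics(candidates: set, real_goal: str, hypothesis_goals: set) -> dict:
    tp = 1 if real_goal in candidates else 0
    fn = 1 - tp
    fp = len(candidates - {real_goal})
    tn = len(hypothesis_goals - candidates - {real_goal})
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if tp and (precision + recall) else 0.0)
    return {"Q": accuracy, "M": precision, "R": recall, "F1": f1}

# Two candidates returned, one of which is the real goal, out of four hypotheses:
print(gr_metrics({"G1", "G2"}, "G1", {"G1", "G2", "G3", "G4"}))
```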

7.1.2 Performance profiles

The computation times (T) are presented in performance profiles, as suggested by Dolan and Moré [7]. This enables the results to be presented in a more readable format, and all datasets can be grouped into a single result to prevent a small number of problems from dominating the discussion. To produce the performance profile of an approach (\(S\in {\mathcal {S}}\)), i.e. of our Action Graph approach or of h\(_{\text {gc}}\), the ratio between its run-time (\(T_{P,S}\)) and the quickest run-time for a problem (\(P \in {\mathcal {P}}\)) is calculated, as shown in Eq. 5. Equation 6 calculates the percentage of problems an approach solved when the ratio is less than a given threshold, \(\tau \). When \(\tau =0\), the resulting \(P_{S}(\tau )\) of an approach is the percentage of problems it solved quicker than the other approach. How much \(\tau \) must be increased for 100% of problems to be solved depends on how far off the best approach that approach is.

$$\Gamma^T_{P,S} = \frac{T_{P,S}}{\min ( T_{P,S} : S \in \mathcal{S} )} \qquad (5)$$

$$P_{S}(\tau ) = \frac{1}{|\mathcal{P}|} \left| \{ P \in \mathcal{P} : \Gamma^T_{P,S} \le \tau \} \right| \qquad (6)$$
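A small sketch of how Eqs. 5 and 6 could be computed from recorded run-times (hypothetical data layout; not the evaluation scripts used in the paper):

```python
# Sketch of the performance-profile computation (Eqs. 5 and 6): ratios of each
# approach's run-time to the fastest run-time per problem, then the fraction of
# problems solved within a ratio threshold tau.
def performance_profile(times: dict, taus):
    """times: {approach: [run-time per problem]}; all lists share problem order."""
    approaches = list(times)
    n_problems = len(next(iter(times.values())))
    best = [min(times[s][p] for s in approaches) for p in range(n_problems)]
    ratios = {s: [times[s][p] / best[p] for p in range(n_problems)]
              for s in approaches}
    return {s: {tau: sum(r <= tau for r in ratios[s]) / n_problems for tau in taus}
            for s in approaches}

times = {"ActionGraph": [0.5, 2.0, 1.0], "hgc": [1.0, 1.0, 4.0]}
print(performance_profile(times, taus=[1.0, 2.0, 4.0]))
```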

7.2 Goal recognition with an inaccurate initial state

The main aim of our approach is to be able to perform GR when the initial state of the environment is defined inaccurately. A fluent’s value could be incorrect if it is unknown, and thus, incorrectly guessed, or an error has been made while determining the environment’s state. Therefore, a dataset containing differing percentages, i.e. 10, 20, 40, 60, 80 and 100%, of fluents set to incorrect values was produced. How this dataset was generated is discussed, followed by the results produced by our Action Graph approach and h\(_{\text {gc}}\).

7.2.1 Setup

A dataset containing problems with varying amounts of fluents set to incorrect values was generated from the dataset containing the first N % of observations, i.e. the dataset used in the experiments of Sect. 7.3.3. For each problem contained in that dataset, 10, 20, 40, 60, 80 and 100% of fluents were chosen at random and their values set to a randomly selected incorrect value. As there are elements of randomness, 5 problems were created for each percentage of fluents. The changes that can be made to the initial state I were (manually) defined based on the actions’ effects.

For instance, in a Zeno-Travel problem, containing 5 cities, 4 people, 3 aircraft and 2 fuel-levels, there are 10 fluents whose initial value can be altered. Each person can be at a city or in an aircraft; thus, a fluent indicating a person’s location can be changed to one of the 7 alternative (incorrect) values. Each aircraft has two fluents associated with it, i.e. is in a city and has a fuel level; both of these can be changed to an alternative value.
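The corruption procedure can be sketched as follows (hypothetical helper and fluent names; not the authors' generation script), assuming each fluent's set of permissible values has been defined manually as described above:

```python
# Sketch of generating an inaccurate initial state: a given percentage of
# fluents is chosen at random and each is reassigned a randomly selected
# incorrect value from its (manually defined) domain.
import random

def corrupt_initial_state(initial_state: dict, domains: dict, percent: float,
                          rng: random.Random) -> dict:
    corrupted = dict(initial_state)
    n = round(len(initial_state) * percent / 100)
    for fluent in rng.sample(list(initial_state), n):
        wrong_values = [v for v in domains[fluent] if v != initial_state[fluent]]
        if wrong_values:
            corrupted[fluent] = rng.choice(wrong_values)
    return corrupted

domains = {"person1-at": ["city0", "city1", "city2", "in-plane1"],
           "plane1-at": ["city0", "city1", "city2"],
           "plane1-fuel": ["fl0", "fl1"]}
initial = {"person1-at": "city0", "plane1-at": "city1", "plane1-fuel": "fl1"}
print(corrupt_initial_state(initial, domains, percent=100, rng=random.Random(0)))
```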

Fig. 7

Graph showing the effect that increasing the number of incorrect fluents in the initial state had on the accuracy of our Action Graph approach and h\(_{\text {gc}}\) by Pereira et al. [44]. Our Action Graph approach is indicated by solid lines, and the dashed lines show the approach of Pereira et al. [44]. Each line colour indicates a different % of observations

These state changes could cause some (or all) goals to be unreachable from the defined initial state (e.g. in the Sokoban domain, the robot could be unable to navigate to a location from which one of the boxes can be pushed) and the initial state itself could be invalid/contradictory (e.g. in the Blocks-World domain, blockA’s fluent could express the block is on the table and the gripper’s fluent could indicate it is holding blockA). The changes made to the initial state are outlined in Appendix C, and a detailed table of possible changes can be found at https://doi.org/10.5281/zenodo.3621275. No modifications were made to the lists of observations.

7.2.2 Results discussion

The accuracy of our approach was not affected by setting fluents in the initial state to incorrect values, whereas the accuracy of h\(_{\text {gc}}\) was greatly reduced (see Fig. 7 and Appendix D). When building the Action Graph, the initial values of fluents are ignored. As a result, no matter what the initial values of fluents are defined as being, the same Action Graph is produced. The initialisation of the nodes’ values uses the goal states, but not the initial state. Therefore, the output of our goal recognition approach is unaffected by incorrectly initialised fluents. Prior approaches use the initial state, and as a result, their accuracy deteriorates. When 20% of the fluents were set to incorrect values, a large decrease in the accuracy of h\(_{\text {gc}}\) was observed, and as this percentage was increased, the accuracy reduced further. For several domains, i.e. Kitchen, Rovers and Intrusion-Detection, the resulting M and R of h\(_{\text {gc}}\) rose when 100% of fluents (rather than 80%) were incorrect. This is because at 100%, for these domains, all goals were contained in the set of candidate goals, and thus, the real goal was contained within \({\mathcal {C}}\). Other approaches to GR are also unable to handle inaccurate initial states because they attempt to find the plans/states that reach each goal from the defined initial world state. These are discussed further in the related work section.

7.3 Goal recognition with a known initial state

After describing the experiment setup, this section compares the computational time of our Action Graph approach to h\(_{\text {gc}}\). The accuracy of these approaches, when run on problems with accurate initial states, is then discussed. This includes GR problems where the observations are the first N % of actions in a plan and ones for which the observations are a random N % of actions in a plan (i.e. the sequence is missing observations).

7.3.1 Setup

The GR problems in the original dataset\(^{4}\) contain 10/30/50/70/100% of observable actions in the plan to reach a goal; these observations/actions were selected at random. Therefore, for each of the original problems that contain 100% of the observations, we generated GR problems by selecting the first 10, 30, 50, 70 and 100% of observations. As a task planner (which is not guaranteed to find an optimal plan) was run to create the problems produced by Pereira et al. [44], some observation sequences are suboptimal.

The accuracy results for our two goal probability update rules (described in Sect. 6), when ran independently and combined, are presented. In the results table, these are named AG1 (i.e. the first update rule), AG2 (the second update rule) and AG3, which is the combination of the two rules (i.e. Algorithm 1).

7.3.2 Run-times

The Action Graph heuristic took an average of 0.02 s to process all observations, whereas h\(_{\text {gc}}\) took 0.66 s. Labelling the nodes with their distance from the goals is computationally expensive; therefore, when the offline processing times (which include the PDDL to Action Graph transformation steps) are included, Action Graphs took an average of 2.38 s per problem. The performance profiles are displayed in Fig. 8, and the results per domain are shown in Table 2. Table 3 provides the average size of the Action Graph for each domain. These run-times were produced while processing the GR dataset containing the first N % of observations, which contains 2705 problems.

Fig. 8

Performance profiles comparing the recognition time of our Action Graph approach to the goal completion heuristic by Pereira et al. [44]. These values were produced using the dataset containing GR problems with the first N % of actions in a plan as observations

Table 2 The run-times, in seconds, of our Action Graph approach, including and excluding the Action Graph initialisation (i.e. creation and node labelling) time, and h\(_{\text {gc}}\) [44] per domain. All is the total/average over all problems
Table 3 The average size of the Action Graph per domain. All is the average over all problems. The Nodes column provides the total number of action, DEP, ORDERED/AND, UNORDERED/AND and OR nodes.

When the whole process is included in the run-times, our GR approach outperformed h\(_{\text {gc}}\) on 64% of problems; however, the difference in run-time was greater for the problems our solution was slower on than for the problems h\(_{\text {gc}}\) was slower on. This is indicated by how much \(\tau \) must be increased before 100% of problems were solved. At \(\tau =17.45\), h\(_{\text {gc}}\) solved all problems, and at \(\tau =100.00\), all problems were solved by our approach. If only the online process is included, our approach solves 100% of problems quicker than h\(_{\text {gc}}\), and \(\tau \) must reach 20426 before h\(_{\text {gc}}\) solves 100% of problems. Note: for h\(_{\text {gc}}\), the landmarks could be discovered offline, and thus, the online computational time reduced. This is only possible if the initial value of each fluent is known in advance.

Both the size and the structure of an Action Graph impact the run-times of our approach. Domains with relatively few actions have much shorter initialisation times (e.g. Kitchen, Intrusion-Detection and Campus). For larger domains, the structure of the graph had a greater impact, as the run-time was affected by the number of times each node was visited. Our node labelling algorithm, which performs a BFT, does not visit nodes if their already assigned distance is lower than the current distance. Moreover, an UNORDERED/AND node’s children are not associated with the prior ORDERED/AND node, and thus, are visited fewer times than if no UNORDERED/AND node is traversed. As a result, for example, although the Action Graph of the Zeno-Travel domain contains more nodes than Driverlog’s, its run-time is shorter.

We have identified two ways in which the total processing time of our approach could be reduced. Rather than calculating all the nodes’ distances from the goals upfront, this process could be performed for just the observed actions; however, observations would be processed at a reduced rate. Second, the nodes’ distances for each goal could be computed in parallel; as we envision this process being performed offline (thus, the computational time of this is of lesser importance) and the performance gain would be hardware dependent, this was not implemented.

7.3.3 Results after processing the first N % of observations

Our Action Graph approach outperformed h\(_{\text {gc}}\) when 10, 30 and 50% of observations had been received; at 70 and 100%, h\(_{\text {gc}}\) slightly outperformed our approach. As described in [46], the lower the number of observations, the less likely it is that a landmark is observed; therefore, h\(_{\text {gc}}\) cannot disambiguate the goals. Figure 9 displays the average F1-Score, produced by AG3 and h\(_{\text {gc}}\), at each per cent of observations; Table 4 shows the results per domain for AG1, AG2, AG3 and h\(_{\text {gc}}\). The All result is the average over all domains rather than over all problems, so that the result is not weighted towards the domains with the most problems.

Fig. 9

Average F1-Score, after the first 10, 30, 50, 70 and 100% of observations have been processed, for our Action Graph approach and h\(_{\text {gc}}\) by Pereira et al. [44]

Table 4 Accuracy results for the dataset containing the first 10, 30, 50, 70 and 100% of observations. AG1 is the first update rule, AG2 the second update rule, and AG3 is the combination of the two rules (see Sect. 6)

Action Graphs have a low precision and recall for the Sokoban domain. GR problems for the Sokoban domain contain observations to navigate to and push two boxes to different locations. The actions for collecting and pushing one box were observed, before the second box was acted on. Whilst observing the actions to push the first box to its goal location, our Action Graph approach increased the probability of the appropriate goals (i.e. the goals the box was becoming closer to). When the observed agent started to navigate to the second box, the aforementioned goals’ probability was decreased. Therefore, when the second box is pushed, any of the goals containing a location it is being pushed towards could appear in the set of candidate goals. In other words, the goal probabilities lose information about the first goal atom to be achieved. We considered increasing the probability of the goals with fully observed atoms; however, in problems from other domains some goals’ atoms are subgoals of another goal.

For the Kitchen domain, our Action Graph approach reduced the number of candidate goals significantly more than h\(_{\text {gc}}\), as few landmarks were observed. On the other hand, due to the structure of the produced graph, Action Graphs produced a low R and M for the Depots and Blocks-World domains. The plans for these domains are highly state dependent, which is not captured by the Action Graph structure. For instance, in a Blocks-World problem, picking up blockA requires the gripper to be empty by putting down all blocks (including blockA). The graph structure captures the dependencies of actions; however, it does not account for the prior state(s) of the environment, e.g. the gripper could already be empty.

AG2 only outperformed AG1 on the Easy-IPC-Grid domain. In this domain, there are strong constraints on the order in which actions are performed, and a false goal could be traversed en route to the real goal. Therefore, for this domain, update rule 2 prevented the shortest plan from being favoured and successfully increased the probabilities of the goals the observed agent was navigating towards. Nevertheless, this update rule could not determine the real goal for the majority of domains. This is because all goals whose plans contain the observed action were multiplied (increased) by the same amount when the current and previous observations were not connected via a DEP (or ORDERED/AND) node. All suboptimal plans are encoded in an Action Graph's structure; therefore, for many domains, every action is included within a plan to reach any of the goals. Combining AG2 with AG1 improved the results for the Easy-IPC-Grid domain without greatly affecting the results produced for the other domains. The subsequent sections therefore show only the results of AG3.
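The precise update rules are those defined in Sect. 6; the following hypothetical sketch (with made-up factor functions) only illustrates why a rule that multiplies every goal containing the observed action by the same factor cannot separate those goals, and why combining it multiplicatively with a distance-based rule can.

    def combined_update(probs, factor_rule1, factor_rule2):
        """Hypothetical combination of two multiplicative update rules.
        `factor_ruleN[g]` is the factor rule N assigns to goal g for the
        current observation. If rule 2 assigns an identical factor to every
        goal whose plans contain the observed action, it leaves their
        relative probabilities unchanged; the distance-based rule 1 factor
        is what separates them after normalisation."""
        updated = {g: p * factor_rule1[g] * factor_rule2[g] for g, p in probs.items()}
        total = sum(updated.values())
        return {g: p / total for g, p in updated.items()}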

7.3.4 Missing observations results discussion

Our Action Graph approach and h\(_{\text {gc}}\) were also run on the 6313 GR problems\(^{4}\) produced by Pereira et al. [44] and Ramírez and Geffner [49] that contain missing observations. These problems contain a random 10, 30, 50, 70 and 100% of observations. The F1-Scores are depicted in Fig. 10, and a table containing the results per domain can be found at https://doi.org/10.5281/zenodo.3621275.

These results show a similar trend to the previous experiment, i.e. our approach produced a higher F1-Score than h\(_{\text {gc}}\) at 10, 30 and 50% of observations (and vice versa at 70 and 100% of observations). Both approaches performed better on the dataset containing missing observations than they did in the previous experiment, as each GR problem could contain observations that are close to the goal (because the observed actions were selected at random from the plan).
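The missing-observation problems are those published by Pereira et al. [44] and Ramírez and Geffner [49]; purely to make the setup concrete (this is not their generation code), a random N% subset of a full observation sequence can be sampled while preserving the original order, which is why some retained observations may lie close to the goal:

    import random

    def sample_observations(full_observation_sequence, percent, seed=None):
        """Keep a random `percent` of the observed actions, preserving
        their original order. Hypothetical reconstruction of the dataset
        setup, for illustration only."""
        rng = random.Random(seed)
        k = max(1, round(len(full_observation_sequence) * percent / 100))
        kept = sorted(rng.sample(range(len(full_observation_sequence)), k))
        return [full_observation_sequence[i] for i in kept]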

Fig. 10 Average F1-Score produced by our Action Graph approach and h\(_{\text {gc}}\) by Pereira et al. [44] on the dataset containing missing observations (in other words, when a random 10, 30, 50, 70 and 100% of actions had been observed)

8 Related work

Methods for intention recognition can be broadly categorised as data-driven and knowledge-driven (i.e. symbolic) methods [47, 67]. Data-driven approaches train a recognition model from a large dataset [1, 3, 33, 57, 67]. Their main disadvantages are that a large amount of labelled training data is often required and that the resulting models often only work on data similar to the training set [51, 66]. Since our work belongs to the category of knowledge-driven methods, data-driven methods are not discussed further.

Knowledge-driven approaches rely on a logical description of the actions agents can perform. They can be further divided into approaches that parse a library of plans (also known as “recognition as parsing”), and approaches that solve recognition problems defined in languages usually associated with planning, i.e. “recognition as planning” [32]. Our GR approach derives a graph structure, similar to those used by some recognition as parsing methods, from a PDDL-defined (planning-based) GR problem. Recognition as planning is often viewed as more flexible and general because a library of plans is not required and cyclic plans are difficult to compile into a library [49]. We chose to transform a PDDL planning problem into an Action Graph to enable the goal probabilities to be updated quickly, all plans (including suboptimal plans) to be represented, cyclic plans to be expressed and inaccurate initial states to be handled. Our approach takes advantage of the fact that a perfect/complete representation of plans is not required to perform GR. In this section, recognition as parsing and recognition as planning approaches are discussed in turn.

8.1 Recognition as parsing

In recognition as parsing, hierarchical structures are usually developed which include abstract actions along with how they are decomposed into concrete (observable) actions [31]. Several prior approaches have represented these hierarchical structures as AND/OR trees [15, 24]. As previously mentioned, our graph structure was inspired by these works. The recognition as parsing approaches mentioned in this section enable both the goal and the plan of the observed agent to be recognised, but they do not mention handling inaccurate initial states or suboptimal plans.

Kautz et al. [30, 31] introduce a language to describe a hierarchy of actions. Based on which low-level actions are observed, the higher-level task(s) the agent is attempting to achieve are inferred. Their work presents one of the earliest formal theories of plan/goal recognition that aimed to handle simultaneous action execution, multi-plan recognition and missing observations.

In [61], a set of action sequence graphs is derived from a library of plans. This set is compared to an action sequence graph, created from the sequence of observations, to find the plan most similar to the observation sequence. Their approach was shown to perform well with misclassified (incorrect) sensor observations and missing actions; however, a planner is called to generate the library of plans, and thus, a known initial state is required.

8.2 Recognition as planning

Recognition as planning is a more recently proposed approach, in which languages normally associated with task planning, such as STRIPS [10] and PDDL [40], define the actions agents can perform (along with their preconditions and effects) and the world states. In recognition as parsing there are usually only action definitions, whereas planning-based approaches allow for the inclusion of state knowledge, such as which objects are found within the environment and their locations.
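As a rough illustration of the extra information a planning-based encoding carries (the action, fluents and domain below are made up and not taken from any of the cited benchmarks), a STRIPS-style ground action pairs preconditions with add and delete effects and is applied to an explicit state:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class GroundAction:
        """Illustrative STRIPS-style ground action; names are hypothetical."""
        name: str
        preconditions: frozenset
        add_effects: frozenset
        del_effects: frozenset

        def applicable(self, state):
            return self.preconditions <= state

        def apply(self, state):
            return (state - self.del_effects) | self.add_effects

    # Hypothetical move action in a navigation-style domain.
    move = GroundAction(
        name="move(loc1, loc2)",
        preconditions=frozenset({"at(agent, loc1)", "connected(loc1, loc2)"}),
        add_effects=frozenset({"at(agent, loc2)"}),
        del_effects=frozenset({"at(agent, loc1)"}),
    )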

In [48, 49], it was proposed to view goal recognition as the inverse of planning. To find the difference in the cost of the plan to reach each goal with and without taking the observations into consideration, a planner is called twice for every possible goal. Therefore, the performance would greatly deteriorate when exposed to inaccurate initial states. In [5], the work from [49] was extended to find the joint probability of pairs of goals rather than of a single goal, with the aim of handling multiple interleaving goals. Although initial approaches were computationally expensive, as they required a task planner to be called multiple times [5, 48, 49], the latest advances in recognition as planning algorithms have greatly improved on this [8, 44, 55].

Plan graphs were proposed in [8]. A plan graph, which contains actions and propositions labelled as either true, false or unknown, is built from a planning problem and updated based on the observations. Rather than calling a planner, the graph is used to calculate the cost of reaching the goals. Our Action Graph structure differs greatly from a plan graph, as Action Graphs only contain actions and the constraints between those actions.

More recently, Pereira et al. [44, 46] significantly reduced the recognition time by finding landmarks. Our experiments include a comparison with this approach. Their work has been extended to handle incomplete domain models [45], i.e. GR problems with incomplete preconditions and effects, and its accuracy has very recently been improved by Wilken et al. [64]. In future work, we will explore applying our work to incomplete domain models and comparing it with the work of Wilken et al. [64].

9 Conclusion

Our novel approach to goal recognition aims to handle problems in which the defined initial state is inaccurate. An inaccurate initial state contains fluents whose value is unknown and/or incorrect. For instance, if an item or agent (e.g. cup or human) is occluded, its location is indeterminable, and thus, possibly defined incorrectly. Our approach transforms a PDDL-defined GR problem into an Action Graph, which models the order constraints between actions. Each node is labelled with the minimum number of DEP and ORDERED/AND nodes traversed to reach it from each goal. When an action is observed, the goals' probabilities are updated based on either the distance of the action's associated node from the goals or, if the current and prior observations are connected via a DEP or ORDERED/AND node, the change in distance. Our experiments showed that when fluents have incorrect values in the initial state, e.g. because they are unknown or sensed/determined incorrectly, the performance of our approach is unaffected.

In future work, we intend to apply our Action Graph method to further challenges associated with symbolic GR. As well as the defined initial state being inaccurate, the domain model (i.e. the action definitions) could be incorrect [45]. Therefore, we will experiment with adapting the Action Graph structure based on the order in which observations are received. To create a more compact structure, and thus reduce the associated computational time, we will investigate grouping related actions into a single node, as sketched below. For instance, in the Sokoban domain, the same actions can be performed on both box1 and box2; therefore, actions such as push(box1 loc1 loc2) and push(box2 loc1 loc2) could be grouped into a single node. Moreover, we intend to apply our GR approach to problems in which either the observed agent has multiple goals, or multiple agents have individual or joint goals [28].
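As a rough sketch of this grouping idea (not an implemented feature of our system), ground actions could be keyed on their schema with selected arguments abstracted away, so that push(box1 loc1 loc2) and push(box2 loc1 loc2) map to the same node:

    def group_key(ground_action, abstract_positions):
        """Map a ground action string such as 'push(box1 loc1 loc2)' to a
        grouping key in which the arguments at `abstract_positions` are
        replaced by a wildcard. Hypothetical sketch only."""
        name, rest = ground_action.split("(", 1)
        args = rest.rstrip(")").split()
        key_args = ["_" if i in abstract_positions else arg for i, arg in enumerate(args)]
        return f"{name}({' '.join(key_args)})"

    # group_key("push(box1 loc1 loc2)", {0}) == group_key("push(box2 loc1 loc2)", {0})
    # Both yield "push(_ loc1 loc2)", so the two actions share one node.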

Another direction for future research is to modify our approach so that the Action Graph structure expands over time. As more observations are made, new actions could be inserted and the links between actions could be adjusted. This could enable actions and action sequences to be learnt. The performance of our current method could then be compared to this and to recurrent neural network (RNN) based approaches [3] using real-world data.

As developing the PDDL can be time consuming and challenging, researchers have attempted to replace this manual process with deep learning methods [1, 2]. We will explore the potential of learning the Action Graph structure from pairs of images, and then converting the Action Graph into a PDDL-defined domain, which could subsequently be provided as input to task planners as well as goal recognisers.