Combining decision making and dynamical systems for monitoring and executing manipulation tasks

In this paper, we propose a unified framework for online task scheduling, monitoring, and execution that integrates reconfigurable behavior trees, a decision-making framework with integrated low-level control functionalities, and reactive motion generation with stable dynamical systems. In this way, we realize a flexible and reactive system capable of coping with unexpected variations in the executive context without penalizing modularity, expressiveness, and readability of humans. The framework is evaluated in a simulated sorting task showing promising results in terms of flexibility regarding task scheduling and robustness to external disturbances.


Introduction
Robots operating in everyday environments need increased flexibility to cope with sudden and unexpected variations in their executive context. In dynamic environments, the robot needs to perform sensing and planning operations at low cost, as well as to monitor its own actions in a goal-oriented fashion [1,2]. High flexibility and adaptation capabilities are required at each level of the senseplan-act loop to deploy robust and effective robotic solutions [3]. At the same time, the enhanced reasoning and acting capabilities of cognitive robots have to be implemented considering facets like modularity, reusability, design effectiveness, and human-readability.
In this work, we propose a unified framework that combines highlevel decision making, continuous execution monitoring, and online motion generation. In our framework, task scheduling and execution monitoring are handled by Reconfigurable Behaviour Trees (RBTs) [4], while stable dynamical systems are exploited for online motion generation. RBTs extend the traditional behavior trees [2] with control layer features permitting the continuous monitoring of the task execution and the online replanning of the task to react to perceptual stimuli. This is particularly useful to rapidly react to external perturbations in the task execution like unexpected changes in the pose of the manipulated object. The control layer of RBT is implemented using only the basic components of a traditional BT, de facto keeping the same level of expressiveness, modularity, and human-readability. To improve readability, a list of abbreviations used throughout the paper is provided in Table 1.

Motion planning with dynamical systems
A prominent approach in the field of DS-based motion generation are Dynamic Movement Primitives (DMP) [14]. In a DMP, a linear dynamics is summed to a nonlinear forcing term used to reproduce a given trajectory. The nonlinear term is suppressed by a timedependent clock signal to guarantee convergence to a given target. DMPs have been extended in several ways. Among the others: [15] tests different approaches to chain multiple DMPs representing position and orientation trajectories, while [16] introduces task-dependent parameters to customize the execution. The timedependency is the main drawback of DMPs, since it limits the generalization capabilities pf the DMP outside the demonstration area [17]. The stable estimator of dynamical systems (SEDS) [9] is probably the first example of stable and time-independent DS learned from demonstration. A known problem of SEDS is the lack of accuracy in reproducing nonlinear motions. This problem is alleviated in [18] by learning a Lyapunov function that complies with the demonstrations and stabilizes the DS with minimal intervention, and in [17] by learning a diffeomorphism that maps the training data into a space where the SEDS works accurately. These approaches permit an accurate reproduction of the demonstrated motion, but introduce extra training time and open parameters to fit also the Lyapunov function or the diffeomorphism.
Some approaches attempt to find a compromise between training time and accuracy [13,19,20]. Blocher et al. [19] applies only if Gaussian mixture regression is used to encode the nonlinear system, while the approach in [20] works with a single demonstration. In previous work [13], we propose a solution that works with any regression technique and multiple demonstrations, offering a good compromise between accuracy and training time. This approach, outlined in Sect. 3.1, is exploited here to generate the motion trajectories.

Task scheduling and execution monitoring
Behavior Trees (BT) [2,21] are a popular approach for task scheduling that extend and generalize several other approaches including decision trees [22], the subsumption architecture [23], and sequential behavior composition [24]. BTs use a fixed number of node types to build a high-level representation of a robotic task as a rooted tree. A BT is expressive, human-readable, modular, and reusable; all features that make BTs a popular and attractive technique. However, BTs are a high-level task scheduling approach where the decision making is decoupled from the physical executive state and possible ambiguities in the execution are not considered.
During the operational life of a robot, unexpected events may occur and a certain degree of flexibility in the task scheduling is beneficial. Approaches like [25,26] permit continuous monitoring of the task execution by integrating perceptual stimuli in the decision making. We have exploited this functionality to build a framework for learning, monitoring, and executing manipulation tasks [27][28][29]. The drawback of these approaches is that they may lose the modularity and reusability of BTs. In [4], we proposed Reconfigurable Behaviour Trees (RBTs) as an executive framework that combines high-level decision making and low-level control features. RBTs are described in Sect. 3.2.

Methods
In this section, we present the two components of our framework, namely the Energy-based Stabilizer of Dynamical Systems (ESDS) for online motion generation and the Reconfigurable Behavior Tree (RBT) for task scheduling and execution monitoring.

Energy-based stabilizer of dynamical systems
Dynamical system definition ESDS encodes the demonstrated trajectories into a nonlinear DS. Without loss of generality, we assume that such a DS has an equilibrium atx = 0 where also the velocity vanished. Under this assumption, the DS used by ESDS can be written asẋ where f(x) is a smooth and nonlinear vector field, κ( x ) = 1 − e −0.1 x 2 ensures that (1) has equilibrium point atx = 0, and γ (z, s) a stabilizing gain defined later in this section. ESDS makes no prior assumption on the stability of f(x) and uses γ (z, s) to ensure global convergence tox at runtime. This permits learning the nonlinear term f(x) from training data using any regression technique, using the approach described as follows.

Learning from demonstration
The nonlinear term f(x) in (1) is learned in a supervised manner. In our setting, a demonstrated trajectory {x t ,ẋ t } T t=1 consists of desired robot positions x t and velocitiesẋ t sampled in T consecutive time instances. This has to be cast into input/output pairs for supervised learning. To this end, we assume γ (z, s) = 1 and rewrite (1) as where The relation (2) clearly shows that f(x) is a nonlinear mapping between the position (x) and κ −1 ( x ) (ẋ + x). Hence, we define the input data as I = {x t } T t=1 and the output data . The procedure is repeated in case multiple demonstrations are provided. Once the input/output pairs I/O are computed, any regression technique can be used to retrieve an estimate of f(x) for each input state. It is worth mentioning that the self-defined inverse function κ −1 (·) in (3) does not generate discontinuities in the training data sinceẋ t and x t vanishes while approaching the equilibrium point.
Online stabilization The dynamical system (1) has an equilibrium point atx = 0, but so far we have not discussed the stability of this equilibrium. Global stability of the equilibrium, implying convergence of the trajectory to a given target, is of fundamental importance to apply a DS to robot motion generation. In ESDS, the dynamics is stabilized at runtime by the stabilizing gain γ (z, s), which is derived from energy considerations. We summarize the main aspects of this derivation and refer to [13] for further details. Stability of nonlinear dynamical systems is usually analyzed using Lyapunov theory [30] by defining a positive definite and vanishing at the equilibrium (Lyapunov) function and showing that its time derivative is negative definite and vanishes at the equilibrium. For the DS (1), we consider the Lyapunov function depicted in Fig. 1, consisting of a quadratic potential 1 2 x 2 and an energy tank. The level of the energy tank is represented by the additional state variable s. Taking the time derivative of this function, it is possible to find stability conditions that affects the value of s and its rate of changė s. In particular, one can prove that the energy level s is increased by the term x T x and decreased when z = κ( x )x T f(x) < 0. Therefore, the value of s andṡ can be controlled to render the function in Fig. 1 a proper Lyapunov function. This is obtained by assigning an initial value s to s and by designing 0 ≤ γ (z, s) ≤ to prevent that s becomes negative. The function γ (z, s) is computed as summarized in Table 2. Note that the dynamics −x in (1) vanishes only at the equilibrium and prevents the DS from stopping in a spurious attractor if γ (z, s) = 0. As detailed in [13], the initial value of s can be efficiently estimated from training data.
Comparison with existing approaches The effectiveness of ESDS is demonstrated on a public benchmark (the LASA Handwriting dataset 1 ) containing 26 2D motions. For a quantitative comparison, we consider the reproduction accuracy and the training time. Accuracy is measured with the Swept Error Area (SEA) metric [18] that represents the distortion (area) between a generated trajectory and the relative demonstration. Comparative results from [13] are reported in Table 3, using a third-party implementation 2 for CLF-DM. The comparison shows that ESDS offers a good compromise between accuracy and training time, especially considering that it does not impose any restriction on the regression technique used to retrieve the motion.

Task monitoring with reconfigurable behavior trees
Behavior trees A BT is a graphical language that models the behavior of an autonomous agent as rooted trees obtained by combing the finite set of primitive nodes types shown in Table 4. Condition and Action nodes are execution nodes, while Sequence, Fallback/Selector, Decorators, and Parallel are control flow nodes. During the execution, each node enters a running state that terminates with a success or a failure. Executing a BT means periodically traversing the tree top-down and left to right. The traversal is regulated by a clock signal called "tick". As shown in Table 4, each node in the BT responds to the tick depending on its own type and on the return state of the other nodes.
Reconfigurable behavior trees Intuitively, we can think of a Reconfigurable Behavior Tree (RBT) [4] as a BT with branches that can be dynamically allocated and deallocated. The dynamic allocation mechanism is triggered by environmental stimuli like object addition or removal. The continuous monitoring of such stimuli, as well as the dynamic allocation of the tree's branches, requires the following functionalities: i) a Long-Term Memory (LTM), organized in JSON schemata, to conveniently store tree branches that are dynamically loaded into a Working Memory (WM), ii) an Emphasizer that transforms logical pre-and postconditions and sensory data into a priority assigned to each branch in the LTM, and iii) an Instantiator process that queries the LTM and dynamically load the subtree in the WM. Distinctive features of a RBT are implemented using only the six nodes in Table 4, i.e. without increasing the design effort.
Task monitoring and execution The generic RBT, depicted in Fig. 2, combines static and dynamic nodes. It is a goal oriented task scheduler since it terminates as soon as the goal reached condition turns True. It exploits a blackboard to share variables across the nodes and store the logical pre-and postconditions and the priority list. The blackboard is a thread-safe mechanism that greatly simplifies the communication between nodes. The core of the RBT are the green and blue nodes that are executed in parallel, preserving the asynchronous nature of sensor readings and decision making. The Emphasizer (green node) transforms sensory input into subtree priorities and it never terminates (is always in the Running state). This, and the fact the Parallel node parent of the Emphasizer terminates only if its two children do, let the RBT run until the goal is reached. The blue nodes, which are dynamically allocated at each tick, load from the LTM the branch with higher priority and prepare a small BT ready for execution. The dynamic allocation of the blue nodes is required to prevent deadlocks, letting the RBT reach the goal. Emphasizer In contrast to traditional BTs, RBTs allows a dynamic reconfiguration of the execution order and a continuous monitoring of the task execution. This is obtained by exploiting logical preand postconditions and continuous environmental stimuli. Pre-and T fal ← ATTACHSUBTREE(T seq ) 13: end if 14: a, preC ← GETACTIONS(schema)

17:
T seq ← SEQUENCENODE(C, A) 18: T ← ATTACHSUBTREE(T fal , T seq ) 19: end for 20: return T 21: end function 312 heft 6.2020 postconditions are used to identify active branches in the tree, i.e. subtrees with True preconditions and at least a False postconditions. The Emphasizer periodically looks for active subtrees and determines if there are execution conflicts, i.e. multiple branches that are concurrently active. This ambiguity in the decision process is resolved using a priority-based mechanism. We introduce a priority for each active branch, a real value normalized between 0 and 1, and use it to determine which subtree has to be loaded and executed. Following [4], we define the priority e as where ω is the distance between the robot end-effector and the object to manipulate, and the thresholds ω min and ω max represent the minimum and maximum distance to the object. A typical choice for ω min is the size of the object, while ω max represent the maximum distance at which the object can be successfully grasped.

Evaluation
We evaluate our framework in the box sorting task depicted in Fig. 3. The robotic has to sort colored boxes (r_box, b_box, and g_box) by picking them from the table (Fig. 3(a)) and placing them in the "storage" area indicated by a white patch ( Fig. 3(b)). The Panda robot and the operational space use CoppeliaSim [31] and the robot model identified in [32]. Each box can be sorted by executing the BT shown in Fig. 4, where the generic box reads as r_box, b_box, or g_box. The presented scenario is simple, but it is sufficient to show the modularity and reusability of the proposed solution. Indeed, the nodes in Fig. 4 can be abstracted into the higherlevel action node execute subtree in Fig. 2 (modularity). Moreover, the subtree in Fig. 4 can be exploited to pick and place similar objects (reusability), like the 3 colored boxes in Fig. 3. The switching between the 3 sorting subtasks is regulated by a RBT like the one in Fig. 2. At runtime, the sorting BT of the closest (highest priority) box is loaded and connected to the BT using Algorithm 1, replacing the block execute subtree. To compute the subtree priority (4), we choose ω min as the length of the box side (ω min = 0.05 m) and we estimate the maximum distance that still allows grasping a box to be θ max = 1 m. The RBT successfully terminates if the 3 boxes are sorted in the storage area. This is obtained by defining the RBT goal as r_box placed ∧ b_box placed ∧ g_box placed. RBTs are implemented in Python using the basic BT nodes provided by  py_tree. 3 In our implementation, the RBT has 19 nodes, obtained by merging the trees in Fig. 2 and Fig. 4. Tree traversals (ticks) are periodically performed every 38 ms. For comparison, a standard BT requires 151 nodes to schedule the same task [4]. This result in an increase in tick time of ≈ 38 %. The pick box and place box action nodes in Fig. 4 are mapped to stable dynamical systems using the ESDS approach presented in Sect. 3.1. The DS representing each motion is learned from a demonstrated trajectory using a Gaussian processes [33]. The training data and the retrieved trajectories are shown in Fig. 5. At runtime, the system generates a smooth trajectory connecting the current end-effector position with a given target position. The target for picking actions is the box position, while for placement actions it is the desired position in the storage area. Trajectories generated by ESDS (implemented in Matlab ® ) are depicted in Fig. 5.
Nominal execution In this experiment, the robot performs the sorting task in an "ideal" scenario where the boxes are placed on the table like in Fig. 3(a) and no external perturbation occurs. The robot performs the sorting starting with the blue box (closest), then switches to the red box, and finally to the green one. As already mentioned, the task execution order is regulated by the distance between the robot gripper and the boxes. After placing the b_box in the storage area, the RBT updates the priority of r_box and g_box. Since r_box is the closest to the robot, the subtree sort r_box (Fig. 4) is loaded and executed. The sort g_box subtask is executed at the end and the RBT successfully terminates.
External perturbations In this experiment, we introduce a perturbation during the execution of the sorting task. The task is the same as depicted in Fig. 4 and described in the previous paragraph. As before, the robot performs the sorting starting with the blue box (closest) and then switches to the r_box. However, during the execution of the pick r_box action, we remove the r_box from the scene. The system detects this incident and promptly reacts by loading the sort g_box subtask. ESDS replans the pick trajectory on the fly without discontinuities (Fig. 5(c)). Once the green box is sorted, the RBT does not terminate since the goal r_box placed ∧ b_box placed ∧ g_box placed is False and keeps monitoring the scene to detect eventual changes. At this point, one can place the red box in the storage area or back on the table. If r_box is placed in the storage area then r_box placed becomes True and the task successfully terminates. In case r_box is placed back to the table, the Instantiator loads the sort r_box subtree and the sorting task successfully terminates.

Conclusion and future work
In this work, we presented a framework for monitoring and executing robotic tasks. The framework has two key components, namely the Reconfigurable Behavior Trees and the Energy-based Stabilizer of Dynamical Systems. RBTs are a novel executive framework that features high-level decision making and low-level control capabilities, enabling continuous monitoring and task switching. ESDS learns a flexible and robust motion representation from a handful of demonstrations that permits generating motion trajectories with proved convergence. The proposed framework is tested in a sorting scenario, showing its capabilities of handling perturbations during task execution.
In future work, we plan to test our framework on real robots and to provide a more comprehensive evaluation in challenging humanrobot interaction scenarios. We will also investigate the possibility of learning also the RBT from human demonstrations.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.