1 Introduction

Robots operating in everyday environments need increased flexibility to cope with sudden and unexpected variations in their executive context. In dynamic environments, the robot needs to perform sensing and planning operations at low cost, as well as to monitor its own actions in a goal-oriented fashion [1, 2]. High flexibility and adaptation capabilities are required at each level of the sense-plan-act loop to deploy robust and effective robotic solutions [3]. At the same time, the enhanced reasoning and acting capabilities of cognitive robots have to be implemented considering facets like modularity, reusability, design effectiveness, and human-readability.

In this work, we propose a unified framework that combines high-level decision making, continuous execution monitoring, and online motion generation. In our framework, task scheduling and execution monitoring are handled by Reconfigurable Behaviour Trees (RBTs) [4], while stable dynamical systems are exploited for online motion generation. RBTs extend the traditional behavior trees [2] with control layer features permitting the continuous monitoring of the task execution and the online replanning of the task to react to perceptual stimuli. This is particularly useful to rapidly react to external perturbations in the task execution like unexpected changes in the pose of the manipulated object. The control layer of RBT is implemented using only the basic components of a traditional BT, de facto keeping the same level of expressiveness, modularity, and human-readability.

Motion planning with dynamical systems (DS) has several interesting features, which explains why it is gaining interest in the learning community. A DS generates goal-oriented, converging trajectories that connect any two points in the robot’s workspace. DS trajectories are generated at runtime, allowing for online motion replanning to handle unexpected perturbations [5,6,7,8]. A stable dynamical system can be learned from human demonstrations [9] in an incremental way [10, 11]. Finally, suitable control can be applied to constrain the motion within a certain region and adapt to changes in the workspace [12]. In this work, we use the Energy-based Stabilizer of Dynamical Systems (ESDS) [13] that offers a good compromise between training time and accuracy in motion reproduction. Overall, the combination of reconfigurable behaviour trees for task monitoring and dynamical systems for trajectory generation results in a flexible and efficient framework for robotic task learning and execution.

The rest of the paper is organized as follows. Section 2 reviews related work. The key aspects of the RBT and the ESDS approaches are presented in Sect. 3. Results obtained in a sorting task are shown in Sect. 4. Section 5 states the conclusion and proposes further extensions. To improve readability, a list of abbreviations used throughout the paper is provided in Table 1.

Table 1. List of abbreviations

2 Related work

2.1 Motion planning with dynamical systems

A prominent approach in the field of DS-based motion generation are Dynamic Movement Primitives (DMP) [14]. In a DMP, a linear dynamics is summed to a nonlinear forcing term used to reproduce a given trajectory. The nonlinear term is suppressed by a time-dependent clock signal to guarantee convergence to a given target. DMPs have been extended in several ways. Among the others: [15] tests different approaches to chain multiple DMPs representing position and orientation trajectories, while [16] introduces task-dependent parameters to customize the execution. The time-dependency is the main drawback of DMPs, since it limits the generalization capabilities pf the DMP outside the demonstration area [17].

The stable estimator of dynamical systems (SEDS) [9] is probably the first example of stable and time-independent DS learned from demonstration. A known problem of SEDS is the lack of accuracy in reproducing nonlinear motions. This problem is alleviated in [18] by learning a Lyapunov function that complies with the demonstrations and stabilizes the DS with minimal intervention, and in [17] by learning a diffeomorphism that maps the training data into a space where the SEDS works accurately. These approaches permit an accurate reproduction of the demonstrated motion, but introduce extra training time and open parameters to fit also the Lyapunov function or the diffeomorphism.

Some approaches attempt to find a compromise between training time and accuracy [13, 19, 20]. Blocher et al. [19] applies only if Gaussian mixture regression is used to encode the nonlinear system, while the approach in [20] works with a single demonstration. In previous work [13], we propose a solution that works with any regression technique and multiple demonstrations, offering a good compromise between accuracy and training time. This approach, outlined in Sect. 3.1, is exploited here to generate the motion trajectories.

2.2 Task scheduling and execution monitoring

Behavior Trees (BT) [2, 21] are a popular approach for task scheduling that extend and generalize several other approaches including decision trees [22], the subsumption architecture [23], and sequential behavior composition [24]. BTs use a fixed number of node types to build a high-level representation of a robotic task as a rooted tree. A BT is expressive, human-readable, modular, and reusable; all features that make BTs a popular and attractive technique. However, BTs are a high-level task scheduling approach where the decision making is decoupled from the physical executive state and possible ambiguities in the execution are not considered.

During the operational life of a robot, unexpected events may occur and a certain degree of flexibility in the task scheduling is beneficial. Approaches like [25, 26] permit continuous monitoring of the task execution by integrating perceptual stimuli in the decision making. We have exploited this functionality to build a framework for learning, monitoring, and executing manipulation tasks [27,28,29]. The drawback of these approaches is that they may lose the modularity and reusability of BTs. In [4], we proposed Reconfigurable Behaviour Trees (RBTs) as an executive framework that combines high-level decision making and low-level control features. RBTs are described in Sect. 3.2.

3 Methods

In this section, we present the two components of our framework, namely the Energy-based Stabilizer of Dynamical Systems (ESDS) for online motion generation and the Reconfigurable Behavior Tree (RBT) for task scheduling and execution monitoring.

3.1 Energy-based stabilizer of dynamical systems

Dynamical system definition

ESDS encodes the demonstrated trajectories into a nonlinear DS. Without loss of generality, we assume that such a DS has an equilibrium at \(\hat{\boldsymbol{x}}=\boldsymbol{0}\) where also the velocity vanished. Under this assumption, the DS used by ESDS can be written as

$$ \dot{\boldsymbol{x}} = -\boldsymbol{x}+ \gamma (z,s) \kappa (\Vert \boldsymbol{x}\Vert )\boldsymbol{f}(\boldsymbol{x}), $$
(1)

where \(\boldsymbol{f}(\boldsymbol{x})\) is a smooth and nonlinear vector field, \(\kappa (\Vert \boldsymbol{x}\Vert ) = 1 - e^{-0.1\Vert \boldsymbol{x}\Vert ^{2}}\) ensures that (1) has equilibrium point at \(\hat{\boldsymbol{x}}=\boldsymbol{0}\), and \(\gamma (z,s)\) a stabilizing gain defined later in this section. ESDS makes no prior assumption on the stability of \(\boldsymbol{f}(\boldsymbol{x})\) and uses \(\gamma (z,s)\) to ensure global convergence to \(\hat{\boldsymbol{x}}\) at runtime. This permits learning the nonlinear term \(\boldsymbol{f}(\boldsymbol{x})\) from training data using any regression technique, using the approach described as follows.

Learning from demonstration

The nonlinear term \(\boldsymbol{f}(\boldsymbol{x})\) in (1) is learned in a supervised manner. In our setting, a demonstrated trajectory \(\{ \boldsymbol{x}_{t},\,\dot{\boldsymbol{x}}_{t}\}_{t=1}^{T}\) consists of desired robot positions \(\boldsymbol{x}_{t}\) and velocities \(\dot{\boldsymbol{x}}_{t}\) sampled in \(T\) consecutive time instances. This has to be cast into input/output pairs for supervised learning. To this end, we assume \(\gamma (z,s)=1\) and rewrite (1) as

$$ \kappa ^{-1}(\Vert \boldsymbol{x}\Vert )\left ( \dot{\boldsymbol{x}} + \boldsymbol{x}\right ) = { \boldsymbol{f}}(\boldsymbol{x}), $$
(2)

where

$$ \kappa ^{-1}(\Vert \boldsymbol{x}\Vert ) = \textstyle\begin{cases} \frac{1}{\kappa (\Vert \boldsymbol{x}\Vert )} \quad \Vert \boldsymbol{x}\Vert \neq 0 \\ 1 ~~\qquad \text{otherwise} \end{cases}\displaystyle . $$
(3)

The relation (2) clearly shows that \(\boldsymbol{f}(\boldsymbol{x})\) is a nonlinear mapping between the position (\(\boldsymbol{x}\)) and \(\kappa ^{-1}(\Vert \boldsymbol{x}\Vert )\left ( \dot{\boldsymbol{x}} + \boldsymbol{x}\right )\). Hence, we define the input data as \(\mathcal{I} = \{\boldsymbol{x}_{t}\}_{t=1}^{T}\) and the output data as \(\mathcal{O} = \left \lbrace \kappa ^{-1}(\Vert \boldsymbol{x}_{t}\Vert )\left (\dot{\boldsymbol{x}}_{t} + \boldsymbol{x}_{t}\right )\right \rbrace _{t=1}^{T}\). The procedure is repeated in case multiple demonstrations are provided. Once the input/output pairs ℐ/\(\mathcal{O}\) are computed, any regression technique can be used to retrieve an estimate of \(\boldsymbol{f}(\boldsymbol{x})\) for each input state. It is worth mentioning that the self-defined inverse function \(\kappa ^{-1}(\cdot )\) in (3) does not generate discontinuities in the training data since \(\dot{\boldsymbol{x}}_{t}\) and \(\boldsymbol{x}_{t}\) vanishes while approaching the equilibrium point.

Online stabilization

The dynamical system (1) has an equilibrium point at \(\hat{\boldsymbol{x}}=0\), but so far we have not discussed the stability of this equilibrium. Global stability of the equilibrium, implying convergence of the trajectory to a given target, is of fundamental importance to apply a DS to robot motion generation. In ESDS, the dynamics is stabilized at runtime by the stabilizing gain \(\gamma (z,s)\), which is derived from energy considerations. We summarize the main aspects of this derivation and refer to [13] for further details.

Stability of nonlinear dynamical systems is usually analyzed using Lyapunov theory [30] by defining a positive definite and vanishing at the equilibrium (Lyapunov) function and showing that its time derivative is negative definite and vanishes at the equilibrium. For the DS (1), we consider the Lyapunov function depicted in Fig. 1, consisting of a quadratic potential \(\frac{1}{2}\Vert \boldsymbol{x}\Vert ^{2}\) and an energy tank. The level of the energy tank is represented by the additional state variable \(s\). Taking the time derivative of this function, it is possible to find stability conditions that affects the value of \(s\) and its rate of change \(\dot{s}\). In particular, one can prove that the energy level \(s\) is increased by the term \(\boldsymbol{x}^{\mathrm{T}}\boldsymbol{x}\) and decreased when \(z=\kappa (\Vert \boldsymbol{x}\Vert )\boldsymbol{x}^{ \mathrm{T}}\boldsymbol{f}(\boldsymbol{x})<0\). Therefore, the value of \(s\) and \(\dot{s}\) can be controlled to render the function in Fig. 1 a proper Lyapunov function. This is obtained by assigning an initial value \(\overline{s}\) to \(s\) and by designing \(0 \leq \gamma (z,s) \leq \) to prevent that \(s\) becomes negative. The function \(\gamma (z,s)\) is computed as summarized in Table 2. Note that the dynamics \(-\boldsymbol{x}\) in (1) vanishes only at the equilibrium and prevents the DS from stopping in a spurious attractor if \(\gamma (z,s)=0\). As detailed in [13], the initial value of \(s\) can be efficiently estimated from training data.

Fig. 1.
figure 1

The Lyapunov function used to prove the stability of (1)

Table 2. Definition of the function \(\gamma (z,s)\)

Comparison with existing approaches

The effectiveness of ESDS is demonstrated on a public benchmark (the LASA Handwriting datasetFootnote 1) containing 26 2D motions. For a quantitative comparison, we consider the reproduction accuracy and the training time. Accuracy is measured with the Swept Error Area (SEA) metric [18] that represents the distortion (area) between a generated trajectory and the relative demonstration. Comparative results from [13] are reported in Table 3, using a third-party implementationFootnote 2 for CLF-DM. The comparison shows that ESDS offers a good compromise between accuracy and training time, especially considering that it does not impose any restriction on the regression technique used to retrieve the motion.

Table 3. Reproduction error and training time (mean / range) of different approaches on the LASA dataset

3.2 Task monitoring with reconfigurable behavior trees

Behavior trees

A BT is a graphical language that models the behavior of an autonomous agent as rooted trees obtained by combing the finite set of primitive nodes types shown in Table 4. Condition and Action nodes are execution nodes, while Sequence, Fallback/Selector, Decorators, and Parallel are control flow nodes. During the execution, each node enters a running state that terminates with a success or a failure. Executing a BT means periodically traversing the tree top-down and left to right. The traversal is regulated by a clock signal called “tick”. As shown in Table 4, each node in the BT responds to the tick depending on its own type and on the return state of the other nodes.

Table 4. BT nodes and their return status

Reconfigurable behavior trees

Intuitively, we can think of a Reconfigurable Behavior Tree (RBT) [4] as a BT with branches that can be dynamically allocated and deallocated. The dynamic allocation mechanism is triggered by environmental stimuli like object addition or removal. The continuous monitoring of such stimuli, as well as the dynamic allocation of the tree’s branches, requires the following functionalities: i) a Long-Term Memory (LTM), organized in JSON schemata, to conveniently store tree branches that are dynamically loaded into a Working Memory (WM), ii) an Emphasizer that transforms logical pre- and postconditions and sensory data into a priority assigned to each branch in the LTM, and iii) an Instantiator process that queries the LTM and dynamically load the subtree in the WM. Distinctive features of a RBT are implemented using only the six nodes in Table 4, i.e. without increasing the design effort.

Task monitoring and execution

The generic RBT, depicted in Fig. 2, combines static and dynamic nodes. It is a goal oriented task scheduler since it terminates as soon as the goal reached condition turns True. It exploits a blackboard to share variables across the nodes and store the logical pre- and postconditions and the priority list. The blackboard is a thread-safe mechanism that greatly simplifies the communication between nodes. The core of the RBT are the green and blue nodes that are executed in parallel, preserving the asynchronous nature of sensor readings and decision making. The Emphasizer (green node) transforms sensory input into subtree priorities and it never terminates (is always in the Running state). This, and the fact the Parallel node parent of the Emphasizer terminates only if its two children do, let the RBT run until the goal is reached. The blue nodes, which are dynamically allocated at each tick, load from the LTM the branch with higher priority and prepare a small BT ready for execution. The dynamic allocation of the blue nodes is required to prevent deadlocks, letting the RBT reach the goal.

Fig. 2.
figure 2

The generic RBT combines static and dynamic nodes. The nodes surrounded by the gray thick square permits terminating the task once a global goal is reached. The Emphasizer (green dot-dashed node) changes the priority level of each box using perceptual information (robot-box distances). The Instantiator is responsible for allocating and deallocating the nodes surrounded by the blue dashed polygon when the box priorities change. The action node change box online modifies the box to sort based on the priority, while action node sort box generates the pick-and-place subtask (Color figure online)

Instantiator

The procedure presented in Algorithm 1 is used by the Instantiator to query a tree branch from the LTM and instantiate an executable BT. The branch to load is identified by a unique label that serves as root of the BT (line 2). Once the JSON schemata are queried, the tree is built top-down by iteratively casting JSON schemata into the corresponding BT nodes and adding them to the tree (lines 4 to 19). Similarly to [2], each postcondition is represented by a Condition node (line 7) and attached to the existing tree with a Fallback \(\mathcal{T}_{\mathit{fal}}\) (lines 8–9). This permits terminating the execution when the postcondition is True. Multiple postconditions are attached to a Sequence node (line 11) and therefore sequentially checked. After these steps, the Sequence node is connected to \(\mathcal{T}_{\mathit{fal}}\) (line 12) that now contains all the postconditions and can be connected the BT (line 18). Preconditions are also cast into Condition nodes (line 16), while Action nodes are used to represent robot motions represented as stable dynamical systems (see Sect. 3.1). An Action can be executed only if all its preconditions are True. This is achieved by connecting Action and Conditions to a Sequence node that is then attached to the BT (lines 17–18).

Algorithm 1
figure 3

Load and instantiate a BT

Emphasizer

In contrast to traditional BTs, RBTs allows a dynamic reconfiguration of the execution order and a continuous monitoring of the task execution. This is obtained by exploiting logical pre- and postconditions and continuous environmental stimuli. Pre- and postconditions are used to identify active branches in the tree, i.e. subtrees with True preconditions and at least a False postconditions. The Emphasizer periodically looks for active subtrees and determines if there are execution conflicts, i.e. multiple branches that are concurrently active. This ambiguity in the decision process is resolved using a priority-based mechanism. We introduce a priority for each active branch, a real value normalized between 0 and 1, and use it to determine which subtree has to be loaded and executed. Following [4], we define the priority \(e\) as

$$ e(\omega )= \textstyle\begin{cases} 1 & \textrm{if } \omega \leq \omega _{\min } \\ \frac{\omega -\omega _{\max }}{\omega _{\min } - \omega _{\max }} & \textrm{if } \omega _{\min } < \omega < \omega _{\max } \\ 0 & \textrm{if } \omega \geq \omega _{\max } \end{cases} $$
(4)

where \(\omega \) is the distance between the robot end-effector and the object to manipulate, and the thresholds \(\omega _{\min }\) and \(\omega _{\max }\) represent the minimum and maximum distance to the object. A typical choice for \(\omega _{\min }\) is the size of the object, while \(\omega _{\max }\) represent the maximum distance at which the object can be successfully grasped.

4 Evaluation

We evaluate our framework in the box sorting task depicted in Fig. 3. The robotic has to sort colored boxes (r_box, b_box, and g_box) by picking them from the table (Fig. 3(a)) and placing them in the “storage” area indicated by a white patch (Fig. 3(b)). The Panda robot and the operational space use CoppeliaSim [31] and the robot model identified in [32]. Each box can be sorted by executing the BT shown in Fig. 4, where the generic box reads as r_box, b_box, or g_box. The presented scenario is simple, but it is sufficient to show the modularity and reusability of the proposed solution. Indeed, the nodes in Fig. 4 can be abstracted into the higher-level action node execute subtree in Fig. 2 (modularity). Moreover, the subtree in Fig. 4 can be exploited to pick and place similar objects (reusability), like the 3 colored boxes in Fig. 3. The switching between the 3 sorting subtasks is regulated by a RBT like the one in Fig. 2. At runtime, the sorting BT of the closest (highest priority) box is loaded and connected to the BT using Algorithm 1, replacing the block execute subtree. To compute the subtree priority (4), we choose \(\omega _{\min }\) as the length of the box side (\(\omega _{\min }=0.05\text{ m}\)) and we estimate the maximum distance that still allows grasping a box to be \(\theta _{\max }=1\text{ m}\). The RBT successfully terminates if the 3 boxes are sorted in the storage area. This is obtained by defining the RBT goal as r_box placedb_box placedg_box placed. RBTs are implemented in Python using the basic BT nodes provided by py_tree.Footnote 3 In our implementation, the RBT has 19 nodes, obtained by merging the trees in Fig. 2 and Fig. 4. Tree traversals (ticks) are periodically performed every \(38\text{ ms}\). For comparison, a standard BT requires 151 nodes to schedule the same task [4]. This result in an increase in tick time of \(\approx 38\) %.

Fig. 3.
figure 4

An illustration of the box sorting task. The manipulator has to pick one by one the boxes from the table and place them in the white (storage) area

Fig. 4.
figure 5

The BT used to sort (pick from the table and place in the storage area) a generic box. This BT replaces the generic execute subtree node in Fig. 2

The pick box and place box action nodes in Fig. 4 are mapped to stable dynamical systems using the ESDS approach presented in Sect. 3.1. The DS representing each motion is learned from a demonstrated trajectory using a Gaussian processes [33]. The training data and the retrieved trajectories are shown in Fig. 5. At runtime, the system generates a smooth trajectory connecting the current end-effector position with a given target position. The target for picking actions is the box position, while for placement actions it is the desired position in the storage area. Trajectories generated by ESDS (implemented in Matlab®) are depicted in Fig. 5.

Fig. 5.
figure 6

(a)-(b) Placement trajectories generated with ESDS. The black solid lines indicate the demonstrations. Colored lines are the trajectories generated for each box. (c) Picking trajectories generated with ESDS. The vertical black dashed line indicates the time when the red box is removed from the scene. Starting from this position a new motion is generated to pick the green box (green solid line). The red dashed line shows the originally-planned trajectory (pick r_box)

Nominal execution

In this experiment, the robot performs the sorting task in an “ideal” scenario where the boxes are placed on the table like in Fig. 3(a) and no external perturbation occurs. The robot performs the sorting starting with the blue box (closest), then switches to the red box, and finally to the green one. As already mentioned, the task execution order is regulated by the distance between the robot gripper and the boxes. After placing the b_box in the storage area, the RBT updates the priority of r_box and g_box. Since r_box is the closest to the robot, the subtree sort r_box (Fig. 4) is loaded and executed. The sort g_box subtask is executed at the end and the RBT successfully terminates.

External perturbations

In this experiment, we introduce a perturbation during the execution of the sorting task. The task is the same as depicted in Fig. 4 and described in the previous paragraph. As before, the robot performs the sorting starting with the blue box (closest) and then switches to the r_box. However, during the execution of the pick r_box action, we remove the r_box from the scene. The system detects this incident and promptly reacts by loading the sort g_box subtask. ESDS replans the pick trajectory on the fly without discontinuities (Fig. 5(c)). Once the green box is sorted, the RBT does not terminate since the goal r_box placedb_box placedg_box placed is False and keeps monitoring the scene to detect eventual changes. At this point, one can place the red box in the storage area or back on the table. If r_box is placed in the storage area then r_box placed becomes True and the task successfully terminates. In case r_box is placed back to the table, the Instantiator loads the sort r_box subtree and the sorting task successfully terminates.

5 Conclusion and future work

In this work, we presented a framework for monitoring and executing robotic tasks. The framework has two key components, namely the Reconfigurable Behavior Trees and the Energy-based Stabilizer of Dynamical Systems. RBTs are a novel executive framework that features high-level decision making and low-level control capabilities, enabling continuous monitoring and task switching. ESDS learns a flexible and robust motion representation from a handful of demonstrations that permits generating motion trajectories with proved convergence. The proposed framework is tested in a sorting scenario, showing its capabilities of handling perturbations during task execution.

In future work, we plan to test our framework on real robots and to provide a more comprehensive evaluation in challenging human-robot interaction scenarios. We will also investigate the possibility of learning also the RBT from human demonstrations.