In this section, we present the two components of our framework, namely the Energy-based Stabilizer of Dynamical Systems (ESDS) for online motion generation and the Reconfigurable Behavior Tree (RBT) for task scheduling and execution monitoring.
Energy-based stabilizer of dynamical systems
Dynamical system definition
ESDS encodes the demonstrated trajectories into a nonlinear DS. Without loss of generality, we assume that such a DS has an equilibrium at \(\hat{\boldsymbol{x}}=\boldsymbol{0}\) where the velocity also vanishes. Under this assumption, the DS used by ESDS can be written as
$$ \dot{\boldsymbol{x}} = -\boldsymbol{x}+ \gamma (z,s) \kappa (\Vert \boldsymbol{x}\Vert )\boldsymbol{f}(\boldsymbol{x}), $$
(1)
where \(\boldsymbol{f}(\boldsymbol{x})\) is a smooth and nonlinear vector field, \(\kappa (\Vert \boldsymbol{x}\Vert ) = 1 - e^{-0.1\Vert \boldsymbol{x}\Vert ^{2}}\) ensures that (1) has an equilibrium point at \(\hat{\boldsymbol{x}}=\boldsymbol{0}\), and \(\gamma (z,s)\) is a stabilizing gain defined later in this section. ESDS makes no prior assumption on the stability of \(\boldsymbol{f}(\boldsymbol{x})\) and uses \(\gamma (z,s)\) to ensure global convergence to \(\hat{\boldsymbol{x}}\) at runtime. This permits learning the nonlinear term \(\boldsymbol{f}(\boldsymbol{x})\) from training data using any regression technique, as described next.
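For illustration, the following is a minimal sketch of how a trajectory can be generated from (1) with explicit Euler integration. The nonlinear field \(\boldsymbol{f}(\boldsymbol{x})\) is replaced by a placeholder and the gain is fixed to 1; this is not the implementation used in the experiments.

```python
import numpy as np

def kappa(x):
    """Scaling term kappa(||x||) = 1 - exp(-0.1 ||x||^2) from Eq. (1)."""
    return 1.0 - np.exp(-0.1 * np.dot(x, x))

def esds_step(x, f, gamma=1.0, dt=0.01):
    """One explicit Euler step of Eq. (1): x_dot = -x + gamma * kappa(||x||) * f(x)."""
    x_dot = -x + gamma * kappa(x) * f(x)
    return x + dt * x_dot

# Placeholder nonlinear field standing in for the learned f(x) (illustration only).
f = lambda x: 3.0 * np.array([-x[1], x[0]])
x = np.array([1.0, 1.0])
for _ in range(2000):
    x = esds_step(x, f)
print(x)  # the state contracts toward the equilibrium at the origin
```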
Learning from demonstration
The nonlinear term \(\boldsymbol{f}(\boldsymbol{x})\) in (1) is learned in a supervised manner. In our setting, a demonstrated trajectory \(\{ \boldsymbol{x}_{t},\,\dot{\boldsymbol{x}}_{t}\}_{t=1}^{T}\) consists of desired robot positions \(\boldsymbol{x}_{t}\) and velocities \(\dot{\boldsymbol{x}}_{t}\) sampled at \(T\) consecutive time instants. These data have to be cast into input/output pairs for supervised learning. To this end, we assume \(\gamma (z,s)=1\) and rewrite (1) as
$$ \kappa ^{-1}(\Vert \boldsymbol{x}\Vert )\left ( \dot{\boldsymbol{x}} + \boldsymbol{x}\right ) = { \boldsymbol{f}}(\boldsymbol{x}), $$
(2)
where
$$ \kappa ^{-1}(\Vert \boldsymbol{x}\Vert ) = \textstyle\begin{cases} \frac{1}{\kappa (\Vert \boldsymbol{x}\Vert )} & \textrm{if } \Vert \boldsymbol{x}\Vert \neq 0 \\ 1 & \textrm{otherwise} \end{cases} $$
(3)
The relation (2) clearly shows that \(\boldsymbol{f}(\boldsymbol{x})\) is a nonlinear mapping between the position \(\boldsymbol{x}\) and the quantity \(\kappa ^{-1}(\Vert \boldsymbol{x}\Vert )\left ( \dot{\boldsymbol{x}} + \boldsymbol{x}\right )\). Hence, we define the input data as \(\mathcal{I} = \{\boldsymbol{x}_{t}\}_{t=1}^{T}\) and the output data as \(\mathcal{O} = \left \lbrace \kappa ^{-1}(\Vert \boldsymbol{x}_{t}\Vert )\left (\dot{\boldsymbol{x}}_{t} + \boldsymbol{x}_{t}\right )\right \rbrace _{t=1}^{T}\). The procedure is repeated if multiple demonstrations are provided. Once the input/output pairs \(\mathcal{I}\)/\(\mathcal{O}\) are computed, any regression technique can be used to retrieve an estimate of \(\boldsymbol{f}(\boldsymbol{x})\) for each input state. It is worth mentioning that the self-defined inverse function \(\kappa ^{-1}(\cdot )\) in (3) does not generate discontinuities in the training data since \(\dot{\boldsymbol{x}}_{t}\) and \(\boldsymbol{x}_{t}\) vanish while approaching the equilibrium point.
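The construction of \(\mathcal{I}\)/\(\mathcal{O}\) can be sketched as follows. The demonstration is a hypothetical spiral used only for illustration, and kernel ridge regression stands in for the (arbitrary) regression technique.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def kappa_inv(x):
    """Self-defined inverse of kappa(||x||) from Eq. (3); returns 1 at the equilibrium."""
    k = 1.0 - np.exp(-0.1 * np.dot(x, x))
    return 1.0 / k if k > 0.0 else 1.0

def build_training_pairs(X, X_dot):
    """Cast a demonstration {x_t, xdot_t} into the input/output sets I and O of Eq. (2)."""
    O = np.array([kappa_inv(x) * (xd + x) for x, xd in zip(X, X_dot)])
    return X, O

# Hypothetical demonstration: a planar spiral ending at the origin (T x 2 positions/velocities).
T = 200
time = np.linspace(0.0, 2.0, T)
r = np.linspace(1.0, 0.0, T)
X = np.stack([r * np.cos(3.0 * time), r * np.sin(3.0 * time)], axis=1)
X_dot = np.gradient(X, time, axis=0)

I, O = build_training_pairs(X, X_dot)
# Any regressor works; kernel ridge regression is used here purely as an example
# (the 'gamma' below is the RBF kernel width, unrelated to the stabilizing gain).
f_hat = KernelRidge(kernel="rbf", gamma=5.0).fit(I, O)
print(f_hat.predict(I[:1]))  # estimate of f(x) at the first demonstrated position
```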
Online stabilization
The dynamical system (1) has an equilibrium point at \(\hat{\boldsymbol{x}}=\boldsymbol{0}\), but so far we have not discussed the stability of this equilibrium. Global stability of the equilibrium, which implies convergence of the trajectory to a given target, is of fundamental importance when applying a DS to robot motion generation. In ESDS, the dynamics are stabilized at runtime by the stabilizing gain \(\gamma (z,s)\), which is derived from energy considerations. We summarize the main aspects of this derivation and refer to [13] for further details.
Stability of nonlinear dynamical systems is usually analyzed using Lyapunov theory [30] by defining a (Lyapunov) function that is positive definite and vanishes at the equilibrium, and by showing that its time derivative is negative definite and vanishes at the equilibrium. For the DS (1), we consider the Lyapunov function depicted in Fig. 1, consisting of a quadratic potential \(\frac{1}{2}\Vert \boldsymbol{x}\Vert ^{2}\) and an energy tank. The level of the energy tank is represented by the additional state variable \(s\). Taking the time derivative of this function, it is possible to find stability conditions that affect the value of \(s\) and its rate of change \(\dot{s}\). In particular, one can prove that the energy level \(s\) is increased by the term \(\boldsymbol{x}^{\mathrm{T}}\boldsymbol{x}\) and decreased when \(z=\kappa (\Vert \boldsymbol{x}\Vert )\boldsymbol{x}^{ \mathrm{T}}\boldsymbol{f}(\boldsymbol{x})<0\). Therefore, the value of \(s\) and \(\dot{s}\) can be controlled to render the function in Fig. 1 a proper Lyapunov function. This is obtained by assigning an initial value \(\overline{s}\) to \(s\) and by designing \(0 \leq \gamma (z,s) \leq 1\) to prevent \(s\) from becoming negative. The function \(\gamma (z,s)\) is computed as summarized in Table 2. Note that the dynamics \(-\boldsymbol{x}\) in (1) vanishes only at the equilibrium and prevents the DS from stopping in a spurious attractor if \(\gamma (z,s)=0\). As detailed in [13], the initial value of \(s\) can be efficiently estimated from training data.
Table 2. Definition of the function \(\gamma (z,s)\)
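The exact definition of \(\gamma (z,s)\) is given in Table 2 and [13]; the following sketch only illustrates the energy-tank mechanism described above, under the simplifying assumption that the gain switches between 1 and 0.

```python
import numpy as np

def gamma_gain(z, s):
    """Illustrative on/off version of the stabilizing gain: keep the learned term active
    (gamma = 1) while it injects no energy (z >= 0) or while the tank is not empty,
    and disable it (gamma = 0) otherwise. See Table 2 / [13] for the actual definition."""
    return 1.0 if (z >= 0.0 or s > 0.0) else 0.0

def tank_step(s, x, z, gamma, dt=0.01):
    """Energy-tank update sketched from the text: the level grows with x^T x and is
    drained by the (gated) negative values of z."""
    s_dot = float(np.dot(x, x)) + (gamma * z if z < 0.0 else 0.0)
    return max(s + dt * s_dot, 0.0)
```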
Comparison with existing approaches
The effectiveness of ESDS is demonstrated on a public benchmark (the LASA Handwriting dataset) containing 26 2D motions. For a quantitative comparison, we consider the reproduction accuracy and the training time. Accuracy is measured with the Swept Error Area (SEA) metric [18], which represents the distortion (area) between a generated trajectory and the corresponding demonstration. Comparative results from [13] are reported in Table 3, using a third-party implementation for CLF-DM. The comparison shows that ESDS offers a good compromise between accuracy and training time, especially considering that it does not impose any restriction on the regression technique used to retrieve the motion.
Table 3. Reproduction error and training time (mean / range) of different approaches on the LASA dataset
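For reference, a minimal sketch of how the SEA metric can be computed for planar trajectories, assuming the reproduction and the demonstration are sampled at matching time instants; the authoritative definition is in [18].

```python
import numpy as np

def polygon_area(p):
    """Shoelace formula for the area of a planar polygon given as an (N, 2) array."""
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def swept_error_area(repro, demo):
    """Sum of the areas swept between consecutive points of the reproduction and of the
    demonstration; both inputs are (T, 2) arrays with matching sampling."""
    return sum(
        polygon_area(np.array([repro[t], repro[t + 1], demo[t + 1], demo[t]]))
        for t in range(len(repro) - 1)
    )
```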
Task monitoring with reconfigurable behavior trees
Behavior trees
A BT is a graphical language that models the behavior of an autonomous agent as a rooted tree obtained by combining the finite set of primitive node types shown in Table 4. Condition and Action nodes are execution nodes, while Sequence, Fallback/Selector, Decorator, and Parallel nodes are control flow nodes. During execution, each node enters a running state that terminates with a success or a failure. Executing a BT means periodically traversing the tree top-down and left to right. The traversal is regulated by a clock signal called “tick”. As shown in Table 4, each node in the BT responds to the tick depending on its own type and on the return status of the other nodes.
Table 4. BT nodes and their return status
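As a companion to Table 4, the following is a minimal, memory-less sketch of the tick semantics for Sequence, Fallback, and Condition nodes. It is meant only to illustrate the traversal mechanism, not the BT library used in this work.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    FAILURE = 1
    RUNNING = 2

class Condition:
    """Execution node wrapping a boolean check; it never stays in Running."""
    def __init__(self, check):
        self.check = check
    def tick(self):
        return Status.SUCCESS if self.check() else Status.FAILURE

class Sequence:
    """Ticks children left to right; returns at the first child that is not successful."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback:
    """Ticks children left to right; returns at the first child that does not fail."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
```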
Reconfigurable behavior trees
Intuitively, we can think of a Reconfigurable Behavior Tree (RBT) [4] as a BT with branches that can be dynamically allocated and deallocated. The dynamic allocation mechanism is triggered by environmental stimuli such as object addition or removal. The continuous monitoring of such stimuli, as well as the dynamic allocation of the tree’s branches, requires the following functionalities: i) a Long-Term Memory (LTM), organized in JSON schemata, to conveniently store tree branches that are dynamically loaded into a Working Memory (WM), ii) an Emphasizer that transforms logical pre- and postconditions and sensory data into a priority assigned to each branch in the LTM, and iii) an Instantiator process that queries the LTM and dynamically loads the subtree into the WM. The distinctive features of an RBT are implemented using only the six node types in Table 4, i.e. without increasing the design effort.
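To fix ideas, the snippet below shows a hypothetical branch entry as it might be stored in the LTM (written as a Python dict mirroring a JSON schema). All field names are assumptions for illustration, not the actual format used by the RBT.

```python
# Hypothetical LTM entry for a "pick" branch; field names are illustrative only.
pick_branch = {
    "label": "pick_object",  # unique label, also the root of the instantiated branch
    "preconditions": ["object_detected", "gripper_empty"],
    "postconditions": ["object_grasped"],
    "action": {"type": "dynamical_system", "skill": "reach_and_grasp"},
}
```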
Task monitoring and execution
The generic RBT, depicted in Fig. 2, combines static and dynamic nodes. It is a goal-oriented task scheduler since it terminates as soon as the goal reached condition turns True. It exploits a blackboard to share variables across the nodes and to store the logical pre- and postconditions and the priority list. The blackboard is a thread-safe mechanism that greatly simplifies the communication between nodes. The core of the RBT consists of the green and blue nodes, which are executed in parallel, preserving the asynchronous nature of sensor readings and decision making. The Emphasizer (green node) transforms sensory input into subtree priorities and never terminates (it is always in the Running state). This, together with the fact that the Parallel node that is the parent of the Emphasizer terminates only when both of its children do, lets the RBT run until the goal is reached. The blue nodes, which are dynamically allocated at each tick, load from the LTM the branch with the highest priority and prepare a small BT ready for execution. The dynamic allocation of the blue nodes is required to prevent deadlocks, letting the RBT reach the goal.
Instantiator
The procedure presented in Algorithm 1 is used by the Instantiator to query a tree branch from the LTM and instantiate an executable BT. The branch to load is identified by a unique label that serves as the root of the BT (line 2). Once the JSON schemata are queried, the tree is built top-down by iteratively casting JSON schemata into the corresponding BT nodes and adding them to the tree (lines 4 to 19). Similarly to [2], each postcondition is represented by a Condition node (line 7) and attached to the existing tree with a Fallback \(\mathcal{T}_{\mathit{fal}}\) (lines 8–9). This permits terminating the execution when the postcondition is True. Multiple postconditions are attached to a Sequence node (line 11) and therefore checked sequentially. After these steps, the Sequence node is connected to \(\mathcal{T}_{\mathit{fal}}\) (line 12), which now contains all the postconditions and can be connected to the BT (line 18). Preconditions are also cast into Condition nodes (line 16), while Action nodes represent robot motions encoded as stable dynamical systems (see Sect. 3.1). An Action can be executed only if all its preconditions are True. This is achieved by connecting the Action and its Condition nodes to a Sequence node that is then attached to the BT (lines 17–18).
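A condensed sketch of this construction is given below, reusing the node classes and the hypothetical branch schema introduced earlier; `check` and `make_action` are assumed helpers that return a boolean callable and an executable Action-like node, respectively. Algorithm 1 remains the authoritative description.

```python
def instantiate(schema, check, make_action):
    """Cast an LTM branch into an executable BT following the structure of Algorithm 1:
    postconditions first (the Fallback succeeds without acting if they already hold),
    then the Action gated by its preconditions."""
    post = Sequence([Condition(check(name)) for name in schema["postconditions"]])
    pre_and_act = Sequence(
        [Condition(check(name)) for name in schema["preconditions"]]
        + [make_action(schema["action"])]
    )
    return Fallback([post, pre_and_act])
```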
Emphasizer
In contrast to traditional BTs, RBTs allow dynamic reconfiguration of the execution order and continuous monitoring of the task execution. This is obtained by exploiting logical pre- and postconditions and continuous environmental stimuli. Pre- and postconditions are used to identify active branches in the tree, i.e. subtrees with True preconditions and at least one False postcondition. The Emphasizer periodically looks for active subtrees and determines if there are execution conflicts, i.e. multiple branches that are concurrently active. This ambiguity in the decision process is resolved using a priority-based mechanism. We introduce a priority for each active branch, a real value normalized between 0 and 1, and use it to determine which subtree has to be loaded and executed. Following [4], we define the priority \(e\) as
$$ e(\omega )= \textstyle\begin{cases} 1 & \textrm{if } \omega \leq \omega _{\min } \\ \frac{\omega -\omega _{\max }}{\omega _{\min } - \omega _{\max }} & \textrm{if } \omega _{\min } < \omega < \omega _{\max } \\ 0 & \textrm{if } \omega \geq \omega _{\max } \end{cases} $$
(4)
where \(\omega \) is the distance between the robot end-effector and the object to manipulate, and the thresholds \(\omega _{\min }\) and \(\omega _{\max }\) represent the minimum and maximum distance to the object. A typical choice for \(\omega _{\min }\) is the size of the object, while \(\omega _{\max }\) represents the maximum distance at which the object can be successfully grasped.
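A direct implementation of (4), with hypothetical distances in meters used only as an example:

```python
def priority(omega, omega_min, omega_max):
    """Branch priority e(omega) from Eq. (4): 1 below omega_min, 0 above omega_max,
    and a linear decrease in between."""
    if omega <= omega_min:
        return 1.0
    if omega >= omega_max:
        return 0.0
    return (omega - omega_max) / (omega_min - omega_max)

# Hypothetical example: a 5 cm object that can be grasped from up to 0.5 m away.
print(priority(0.20, omega_min=0.05, omega_max=0.50))  # ~0.67: closer objects get higher priority
```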