1 Introduction

Optimal control is a powerful tool to synthesize motions and controls through task goals (cost/optimality) and constraints (e.g., system dynamics, interaction constraints). We can formulate such problems using direct methods (Betts, 2009), which first discretize over both state and controls, and then optimize using sparse general-purpose nonlinear programming (NLP) software such as Snopt (Gill et al., 2005), Knitro (Byrd et al., 2006), and Ipopt (Wächter & Biegler, 2006). To ensure that the system state evolves as described by the equations of motion, we define equality constraints in the NLP problem. However, common algorithms for nonlinear programming can only guarantee the constraint satisfaction at their convergence. When this occurs, we say that the discretized states are dynamically feasible. Furthermore, despite the advantage of using advanced software for nonlinear programming, these algorithms perform very large matrix factorizations during the computation of the search direction, i.e., the resolution of the Karush-Kuhn-Tucker (KKT) problem. To do so, they use sparse linear solvers such as MA27, MA57, and MA97 (see http://www.hsl.rl.ac.uk/) that do not exploit the temporal/Markovian structure of optimal control problems efficiently. Indeed, the expensive factorizations of these linear solvers limit their practical use to realtime control on reduced models [e.g., Wieber (2006), Pardo et al. (2016), Mastalli et al. (2017), Di Carlo et al. (2018)] or motion planning [e.g., Carpentier et al. (2016), Aceituno-Cabezas et al. (2017), Winkler et al. (2018), Merkt et al. (2018), Mastalli et al. (2020)] in robotics. Furthermore, classical line search methods used in general-purpose NLP solvers are less effective than the nonlinear roll-out of the dynamics used in shooting methods as their use increases the number of iterations [cf. Liao and Shoemaker (1992)], which is used in recent method for multiple shooting (Giftthaler et al., 2018).

Dynamic programming methods, which have their foundations in the calculus of variations as indirect methods, have once again attracted attention due to recent results on fast nonlinear model predictive control based on DDP [e.g., Tassa et al. (2012), Koenemann et al. (2015), Neunert et al. (2018), Farshidian et al. (2017)]. In particular, there is a significant interest in the iterative linear-quadratic regulator (iLQR) algorithm (Li & Todorov, 2004) as its Gauss-Newton (GN) approximation reduces the computation time while having super-linear convergence. Both iLQR and DDP algorithms perform a Riccati sweep in the backward pass, which incorporates elements that are reminiscent of Pontryagin’s maximum principle (PMP). For instance, at convergence, the gradient of the value function in the backward pass represents the costate; instead, the roll-out of the system dynamics describes the state integration step. This connection was recognized by Bellman’s groundbreaking work (Bellman, 1954) that established the so-called Hamilton-Jacobi-Bellman (HJB) equation in the continuous-time domain. In contrast to classical direct collocation approaches, these approaches exploit the temporal/Markovian structure of the optimal control problem by solving a sequence of smaller sub-problems derived from Bellman’s principle of optimality (Mayne, 1966). This leads to fast and cheap computations due to very small matrix factorizations and effective data cache accesses. Despite these advantages, both algorithms are unable to handle equality and inequality constraints efficiently. Furthermore, they have a poor basin of attraction for a good local optimum as it requires a good initialization in order to converge and are prone to numerical instability—commonly recognized challenges for single shooting approaches (Betts, 2009). These undesirable properties are mainly due to the fact that the iLQR/DDP algorithms implicitly enforce the dynamic feasibility through the system roll-out.

1.1 Related work

Trade-offs between feasibility and optimality appear in most of the state-of-the-art nonlinear programming software. For instance, Ipopt includes a feasibility restoration phase which aims at reducing the constraint violation (Wächter & Biegler, 2006). In Knitro, the progress on both feasibility and optimality is achieved by adding an \(\ell ^1\)-norm penalty term for the constraints in the merit function (Byrd et al., 2006). In fact, by changing the merit function or the line search procedure, we can put emphasis on obtaining feasible solutions before trying to optimize them. Instead, the iLQR/DDP algorithms do not make this trade-off, as the backward and forward passes do not accept infeasible iterations. However, recent work on multiple shooting DDP (Giftthaler et al., 2018; Mastalli et al., 2020) has provided ways of handling dynamically infeasible iterations, which we elaborate below.

The multiple shooting variants in Giftthaler et al. (2018), Mastalli et al. (2020) are rooted in dynamic programming. For instance, Giftthaler et al. (2018) introduced a liftedFootnote 1 version of the algebraic Riccati equation that allows initialization of both state and control trajectories; it further accounts for the relinearization required by the dynamics gaps in the backward pass and uses a merit function to balance feasibility and optimality. In turn, in our previous work (Mastalli et al., 2020), we proposed a modification of the forward pass that numerically matches the gap contraction expected by a direct multiple shooting method subject to equality constraints only. It factorizes the KKT matrix via a Riccati recursion and defines the behavior of the defect constraints based on the first-order necessary condition (FONC) of optimality.Footnote 2 These approaches improve numerical robustness against poor initialization, as they are able to use an initial guess for the state trajectory. Unfortunately, none of these methods handle inequality constraints such as control limits, with the exception of a recent work that computes squashed control sequences (Marti-Saumell et al., 2020).

There are two main strategies for incorporating arbitrary constraints: active set and penalization methods (as extensively described in Nocedal and Wright (2006)). In the robotics community, one of the first successful attempts to incorporate inequality constraints in DDP used an active set approach (Tassa et al., 2014), which is based on Ohno (1978)—a pioneering work in the control community. Concretely, this approach focused on handling control limits during the computation of the backward pass, i.e., in the minimization of the action-value function (\(Q-\)function),Footnote 3 which resembles the control Hamiltonian at convergence [see Kirk (1998), Section 3.11]. The method is popularly named Box-DDP, and the authors also showed a better convergence rate when compared with a squashing function approach. Later, Xie et al. (2017) included general inequality constraints into the \(Q-\)function and the forward pass. The method sacrifices the computational performance by including a second quadratic program, which is solved in the forward pass. However, it still remains faster than solving the same problem using direct collocation with Snopt as reported in Hargraves and Paris (1987).

Generally speaking, active set methods are suitable for small-size problems (such as minimizing the \(Q-\)function described above) as their accuracy and speed often outperform other methods. However, the combinatorial complexity of finding the active set is prohibitive in large-scale optimization problems. This motivates the development of penalty-based methods, despite their numerical difficulties: ill-conditioning and slow convergence. To overcome these difficulties, Lantoine and Russell (2008) proposed a method that incorporates an augmented Lagrangian term. This method was studied in the context of robust thrust optimization, in which the dynamical system has fewer degrees of freedom compared to complex legged robots. Later, Howell et al. (2019) extended the augmented Lagrangian approach to handle arbitrary inequality constraints for aerial navigation and manipulation problems. Additionally, the algorithm incorporates an active set projection for solution polishing and is often faster than direct collocation solved with Ipopt or Snopt.

Our work proposes a feasibility-driven search for nonlinear optimal control problems with control limits. The main motivation of our approach is to increase the algorithm’s basins of attraction, by focusing on feasibility instead of focusing solely on efficiency and optimality. Apart from the control limits and dynamics, we handle all remaining constraints (e.g., state and friction cone) through quadratic penalization, as described in the results section.

1.2 Contribution

The main contribution of this work is the first complete study of the numerical properties, behaviors, and guarantees of feasibility-driven search in differential dynamic programming. It relies on three technical contributions:

  1. (i)

    An original and efficient optimal control algorithm that directly handles control limits (Box-FDDP),

  2. (ii)

    Extensive comparisons against direct transcription and Box-DDP\(^+\) (a single shooting method),

  3. (iii)

    A tutorial that connects the different branches of theory in optimal control, and

  4. (iv)

    An experimental validation of the dynamic feasibility evolution against interior point and active set algorithms for nonlinear programming.

Fig. 1
figure 1

Snapshots of an athletic behavior computed by our feasibility-driven approach with explicit control limits: Box-FDDP. The optimized motion considers the robot’s full-body dynamics, joint limits, and friction cone constraints. We initialize our algorithm using the default posture and quasi-static torques (as described in the results section). We describe three desired actions: grasping the bar at a specific point, raising the feet up-ward a given location (Seq. 1), and landing over a specific placement (Seq. 5). The shown behavior emerges without further specification from our problem formulation. The arrows describe the magnitude and direction of the contact forces. To watch the video, click the figure or see https://youtu.be/bOGBPTh_lsU(Color figure online)

Our approach builds on top of our previous results on feasibility-driven search (Mastalli et al., 2020), for which we hereby propose to define two modes in our algorithm: feasibility-driven and control-bounded. It considers the dynamic feasibility in the forward pass and explicitly incorporates control limits, which does not require a merit function as in Giftthaler et al. (2018). Additionally, our approach has outstanding numerical capabilities, which allow us to generate motions that go beyond state-of-the-art methods on optimal control or trajectory optimization in robotics, e.g., the athletic maneuver of a humanoid robot shown in Fig. 1.

2 Direct multiple shooting and differential dynamic programming

Before describing our approach, we introduce direct multiple shooting, and explain its numerical advantages when compared to single shooting methods such as DDP (Sect. 2.1). Then, in Sect. 2.2 we present a unique tutorial that connects the various branches of theory: KKT, PMP, and HJB. Additionally, in Sect. 2.3 we describe the salient aspects of original Box-DDP proposed by Tassa et al. (2014), and our variant Box-DDP\(^+\). This section contains known material, although we believe it contributes (i) to unveil the underlying problems of differential dynamic programming, and (ii) to understand the theoretical foundations of our feasibility-driven approach.

2.1 Direct multiple shooting for optimal control

Without loss of generality, we consider a direct multiple shooting approach for the nonlinear optimal control problem with control bounds in which each shooting segment defines a single timestep:s

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {x}}_s,{\mathbf {u}}_s}&&\ell _N({\mathbf {x}}_N) + \sum _{k=0}^{N-1} \ell _k({\mathbf {x}}_k,{\mathbf {u}}_k) \\&\quad \text {s.t.}&\quad \bar{\mathbf {f}}_0\,\,\,\,\,\,:={\mathbf {x}}_0\ominus \tilde{{\mathbf {x}}}_0={\mathbf {0}},\\&&\quad \bar{\mathbf {f}}_{k+1}:={\mathbf {f}}({\mathbf {x}}_k,{\mathbf {u}}_k)\ominus {\mathbf {x}}_{k+1}={\mathbf {0}},& \scriptstyle {\forall k=\{0,1,\cdots ,N-1\}}\\&&\quad \underline{\mathbf {u}} \le {\mathbf {u}}_k \le \bar{\mathbf {u}},&\scriptstyle {\forall k=\{0,1,\cdots ,N-1\}}\\ \end{aligned} \end{aligned}$$

where the state \({\mathbf {x}}\in X\) lies in a differential manifold (with dimension \(n_x\)); the control \({\mathbf {u}}\in {\mathbb {R}}^{n_u}\) defines the input commands; \(\underline{\mathbf {u}}\)\(\bar{\mathbf {u}}\) are the lower and upper control bounds; \(\tilde{{\mathbf {x}}}_0\) is the initial state of the system; \(\ominus \) describes the difference operator of the state manifold [notation inspired by Frese (2008) that is needed to optimize over manifolds (Gabay, 1982)]; \(N\in {\mathbb {N}}\) describes the number of nodes (horizon); \(\ell _N\), \(\ell _k\) are the terminal and running cost functions; and \(\bar{\mathbf {f}}_0\), \(\bar{\mathbf {f}}_{k+1}\in T_{{\mathbf {x}}}X\) are the gap residual functions that impose the dynamic feasibility; and \(T_{{\mathbf {x}}}X\) describes the tangent space of the state manifold at the state \({\mathbf {x}}\).

Equation (1) describes a nonlinear program as the system dynamics are transcribed into a set of algebraic equations with defects in each timestep. It is possible to extend this notation for cases where shooting segments contain multiple timesteps; however, as seen later, this does not provide any computational benefit, i.e., reduction in the computation time or better distribution of nonlinearities of the dynamics. Fig. 2 depicts the transcription process incorporating state and control trajectories \(({\mathbf {x}}_s,{\mathbf {u}}_s)\) as decision variables. This is in contrast to differential dynamic programming, which only transcribe the control sequence \({\mathbf {u}}_s\) and obtain \({\mathbf {x}}_s\) by integrating the system dynamics (i.e., a single shoot).

Fig. 2
figure 2

A schematic of a direct multiple shooting formulation. Different intermediate states are introduced as decision variables. A set of equality constraints enforces the dynamics feasibility. The black dots represent the initial states used for the system roll-out. The dashed lines represent the gaps between the roll-outs. Dynamic feasibility is achieved once these gaps are closed

2.1.1 Numerical behavior of direct multiple shooting

Algorithms for nonlinear programming aim at finding the Karush–Kuhn–Tucker (KKT) conditions defined by the FONC of optimality (Nocedal & Wright, 2006). This process involves iteratively solving a KKT problem (i.e., linear system of equations) until satisfaction of a stopping criterion. In the line search strategy, the solution of this KKT problem provides a search direction \(\delta {\mathbf {w}}_k\), and the selected step length \(\alpha \) defines how much the current guess \({\mathbf {w}}_k\) moves along that direction, i.e., \({\mathbf {w}}_{k+1} = {\mathbf {w}}_{k} \oplus \alpha \delta {\mathbf {w}}_{k}\). Note that the integrator operator \(\oplus \) enables us to optimize over the manifold [as in Frese (2008), Gabay (1982)], however, it is a feature that general-purpose nonlinear programming libraries often does not have.

We can easily analyze the numerical behavior of direct multiple shooting formulations by focusing on the KKT problem for the shooting interval k only. This is possible because of the temporal/Markovian structure of optimal control problems. Therefore, when we apply the Newton method on this KKT sub-problem together with the Bellman’s principle of optimality, we obtain:

$$\begin{aligned} \begin{bmatrix} \varvec{\ell }_{\mathbf {xx}_k} &{} \varvec{\ell }_{\mathbf {ux}_k}^\top &{} {\mathbf {f}}^\top _{{\mathbf {x}}_k} &{} \\ \varvec{\ell }_{\mathbf {ux}_k} &{} \varvec{\ell }_{\mathbf {uu}_k} &{} {\mathbf {f}}^\top _{{\mathbf {u}}_k} &{} \\ {\mathbf {f}}_{{\mathbf {x}}_k} &{} {\mathbf {f}}_{{\mathbf {u}}_k} &{} &{} -{\mathbf {I}} \\ &{} &{} -{\mathbf {I}} &{} {\mathcal {V}}_{\mathbf {xx}_{k+1}} \end{bmatrix} \begin{bmatrix} \delta {\mathbf {x}}_k \\ \delta {\mathbf {u}}_k \\ \varvec{\lambda }^+_{k+1} \\ \delta {\mathbf {x}}_{k+1} \end{bmatrix} = - \begin{bmatrix} \varvec{\ell }_{{\mathbf {x}}_k} \\ \varvec{\ell }_{{\mathbf {u}}_k} \\ \bar{\mathbf {f}}_{k+1} \\ {\mathcal {V}}_{{\mathbf {x}}_{k+1}} \end{bmatrix}, \end{aligned}$$

where Eq. (2) defines the stationary condition (first and second rows) and the primal feasibility (third row) of the FONC of optimality, respectively; \(\delta {\mathbf {x}}_k\)\(\delta {\mathbf {u}}_k\) define the search direction for the primal variables; \(\varvec{\lambda }^+_{k+1}\) is the updated Lagrangian multipliers; \(\varvec{\ell }_{{\mathbf {x}}_k}\), \(\varvec{\ell }_{{\mathbf {u}}_k}\), and \(\varvec{\ell }_{\mathbf {xx}_k}\), \(\varvec{\ell }_{\mathbf {xu}_k}\), \(\varvec{\ell }_{\mathbf {uu}_k}\) are the Jacobians and Hessians of the cost function; \({\mathbf {f}}_{{\mathbf {x}}_k}\), \({\mathbf {f}}_{{\mathbf {u}}_k}\) are the Jacobians of the system dynamics; and \({\mathcal {V}}_{{\mathbf {x}}_k}\), \({\mathcal {V}}_{\mathbf {xx}_k}\) are the gradient and Hessian of the value function. Note that we apply the Gauss-Newton (GN) approximation as we ignore the Hessian of the system dynamics to avoid expensive tensor-vector multiplications.

When we factorize this system of equations, the resulting search direction always satisfies the dynamics constraints if the Jacobians and Hessians are constant (i.e., a LQR problem). However, if we apply an \(\alpha \)-step, the gap of the dynamics closes by a factor of \((1-\alpha )\). We observe this by inspecting the primal feasibility at the next iteration:

$$\begin{aligned} \bar{\mathbf {f}}_{k+1}^{i+1}= & {} \bar{\mathbf {f}}_{k+1}^{i} - (\delta {\mathbf {x}}_{k+1} - \mathbf {f_x}_k\delta {\mathbf {x}}_k - \mathbf {f_u}_k\delta {\mathbf {u}}_k)\nonumber \\= & {} (1-\alpha )({\mathbf {f}}({\mathbf {x}}^i_k,{\mathbf {u}}^i_k)\ominus {\mathbf {x}}^i_{k+1}), \end{aligned}$$

where, by definition in Eq. (2) (third row), we have that

$$\begin{aligned} \alpha \bar{\mathbf {f}}^i_{k+1} = \delta {\mathbf {x}}_{k+1}-\mathbf {f_x}_k\delta {\mathbf {x}}_k-\mathbf {f_u}_k\delta {\mathbf {u}}_k, \end{aligned}$$

with the gap defined as \(\bar{\mathbf {f}}^i_{k+1}={\mathbf {f}}({\mathbf {x}}^i_k,{\mathbf {u}}^i_k) \ominus {\mathbf {x}}^i_{k+1}\), and i describes the iteration number.

As described later in Sect. 3, injecting this numerical behavior can be interpreted as a feasibility-driven approach for multiple shooting. However, our approach operates quite differently from classical multiple shooting approaches. For instance, it does not increase the computation time by defining extra state (decision) variables. But there is no such thing as a free lunch as our approach cannot temporarily increase the defects (e.g., to reduce the cost value) after taking its first full step (\(\alpha =1\)).

2.1.2 Advantages of direct multiple shooting

The rationale for a direct multiple shooting approach (namely, adding \({\mathbf {x}}_s\) as decision variables) is to distribute the nonlinearities of the dynamics over the entire horizon (Diehl et al., 2006). To illustrate this statement, we recognize that integrating over a horizon implies recursively calling integrator functions, i.e.,

$$\begin{aligned} {\mathbf {x}}_{k+1}= & {} {\mathbf {f}}({\mathbf {x}}_k, {\mathbf {u}}_k) \nonumber \\= & {} {\mathbf {f}}({\mathbf {f}}({\mathbf {x}}_{k-1}, {\mathbf {u}}_{k-1}), {\mathbf {u}}_k) \nonumber \\= & {} \vdots \nonumber \\= & {} {\mathbf {f}}({\mathbf {f}}(\cdots {\mathbf {f}}({\mathbf {x}}_0,{\mathbf {u}}_0), {\mathbf {u}}_{k-1}), {\mathbf {u}}_k), \end{aligned}$$

in which the nonlinearity increases along the horizon. This means that the local prediction, constructed by the derivatives of the nonlinear system in the KKT problem, becomes more inaccurate as the horizon increases. This is a well recognized drawback of single shooting approaches, and certainly a numerical limitation of the Box-DDP\(^+\) algorithm described below.

2.2 Connection between KKT, PMP and HJB branches

The fourth row of Eq. (2) connects the value function with the Lagrange multipliers associated with the state equations. By definition, this multiplier corresponds to the next costate value at node k, which reveals an interesting connection with the PMP used in indirect multiple shooting methods and the KKT approach, i.e.

$$\begin{aligned} \varvec{\lambda }^+_{k} = {\mathcal {V}}_{{\mathbf {x}}_k} + {\mathcal {V}}_{\mathbf {xx}_k}\delta {\mathbf {x}}_k. \end{aligned}$$

This might be not surprising if we realize that the PMP or KKT approach write the optimal control in term of the costate, while HJB expresses it in terms of the value function. Equation (2) is also at the heart of direct single shooting approaches such as DDP if each dynamics gap \(\bar{\mathbf {f}}_{k+1}\) vanishes, therefore this connection holds for single shooting settings as well.

Different interpretations can arise from this connection. For instance, DDP or our approach can be interpreted as iterative methods for solving the PMP in discrete-time optimal control problems under single and multiple shooting settings, respectively. Furthermore, under the context of linear dynamics and quadratic cost, DDP or our approach can be classified as global methods as they compute an optimal policy (i.e., a closed-loop solution).

2.3 Differential dynamic programming with control limits

As proposed by Tassa et al. (2014), Box-DDP locally approximates the value function at node k as

$$\begin{aligned} \begin{aligned} {\mathcal {V}}_k(\delta {\mathbf {x}}_k)&= \min \limits _{\delta {\mathbf {u}}_k} \ell _k(\delta {\mathbf {x}}_k,\delta {\mathbf {u}}_k) + {\mathcal {V}}_{k+1}({\mathbf {f}}(\delta {\mathbf {x}}_k,\delta {\mathbf {u}}_k)),\\ \quad \text {s.t.}&\qquad \underline{\mathbf {u}} \le {\mathbf {u}}_k + \delta {\mathbf {u}}_k \le \bar{\mathbf {u}}~, \end{aligned} \end{aligned}$$

which breaks the optimal control problem into a sequence of simpler sub-problems with control bounds. Then, a local search direction is computed through a linear quadratic (LQ) approximation of the value function:

$$\begin{aligned}&\delta {\mathbf {u}}^*_k(\delta {\mathbf {x}}_k) = \arg \min _{\delta {\mathbf {u}}_k} \overbrace{\frac{1}{2} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_k \\ \delta {\mathbf {u}}_k \end{bmatrix}^\top \begin{bmatrix} 0 &{} {\mathbf {Q}}^\top _{{\mathbf {x}}_k} &{} {\mathbf {Q}}^\top _{{\mathbf {u}}_k} \\ {\mathbf {Q}}_{{\mathbf {x}}_k} &{} {\mathbf {Q}}_{\mathbf {xx}_k} &{} {\mathbf {Q}}_{\mathbf {xu}_k} \\ {\mathbf {Q}}_{{\mathbf {u}}_k} &{} {\mathbf {Q}}^\top _{\mathbf {xu}_k} &{} {\mathbf {Q}}_{\mathbf {uu}_k} \end{bmatrix} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_k \\ \delta {\mathbf {u}}_k \end{bmatrix}}^{Q_k(\delta {\mathbf {x}}_k,\delta {\mathbf {u}}_k,\bar{{\mathcal {V}}}_k)},\nonumber \\&\quad \text {s.t.} \quad \underline{\mathbf {u}} \le {\mathbf {u}}_k + \delta {\mathbf {u}}_k \le \bar{\mathbf {u}}, \end{aligned}$$

where the \({\mathbf {Q}}_k\) terms describe the LQ approximation of the action-value function \(Q_k(\cdot )\) that can be seen as a function of the derivatives of the value function \(\bar{{\mathcal {V}}}_k=({\mathcal {V}}_{{\mathbf {x}}_k},{\mathcal {V}}_{\mathbf {xx}_k})\). Solving Eq. (6) results in a local feedback control law \(\delta {\mathbf {u}}_k={\mathbf {k}}_k+{\mathbf {K}}_k\delta {\mathbf {x}}_k\) consisting of a feed-forward term \({\mathbf {k}}_k\) and a state feedback gain \({\mathbf {K}}_k\) for each discretization point k. Below, we describe how to compute the \({\mathbf {Q}}_k\) terms in the so-called Riccati sweep step.

2.3.1 Riccati sweep

The LQ approximation of the action-value function \(Q_k\) is computed recursively, backwards in time, as follows

$$\begin{aligned} {\mathbf {Q}}_{{\mathbf {x}}_k}= & {} \varvec{\ell }_{{\mathbf {x}}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}_{{\mathbf {x}}_{k+1}}, \nonumber \\ {\mathbf {Q}}_{{\mathbf {u}}_k}= & {} \varvec{\ell }_{{\mathbf {u}}_k} + {\mathbf {f}}^\top _{{\mathbf {u}}_k}{\mathcal {V}}_{{\mathbf {x}}_{k+1}}, \nonumber \\ {\mathbf {Q}}_{\mathbf {xx}_k}= & {} \varvec{\ell }_{\mathbf {xx}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {x}}_k},\nonumber \\ {\mathbf {Q}}_{\mathbf {xu}_k}= & {} \varvec{\ell }_{\mathbf {xu}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {u}}_k},\nonumber \\ {\mathbf {Q}}_{\mathbf {uu}_k}= & {} \varvec{\ell }_{\mathbf {uu}_k} + {\mathbf {f}}^\top _{{\mathbf {u}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {u}}_k}, \end{aligned}$$

where \({\mathcal {V}}_{{\mathbf {x}}_{k+1}}\), \({\mathcal {V}}_{\mathbf {xx}_{k+1}}\) are obtained by solving the following algebraic Riccati equations at \(k+1\):

$$\begin{aligned} {\mathcal {V}}_{{\mathbf {x}}_{k+1}}= & {} {\mathbf {Q}}_{{\mathbf {x}}_{k+1}} - {\mathbf {Q}}_{\mathbf {xu}_{k+1}} \hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_{k+1}}^{-1} {\mathbf {Q}}_{{\mathbf {u}}_{k+1}},\nonumber \\ {\mathcal {V}}_{\mathbf {xx}_{k+1}}= & {} {\mathbf {Q}}_{\mathbf {xx}_{k+1}} - {\mathbf {Q}}_{\mathbf {xu}_{k+1}} \hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_{k+1}}^{-1} {\mathbf {Q}}_{\mathbf {xu}_{k+1}}^\top , \end{aligned}$$

with \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_{k+1}}\) as the control Hessian in the free space, which we will describe below. Additionally, we use the gradient and Hessian of the value function to find a local search direction as described below. In the case of Box-DDP\(^+\), our adaptation of Box-DDP to allow initialization with state trajectories, the gradient of the value function in the first iteration is relinearized by the initialization infeasibility \(\bar{\mathbf {f}}^0_{k+1}\) as \({\mathcal {V}}_{{\mathbf {x}}_{k+1}}+{\mathcal {V}}_{\mathbf {xx}_{k+1}}\bar{\mathbf {f}}^0_{k+1}\).

2.3.2 Control-bounded direction

We compute the control-bounded direction, defined in Eq. (6), by breaking it down into the so-called feed-forward and feedback sub-problems (Tassa et al., 2014). We first compute the feed-forward term by solving the following quadratic programming (QP) program with box constraints:

$$\begin{aligned} \begin{aligned} {\mathbf {k}}_k&= \arg \min _{\delta {\mathbf {u}}_k} \frac{1}{2}\delta {\mathbf {u}}_k^\top {\mathbf {Q}}_{\mathbf {uu}_k}\delta {\mathbf {u}}_k + {\mathbf {Q}}_{{\mathbf {u}}_k}^\top \delta {\mathbf {u}}_k,\\ \quad \text {s.t.}&\qquad \underline{\mathbf {u}} \le {\mathbf {u}}_k + \delta {\mathbf {u}}_k \le \bar{\mathbf {u}}, \end{aligned} \end{aligned}$$

and then the feedback gain as

$$\begin{aligned} {\mathbf {K}}_k=-\hat{\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}{\mathbf {Q}}_{\mathbf {ux}_k}, \end{aligned}$$

where \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\) is the control Hessian of the free subspace obtained in Eq. (9) in which \(\text {f}_k\) describes the free subspace at node k, i.e., the indexes of the inactive bounds. With these indexes, we sort and partition the control Hessian as:

$$\begin{aligned} {\mathbf {Q}}_{\mathbf {uu}_k}= \begin{bmatrix} {\mathbf {Q}}_{\mathbf {uu},\text {f}_k} &{} {\mathbf {Q}}_{\mathbf {uu},\text {fc}_k} \\ {\mathbf {Q}}_{\mathbf {uu},\text {cf}_k} &{} {\mathbf {Q}}_{\mathbf {uu},\text {c}_k} \end{bmatrix}, \end{aligned}$$

and compute \(\hat{\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}\) internally based on the factorization of \({\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}\). This is what our  Box-QP program does to solve the feed-forward sub-problem efficiently via the Projected-Newton QP algorithm (Bertsekas, 1982). This algorithm quickly identifies the active set and moves along the free subspace of the Newton step. It also has a similar computational cost to the unconstrained QP if the active set remains unchanged. Thus, the runtime performance is similar to the DDP algorithm. However, it requires a feasible initialization \(\delta {\mathbf {u}}^0_k\).

Again, by using a Projected-Newton QP algorithm, we further efficiently obtain the control Hessian of the free subspace \({\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}\) as the algorithm computes it internally when it moves along the free subspace of the Newton step. With it, we compute a state feedback gain that generates corrections within the control limits. This is an important feature for controlling the robot as well as for rolling-out the nonlinear dynamics in the forward pass. For more details about the Projected-Newton QP algorithm see Bertsekas (1982).

2.3.3 State integration

In any DDP algorithm such as Box-DDP, we perform a state integration using the locally-linear policy as

$$\begin{aligned} \hat{\mathbf {x}}_0= & {} {\mathbf {x}}_0,\nonumber \\ \hat{\mathbf {u}}_k= & {} {\mathbf {u}}_k + \alpha {\mathbf {k}}_k + {\mathbf {K}}_k (\hat{\mathbf {x}}_k \ominus {\mathbf {x}}_k),\quad \scriptstyle {\forall k=\{0,1,\cdots ,N-1\}}\nonumber \\ \hat{\mathbf {x}}_{k+1}= & {} {\mathbf {f}}(\hat{\mathbf {x}}_k, \hat{\mathbf {u}}_k),\qquad \scriptstyle {\forall k=\{0,1,\cdots ,N-1\}} \end{aligned}$$

where \(\hat{\mathbf {x}}_k\), \(\hat{\mathbf {u}}_k\) are the new state and control at node k generated using a step length \(\alpha \). The feedback gain helps to distribute the nonlinearities; however, as seen in the previous section, it does not resemble the numerical behavior described by the FONC of optimality in direct multiple shooting. This different numerical behavior stems from the state integration procedure closing the gaps, i.e., \(\bar{\mathbf {f}}_k = {\mathbf {0}}\), \(\forall k=\{0,1,\cdots ,N\}\).

2.3.4 Expected improvement

When solving the algebraic Riccati equations, we obtain the expected improvement as

$$\begin{aligned} \Delta {\mathcal {V}}_k = -\frac{1}{2}{\mathbf {Q}}_{{\mathbf {u}}_k}^T \hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}^{-1} {\mathbf {Q}}_{{\mathbf {u}}_k}. \end{aligned}$$

Below, we elaborate the proposed algorithm based on the aforementioned description.

3 Box-FDDP: a feasibility-driven approach for multiple shooting

We now introduce a novel algorithm that combines a feasibility-driven search (Sect. 2.1) with an active set treatment of the control limits (Sect. 2.3) named Box-FDDP. The Box-FDDP algorithm comprises two modes: feasibility-driven and control-bounded modes, one of which is chosen for a given iteration (Algorithm 1). The feasibility-drivenFootnote 4 mode mimics the numerical resolution of a direct multiple shooting problem with only dynamics constraints when computing the search direction and step length (lines 410 and 1121, respectively). This mode neglects the control limits of the system as its focuses on dynamic feasibility only. In contrast, the control-bounded mode projects the search direction onto the feasible control region whenever the dynamics constraint is feasible (line 10). In both modes, the applied controls in the forward pass are projected onto their feasible box (line 13), causing dynamically-infeasible iterations to reach the control box. With this strategy, our solver focuses on feasibility early on, which increases its basins of attraction, and later on optimality. Technical descriptions of both modes are elaborated in Sects. 3.1 and 3.2. Note that the existence of feasible descent directions are introduced later in Sect. 3.3.

figure h

3.1 Search direction

In the standard Box-DDP algorithm, an initial forward pass is performed to obtain the initial state trajectory \({\mathbf {x}}_s\). This trajectory enforces the dynamics explicitly; thus, the gaps are zero, i.e., \(\bar{\mathbf {f}}_{k}={\mathbf {0}}\) for all \(k=\{0,1,\cdots ,N-1\}\). Instead, our multiple shooting variant, Box-FDDP, computes the gaps once at each iteration (line 3), which are used to find the search direction and to compute the expected improvement. However, if the iteration is dynamically feasible, then the search direction procedure is the same as in the standard Box-DDP (Tassa et al., 2014). Below, we describe the steps performed to compute the search direction.

3.1.1 Computing the gaps

Given a current iterate \(({\mathbf {x}}_s,{\mathbf {u}}_s)\), we perform a nonlinear roll-out to compute the gaps as

$$\begin{aligned} \bar{\mathbf {f}}_{k+1} := {\mathbf {f}}({\mathbf {x}}_k,{\mathbf {u}}_k) \ominus {\mathbf {x}}_{k+1}, \quad \scriptstyle {\forall k=\{0,1,\cdots ,N-1\}} \end{aligned}$$

where \({\mathbf {f}}({\mathbf {x}}_k,{\mathbf {u}}_k)\) is the roll-out state at interval \(k+1\), \({\mathbf {x}}_{k+1}\) is the next shooting state, and \(\ominus \) is the difference operator.

3.1.2 Action-value function of direct multiple shooting formulation

Without loss of generality, we use the Gauss-Newton (GN) approximation (Li & Todorov, 2004) to write the action-value function of our algorithm as

$$\begin{aligned} \begin{aligned} Q_k(\cdot )&= \frac{1}{2} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_{k+1} \end{bmatrix}^\top \begin{bmatrix} 0 &{} {\mathcal {V}}^\top _{{\mathbf {x}}_{k+1}} \\ {\mathcal {V}}_{{\mathbf {x}}_{k+1}} &{} {\mathcal {V}}_{\mathbf {xx}_{k+1}} \end{bmatrix} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_{k+1} \end{bmatrix}&\\&\quad + \frac{1}{2} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_k \\ \delta {\mathbf {u}}_k \end{bmatrix}^\top \begin{bmatrix} 0 &{} \varvec{\ell }^\top _{{\mathbf {x}}_k} &{} \varvec{\ell }^\top _{{\mathbf {u}}_k} \\ \varvec{\ell }_{{\mathbf {x}}_k} &{} \varvec{\ell }_{\mathbf {xx}_k} &{} \varvec{\ell }_{\mathbf {xu}_k} \\ \varvec{\ell }_{{\mathbf {u}}_k} &{} \varvec{\ell }^\top _{\mathbf {xu}_k} &{} \varvec{\ell }_{\mathbf {uu}_k} \end{bmatrix} \begin{bmatrix} 1 \\ \delta {\mathbf {x}}_k \\ \delta {\mathbf {u}}_k \end{bmatrix}&\end{aligned}, \end{aligned}$$

where again \(\varvec{\ell }_{{\mathbf {x}}_k},\varvec{\ell }_{{\mathbf {u}}_k}\) and \(\varvec{\ell }_{\mathbf {xx}_k},\varvec{\ell }_{\mathbf {xu}_k},\varvec{\ell }_{\mathbf {uu}_k}\) describe the gradient and Hessian of the cost function, respectively; \(\delta {\mathbf {x}}_{k+1}={\mathbf {f}}_{{\mathbf {x}}_k}\delta {\mathbf {x}}_k+{\mathbf {f}}_{{\mathbf {u}}_k}\delta {\mathbf {u}}_k\) is the linearized dynamics; and \({\mathbf {f}}_{{\mathbf {x}}_k},{\mathbf {f}}_{{\mathbf {u}}_k}\) are its Jacobians.

In direct multiple shooting settings, linearization of the system dynamics includes a drift term

$$\begin{aligned} \delta {\mathbf {x}}_{k+1} = \mathbf {f_x}_k\delta {\mathbf {x}}_k + \mathbf {f_u}_k\delta {\mathbf {u}}_k + \bar{\mathbf {f}}_{k+1}, \end{aligned}$$

as there are gaps in the dynamics term \(\bar{\mathbf {f}}_{k+1}\) produced between subsequent shooting segments (q.v. Fig. 2). Then, the Riccati sweep needs to be adapted as follows:

$$\begin{aligned} {\mathbf {Q}}_{{\mathbf {x}}_k}= & {} \varvec{\ell }_{{\mathbf {x}}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}^+_{{\mathbf {x}}_{k+1}}, \nonumber \\ {\mathbf {Q}}_{{\mathbf {u}}_k}= & {} \varvec{\ell }_{{\mathbf {u}}_k} + {\mathbf {f}}^\top _{{\mathbf {u}}_k}{\mathcal {V}}^+_{{\mathbf {x}}_{k+1}}, \nonumber \\ {\mathbf {Q}}_{\mathbf {xx}_k}= & {} \varvec{\ell }_{\mathbf {xx}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {x}}_k},\nonumber \\ {\mathbf {Q}}_{\mathbf {xu}_k}= & {} \varvec{\ell }_{\mathbf {xu}_k} + {\mathbf {f}}^\top _{{\mathbf {x}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {u}}_k},\nonumber \\ {\mathbf {Q}}_{\mathbf {uu}_k}= & {} \varvec{\ell }_{\mathbf {uu}_k} + {\mathbf {f}}^\top _{{\mathbf {u}}_k}{\mathcal {V}}_{\mathbf {xx}_{k+1}} {\mathbf {f}}_{{\mathbf {u}}_k} \end{aligned}$$

in which

$$\begin{aligned} {\mathcal {V}}^+_{{\mathbf {x}}_{k+1}}={\mathcal {V}}_{{\mathbf {x}}_{k+1}}+{\mathcal {V}}_{\mathbf {xx}_{k+1}}\bar{\mathbf {f}}_{k+1} \end{aligned}$$

is the gradient of the value function after the deflection produced by \(\bar{\mathbf {f}}_{k+1}\) (also described above as relinearization). Note that the Hessian of the value function remains unchanged as DDP approximates the value function through a LQ model. Additionally, this modification affects the values of the Riccati equations, Eq. (8), and expected improvement, Eq. (13), as they depend on the gradient of the value function.

3.1.3 Feasibility-driven direction

For dynamically-infeasible iterates (line 7), we ignore the control constraints and compute a control-unbounded direction:

$$\begin{aligned} \begin{aligned} {\mathbf {k}}_k&=-{\mathbf {Q}}_{\mathbf {uu}_k}^{-1}{\mathbf {Q}}_{{\mathbf {u}}_k},&\\ {\mathbf {K}}_k&=-{\mathbf {Q}}_{\mathbf {uu}_k}^{-1}{\mathbf {Q}}_{\mathbf {ux}_k}.&\end{aligned} \end{aligned}$$

We do this because we cannot quantify the effect of the gaps on the control bounds, which are needed to solve the feed-forward sub-problem Eq. (9). Our approach is equivalent to opening the control bounds during dynamically-infeasible iterates.

3.1.4 Control-bounded direction

We warm-start the Box-QP using the feed-forward term \({\mathbf {k}}_k\) computed in the previous iteration. However, if the algorithm is switching from feasibility to control-bounded mode (i.e., the previous iteration is infeasible), then \({\mathbf {k}}_k\) might fall outside the feasible box and \(\underline{\mathbf {u}}-{\mathbf {u}}_k\le {\mathbf {k}}_k\le \bar{\mathbf {u}}-{\mathbf {u}}_k\) do not hold. This violates the assumption of the previously-described Box-QP, for which a feasible initial point needs to be provided.

To handle infeasible iterates, we propose to project the warm-start of the Box-QP (line 9) as

$$\begin{aligned} \llbracket {\mathbf {k}}_k\rrbracket _{\underline{\mathbf {u},\bar{u}}} = \min {(\max {({\mathbf {k}}_k,\underline{\mathbf {u}}-{\mathbf {u}}_k)},\bar{\mathbf {u}}-{\mathbf {u}}_k)}, \end{aligned}$$

where \(\underline{\mathbf {u}}\), \(\bar{\mathbf {u}}\) are the lower and upper bounds of the feed-forward sub-problem, Eq. (9), respectively.

Once we project the warm-start \({\mathbf {k}}_k\), we solve the feed-forward and feedback sub-problems as explained in Sect. 2.3.2. Furthermore, we solve the Box-QP using a Projected-Newton method (Bertsekas, 1982), which handles box constraints efficiently as described above.

3.2 Step length

As far as we know, the standard Box-DDP (Tassa et al., 2014) modifies only the search direction (i.e., backward pass) to handle the control limits. However, it is also important to project the controls onto the feasible box during the forward pass. We do this by finding a step length that minimizes the cost (Nocedal & Wright, 2006).

3.2.1 Projecting the roll-out towards the feasible box

We propose to project the controls onto the feasible box in the nonlinear roll-out (line 13), i.e.,

$$\begin{aligned} \hat{\mathbf {u}}_k \leftarrow \min {(\max {(\hat{\mathbf {u}}_k,\underline{\mathbf {u}})},\bar{\mathbf {u}})}, \end{aligned}$$

where \(\hat{\mathbf {u}}_k\) is the updated control from the control policy. Our method does not require to solve another QP problem (Xie et al., 2017) or to project the linear search direction given the gaps on the dynamics (Howell et al., 2019). Furthermore, the control policy considers a gap prediction that guarantees a feasible descent direction. We formally describe the technical details of this procedure in Sect. 3.2.3.

3.2.2 Updating the gaps

As analyzed earlier, the evolution of the gaps in direct multiple shooting is affected by the selected step length. For an optimal control problem without control limits, this evolution is defined as

$$\begin{aligned} \bar{\mathbf {f}}_k \leftarrow (1-\alpha )\bar{\mathbf {f}}_k, \end{aligned}$$

where \(\alpha \) is the step-length found by the line-search procedure (line 1121). Note that a full step \((\alpha =1)\) closes the gaps completely. We described this gap contraction rate in Sect. 2.1.1.

3.2.3 Nonlinear step

With a nonlinear roll-outFootnote 5 (line 18), we avoid the linear prediction error of the dynamics that is typically handled by a merit function in general-purpose NLP algorithms, as explained in Mastalli et al. (2020). If we keep the gap-contraction rate of Eq. (22), then we obtain

$$\begin{aligned} \hat{\mathbf {x}}_{k}= & {} {\mathbf {f}}(\hat{\mathbf {x}}_{k-1},\hat{\mathbf {u}}_{k-1}) \oplus (\alpha - 1)\bar{\mathbf {f}}_{k-1}, \nonumber \\ \hat{\mathbf {u}}_k= & {} {\mathbf {u}}_k + \alpha {\mathbf {k}}_k + {\mathbf {K}}_k(\hat{\mathbf {x}}_k\ominus {\mathbf {x}}_k), \end{aligned}$$

where \(\hat{\mathbf {x}}_{k}\), \(\hat{\mathbf {u}}_{k}\) are the next state and control along an \(\alpha \)-step; \({\mathbf {k}}_k\) and \({\mathbf {K}}_k\) are the feed-forward term and feedback gains computed by Eq. (19) or Eq. (9)–(10). Furthermore, the initial condition of the roll-out is defined as \(\hat{\mathbf {x}}_0=\mathbf {\tilde{x}}_0 \oplus (\alpha -1) \bar{\mathbf {f}}_0\). Note that this is in contrast to the standard Box-DDP, in which the gaps are always closed, even for \(\alpha < 1\).

Avoiding the use of a merit function helps the algorithm to check the search direction more accurately. Indeed, it has been shown that the nonlinear roll-out is more effective than a standard line search procedure as it reduces the number of iterations (Liao & Shoemaker, 1992).

3.2.4 Expected Improvement

It is critical to properly evaluate the success of a trial step. Given the current dynamics gaps \(\bar{{\mathbf {f}}}_k\), Box-FDDP computes the expected improvement of a computed search direction as

$$\begin{aligned} \Delta J(\alpha ) = \Delta _1\alpha + \frac{1}{2}\Delta _2\alpha ^2, \end{aligned}$$


$$\begin{aligned} \Delta _1 = \sum _{k=0}^N {\mathbf {k}}_k^\top {\mathbf {Q}}_{{\mathbf {u}}_k} +\bar{\mathbf {f}}_k^\top ({\mathcal {V}}_{{\mathbf {x}}_k} - {\mathcal {V}}_{\mathbf {xx}_k}\delta \hat{\mathbf {x}}_k),\nonumber \\ \Delta _2 = \sum _{k=0}^N {\mathbf {k}}_k^\top \hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}{\mathbf {k}}_k + \bar{\mathbf {f}}_k^\top (2{\mathcal {V}}_{\mathbf {xx}_k}\delta \hat{\mathbf {x}}_k - {\mathcal {V}}_{\mathbf {xx}_k}\bar{\mathbf {f}}_k), \end{aligned}$$

where \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\) is the control Hessian of the free space, \(\delta \hat{\mathbf {x}}_k = \hat{\mathbf {x}}_k\ominus {\mathbf {x}}_k\), and J is the total cost of a given state-control trajectory (\({\mathbf {x}}_s\), \({\mathbf {u}}_s\)). We use this expected improvement model for both modes. Note that, in the feasibility-driven mode, the free space spans the entire control space; instead, in the control-bounded mode, the gaps are zero.

We obtain this expression by computing the cost from a linear roll-out of the current control policy as described in Eq. (23). We also accept ascent directions when evaluating the trial step, our approach is inspired by the Goldstein condition (Nocedal & Wright, 2006, Chapter 3):

$$\begin{aligned} \ell '-\ell \le {\left\{ \begin{array}{ll} b_1 \Delta J(\alpha ) &{} \text {if }\Delta J(\alpha )\le 0 \\ b_2 \Delta J(\alpha ) &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$

where \(b_1\), \(b_2\) are adjustable parameters, we used in this paper \(b_1 = 0.1\) and \(b_2 = 2\). Ascent directions improve the algorithm convergence as it helps to reduce the feasibility error through a moderate increment in the cost.

3.2.5 Regularization

We regularize the \(\mathbf {Q_{uu}}\) and \({\mathcal {V}}_\mathbf {{xx}}\) terms through a Levenberg-Marquardt scheme (Fletcher, 1971). Concretely, we increase the damping value \(\mu \) when the computation of the feed-forward sub-problem—formulated in Eq. (9)—fails, or when the forward pass accepts a step length smaller than \(\alpha _0=0.01\). Moreover, we decrease the damping value if the iteration accepts a step larger than \(\alpha _1=0.5\). Both regularization procedures modify the values of the \(\mathbf {Q_{uu}}\) and \({\mathcal {V}}_\mathbf {{xx}}\) terms during the Riccati sweep computation as

$$\begin{aligned} \begin{aligned} \mu '&\leftarrow \beta _{i,d}\mu ,\\ \mathbf {Q_{uu}}&\leftarrow \mathbf {Q_{uu}} + \mu '{\mathbf {I}},\\ {\mathcal {V}}_\mathbf {{xx}}&\leftarrow {\mathcal {V}}_\mathbf {{xx}} + \mu '{\mathbf {I}}, \end{aligned} \end{aligned}$$

where \(\beta _i\) and \(\beta _d\) are the factorsFootnote 6 used to increase or decrease the current damping value \(\mu \), respectively; \(\mu '\) is the newly-computed damping value; and \({\mathbf {I}}\) is the identity matrix. Additionally, we start the regularization procedure with an initial, and user-defined, damping value.Footnote 7 We also define minimum and maximum damping values to avoid increasing or decreasing the damping value unnecessarily.Footnote 8

Both regularizations significantly increase the robustness of the algorithm and ensures convergence, as it moves from Newton direction to steepest-descent, and vice versa. The Newton direction, which occurs with \(\mu =0\), provides fast convergence and is robust against scaling because it exploits the Hessian of the problem. However, it does not always produce valid descent directions as \(\mathbf {Q_{uu}}\) might be indefine and the problem nonconvex. In such cases, increasing the damping value guarantees that \(\mathbf {Q_{uu}}\) is positive-define which, in turn, computes a search direction closer to the steepest-descent one. Instead, \({\mathcal {V}}_\mathbf {{xx}}\) enforces the state trajectory to be closer to the one previously computed (Tassa, 2011). It will also not result in vanishing feedback gains even for large damping values.

3.3 Existence of feasible descent directions

As described above, our approach has two main modes: feasibility-driven and control-bounded. During the feasibility-driven phase, we compute a search direction to drive the next guess towards dynamic feasibility and try a step while keeping the control within the box constraints. This projection procedure can be seen as a nonlinear term in our dynamics, but we assume its effect is negligible for finding a feasible direction. On the other hand, our algorithm computes a search direction that considers the box constraints after the dynamic feasibility has been achieved. This is needed to improve the next current guess by taking control constraints into account when computing the feedback gains along the free subspace.

As analyzed in Sect. 2.1.1, the feasibility-driven direction is computed by mimicking the numerical behavior of a nonlinear program during the resolution of a direct multiple shooting problem with only dynamics constraints. It implies that the feasibility-driven search produces a descent direction, and eventually the algorithm converges, if the cost Hessian is a positive definite matrix. Indeed, the positiveness is always guaranteed by our regularization procedure as described before. Furthermore, the feasibility-driven step aims at reducing the nonlinearities produced by a single shooting formulation (e.g., DDP algorithm). When the dynamics are feasible, we apply a control-bounded search which also produces a descent direction as it boils down to a QP program.

In the next section, we present a series of results that demonstrate the benefits of our feasibility-driven approach.

4 Results

We analyze Box-FDDP across a wide range of optimal control problems (briefly introduced in Sect. 4.1) as follows. First, in Sect. 4.2, we show the benefits of the feasibility mode by analyzing the gap contraction and its connection with the nonlinearities in the dynamics. Second, we compare our algorithm against a direct transcription formulation in Sect. 4.4. Concretely, we compare the dynamic feasibility and optimality evolutions, runtime performance, and robustness to different initial guesses against the interior point and active set algorithms available in Knitro. Finally, in Sect. 4.5, we report the results of a squashing approach for solving the control bounds as it demonstrates the numerical performance of having two modes.

Fig. 3
figure 3

Snapshots of different robot maneuvers computed by Box-FDDP. a traversing a narrow passage with a quadcopter (goal); b Talos balancing on a single leg (taichi); c aggressive jumping of 30 cm that reaches ANYmal limits (jump); d Talos dipping on a parallel bars (dip); e ANYmal hopping with two legs (hop); f Talos performing a pull-up bar task (pullup). To watch the video, click its respective figure or see https://youtu.be/bOGBPTh_lsU(Color figure online)

We benchmark the algorithms using an Intel Core i9-9900KF CPU with eight cores @ 3.60 GHz and 16 MB cache. Our implementation of Box-FDDP supports code generation, but all runtime performances reported henceforth do not employ it for fair comparison with other approaches. We use 8 threads to compute the cost and dynamics derivatives for the experiments with the Box-FDDP algorithm and the legged robots only. We use the same initial regularization and stopping criteria values for each problem, and the values are \(10^{-9}\) and \(5\times 10^{-5}\), respectively.

4.1 Optimal control problems

To provide empirical evidence on the benefits of the feasibility-driven approach, we developed a range of different optimal control problems: (1) an under-actuated double pendulum (pend); (2) a quadcopter navigating towards a goal (goal) or through a narrow passage (narrow) and looping (loop); (3) various gaits, aggressive jumps (jump) and unstable hopping (hop) in a quadruped robot; (4) whole-body manipulation (man), hand control while balancing in single leg (taichi), dip on parallel bars (dip) and a pull-up bar task (pullup) in a humanoid robot. Figure 3 shows snapshots of motions computed by Box-FDDP for some of these problems, and the accompanying video shows the entire motion sequences.Footnote 9

We describe the cost functions, dynamics, control limits, penalization terms, and initialization of each optimal control problem in Appendix A. Finally, some of these problems, as well as our implementation of the Box-FDDP algorithm, are publicly available in the Crocoddyl repository (Mastalli et al., 2019).

4.2 Advantages of the feasibility-driven mode

To understand the benefits of the feasibility-driven mode, we analyze the resulting total cost, number of iterations and total computation time obtained in both algorithms: Box-FDDP and Box-DDP\(^+\) using the same initial guess. Box-DDP\(^+\) is an improved version of the standard Box-DDP proposed in Tassa et al. (2014), which it accepts initialization for both: state and control trajectories as described above. This version accepts infeasible warm-starts as in Eq. (17), and it is available in the Crocoddyl repository. Without this modification, the standard Box-DDP could easily diverge (and not converge at all) when we initialize it using quasi-static torques in problems with medium to longer horizons.

4.2.1 Larger basin of attraction and convergence

In our experiments, Box-DDP\(^+\) was unable to generate jumping and hopping motions for quadrupeds, as well as pull-ups for humanoids, i.e., it failed to solve our jump, hop, and pullup task specifications (marked by the ✗ in Table 1). Box-DDP\(^+\) failed to generate such aggressive motions because trajectories satisfying all the specifications of those tasks are significantly distant from the initial guess provided to the solver. Box-DDP\(^+\) behaves poorly as it computes solution with higher cost value and computation time, which is critical drawback in model predictive control applications.

In contrast, our approach (Box-FDDP) was able to solve all of the tasks, and it did so with fewer iterations and lower total cost (see Table 1 and Fig. 4). Furthermore, Box-FDDP and Box-DDP\(^+\) have the same algorithmic complexity, but since our approach requires fewer iterations than Box-DDP\(^+\), the total computation time of our approach is lower. These results are a direct consequence of the feasibility-driven mode in our approach, which is able to find control sequences even when faced with poor initial guesses. The infeasible iterations ensure convergence from remote initial guesses through a balance between optimality and feasibility.

Finally, we would like to emphasize that the feasibility-driven mode of Box-FDDP can not only solve tasks that Box-DDP\(^+\) is unable to solve, but also improve the solutions of tasks that Box-DDP\(^+\) is able to solve. For instance, consider the quadcopter tasks goal, narrow, and loop: In the accompanying video, we show that our approach generates concise and smooth quadcopter trajectories, whereas Box-DDP\(^+\) generates jerky motions and with unnecessary loops, due to early projection of the control commands.

Table 1 Number of iterations, total cost, and average total computation time over 100 trials
Fig. 4
figure 4

Cost and convergence comparison for different optimal control problems. Box-FDDP outperformed Box-DDP\(^+\) in all the cases: (top) double pendulum (pend), quadcopter navigation (quad), and whole-body manipulation (man), Talos dipping on parallel bars (dip); and (bottom) whole-body balance (taichi), quadrupedal jumping (jump), quadrupedal hopping (hop), Talos’ pull-up workout (pullup). Box-FDDP (*-feas) solved the problem with fewer iterations and often with lower cost than Box-DDP\(^+\). Furthermore, Box-DDP\(^+\) failed to solve some of the hardest problems: i.e., quadrupedal jumping and hopping, Talos’ pull-up task. Our algorithm showed a large basin of attraction for local optimum as it is less sensitive to poor initialization compared with Box-DDP\(^+\). We use the same initial guess for both cases: Box-FDDP (*-feas) and Box-DDP\(^+\)(Color figure online)

4.2.2 Gap contraction and nonlinearities

We observed that the gap contraction rate is highly influenced by the nonlinearities of the system dynamics (see Fig. 5). When compared to the dynamics, the nonlinearities of the task often have a smaller effect (e.g., dip vs pullup). Indeed, the gap contraction speed followed the order: humanoid, quadruped, double pendulum, and quadcopter.

Fig. 5
figure 5

Gap contraction of Box-FDDP for different optimal control problems. For all the cases, the gaps were open for the first several iterations. The gap contraction rate varies according to the accepted step-length. Smaller contraction rates, during the first iterations, appeared in very nonlinear problems (taichi, man, hop, and jump), because of the larger error of the search direction(Color figure online)

Propagation errors due to the dynamics linearization have an important effect on the algorithm progress as Riccati recursions maintain a local quadratic approximation of the value function. The prediction of the expected improvement is indeed more accurate for systems with less nonlinearities, which is the reason why the algorithm tends to accept larger steps that result in higher gap contractions. Indeed, the effect of having a feasibility search is more significant in problems with very nonlinear dynamics as it reduces the total cost faster due to the nonlinearity distribution motivated in Sect. 2.1.2 (see Figs. 4 and 5). It also produces a low cost reduction rate during the gap contraction phase as the algorithm is first focusing on achieving dynamic feasibility.

Fig. 6
figure 6

Joint torques and velocities for the ANYmal jumping maneuver. (top) Generated torques of the LF joints and its limits (32 N m); (bottom) Generated velocities of the LF joint and its limits (7.5 rad/s). The red region describes the flight phase. Note that HAA, HFE, and KFE are the abduction/adduction, hip flexion/extension and knee flexion/extension joints, respectively (Color figure online)

4.2.3 Highly-dynamic and complex maneuvers

The Box-FDDP algorithm can solve a wide range of motions: from unstable and consecutive hops to aggressive and complex motor behaviors. In Fig. 6, we show the joint torques and velocities of a single leg for the ANYmal’s jumping problem (depicted in Fig. 3c). The motion consisted of three phases: jumping (0–300 ms), flying (300–700 ms), and landing (700–1000 ms). We used 0.7 as a friction coefficient and reduced the real joint limits of the ANYmal robot: from 40 to 32 N m (torque limits) and from 15 to 7.5 rad/s (velocity limits). Thus, generating a 30 cm jump becomes a very challenging task. Furthermore, in this experiment, the velocity limit violations appeared since we used quadratic penalization (with constant weights) to enforce them, and the swing phases are likely too short for such a large jump. Note that if we use constant weights, it might turn out that, for some cases, these weights are not big enough. Nonetheless, we only encountered these violations in very constrained problems. For instance, we did not find velocity violations for the walking, trotting, pacing, and bounding gaits (reported in the accompanying video). For these cases, Box-FDDP converged approximately with the same number of iterations achieved by the FDDP solver (i.e., a fully unconstrained case). This is due to the fact that the robot could generate those gaits without reaching its torque limits.

Surprisingly, naturally looking behaviors emerged during the computation of the dip and pullup problems on the Talos humanoid robot. We did not include any heuristic that could have helped the algorithm to generate these undefined behaviors. For instance, the balancing and leg-crossing on the bars emerge if we allocate a significant amount of time in that motion phase. Similarly, the pull-up motion emerges if we significantly increase the maximum torque limits on the arms.

4.3 Experimental trials

We demonstrated the capabilities of our Box-FDDP algorithm in a model predictive control application for the ANYmal C quadruped robot. The Box-FDDP algorithm computed reference motions in real-time and its efficiency enables our predictive controller to run in excess of 100 Hz computation frequency with a horizon of 1.25 s. Figure 7 shows snapshots of the forward trotting gait computed with the Box-FDDP algorithm. It also displays the contact-force tracking and updates of the swing-foot reference trajectories. In this experiment, we predefined the timings for the swing-feet motions and the foothold locations.

Fig. 7
figure 7

Snapshots of hardware experiments using Box-FDDP on the ANYmal C quadruped: Here, the robot completes a trotting gait in a model predictive control fashion. Our method is able to resolve the optimal control problem for a 1.25 s horizon in \(\approx \)10 ms on laptop hardware. The top figure shows snapshots of the ANYmal robot trotting forward, while the bottom figure displays the measured and desired contact forces and reference swing trajectories sent to the controller. To watch the video, click on the figure or see https://youtu.be/bOGBPTh_lsU. For more details about the predictive control algorithm, please refer to Mastalli et al. (2022)(Color figure online)

Table 2 Box-FDDP vs Knitro in double pendulum problem

4.4 Box-FDDP vs a direct transcription formulation

We formulated an efficient direct transcription problem with dynamics defects constraints in each node. The formulation is conceptually similar to our feasible descent direction introduced in Sect. 3.3. We transcribed the dynamics using a symplectic Euler scheme, the same integration scheme also used in Box-FDDP. We solved the direct transcription problem using different optimization algorithms provided by Knitro (Byrd et al., 2006). These available algorithms are: Interior/Direct (Waltz et al., 2006), Interior/CG (Byrd et al., 1999), active set (Byrd et al., 2003) and sequential quadratic programming (SQP) algorithms. Below we briefly describe each of Knitro’s algorithms, and then report the comparison results with Box-FDDP.

The Interior/Direct (Knitro-IDIR) algorithm replaces the NLP problem with a series of barrier sub-problems. In each iteration, it solves the primal-dual KKT problem using a line search procedure.Footnote 10 Instead, Interior/CG (Knitro-ICG) solves the primal-dual KKT problem using a projected conjugate gradient method. This method uses exact second derivatives, without explicitly storing the Hessian matrix, through a tailored trust region procedure. Interior/Direct also invokes this trust region procedure if the line search iteration converges to a non-stationary point (Waltz et al., 2006). In contrast to the interior point methods, the active set algorithm (Knitro-SLQP) replaces the NLP problem with a sequence of quadratic programs to form a sequential linear-quadratic programming algorithm. This algorithm selects a set of active constraints in each iteration, and produces a more exterior path (i.e., along the constraints) to the solution. Finally, Knitro’s SQP algorithm (Knitro-SQP) is also an active set method designed for small to medium scale problems with expensive function evaluations. Both active set approaches are often preferable to interior point methods on small- to medium-sized problems when we can provide a good initial guess (Nocedal & Wright, 2006). However, the problems in robotics are often large with many inequality constraints. Indeed, the benefits of interior point methods have been pointed out in the context of direct methods (Pardo et al., 2016).

4.4.1 Optimality versus feasibility

We compared the total cost, number of iterations, and total computation time against the different algorithms implemented in Knitro over 100 trials. For the comparison, we solved the double pendulum problem (pend), as it requires discovery of a swing-up maneuver. With this, we can clearly compare the trade-off between optimality and feasibility across the different algorithms. Note that, as described earlier, we used a single-thread for both Knitro and Box-FDDP despite our algorithm supporting multithreading.

Table 2 reports three different formulations used in the Knitro algorithms. The first one (pen) emulates exactly the optimal control formulation used in Box-FDDP, i.e., control constraints, regularization terms, and a terminal quadratic cost. The second case (regconst) uses a terminal constraint to impose the desired up-ward position together with the regularization terms. The third case (const) uses only the terminal constraints. Below we summarize the obtained results for each formulation.

Box-FDDP converges faster (w.r.t. time) than Knitro algorithms in all of the above formulations. However, Knitro-ICG is as fast as our approach with the const formulation. On the other hand, when it comes to optimality, Knitro produces more optimal solutions if we use the regconst formulation. Indeed, in our experience, Knitro generally has a better behavior when the formulation is dominated by constraint functions. Note that we do not report the cost values for the const formulation as this boils down to a feasibility problem, i.e., a problem with only constraints.

We used the regconst formulation to be able to compare both: cost and feasibility evolution. The dynamic infeasibility decreased monotonically for all the algorithms as plotted in Fig. 8 (top). Box-FDDP shows a fast resolution of the dynamics feasibility as interior point algorithms that, generally speaking, require fewer iterations than active set approaches (Hei et al., 2008). The cost evolution is also similar to the interior point algorithms, where the total costs are reported as regconst cases in Table 2.

Fig. 8
figure 8

Dynamic feasibility and cost evolution for the double pendulum problem. Box-FDDP had a similar evolution (dynamic feasibility and cost) to the interior point algorithms implemented in Knitro. Knitro produced lower cost solutions, but the computational burden is often much higher

4.4.2 Computation time

Box-FDDP had a better runtime performance than the Knitro algorithms for the double pendulum problem (cf. Table 2). However, to answer the runtime performance scalability to higher-dimensional optimal control problems, we analyzed the problem of generating a forward jumping maneuver with the ANYmal robot (i.e., jump).

Table 3 Runtime performance for a forward jumping maneuver over 100 trials

We used the same phase timings, joint limits and friction coefficient reported in Sect. 4.2.3. The results reported with Knitro and Box-FDDP cases are based on slightly different optimal control formulations. The idea is to define the most suitable formulation for each algorithm. For instance, we use quadratic penalization terms to impose the desired foothold placement, joint velocity limits and friction cone constraints for the Box-FDDP algorithm. Instead, for the Knitro algorithms, we substitute these penalization terms by general equality and inequality constraints. To further reduce the computation time of Knitro cases, we also impose a constraint for the terminal position of the trunk. Note that we did not include any cost term since it negatively affects the convergence rate of Knitro, i.e., we treated it as a feasibility problem.

Table 3 reports the runtime performance over 100 trials for the Knitro-IDIR algorithm only. The other methods (i.e., Knitro-ICG, Knitro-SLQP and Knitro-SQP) were unable to solve this problem. We used 5 different initial trunk heights, and we initialized the algorithms using their corresponding joint posture (as described in Appendix A-C) and no controls (i.e., \({\mathbf {u}}^0_s=\{{\mathbf {0}},\cdots ,{\mathbf {0}}\}\)). As in the double pendulum case, Box-FDDP also solved this problem faster than Knitro algorithms, even though it required a significant number of extra—computationally inexpensive—iterations. We suspect that this increment in the number of iterations is due to the use of penalization terms in the contact placement, friction cone and state limits constraints.

4.4.3 Robustness against different initial guesses

We compared the robustness against different initial guesses for the double pendulum (pend) and quadrupedal jump (jump) problems. In both problems, we generated random joint postures—around the nominal state—and used them to define an initial guess for the state trajectory \({\mathbf {x}}^0_{\mathbf {s}}\). In addition to the robot’s joint postures, we also generated random joint velocities around the zero-velocity condition for the double pendulum case only. We used this single random posture and velocity for each node in \({\mathbf {x}}^0_{\mathbf {s}}\), and initialized the control sequence with zeros. We used the most suitable formulations for Box-FDDP and Knitro algorithms as justified above.

Table 4 reports the number of successful resolutions over 100 trials. We considered a problem to be successfully solved if the gradient of the Box-FDDP or the feasibility of the Knitro algorithms are lower than \(5\times 10^{-5}\) (absolute feasibility tolerance). Note that this includes the cases where Knitro found a feasible approximate solution.Footnote 11 Furthermore, we considered a problem resolution unsuccessful if the problem does not converge within 70 s, which is enough time as we can see above. For each problem, we used two different maximum values of the random initialization, which their maximum magnitude are described using the \(\ell ^\infty \) norm (i.e., \(\Vert \cdot \Vert _\infty \)). We added this additive noise to the default initial guess (described in Appendix A) used for the state trajectory.

As expected, the Knitro interior point methods performed better than the active set ones. For the double pendulum problem, the interior point algorithms (i.e., Knitro-IDIR and Knitro-ICG) perform better than Box-FDDP if the warm-starting point is close to the initial condition. Despite that, Box-FDDP shows more robustness to initial guesses as its percentage of successful resolutions is consistent. Furthermore, we observed a significant increment in the number of successful resolutions for the jump problem. Indeed, Knitro was not able to solve this problem at all for random magnitudes bigger than \(\Vert 0.01\Vert _\infty \).

Table 4 Percentage of successful resolutions from random initial guesses

4.5 Box-FDDP, Box-DDP, and squashing approach in nonlinear problems

To evaluate the numerical performance of having two modes, we compared the Box-FDDP (with two modes depending on the dynamics feasibility), Box-DDP\(^+\) (using a single mode) and DDP\(^+\) with a squashing function (using a single mode) for three scenarios with the IRIS quadcopter: reaching goal (goal), looping maneuver (loop), and traversing a narrow passage (narrow). We used a sigmoidal element-wise squashing function of the form:

$$\begin{aligned} {\mathbf {s}}^i({\mathbf {u}}^i) = \frac{1}{2}\left( \underline{\mathbf {u}}^i + \sqrt{\gamma ^2 + ({\mathbf {u}}^i-\underline{\mathbf {u}}^i)^2}\right) +\\ \frac{1}{2}\left( \overline{\mathbf {u}}^i - \sqrt{\gamma ^2 + ({\mathbf {u}}^i-\overline{\mathbf {u}}^i)^2}\right) \end{aligned}$$

in which the sigmoid is approximated through two smooth-abs functions, \(\gamma \) defines its smoothness, and \(\underline{\mathbf {u}}^i\), \(\overline{\mathbf {u}}^i\) are the element-wise lower and upper control bounds, respectively. We introduced this squashing function on the system controls as: \({\mathbf {x}}_{k+1} = {\mathbf {f}}({\mathbf {x}}_k,{\mathbf {s}}({\mathbf {u}}_k))\). We used \(\gamma =2\) for all the experiments presented in this section.

Fig. 9
figure 9

Cost and convergence comparison for different quadcopter maneuvers: looping (loop) and narrow passage traversing (narrow). Box-FDDP (*-feas) outperformed both Box-DDP\(^+\) and DDP with squashing function (*-squash)(Color figure online)

Figure 9 shows that Box-FDDP converged faster than the other approaches. As reported in the accompanying video, Box-FDDP did not generate undesired loops and jerky motions as in the other cases. Indeed, the solutions with Box-FDDP have the lowest cost values (cf. Table 1). We also observed that the squashing approach often converges sooner compared to Box-DDP\(^+\). The main reason is due to the early saturation of the controls performed by Box-DDP\(^+\).

Fig. 10
figure 10

Costs associated for 10 different initial configurations of reaching goal tasks. Box-FDDP converges earlier and with lower total cost than Box-DDP\(^+\) and DDP\(^+\) with squashing function. The performance of the squashing function approach exhibits a high dependency on the initial condition(Color figure online)

In Fig. 10, we show the cost evolution for 10 different initial configurations of the reaching goal task. The target and initial configurations are (3, 0, 1) and \((-0.3 \pm 0.6, 0, 0)\) m, respectively. Infeasible iterations, in Box-FDDP, produce a very low cost in the first iterations. The squashing approach is the most sensitive to initial conditions. However, on average, it produces slightly better solutions than Box-DDP\(^+\). This is in contrast to the reported results in Tassa et al. (2014), where the performance was analyzed only for the linear-quadratic regulator problem.

5 Conclusion

We proposed a feasibility-driven approach whose search is primarily driven by the dynamics feasibility of the optimal control problem. The dynamically-infeasible iterations, which mimic a direct multiple shooting approach, allowed us to solve a wide range of optimal control problems despite it being provided with a poor initialization. The benefits of our approach are crystallized over a set of athletic and highly-dynamic maneuvers computed for the Talos humanoid and the ANYmal quadruped robots, respectively. Its improvement on the basin of attraction for a good local optimum has been a key factor to optimize such kind of complex maneuvers while considering the robot’s full rigid body dynamics, joint limits and friction cone constraints. Indeed, Box-FDDP has shown an increment in the robustness against different initial guesses compared with advanced Knitro algorithms.

We have provided evidence that our algorithm produces descent search directions. For instance, we have observed that the feasibility contraction decreases monotonically as often happens in the advanced nonlinear programming algorithms available in Knitro. Our approach has also shown to quickly reduce the dynamic infeasibility as observed in the most competitive Knitro algorithms. A similar effect is observed in the cost evolution as well. Our results suggest that the gap contraction rate is influenced by the nonlinearities of the system dynamics.

The runtime performance of Box-FDDP is often superior to direct transcription solved using state-of-the-art Knitro algorithms despite the increase in number of iterations due to the use of quadratic penalization terms. However, when comparing the computation time per iteration, our approach is between 2 to 10 times faster, which makes it suitable for model predictive control applications. Indeed, we demonstrated that our Box-FDDP algorithm can generate trotting gaits on the ANYmal C robot in a predictive control fashion. One additional remark is that we have not considered the runtime reduction due to code generation support in Crocoddyl via CppADCodeGen (Leal, 2011) and CppAD (Bell, 2012). According to our experience, code generation can lead to a computation time reduction between 30% to 60% as can be seen in our public benchmarks (Mastalli et al., 2019).

We have developed and reported the results for a wide range of optimal control problems in robotics. These problems cover an important part of the spectrum of robotics applications. The comparison across all these problems is unusual, and it represents an important contribution to the research community since we have open sourced many of these examples, as well as the Box-FDDP algorithm, in the Crocoddyl repository (Mastalli et al., 2019). Our feasibility-driven approach enabled model predictive control applications on the ANYmal C robot, however, it can potentially be used in other applications such as in humanoid robotics (Eßer et al., 2021), robot co-design (IROS, 2022), and learning (Lembono et al., 2020).