Abstract
Differential dynamic programming (DDP) is a direct single shooting method for trajectory optimization. Its efficiency derives from the exploitation of temporal structure (inherent to optimal control problems) and explicit rollout/integration of the system dynamics. However, it suffers from numerical instability and, when compared to direct multiple shooting methods, it has limited initialization options (it allows initialization of controls, but not of states) and lacks proper handling of control constraints. In this work, we tackle these issues with a feasibility-driven approach that regulates the dynamic feasibility during the numerical optimization and ensures control limits. Our feasibility search emulates the numerical resolution of a direct multiple shooting problem with only dynamics constraints. We show that our approach (named BoxFDDP) has better numerical convergence than BoxDDP\(^+\) (a single shooting method), and that its convergence rate and runtime performance are competitive with state-of-the-art direct transcription formulations solved using the interior point and active set algorithms available in Knitro. We further show that BoxFDDP decreases the dynamic feasibility error monotonically—as in state-of-the-art nonlinear programming algorithms. We demonstrate the benefits of our approach by generating complex and athletic motions for quadruped and humanoid robots. Finally, we highlight that BoxFDDP is suitable for model predictive control in legged robots.
1 Introduction
Optimal control is a powerful tool to synthesize motions and controls through task goals (cost/optimality) and constraints (e.g., system dynamics, interaction constraints). We can formulate such problems using direct methods (Betts, 2009), which first discretize over both state and controls, and then optimize using sparse general-purpose nonlinear programming (NLP) software such as Snopt (Gill et al., 2005), Knitro (Byrd et al., 2006), and Ipopt (Wächter & Biegler, 2006). To ensure that the system state evolves as described by the equations of motion, we define equality constraints in the NLP problem. However, common algorithms for nonlinear programming can only guarantee constraint satisfaction at convergence. When this occurs, we say that the discretized states are dynamically feasible. Furthermore, despite the advantage of using advanced software for nonlinear programming, these algorithms perform very large matrix factorizations during the computation of the search direction, i.e., the resolution of the Karush–Kuhn–Tucker (KKT) problem. To do so, they use sparse linear solvers such as MA27, MA57, and MA97 (see http://www.hsl.rl.ac.uk/) that do not exploit the temporal/Markovian structure of optimal control problems efficiently. Indeed, the expensive factorizations of these linear solvers limit their practical use to real-time control on reduced models [e.g., Wieber (2006), Pardo et al. (2016), Mastalli et al. (2017), Di Carlo et al. (2018)] or motion planning [e.g., Carpentier et al. (2016), Aceituno-Cabezas et al. (2017), Winkler et al. (2018), Merkt et al. (2018), Mastalli et al. (2020)] in robotics. Furthermore, classical line search methods used in general-purpose NLP solvers are less effective than the nonlinear rollout of the dynamics used in shooting methods, as their use increases the number of iterations [cf. Liao and Shoemaker (1992)]; such a nonlinear rollout is also employed in a recent multiple shooting method (Giftthaler et al., 2018).
Dynamic programming methods, which have their foundations in the calculus of variations as indirect methods, have once again attracted attention due to recent results on fast nonlinear model predictive control based on DDP [e.g., Tassa et al. (2012), Koenemann et al. (2015), Neunert et al. (2018), Farshidian et al. (2017)]. In particular, there is a significant interest in the iterative linear-quadratic regulator (iLQR) algorithm (Li & Todorov, 2004) as its Gauss–Newton (GN) approximation reduces the computation time while retaining superlinear convergence. Both iLQR and DDP algorithms perform a Riccati sweep in the backward pass, which incorporates elements that are reminiscent of Pontryagin’s maximum principle (PMP). For instance, at convergence, the gradient of the value function in the backward pass represents the costate, while the rollout of the system dynamics describes the state integration step. This connection was recognized by Bellman’s groundbreaking work (Bellman, 1954) that established the so-called Hamilton–Jacobi–Bellman (HJB) equation in the continuous-time domain. In contrast to classical direct collocation approaches, these approaches exploit the temporal/Markovian structure of the optimal control problem by solving a sequence of smaller subproblems derived from Bellman’s principle of optimality (Mayne, 1966). This leads to fast and cheap computations due to very small matrix factorizations and effective data cache accesses. Despite these advantages, both algorithms are unable to handle equality and inequality constraints efficiently. Furthermore, they have a poor basin of attraction around a good local optimum, as they require a good initialization to converge, and are prone to numerical instability—commonly recognized challenges for single shooting approaches (Betts, 2009). These undesirable properties are mainly due to the fact that the iLQR/DDP algorithms implicitly enforce dynamic feasibility through the system rollout.
1.1 Related work
Trade-offs between feasibility and optimality appear in most of the state-of-the-art nonlinear programming software. For instance, Ipopt includes a feasibility restoration phase which aims at reducing the constraint violation (Wächter & Biegler, 2006). In Knitro, progress on both feasibility and optimality is achieved by adding an \(\ell^1\)-norm penalty term for the constraints in the merit function (Byrd et al., 2006). In fact, by changing the merit function or the line search procedure, we can put emphasis on obtaining feasible solutions before trying to optimize them. Instead, the iLQR/DDP algorithms do not make this trade-off, as the backward and forward passes do not accept infeasible iterations. However, recent work on multiple shooting DDP (Giftthaler et al., 2018; Mastalli et al., 2020) has provided ways of handling dynamically infeasible iterations, which we elaborate on below.
The multiple shooting variants in Giftthaler et al. (2018), Mastalli et al. (2020) are rooted in dynamic programming. For instance, Giftthaler et al. (2018) introduced a lifted^{Footnote 1} version of the algebraic Riccati equation that allows initialization of both state and control trajectories; it further accounts for the relinearization required by the dynamics gaps in the backward pass and uses a merit function to balance feasibility and optimality. In turn, in our previous work (Mastalli et al., 2020), we proposed a modification of the forward pass that numerically matches the gap contraction expected by a direct multiple shooting method subject to equality constraints only. It factorizes the KKT matrix via a Riccati recursion and defines the behavior of the defect constraints based on the first-order necessary condition (FONC) of optimality.^{Footnote 2} These approaches improve numerical robustness against poor initialization, as they are able to use an initial guess for the state trajectory. Unfortunately, none of these methods handle inequality constraints such as control limits, with the exception of a recent work that computes squashed control sequences (Martí-Saumell et al., 2020).
There are two main strategies for incorporating arbitrary constraints: active set and penalization methods (as extensively described in Nocedal and Wright (2006)). In the robotics community, one of the first successful attempts to incorporate inequality constraints in DDP used an active set approach (Tassa et al., 2014), which is based on Ohno (1978)—a pioneering work in the control community. Concretely, this approach focused on handling control limits during the computation of the backward pass, i.e., in the minimization of the action-value function (\(Q\)-function),^{Footnote 3} which resembles the control Hamiltonian at convergence [see Kirk (1998), Section 3.11]. The method is popularly named BoxDDP, and the authors also showed a better convergence rate when compared with a squashing function approach. Later, Xie et al. (2017) included general inequality constraints in the \(Q\)-function and the forward pass. The method sacrifices computational performance by including a second quadratic program, which is solved in the forward pass. However, it still remains faster than solving the same problem using direct collocation with Snopt as reported in Hargraves and Paris (1987).
Generally speaking, active set methods are suitable for small-size problems (such as minimizing the \(Q\)-function described above) as their accuracy and speed often outperform other methods. However, the combinatorial complexity of finding the active set is prohibitive in large-scale optimization problems. This motivates the development of penalty-based methods, despite their numerical difficulties: ill-conditioning and slow convergence. To overcome these difficulties, Lantoine and Russell (2008) proposed a method that incorporates an augmented Lagrangian term. This method was studied in the context of robust thrust optimization, in which the dynamical system has fewer degrees of freedom compared to complex legged robots. Later, Howell et al. (2019) extended the augmented Lagrangian approach to handle arbitrary inequality constraints for aerial navigation and manipulation problems. Additionally, the algorithm incorporates an active set projection for solution polishing and is often faster than direct collocation solved with Ipopt or Snopt.
Our work proposes a feasibility-driven search for nonlinear optimal control problems with control limits. The main motivation of our approach is to increase the algorithm’s basins of attraction by focusing on feasibility, instead of focusing solely on efficiency and optimality. Apart from the control limits and dynamics, we handle all remaining constraints (e.g., state and friction cone) through quadratic penalization, as described in the results section.
1.2 Contribution
The main contribution of this work is the first complete study of the numerical properties, behaviors, and guarantees of feasibility-driven search in differential dynamic programming. It relies on four technical contributions:

(i)
An original and efficient optimal control algorithm that directly handles control limits (BoxFDDP),

(ii)
Extensive comparisons against direct transcription and BoxDDP\(^+\) (a single shooting method),

(iii)
A tutorial that connects the different branches of theory in optimal control, and

(iv)
An experimental validation of the dynamic feasibility evolution against interior point and active set algorithms for nonlinear programming.
Our approach builds on top of our previous results on feasibility-driven search (Mastalli et al., 2020), for which we hereby propose to define two modes in our algorithm: feasibility-driven and control-bounded. It considers the dynamic feasibility in the forward pass and explicitly incorporates control limits, which does not require a merit function as in Giftthaler et al. (2018). Additionally, our approach has outstanding numerical capabilities, which allow us to generate motions that go beyond state-of-the-art methods on optimal control or trajectory optimization in robotics, e.g., the athletic maneuver of a humanoid robot shown in Fig. 1.
2 Direct multiple shooting and differential dynamic programming
Before describing our approach, we introduce direct multiple shooting and explain its numerical advantages when compared to single shooting methods such as DDP (Sect. 2.1). Then, in Sect. 2.2 we present a unique tutorial that connects the various branches of theory: KKT, PMP, and HJB. Additionally, in Sect. 2.3 we describe the salient aspects of the original BoxDDP proposed by Tassa et al. (2014), and of our variant BoxDDP\(^+\). This section contains known material, although we believe it contributes (i) to unveiling the underlying problems of differential dynamic programming, and (ii) to understanding the theoretical foundations of our feasibility-driven approach.
2.1 Direct multiple shooting for optimal control
Without loss of generality, we consider a direct multiple shooting approach for the nonlinear optimal control problem with control bounds in which each shooting segment defines a single timestep:
where the state \({\mathbf {x}}\in X\) lies in a differential manifold (with dimension \(n_x\)); the control \({\mathbf {u}}\in {\mathbb {R}}^{n_u}\) defines the input commands; \(\underline{\mathbf {u}}\), \(\bar{\mathbf {u}}\) are the lower and upper control bounds; \(\tilde{{\mathbf {x}}}_0\) is the initial state of the system; \(\ominus \) describes the difference operator of the state manifold [notation inspired by Frese (2008) that is needed to optimize over manifolds (Gabay, 1982)]; \(N\in {\mathbb {N}}\) describes the number of nodes (horizon); \(\ell _N\), \(\ell _k\) are the terminal and running cost functions; \(\bar{\mathbf {f}}_0\), \(\bar{\mathbf {f}}_{k+1}\in T_{{\mathbf {x}}}X\) are the gap residual functions that impose the dynamic feasibility; and \(T_{{\mathbf {x}}}X\) describes the tangent space of the state manifold at the state \({\mathbf {x}}\).
Equation (1) describes a nonlinear program, as the system dynamics are transcribed into a set of algebraic equations with defects in each timestep. It is possible to extend this notation to cases where shooting segments contain multiple timesteps; however, as seen later, this does not provide any computational benefit, i.e., a reduction in the computation time or a better distribution of the nonlinearities of the dynamics. Fig. 2 depicts the transcription process incorporating the state and control trajectories \(({\mathbf {x}}_s,{\mathbf {u}}_s)\) as decision variables. This is in contrast to differential dynamic programming, which transcribes only the control sequence \({\mathbf {u}}_s\) and obtains \({\mathbf {x}}_s\) by integrating the system dynamics (i.e., a single rollout).
2.1.1 Numerical behavior of direct multiple shooting
Algorithms for nonlinear programming aim at finding the Karush–Kuhn–Tucker (KKT) conditions defined by the FONC of optimality (Nocedal & Wright, 2006). This process involves iteratively solving a KKT problem (i.e., a linear system of equations) until satisfaction of a stopping criterion. In the line search strategy, the solution of this KKT problem provides a search direction \(\delta {\mathbf {w}}_k\), and the selected step length \(\alpha \) defines how much the current guess \({\mathbf {w}}_k\) moves along that direction, i.e., \({\mathbf {w}}_{k+1} = {\mathbf {w}}_{k} \oplus \alpha \delta {\mathbf {w}}_{k}\). Note that the integrator operator \(\oplus \) enables us to optimize over the manifold [as in Frese (2008), Gabay (1982)]; however, it is a feature that general-purpose nonlinear programming libraries often do not have.
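The manifold update \({\mathbf {w}}_{k+1} = {\mathbf {w}}_{k} \oplus \alpha \delta {\mathbf {w}}_{k}\) can be illustrated on the simplest nontrivial manifold, the circle. This is a minimal sketch with hypothetical names (`angle_plus`); a real implementation would use the integrator of the actual state manifold:

```python
import math

def angle_plus(theta, delta):
    """Integrator operator (+) on the circle S^1: apply a tangent-space
    increment and wrap the result back to (-pi, pi].  A minimal stand-in
    for the manifold update w_{k+1} = w_k (+) alpha * dw_k."""
    s = theta + delta
    return math.atan2(math.sin(s), math.cos(s))

theta = 3.0    # current guess, close to the pi/-pi seam of the chart
step = 0.5     # search direction (tangent-space increment)
alpha = 1.0    # step length
theta_new = angle_plus(theta, alpha * step)
# a naive Euclidean update would leave the chart, since 3.5 > pi
```

The same pattern generalizes to products of Euclidean and rotational components, e.g., floating-base robot states.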
We can easily analyze the numerical behavior of direct multiple shooting formulations by focusing on the KKT problem for the shooting interval \(k\) only. This is possible because of the temporal/Markovian structure of optimal control problems. Therefore, when we apply the Newton method to this KKT subproblem together with Bellman’s principle of optimality, we obtain:
where Eq. (2) defines the stationary condition (first and second rows) and the primal feasibility (third row) of the FONC of optimality; \(\delta {\mathbf {x}}_k\), \(\delta {\mathbf {u}}_k\) define the search direction for the primal variables; \(\varvec{\lambda }^+_{k+1}\) are the updated Lagrange multipliers; \(\varvec{\ell }_{{\mathbf {x}}_k}\), \(\varvec{\ell }_{{\mathbf {u}}_k}\), and \(\varvec{\ell }_{\mathbf {xx}_k}\), \(\varvec{\ell }_{\mathbf {xu}_k}\), \(\varvec{\ell }_{\mathbf {uu}_k}\) are the Jacobians and Hessians of the cost function; \({\mathbf {f}}_{{\mathbf {x}}_k}\), \({\mathbf {f}}_{{\mathbf {u}}_k}\) are the Jacobians of the system dynamics; and \({\mathcal {V}}_{{\mathbf {x}}_k}\), \({\mathcal {V}}_{\mathbf {xx}_k}\) are the gradient and Hessian of the value function. Note that we apply the Gauss–Newton (GN) approximation, as we ignore the Hessian of the system dynamics to avoid expensive tensor–vector multiplications.
When we factorize this system of equations, the resulting search direction always satisfies the dynamics constraints if the Jacobians and Hessians are constant (i.e., an LQR problem). However, if we apply an \(\alpha \)-step, the gap of the dynamics closes by a factor of \((1-\alpha)\). We observe this by inspecting the primal feasibility at the next iteration:
where, by definition in Eq. (2) (third row), we have that
with the gap defined as \(\bar{\mathbf {f}}^i_{k+1}={\mathbf {f}}({\mathbf {x}}^i_k,{\mathbf {u}}^i_k) \ominus {\mathbf {x}}^i_{k+1}\), and i describes the iteration number.
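The \((1-\alpha)\) contraction can be checked numerically on a single shooting interval with linear dynamics. The matrices and the search direction below are illustrative; the only requirement is that \(\delta{\mathbf {x}}_{k+1}\) satisfies the primal feasibility (third) row of Eq. (2):

```python
import numpy as np

# One shooting interval with linear dynamics f(x, u) = A x + B u and defect
# gap = f(x_k, u_k) - x_{k+1}.  With constant Jacobians, any direction whose
# dx_{k+1} satisfies the primal feasibility row of the KKT system leaves a
# residual of exactly (1 - alpha) * gap after an alpha-step.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
x_k = np.array([1.0, 0.0])
u_k = np.array([0.5])
x_k1 = np.array([0.7, 0.2])            # infeasible guess for the next state
gap = A @ x_k + B @ u_k - x_k1         # current defect \bar f^i_{k+1}

dx_k = np.array([0.1, -0.05])          # arbitrary primal direction
du_k = np.array([0.2])
dx_k1 = A @ dx_k + B @ du_k + gap      # primal feasibility (third row of Eq. (2))

alpha = 0.3
new_gap = (A @ (x_k + alpha * dx_k) + B @ (u_k + alpha * du_k)
           - (x_k1 + alpha * dx_k1))   # defect \bar f^{i+1}_{k+1}
# new_gap equals (1 - alpha) * gap for any dx_k, du_k
```

Expanding the last expression shows the result is independent of the chosen direction: the \(\alpha\)-scaled terms cancel, leaving exactly \((1-\alpha)\bar{\mathbf {f}}^i_{k+1}\).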
As described later in Sect. 3, injecting this numerical behavior can be interpreted as a feasibility-driven approach for multiple shooting. However, our approach operates quite differently from classical multiple shooting approaches. For instance, it does not increase the computation time by defining extra state (decision) variables. But there is no free lunch: our approach cannot temporarily increase the defects (e.g., to reduce the cost value) after taking its first full step (\(\alpha =1\)).
2.1.2 Advantages of direct multiple shooting
The rationale for a direct multiple shooting approach (namely, adding \({\mathbf {x}}_s\) as decision variables) is to distribute the nonlinearities of the dynamics over the entire horizon (Diehl et al., 2006). To illustrate this statement, we recognize that integrating over a horizon implies recursively calling integrator functions, i.e.,
in which the nonlinearity increases along the horizon. This means that the local prediction, constructed from the derivatives of the nonlinear system in the KKT problem, becomes more inaccurate as the horizon increases. This is a well-recognized drawback of single shooting approaches, and certainly a numerical limitation of the BoxDDP\(^+\) algorithm described below.
2.2 Connection between KKT, PMP and HJB branches
The fourth row of Eq. (2) connects the value function with the Lagrange multipliers associated with the state equations. By definition, this multiplier corresponds to the next costate value at node k, which reveals an interesting connection with the PMP used in indirect multiple shooting methods and the KKT approach, i.e.
This might not be surprising if we realize that the PMP and KKT approaches write the optimal control in terms of the costate, while HJB expresses it in terms of the value function. Equation (2) is also at the heart of direct single shooting approaches such as DDP if each dynamics gap \(\bar{\mathbf {f}}_{k+1}\) vanishes; therefore, this connection holds for single shooting settings as well.
Different interpretations can arise from this connection. For instance, DDP or our approach can be interpreted as iterative methods for solving the PMP in discretetime optimal control problems under single and multiple shooting settings, respectively. Furthermore, under the context of linear dynamics and quadratic cost, DDP or our approach can be classified as global methods as they compute an optimal policy (i.e., a closedloop solution).
2.3 Differential dynamic programming with control limits
As proposed by Tassa et al. (2014), BoxDDP locally approximates the value function at node k as
which breaks the optimal control problem into a sequence of simpler subproblems with control bounds. Then, a local search direction is computed through a linear quadratic (LQ) approximation of the value function:
where the \({\mathbf {Q}}_k\) terms describe the LQ approximation of the action-value function \(Q_k(\cdot )\) that can be seen as a function of the derivatives of the value function \(\bar{{\mathcal {V}}}_k=({\mathcal {V}}_{{\mathbf {x}}_k},{\mathcal {V}}_{\mathbf {xx}_k})\). Solving Eq. (6) results in a local feedback control law \(\delta {\mathbf {u}}_k={\mathbf {k}}_k+{\mathbf {K}}_k\delta {\mathbf {x}}_k\) consisting of a feedforward term \({\mathbf {k}}_k\) and a state feedback gain \({\mathbf {K}}_k\) for each discretization point \(k\). Below, we describe how to compute the \({\mathbf {Q}}_k\) terms in the so-called Riccati sweep step.
2.3.1 Riccati sweep
The LQ approximation of the action-value function \(Q_k\) is computed recursively, backwards in time, as follows
where \({\mathcal {V}}_{{\mathbf {x}}_{k+1}}\), \({\mathcal {V}}_{\mathbf {xx}_{k+1}}\) are obtained by solving the following algebraic Riccati equations at \(k+1\):
with \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_{k+1}}\) as the control Hessian in the free space, which we will describe below. Additionally, we use the gradient and Hessian of the value function to find a local search direction as described below. In the case of BoxDDP\(^+\), our adaptation of BoxDDP to allow initialization with state trajectories, the gradient of the value function in the first iteration is relinearized by the initialization infeasibility \(\bar{\mathbf {f}}^0_{k+1}\) as \({\mathcal {V}}_{{\mathbf {x}}_{k+1}}+{\mathcal {V}}_{\mathbf {xx}_{k+1}}\bar{\mathbf {f}}^0_{k+1}\).
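For reference, the unconstrained GN Riccati recursion can be sketched as below, following the standard formulas; the optional `gap` argument applies the relinearization \({\mathcal {V}}_{{\mathbf {x}}_{k+1}}+{\mathcal {V}}_{\mathbf {xx}_{k+1}}\bar{\mathbf {f}}_{k+1}\) described above. Function and variable names are ours, not the paper's:

```python
import numpy as np

def riccati_sweep(lx, lu, lxx, lxu, luu, fx, fu, Vx_next, Vxx_next, gap=None):
    """One backward step of the Gauss-Newton Riccati sweep (unconstrained
    sketch).  `gap` optionally deflects the next value gradient as
    Vx_next + Vxx_next @ gap, the multiple shooting relinearization."""
    if gap is not None:
        Vx_next = Vx_next + Vxx_next @ gap   # deflection by the defect
    Qx = lx + fx.T @ Vx_next
    Qu = lu + fu.T @ Vx_next
    Qxx = lxx + fx.T @ Vxx_next @ fx
    Qxu = lxu + fx.T @ Vxx_next @ fu
    Quu = luu + fu.T @ Vxx_next @ fu
    k = -np.linalg.solve(Quu, Qu)            # feedforward term
    K = -np.linalg.solve(Quu, Qxu.T)         # state feedback gain
    Vx = Qx + Qxu @ k                        # value gradient at node k
    Vxx = Qxx + Qxu @ K                      # value Hessian at node k
    return k, K, Vx, Vxx, Quu, Qu
```

The control-limited variant replaces the unconstrained minimization over \(\delta{\mathbf {u}}_k\) with the BoxQP described next.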
2.3.2 Control-bounded direction
We compute the control-bounded direction, defined in Eq. (6), by breaking it down into the so-called feedforward and feedback subproblems (Tassa et al., 2014). We first compute the feedforward term by solving the following quadratic program (QP) with box constraints:
and then the feedback gain as
where \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\) is the control Hessian of the free subspace obtained in Eq. (9) in which \(\text {f}_k\) describes the free subspace at node k, i.e., the indexes of the inactive bounds. With these indexes, we sort and partition the control Hessian as:
and compute \(\hat{\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}\) internally based on the factorization of \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\). This is what our BoxQP program does to solve the feedforward subproblem efficiently via the Projected-Newton QP algorithm (Bertsekas, 1982). This algorithm quickly identifies the active set and moves along the free subspace of the Newton step. It also has a similar computational cost to the unconstrained QP if the active set remains unchanged. Thus, the runtime performance is similar to that of the DDP algorithm. However, it requires a feasible initialization \(\delta {\mathbf {u}}^0_k\).
Again, by using a Projected-Newton QP algorithm, we further efficiently obtain the inverse of the control Hessian of the free subspace, \(\hat{\mathbf {Q}}^{-1}_{\mathbf {uu},\text {f}_k}\), as the algorithm computes it internally when it moves along the free subspace of the Newton step. With it, we compute a state feedback gain that generates corrections within the control limits. This is an important feature for controlling the robot as well as for rolling out the nonlinear dynamics in the forward pass. For more details about the Projected-Newton QP algorithm, see Bertsekas (1982).
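A compact sketch of a Projected-Newton box QP in the spirit of Bertsekas (1982) is given below. It is a simplified illustration (plain backtracking, no specialized factorization updates), not the solver used in the paper; it returns both the solution and the free-subspace indices \(\text{f}_k\) used to partition the control Hessian:

```python
import numpy as np

def box_qp(H, q, lb, ub, u0, maxiter=100, tol=1e-9):
    """Sketch of a projected-Newton box QP: min 0.5 u'Hu + q'u, lb <= u <= ub.
    Identify the clamped set, take a Newton step on the free subspace, and
    project the trial point onto the box.  Requires a feasible start u0."""
    u = np.clip(u0, lb, ub)
    for _ in range(maxiter):
        g = H @ u + q                                 # gradient
        # clamped set: at a bound with the gradient pushing outward
        clamped = ((u <= lb) & (g > 0)) | ((u >= ub) & (g < 0))
        free = ~clamped
        if not free.any() or np.linalg.norm(g[free]) < tol:
            break
        du = np.zeros_like(u)
        du[free] = -np.linalg.solve(H[np.ix_(free, free)], g[free])
        # projected backtracking line search on the box
        alpha, f0 = 1.0, 0.5 * u @ H @ u + q @ u
        while alpha > 1e-10:
            u_try = np.clip(u + alpha * du, lb, ub)
            if 0.5 * u_try @ H @ u_try + q @ u_try < f0:
                u = u_try
                break
            alpha *= 0.5
        else:
            break                                      # no progress possible
    g = H @ u + q
    free = ~(((u <= lb) & (g > 0)) | ((u >= ub) & (g < 0)))
    return u, free
```

The free-subspace Hessian `H[np.ix_(free, free)]` is the \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\) block whose factorization is reused for the feedback gain.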
2.3.3 State integration
In any DDP algorithm such as BoxDDP, we perform a state integration using the locally linear policy as
where \(\hat{\mathbf {x}}_k\), \(\hat{\mathbf {u}}_k\) are the new state and control at node k generated using a step length \(\alpha \). The feedback gain helps to distribute the nonlinearities; however, as seen in the previous section, it does not resemble the numerical behavior described by the FONC of optimality in direct multiple shooting. This different numerical behavior stems from the state integration procedure closing the gaps, i.e., \(\bar{\mathbf {f}}_k = {\mathbf {0}}\), \(\forall k=\{0,1,\cdots ,N\}\).
2.3.4 Expected improvement
When solving the algebraic Riccati equations, we obtain the expected improvement as
Below, we elaborate the proposed algorithm based on the aforementioned description.
3 BoxFDDP: a feasibility-driven approach for multiple shooting
We now introduce a novel algorithm, named BoxFDDP, that combines a feasibility-driven search (Sect. 2.1) with an active set treatment of the control limits (Sect. 2.3). The BoxFDDP algorithm comprises two modes: feasibility-driven and control-bounded, one of which is chosen for a given iteration (Algorithm 1). The feasibility-driven^{Footnote 4} mode mimics the numerical resolution of a direct multiple shooting problem with only dynamics constraints when computing the search direction and step length (lines 4–10 and 11–21, respectively). This mode neglects the control limits of the system, as it focuses on dynamic feasibility only. In contrast, the control-bounded mode projects the search direction onto the feasible control region whenever the dynamics constraint is feasible (line 10). In both modes, the applied controls in the forward pass are projected onto their feasible box (line 13), causing dynamically infeasible iterations to reach the control box. With this strategy, our solver focuses on feasibility early on, which increases its basins of attraction, and later on optimality. Technical descriptions of both modes are elaborated in Sects. 3.1 and 3.2. Note that the existence of feasible descent directions is discussed later in Sect. 3.3.
3.1 Search direction
In the standard BoxDDP algorithm, an initial forward pass is performed to obtain the initial state trajectory \({\mathbf {x}}_s\). This trajectory enforces the dynamics explicitly; thus, the gaps are zero, i.e., \(\bar{\mathbf {f}}_{k}={\mathbf {0}}\) for all \(k\in \{0,1,\cdots ,N-1\}\). Instead, our multiple shooting variant, BoxFDDP, computes the gaps once at each iteration (line 3), which are used to find the search direction and to compute the expected improvement. However, if the iteration is dynamically feasible, then the search direction procedure is the same as in the standard BoxDDP (Tassa et al., 2014). Below, we describe the steps performed to compute the search direction.
3.1.1 Computing the gaps
Given a current iterate \(({\mathbf {x}}_s,{\mathbf {u}}_s)\), we perform a nonlinear rollout to compute the gaps as
where \({\mathbf {f}}({\mathbf {x}}_k,{\mathbf {u}}_k)\) is the rollout state at interval \(k+1\), \({\mathbf {x}}_{k+1}\) is the next shooting state, and \(\ominus \) is the difference operator.
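In code, the gap computation is a one-line rollout comparison. The sketch below uses a Euclidean difference as a stand-in for the manifold operator \(\ominus\), and the names are hypothetical:

```python
import numpy as np

def compute_gaps(f, xs, us):
    """Gaps \bar f_{k+1} = f(x_k, u_k) - x_{k+1} between the rollout of each
    shooting node and the next state guess (Euclidean difference standing in
    for the manifold operator)."""
    return [f(xs[k], us[k]) - xs[k + 1] for k in range(len(us))]

# toy single-integrator dynamics: x_{k+1} = x_k + u_k
gaps = compute_gaps(lambda x, u: x + u,
                    [np.array([0.0]), np.array([1.0]), np.array([3.0])],
                    [np.array([1.0]), np.array([1.0])])
# gaps[0] is zero (feasible node); gaps[1] is a nonzero defect
```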
3.1.2 Action-value function of the direct multiple shooting formulation
Without loss of generality, we use the Gauss–Newton (GN) approximation (Li & Todorov, 2004) to write the action-value function of our algorithm as
where again \(\varvec{\ell }_{{\mathbf {x}}_k},\varvec{\ell }_{{\mathbf {u}}_k}\) and \(\varvec{\ell }_{\mathbf {xx}_k},\varvec{\ell }_{\mathbf {xu}_k},\varvec{\ell }_{\mathbf {uu}_k}\) describe the gradient and Hessian of the cost function, respectively; \(\delta {\mathbf {x}}_{k+1}={\mathbf {f}}_{{\mathbf {x}}_k}\delta {\mathbf {x}}_k+{\mathbf {f}}_{{\mathbf {u}}_k}\delta {\mathbf {u}}_k\) is the linearized dynamics; and \({\mathbf {f}}_{{\mathbf {x}}_k},{\mathbf {f}}_{{\mathbf {u}}_k}\) are its Jacobians.
In direct multiple shooting settings, linearization of the system dynamics includes a drift term
as there are gaps in the dynamics term \(\bar{\mathbf {f}}_{k+1}\) produced between subsequent shooting segments (q.v. Fig. 2). Then, the Riccati sweep needs to be adapted as follows:
in which
is the gradient of the value function after the deflection produced by \(\bar{\mathbf {f}}_{k+1}\) (also described above as relinearization). Note that the Hessian of the value function remains unchanged, as DDP approximates the value function through an LQ model. Additionally, this modification affects the values of the Riccati equations, Eq. (8), and the expected improvement, Eq. (13), as they depend on the gradient of the value function.
3.1.3 Feasibility-driven direction
For dynamically infeasible iterates (line 7), we ignore the control constraints and compute a control-unbounded direction:
We do this because we cannot quantify the effect of the gaps on the control bounds, which is needed to solve the feedforward subproblem in Eq. (9). Our approach is equivalent to opening the control bounds during dynamically infeasible iterates.
3.1.4 Control-bounded direction
We warm-start the BoxQP using the feedforward term \({\mathbf {k}}_k\) computed in the previous iteration. However, if the algorithm is switching from the feasibility-driven to the control-bounded mode (i.e., the previous iteration is infeasible), then \({\mathbf {k}}_k\) might fall outside the feasible box, and \(\underline{\mathbf {u}}-{\mathbf {u}}_k\le {\mathbf {k}}_k\le \bar{\mathbf {u}}-{\mathbf {u}}_k\) does not hold. This violates the assumption of the previously described BoxQP, for which a feasible initial point needs to be provided.
To handle infeasible iterates, we propose to project the warm-start of the BoxQP (line 9) as
where \(\underline{\mathbf {u}}\), \(\bar{\mathbf {u}}\) are the lower and upper bounds of the feedforward subproblem, Eq. (9), respectively.
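The projection is a simple element-wise clamp of the previous feedforward term onto the shifted box. A small numerical illustration (all values are made up):

```python
import numpy as np

# Clamp the warm-start feedforward term onto the box of the feedforward
# subproblem, so the BoxQP starts from a feasible point.
u_lb = np.array([-2.0, -2.0])    # lower control limits
u_ub = np.array([2.0, 2.0])      # upper control limits
u_k = np.array([1.5, -1.8])      # current controls at node k
k_k = np.array([1.0, -3.0])      # feedforward term from an infeasible iteration

# project onto [u_lb - u_k, u_ub - u_k], the bounds of the subproblem
k_k_proj = np.clip(k_k, u_lb - u_k, u_ub - u_k)
```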
Once we project the warm-start \({\mathbf {k}}_k\), we solve the feedforward and feedback subproblems as explained in Sect. 2.3.2. Furthermore, we solve the BoxQP using a Projected-Newton method (Bertsekas, 1982), which handles box constraints efficiently as described above.
3.2 Step length
As far as we know, the standard BoxDDP (Tassa et al., 2014) modifies only the search direction (i.e., backward pass) to handle the control limits. However, it is also important to project the controls onto the feasible box during the forward pass. We do this by finding a step length that minimizes the cost (Nocedal & Wright, 2006).
3.2.1 Projecting the rollout towards the feasible box
We propose to project the controls onto the feasible box in the nonlinear rollout (line 13), i.e.,
where \(\hat{\mathbf {u}}_k\) is the updated control from the control policy. Our method does not require solving another QP problem (Xie et al., 2017) or projecting the linear search direction given the gaps in the dynamics (Howell et al., 2019). Furthermore, the control policy considers a gap prediction that guarantees a feasible descent direction. We formally describe the technical details of this procedure in Sect. 3.2.3.
3.2.2 Updating the gaps
As analyzed earlier, the evolution of the gaps in direct multiple shooting is affected by the selected step length. For an optimal control problem without control limits, this evolution is defined as
where \(\alpha \) is the step length found by the line search procedure (lines 11–21). Note that a full step \((\alpha =1)\) closes the gaps completely. We described this gap contraction rate in Sect. 2.1.1.
3.2.3 Nonlinear step
With a nonlinear rollout^{Footnote 5} (line 18), we avoid the linear prediction error of the dynamics that is typically handled by a merit function in general-purpose NLP algorithms, as explained in Mastalli et al. (2020). If we keep the gap-contraction rate of Eq. (22), then we obtain
where \(\hat{\mathbf {x}}_{k}\), \(\hat{\mathbf {u}}_{k}\) are the next state and control along an \(\alpha \)-step; \({\mathbf {k}}_k\) and \({\mathbf {K}}_k\) are the feedforward term and feedback gains computed by Eq. (19) or Eqs. (9)–(10). Furthermore, the initial condition of the rollout is defined as \(\hat{\mathbf {x}}_0=\mathbf {\tilde{x}}_0 \oplus (\alpha -1) \bar{\mathbf {f}}_0\). Note that this is in contrast to the standard BoxDDP, in which the gaps are always closed, even for \(\alpha < 1\).
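The nonlinear step can be sketched as follows, with Euclidean operations standing in for \(\oplus\)/\(\ominus\) and hypothetical names; by construction, each new defect equals \((1-\alpha)\) times the old one, matching Eq. (22):

```python
import numpy as np

def feasibility_rollout(f, xs, us, ks, Ks, gaps, gap0, alpha, lb, ub):
    """Sketch of the nonlinear forward pass with the (1 - alpha) gap
    contraction: the rollout starts from x0 + (alpha - 1) * gap0 and keeps
    an (alpha - 1) fraction of each defect open at the next node; every
    control is clamped onto the feasible box."""
    x_hat = [xs[0] + (alpha - 1.0) * gap0]   # deflected initial condition
    u_hat = []
    for k in range(len(us)):
        u = us[k] + alpha * ks[k] + Ks[k] @ (x_hat[k] - xs[k])
        u = np.clip(u, lb, ub)               # project onto the control box
        u_hat.append(u)
        x_hat.append(f(x_hat[k], u) + (alpha - 1.0) * gaps[k])
    return x_hat, u_hat
```

For \(\alpha = 1\) the deflection terms vanish and the procedure reduces to the standard BoxDDP rollout with clamped controls.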
Avoiding the use of a merit function helps the algorithm to check the search direction more accurately. Indeed, it has been shown that the nonlinear rollout is more effective than a standard line search procedure as it reduces the number of iterations (Liao & Shoemaker, 1992).
3.2.4 Expected Improvement
It is critical to properly evaluate the success of a trial step. Given the current dynamics gaps \(\bar{{\mathbf {f}}}_k\), BoxFDDP computes the expected improvement of a computed search direction as
with
where \(\hat{\mathbf {Q}}_{\mathbf {uu},\text {f}_k}\) is the control Hessian of the free space, \(\delta \hat{\mathbf {x}}_k = \hat{\mathbf {x}}_k\ominus {\mathbf {x}}_k\), and \(J\) is the total cost of a given state-control trajectory (\({\mathbf {x}}_s\), \({\mathbf {u}}_s\)). We use this expected improvement model for both modes. Note that, in the feasibility-driven mode, the free space spans the entire control space; instead, in the control-bounded mode, the gaps are zero.
We obtain this expression by computing the cost from a linear rollout of the current control policy, as described in Eq. (23). We also accept ascent directions when evaluating the trial step; this is inspired by the Goldstein condition (Nocedal & Wright, 2006, Chapter 3):
where \(b_1\), \(b_2\) are adjustable parameters; in this paper we used \(b_1 = 0.1\) and \(b_2 = 2\). Ascent directions improve the algorithm's convergence, as they help to reduce the feasibility error through a moderate increment in the cost.
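The resulting acceptance test can be sketched as follows (a minimal sketch; the names `dV` and `dV_exp` are our own, standing for the actual cost reduction of the trial step and the expected improvement from the model above):

```python
def accept_step(dV, dV_exp, b1=0.1, b2=2.0):
    """Goldstein-inspired acceptance of a trial step.

    dV is the actual cost reduction (positive means the cost decreased)
    and dV_exp is the expected improvement predicted by the model.
    When an ascent direction is predicted (dV_exp < 0), e.g., while
    closing the gaps, the step is still accepted as long as the cost
    increase stays within b2 times the predicted one.
    """
    if dV_exp >= 0.0:
        return dV > b1 * dV_exp   # require a fraction b1 of the prediction
    return dV > b2 * dV_exp       # a bounded ascent is tolerated
```

For instance, a predicted descent of 0.5 requires an actual reduction larger than 0.05, while a predicted ascent of 0.2 tolerates an actual increase up to 0.4.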
3.2.5 Regularization
We regularize the \(\mathbf {Q_{uu}}\) and \({\mathcal {V}}_\mathbf {{xx}}\) terms through a Levenberg-Marquardt scheme (Fletcher, 1971). Concretely, we increase the damping value \(\mu \) when the computation of the feedforward subproblem—formulated in Eq. (9)—fails, or when the forward pass accepts a step length smaller than \(\alpha _0=0.01\). Moreover, we decrease the damping value if the iteration accepts a step larger than \(\alpha _1=0.5\). Both regularization procedures modify the values of the \(\mathbf {Q_{uu}}\) and \({\mathcal {V}}_\mathbf {{xx}}\) terms during the Riccati sweep computation as
where \(\beta _i\) and \(\beta _d\) are the factors^{Footnote 6} used to increase or decrease the current damping value \(\mu \), respectively; \(\mu '\) is the newly computed damping value; and \({\mathbf {I}}\) is the identity matrix. Additionally, we start the regularization procedure with an initial, user-defined damping value.^{Footnote 7} We also define minimum and maximum damping values to avoid increasing or decreasing the damping unnecessarily.^{Footnote 8}
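A sketch of this damping schedule is given below (a hypothetical helper, not the paper's implementation; the thresholds and factors follow the values reported in the text and footnotes):

```python
def update_damping(mu, backward_pass_failed, alpha_accepted,
                   beta_i=10.0, beta_d=10.0, alpha0=0.01, alpha1=0.5,
                   mu_min=1e-16, mu_max=1e12):
    """Levenberg-Marquardt damping update applied after each iteration.

    The damping increases when the feedforward subproblem fails or when
    only a tiny step is accepted, and decreases when a large step is
    accepted. The backward pass then regularizes the terms as
    Quu + mu * I and Vxx + mu * I.
    """
    if backward_pass_failed or alpha_accepted < alpha0:
        return min(beta_i * mu, mu_max)
    if alpha_accepted > alpha1:
        return max(mu / beta_d, mu_min)
    return mu
```

Starting from the initial value \(10^{-9}\), a failed backward pass raises the damping to \(10^{-8}\), while a full accepted step lowers it to \(10^{-10}\).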
Both regularizations significantly increase the robustness of the algorithm and ensure convergence, as they move the search direction between the Newton direction and steepest descent. The Newton direction, obtained with \(\mu =0\), provides fast convergence and is robust against scaling because it exploits the Hessian of the problem. However, it does not always produce valid descent directions, as \(\mathbf {Q_{uu}}\) might be indefinite and the problem nonconvex. In such cases, increasing the damping value guarantees that \(\mathbf {Q_{uu}}\) is positive definite, which, in turn, produces a search direction closer to the steepest-descent one. Instead, regularizing \({\mathcal {V}}_\mathbf {{xx}}\) enforces the state trajectory to stay closer to the one previously computed (Tassa, 2011). It also does not result in vanishing feedback gains, even for large damping values.
3.3 Existence of feasible descent directions
As described above, our approach has two main modes: feasibility-driven and control-bounded. During the feasibility-driven phase, we compute a search direction that drives the next guess towards dynamic feasibility and try a step while keeping the controls within the box constraints. This projection procedure can be seen as a nonlinear term in our dynamics, but we assume its effect is negligible for finding a feasible direction. On the other hand, once dynamic feasibility has been achieved, our algorithm computes a search direction that considers the box constraints. This is needed to improve the current guess by taking the control constraints into account when computing the feedback gains along the free subspace.
As analyzed in Sect. 2.1.1, the feasibility-driven direction is computed by mimicking the numerical behavior of a nonlinear program during the resolution of a direct multiple shooting problem with only dynamics constraints. This implies that the feasibility-driven search produces a descent direction, and eventually the algorithm converges, if the cost Hessian is a positive definite matrix. Indeed, positive definiteness is always guaranteed by our regularization procedure, as described before. Furthermore, the feasibility-driven step aims at reducing the nonlinearities produced by a single shooting formulation (e.g., the DDP algorithm). When the dynamics are feasible, we apply a control-bounded search, which also produces a descent direction as it boils down to a QP.
In the next section, we present a series of results that demonstrate the benefits of our feasibilitydriven approach.
4 Results
We analyze BoxFDDP across a wide range of optimal control problems (briefly introduced in Sect. 4.1) as follows. First, in Sect. 4.2, we show the benefits of the feasibility mode by analyzing the gap contraction and its connection with the nonlinearities in the dynamics. Second, we compare our algorithm against a direct transcription formulation in Sect. 4.4. Concretely, we compare the dynamic feasibility and optimality evolutions, runtime performance, and robustness to different initial guesses against the interior point and active set algorithms available in Knitro. Finally, in Sect. 4.5, we report the results of a squashing approach for handling the control bounds, as it demonstrates the numerical benefit of having two modes.
We benchmark the algorithms using an Intel Core i9-9900KF CPU with eight cores @ 3.60 GHz and 16 MB cache. Our implementation of BoxFDDP supports code generation, but all runtime performances reported henceforth do not employ it, for fair comparison with other approaches. We use 8 threads to compute the cost and dynamics derivatives only for the experiments with the BoxFDDP algorithm and the legged robots. We use the same initial regularization and stopping criterion values for each problem: \(10^{-9}\) and \(5\times 10^{-5}\), respectively.
4.1 Optimal control problems
To provide empirical evidence on the benefits of the feasibility-driven approach, we developed a range of different optimal control problems: (1) an underactuated double pendulum (pend); (2) a quadcopter navigating towards a goal (goal) or through a narrow passage (narrow) and looping (loop); (3) various gaits, aggressive jumps (jump) and unstable hopping (hop) in a quadruped robot; (4) whole-body manipulation (man), hand control while balancing on a single leg (taichi), a dip on parallel bars (dip), and a pull-up bar task (pullup) in a humanoid robot. Figure 3 shows snapshots of motions computed by BoxFDDP for some of these problems, and the accompanying video shows the entire motion sequences.^{Footnote 9}
We describe the cost functions, dynamics, control limits, penalization terms, and initialization of each optimal control problem in Appendix A. Finally, some of these problems, as well as our implementation of the BoxFDDP algorithm, are publicly available in the Crocoddyl repository (Mastalli et al., 2019).
4.2 Advantages of the feasibility-driven mode
To understand the benefits of the feasibility-driven mode, we analyze the resulting total cost, number of iterations, and total computation time obtained with both algorithms, BoxFDDP and BoxDDP\(^+\), using the same initial guess. BoxDDP\(^+\) is an improved version of the standard BoxDDP proposed in Tassa et al. (2014), which accepts initialization of both state and control trajectories, as described above. This version accepts infeasible warm-starts as in Eq. (17), and it is available in the Crocoddyl repository. Without this modification, the standard BoxDDP could easily diverge (and not converge at all) when initialized with quasi-static torques in problems with medium to long horizons.
4.2.1 Larger basin of attraction and convergence
In our experiments, BoxDDP\(^+\) was unable to generate jumping and hopping motions for quadrupeds, as well as pull-ups for humanoids, i.e., it failed to solve our jump, hop, and pullup task specifications (marked by the ✗ in Table 1). BoxDDP\(^+\) failed to generate such aggressive motions because trajectories satisfying all the specifications of those tasks are significantly distant from the initial guess provided to the solver. BoxDDP\(^+\) behaves poorly as it computes solutions with higher cost values and computation times, which is a critical drawback in model predictive control applications.
In contrast, our approach (BoxFDDP) was able to solve all of the tasks, and it did so with fewer iterations and lower total cost (see Table 1 and Fig. 4). Furthermore, BoxFDDP and BoxDDP\(^+\) have the same algorithmic complexity, but since our approach requires fewer iterations than BoxDDP\(^+\), the total computation time of our approach is lower. These results are a direct consequence of the feasibilitydriven mode in our approach, which is able to find control sequences even when faced with poor initial guesses. The infeasible iterations ensure convergence from remote initial guesses through a balance between optimality and feasibility.
Finally, we would like to emphasize that the feasibility-driven mode of BoxFDDP can not only solve tasks that BoxDDP\(^+\) is unable to solve, but also improve the solutions of tasks that BoxDDP\(^+\) is able to solve. For instance, consider the quadcopter tasks goal, narrow, and loop: in the accompanying video, we show that our approach generates concise and smooth quadcopter trajectories, whereas BoxDDP\(^+\) generates jerky motions with unnecessary loops, due to early projection of the control commands.
4.2.2 Gap contraction and nonlinearities
We observed that the gap contraction rate is highly influenced by the nonlinearities of the system dynamics (see Fig. 5). When compared to the dynamics, the nonlinearities of the task often have a smaller effect (e.g., dip vs pullup). Indeed, the gap contraction speed followed the order: humanoid, quadruped, double pendulum, and quadcopter.
Propagation errors due to the dynamics linearization have an important effect on the algorithm's progress, as the Riccati recursions maintain a local quadratic approximation of the value function. The prediction of the expected improvement is indeed more accurate for less nonlinear systems, which is why the algorithm tends to accept larger steps that result in higher gap contractions. Indeed, the effect of having a feasibility search is more significant in problems with highly nonlinear dynamics, as it reduces the total cost faster due to the nonlinearity distribution motivated in Sect. 2.1.2 (see Figs. 4 and 5). It also produces a low cost reduction rate during the gap contraction phase, as the algorithm first focuses on achieving dynamic feasibility.
4.2.3 Highly-dynamic and complex maneuvers
The BoxFDDP algorithm can solve a wide range of motions: from unstable and consecutive hops to aggressive and complex motor behaviors. In Fig. 6, we show the joint torques and velocities of a single leg for the ANYmal jumping problem (depicted in Fig. 3c). The motion consists of three phases: jumping (0–300 ms), flying (300–700 ms), and landing (700–1000 ms). We used a friction coefficient of 0.7 and reduced the real joint limits of the ANYmal robot: from 40 to 32 N m (torque limits) and from 15 to 7.5 rad/s (velocity limits). Thus, generating a 30 cm jump becomes a very challenging task. Furthermore, in this experiment, velocity limit violations appeared since we used quadratic penalization (with constant weights) to enforce them, and the swing phases are likely too short for such a large jump. With constant weights, it might turn out that, for some cases, they are not large enough. Nonetheless, we only encountered these violations in very constrained problems. For instance, we did not find velocity violations for the walking, trotting, pacing, and bounding gaits (reported in the accompanying video). For these cases, BoxFDDP converged in approximately the same number of iterations as the FDDP solver (i.e., a fully unconstrained case). This is because the robot could generate those gaits without reaching its torque limits.
Surprisingly, natural-looking behaviors emerged during the computation of the dip and pullup problems on the Talos humanoid robot. We did not include any heuristic that could have helped the algorithm to generate these behaviors, which were not predefined. For instance, the balancing and leg-crossing on the bars emerge if we allocate a significant amount of time to that motion phase. Similarly, the pull-up motion emerges if we significantly increase the maximum torque limits on the arms.
4.3 Experimental trials
We demonstrated the capabilities of our BoxFDDP algorithm in a model predictive control application for the ANYmal C quadruped robot. The BoxFDDP algorithm computed reference motions in real time, and its efficiency enables our predictive controller to run at over 100 Hz with a horizon of 1.25 s. Figure 7 shows snapshots of the forward trotting gait computed with the BoxFDDP algorithm. It also displays the contact-force tracking and the updates of the swing-foot reference trajectories. In this experiment, we predefined the timings of the swing-foot motions and the foothold locations.
4.4 BoxFDDP vs a direct transcription formulation
We formulated an efficient direct transcription problem with dynamics defect constraints in each node. The formulation is conceptually similar to our feasible descent direction introduced in Sect. 3.3. We transcribed the dynamics using a symplectic Euler scheme, the same integration scheme used in BoxFDDP. We solved the direct transcription problem using different optimization algorithms provided by Knitro (Byrd et al., 2006). The available algorithms are: Interior/Direct (Waltz et al., 2006), Interior/CG (Byrd et al., 1999), active set (Byrd et al., 2003), and sequential quadratic programming (SQP). Below, we briefly describe each of Knitro's algorithms, and then report the comparison results with BoxFDDP.
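For reference, the semi-implicit (symplectic) Euler step and the per-node defect constraint can be sketched as follows (a minimal illustration assuming a Euclidean state split into configuration q and velocity v; `acc` is a hypothetical forward-dynamics function, not part of the paper's code):

```python
import numpy as np

def symplectic_euler_step(q, v, u, dt, acc):
    """Semi-implicit Euler: update the velocity first, then use the
    new velocity to advance the configuration."""
    v_next = v + dt * acc(q, v, u)
    q_next = q + dt * v_next
    return q_next, v_next

def dynamics_defect(x_next, x, u, dt, acc):
    """Defect imposed at each transcription node; it vanishes when the
    discretized states are dynamically feasible."""
    nq = x.size // 2
    q_next, v_next = symplectic_euler_step(x[:nq], x[nq:], u, dt, acc)
    return x_next - np.concatenate([q_next, v_next])

# unit point mass with direct force control: qdd = u
acc = lambda q, v, u: u
x = np.array([0.0, 0.0])
x_next = np.array([0.01, 0.1])  # consistent with u = 1, dt = 0.1
defect = dynamics_defect(x_next, x, np.array([1.0]), 0.1, acc)
```

The NLP enforces `defect = 0` at every node, which mirrors the gaps regulated by BoxFDDP.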
The Interior/Direct (Knitro-IDIR) algorithm replaces the NLP problem with a series of barrier subproblems. In each iteration, it solves the primal-dual KKT problem using a line search procedure.^{Footnote 10} Instead, Interior/CG (Knitro-ICG) solves the primal-dual KKT problem using a projected conjugate gradient method. This method uses exact second derivatives, without explicitly storing the Hessian matrix, through a tailored trust region procedure. Interior/Direct also invokes this trust region procedure if the line search iteration converges to a non-stationary point (Waltz et al., 2006). In contrast to the interior point methods, the active set algorithm (Knitro-SLQP) replaces the NLP problem with a sequence of quadratic programs to form a sequential linear-quadratic programming algorithm. This algorithm selects a set of active constraints in each iteration, and produces a more exterior path (i.e., along the constraints) to the solution. Finally, Knitro's SQP algorithm (Knitro-SQP) is also an active set method, designed for small- to medium-scale problems with expensive function evaluations. Both active set approaches are often preferable to interior point methods on small- to medium-sized problems when we can provide a good initial guess (Nocedal & Wright, 2006). However, problems in robotics are often large, with many inequality constraints. Indeed, the benefits of interior point methods have been pointed out in the context of direct methods (Pardo et al., 2016).
4.4.1 Optimality versus feasibility
We compared the total cost, number of iterations, and total computation time against the different algorithms implemented in Knitro over 100 trials. For the comparison, we solved the double pendulum problem (pend), as it requires discovering a swing-up maneuver. With this, we can clearly compare the trade-off between optimality and feasibility across the different algorithms. Note that, as described earlier, we used a single thread for both Knitro and BoxFDDP, despite our algorithm supporting multi-threading.
Table 2 reports three different formulations used with the Knitro algorithms. The first one (pen) emulates exactly the optimal control formulation used in BoxFDDP, i.e., control constraints, regularization terms, and a terminal quadratic cost. The second case (reg-const) uses a terminal constraint to impose the desired upward position together with the regularization terms. The third case (const) uses only the terminal constraints. Below we summarize the obtained results for each formulation.
BoxFDDP converges faster (w.r.t. time) than the Knitro algorithms in all of the above formulations. However, Knitro-ICG is as fast as our approach with the const formulation. On the other hand, when it comes to optimality, Knitro produces more optimal solutions if we use the reg-const formulation. Indeed, in our experience, Knitro generally behaves better when the formulation is dominated by constraint functions. Note that we do not report the cost values for the const formulation, as it boils down to a feasibility problem, i.e., a problem with only constraints.
We used the reg-const formulation to compare both the cost and feasibility evolutions. The dynamic infeasibility decreased monotonically for all the algorithms, as plotted in Fig. 8 (top). BoxFDDP resolves the dynamic feasibility as quickly as the interior point algorithms, which, generally speaking, require fewer iterations than active set approaches (Hei et al., 2008). The cost evolution is also similar to that of the interior point algorithms; the total costs are reported as reg-const cases in Table 2.
4.4.2 Computation time
BoxFDDP had better runtime performance than the Knitro algorithms for the double pendulum problem (cf. Table 2). However, to assess how the runtime performance scales to higher-dimensional optimal control problems, we analyzed the problem of generating a forward jumping maneuver with the ANYmal robot (i.e., jump).
We used the same phase timings, joint limits, and friction coefficient reported in Sect. 4.2.3. The results reported for the Knitro and BoxFDDP cases are based on slightly different optimal control formulations; the idea is to define the most suitable formulation for each algorithm. For instance, we use quadratic penalization terms to impose the desired foothold placement, joint velocity limits, and friction cone constraints for the BoxFDDP algorithm. Instead, for the Knitro algorithms, we substitute these penalization terms with general equality and inequality constraints. To further reduce the computation time of the Knitro cases, we also impose a constraint on the terminal position of the trunk. Note that we did not include any cost term, since it negatively affects the convergence rate of Knitro, i.e., we treated it as a feasibility problem.
Table 3 reports the runtime performance over 100 trials for the Knitro-IDIR algorithm only. The other methods (i.e., Knitro-ICG, Knitro-SLQP, and Knitro-SQP) were unable to solve this problem. We used 5 different initial trunk heights, and we initialized the algorithms using their corresponding joint postures (as described in Appendix AC) and no controls (i.e., \({\mathbf {u}}^0_s=\{{\mathbf {0}},\cdots ,{\mathbf {0}}\}\)). As in the double pendulum case, BoxFDDP also solved this problem faster than the Knitro algorithms, even though it required a significant number of extra (computationally inexpensive) iterations. We suspect that this increment in the number of iterations is due to the use of penalization terms for the contact placement, friction cone, and state limits constraints.
4.4.3 Robustness against different initial guesses
We compared the robustness against different initial guesses for the double pendulum (pend) and quadrupedal jump (jump) problems. In both problems, we generated random joint postures—around the nominal state—and used them to define an initial guess for the state trajectory \({\mathbf {x}}^0_{\mathbf {s}}\). In addition to the robot’s joint postures, we also generated random joint velocities around the zerovelocity condition for the double pendulum case only. We used this single random posture and velocity for each node in \({\mathbf {x}}^0_{\mathbf {s}}\), and initialized the control sequence with zeros. We used the most suitable formulations for BoxFDDP and Knitro algorithms as justified above.
Table 4 reports the number of successful resolutions over 100 trials. We considered a problem to be successfully solved if the gradient of BoxFDDP or the feasibility of the Knitro algorithms is lower than \(5\times 10^{-5}\) (absolute feasibility tolerance). Note that this includes the cases where Knitro found a feasible approximate solution.^{Footnote 11} Furthermore, we considered a resolution unsuccessful if the problem did not converge within 70 s, which is ample time given the results above. For each problem, we used two different maximum values for the random initialization, whose maximum magnitudes are described using the \(\ell ^\infty \) norm (i.e., \(\Vert \cdot \Vert _\infty \)). We added this noise to the default initial guess (described in Appendix A) used for the state trajectory.
As expected, the Knitro interior point methods performed better than the active set ones. For the double pendulum problem, the interior point algorithms (i.e., Knitro-IDIR and Knitro-ICG) perform better than BoxFDDP if the warm-starting point is close to the initial condition. Despite that, BoxFDDP shows more robustness to initial guesses, as its percentage of successful resolutions is consistent. Furthermore, we observed a significantly larger number of successful resolutions for BoxFDDP on the jump problem. Indeed, Knitro was not able to solve this problem at all for noise magnitudes bigger than \(\Vert 0.01\Vert _\infty \).
4.5 BoxFDDP, BoxDDP, and squashing approach in nonlinear problems
To evaluate the numerical performance of having two modes, we compared BoxFDDP (with two modes depending on the dynamics feasibility), BoxDDP\(^+\) (using a single mode), and DDP\(^+\) with a squashing function (using a single mode) for three scenarios with the IRIS quadcopter: reaching a goal (goal), a looping maneuver (loop), and traversing a narrow passage (narrow). We used a sigmoidal element-wise squashing function of the form:
in which the sigmoid is approximated through two smooth-abs functions, \(\gamma \) defines its smoothness, and \(\underline{\mathbf {u}}^i\), \(\overline{\mathbf {u}}^i\) are the element-wise lower and upper control bounds, respectively. We introduced this squashing function on the system controls as \({\mathbf {x}}_{k+1} = {\mathbf {f}}({\mathbf {x}}_k,{\mathbf {s}}({\mathbf {u}}_k))\). We used \(\gamma =2\) for all the experiments presented in this section.
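One possible construction consistent with this description is sketched below (an assumption on our part; the exact parameterization of the smooth-abs terms in the cited works may differ):

```python
import numpy as np

def squash(u, lb, ub, gamma=2.0):
    """Element-wise sigmoid built from two smooth-abs terms.

    s(u) tends to lb as u -> -inf and to ub as u -> +inf; gamma
    controls how smoothly the saturation sets in.
    """
    return 0.5 * (np.sqrt(gamma**2 + (u - lb)**2)
                  - np.sqrt(gamma**2 + (u - ub)**2)) + 0.5 * (lb + ub)

# with a small gamma the function approaches a hard clamp
s = squash(np.array([-100.0, 0.0, 100.0]), -1.0, 1.0, gamma=0.1)
```

The function is strictly monotonic, so the squashed controls remain differentiable everywhere, unlike a hard projection.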
Figure 9 shows that BoxFDDP converged faster than the other approaches. As reported in the accompanying video, BoxFDDP did not generate the undesired loops and jerky motions seen in the other cases. Indeed, the solutions with BoxFDDP have the lowest cost values (cf. Table 1). We also observed that the squashing approach often converges sooner than BoxDDP\(^+\). The main reason is the early saturation of the controls performed by BoxDDP\(^+\).
In Fig. 10, we show the cost evolution for 10 different initial configurations of the reaching-goal task. The target and initial configurations are (3, 0, 1) and \((0.3 \pm 0.6, 0, 0)\) m, respectively. In BoxFDDP, the infeasible iterations produce a very low cost within the first iterations. The squashing approach is the most sensitive to initial conditions. However, on average, it produces slightly better solutions than BoxDDP\(^+\). This is in contrast to the results reported in Tassa et al. (2014), where the performance was analyzed only for the linear-quadratic regulator problem.
5 Conclusion
We proposed a feasibility-driven approach whose search is primarily driven by the dynamic feasibility of the optimal control problem. The dynamically-infeasible iterations, which mimic a direct multiple shooting approach, allowed us to solve a wide range of optimal control problems despite being provided with poor initializations. The benefits of our approach are demonstrated on a set of athletic and highly-dynamic maneuvers computed for the Talos humanoid and the ANYmal quadruped robots. Its enlarged basin of attraction towards good local optima has been a key factor in optimizing such complex maneuvers while considering the robot's full rigid body dynamics, joint limits, and friction cone constraints. Indeed, BoxFDDP has shown increased robustness against different initial guesses compared with the advanced Knitro algorithms.
We have provided evidence that our algorithm produces descent search directions. For instance, we have observed that the feasibility error decreases monotonically, as it often does in the advanced nonlinear programming algorithms available in Knitro. Our approach has also been shown to quickly reduce the dynamic infeasibility, as observed in the most competitive Knitro algorithms. A similar effect is observed in the cost evolution as well. Our results suggest that the gap contraction rate is influenced by the nonlinearities of the system dynamics.
The runtime performance of BoxFDDP is often superior to direct transcription solved using state-of-the-art Knitro algorithms, despite the increase in the number of iterations due to the use of quadratic penalization terms. Moreover, when comparing the computation time per iteration, our approach is between 2 and 10 times faster, which makes it suitable for model predictive control applications. Indeed, we demonstrated that our BoxFDDP algorithm can generate trotting gaits on the ANYmal C robot in a predictive control fashion. One additional remark is that we have not considered the runtime reduction due to the code generation support in Crocoddyl via CppADCodeGen (Leal, 2011) and CppAD (Bell, 2012). In our experience, code generation can lead to a computation time reduction of between 30% and 60%, as can be seen in our public benchmarks (Mastalli et al., 2019).
We have developed and reported the results for a wide range of optimal control problems in robotics. These problems cover an important part of the spectrum of robotics applications. A comparison across all these problems is unusual, and it represents an important contribution to the research community, since we have open-sourced many of these examples, as well as the BoxFDDP algorithm, in the Crocoddyl repository (Mastalli et al., 2019). Our feasibility-driven approach enabled model predictive control applications on the ANYmal C robot; however, it can potentially be used in other applications such as humanoid robotics (Eßer et al., 2021), robot co-design (IROS, 2022), and learning (Lembono et al., 2020).
Notes
This name was coined by Albersmeyer and Diehl (2010); we refer to the gaps or defects produced between multiple shooting nodes.
For more details about the FONC of optimality see Nocedal and Wright (2006).
In the following section, we formally describe the action-value function (i.e., the \(Q\)-function).
Here, feasibility concerns the dynamics of the system, not the feasibility of other problem constraints.
In this work, a nonlinear rollout is also referred to as a nonlinear step.
\(\beta _{i,d}\) commonly ranges between 2 and 10. We set \(\beta _{i,d} = 10\) in this work.
We use \(10^{-9}\) as the initial regularization value.
We use \(10^{-16}\) and \(10^{12}\) as the minimum and maximum damping values, respectively. Note that \(10^{-16}\) is approximately the resolution of a double-precision number.
Supplementary video: https://youtu.be/bOGBPTh_lsU.
For more details about interior point methods, we refer the reader to Nocedal and Wright (2006).
For further details, we refer the reader to the Knitro manual: https://www.artelys.com/docs/knitro/3_referenceManual.html.
For more details about the ABA algorithm see Featherstone (2007).
Talos’ arms are not strong enough to support its own weight.
References
A versatile co-design approach for dynamic legged robots. In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Aceituno-Cabezas, B., Mastalli, C., Dai, H., Focchi, M., Radulescu, A., Caldwell, D. G., Cappelletto, J., Grieco, J. C., Fernandez-Lopez, G., & Semini, C. (2017). Simultaneous contact, gait and motion planning for robust multi-legged locomotion via mixed-integer convex optimization. IEEE Robotics and Automation Letters (RA-L).
Albersmeyer, J., & Diehl, M. (2010). The lifted Newton method and its application in optimization, SIAM Journal on Optimization.
Baumgarte, J. (1972). Stabilization of constraints and integrals of motion in dynamical systems. Computer Methods in Applied Mechanics and Engineering, 1, 1–16.
Bell, B. (2012). CppAD: A package for differentiation of C++ algorithms, In Computational infrastructure for operations research. [Online]. Available: http://www.coinor.org/CppAD/
Bellman, R. E. (1954). The theory of dynamic programming, Bulletin of the American Mathematical Society.
Bertsekas, D. P. (1982). Projected newton methods for optimization problems with simple constraints. SIAM Journal on Control and Optimization, 20, 221–246.
Betts, J. T. (2009). Practical methods for optimal control and estimation using nonlinear programming (2nd ed.). Cambridge University Press.
Budhiraja, R., Carpentier, J., Mastalli, C., & Mansard, N. (2018). Differential dynamic programming for multi-phase rigid contact dynamics. In IEEE international conference on humanoid robots (ICHR).
Byrd, R. H., Gould, N. I. M., Nocedal, J., & Waltz, R. A. (2003). An algorithm for nonlinear optimization using linear programming and equality constrained subproblems. Mathematical Programming, 100, 27–48.
Byrd, R. H., Hribar, M. E., & Nocedal, J. (1999). An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization, 9, 877–900.
Byrd, R. H., Nocedal, J., & Waltz, R. A. (2006). KNITRO: An integrated package for nonlinear optimization, In Large scale nonlinear optimization, (pp. 35–59).
Carpentier, J., & Mansard, N. (2018). Analytical derivatives of rigid body dynamics algorithms, In Robotics: science and systems (RSS).
Carpentier, J., Tonneau, S., Naveau, M., Stasse, O., & Mansard, N. (2016). A versatile and efficient pattern generator for generalized legged locomotion, In International conference on robotics and automation (ICRA).
Di Carlo, J., Wensing, P. M., Katz, B., Bledt, G., & Kim, S. (2018). Dynamic locomotion in the MIT cheetah 3 through convex model-predictive control. In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Diehl, M., Bock, H. G., Diedam, H., & Wieber, P. B. (2006). Fast direct multiple shooting algorithms for optimal robot control. In Fast motions in biomechanics and robotics. Springer, Berlin Heidelberg.
Eßer, J., Kumar, S., Peters, H., Bargsten, V., Fernandez, J. d. G., Mastalli, C., Stasse, O., & Kirchner, F. (2021). Design, analysis and control of the series-parallel hybrid RH5 humanoid robot. In IEEE international conference on humanoid robots (ICHR).
Farshidian, F., Jelavic, E., Satapathy, A., Giftthaler, M., & Buchli, J. (2017). Real-time motion planning of legged robots: A model predictive control approach. In IEEE international conference on humanoid robots (ICHR).
Featherstone, R. (2007). Rigid body dynamics algorithms. SpringerVerlag.
Fletcher, R. (1971). A modified Marquardt subroutine for nonlinear least squares. Technical Report.
Frese, U. (2008). A framework for sparse nonlinear least squares problems on manifolds, Ph.D. dissertation, Universität Bremen.
Gabay, D. (1982). Minimizing a differentiable function over a differential manifold. Journal of Optimization Theory and Applications, 37, 177–219.
Giftthaler, M., Neunert, M., Stäuble, M., Buchli, J., & Diehl, M. (2018). A family of iterative gaussnewton shooting methods for nonlinear optimal control, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Gill, P. E., Murray, W., & Saunders, M. A. (2005). SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Review, 47(1), 99–131.
Hargraves, C. R., & Paris, S. W. (1987). Direct trajectory optimization using nonlinear programming and collocation. Journal of Guidance, Control, and Dynamics, 10(4), 338–342.
Harwell Subroutine Library, AEA Technology, Harwell, Oxfordshire, England. A catalogue of subroutines, http://www.hsl.rl.ac.uk/.
Hei, L., Nocedal, J., & Waltz, R. A. (2008). A numerical study of active-set and interior-point methods for bound constrained optimization, In Modeling, simulation and optimization of complex processes.
Howell, T. A., Jackson, B., & Manchester, Z. (2019). ALTRO: a fast solver for constrained trajectory optimization, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Kirk, D. E. (1998). Optimal control theory  An introduction. Dover Publications.
Koenemann, J., Del Prete, A., Tassa, Y., Todorov, E., Stasse, O., Bennewitz, M., & Mansard, N. (2015). Whole-body model-predictive control applied to the HRP-2 humanoid, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Lantoine, G., & Russell, R. (2008). A hybrid differential dynamic programming algorithm for robust low-thrust optimization, In AIAA/AAS astrodynamics specialist conference and exhibit.
Leal, J. (2011–2020). CppADCodeGen, https://github.com/joaoleal/CppADCodeGen/.
Lembono, T. S., Mastalli, C., Fernbach, P., Mansard, N., & Calinon, S. (2020). Learning how to walk: warm-starting optimal control solver with memory of motion, In IEEE international conference on robotics and automation (ICRA).
Li, W., & Todorov, E. (2004). Iterative linear quadratic regulator design for nonlinear biological movement systems, In ICINCO.
Liao, L.-Z., & Shoemaker, C. A. (1992). Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems, Cornell University, Technical Report.
Martí-Saumell, J., Solà, J., Mastalli, C., & Santamaria-Navarro, A. (2020). Squash-box feasibility-driven differential dynamic programming, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Mastalli, C., Budhiraja, R., & Mansard, N. (2019). Crocoddyl: a fast and flexible optimal control library for robot control under contact sequence, https://github.com/loco3d/crocoddyl.
Mastalli, C., Budhiraja, R., Merkt, W., Saurel, G., Hammoud, B., Naveau, M., Carpentier, J., Righetti, L., Vijayakumar, S., & Mansard, N. (2020). Crocoddyl: An efficient and versatile framework for multicontact optimal control, In IEEE international conference on robotics and automation (ICRA).
Mastalli, C., Focchi, M., Havoutis, I., Radulescu, A., Calinon, S., Buchli, J., Caldwell, D. G., & Semini, C. (2017). Trajectory and foothold optimization using low-dimensional models for rough terrain locomotion, In IEEE international conference on robotics and automation (ICRA).
Mastalli, C., Havoutis, I., Focchi, M., Caldwell, D. G., & Semini, C. (2020). Motion planning for quadrupedal locomotion: coupled planning, terrain mapping and whole-body control. IEEE Transactions on Robotics (TRO).
Mastalli, C., Merkt, W., Xin, G., Shim, J., Mistry, M., Havoutis, I., & Vijayakumar, S. (2022). Agile maneuvers in legged robots: a predictive control approach, arXiv:2203.07554.
Mayne, D. (1966). A second-order gradient method for determining optimal trajectories of nonlinear discrete-time systems. International Journal of Control, 3(1), 85–95.
Merkt, W., Ivan, V., & Vijayakumar, S. (2018). Leveraging precomputation with problem encoding for warm-starting trajectory optimization in complex environments, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Neunert, M., Stäuble, M., Giftthaler, M., Bellicoso, C. D., Carius, J., Gehring, C., Hutter, M., & Buchli, J. (2018). Whole-body nonlinear model predictive control through contacts for quadrupeds. IEEE Robotics and Automation Letters (RAL).
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (2nd ed.). Springer.
Ohno, K. (1978). A new approach to differential dynamic programming for discrete time systems. IEEE Transactions on Automatic Control, 23(1), 37–47.
Pardo, D., Möller, L., Neunert, M., Winkler, A. W., & Buchli, J. (2016). Evaluating direct transcription and nonlinear optimization methods for robot motion planning. IEEE Robotics and Automation Letters (RAL).
Tassa, Y. (2011). Theory and implementation of biomimetic motor controllers, Ph.D. dissertation, Hebrew University of Jerusalem.
Tassa, Y., Erez, T., & Todorov, E. (2012). Synthesis and stabilization of complex behaviors through online trajectory optimization, In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Tassa, Y., Mansard, N., & Todorov, E. (2014). Control-limited differential dynamic programming, In IEEE international conference on robotics and automation (ICRA).
Wächter, A., & Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1), 25–57.
Waltz, R. A., Morales, J. L., Nocedal, J., & Orban, D. (2006). An interior algorithm for nonlinear optimization that combines line search and trust region steps. Mathematical Programming, 107, 391–408.
Wieber, P. B. (2006). Trajectory free linear model predictive control for stable walking in the presence of strong perturbations, In IEEE-RAS international conference on humanoid robots (ICHR).
Winkler, A. W., Bellicoso, C. D., Hutter, M., & Buchli, J. (2018). Gait and trajectory optimization for legged systems through phase-based end-effector parameterization. IEEE Robotics and Automation Letters (RAL).
Xie, Z., Liu, C. K., & Hauser, K. (2017). Differential dynamic programming with nonlinear constraints, In IEEE international conference on robotics and automation (ICRA).
Acknowledgements
The authors are grateful to Matt Timmons-Brown and Vladimir Ivan, from the University of Edinburgh, for the production of the audio material of our video and for the fruitful discussions on the study of the effects of our feasibility-driven approach, respectively.
This research was supported by (1) the European Commission under the Horizon 2020 Project Memory of Motion (MEMMO, Project ID: 780684), (2) the Engineering and Physical Sciences Research Council (EPSRC) UK RAI Hub for Offshore Robotics for Certification of Assets (ORCA, Grant reference EP/R026173/1), and (3) the Alan Turing Institute.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 56441 KB)
Appendix A Optimal control problems
We divide the optimal control problems into four subsections: double pendulum, quadcopter, quadruped and humanoid robots.
A Double pendulum (pend)
The goal is to swing from the stable to the unstable equilibrium point, i.e., from the downward to the upward position. To increase the problem complexity, the double pendulum (with a weight of \(\approx \) 4.5 N) has a single actuated joint with a small range of control (from \(-5\) to \(5\) N m, largely insufficient for a static execution of the trajectory). The time horizon is 1 s with 100 nodes. We define a quadratic cost function, for each node, that aims to reach the upward position. For the running and terminal nodes, we use the weight values of \(2\times 10^{-4}\) and \(2\times 10^{4}\), respectively. Additionally, we provide state and control regularization terms. To inspect the algorithm capabilities, we do not provide an initial guess (i.e., zeros for the states and controls); thus, the swing-up strategy is discovered by the solver itself. Finally, we implement the cost, dynamics, and their analytical derivatives.
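To make the per-node cost structure concrete, the following is a minimal sketch in plain Python. The state layout, the upward goal state, and the exact weight values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quadratic_goal_cost(x, x_goal, weight):
    # 0.5 * w * ||x - x_goal||^2: penalizes distance to the upward equilibrium
    r = x - x_goal
    return 0.5 * weight * (r @ r)

# Hypothetical running/terminal weights (small on running nodes, large on the
# terminal node) and a state [q1, q2, dq1, dq2] with the upward pose at q1 = pi
W_RUN, W_TERM = 2e-4, 2e4
x_goal = np.array([np.pi, 0.0, 0.0, 0.0])

# Cost of the zero (cold-start) trajectory over the 100 nodes of the horizon
xs = [np.zeros(4)] * 100
total = sum(quadratic_goal_cost(x, x_goal, W_RUN) for x in xs[:-1])
total += quadratic_goal_cost(xs[-1], x_goal, W_TERM)
```

The large terminal weight dominates the sum, which is what drives the solver toward the upward position at the end of the horizon.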
B Quadcopter
We consider three tasks for the IRIS quadcopter: reaching a goal (goal), a looping maneuver (loop), and traversing a narrow passage (narrow). We define different waypoints to describe the tasks, where each waypoint specifies the desired pose and velocity. The waypoints are described through cost functions on the robot placement and velocity at specific instants of the motion. These cost functions are quadratic terms with \(10^2\) as the weight value for both the pose and the velocity. We also include quadratic regularization terms for the state and control, with weight values of \(10^{-5}\) and \(10^{-1}\), respectively. The vehicle pose is described as an \(\mathbb{SE}(3)\) element, which allows us to consider any kind of motion, such as looping maneuvers. The control inputs are the thrusts produced by the propellers, which can each vary within a range from 0.1 to 10.3 N. We compute the dynamics using the articulated body algorithm (ABA), and the analytical derivatives are calculated as described in Carpentier and Mansard (2018). We integrate the dynamics with a fixed time step of 10 ms. The solution is computed from a cold-start of the algorithm. As in the double pendulum case, we do not provide an initial guess to the solvers.
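A minimal sketch of the waypoint cost and the box thrust limits described above. The function names and error parameterization are assumptions; in practice the pose error would be a log-map residual on \(\mathbb{SE}(3)\) rather than a plain vector difference.

```python
import numpy as np

U_MIN, U_MAX = 0.1, 10.3  # per-propeller thrust limits [N], as in the text

def clamp_thrusts(u):
    # Project a 4-propeller thrust command onto the box [U_MIN, U_MAX]^4
    return np.clip(u, U_MIN, U_MAX)

def waypoint_cost(pose_err, vel_err, w_pose=1e2, w_vel=1e2):
    # Quadratic waypoint term with weight 1e2 on both pose and velocity
    return 0.5 * (w_pose * pose_err @ pose_err + w_vel * vel_err @ vel_err)
```

Clamping of this kind is what a box-constrained solver such as BoxFDDP enforces exactly, rather than by penalty.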
C Aggressive jump, unstable hopping, and various gaits
We use the ANYmal quadruped robot to generate a wide range of motions—jumping, hopping, walking, trotting, pacing, and bounding. We deliberately reduce the torque and velocity limits to 32 N m and 7.5 rad/s, respectively, which intentionally increases the complexity of the jumping task (jump). We use a quadratic barrier to penalize the joint velocities and the contact forces that are outside the limits. The unstable hopping problem (hop) has a long horizon: 7.14 s with 714 nodes. It includes 10 hops in total, with a phase that switches the feet in contact. Instead, the other problems have a horizon between 0.7–1 s with 70–100 nodes. We define quadratic terms, with a weight value of \(10^6\), to track the desired swing-foot placements for each case. We regularize the state trajectory around the robot's default configuration. We use a friction coefficient of 0.7. Furthermore, we formulate a multi-phase optimal control problem using rigid contact dynamics and their analytical derivatives, as described in Budhiraja et al. (2018) and Mastalli et al. (2020), respectively. During a contact transition, we employ the impulse dynamics with analytical derivatives, as also described in Mastalli et al. (2020). Our contact dynamics model also includes the Baumgarte gains, which are needed to numerically stabilize differential algebraic equations (Baumgarte, 1972); we use the values (0, 50). We initialize the solver using the default posture and the quasi-static torques for each node of the initial guess trajectories. The default posture defines the standing position of the robot; it does not provide any relevant information for a specific maneuver (e.g., a jump). The quasi-static torques describe the forces required to cancel the effects of gravity subject to the robot's default posture.
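The quadratic barrier used above to penalize limit violations can be sketched as a one-sided quadratic penalty that vanishes inside the bounds; the weight and the example values are illustrative assumptions.

```python
import numpy as np

def quadratic_barrier(v, lower, upper, weight=1.0):
    # Zero inside [lower, upper]; grows quadratically with the violation outside
    below = np.minimum(v - lower, 0.0)
    above = np.maximum(v - upper, 0.0)
    return 0.5 * weight * (below @ below + above @ above)

# E.g., penalizing a joint velocity of 8.5 rad/s against the 7.5 rad/s limit:
# the violation is 1.0 rad/s, so the penalty is 0.5 * 1.0^2 = 0.5
cost = quadratic_barrier(np.array([8.5]), -7.5, 7.5)
```

Unlike the hard box on the controls, this soft penalty keeps the velocity and force limits differentiable, so they fit the cost-based DDP formulation directly.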
D Whole-body manipulation and balance
We consider four problems for the Talos humanoid robot: whole-body manipulation (man), hand control while balancing on a single leg (taichi), a dip on parallel bars (dip), and a pull-up bar task (pullup). For the dip and pull-up workouts, we define the real torque limits for the leg and torso joints. Additionally, we consider joint position and velocity limits in each scenario through a quadratic barrier. The taichi, dip, and pullup tasks are each divided into three phases; for the taichi task: manipulation, standing on one foot, and balancing; for the dip and pull-up tasks: grasping the bar, the workout, and landing on the ground. The dip and pullup problems have a horizon of 12 s with 400 nodes. The horizons of the whole-body manipulation and taichi problems are 3 and 6 s with 60 and 120 nodes, respectively. We define different Baumgarte gains for the hands and feet, i.e., (0, 20) and (0, 40), respectively. We use contact-placement cost functions for both the feet and the hands. Indeed, behaviors such as dip/pull-up balancing, the pull-up motion, or the leg-crossing strategy emerge from our optimization algorithm, since we do not prescribe the center of mass and swing-contact motions. Finally, the regularization terms, dynamics, and initialization strategy are the same as those used for the ANYmal cases. For details about the ANYmal problems, see above (Sections A–C).
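The Baumgarte gains \((K_p, K_v)\) quoted above stabilize a contact by replacing the zero-acceleration condition with a drift-damping reference acceleration; the following is a sketch of that idea, where the sign convention and error parameterization are assumptions.

```python
import numpy as np

def baumgarte_reference(pos_err, vel_err, kp, kv):
    # Reference contact acceleration a* = -kp * pos_err - kv * vel_err;
    # enforcing a = a* (instead of a = 0) damps numerical constraint drift
    return -kp * pos_err - kv * vel_err

# With the hand gains (0, 20) used in the text, only velocity drift is damped:
# a hypothetical 0.1 m/s drift along x yields a -2.0 m/s^2 correction
a_ref = baumgarte_reference(np.array([0.01, 0.0, 0.0]),
                            np.array([0.1, 0.0, 0.0]), kp=0.0, kv=20.0)
```

Setting \(K_p = 0\), as done here, damps only velocity-level drift and avoids stiff position-correction terms in the contact dynamics.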
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mastalli, C., Merkt, W., Martí-Saumell, J. et al. A feasibility-driven approach to control-limited DDP. Auton Robot 46, 985–1005 (2022). https://doi.org/10.1007/s10514-022-10061-w