First, a general mathematical problem formulation for a single dynamic system is presented, which is then extended to several systems that share a common resource. The extended problem is put into a standard form and algorithms for the distributed solution of the problem are introduced.
Problem formulation for a sub-system trajectory
A general dynamic or trajectory optimization problem over a fixed time interval \(t\in [t_{0},t_{f}]\) can be written as follows:
$$\begin{aligned} \min _{\begin{array}{c} u(t)\\ \forall t \in [t_{0},t_{f}] \end{array}}&\quad \left( \varUpsilon \left( \chi (t_{f})\right) + \int _{t_{0}}^{t_{f}} \varTheta \left( \chi (t),u(t),t \right) dt \right) ,&\end{aligned}$$
(1a)
$$\begin{aligned} \mathrm {s.t.}&\quad \dot{\chi }(t) = F\left( \chi (t),u(t),t \right) , \quad t \in [t_{0},t_{f}],\ \end{aligned}$$
(1b)
$$\begin{aligned}&\quad \chi (t_{0})=\chi _{0}, \quad \end{aligned}$$
(1c)
$$\begin{aligned}&\quad P\left( \chi (t),u(t),t \right) \le 0, \quad t \in [t_{0},t_{f}],\end{aligned}$$
(1d)
$$\begin{aligned}&\quad T\left( \chi (t_{f}) \right) \le 0. \end{aligned}$$
(1e)
In dynamic optimization, a distinction is made between the state variables \(\chi (t)\) and the inputs u(t) of the system. From the inputs u(t) and the initial condition, \(\chi (t_{0}) = \chi _{0}\), the state variables can be computed using the model equation Eq. 1b. Thus u(t) are the degrees of freedom, while \(\chi (t)\) are the dependent variables. The goal is to find the best inputs such that the constraints are satisfied and some performance measure of the resulting trajectory is maximized or minimized.
The objective is given in the Bolza form, which consists of a scalar performance measure at the end of the horizon, \(\varUpsilon\), and an integral part that tracks some scalar performance measure along the whole path of the trajectory, \(\varTheta\). Analogously to the objective, the constraints are split into terminal constraints (T) and path constraints (P) (Sargent 2000).
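As a concrete illustration (not taken from the paper), problem (1) could be instantiated for a scalar integrator with a quadratic energy cost, input bounds as path constraints, and a terminal target:

```latex
% Example instance of problem (1): here \Upsilon(\chi(t_f)) = 0,
% \Theta = u(t)^2, F = u(t), P enforces input bounds,
% and T enforces a terminal target.
\begin{aligned}
\min_{u(t)} \quad & \int_{t_0}^{t_f} u(t)^2 \, dt, \\
\mathrm{s.t.} \quad & \dot{\chi}(t) = u(t), \quad \chi(t_0) = 0, \\
& |u(t)| - 1 \le 0, \quad t \in [t_0, t_f], \\
& 1 - \chi(t_f) \le 0.
\end{aligned}
```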
Problem formulation for multiple trajectories with shared inputs
The problem of interest in this paper is to optimize N trajectories for N sub-systems that share resources and start as well as end their operation possibly at different times. This can be formulated as the following dynamic optimization problem:
$$\begin{aligned} \min _{\begin{array}{c} u_{i}(t)\\ \forall (t,\ i)\ \in \ ([t_{min},t_{max}],\ \{1,\ldots ,N\}) \end{array}}&\quad \sum _{i\in \{1,\ldots ,N\}}^{} \left( \varUpsilon _i \left( \chi _i(t_{f,i})\right) + \int _{t_{0,i}}^{t_{f,i}} \varTheta _{i} \left( \chi _{i}(t),u_{i}(t),t \right) dt \right) , \end{aligned}$$
(2a)
$$\begin{aligned} \mathrm {s.t.}&\quad \sum _{i\in \{1,\ldots ,N\}}^{} u_i(t) \le u_{shared,max}(t), \quad t \in [t_{min},t_{max}], \end{aligned}$$
(2b)
$$\begin{aligned}&\quad \dot{\chi }_{i}(t) = F_{i}\left( \chi _{i}(t),u_{i}(t),t \right) , \nonumber \\&\quad \quad \quad t \in [t_{0,i},t_{f,i}],\ \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(2c)
$$\begin{aligned}&\quad \chi _{i}(t_{0,i})=\chi _{0,i}, \quad \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(2d)
$$\begin{aligned}&\quad P_{i}\left( \chi _{i}(t),u_{i}(t),t \right) \le 0, \nonumber \\&\quad \quad \quad t \in [t_{0,i},t_{f,i}],\ \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(2e)
$$\begin{aligned}&\quad T_{i}\left( \chi _{i}(t_{f,i}) \right) \le 0, \quad \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(2f)
$$\begin{aligned}&\quad u_{i}(t) = 0, \quad t \notin [t_{0,i},t_{f,i}], \quad \forall i \in \{1,\ldots ,N\}. \end{aligned}$$
(2g)
The considered time interval is given by \(t_{min}=\min \left\{ t_{0,1},\ldots ,t_{0,N} \right\}\) and \(t_{max}=\max \left\{ t_{f,1},\ldots ,t_{f,N} \right\}\). The objective is to minimize the sum of the individual objectives. The variables \(\chi _{i}\) and \(u_{i}\) belong to sub-system i exclusively, can only be manipulated by the respective sub-system, and, except for coupling via the overarching constraints Eq. 2b, have no impact on the other sub-systems. Due to the different starting and final times of the trajectories, when the trajectory of sub-system i is not active, its use of the resource is fixed at 0 via Eq. 2g.
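The effect of Eq. 2g can be sketched numerically: inputs are zero-padded outside each sub-system's active window so that the shared-resource sum in Eq. 2b is well defined at every grid point. All numbers below (grid size, windows, input levels, resource cap) are illustrative assumptions, not from the paper.

```python
import numpy as np

# Three hypothetical sub-systems with different active windows on a
# common grid p = 0..9; outside its window each input is fixed to
# zero (Eq. 2g), so the shared-resource sum (Eq. 2b) is defined
# at every grid point.
P = 10                                         # number of grid points
windows = {1: (0, 6), 2: (3, 10), 3: (2, 8)}   # (N_{i,0}, N_{i,f})
u = {}
for i, (n0, nf) in windows.items():
    ui = np.zeros(P)                           # zero-padded full horizon
    ui[n0:nf] = 0.4                            # constant input while active
    u[i] = ui

total = sum(u.values())                        # left-hand side of Eq. 2b
u_shared_max = np.full(P, 1.5)                 # assumed resource cap
feasible = np.all(total <= u_shared_max)
```

At grid points where only one sub-system is active, the sum reduces to that sub-system's input alone.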
Numerical solution methods for trajectory optimization
The problems given in Eqs. 1 and 2 can be solved using different approaches: direct optimization methods, methods based on Pontryagin’s minimum principle, and methods based on the Hamilton–Jacobi–Bellman equations (Bellman 1957; Bertsekas 1995; Pontryagin 2018; von Stryk and Bulirsch 1992). Depending on the selected method, the level of discretization can be chosen: all quantities can be considered infinite-dimensional in time, only the inputs can be discretized, or, in addition to the inputs, the states can also be fully discretized. If the inputs are discretized, they are usually considered to be piece-wise constant or piece-wise linear within the discretization elements. An overview of the solution methods as well as the discretization levels is given in Betts (1998) and Srinivasan et al. (2003).
While there are other efficient methods as well, e.g., parsimonious input parametrization (Rodrigues and Bonvin 2019), direct methods are used here, since they are best suited to handle the overarching constraints Eq. 2b and to obtain reliable numerical solutions (Srinivasan et al. 2003). When the inputs are discretized into equidistant intervals of duration \(\varDelta t\), there are three options to solve such a problem using a direct method: The first option is control vector parametrization, where the states remain continuously defined for every point in time and are determined by integration; the degrees of freedom for the optimization are the values of the inputs. The second is the simultaneous approach, where, in addition to the inputs, the states are also fully discretized, such that a large sparse non-linear program (NLP) results (Biegler 2007). Since the states are also degrees of freedom, the resulting trajectory satisfies the model equations only once the NLP has been solved. On the other hand, this method often proves to be more robust. The third option is multiple shooting, in which the time horizon is divided into several intervals and control vector parametrization is used within each interval to determine the solution; matching of the states at the boundaries of the intervals is enforced by additional boundary conditions (Bock and Plitt 1985). In this contribution, control vector parametrization is applied and a constant-stepsize 4th-order Runge-Kutta method is used for integration, because this renders all derivatives, of the objective as well as of the constraints, dependent only on the inputs.
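The combination of control vector parametrization with fixed-step RK4 can be sketched as follows: the piece-wise constant inputs are the only degrees of freedom, and the states on the grid are recovered by forward integration, so every derivative depends on the inputs alone. The model and data are illustrative, not from the paper.

```python
import numpy as np

def rk4_step(f, x, u, t, dt):
    """One classical 4th-order Runge-Kutta step with input u held constant."""
    k1 = f(x, u, t)
    k2 = f(x + 0.5 * dt * k1, u, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, u, t + 0.5 * dt)
    k4 = f(x + dt * k3, u, t + dt)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(f, x0, u_seq, t0, dt):
    """States on the grid as a function of the input sequence alone."""
    xs = [x0]
    t = t0
    for u in u_seq:                    # u is constant within each interval
        xs.append(rk4_step(f, xs[-1], u, t, dt))
        t += dt
    return np.array(xs)

# Example model: dx/dt = u, with constant input u = 1 over 10 intervals
f = lambda x, u, t: u
traj = simulate(f, np.array([0.0]), [1.0] * 10, 0.0, 0.1)
```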
Discretized problem formulation
The discretization of the problem in Eq. 1 can be done as described in the previous subsection. For the problem formulation with multiple trajectories and shared inputs, however, synchronization of time between the different trajectories needs to be ensured: the overarching constraints can be enforced at every point in time only if the starting and ending times lie on the same time grid as the discretization of the inputs. The starting and ending times are thus expressed as multiples of the shared minimum discretization duration \(\varDelta t\) via the following relationships:
$$\begin{aligned} t_{0,i}&= N_{i,0} \ \varDelta t, \end{aligned}$$
(3)
$$\begin{aligned} t_{f,i}&= N_{i,f} \ \varDelta t. \end{aligned}$$
(4)
For each sub-system i, the sets of points \(\varPsi _{i} = \{N_{i,0} ,\ldots , N_{i,f}-1\}\) are defined as counterparts to the continuous time intervals \([t_{0,i},t_{f,i}]\).
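The mapping from continuous start and end times to grid indices and the index sets \(\varPsi _{i}\) can be sketched as follows; the times are assumed to be exact multiples of the common step \(\varDelta t\), and all values are illustrative.

```python
# Map sub-system start/end times onto the shared grid (Eqs. 3-4)
# and build the index sets Psi_i of active grid points.
dt = 0.5
t0 = {1: 0.0, 2: 1.5}                     # t_{0,i}
tf = {1: 3.0, 2: 4.0}                     # t_{f,i}

N0 = {i: round(t0[i] / dt) for i in t0}   # N_{i,0}
Nf = {i: round(tf[i] / dt) for i in tf}   # N_{i,f}
Psi = {i: set(range(N0[i], Nf[i])) for i in N0}

# Bounds of the shared grid over all sub-systems:
N_min = min(N0.values())
N_max = max(Nf.values())
```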
Similar to the continuous case, \(N_{min}\) and \(N_{max}\) are defined as the minimum and maximum over all sub-systems i of \(N_{i,0}\) and \(N_{i,f}\). This leads to the following problem formulation:
$$\begin{aligned} \min _{ \begin{array}{c} \chi _{i,p},\ u_{i,p},\\ \forall (p,\ i) \ \in \ (\{N_{min},\ldots ,N_{max}\},\ \{1,\ldots ,N\}) \end{array}}&\quad \sum _{i\in \{1,\ldots ,N\}}^{} \left( \varUpsilon _i(\chi _{i,N_{i,f}}) + \sum _{p\in \varPsi _{i}}^{} \varTheta _{i} \left( \chi _{i,p},u_{i,p},\varDelta t,p\right) \right) , \end{aligned}$$
(5a)
$$\begin{aligned} \mathrm {s.t.}&\quad \sum _{i \in \{1,\ldots ,N\}}^{} u_{i,p} \le u_{shared,max,p}, \nonumber \\&\quad \quad \quad \quad \forall p\in \{N_{min},\ldots ,N_{max}\}, \end{aligned}$$
(5b)
$$\begin{aligned}&\quad \tilde{F}_{i}\left( \chi _{i,p},\chi _{i,p+1},u_{i,p},\varDelta t,p\right) = 0, \nonumber \\&\quad \quad \quad \quad \forall p\in \varPsi _{i}, i \in \{1,\ldots ,N\}, \end{aligned}$$
(5c)
$$\begin{aligned}&\quad \chi _{i,N_{i,0}}=\chi _{0,i}, \quad \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(5d)
$$\begin{aligned}&\quad P_{i}\left( \chi _{i,p},u_{i,p},\varDelta t,p\right) \le 0, \nonumber \\&\quad \quad \quad \quad \forall p\in \varPsi _{i}, i \in \{1,\ldots ,N\}, \end{aligned}$$
(5e)
$$\begin{aligned}&\quad T_{i}\left( \chi _{i,N_{i,f}} \right) \le 0, \quad \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(5f)
$$\begin{aligned}&\quad u_{i,p} = 0, \quad \forall p\notin \varPsi _{i}, i \in \{1,\ldots ,N\}. \end{aligned}$$
(5g)
In this formulation, p is the discrete time index. Note that in this case the models \(\tilde{F}_{i}\) are defined implicitly, connecting the old and the new states. This is used to express numerical integration or full discretization. Furthermore, it should be noted that the path constraints \(P_{i}\) are only defined at the grid points.
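One possible instance of such an implicit model function \(\tilde{F}_{i}\) is an implicit-Euler discretization of the continuous dynamics, where the residual couples the old and the new state and vanishes exactly when the step is consistent. The model below is illustrative, not from the paper.

```python
def F(x, u):
    """Example continuous dynamics dx/dt = -x + u."""
    return -x + u

def F_tilde(x_p, x_next, u_p, dt):
    """Implicit Euler residual: x_{p+1} - x_p - dt * F(x_{p+1}, u_p) = 0."""
    return x_next - x_p - dt * F(x_next, u_p)

# Solve the scalar step in closed form and check the residual vanishes:
dt, x_p, u_p = 0.1, 1.0, 0.5
x_next = (x_p + dt * u_p) / (1.0 + dt)
residual = F_tilde(x_p, x_next, u_p, dt)
```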
The resulting optimization problem is non-convex; solving such problems is in general \(\mathscr {NP}\)-hard, and they can have multiple local minima (Esposito and Floudas 2000; Papamichail and Adjiman 2002).
Problem formulation in the standard form of distributed optimization
The problem of optimizing trajectories with shared resources across system boundaries from Eq. 5 can be written in the standard form of a general sharing problem, cf. Boyd et al. (2010),
$$\begin{aligned} \min _{x_{i},\ \forall i \in \{1,\ldots ,N\}}&\quad \sum _{i \in \{1,\ldots ,N\}}^{} f_i(x_i), \end{aligned}$$
(6a)
$$\begin{aligned} \mathrm {s.t.}&\quad \sum _{i \in \{1,\ldots ,N\}}^{} A_i x_i \le b,\end{aligned}$$
(6b)
$$\begin{aligned}&\quad g_i(x_i)\le 0, \quad i \in \{1,\ldots ,N\}. \end{aligned}$$
(6c)
The variables \(x_{i}\) comprise the inputs and the states of the sub-systems, \(dim (x_{i}) = dim(\chi _{i})+dim(u_{i})\). The inputs of the trajectory optimization problem described in the previous subsection are given by the linear mapping \(u_{i} = A_i x_i\), with \(u_{i} = [u_{i,N_{i,0}},u_{i,N_{i,0}+1},\ldots ,u_{i,N_{i,f}-1}]^{T}\). The state variables are given by \(\chi _{i} = B_i x_i\), with \(\chi _{i} = [\chi _{i,N_{i,0}},\chi _{i,N_{i,0}+1},\ldots ,\chi _{i,N_{i,f}}]^{T}\). The initial conditions, model equations, and system-specific constraints are described by the \(n_{g_{i}}\)-dimensional inequality constraint function \(g_{i}\). The dimension of the overarching constraints is m, i.e., \(dim(b)=dim(A_{i} x_{i})=dim(u_{shared,max})=m\).
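When \(x_{i}\) stacks the inputs and states, \(A_i\) and \(B_i\) are plain selection matrices; zero rows in \(A_i\) pad the inputs to the full shared horizon so that \(dim(A_i x_i) = m\) for every sub-system. The dimensions and values below are illustrative assumptions.

```python
import numpy as np

n_u, n_chi = 3, 4                      # active inputs / states of sub-system i
m = 5                                  # full shared-constraint horizon
offset = 1                             # sub-system active from grid point 1

# x_i stacks inputs first, then states (one possible ordering):
x_i = np.concatenate([np.array([0.2, 0.3, 0.1]),       # inputs u_i
                      np.array([1.0, 1.1, 1.2, 1.3])]) # states chi_i

# A_i places each input on its grid point; other rows stay zero
A_i = np.zeros((m, n_u + n_chi))
for k in range(n_u):
    A_i[offset + k, k] = 1.0

# B_i selects the state block of x_i
B_i = np.hstack([np.zeros((n_chi, n_u)), np.eye(n_chi)])

u_full = A_i @ x_i                     # zero outside the active window
chi_i = B_i @ x_i
```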
Necessary conditions of optimality for distributed optimization
For this problem in standard form, the Lagrangian is given by (Bertsekas 1999):
$$\begin{aligned} \mathscr {L}(x,\lambda ,\mu )&:= \sum _{i \in \{1,\ldots ,N\}}^{} \left( f_i(x_i) \right) + \lambda ^T \left( \sum _{i \in \{1,\ldots ,N\}}^{} A_i x_i - b \right) + \sum _{i \in \{1,\ldots ,N\}}^{} \left( \mu _i^{T} (g_i(x_i)) \right) ,\nonumber \\&=\sum _{i \in \{1,\ldots ,N\}}^{} \left( f_i(x_i) + \lambda ^T A_i x_i - \frac{1}{N} \lambda ^T b + \mu _i^{T} (g_i(x_i)) \right) , \nonumber \\&=\sum _{i \in \{1,\ldots ,N\}}^{} \mathscr {L}_i(x_i,\lambda ,\mu _{i}), \end{aligned}$$
(7)
where \(\lambda\) are the Lagrange multipliers corresponding to the overarching constraints in Eq. 6b and \(\mu _{i}\) are the Lagrange multipliers for the sub-system specific constraints in Eq. 6c. Using the Lagrangian, the first-order necessary conditions of optimality (Karush-Kuhn-Tucker conditions) can be expressed as:
$$\begin{aligned} \nabla _{x_{i}} \mathscr {L}_{i}(x_{i},\lambda ,\mu _{i})&= 0,\quad \forall i \in \{1,\ldots ,N\}, \end{aligned}$$
(8a)
$$\begin{aligned} g_i(x_i)&\le 0,\quad \forall i \in \{1,\ldots ,N\},\end{aligned}$$
(8b)
$$\begin{aligned} \mu _i&\ge 0,\quad \forall i \in \{1,\ldots ,N\},\end{aligned}$$
(8c)
$$\begin{aligned} \sum _{i=1}^{N} A_i x_i - b&\le 0, \end{aligned}$$
(8d)
$$\begin{aligned} \lambda&\ge 0. \end{aligned}$$
(8e)
The interesting property of these conditions is that Eqs. 8a–8c can be evaluated independently for each sub-system i and only Eqs. 8d and 8e require coordination between the different sub-systems.
Distributed solution algorithms based on the dual problem
While Eqs. 8 can be solved monolithically using state-of-the-art solvers, in this contribution methods that exploit the distributed structure of the problem are investigated.
We focus on hierarchical methods, in which all sub-system-specific decisions are taken in a distributed fashion and the satisfaction of the overarching constraints Eqs. 8d and 8e is enforced only on the coordination layer. These methods are also known as dual methods, as they make use of the dual variables or Lagrange multipliers. In our case, the dual variables of interest are the ones corresponding to the overarching constraints, i.e., \(\lambda\).
Using the solution to Eqs. 8a–8c, written as \(\inf _{x_{i},\ \mu _{i}\ge 0} \mathscr {L}_i(x_i,\lambda ,\mu _{i})\), the dual function can then be defined:
$$\begin{aligned} d(\lambda ) = \inf _{x ,\ \mu \ge 0} \mathscr {L}(x,\lambda ,\mu ) = \sum _{i \in \{1,\ldots ,N\}}^{} \inf _{x_{i} ,\ \mu _{i}\ge 0} \mathscr {L}_i(x_i,\lambda ,\mu _{i}). \end{aligned}$$
(9)
Using this dual function of \(\lambda\), the optimality condition can be expressed as finding the maximum of \(d(\lambda )\) with \(\lambda \ge 0\), which is called the dual problem:
$$\begin{aligned} \max _{\lambda \ge 0} d(\lambda ). \end{aligned}$$
(10)
Due to the infimum in Eq. 9, the dual function is in general not known explicitly. However, according to Danskin’s theorem (Bertsekas 1999), a sub-gradient can be determined at \(\lambda\) via:
$$\begin{aligned} \partial d(\lambda ) = \partial \sum _{i \in \{1,\ldots ,N\}}^{} \inf _{x_{i},\ \mu _{i}\ge 0} \mathscr {L}_i(x_i,\lambda ,\mu _{i}) = \sum _{i \in \{1,\ldots ,N\}}^{} A_i x_i(\lambda ) - b. \end{aligned}$$
(11)
In this contribution, we compare iterative methods for the maximization of the dual which do not require explicit knowledge of the value of the dual function and thus of the different individual objectives. As measures of convergence, we define two criteria. The primal feasibility, \(\varPhi _{Primal}\), is a measure of the satisfaction of the overarching constraints of the original problem. At the same time, due to Eq. 11 and the fact that the dual is always concave, it is also a measure of the vanishing of the gradient (Boyd and Vandenberghe 2004):
$$\begin{aligned} \varPhi _{Primal}[j]&= \max \left\{ 0, \left( \sum _{i \in \{1,\ldots ,N\}}^{} A_i x_i(\lambda ) - b\right) [j] \right\} , \quad j \in \{1,\ldots ,m\}. \end{aligned}$$
(12)
Here we highlight that \(x_i\) is a function of \(\lambda\). If all elements of \(\varPhi _{Primal}\) are equal to 0, the overarching constraints are satisfied. In addition to the primal feasibility, which measures the satisfaction of the constraints, a measure of convergence is required in order to prevent termination when a solution is primal feasible but not yet optimal. The dual feasibility, \(\varPhi _{Dual}\), can be interpreted as the satisfaction of the optimality criterion for the dual problem, i.e., the gradient approaching the 0 vector. Thus we define the dual feasibility as a finite difference approximation of the gradient of the dual function:
$$\begin{aligned} \varPhi _{Dual}[j]&=\frac{ \left| \lambda ^{+}[j] - \lambda [j] \right| }{ \alpha [j] }. \end{aligned}$$
(13)
Here, \(\lambda ^{+}\) denotes the Lagrange multipliers after the update step and \(\alpha [j]\) the corresponding stepsize. This second feasibility criterion is a measure of how far the solution can deviate from active overarching constraints and is essential for inequality-constrained problems, since primal feasibility can always be achieved by sufficiently large Lagrange multipliers. It ensures that the solution not only satisfies \(\sum _{i=1}^{N} A_i x_i^{(k)} - b \le \epsilon\) but also \(\sum _{i=1}^{N} A_i x_i^{(k)}[j] - b[j] \ge - \epsilon\) for all active overarching constraints j. Only if a solution is both primal and dual feasible has a saddle point of the Lagrangian been found that satisfies the conditions of optimality.
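The two convergence measures can be sketched directly from their definitions: the clipped violation of the overarching constraints (Eq. 12) and the finite-difference measure of the dual gradient via the multiplier change (Eq. 13). The numerical values below are illustrative.

```python
import numpy as np

def primal_feasibility(Ax_sum, b):
    """Eq. 12: element-wise positive part of the constraint violation."""
    return np.maximum(0.0, Ax_sum - b)

def dual_feasibility(lam_new, lam_old, alpha):
    """Eq. 13: |lambda^+ - lambda| / alpha, element-wise."""
    return np.abs(lam_new - lam_old) / alpha

Ax_sum = np.array([0.9, 1.2])          # sum_i A_i x_i(lambda), illustrative
b = np.array([1.0, 1.0])
phi_p = primal_feasibility(Ax_sum, b)  # only the second constraint is violated

lam_old = np.array([0.5, 0.8])
lam_new = np.array([0.5, 0.9])
alpha = np.array([0.1, 0.1])
phi_d = dual_feasibility(lam_new, lam_old, alpha)
```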
Since numerical optimization is only possible up to a certain numerical error, we define \(x^{*}\), \(\lambda ^{*}\), and \(\mu ^{*}\) to be optimal if the following holds:
$$\begin{aligned} \{x^{*},\lambda ^{*}, \mu ^{*}\} = \{ x,\lambda , \mu \, \mid \, \left\Vert \varPhi _{Primal}\right\Vert _{\infty } \le \epsilon _{Feas,Primal} \wedge \left\Vert \varPhi _{Dual}\right\Vert _{\infty } \le \epsilon _{Feas,Dual} \}, \end{aligned}$$
where \(\epsilon _{Feas,Primal}\) and \(\epsilon _{Feas,Dual}\) are the desired numerical tolerances and \(x_i\) minimizes the objective of sub-system i.
Sub-gradient method
The simplest method for the maximization of the dual is to follow the direction of steepest ascent, i.e., to use the direction of the sub-gradient (Shor 2012). Here the challenge is the selection of a suitable stepsize. Since the dual function may be non-smooth, depending on the solution structure of the different trajectories, the stepsize selection criteria for non-smooth optimization derived in Nesterov (2004, p. 142) should be satisfied:
$$\begin{aligned}&\alpha ^k > 0, \end{aligned}$$
(14a)
$$\begin{aligned}&\alpha ^k \rightarrow 0, \end{aligned}$$
(14b)
$$\begin{aligned}&\sum _{k=0}^{\infty } \alpha ^k = \infty . \end{aligned}$$
(14c)
Since the impact of small changes in the dual variables is not the same for all variables \(\lambda [j],\ j\in \{1,\ldots ,m\}\), a scheme is needed that adapts the stepsizes individually.
While there exist methods to evaluate the optimal stepsize at the current point using the Lipschitz constant, as explained in Bertsekas and Tsitsiklis (1989) and Kozma et al. (2014), determining the constant is difficult since the dual cannot be evaluated explicitly. Furthermore, since this constant is not valid globally, the stepsizes either have to be adapted during the maximization of the dual or be chosen conservatively enough to be valid throughout the domain of the dual function.
The sub-gradient method can be considered as alternating between local optimization of the sub-systems and adaptation of the Lagrange multipliers on the coordination layer. We propose to decrease the stepsize of a specific overarching constraint every time the sign of the corresponding element of \(\sum _{i=1}^{N} A_i x_i(\lambda ) - b\) changes, which corresponds to the inequality constraint becoming active or inactive, respectively. Specifically, the following adaptation of \(\alpha\) is used (cf. Algorithm 1, line 8):
$$\begin{aligned} \alpha ^{(k)}[j] = {\left\{ \begin{array}{ll} \gamma _{Decrease} \ \alpha ^{(k-1)}[j] , &{} \text {if}\ \left( \sum _{i=1}^{N} A_i x_i^{(k)} - b \right) [j] \left( \sum _{i=1}^{N} A_i x_i^{(k-1)} - b \right) [j] \le 0, \\ \alpha ^{(k-1)}[j] , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(15)
If one of the stepsizes is too large and it influences the Lagrange multiplier (note that if a constraint is inactive, the multiplier is fixed at 0), this inevitably leads to a diminishing of the stepsize in this direction. The factor \(\beta\) in Algorithm 1 prevents the stepsize from decreasing too fast due to oscillating responses, while keeping the stepsize as large as possible if consecutive steps have the same direction.
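The coordination loop with the sign-change stepsize adaptation of Eq. 15 can be sketched on a toy sharing problem: two sub-systems with quadratic objectives \(f_i(u_i) = (u_i - d_i)^2\) and a scalar shared cap. The sub-problems, data, and the simple closed-form local solve are illustrative assumptions, not the paper's trajectory problems.

```python
import numpy as np

d = np.array([2.0, 3.0])               # local targets, f_i = (u_i - d_i)^2
b = 3.0                                # shared resource cap
gamma_decrease = 0.5                   # shrink factor of Eq. 15

def local_solve(lam):
    # argmin_u (u - d_i)^2 + lam * u  =>  u_i = d_i - lam / 2
    return d - lam / 2.0

lam, alpha = 0.0, 1.0
g_prev = None
for k in range(200):
    u = local_solve(lam)               # distributed sub-problem solutions
    g = u.sum() - b                    # sub-gradient of the dual at lam
    if g_prev is not None and g * g_prev <= 0.0:
        alpha *= gamma_decrease        # Eq. 15: sign change -> shrink step
    lam = max(0.0, lam + alpha * g)    # projected ascent on the dual
    g_prev = g
```

For this toy problem the optimal multiplier is \(\lambda ^{*} = d_1 + d_2 - b = 2\), at which the shared constraint is exactly active.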
Other possible selections of \(\alpha\) can be found in Nesterov (2004). Regardless of the selection of the stepsize, the provable convergence rate for strictly convex problems is at best \(\mathscr {O}(1/k)\).
ADMM
A more robust method is the alternating direction method of multipliers (ADMM) (Boyd et al. 2010). It uses the augmented Lagrangian:
$$\begin{aligned} \mathscr {L}_{\rho ,i}(x_i,\lambda ,\mu _{i},z_i) = \mathscr {L}_i(x_i,\lambda ,\mu _{i}) + \frac{\rho }{2} \left\Vert A_{i} x_{i} - z_{i} \right\Vert _{2}^{2}. \end{aligned}$$
(16)
In addition to the linear penalty term, the deviation from a feasible use of the shared resources, \(z_{i}\), is penalized quadratically. Essentially, the penalty terms convexify the problems around points that satisfy the overarching constraints, which accelerates the initial convergence. The variables \(z_{i}\) are determined on the coordinator level and are a projection of the current responses of the different sub-systems onto the feasible region. The stepsize \(\alpha\) for the update of the Lagrange multipliers is \(\frac{\rho }{N}\) in the case of ADMM. However, since the response of the systems is determined not only by the prices but also by the \(z_{i}\), the dual feasibility is redefined as:
$$\begin{aligned} \varPhi _{Dual}[j] = \rho \left( \sum _{i=1}^{N} \left| A_i x_i^{(k)}[j] - z_{i}^{(k)}[j]\right| \right) . \end{aligned}$$
(17)
The convergence rate for convex problems is also \(\mathscr {O}(1/k)\) (Hong and Luo 2017; Kozma et al. 2014).
In Algorithm 2 the unscaled version of ADMM for equality-constrained sharing problems is shown, cf. Boyd et al. (2010, p. 59). When adapting the algorithm to the inequality-constrained problem considered here, the dual variables need to satisfy \(\lambda \ge 0\) and the z-variables need to be adjusted depending on whether the overarching constraints are active or not. When they are active, the same update as in Algorithm 2 applies; however, if the constraints are not active, the references \(z_{i}\) are based on the previous solutions of the sub-systems. This penalty is required since otherwise only some variables would be penalized quadratically, possibly leading to cyclic solution changes in subsequent iterations. We present ADMM including a new and efficient update step to compute the z-variables for inequality-constrained problems in Algorithm 3. To improve convergence, different and variable penalty parameters \(\rho\) are used for each constraint. The penalty parameters are adapted in step 10 of Algorithm 3 using the scheme in Wang and Liao (2001):
$$\begin{aligned} \rho ^{(k)}[j] = {\left\{ \begin{array}{ll} \tau _{Increase} \ \rho ^{(k-1)}[j] , &{} \text {if}\ \varPhi _{Primal}^{(k)}[j] \ge \delta \ \varPhi _{Dual}^{(k)}[j], \\ \tau _{Decrease} \ \rho ^{(k-1)}[j] , &{} \text {if} \ \delta \ \varPhi _{Primal}^{(k)}[j] \le \varPhi _{Dual}^{(k)}[j], \\ \rho ^{(k-1)}[j] , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(19)
The factor \(\delta\) is the maximally allowed imbalance between primal and dual feasibility before the penalty parameter is adapted to restore the proportion. The parameters \(\tau _{Increase} > 1\) and \(\tau _{Decrease} < 1\) adjust the penalty parameter \(\rho\) if necessary. The update ensures that primal and dual feasibility are kept in balance or, in other terms, that far away from the optimum large changes in \(\lambda\) are made. Close to the optimum, the changes are reduced and, additionally, the quadratic penalization is decreased to ensure that the solution without the quadratic penalty also satisfies the overarching constraints.
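The residual-balancing update of Eq. 19 can be sketched as an element-wise rule: the penalty parameter of a constraint grows when its primal residual dominates, shrinks when its dual residual dominates, and is otherwise unchanged. The parameter values and residuals below are illustrative.

```python
import numpy as np

tau_increase, tau_decrease, delta = 2.0, 0.5, 10.0   # illustrative values

def update_rho(rho, phi_primal, phi_dual):
    """Per-constraint penalty adaptation following Eq. 19."""
    rho_new = rho.copy()
    grow = phi_primal >= delta * phi_dual       # primal residual dominates
    shrink = delta * phi_primal <= phi_dual     # dual residual dominates
    rho_new[grow] *= tau_increase
    rho_new[shrink] *= tau_decrease
    return rho_new

rho = np.array([1.0, 1.0, 1.0])
phi_p = np.array([5.0, 0.01, 0.2])
phi_d = np.array([0.1, 1.0, 0.2])
rho_next = update_rho(rho, phi_p, phi_d)
```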
ALADIN
Another method that uses the augmented Lagrangian is the augmented Lagrangian based algorithm for distributed non-convex optimization (ALADIN) introduced in Houska et al. (2016). In contrast to ADMM, here the deviations of all variables of sub-system i from the reference \(z_{i}\) are penalized:
$$\begin{aligned} \mathscr {L}_{\rho ,i}(x_i,\lambda ,\mu _{i},z_i) = \mathscr {L}_i(x_i,\lambda ,\mu _{i}) + \frac{\rho }{2} \left\Vert x_{i} - z_{i} \right\Vert _{2}^{2}. \end{aligned}$$
(20)
In addition to reporting the consumption of the shared resources, the sub-systems also report the derivatives of their objective and of the active constraints with respect to the local decision variables to the coordination layer in each iteration k. The Hessian and gradient approximations are then calculated using the constraint Jacobian information \(\mathscr {C}_{i}^{(k)}\) (\(\mathscr {C}_{i}^{Active(k)}\)) of the (active) constraints from Eq. 6c:
$$\begin{aligned} \mathscr {C}_{i}^{Active(k)}[l,:] = {\left\{ \begin{array}{ll} \mathscr {C}_{i}^{(k)}[l,:], &{}\, \text {if } g_{i}(x_i^{(k)})[l]=0 ,\\ 0, &{}\, \text {otherwise,} \end{array}\right. } \quad \text {for } l\in \{1,\ldots ,n_{g_{i}}\}, \forall i\in \{1,\ldots ,N\}, \end{aligned}$$
(21)
where \(\mathscr {C}_{i}^{(k)} = \nabla _{x_{i}} g_{i} (x_{i})|_{x_{i}=x_{i}^{(k)}}\). The modified gradient and Hessian approximation are:
$$\begin{aligned} \mathscr {H}_{i}^{(k)} \approx&\left. \nabla _{x_{i}}^{2} \left( f_{i} (x_{i}) + \mu _{i}^{T} g_{i}(x_i) \right) \right| _{x_{i} = x_{i}^{(k)},\ \mu _{i} = \mu _{i}^{(k)}} . \end{aligned}$$
(23)
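The masking of Eq. 21 keeps only the Jacobian rows of active constraints and zeroes the rest; a small numerical tolerance is used here to decide activity. The tolerance and data are illustrative.

```python
import numpy as np

def active_jacobian(C, g_val, tol=1e-8):
    """Eq. 21: zero the rows of C belonging to inactive constraints."""
    C_active = C.copy()
    inactive = g_val < -tol            # g_i(x_i)[l] < 0 -> constraint inactive
    C_active[inactive, :] = 0.0
    return C_active

C = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])             # illustrative constraint Jacobian
g_val = np.array([0.0, -0.3, 0.0])     # constraints 1 and 3 are active
C_act = active_jacobian(C, g_val)
```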
With this information from the sub-systems, instead of performing straight projections onto the feasible set, prices and reference values (\(z_{i}\)) are determined via a quadratic program that approximates the objective functions and active constraints of the sub-systems. The algorithm can be seen as a combination of sequential quadratic programming (SQP) and ADMM. The benefit of using more information from the different sub-systems on the coordinator level is in general a faster convergence. In Houska et al. (2016) it is shown that in theory super-linear to quadratic convergence rates are possible.
The update of the Lagrange multipliers \(\lambda\) is done differently compared to the previous two methods: instead of sub-gradient information, the Lagrange multipliers \(\lambda _{QP}\) of the overarching constraints in the QP are used in the update step. The algorithm for equality-constrained problems as proposed in Houska et al. (2016) is given in Algorithm 4. The dual feasibility is defined as follows:
$$\begin{aligned} \varPhi _{Dual}[j] = \rho \left( \sum _{i=1}^{N} \left| x_i^{(k)}[j] - z_{i}^{(k)}[j]\right| \right) . \end{aligned}$$
(25)
The parameters \(\alpha _{i} \in [0,1]\) can be used to adjust the behaviour of the algorithm to frequent changes of the active set. Houska et al. (2016) provide an additional scheme that utilizes the objective values of the sub-systems, based on which these parameters can be adapted in each iteration to guarantee convergence to a local minimum. This scheme is not considered in this work, because it essentially carries out a monolithic optimization to determine the parameters.
Since trajectory optimization problems with overarching inequality constraints are considered, Eq. 24b is changed to an inequality constraint and the algorithm is modified accordingly. The solution of trajectory optimization problems consists of different arcs, which correspond to active constraints. Since these constraints ultimately act on the inputs, Eq. 24c fixes all \(\varDelta z_{i}\) for the inputs in the QP. Therefore, the equality constraint Eq. 24c is changed to an inequality, such that the variables \(\varDelta z_{i}\) remain degrees of freedom. If the reference variables \(z_{i}\) resulting from the QP are infeasible, the Hessian of the sub-systems may become indefinite, which occurs mainly at the beginning of the scheme, when the QP approximations of the active constraints do not yet reflect the actual solution. Positive definiteness of \(\mathscr {H}_{i}^{(k)}\) is required for ALADIN (Houska et al. 2016), and we therefore propose the following strategy to enforce this condition: \(\kappa I\), where I is the identity matrix, is added to the Hessian, increasing the elements on its main diagonal by \(\kappa\). Since high values of \(\kappa\) penalize large changes in \(\varDelta z_{i}\), this value is decreased with the number of iterations until \(\mathscr {H}_{i}^{(k)}\) becomes indefinite. Then \(\kappa\) remains fixed at this value, or is increased again in subsequent iterations if the Hessian is still indefinite.
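The regularization step can be sketched as adding \(\kappa I\) and growing \(\kappa\) until the smallest eigenvalue is positive; the growth factor, starting value, and example Hessian are illustrative assumptions, and the subsequent reduction of \(\kappa\) over the iterations is omitted here.

```python
import numpy as np

def regularize(H, kappa0=1.0, growth=2.0, max_tries=60):
    """Add kappa * I until H + kappa * I is positive definite."""
    kappa = kappa0
    for _ in range(max_tries):
        H_reg = H + kappa * np.eye(H.shape[0])
        if np.linalg.eigvalsh(H_reg).min() > 0.0:
            return H_reg, kappa
        kappa *= growth                # grow kappa until positive definite
    raise RuntimeError("could not regularize Hessian")

H = np.array([[1.0, 0.0],
              [0.0, -2.5]])            # indefinite example Hessian
H_reg, kappa = regularize(H)
```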
Another challenge in trajectory optimization is that small changes in the Lagrange multipliers \(\lambda\) of the overarching constraints can change the active set of the sub-systems. In order to reach the correct active set, the stepsize of the algorithm may have to become infinitesimally small. Thus, the \(\alpha _{i}\) are also adapted in each iteration. Eq. 24b is modified to account for smaller values of \(\alpha _{1}\), such that the new reference variables \(z_{i}\) are always feasible according to the coordinator-level QP. In Algorithm 5, the different steps of ALADIN, adjusted to inequality-constrained problems, are given.
Other methods
There is a variety of other dual-based methods, for instance the ones introduced in Maxeiner and Engell (2020) or Wenzel et al. (2016), where, similar to the sub-gradient method, only the Lagrange multipliers and the usage of the shared resources are exchanged. However, these methods are designed for equality-constrained shared resource allocation problems.
Other methods, e.g. those presented in Kozma et al. (2014) and Nesterov (2004), use the objective values of the individual sub-systems. In this contribution, we focus on the presented methods, since they do not require knowledge of the objective values of the individual sub-systems. Hence, they can be applied to problems where, due to confidentiality, the profit of the sub-systems cannot be openly communicated.