Abstract
The physical and virtual connectivity of systems via flows of energy, material, information, etc., steadily increases. This paper deals with systems of subsystems that are connected by networks of shared resources that have to be balanced. For the optimal operation of the overall system, the couplings between the subsystems must be taken into account, and the overall optimum will usually deviate from the local optima of the subsystems. However, for reasons such as problem size, confidentiality, resilience to breakdowns, or generally when dealing with autonomous systems, monolithic optimization is often infeasible. In this contribution, iterative distributed optimization methods based on dual decomposition, in which the values of the objective functions of the different subsystems do not have to be shared, are investigated. We consider connected dynamic systems that share resources. This situation arises for continuous processes in transient conditions between different steady states and in inherently discontinuous processes, such as batch production processes. This problem is challenging since small changes during the iterations towards the satisfaction of the overarching constraints can lead to significant changes in the arc structures of the optimal solutions for the subsystems. Moreover, meeting endpoint constraints at free final times complicates the problem. We propose a solution strategy for coupled semibatch processes and compare different numerical approaches, namely the subgradient method, ADMM, and ALADIN. We show that convexification of the subsystems around feasible points increases the speed of convergence, while using second-order information does not necessarily do so. Since sharing of resources has an influence on whether trajectory-dependent terminal constraints can be satisfied, we propose a heuristic for the computation of free final times of the subsystems that allows the dynamic subprocesses to meet the constraints.
For the example of several semibatch reactors which are coupled via a bound on the total feed flow rate, we demonstrate that the distributed methods converge to (local) optima and highlight the strengths and the weaknesses of the different distributed optimization methods.
Introduction
The physical and virtual connectivity of systems steadily increases in order to increase throughput, performance, and safety (Engell et al. 2016). The resources that connect the different subsystems can be flows of energy, flows of material, flows of information, or more specific constraints, such as quantitative restrictions for hazardous substances, or certificates, e.g. a given amount of \(\mathrm {CO}_{2}\) certificates that cannot be exceeded (Rius-Sorolla et al. 2020).
Of special interest are systems where the resources are not only shared bilaterally but among several subsystems. Such systems arise in the process industry, where several units are connected via networks of energy and material (Jose and Ungar 2000; Wenzel et al. 2017), in power systems, where the economic dispatch between different electricity generation facilities has to be coordinated to satisfy the network demands (Wang et al. 1995; Zhang et al. 2013) or where demand response of small loads such as residential smart appliances is integrated into the network (Gatsis and Giannakis 2013; Safdarian et al. 2014), in information technology, for instance when coordinating bandwidth between different agents (Hasan et al. 2014; Koutsopoulos and Iosifidis 2010), or when coordinating different autonomous systems such as robots or vehicles to maintain connectivity constraints (Cortés et al. 2004; Galceran and Carreras 2013).
The challenge with such systems is that often a monolithic optimization is not possible. The optimization of a whole chemical site with different ownership of the plants, for instance, requires a limited flow of information to maintain the confidentiality of business data or technical information such as prices, production targets, capacities, efficiencies, etc. Another reason is that in many cases the autonomy and decision-making power should remain with the respective subsystems, which is not the case if a third party explicitly determines the operating conditions and the coordination of the resources between the subsystems. If the scale of the resulting optimization problem is large, transparency of the results can be limited, as root causes are difficult to trace across subsystem boundaries (Tang et al. 2018). Furthermore, the implementation of monolithic solutions for resource allocation requires the separation of the models from the subsystems, which constitutes an additional source of error if the models have to be maintained in different locations. Lastly, if the optimization is carried out for control purposes, solutions that reflect the modularity of the system and provide redundancy are often more desirable than a monolithic optimization (Camponogara et al. 2002; Cheng et al. 2007; Christofides et al. 2013; Farokhi et al. 2014; Maestre and Negenborn 2014; Scattolini 2009; Van Parys and Pipeleers 2017).
As a consequence, interconnected systems are often optimized on two levels: On the subsystem level, each subsystem tries to maximize its performance according to given boundary conditions, and on the system-wide or coordinator level, the optimal allocation of shared resources is performed (Mesarovic et al. 1970). Such bilevel problems can be solved via iterative distributed optimization approaches, i.e. primal-based and dual-based decomposition methods, which are compared for instance in Conejo et al. (2006) and Palomar and Chiang (2006, 2007). The most suitable methods to tackle the aforementioned challenges, in particular the confidentiality issue, are dual-based decomposition methods where the coordinator broadcasts information such as prices to the different subsystems, whereupon these respond with their estimated usage of the shared resources, and the coordinator iteratively adapts the prices until the resource balances are met.
While methods for the distributed optimization of steady state problems have been thoroughly investigated, dynamic systems have mostly been considered in the context of distributed model predictive control. Most of the work in this area considers the control of continuous processes with linear dynamics, cf. the survey in Negenborn and Maestre (2014). To the authors’ knowledge, little work has been done to compare different distributed optimization methods for coordinating resources among distributed dynamic systems.
The application that we discuss here is the dynamic resource allocation for coupled chemical reactors that are operated in semibatch mode, i.e., some substances are filled into the reactor at the start while others are dosed during the batch run. At the end of the batch run, the reaction mixture which contains the desired product is withdrawn. We focus on how to share a limited amount of a resource, in this case, the feed flow to the reactors between the semibatch reactors, in order to determine an optimal overall operation for given initial conditions and a given production schedule. This is a common challenge in the process industry when large quantities of products are produced in multiproduct multibatch plants and suitable production sequences have to be defined and executed (Nie et al. 2015).
Contribution
The goal of this contribution is to provide a distributed optimization strategy for semibatch processes with endpoint constraints that are subject to overarching constraints and to compare different algorithms for the distributed allocation of shared resources between dynamic systems. We restrict ourselves to dual-based methods that do not require the knowledge of the objective values of the subsystems. In addition to the subgradient method and ADMM, we apply the augmented Lagrangian based algorithm for distributed nonconvex optimization (ALADIN) and adapt it so that it can also handle overarching inequality constraints as they arise for systems that share finite amounts of resources.
The challenge in solving this type of problem is that significant changes in the arc structure, i.e. changes in the active sets of the constraints of the subsystems, occur for small changes in the dual variables during the iterations of the dual-based distributed optimization methods.
We present a heuristic approach for the solution of such resource allocation problems for dynamic systems with terminal constraints and free final times, such that the different subsystems meet the constraints in the presence of overarching resource constraints.
Notation
Subsystem related vectors \(x_{i}\in \mathbb {R}^{n_{i}},\ i\in 1,\ldots ,N\), are stacked into one large vector \(x = [x_{1}^{T},x_{2}^{T},\ldots ,x_{N}^{T} ]^{T} \in \mathbb {R}^{n}\) of variables on the system level. The subscript i is used as indicator for the different subsystems throughout the paper. The superscript (k) indicates the current iteration k. To index the jth element of a vector x, the notation x[j] is used.
The Euclidean norm of a vector x is indicated by \(\left\Vert x\right\Vert _{2}\). The infinity norm of a vector \(x \in \mathbb {R}^{m}\) is defined by \(\left\Vert x\right\Vert _{\infty } = \max \left\{ \left| x[1]\right| ,\ldots ,\left| x[m]\right| \right\}\), where \(\left| \cdot \right|\) is the absolute value of a scalar and the \(\max\) operator followed by \(\{\cdot \}\) indicates the largest element of the set.
Theoretical background
First, a general mathematical problem formulation for a single dynamic system is presented, which is then extended to several systems that share a common resource. The extended problem is put into a standard form and algorithms for the distributed solution of the problem are introduced.
Problem formulation for a subsystem trajectory
A general dynamic or trajectory optimization problem over a fixed time interval \(t\in [t_{0},t_{f}]\) can be written as follows:
In dynamic optimization, there is a distinction between state variables \(\chi (t)\) and inputs u(t) to the system. From the inputs u(t) and the initial condition, \(\chi (t_{0}) = \chi _{0}\), the state variables can be computed using the model equation Eq. 1b. Thus, the inputs u(t) are the degrees of freedom, while the states \(\chi (t)\) are the dependent variables. The goal is to find the best inputs such that the constraints are satisfied and some performance measure of the resulting trajectory is maximized or minimized.
The objective is given in the Bolza form which consists of a scalar performance measure at the end of the horizon, \(\varUpsilon\), and an integral part that tracks some scalar performance measure over the whole path of the trajectory, \(\varTheta\). Similar to the objective, the constraints are defined as terminal constraints (\(T_{i}\)) and path constraints (\(P_{i}\)) (Sargent 2000).
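For orientation, a generic Bolza-form problem that is consistent with the symbols named above could be written as follows. This is a textbook-style sketch and not necessarily the exact statement of Eq. 1: the symbol F for the model right-hand side, the minimization convention, and the "\(\le 0\)" form of the constraints are assumptions, and the constraint index is dropped for brevity.

\[
\begin{aligned}
\min _{u(t)} \quad & \varUpsilon \big (\chi (t_{f})\big ) + \int _{t_{0}}^{t_{f}} \varTheta \big (\chi (t),u(t)\big )\, dt \\
\text {s.t.} \quad & \dot{\chi }(t) = F\big (\chi (t),u(t)\big ), \quad \chi (t_{0}) = \chi _{0}, \\
& P\big (\chi (t),u(t)\big ) \le 0 \quad \forall t\in [t_{0},t_{f}], \\
& T\big (\chi (t_{f})\big ) \le 0 .
\end{aligned}
\]

The terminal cost \(\varUpsilon\) and the terminal constraints T depend only on the final state, whereas the integral cost \(\varTheta\) and the path constraints P act along the whole trajectory.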
Problem formulation for multiple trajectories with shared inputs
The problem of interest in this paper is to optimize N trajectories for N subsystems that share resources and start as well as end their operation possibly at different times. This can be formulated as the following dynamic optimization problem:
The considered time interval is given by \(t_{min}=\min \left\{ t_{0,1},\ldots ,t_{0,N} \right\}\) and \(t_{max}=\max \left\{ t_{f,1},\ldots ,t_{f,N} \right\}\). The objective is to minimize the sum of the individual objectives. The variables \(\chi _{i}\) and \(u_{i}\) belong to subsystem i exclusively, can only be manipulated by the respective subsystem, and, except for coupling via the overarching constraints Eq. 2b, have no impact on the other subsystems. Due to the different starting and final times of the trajectories, when the trajectory of subsystem i is not active, its use of the resource is fixed at 0 via Eq. 2g.
Numerical solution methods for trajectory optimization
The problems given in Eqs. 1 and 2 can be solved using different approaches: Direct optimization methods, methods based on Pontryagin’s minimum principle, and methods that are based on the Hamilton–Jacobi–Bellman equations (Bellman 1957; Bertsekas 1995; Pontryagin 2018; von Stryk and Bulirsch 1992). Depending on the selected method, the level of discretization can be chosen: All quantities can be considered infinite-dimensional in time, only the inputs can be discretized, or, in addition to the inputs, the states can also be fully discretized. If the inputs are discretized, they are usually considered to be piecewise constant or piecewise linear within the discretization elements. An overview of the solution methods as well as discretization levels is given in Betts (1998) and Srinivasan et al. (2003).
While there are also other efficient methods, e.g., parsimonious input parametrization (Rodrigues and Bonvin 2019), direct methods are used here, since they are best suited to handle the overarching constraints Eq. 2b as well as to obtain reliable numerical solutions (Srinivasan et al. 2003). When the inputs are discretized into equidistant intervals of duration \(\varDelta t\), there are three options to solve such a problem using the direct method: The first option is control vector parametrization, where the states remain continuously defined for every point in time and are determined by integration. The degrees of freedom for the optimization are the values of the inputs. The second is the simultaneous approach, where, in addition to the inputs, the states are also fully discretized such that a large sparse nonlinear program (NLP) results (Biegler 2007). Since the states are also degrees of freedom, the resulting trajectory satisfies the model equations only once the NLP has been solved. On the other hand, this method often proves to be more robust. The third option is multiple shooting, in which the time is divided into several intervals, where in each interval control vector parametrization is used to determine the solution. Matching of the states at the ends of the intervals is enforced by an additional boundary condition (Bock and Plitt 1985). In this contribution, control vector parametrization is applied and a constant stepsize fourth-order Runge-Kutta method is used for integration, because this renders all derivatives, of the objective as well as of the constraints, dependent only on the inputs.
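The idea of control vector parametrization with a fixed-step fourth-order Runge-Kutta integrator can be illustrated with a minimal sketch. The scalar model `toy_model`, the number of substeps, and the objective weights are hypothetical choices for illustration only and are not taken from the case study; the point is that the states, and hence the objective, become functions of the piecewise-constant inputs alone.

```python
import math


def rk4_step(f, x, u, h):
    """One classical fourth-order Runge-Kutta step with the input u held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, x0, inputs, dt, substeps=10):
    """Integrate over len(inputs) intervals of length dt with piecewise-constant
    inputs; returns the states at the grid points (including the initial state)."""
    h = dt / substeps
    states = [x0]
    x = x0
    for u in inputs:
        for _ in range(substeps):
            x = rk4_step(f, x, u, h)
        states.append(x)
    return states


def toy_model(x, u):
    """Hypothetical scalar model dx/dt = -x + u (illustration only)."""
    return -x + u


def objective(inputs, x0=0.0, dt=0.5):
    """Bolza-type cost that depends on the inputs only: a terminal term plus a
    rectangle-rule approximation of an integral input penalty."""
    states = simulate(toy_model, x0, inputs, dt)
    terminal = (states[-1] - 1.0) ** 2
    running = dt * sum(0.01 * u * u for u in inputs)
    return terminal + running


trajectory = simulate(toy_model, 0.0, [1.0, 1.0, 1.0, 1.0], 0.5)
```

An NLP solver would then vary only the list `inputs`; the states never appear as decision variables.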
Discretized problem formulation
The discretization of the problem in Eq. 1 can be done as described in the previous subsection. However, for the problem formulation with multiple trajectories and shared inputs, synchronization of time between the different trajectories needs to be assured. Only if the starting and ending times are on the same time grid as the discretization of the inputs can the overarching constraints be enforced at every point in time. The starting and ending times are thus expressed as multiples of the shared minimum discretization duration \(\varDelta t\) via the following relationship:
For each subsystem i, the sets of points \(\varPsi _{i} = \{N_{i,0} ,\ldots , N_{i,f}-1\}\) are defined as counterparts to the continuous time intervals \([t_{0,i},t_{f,i}]\).
Similar to the continuous case, \(N_{min}\) and \(N_{max}\) are defined as the minimum and maximum over all subsystems i of \(N_{i,0}\) and \(N_{i,f}\). This leads to the following problem formulation:
In this formulation, p is the discrete time index. Note that in this case the models \(\tilde{F}_{i}\) are defined implicitly, connecting the old and the new states. This is used to express numerical integration or full discretization. Furthermore, it should be noted that the path constraints \(P_{i}\) are only defined at the grid points.
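The synchronization on a common grid can be sketched as follows. The helper names `grid_index`, `active_set`, and `total_usage` are hypothetical, and the mapping of times to indices by division by the interval length is an assumption about the relationship referred to above. The sketch also shows that a subsystem contributes nothing to the shared resource outside its own index set.

```python
def grid_index(t, dt, tol=1e-9):
    """Map a time to its grid index; fail if t is not a multiple of dt."""
    n = round(t / dt)
    if abs(n * dt - t) > tol:
        raise ValueError("time %r is not on the grid with spacing %r" % (t, dt))
    return n


def active_set(t0, tf, dt):
    """Discrete counterpart of [t0, tf]: the indices N_0, ..., N_f - 1."""
    return range(grid_index(t0, dt), grid_index(tf, dt))


def total_usage(inputs, windows, dt):
    """Sum the resource usage of all subsystems on the common grid.

    inputs[i] holds one value per interval of subsystem i, windows[i] is the
    pair (t0, tf). Outside its window a subsystem uses none of the resource.
    """
    sets = [active_set(t0, tf, dt) for t0, tf in windows]
    n_min = min(s.start for s in sets)
    n_max = max(s.stop for s in sets)
    total = [0.0] * (n_max - n_min)
    for u, s in zip(inputs, sets):
        if len(u) != len(s):
            raise ValueError("input length does not match the active window")
        for p, value in zip(s, u):
            total[p - n_min] += value
    return n_min, total


# Two subsystems with overlapping windows on a grid with spacing 0.5.
n_min, usage = total_usage([[1.0, 2.0], [3.0, 4.0, 5.0]],
                           [(0.0, 1.0), (0.5, 2.0)], 0.5)
```

Comparing each entry of `usage` with the available amount of the resource corresponds to checking the overarching constraint at every grid point.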
The resulting optimization problem is nonconvex. Such problems are in general \(\mathscr {NP}\)-hard and can have multiple local minima (Esposito and Floudas 2000; Papamichail and Adjiman 2002).
Problem formulation in the standard form of distributed optimization
The problem of optimizing trajectories with shared resources across system boundaries from Eq. 5 can be written in the standard form of a general sharing problem, cf. Boyd et al. (2010),
The variables \(x_{i}\) comprise the inputs and the states of the subsystems, \(dim (x_{i}) = dim(\chi _{i})+dim(u_{i})\). The inputs of the trajectory optimization problem described in the previous subsection are given by the linear mapping \(u_{i} = A_i x_i\), with \(u_{i} = [u_{i,N_{i,0}},u_{i,N_{i,1}},\ldots ,u_{i,N_{i,f}-1}]^{T}\). The state variables are given by \(\chi _{i} = B_i x_i\), with \(\chi _{i} = [\chi _{i,N_{i,0}},\chi _{i,N_{i,1}},\ldots ,\chi _{i,N_{i,f}}]^{T}\). The initial conditions, model equations, and system-specific constraints are described by the \(n_{g_{i}}\)-dimensional inequality constraint function \(g_{i}\). The dimension of the overarching constraints is m, i.e., \(dim(b)=dim(A_{i} x_{i})=dim(u_{i})=dim(u_{shared,max})=m\).
Necessary conditions of optimality for distributed optimization
For this problem in standard form, the Lagrangian of the problem is given by (Bertsekas 1999):
where \(\lambda\) are the Lagrange multipliers corresponding to the overarching constraints in Eq. 6b and \(\mu _{i}\) are the Lagrange multipliers for the subsystem-specific constraints in Eq. 6c. Using the Lagrangian, the first-order necessary conditions of optimality (Karush-Kuhn-Tucker conditions) can be expressed as:
The interesting property of these conditions is that Eqs. 8a–8c can be evaluated independently for each subsystem i and only Eqs. 8d and 8e require coordination between the different subsystems.
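To make the separation explicit, the Lagrangian and the optimality conditions of a sharing problem with overarching constraints \(\sum _{i=1}^{N} A_i x_i \le b\) and subsystem constraints \(g_{i}(x_{i})\le 0\) can be sketched as follows. The symbol \(f_{i}\) for the objective of subsystem i is an assumption, and the grouping below is a standard textbook form that is not guaranteed to match the numbering of Eqs. 7 and 8 line by line.

\[
\mathscr {L}(x,\lambda ,\mu ) = \sum _{i=1}^{N} \Big ( f_{i}(x_{i}) + \lambda ^{T} A_{i} x_{i} + \mu _{i}^{T} g_{i}(x_{i}) \Big ) - \lambda ^{T} b
\]

\[
\begin{aligned}
\text {subsystem level, for each } i: \quad & \nabla _{x_{i}} f_{i}(x_{i}) + A_{i}^{T}\lambda + \nabla _{x_{i}} g_{i}(x_{i})^{T}\mu _{i} = 0, \\
& g_{i}(x_{i}) \le 0, \quad \mu _{i} \ge 0, \\
& \mu _{i}^{T} g_{i}(x_{i}) = 0, \\
\text {coordination level:} \quad & \sum _{i=1}^{N} A_{i} x_{i} - b \le 0, \quad \lambda \ge 0, \\
& \lambda ^{T} \Big ( \sum _{i=1}^{N} A_{i} x_{i} - b \Big ) = 0 .
\end{aligned}
\]

For a fixed \(\lambda\), the first three lines involve only quantities of subsystem i, so each subsystem can evaluate them without knowledge of the others.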
Distributed solution algorithms based on the dual problem
While Eqs. 8 can be solved monolithically via state-of-the-art solvers, in this contribution methods that exploit the distributed structure of the problem are investigated.
We focus on hierarchical methods, where all subsystem specific decisions are taken in a distributed fashion and only on the coordination layer, the satisfaction of the overarching constraints Eqs. 8d and 8e is enforced. These methods are also known as dual methods, which make use of the dual variables or Lagrange multipliers. In our case, the dual variables of interest are the ones corresponding to the overarching constraints, i.e. \(\lambda\).
Using the solution to Eqs. 8a–8c, written as \(\inf _{x_{i},\ \mu _{i}\ge 0} \mathscr {L}_i(x_i,\lambda ,\mu _{i})\), the dual function can then be defined:
Using this dual function of \(\lambda\), the optimality condition can be expressed as finding the maximum of \(d(\lambda )\) with \(\lambda \ge 0\), which is called the dual problem:
Due to the infimum in Eq. 9, the dual function is in general not known explicitly. However, according to Danskin’s theorem (Bertsekas 1999), a subgradient can be determined at \(\lambda\) via:
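As a sketch under the assumption that \(f_{i}\) denotes the objective of subsystem i, the dual function, the dual problem, and the subgradient for a sharing problem with overarching constraints \(\sum _{i=1}^{N} A_i x_i \le b\) take the following standard form; the exact notation of Eqs. 9 to 11 may differ.

\[
\begin{aligned}
d(\lambda ) &= \sum _{i=1}^{N} \ \inf _{x_{i}:\ g_{i}(x_{i})\le 0} \Big ( f_{i}(x_{i}) + \lambda ^{T} A_{i} x_{i} \Big ) - \lambda ^{T} b, \\
& \max _{\lambda \ge 0} \ d(\lambda ), \\
& \sum _{i=1}^{N} A_{i} x_{i}(\lambda ) - b \ \in \ \partial d(\lambda ),
\end{aligned}
\]

where \(x_{i}(\lambda )\) is a minimizer of the subproblem of subsystem i for the given \(\lambda\). The subgradient is thus the violation of the overarching constraints, which the coordinator can compute from the reported resource usage alone.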
In this contribution, we compare iterative methods for the maximization of the dual which do not require the explicit knowledge of the value of the dual function and thus of the different individual objectives. As measures of convergence, we define two criteria. The primal feasibility, \(\varPhi _{Primal}\), is a measure of the satisfaction of the overarching constraints of the original problem. At the same time, due to Eq. 11 and the fact that the dual is always concave, it is also a measure of the vanishing of the gradient (Boyd and Vandenberghe 2004).
Here we highlight that \(x_i\) is a function of \(\lambda\). If all elements of \(\varPhi _{Primal}\) are equal to 0, the overarching constraints are satisfied. In addition to the primal feasibility, which measures the satisfaction of the constraints, a measure of convergence is also required in order to prevent termination when a solution is primal feasible but not yet optimal. The dual feasibility, \(\varPhi _{Dual}\), can be interpreted as the satisfaction of the optimality criterion for the dual problem, i.e., the gradient approaching the 0 vector. Thus we define the dual feasibility as the finite difference approximation of the gradient of the dual function.
This second feasibility criterion is a measure of how far the solution can deviate from active overarching constraints and is essential for inequality constrained problems, since primal feasibility can always be achieved by sufficiently large Lagrange multipliers. This ensures that the solution not only satisfies \(\sum _{i=1}^{N} A_i x_i^{(k)} - b \le \epsilon\) but also \(\sum _{i=1}^{N} A_i x_i^{(k)}[j] - b[j] \ge -\epsilon\) for all active overarching constraints j. Only if a solution is both primal and dual feasible has a saddle point of the Lagrangian that satisfies the conditions of optimality been found.
Since we can optimize numerically only with a certain numerical error, we define a set of \(x^{*},\lambda ^{*}\), and \(\mu ^{*}\) to be optimal if the following holds:
where \(\epsilon _{Feas,Primal}\) and \(\epsilon _{Feas,Dual}\) are the desired numerical tolerances and \(x_i\) minimizes the objective of subsystem i.
Subgradient method
The simplest method for the maximization of the dual is to follow the direction of the steepest ascent, i.e., to use the direction of the subgradient (Shor 2012). Here the challenge is the selection of a suitable stepsize. Since the dual function may be a nonsmooth function, which depends on the solution structure of the different trajectories, the stepsize selection criteria for nonsmooth optimization derived in Nesterov (2004, p. 142) should be satisfied.
Since the impact of small changes in the dual variables is not the same for all variables \(\lambda [j],\ j\in \{1,\ldots ,m\}\), a scheme is needed that adapts the stepsizes individually.
While there exist methods to evaluate the optimal stepsize at the current point using the Lipschitz constant, as explained in Bertsekas and Tsitsiklis (1989) and Kozma et al. (2014), determining the constant is difficult since the dual cannot be evaluated explicitly. Furthermore, since this constant is not valid globally, the stepsizes either have to be adapted during the maximization of the dual or be chosen conservatively enough to be valid throughout the domain of the dual function.
The subgradient method can be considered as alternating local optimization of the subsystems and adaptation of the Lagrange multipliers on the coordination layer. We propose to decrease the stepsize of a specific overarching constraint every time the sign of the corresponding element of \(\sum _{i=1}^{N} A_i x_i(\lambda ) - b\) changes. This is equivalent to the inequality constraint becoming active or inactive, respectively. Specifically, the following adaptation of \(\alpha\) is used (cf. Algorithm 1, line 8):
If one of the stepsizes is too large and it has an influence on the Lagrange multiplier (note that if a constraint is inactive, the multiplier is fixed at 0), then this leads inevitably to a diminishing of the stepsize in this direction. The factor \(\beta\) in Algorithm 1 prevents the stepsize from decreasing too fast due to oscillating responses while maintaining the stepsize as large as possible if consecutive steps have the same direction.
Other possible selections of \(\alpha\) can be found in Nesterov (2004). Regardless of the selection of the stepsize, the provable convergence rate for strictly convex problems is at best \(\mathscr {O}(1/k)\).
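The alternation between subsystem responses and price updates, with one stepsize per overarching constraint that shrinks on a sign change of the residual, can be illustrated on a toy problem. The quadratic subsystem objective, the fixed shrink factor, and all parameter values are assumptions made for this sketch; in particular, the rule below is a simplified stand-in for the adaptation with the factor \(\beta\) in Algorithm 1, which is not reproduced here.

```python
def subsystem_response(target, lam, x_max):
    """Toy subsystem (assumption): minimize, for each interval j,
    0.5 * (x[j] - target[j])**2 + lam[j] * x[j] subject to 0 <= x[j] <= x_max."""
    return [min(max(t - l, 0.0), x_max) for t, l in zip(target, lam)]


def subgradient_method(targets, b, x_max=10.0, alpha0=0.5, shrink=0.5,
                       tol=1e-8, max_iter=500):
    """Projected dual ascent with one stepsize per overarching constraint."""
    m = len(b)
    lam = [0.0] * m
    alpha = [alpha0] * m
    prev_sign = [0] * m
    for k in range(max_iter):
        # Subsystems respond to the current prices; only usage is reported.
        xs = [subsystem_response(t, lam, x_max) for t in targets]
        residual = [sum(x[j] for x in xs) - b[j] for j in range(m)]
        # Stop when no constraint is violated and priced constraints are tight.
        if all(r <= tol and (l == 0.0 or r >= -tol)
               for r, l in zip(residual, lam)):
            break
        for j in range(m):
            sign = (residual[j] > 0) - (residual[j] < 0)
            if sign * prev_sign[j] < 0:
                alpha[j] *= shrink  # residual changed sign: reduce this stepsize
            if sign != 0:
                prev_sign[j] = sign
            # Ascent step, projected onto lam >= 0.
            lam[j] = max(0.0, lam[j] + alpha[j] * residual[j])
    return lam, xs, k


prices, usage, iterations = subgradient_method(
    [[3.0, 1.0], [2.0, 1.0], [1.0, 0.5]], [3.0, 4.0])
```

In this toy instance the first constraint is active and receives a positive price, while the second is inactive and its price stays at zero. The coordinator never sees the objective values of the subsystems.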
ADMM
A more robust method is the alternating direction method of multipliers (ADMM) (Boyd et al. 2010). It uses the augmented Lagrangian:
In addition to the linear penalty term, the deviation from a feasible use of the shared resources \(z_{i}\) is penalized quadratically. Essentially, the penalty terms convexify the problems around points that satisfy the overarching constraints, which accelerates the initial convergence. The variables \(z_{i}\) are determined on the coordinator level and are a projection of the current responses of the different subsystems onto the feasible region. The stepsize \(\alpha\) for the update of the Lagrange multipliers is \(\frac{\rho }{N}\) in the case of ADMM. However, since the \(z_{i}\) determine the response of the systems in addition to the prices, the dual feasibility is redefined as:
The convergence rate for convex problems is also \(\mathscr {O}(1/k)\) (Hong and Luo 2017; Kozma et al. 2014).
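The interplay of price, reference values, and quadratic penalty can be sketched for an equality constrained toy sharing problem with scalar decisions. The quadratic subsystem objectives and the closed-form subsystem update are assumptions for illustration; the sketch follows the unscaled sharing form of ADMM and does not include the handling of inequality constraints or the variable penalty parameters.

```python
def admm_sharing(a, c, b, rho=1.0, tol=1e-10, max_iter=1000):
    """ADMM sketch for: minimize sum_i 0.5 * a[i] * (x[i] - c[i])**2
    subject to sum_i x[i] == b, with scalar x[i] (toy assumption)."""
    n = len(a)
    x = [0.0] * n
    lam = 0.0
    iterations = 0
    for k in range(max_iter):
        iterations = k + 1
        # Coordinator: project the last responses onto the set sum(z) == b.
        mean_gap = (sum(x) - b) / n
        z = [x_i - mean_gap for x_i in x]
        # Subsystems: minimize own cost + lam * x + (rho / 2) * (x - z)**2.
        x = [(a_i * c_i - lam + rho * z_i) / (a_i + rho)
             for a_i, c_i, z_i in zip(a, c, z)]
        # Price update with stepsize rho / N.
        residual = sum(x) - b
        lam += rho / n * residual
        change = max(abs(x_i - z_i) for x_i, z_i in zip(x, z))
        if abs(residual) <= tol and change <= tol:
            break
    return x, lam, iterations


allocation, price, n_iter = admm_sharing([1.0, 1.0, 1.0], [3.0, 2.0, 1.0], 3.0)
```

Each subsystem only needs the price and its own reference value, and it only reports its resulting usage, so the objective values remain private.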
In Algorithm 2 the unscaled version of ADMM for equality constrained sharing problems is shown, cf. Boyd et al. (2010, p. 59). When adapting the algorithm to the inequality constrained problem considered here, the dual variables need to satisfy \(\lambda \ge 0\) and the z-variables need to be adjusted depending on whether the overarching constraints are active or not. When they are active, the same update as in Algorithm 2 applies. If the constraints are not active, however, the references \(z_{i}\) are based on the previous solutions of the subsystems. This penalty is required since otherwise only some variables are quadratically penalized, possibly leading to cyclic solution changes in subsequent iterations. We present ADMM including a new and efficient update step to compute the z-variables for inequality constrained problems in Algorithm 3. To improve convergence, different and variable penalty parameters \(\rho\) are used for each constraint. The penalty parameters are adapted at step 10 of Algorithm 3 using the scheme in Wang and Liao (2001):
The factor \(\delta\) is the maximally allowed difference between primal and dual feasibility before the penalty parameter is adapted to rebalance the proportion. The parameters \(\tau _{Increase} > 1\) and \(\tau _{Decrease} < 1\) adjust the penalty parameter \(\rho\) if necessary. The update ensures that primal and dual feasibility are kept in balance or, in other terms, that far away from the optimum large changes in \(\lambda\) are made. Close to the optimum, the changes are reduced and additionally the quadratic penalization is decreased, to ensure that the solution without the quadratic penalty also satisfies the overarching constraints.
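A balancing rule of this kind can be sketched per constraint as follows. The ratio test with the factor `delta` is the commonly used residual balancing form and is an assumption here, since the exact update equation is not reproduced in this text; the function name and default values are hypothetical.

```python
def update_penalty(rho, primal, dual, delta=10.0,
                   tau_increase=2.0, tau_decrease=0.5):
    """Rebalance one penalty parameter per overarching constraint.

    rho, primal, and dual are lists of equal length that hold the current
    penalty parameter and the magnitudes of the primal and dual feasibility
    measures of each constraint.
    """
    new_rho = []
    for r, p, d in zip(rho, primal, dual):
        if p > delta * d:
            new_rho.append(r * tau_increase)   # primal feasibility lags behind
        elif d > delta * p:
            new_rho.append(r * tau_decrease)   # dual feasibility lags behind
        else:
            new_rho.append(r)                  # in balance: keep the parameter
    return new_rho


rho_next = update_penalty([1.0, 1.0, 1.0], [100.0, 1.0, 0.01], [1.0, 1.0, 1.0])
```

Because `tau_decrease` is smaller than 1, it is applied as a multiplier rather than as a divisor.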
ALADIN
Another method that uses the augmented Lagrangian is the augmented Lagrangian based algorithm for distributed nonconvex optimization (ALADIN) of Houska et al. (2016). In contrast to ADMM, all variables of subsystem i are penalized for deviating from the reference values \(z_{i}\):
In addition to reporting the consumption of the shared resources, the subsystems also report their derivatives of the objective and of the active constraints with respect to the local decision variables to the coordination layer in each iteration k. The Hessian and gradient approximations are then calculated using the constraint Jacobian information \(\mathscr {C}_{i}^{(k)}\) (\(\mathscr {C}_{i}^{Active(k)}\)) of the (active) constraints from Eq. 6c:
where \(\mathscr {C}_{i}^{(k)} = \nabla _{x_{i}} g_{i} (x_{i})\big |_{x_{i}=x_{i}^{(k)}}\). The modified gradient and Hessian approximation are:
With this information of the subsystems, instead of doing straight projections onto the feasible set, prices and reference values (\(z_{i}\)) are determined via a quadratic program that approximates the objective functions and active constraints of the subsystems. The algorithm can be seen as a combination of sequential quadratic programming (SQP) and ADMM. The benefit of using more information from the different subsystems on the coordinator level is in general a faster convergence. In Houska et al. (2016) it is shown that in theory superlinear to quadratic convergence rates are possible.
In contrast to the previous two methods, the update of the Lagrange multipliers \(\lambda\) does not use subgradient information. Instead, the Lagrange multipliers of the overarching constraints in the QP, \(\lambda _{QP}\), are used in the update step. The algorithm for equality constrained problems as proposed in Houska et al. (2016) is given in Algorithm 4. The dual feasibility is defined as follows:
The parameters \(\alpha _{i} \in [0,1]\) can be used to adjust the behaviour of the algorithm to match frequent changes of the active set. Houska et al. (2016) provide an additional scheme that utilizes the objective values of the subsystems, based on which these parameters can be adapted in each iteration to guarantee the convergence to a local minimum. The scheme is not considered in this work, because essentially a monolithic optimization is carried out to determine the parameters.
Since trajectory optimization problems with overarching inequality constraints are considered, Eq. 24b is changed to an inequality constraint and the algorithm is modified accordingly. The solution to trajectory optimization problems consists of different arcs, which correspond to active constraints. Since these constraints ultimately act on the inputs, Eq. 24c fixes all \(\varDelta z_{i}\) for the inputs in the QP. Therefore, the equality constraint Eq. 24c is changed to inequality, such that the variables \(\varDelta z_{i}\) are degrees of freedom. If the reference variables \(z_{i}\) resulting from the QP are infeasible, the Hessian of the subsystems may become indefinite, which occurs mainly at the beginning of the scheme, when the QP approximations of the active constraints do not reflect the actual solution. Positive definiteness of \(\mathscr {H}_{i}^{(k)}\) is required for ALADIN (Houska et al. 2016) and therefore we propose the following strategy to enforce this condition: The elements on the main diagonal are increased by \(\kappa I\), where I is the identity matrix. Since having high values for \(\kappa\) penalizes large changes in \(\varDelta z_{i}\), this value is decreased with the number of iterations until \(\mathscr {H}_{i}^{(k)}\) becomes indefinite. Then \(\kappa\) remains fixed at this value, or is increased in subsequent iterations, if the Hessian is still indefinite.
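The effect of the diagonal shift can be illustrated with a minimal sketch that tests positive definiteness by a Cholesky factorization and enlarges \(\kappa\) until the shifted matrix passes the test. This is a simplified variant and not the schedule described above: it only covers the increase of \(\kappa\) for an indefinite Hessian, not the gradual decrease over the iterations. The function names, the growth factor, and the tolerance are assumptions.

```python
import math


def is_positive_definite(H, tol=1e-12):
    """Attempt a Cholesky factorization of the symmetric matrix H (list of
    rows); it succeeds only if every pivot is larger than tol."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def shift_diagonal(H, kappa):
    """Return H + kappa * I."""
    n = len(H)
    return [[H[i][j] + (kappa if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def regularize(H, kappa, grow=2.0, max_tries=60):
    """Increase kappa until H + kappa * I is positive definite."""
    for _ in range(max_tries):
        shifted = shift_diagonal(H, kappa)
        if is_positive_definite(shifted):
            return shifted, kappa
        kappa *= grow
    raise RuntimeError("no suitable shift found")


# Indefinite example with eigenvalues 3 and -1.
H_reg, kappa_used = regularize([[1.0, 2.0], [2.0, 1.0]], 0.3)
```

A large shift makes the QP step more conservative, which is why the value should not stay larger than necessary.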
Another challenge that arises from trajectory optimization is that small changes in the Lagrange multipliers of the overarching constraints \(\lambda\) can change the active set of the subsystems. In order to be able to reach the correct active set, the stepsize of the algorithm possibly has to be infinitesimal. Thus, also the \(\alpha _{i}\) are adapted in each iteration. Eq. 24b is modified to account for smaller values of \(\alpha _{1}\), such that the new reference variables \(z_{i}\) are always feasible according to the coordinator level QP. In Algorithm 5, the different steps of ALADIN, adjusted to inequality constrained problems, are given.
Other methods
There is a variety of other dual-based methods, for instance the ones introduced in Maxeiner and Engell (2020) or Wenzel et al. (2016), where, similar to the subgradient method, only Lagrange multipliers and the usage of the shared resources are exchanged. However, these methods are designed for equality constrained shared resource allocation problems.
Other methods, e.g. those presented in Kozma et al. (2014) and Nesterov (2004), use the objective values of the individual subsystems. In this contribution, we focus on the presented methods, since they do not require the knowledge of the objective values of the individual subsystems. Hence, they can be applied to solve problems where due to confidentiality the profit of the subsystems cannot be openly communicated.
Problem formulation for multiple trajectories with shared inputs and free final times
Due to the overarching constraint on the sharing of the resources, the terminal states of the trajectories change. With fixed final times, boundary conditions on the terminal state that become infeasible due to the overarching resource-sharing constraints cannot be satisfied. Thus, additional degrees of freedom that enable the satisfaction of the terminal constraints, i.e., free final times, are needed.
The standard approach to include the final time as an optimization variable in trajectory optimization is time scaling. The time horizon is scaled to the interval [0, 1], discretized, and multiplied with the final time which is a continuous variable that is minimized. The number of discretization intervals stays constant, but the lengths of the discretization intervals change. The downside of this approach, for the considered scenario with sharing of resources, is that the constraints on the shared resources cannot be enforced exactly anymore, because the discretization intervals are not synchronized between the subsystems.
Another possibility is to adjust the number of intervals. Then, the length of the discretization intervals is fixed and the shared resource constraints can be enforced across all systems. As a result, the additional optimization variable, the number of discrete intervals, is of integer type (Van den Broeck et al. 2011).
In the following, we use the latter approach and consider a single terminal constraint for each subsystem that is feasible without the overarching constraint but not necessarily when it is present. The resulting problem of trajectory optimization with shared resources and free final times for the different trajectories can be written as the following mixed-integer nonlinear program (MINLP):
The binary variables \(y_{i,p}\) and \(\hat{y}_{i,p}\) are used for each subsystem i to indicate in which intervals \(p\) the trajectory is active (\(y_{i,p}\)) and in which interval \(p\) the trajectory satisfies the terminal constraints (\(\hat{y}_{i,p}\)). The connection between continuous and binary variables is done using the Big M method (Nemhauser and Wolsey 1988).
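As a generic illustration of such a big-M coupling (not necessarily the exact form used in Eqs. 27), a terminal constraint written as \(g_{i}(x_{i,p}) \le 0\) that should only be enforced in the interval flagged by \(\hat{y}_{i,p}\) can be relaxed in all other intervals as follows. Here, \(g_{i}\), \(x_{i,p}\), and the constant \(M\) are assumed symbols introduced for this sketch, with \(M\) chosen larger than any value that \(g_{i}\) can attain:

$$\begin{aligned} g_{i}(x_{i,p}) \le M \left( 1 - \hat{y}_{i,p} \right) , \qquad \hat{y}_{i,p} \in \{0, 1\}. \end{aligned}$$

For \(\hat{y}_{i,p} = 1\) the original constraint is recovered, whereas for \(\hat{y}_{i,p} = 0\) the right-hand side is so large that the constraint is inactive.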
In contrast to the problem given in Eqs. 5, \(N_{max}\) is not the maximum of \(N_{i,f}\) but must be a sufficiently large integer such that the problem is feasible, i.e., such that all trajectories can satisfy the terminal constraint when the resources are shared.
The problem given in Eqs. 27 can be solved monolithically using an MINLP solver. Since in this contribution distributed solutions are sought, another way to handle the binary variables is to use the heuristic described in Van den Broeck et al. (2011), where the minimum interval for which the terminal constraint is satisfied is found iteratively. The iterative process of determining the final interval is included into the scheme for the satisfaction of the overarching constraints.
Each subsystem updates its number of intervals via the following equation:
The adaptation is done if at least one of the terminal constraints cannot be reached with the given final times. As an additional criterion, the number of iterations between two adaptations is increased by 1 each time the final times are adapted.
As the changes in the discrete variables become less frequent, the influence of their adaptation vanishes and the distributed optimization methods converge.
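The timing rule described above can be sketched as follows. The class name `FinalTimeAdapter` is hypothetical, and the update itself (extending every violated subsystem by one interval) is an assumption made for illustration; it does not necessarily match the update equation of the heuristic.

```python
class FinalTimeAdapter:
    """Decides in which coordination iterations the interval counts may change."""

    def __init__(self):
        self.min_gap = 1      # required iterations between two adaptations
        self.since_last = 0   # iterations since the last adaptation

    def step(self, n_intervals, terminal_ok):
        """Return the interval counts to use in the next iteration."""
        self.since_last += 1
        if all(terminal_ok) or self.since_last < self.min_gap:
            return list(n_intervals)
        # ASSUMPTION: subsystems missing their terminal constraint get one more interval.
        updated = [n if ok else n + 1 for n, ok in zip(n_intervals, terminal_ok)]
        self.min_gap += 1     # adaptations become less frequent over time
        self.since_last = 0
        return updated

adapter = FinalTimeAdapter()
n_intervals = [17, 17, 17]
history = []
for iteration in range(1, 7):
    # Hypothetical situation: the second reactor keeps missing its terminal constraint.
    n_intervals = adapter.step(n_intervals, terminal_ok=[True, False, True])
    history.append(n_intervals[1])
print(history)
```

In this sketch the adaptations take place in iterations 1, 3, and 6, so the gaps between them grow by one each time.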
Since the objectives are in general not smooth with respect to discrete changes of the final time, the objective functions of the subsystems should be scaled with their respective final times, i.e. by dividing the objective by the final time, in order to reduce the effect of the discrete changes.
Semibatch reactor case study
In this paper, we consider a modified version of the isothermal semibatch reactor with a safety constraint example from Ubrich et al. (1999). This reactor has been widely used as a benchmark in the trajectory optimization literature, e.g., Srinivasan et al. (2003).
A first-order reaction is considered, in which the following reaction occurs:
Prior to the reaction, the reactor is filled with the amount \(V_{0,i}\ c_{A,0,i}\) of reactant A. The dosage profile of reactant B is a degree of freedom and hence the feed rates \(u_{i}(t)\) are the manipulated variables. The ordinary differential equations that describe the trajectories of the states in each reactor i are:
The trajectories of each reactor i need to satisfy the following constraints:

Limitation of the feed rate of the reactant B by \(u_{Max,i}\),

The path constraint that the adiabatic temperature rise in the reactor is limited
$$\begin{aligned} \varDelta T_{ad}(t) = \min \left\{ c_{A,i}(t), c_{B,i}(t) \right\} \frac{(-\varDelta H_R)}{\rho \, c_p}, \end{aligned}$$ (32) which poses a constraint on the concentration \(c_{B,i}(t)\) within the reactor,

The path constraint on the volume of the reactor, which requires that the reaction volume cannot exceed the maximum available reactor volume \(V_{Max,i}\).
As a terminal constraint, the amount of C must be above the desired threshold \(n_{C,Des,i}\). This amount can be calculated via \(n_{C,i}(t) = c_{A,0,i} V_{0,i} - c_{A,i}(t) V_{i}(t)\). In Table 1, all case study specific numerical values are given.
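The two algebraic relations above can be evaluated as in the following sketch. The function and argument names are hypothetical, `enthalpy_term` stands for the numerator of the fraction in Eq. 32, and the numerical values are placeholders rather than the entries of Table 1.

```python
def adiabatic_temperature_rise(c_a, c_b, enthalpy_term, rho, c_p):
    """Evaluate Eq. 32: the limiting reactant determines the temperature rise."""
    return min(c_a, c_b) * enthalpy_term / (rho * c_p)

def amount_of_c(c_a0, v0, c_a, v):
    """Amount of C from the balance of A: initial amount of A minus remaining A."""
    return c_a0 * v0 - c_a * v

# Placeholder values for illustration only.
delta_t_ad = adiabatic_temperature_rise(c_a=0.5, c_b=0.3,
                                        enthalpy_term=60000.0, rho=900.0, c_p=4.2)
n_c = amount_of_c(c_a0=2.0, v0=0.7, c_a=0.5, v=1.0)
terminal_ok = n_c >= 0.8  # comparison with a hypothetical threshold n_C,Des
```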
In trajectory optimization, different criteria can be selected as economically motivated objectives, e.g.:

Maximization of the valuable product at the end of the batch time, which yields the maximum material efficiency.

Minimization of the time necessary to produce a certain amount of valuable product, which yields as many batches as possible per time.

Maximization of productivity, i.e., the amount of valuable product divided by the batch time.
Here, the maximization of the throughput of product C is chosen as the optimization criterion for each reactor i:
In Fig. 1 the optimal trajectories of the states and the input of a single semibatch reactor without overarching constraints are shown for an input discretization interval of 4 h. On the left, the trajectories of the states and of the amount of final product are shown. The effective constraints on the quantities are indicated by the thin horizontal lines in the same line style. On the right, the input trajectory is shown. One can see the presence of different arcs, i.e., the presence of different active constraints. At first, the maximum feed rate constraint is active, then the temperature at cooling failure limits the feed rate until the feeding needs to be stopped because the maximum volume of the reaction mixture is reached.
In the case of a single semibatch reactor, the terminal constraint is satisfied after 17 discrete elements of duration \(\varDelta t = 4\) h.
In addition to the individual limitations of the feed flows, we consider a coupling of the reactors via an overarching constraint on the joint feed flow rate of the reactant B:
The maximum combined feed flow rate for all reactors is considered to be constant. Dualizing this overarching constraint Eq. 34 according to Eq. 7 yields the following integral cost term in the objectives of the subsystems:
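For piecewise constant inputs, the price term becomes one multiplier per interval, and the coordinator raises the price wherever the joint feed exceeds the bound. The following minimal sketch shows this mechanism with a projected subgradient update. It rests on several assumptions: the subsystems are replaced by a toy quadratic problem with a closed-form response (`local_response`), the stepsize is constant instead of being adapted as in Eq. 15, and all names and numbers are hypothetical.

```python
import numpy as np

def local_response(prices, demands, u_max):
    """Toy subsystem: minimize 0.5*(u - demand)**2 + price*u with 0 <= u <= u_max."""
    return np.clip(demands - prices, 0.0, u_max)

def coordinate(demands, u_max, u_shared_max, step=0.2, tol=1e-6, max_iter=5000):
    """Projected subgradient ascent on the dual of the shared input constraint."""
    prices = np.zeros(demands.shape[1])  # one multiplier per time interval
    for iteration in range(max_iter):
        inputs = local_response(prices, demands, u_max)  # shape (reactors, intervals)
        excess = inputs.sum(axis=0) - u_shared_max       # subgradient of the dual
        feasible = np.all(excess <= tol)
        complementary = np.all(prices * np.abs(excess) <= tol)
        if feasible and complementary:
            break
        # Inequality constraint: prices are projected onto the nonnegative orthant.
        prices = np.maximum(0.0, prices + step * excess)
    return prices, inputs, iteration

demands = np.array([[1.0, 1.0, 0.5, 0.0],
                    [1.0, 0.8, 0.5, 0.0],
                    [1.0, 0.6, 0.5, 0.2]])
prices, inputs, n_iter = coordinate(demands, u_max=1.0, u_shared_max=2.0)
```

In this toy setting the prices remain zero in the intervals where the bound is inactive and settle at positive values where the joint demand would otherwise exceed it.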
Validation of the solutions
All subsequent distributed solutions are compared to the monolithic solutions of the same problem. In the case of free final times, instead of solving the MINLP, all possible solutions for the final interval \(N_{i,f}\) that end no later than 3 intervals from the unconstrained solution are enumerated and used as the benchmark for the evaluation of the different distributed solutions.
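The enumeration can be sketched as follows, assuming that each final interval lies between the unconstrained value and three intervals later. The function name `candidate_final_intervals` and the lower bound are assumptions; each returned combination would then be solved as a monolithic problem with fixed final times.

```python
from itertools import product

def candidate_final_intervals(n_unconstrained, max_delay=3):
    """All combinations of final intervals within max_delay of the unconstrained ones."""
    ranges = [range(n, n + max_delay + 1) for n in n_unconstrained]
    return list(product(*ranges))

# Three identical reactors that finish after 17 intervals when not coupled.
candidates = candidate_final_intervals([17, 17, 17])
print(len(candidates))
```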
For the iterative methods with the augmented Lagrangian term, upon convergence, the solutions are validated with \(\rho = 0\) in order to ensure that the constraints are satisfied even without penalty parameters and thus the corresponding Lagrange multipliers satisfy the necessary conditions of optimality.
Numerical results
The performance of the different methods is evaluated using the following three criteria:

required number of iterations,

evolution of the primal infeasibility over the iterations,

objective value at convergence.
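The second criterion can be computed, for example, as the largest violation of the shared input constraint over all intervals. The sketch below assumes that the inputs are stored as an array with one row per reactor and one column per interval; the function name and the numbers are hypothetical.

```python
import numpy as np

def primal_infeasibility(inputs, u_shared_max):
    """Infinity norm of the violation of the shared input constraint."""
    excess = np.asarray(inputs, dtype=float).sum(axis=0) - u_shared_max
    return float(np.max(np.maximum(excess, 0.0)))  # only positive excess counts

violation = primal_infeasibility([[0.02, 0.01],
                                  [0.02, 0.01],
                                  [0.02, 0.01]], u_shared_max=0.05)
```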
All scenarios were evaluated using the optimization parameters given in Table 2, which were determined empirically. The optimization problems are solved in Python using IPOPT as NLP solver and the CasADi toolbox for the computation of the derivatives (Andersson et al. 2018; Wächter and Biegler 2006).
In the following, all reactors have the same properties and initial conditions, which is not required in general. Different scenarios are generated by changing the number of reactors, the time discretization \(\varDelta t\), and the starting times of the reactors. Three reactors starting at \(0\ \varDelta t\), \(1\ \varDelta t\), and \(2\ \varDelta t\) are indicated by the starting sequence [0, 1, 2].
Comparison of the methods for fixed final times
In the following, first the results for fixed final times are discussed. Since changing the fixed final time has only a minor influence on the arc structure of the solution as long as feeding is completed within the considered time horizon, it is not varied.
At first, different scenarios that are generated by varying the starting times of the reactors are considered. Since it is possible to generate scenarios without active overarching constraints, which do not require any coordination, only scenarios where the overarching input constraint is active in at least two intervals are considered.
In Fig. 2 the final distribution of the input between the different subsystems and the corresponding trajectories of the states are shown for three reactors starting at the same time, i.e., [0, 0, 0]. The input is discretized into piecewise constant intervals of \(\varDelta t = 4\) h and the final times for all of the reactors are fixed at \(20\ \varDelta t\). In Fig. 2b, the inputs \(u_{i}\) are stacked on top of each other. Input \(u_{1}\) is the difference between \(\bar{u}_{1}\) and the baseline at 0, input \(u_{2}\) is the difference between \(\bar{u}_{2}\) and \(\bar{u}_{1}\), and \(u_{3}\) is the difference between \(\bar{u}_{3}\) and \(\bar{u}_{2}\). This plot shows how the resources are distributed between the different subsystems over time and that the shared resource constraint can be satisfied. This is a special case due to the same starting time and the equal distribution of the feed rate (\(u_{i}\)), so that all trajectories in response to the prices are the same. The corresponding state profile is displayed in Fig. 2a. It is worth mentioning that the structure of the optimal solutions of the subsystems (reactors) changes in the distributed optimization. As the maximum feed rate is no longer reached, the first arc now is a sensitivity seeking arc, and the constraint on the adiabatic temperature becomes active only at 44 h. So for most of the batch time, the solution of the subsystem problems is not at the constraints; the constraint on the feed is dualized and only enforced by the coordination via the price of the feed.
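The stacked representation used in Fig. 2b corresponds to cumulative sums over the reactors. The following sketch uses hypothetical input values and an illustrative function name.

```python
import numpy as np

def stacked_inputs(inputs):
    """Cumulative sums over the reactors: row k holds u_1 + ... + u_(k+1)."""
    return np.cumsum(np.asarray(inputs, dtype=float), axis=0)

# Rows: reactors, columns: time intervals (hypothetical feed rates in l/h).
u = np.array([[0.02, 0.015, 0.00],
              [0.02, 0.015, 0.01],
              [0.01, 0.020, 0.01]])
u_bar = stacked_inputs(u)
total_feed = u_bar[-1]  # top curve, to be compared with the shared bound
```

The individual inputs are recovered as differences of consecutive rows of `u_bar`, which is the relation described for Fig. 2b.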
Figure 3 shows the evolution of the Lagrange multipliers corresponding to the overarching constraints on the shared resources in the maximization of the dual for the three different methods. In Fig. 3a, the spikes at the beginning of the evolution of the Lagrange multipliers for the subgradient method result from the adaptation of the stepsizes to the specific problem. Once the stepsizes are adequate, the prices converge to the values of the monolithic optimization. For ADMM, the prices converge more quickly towards the optimal ones compared to the subgradient method. However, the increasing number of active overarching constraints as well as the balancing of primal and dual feasibility result in oscillations towards the optimal Lagrange multipliers, as can be seen in Fig. 3b. In Fig. 3c, the prices are adapted according to the ALADIN method, which is based on the derivatives at the currently active set of local inequalities, i.e., the active input and path constraints of all reactors. After a few iterations, the approximated active set is close to the actual one, and the prices converge towards the optimal Lagrange multipliers quickly.
Another interesting scenario, which is examined further, results from the starting times [0, 0, 2]. The optimized input trajectories are shown for the different methods in Fig. 4. It can be seen that the shared resource constraint, indicated by the dashed line, is satisfied for all methods; however, not all methods yield the same trajectories. Nonetheless, the objective values agree up to the 5th significant digit. Thus, the difference in the trajectories can be explained as different local optima that all satisfy (within the specified accuracy) the necessary conditions of optimality.
In Fig. 5, the corresponding evolutions of the primal feasibilities (infinity norm) are shown. As can be seen in Fig. 5a, in this scenario the subgradient method first converges to a point at which a further adaptation of the Lagrange multipliers, shown in Fig. 6a, does not have an effect on the primal feasibility. This is due to a set of local constraints being active. Once this active set changes before iteration 2000, the primal feasibility decreases further and the method converges towards a feasible solution.
For ADMM, the primal feasibility does not steadily decrease but oscillates. Due to the second optimality criterion of dual feasibility, the scheme continues to iterate even when the overarching constraints are satisfied for all intervals. As can be seen in Fig. 6b, starting from iteration 20, the Lagrange multipliers oscillate around their final values before they converge.
Similar to ADMM, ALADIN decreases the infinity norm of the primal feasibility quickly, cf. Fig. 5c. Thereafter, the primal feasibility spikes either when the active set in the QP approximation changes or when new overarching constraints become active. The latter can be seen in Fig. 6c by the new nonzero Lagrange multipliers. The former is only indirectly visible by the significant changes in the Lagrange multipliers. The initial spike in the Lagrange multipliers \(\lambda\) results from \(\lambda _{QP}\) being calculated based on the wrong active set in the coordinator level QP. Different from the [0, 0, 0] scenario, several iterations are necessary for the Lagrange multipliers, cf. Fig. 5c, to converge to the values that yield the inputs in Fig. 4d.
In Table 3, the results for different scenarios are given. The scenarios are permutations of the starting times between 0 and 8 h. The objective value is scaled by a factor of −1000, such that higher values correspond to better objectives. This table is continued in Table 5 with the remaining permutations of the starting times between 0 and 16 h. If the objective value of a distributed method is slightly better than that of the monolithic solution method, this results from the overarching constraints being satisfied only to the specified tolerance \(\epsilon _{Feas}\). The monolithic solution is accurate to the precision of IPOPT, which is set to \(10^{-12}\). Feeding slightly more into the reactors leads to these small differences in the objective values. Even though it is mostly not reflected in the objective values, the resulting trajectories for the inputs do not always have the same arc structure. In addition to the objective values, the number of necessary iterations, with the value of the best distributed method highlighted in bold, as well as the numbers of coordinated intervals (# Coord. Ints.), i.e., intervals with active overarching constraints, are shown.
The number of coordinated intervals correlates with the objective value: if fewer intervals need to be coordinated, there are more intervals without an active constraint on the usage of the resources. It is no surprise that the scenarios in which the starting times are further apart, as well as those in which the reactors start later, cf. [0, 0, 2] and [0, 2, 2], have a better objective value. The latter results from the fact that \({\partial n_{C,i}}/{\partial t}\) decreases once the path constraint on \(c_{B,i}\) is active.
The influence of the granularity of the time discretization on the number of necessary iterations depends on several factors. In general, it can be said that increasing the discretization interval \(\varDelta t\) leads on the average to fewer intervals that have to be coordinated and the number of reactor specific constraints that can be active decreases significantly. This can be seen by comparing the results of \(\varDelta t = 4\) h with the results in Tables 6 and 7 in the Appendix, where the time discretization is changed to \(\varDelta t = 8\) h and \(\varDelta t = 16\) h, respectively.
The average number of iterations for the subgradient method approximately halves if the time discretization interval is doubled. For all three discretizations, the subgradient method has the highest variance. For ADMM the number of iterations is similarly reduced; however, the method converges much more consistently, i.e., the different scenarios do not influence the number of iterations as much as for the other methods. ALADIN exhibits a larger spread for the time discretizations \(\varDelta t = 4\) h and \(\varDelta t = 8\) h, but requires significantly fewer iterations for \(\varDelta t = 16\) h. An example in which ALADIN requires many iterations is scenario [0, 1, 4], for which the evolution of the primal infeasibility and of the Lagrange multipliers is shown in Fig. 7. Between iterations 30 and 140, the stepsize is too large for the algorithm to find the correct QP approximation. Once this set is found, the algorithm converges, however with small steps, such that another 60 iterations are required.
Varying the number of reactors does not make the problem significantly harder. For instance, changing the number of reactors while maintaining their starting position (e.g., [0], [0, 0, 0] and [0, 0, 0, 0, 0, 0]) and adapting the available amount \(u_{shared,max}\) accordingly, i.e., by multiplication with the new number of reactors divided by the old one, does not influence the necessary number of iterations, except for the initial phase, since the solution has the same structure. This can be seen in Fig. 8a–c, where the structure of the imbalances is similar.
The difference in the final numbers of iterations results from the difference in the initial imbalance and the different adaptation of the stepsizes \(\alpha\). The evolution of the prices and of the final structure of the solution are similar.
If however \(u_{shared,max}\) is kept constant while the number of reactors is changed, the structure of the solution and the trajectory of the multipliers change, cf. Fig. 8d, where the maximum amount of \(u_{shared,max} = 0.05\) l/h is distributed between only two reactors.
Comparison of the methods for variable final times
The three methods are also compared for problems with enforced terminal constraints. The final times are initialized as in the previous case but changed during the optimization.
In addition to the properties considered in the previous subsection, the iterations at which the final times change can be evaluated. In Figs. 9 and 10, the evolution of the final times and of the Lagrange multipliers is shown for scenario [0, 0, 2]. It can be seen that, in the considered cases, the distributed methods converge to the same final times. In the case of the subgradient method, the high fluctuations in the Lagrange multipliers at the beginning cause the final times to change significantly until the stepsizes are adjusted accordingly. The large numbers of required iterations are caused by infinitesimal stepsizes resulting from the automatic adaptation of the stepsizes. For ADMM, significantly fewer changes can be observed. ALADIN finds the vicinity of the optimal Lagrange multipliers even more quickly, and fewer changes in the final times occur. The number of iterations stays in a similar range as for the case with fixed final times, as can be seen in Table 4; the resulting distribution of the feed rate between the reactors is shown in Fig. 11.
In Table 4, the satisfaction of the terminal constraint at convergence is added as an additional column. The solutions deviate more from the monolithic optimization for the case with free final times. However, for the considered cases and also the ones in Tables 8, 9, and 10, the heuristic finds feasible solutions. As with the fixed final times, the smaller \(\varDelta t\) is, the more often the distributed solution methods converge to the monolithic solution.
Discussion
In the following, the results for the distributed optimization methods as well as for the free final times heuristic are discussed and analyzed. By coordinating the shared resource consumption between the individual reactors using Lagrange multipliers, the structure of the optimal solutions for the individual reactors may change. In this case, additional sensitivity seeking arcs arise instead of constraint seeking arcs. So the example shows that both constraint seeking arcs and sensitivity seeking arcs in the subproblems can be handled.
Since in most cases trajectory optimization is not convex and thus part of a problem class for which few general statements can be made, a qualitative evaluation of the suitability of the methods is done.
Comparison of the distributed optimization methods
While for ALADIN an extension exists that guarantees convergence to a local minimum using de facto monolithic optimization steps to determine \(\alpha _{1}\), \(\alpha _{2}\), and \(\alpha _{3}\), for ADMM and for the subgradient method no proofs exist that these methods necessarily converge in the nonconvex case. Furthermore, it should be noted that all the considered methods based on dual decomposition are infeasible path methods such that only at convergence, the resulting trajectories satisfy the overarching constraints. Nonetheless, in all considered scenarios the distributed optimization methods found feasible solutions with respect to the overarching constraints and converged to at least a local minimum using the parameters given in Table 2.
The subgradient method for inequality constrained problems adapts the stepsizes automatically, which has the advantage that no prior information on the different subsystems is required. This comes at the cost that for each scenario and for each optimization run a significant number of iterations is required to determine the stepsizes, which are conservative enough such that the active set of the inequality constraints on the shared resources stays mostly the same. Once these stepsizes are found, the method converges slowly and the final number of iterations can vary significantly, especially if local constraints are active, which can prevent improvements of the solution for many iterations as seen for instance in Fig. 5a. While one might conclude from Tables 3 and 4 that the subgradient method requires significantly more iterations for the free final times, the continuation in Tables 5 and 8 shows that this is not true. High numbers of necessary iterations are caused by the small stepsizes resulting from the scheme in Eq. 15.
The benefit of the subgradient method is its simplicity. The augmentation of the objective function can be interpreted economically as the cost for use of the shared resource and on the coordination layer, the update mechanism matches supply and demand via the prices.
While ADMM does not need more information from the different subsystems than the subgradient method, it introduces artificial penalization terms to regularize the deviations from feasible solutions. This comes with the advantage of a significantly improved speed of convergence in the considered cases. Whether this is acceptable depends on the situation: if distribution is mostly used as a tool to distribute the computational load or to robustify the optimization, it will probably not matter. If the goal is to coordinate the subsystems while they only optimize their local cost function, it may not be acceptable.
ALADIN uses much more information from the different subsystems, including state variables as well as derivatives of objective and active constraints, to create QP approximations. As a consequence, ALADIN converges in most cases significantly faster than the other methods, which is also described in the literature by Engelmann and Faulwasser (2019) and Jiang et al. (2017). However, this works only when the QP approximations are accurate. In distributed trajectory optimization there are two factors that can make the approximation difficult: highly nonlinear constraints and changing active sets. The former one is the result of the nonlinear model equations. The second results from the fact that small changes in the Lagrange multipliers can completely change the active set or the arc structure of the solutions. As long as the active set is not correct, \(\lambda _{QP}\) will not be optimal and ALADIN will not converge. Thus, an adaptive scheme for the stepsizes was presented in Algorithm 5 that, by decreasing the stepsize with the number of iterations, prevents oscillation between different active sets.
In the original ALADIN paper (Houska et al. 2016), the authors recommend that a sufficiently large penalty parameter \(\rho\) has to be chosen for the method to converge with a superlinear rate. We found that for the considered problems, choosing \(\rho\) too large resulted in indefinite Hessian matrices \(\mathscr {H}_{i}\). The difficulty to choose \(\rho\) and \(\kappa\) results from the following tradeoff: If \(\rho\) is chosen large, then the Lagrange multipliers \(\mu _{i}\) of the local constraints of the different subsystems can become large, if the \(z_{i}\) variables are not feasible, which in turn can lead to negative definite Hessian matrices \(\mathscr {H}_{i}\). Since \(\mathscr {H}_{i}\) must be positive definite for the coordinator level QP to yield meaningful updates of \(\lambda\) and \(z_{i}\), the parameter \(\kappa\), which increases the eigenvalues, must be selected sufficiently large. If however \(\kappa\) is large, then the coordinator level QP will yield very small \(\varDelta z_{i}\), which again yields small steps towards the optimum. Thus an adaptive scheme was used to prevent indefinite Hessian matrices while eventually allowing larger changes in the reference variables \(z_{i}\).
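The role of \(\kappa\) can be illustrated with a generic eigenvalue shift. The sketch below is a common regularization and not necessarily the adaptive scheme used here; the function name, the threshold `min_eig`, and the example matrix are assumptions.

```python
import numpy as np

def regularize_hessian(hessian, min_eig=1e-6):
    """Return (H + kappa*I, kappa) with the smallest kappa >= 0 such that all
    eigenvalues of the shifted matrix are at least min_eig."""
    sym = 0.5 * (hessian + hessian.T)       # symmetrize against round-off
    smallest = np.linalg.eigvalsh(sym)[0]   # eigvalsh returns ascending eigenvalues
    kappa = max(0.0, min_eig - smallest)
    return sym + kappa * np.eye(sym.shape[0]), kappa

# Indefinite example matrix (hypothetical).
h_indefinite = np.array([[1.0, 0.0],
                         [0.0, -2.0]])
h_regularized, kappa = regularize_hessian(h_indefinite)
```

Choosing the shift only as large as necessary reflects the tradeoff described above: a larger \(\kappa\) makes the QP better conditioned but damps the steps \(\varDelta z_{i}\).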
These adaptations to ALADIN were made to ensure convergence for all considered scenarios, which is, of course, a tradeoff since without these adaptations many scenarios converge much faster.
In some real settings, however, sharing the gradients of the local objectives may not be acceptable, as this may allow the coordinator to decipher the local cost structure.
Evaluation of the heuristic for the satisfaction of the terminal constraints
In all considered scenarios with free final times, the terminal constraints were satisfied for the distributed solutions. This can in general not be guaranteed and including this check along with primal and dual feasibility as a convergence criterion can prevent convergence. We thus recommend for the application of distributed optimization with free final times to check the feasibility of all subproblems at convergence of the distributed optimization method. If this is not satisfied upon convergence, a fallback to a search space with worse objectives can be implemented. For the considered case study, this could, for example, be to increase the final times for all subsystems and reoptimize without adapting the final times, which would eventually guarantee the satisfaction of the terminal constraint. Other fallbacks could be to allocate 1/N of the shared resources to each reactor or to partly disaggregate the profiles.
With respect to the necessary number of required iterations to converge, it can be said that the proposed strategy to adapt the final times can be integrated into the iterative methods without a significant influence on the overall number of iterations.
Conclusions
In this contribution, different methods for distributed trajectory optimization, in which the objective values of the subsystems are not shared, were investigated. As an example, the trajectories of semibatch reactors that are connected via overarching constraints on the feed rates were optimized. We evaluated and compared different methods based on the optimization of the trajectory of a benchmark semibatch reactor and showed that for the considered case, convergence to local minima was achieved. Furthermore, a heuristic was proposed to include the final times of the different trajectories as degrees of freedom. Since the considered problem is not convex, a quantitative analysis of the results was done and possible obstacles for the application of the distributed optimization methods to other trajectory optimization problems were pointed out. In the distributed optimization, the structure of the arcs of the optimal solution may be different from the structure of the solutions for the subsystem problems as constraints for the subproblems are now dualized.
The three investigated methods, the subgradient method, ADMM, and ALADIN, provide different tradeoffs between sharing of information and rate of convergence. As expected, in general it can be said that the more information is exchanged between the coordination level and the subsystem level, the faster the methods converge. Since ADMM showed a consistent rate of convergence and it requires no additional information from the subsystems beyond the resource consumptions, it is recommended as the first choice for distributed trajectory optimization problems if confidentiality is of importance.
The results can also be applied in various other domains where resources have to be allocated or shared between different dynamic systems, e.g., in the coordination of plug-in electric vehicles, the coordination of autonomous robots, distributed control, etc.
Future work will focus on improving the convergence of ALADIN for problems with overarching inequality constraints by better exploiting the available information on the active sets from the subproblems. There are other techniques than SQP, e.g. interior point or active set methods, which might be adapted to the application with ALADIN to enable faster convergence to the correct active set. Furthermore, the characteristics of the trajectory optimization problems which can be solved with the proposed methods, or more specifically, what structure of arcs and what type of terminal constraints can be coordinated, should be investigated.
References
Andersson JAE, Gillis J, Horn G, Rawlings JB, Diehl M (2018) CasADi: a software framework for nonlinear optimization and optimal control. Math Program Comput 11:1–36
Bellman R (1957) Dynamic programming, 1st edn. Princeton University Press, Princeton
Bertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific, Belmont
Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
Bertsekas DP, Tsitsiklis JN (1989) Parallel and distributed computation: numerical methods. Prentice-Hall Inc, Upper Saddle River
Betts JT (1998) Survey of numerical methods for trajectory optimization. J Guid Control Dyn 21(2):193–207
Biegler LT (2007) An overview of simultaneous strategies for dynamic optimization. Chem Eng Process Process Intensif 46(11):1043–1053
Bock HG, Plitt KJ (1985) Multiple shooting algorithm for direct solution of optimal control problems. IFAC Proc Ser 17(2):1603–1608
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn 3(1):1–122
Boyd S, Vandenberghe L (2004) Convex optimization, vol 25. Cambridge University Press, Cambridge
Camponogara E, Jia D, Krogh B, Talukdar S (2002) Distributed model predictive control. IEEE Control Syst Mag 22(1):44–52
Cheng R, Forbes J, Yip W (2007) Price-driven coordination method for solving plant-wide MPC problems. J Process Control 17(5):429–438
Christofides PD, Scattolini R, Muñoz de la Peña D, Liu J (2013) Distributed model predictive control: a tutorial review and future research directions. Comput Chem Eng 51:21–41
Conejo AJ, Castillo E, Garcia-Bertrand R, Minguez R (2006) Decomposition techniques in mathematical programming: engineering and science applications. Springer, Berlin
Cortés J, Martínez S, Karataş T, Bullo F (2004) Coverage control for mobile sensing networks. IEEE Trans Robot Autom 20(2):243–255
Engell S, Paulen R, Sonntag C, Thompson H, Reniers M, Klessova S, Copigneaux B (2016) Proposal of a European research and innovation agenda on cyber-physical systems of systems. Process Dynamics and Operations Group, Dortmund
Engelmann A, Faulwasser T (2019) Feasibility vs. optimality in distributed AC OPF: a case study considering ADMM and ALADIN. In: Bertsch V, Ardone A, Suriyah M, Fichtner W, Leibfried T, Heuveline V (eds) Advances in energy system optimization. Birkhäuser, Basel, pp 3–12
Esposito WR, Floudas CA (2000) Deterministic global optimization in nonlinear optimal control problems. J Glob Optim 17(1):97–126
Farokhi F, Shames I, Johansson KH (2014) Distributed MPC via dual decomposition and alternative direction method of multipliers. Intell Syst Control Autom Sci Eng 69:115–131
Galceran E, Carreras M (2013) A survey on coverage path planning for robotics. Robot Auton Syst 61(12):1258–1276
Gatsis N, Giannakis GB (2013) Decomposition algorithms for market clearing with large-scale demand response. IEEE Trans Smart Grid 4(4):1976–1987
Hasan M, Hossain E, Kim DI (2014) Resource allocation under channel uncertainties for relay-aided device-to-device communication underlaying LTE-A cellular networks. IEEE Trans Wirel Commun 13(4):2322–2338
Hong M, Luo ZQ (2017) On the linear convergence of the alternating direction method of multipliers. Math Program 162(1–2):165–199
Houska B, Frasch J, Diehl M (2016) An augmented Lagrangian based algorithm for distributed nonconvex optimization. SIAM J Optim 26(2):1101–1127
Jiang Y, Nimmegeers P, Telen D, Van Impe J, Houska B (2017) A distributed optimization algorithm for stochastic optimal control. IFAC-PapersOnLine 50(1):11263–11268
Jose RA, Ungar LH (2000) Pricing interprocess streams using slack auctions. AIChE J 46(3):575–587
Koutsopoulos I, Iosifidis G (2010) A framework for distributed bandwidth allocation in peer-to-peer networks. Perform Eval 67(4):285–298
Kozma A, Conte C, Diehl M (2014) Benchmarking large-scale distributed convex quadratic programming algorithms. Optim Methods Softw 30(1):191–214
Maestre JM, Negenborn RR (eds) (2014) Distributed model predictive control made easy. Intelligent systems, control and automation: science and engineering, vol 69. Springer, Dordrecht
Maxeiner LS, Engell S (2020) An accelerated dual method based on analytical extrapolation for distributed quadratic optimization of large-scale production complexes. Comput Chem Eng 135:106728
Mesarovic MD, Macko D, Takahara Y (1970) Theory of hierarchical, multilevel, systems. Academic Press, New York
Negenborn R, Maestre J (2014) Distributed model predictive control: an overview and roadmap of future research opportunities. IEEE Control Syst 34(4):87–97
Nemhauser G, Wolsey L (1988) Integer and combinatorial optimization. Wiley, Hoboken
Nesterov Y (2004) Introductory lectures on convex optimization. In: Pardalos PM, Hearn DW (eds) Applied optimization, vol 87. Springer, Boston
Nie Y, Biegler LT, Villa CM, Wassick JM (2015) Discrete time formulation for the integration of scheduling and dynamic optimization. Ind Eng Chem Res 54(16):4303–4315
Palomar DP, Chiang M (2006) A tutorial on decomposition methods for network utility maximization. IEEE J Sel Areas Commun 24(8):1439–1451
Palomar DP, Chiang M (2007) Alternative distributed algorithms for network utility maximization: framework and applications. IEEE Trans Autom Control 52(12):2254–2269
Papamichail I, Adjiman CS (2002) A rigorous global optimization algorithm for problems with ordinary differential equations. J Glob Optim 24(1):1–33
Pontryagin L (2018) Mathematical theory of optimal processes. Routledge, New York
Rius-Sorolla G, Maheut J, Estellés-Miguel S, Garcia-Sabater JP (2020) Coordination mechanisms with mathematical programming models for decentralized decision-making: a literature review. Central Eur J Oper Res 28(1):61–104
Rodrigues D, Bonvin D (2019) Dynamic optimization of reaction systems via exact parsimonious input parameterization. Ind Eng Chem Res 58(26):11199–11212
Safdarian A, Fotuhi-Firuzabad M, Lehtonen M (2014) A distributed algorithm for managing residential demand response in smart grids. IEEE Trans Ind Inform 10(4):2385–2393
Sargent R (2000) Optimal control. J Comput Appl Math 124(1–2):361–371
Scattolini R (2009) Architectures for distributed and hierarchical model predictive control—a review. J Process Control 19(5):723–731
Shor NZ (2012) Minimization methods for nondifferentiable functions, vol 3. Springer, Berlin
Srinivasan B, Palanki S, Bonvin D (2003) Dynamic optimization of batch processes I. Characterization of the nominal solution. Comput Chem Eng 27(1):1–26
von Stryk O, Bulirsch R (1992) Direct and indirect methods for trajectory optimization. Ann Oper Res 37(1):357–373
Tang W, Allman A, Pourkargar DB, Daoutidis P (2018) Optimal decomposition for distributed optimization in nonlinear model predictive control through community detection. Comput Chem Eng 111:43–54
Ubrich O, Srinivasan B, Stoessel F, Bonvin D (1999) Optimization of a semibatch reaction system under safety constraints. In: 1999 European control conference (ECC). IEEE, pp 850–855
Van den Broeck L, Diehl M, Swevers J (2011) A model predictive control approach for time optimal point-to-point motion control. Mechatronics 21(7):1203–1212
Van Parys R, Pipeleers G (2017) Distributed MPC for multi-vehicle systems moving in formation. Robot Auton Syst 97:144–152
Wächter A, Biegler LT (2006) On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57
Wang S, Shahidehpour S, Kirschen D, Mokhtari S, Irisarri G (1995) Short-term generation scheduling with transmission and environmental constraints using an augmented Lagrangian relaxation. IEEE Trans Power Syst 10(3):1294–1301
Wang SL, Liao LZ (2001) Decomposition method with a variable parameter for a class of monotone variational inequality problems. J Optim Theory Appl 109(2):415–429
Wenzel S, Paulen R, Beisheim B, Krämer S, Engell S (2017) Market-based coordination of shared resources in cyber-physical production sites. Chemie Ingenieur Technik 89(5):636–644
Wenzel S, Paulen R, Stojanovski G, Krämer S, Beisheim B, Engell S (2016) Optimal resource allocation in industrial complexes by distributed optimization and dynamic pricing. at - Automatisierungstechnik 64(6):428–442
Zhang Y, Gatsis N, Giannakis GB (2013) Robust energy management for microgrids with high-penetration renewables. IEEE Trans Sustain Energy 4(4):944–953
Acknowledgements
Open Access funding provided by Projekt DEAL.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 723575 (CoPro, spire2030.eu/copro) in the framework of the SPIRE PPP.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Maxeiner, L.S., Engell, S. Comparison of dual based optimization methods for distributed trajectory optimization of coupled semibatch processes. Optim Eng 21, 761–802 (2020). https://doi.org/10.1007/s11081-020-09499-7
DOI: https://doi.org/10.1007/s11081-020-09499-7
Keywords
 Distributed optimization
 Dual methods
 Dual decomposition
 Trajectory optimization
 Semibatch reactors
 ADMM
 ALADIN
 Sharing of resources