Comparison of dual based optimization methods for distributed trajectory optimization of coupled semi-batch processes

The physical and virtual connectivity of systems via flows of energy, material, information, etc., steadily increases. This paper deals with systems of sub-systems that are connected by networks of shared resources that have to be balanced. For the optimal operation of the overall system, the couplings between the sub-systems must be taken into account, and the overall optimum will usually deviate from the local optima of the sub-systems. However, for reasons, such as problem size, confidentiality, resilience to breakdowns, or generally when dealing with autonomous systems, monolithic optimization is often infeasible. In this contribution, iterative distributed optimization methods based on dual decomposition where the values of the objective functions of the different sub-systems do not have to be shared are investigated. We consider connected dynamic systems that share resources. This situation arises for continuous processes in transient conditions between different steady states and in inherently discontinuous processes, such as batch production processes. This problem is challenging since small changes during the iterations towards the satisfaction of the overarching constraints can lead to significant changes in the arc structures of the optimal solutions for the sub-systems. Moreover, meeting endpoint constraints at free final times complicates the problem. We propose a solution strategy for coupled semi-batch processes and compare different numerical approaches, the sub-gradient method, ADMM, and ALADIN, and show that convexification of the sub-systems around feasible points increases the speed of convergence while using second-order information does not necessarily do so. Since sharing of resources has an influence on whether trajectory dependent terminal constraints can be satisfied, we propose a heuristic for the computation of free final times of the sub-systems that allows the dynamic sub-processes to meet the constraints. For the example of several semi-batch reactors which are coupled via a bound on the total feed flow rate, we demonstrate that the distributed methods converge to (local) optima and highlight the strengths and the weaknesses of the different distributed optimization methods.

As a consequence, interconnected systems are often optimized on two levels: On the sub-system level, each sub-system tries to maximize its performance according to given boundary conditions, and on the system-wide or coordinator level, where the optimal allocation of shared resources is performed (Mesarovic et al. 1970). Such bilevel problems can be solved via iterative distributed optimization approaches, i.e. primal-based and dual-based decomposition methods which are compared for instance in Conejo et al. (2006), Chiang (2006, 2007). The most suitable methods to tackle the aforementioned challenges, in particular the confidentiality issue, are dual-based decomposition methods where the coordinator broadcasts information such as prices to the different sub-systems, whereupon these respond with their estimated usage of the shared resources, and the coordinator iteratively adapts the prices until the resource balances are met.
While methods for the distributed optimization of steady state problems have been thoroughly investigated, dynamic systems have mostly been considered in the context of distributed model predictive control. Most of the work in this area considers the control of continuous processes with linear dynamics, cf. the survey in Negenborn and Maestre (2014). To the authors' knowledge, little work has been done to compare different distributed optimization methods for coordinating resources among distributed dynamic systems.
The application that we discuss here is the dynamic resource allocation for coupled chemical reactors that are operated in semi-batch mode, i.e., some substances are filled into the reactor at the start while others are dosed during the batch run. At the end of the batch run, the reaction mixture which contains the desired product is withdrawn. We focus on how to share a limited amount of a resource, in this case, the feed flow to the reactors between the semi-batch reactors, in order to determine an optimal overall operation for given initial conditions and a given production schedule. This is a common challenge in the process industry when large quantities of products are produced in multi-product multi-batch plants and suitable production sequences have to be defined and executed (Nie et al. 2015).

Contribution
The goal of this contribution is to provide a distributed optimization strategy for semi-batch processes with end-point constraints that are subject to overarching constraints and to compare different algorithms for the distributed allocation of shared resources between dynamic systems. We restrict ourselves to dual based methods that do not require the knowledge of the objective values of the sub-systems. Additionally to the sub-gradient method and ADMM, we apply the augmented Lagrangian based algorithm for distributed non-convex optimization (ALADIN) and adapt it so that it can also handle overarching inequality constraints as they arise for systems that share finite amounts of resources.
The challenge in solving these type of problems is that significant change in the arc structure, i.e. changes in the active sets of the constraints of the sub-systems for small changes in the dual variables occur during the iterations of the dual-based distributed optimization methods.
We present a heuristic approach for the solution of such resource allocation problems for dynamic systems with terminal constraints and free final times, such that the different sub-systems meet the constraints in the presence of overarching resource constraints.

Notation
Sub-system related vectors x i ∈ ℝ n i , i ∈ 1, … , N , are stacked into one large vector x = [x T 1 , x T 2 , … , x T N ] T ∈ ℝ n of variables on the system level. The subscript i is used as indicator for the different sub-systems throughout the paper. The superscript (k) indicates the current iteration k. To index the j-th element of a vector x, the notation x[j] is used.
The Eucledian norm of a vector x is indicated by ‖x‖ 2 . The infinity norm of a vector x ∈ ℝ m is defined by ‖x‖ ∞ = max {�x[1]�, … , �x[m]�} , where |⋅| is the absolute value of a scalar and the max operator followed by {⋅} indicates the element-wise maximum of the set.

Theoretical background
First, a general mathematical problem formulation for a single dynamic system is presented, which is then extended to several systems that share a common resource. The extended problem is put into a standard form and algorithms for the distributed solution of the problem are introduced.

Problem formulation for a sub-system trajectory
A general dynamic or trajectory optimization problem over a fixed time interval t ∈ [t 0 , t f ] can be written as follows: T (t f ) ≤ 0.

3
Comparison of dual based optimization methods for distributed… In dynamic optimization, there is a distinction between state variables (t) and inputs u(t) to the system. From the inputs u(t) and the initial condition, (t 0 ) = 0 , the state variables can be computed using the model equation Eq. 1b. Thus u(t) are the degree of freedom, while (t) are the dependent variables. The goal is to find the best inputs such that the constraints are satisfied and some performance measure of the resulting trajectory is maximized or minimized.
The objective is given in the Bolza form which consists of a scalar performance measure at the end of the horizon, , and an integral part that tracks some scalar performance measure over the whole path of the trajectory, . Similar to the objective, the constraints are defined as terminal constraints ( T i ) and path constraints ( P i ) (Sargent 2000).

Problem formulation for multiple trajectories with shared inputs
The problem of interest in this paper is to optimize N trajectories for N sub-systems that share resources and start as well as end their operation possibly at different times. This can be formulated as the following dynamic optimization problem: The considered time interval is given by t min = min t 0,1 , … , t 0,N and t max = max t f ,1 , … , t f ,N . The objective is to minimize the sum of the individual objectives. The variables i and u i belong to sub-system i exclusively, can only be manipulated by the respective sub-system, and, except for coupling via the overarching constraints Eq. 2b, have no impact on the other sub-systems. Due to the different 1 3 starting and final times of the trajectories, when the trajectory of sub-system i is not active, its use of the resource is fixed at 0 via Eq. 2g.

Numerical solution methods for trajectory optimization
The problems given in Eqs. 1 and 2 can be solved using different approaches: Direct optimization methods, methods based on Pontryagin's minimum principle, and methods that are based on the Hamilton-Jacobi-Bellman equations (Bellman 1957;Bertsekas 1995;Pontryagin 2018;von Stryk and Bulirsch 1992). Depending on the selected method, the level of discretization can be chosen: All quantities can be considered infinite-dimensional in time, only the inputs can be discretized, or additionally to the inputs also the states can be fully discretized. If the inputs are discretized, they are usually considered to be piece-wise constant or piece-wise linear within the discretization elements. An overview of the solution methods as well as discretization levels is given in Betts (1998) andSrinivasan et al. (2003).
While there are also other efficient methods, as, e.g., parsimonious input parametrization (Rodrigues and Bonvin 2019), direct methods are used here, since they are best suited to handle the overarching constraints Eq. 2b as well as to get reliable numerical solutions (Srinivasan et al. 2003). When the inputs are discretized into equidistant intervals of duration t , there are three options to solve such a problem using the direct method: The first option is control vector parametrization, where the states remain continuously defined for every point in time and are determined by integration. The degrees of freedom for the optimization are the values of the inputs. The second is the simultaneous approach, where, additionally to the inputs, also the states are fully discretized such that a sparse large non-linear program (NLP) results (Biegler 2007). Since also the states are degrees of freedom, only when the NLP has been solved, the resulting trajectory satisfies the model equations. On the other hand, this method often shows to be more robust. The third option is multiple shooting, in which the time is divided into several intervals, where in each interval control vector parametrization is used to determine the solution. Matching of the states at the ends of the intervals is forced by an additional boundary condition (Bock and Plitt 1985). In this contribution, control vector parametrization is applied and a constant stepsize 4th-order Runge-Kutta method is used for integration, because this renders all derivatives, of objective as well as of the constraints, dependent only on the inputs.

Discretized problem formulation
The discretization of the problem in Eq. 1 can be done as described in the previous subsection, however, for the problem formulation with multiple trajectories and shared inputs, synchronization of time between the different trajectories needs to be assured. Only if the starting and ending times are on the same time grid as the discretization of the inputs, the overarching constraints can be enforced at every point in time. The starting and ending times are thus expressed as multiples of the shared minimum discretization duration t via the following relationship: Comparison of dual based optimization methods for distributed… For each sub-system i, the sets of points Similar to the continuous case, N min and N max are defined as the minimum and maximum over all sub-systems i of N i,0 and N i,f . This leads to the following problem formulation: In this formulation, p is the discrete time index. Note that in this case the models F i are defined implicitly, connecting the old and the new states. This is used to express numerical integration or full discretization. Furthermore, it should be noted that the path constraints P i are only defined at the grid points.
The resulting optimization problem is non-convex, which is in general a NP-hard problem that can have multiple local minima (Esposito and Floudas 2000;Papamichail and Adjiman 2002).

Problem formulation in the standard form of distributed optimization
The problem of optimizing trajectories with shared resources across system boundaries from Eq. 5 can be written in the standard form of a general sharing problem, cf. Boyd et al. (2010), The variables x i comprise the inputs and the states of the sub-systems, dim(x i ) = dim( i ) + dim(u i ) . The inputs of the trajectory optimization problem described in the previous subsection are given by the linear mapping The initial conditions, model equations, and systemspecific constraints are described by the n g i -dimensional inequality constraint function g i . The dimension of the overarching constraints is m, i.e.,

Necessary conditions of optimality for distributed optimization
For this problem in standard form, the Lagrangian of the problem is given by Bertsekas (1999): where are the Lagrange multipliers corresponding to the overarching constraints in Eq. 6b and i are the Lagrange multipliers for the sub-system specific constraints in Eq. 6c. Using the Lagrangian, the first-order necessary conditions of optimality (Karush-Kuhn-Tucker conditions) can be expressed as:

3
Comparison of dual based optimization methods for distributed… The interesting property of these conditions is that Eqs. 8a-8c can be evaluated independently for each sub-system i and only Eqs. 8d and 8e require coordination between the different sub-systems.

Distributed solution algorithms based on the dual problem
While the Eqs. 8 can be solved monolithically via state of the art solvers, in this contribution methods that exploit the distributed structure of the problem are investigated.
We focus on hierarchical methods, where all sub-system specific decisions are taken in a distributed fashion and only on the coordination layer, the satisfaction of the overarching constraints Eqs. 8d and 8e is enforced. These methods are also known as dual methods, which make use of the dual variables or Lagrange multipliers. In our case, the dual variables of interest are the ones corresponding to the overarching constraints, i.e. .
Using the solution to Eqs. 8a-8c, written as inf x i , i ≥0 L i (x i , , i ) , the dual function can then be defined: Using this dual function of , the optimality condition can be expressed as finding the maximum of d( ) with ≥ 0 , which is called the dual problem: Due to the infimum in Eq. 9, the dual function is in general not known explicitly. However, according to Danskin's theorem (Bertsekas 1999), a sub-gradient can be determined at via: In this contribution, we compare iterative methods for the maximization of the dual which do not require the explicit knowledge of the value of the dual function and thus of the different individual objectives. As measures of convergence, we define two criteria. The primal feasibility, Primal , is a measure of the satisfaction of the overarching constraints of the original problem. At the same time, due to Eq. 11 and (8c) the fact that the dual is always concave, it is also a measure of the vanishing of the gradient (Boyd and Vandenberghe 2004).
Here we highlight that x i is a function of . If all elements of Primal are equal to 0, the overarching constraints are satisfied. Additionally to the primal feasibility, which measures the satisfaction of the constraints, also a measure of convergence is required, in order to prevent termination when a solution is primal feasible, but not optimal yet. The dual feasibility, Dual , can be interpreted as the satisfaction of the optimality criterion for the dual problem, i.e., the gradient approaching the 0 vector. Thus we define the dual feasibility as the finite difference approximation of the gradient of the dual function.
This second feasibility criterion is a measure of how far the solution can deviate from active overarching constraints and is essential for inequality constrained problems, since primal feasibility can always be achieved by sufficiently large Lagrange multipliers. This ensures that the solution not only satisfies ≥ − for all active overarching constraints j. Only if a solution is primal and dual feasible, a saddle point to the Lagrangian that satisfies the conditions of optimality is found.
Since we can optimize numerically only with a certain numerical error, we define a set of x * , * , and * to be optimal if the following holds: where Feas,Primal and Feas,Dual are the desired numerical tolerances and x i minimizes the objective of sub-system i.

Sub-gradient method
The simplest method for the maximization of the dual is to follow the direction of the steepest ascent, i.e., to use the direction of the sub-gradient (Shor 2012). Here the challenge is the selection of a suitable stepsize. Since the dual function may be a non-smooth function, which depends on the solution structure of the different trajectories, the stepsize selection criteria for non-smooth optimization derived in Nesterov (2004, p. 142) should be satisfied.
Since the impact of small changes in the dual variables is not the same for all variables [j], j ∈ {1, … , m} , a scheme is needed that adapts the stepsizes individually.
While there exist methods to evaluate the optimal stepsize at the current point using the Lipschitz constant, as explained in Bertsekas and Tsitsiklis (1989) and Kozma et al. (2014), determining the constant is difficult since the dual cannot be evaluated explicitly. Furthermore, since this constant is not valid globally, the stepsizes either have to be adapted during the maximization of the dual or be chosen conservatively enough to be valid throughout the domain of the dual function.
The sub-gradient method can be considered as alternating local optimization of the sub-systems and adaptation of the Lagrange multipliers on the coordination layer. We propose to decrease the stepsize of a specific overarching constraint every time the sign of the corresponding element of changes. This is equivalent to the inequality constraint becoming active or inactive respectively. Specifically the following adaptation of is used (cf. Algorithm 1, line 8): If one of the stepsizes is too large and it has an influence on the Lagrange multiplier (note that if a constraint is inactive, the multiplier is fixed at 0), then this leads inevitably to a diminishing of the stepsize in this direction. The factor in Algorithm 1 prevents the stepsize from decreasing too fast due to oscillating responses while maintaining the stepsize as large as possible if consecutive steps have the same direction. otherwise.
Other possible selections of can be found in Nesterov (2004). Regardless of the selection of the stepsize, the provable convergence rate for strictly convex problems is at best O(1∕k).

ADMM
A more robust method is the alternating direction method of multipliers (ADMM) Boyd et al. (2010). It uses the augmented Lagrangian: Additionally to the linear penalty term, the deviation from a feasible use of the shared resources z i is penalized quadratically. Essentially, the penalty terms convexify the problems around points that satisfy the overarching constraints, which accelerates the initial convergence. The variables z i are determined on the coordinator level and are a projection of the current responses of the different sub-systems onto the feasible region. The stepsize for the update of the Lagrange multipliers is N in the case of ADMM. However, since additionally to the prices also the z i determine the response of the systems, the dual feasibility is redefined as: The convergence rate for convex problems is also O(1∕k) (Hong and Luo 2017; Kozma et al. 2014).
In Algorithm 2 the unscaled version of ADMM for equality constrained sharing problems is shown, cf. Boyd et al. (2010, p. 59). When adapting the algorithm to the inequality constraint problem considered here, the dual variables need to satisfy ≥ 0 and the z-variables need to be adjusted depending on whether the overarching constraints are active or not. When they are active, the same update as in Algorithm 2 applies, however, if the constraints are not active, the references z i are based on the previous solutions of the sub-systems. This penalty is required since otherwise only some variables are quadratically penalized possibly leading to cyclic solution changes in subsequent iterations. We present ADMM including a new and efficient update step to compute the z-variables for inequality constrained problems in Algorithm 3. To improve convergence, different and variable penalty parameters are used for each constraint. The penalty parameters are adapted at step 10 of Algorithm 3 using the scheme in Wang and Liao (2001): The factor is the maximally allowed difference between primal and dual feasibility before the penalty parameter is adapted to rebalance the proportion. The parameters Incrase > 1 and Decrease < 1 adjust the penalty parameter if necessary. The update ensures that primal and dual feasibility are kept in balance or, in other terms, that far away from the optimum large changes in are made. Close to the optimum, otherwise.
the changes are reduced and additionally the quadratic penalization is decreased, to ensure that the solution without the quadratic penalty also satisfies the overarching constraints.

ALADIN
Another method that uses the augmented Lagrangian is the augmented Lagrangian based algorithm for distributed non-convex optimization in Houska et al. (2016). Different to ADMM, here the reference values for all variables of sub-system i are penalized for deviating from the reference z i : Additionally to reporting the consumption of the shared resources, the sub-systems also report their derivatives of the objective and of the active constraints with respect to the local decision variables to the coordination layer in each iteration k. The Hessian and gradient approximations are then calculated using the constraint The modified gradient and Hessian approximation are: With this information of the sub-systems, instead of doing straight projections onto the feasible set, prices and reference values ( z i ) are determined via a quadratic program that approximates the objective functions and active constraints of the sub-systems. The algorithm can be seen as a combination of sequential quadratic programming (SQP) and ADMM. The benefit of using more information from the different sub-systems on the coordinator level is in general a faster convergence. In Houska et al. (2016) it is shown that in theory super-linear to quadratic convergence rates are possible. (20) (21)

3
Comparison of dual based optimization methods for distributed… The update of the Lagrange multipliers is done differently compared to the previous two methods without sub-gradient information. Instead, the Lagrange multipliers from the overarching constraints QP in the QP are used in the update step. The algorithm for equality constrained problems as proposed in Houska et al. (2016) is given in Algorithm 4. The dual feasibility is defined as follows: The parameters i ∈ [0, 1] can be used to adjust the behaviour of the algorithm to match frequent changes of the active set. Houska et al. (2016) provide an additional scheme that utilizes the objective values of the sub-systems, based on which these parameters can be adapted in each iteration to guarantee the convergence to a local minimum. The scheme is not considered in this work, because essentially a monolithic optimization is carried out to determine the parameters.
Since trajectory optimization problems with overarching inequality constraints are considered, Eq. 24b is changed to an inequality constraint and the algorithm is modified accordingly. The solution to trajectory optimization problems consists of different arcs, which correspond to active constraints. Since these constraints ultimately act on the inputs, Eq. 24c fixes all z i for the inputs in the QP.
Therefore, the equality constraint Eq. 24c is changed to inequality, such that the variables z i are degrees of freedom. If the reference variables z i resulting from the QP are infeasible, the Hessian of the sub-systems may become indefinite, which occurs mainly at the beginning of the scheme, when the QP approximations of the active constraints do not reflect the actual solution. Positive definiteness of H (k) i is required for ALADIN (Houska et al. 2016) and therefore we propose the following strategy to enforce this condition: The elements on the main diagonal are increased by I , where I is the identity matrix. Since having high values for penalizes large changes in z i , this value is decreased with the number of iterations until H (k) i becomes indefinite. Then remains fixed at this value, or is increased in subsequent iterations, if the Hessian is still indefinite.
Another challenge that arises from trajectory optimization is that small changes in the Lagrange multipliers of the overarching constraints can change the active set of the sub-systems. In order to be able to reach the correct active set, the stepsize of the algorithm possibly has to be infinitesimal. Thus, also the i are adapted in each iteration. Eq. 24b is modified to account for smaller values of 1 , such that the new reference variables z i are always feasible according to the coordinator level QP. In Algorithm 5, the different steps of ALADIN, adjusted to inequality constrained problems, are given.

3
Comparison of dual based optimization methods for distributed…

Other methods
There are a variety of other dual based methods. For instance the ones introduced in Maxeiner and Engell (2020) or Wenzel et al. (2016), where, similar to the sub-gradient method, only Lagrange multipliers and the usage of the shared resources are exchanged. However, these methods are designed for equality constrained shared resource allocation problems.
Other methods, e.g. those presented in Kozma et al. (2014) and Nesterov (2004), use the objective values of the individual sub-systems. In this contribution, we focus on the presented methods, since they do not require the knowledge of the objective values of the individual sub-systems. Hence, they can be applied to solve problems where due to confidentiality the profit of the sub-systems cannot be openly communicated.

Problem formulation for multiple trajectories with shared inputs and free final times
Due to the overarching constraint on the sharing of the resources, the terminal states of the trajectories change. With fixed final times, boundary conditions on the terminal state which are infeasible due to the overarching resource-sharing constraints, cannot be satisfied. Thus, additional degrees of freedom that enable the satisfaction of the terminal constraints, i.e., free final times, are needed. The standard approach to include the final time as an optimization variable in trajectory optimization is time scaling. The time horizon is scaled to the interval [0, 1], discretized, and multiplied with the final time which is a continuous variable that is minimized. The number of discretization intervals stays constant, but the lengths of the discretization intervals change. The downside of this approach, for the considered scenario with sharing of resources, is that the constraints on the shared resources cannot be enforced exactly anymore, because the discretization intervals are not synchronized between the sub-systems.
Another possibility is to adjust the number of intervals. Then, the length of the discretization intervals is fixed and the shared resource constraints can be enforced across all systems. As a result, the additional optimization variable, the number of discrete intervals, is of integer type ( Van den Broeck et al. 2011).
In the following, we use the latter approach and consider a single terminal constraint for each sub-system that is feasible without the overarching constraint but not necessarily when it is present. The resulting problem of trajectory optimization with shared resources and free final times for the different trajectories can be written as the following mixed-integer non-linear program (MINLP):

3
The binary variables y i,p and ŷ i,p are used for each sub-system i to indicate in which intervals p the trajectory is active ( y i,p ) and in which interval p the trajectory (27b) satisfies the terminal constraints ( ŷ i,p ). The connection between continuous and binary variables is done using the Big M method (Nemhauser and Wolsey 1988). Different to the problem given in Eqs. 5, N max is not the maximum of N i,f but must be a sufficiently large integer such that the problem is feasible, i.e., that all trajectories can satisfy the terminal constraint when the resources are shared.
The problem given in Eqs. 27 can be solved monolithically using an MINLP solver. Since in this contribution distributed solutions are sought, another way to handle the binary variables is to use the heuristic described in Van den Broeck et al. (2011), where the minimum interval for which the terminal constraint is satisfied is found iteratively. The iterative process of determining the final interval is included into the scheme for the satisfaction of the overarching constraints. Each sub-system updates its number of intervals via the following equation: The adaptation is done if at least one of the terminal constraints cannot be reached with the given final times. As an additional criterion, the number of iterations between two adaptations is increased by 1 each time the final times are adapted. As the changes in the discrete variables become less frequent, the influence of their adaptation vanishes and the distributed optimization methods converge.
Since the objectives are in general not smooth with respect to discrete changes of the final time, the objective functions of the sub-systems should be scaled with their respective final times, i.e. by dividing the objective by the final time, in order to reduce the effect of the discrete changes.

Semi-batch reactor case study
In this paper, we consider a modified version of the isothermal semi-batch reactor with a safety constraint example from Ubrich et al. (1999). This reactor has been widely used as a benchmark in the trajectory optimization literature, e.g., Srinivasan et al. (2003).
A first-order reaction is considered, in which the following reaction occurs: Previous to the reaction, the reactor is filled with the amount V 0,i c A,0,i of reactant A. The dosage profile of reactant B is a degree of freedom and hence the feed rates u i (t) are the manipulated variables. The ordinary differential equations that describe the trajectories of the states in each reactor i are:

3
The trajectories of each reactor i need to satisfy the following constraints: • Limitation of the feed rate of the reactant B by u Max,i , • The path constraint that the adiabatic temperature rise in the reactor is limited which poses a constraint on the concentration c B,i (t) within the reactor, • The path constraint on the volume of the reactor, which requires that the reaction volume cannot exceed the maximum available reactor volume V Max,i .
As a terminal constraint, the amount of C must be above the desired threshold n C,Des,i . This amount can be calculated via n C, In Table 1, all case study specific numerical values are given.
In trajectory optimization, different criteria can be selected as economically motivated objectives, e.g.: • Maximization of the valuable product at the end of the batch time, which yields the maximum material efficiency. • Minimization of the time necessary to produce a certain amount of valuable product, which yields as many batches as possible per time. • Maximization of productivity, i.e., the amount of valuable product divided by the batch time.
Here, the maximization of the throughput of product C is chosen as the optimization criterion for each reactor i: In Fig. 1 the optimal trajectories of the states and the input of a single semibatch reactor without overarching constraints are shown for an input discretization interval of 4 h. On the left, the trajectories of the states are shown and the trajectory of the amount of final product is shown. The effective constraints on the quantities are indicated by the thin horizontal lines in the same line style. On the right, the input trajectory is shown. One can see the presence of different arcs, i.e., the presence of different active constraints. At first, the maximum feed rate constraint is active, then the temperature at cooling failure limits the feed rate until the feeding needs to be stopped because the maximum volume of the reaction mixture is reached.
In the case of a single semi-batch reactor, the terminal constraint is satisfied after 17 discrete elements of duration t = 4 h.
Additionally to the individual limitations of the feed flows, we consider a coupling of the reactors via an overarching constraint on the joint feed flow rate of the reactant B:  The maximum combined feed flow rate for all reactors is considered to be constant. Dualizing this overarching constraint Eq. 34 according to Eq. 7 yields the following integral cost term in the objectives of the sub-systems:

Validation of the solutions
All subsequent distributed solutions are compared to the monolithic solutions of the same problem. In the case of free final times, instead of solving the MINLP, all possible solutions for the final interval N i,f that end no later than 3 intervals from the unconstrained solution are enumerated and used as the benchmark for the evaluation of the different distributed solutions. For the iterative methods with the augmented Lagrangian term, upon convergence, the solutions are validated with = 0 in order to ensure that the constraints are satisfied even without penalty parameters and thus the corresponding Lagrange multipliers satisfy the necessary conditions of optimality.

Numerical results
The performance of the different methods is evaluated using the following three criteria: • required number of iterations, • evolution of the primal infeasibility over the iterations, • objective value at convergence. All scenarios were evaluated using the optimization parameters given in Table 2, which were determined empirically. The optimization problems are solved in Python using IPOPT as NLP solver and the CasADi toolbox for the computation of the derivatives (Andersson et al. 2018;Wächter and Biegler 2006).
In the following, all reactors have the same properties and initial conditions, which is not required in general. Different scenarios are generated by changing the number of reactors, the time discretization t , and the starting times of the reactors. Three reactors starting at 0 t , 1 t , and 2 t are indicated by the starting sequence [0, 1, 2].

Comparison of the methods for fixed final times
In the following, first the results for fixed final times are discussed. Since changing the fixed final time has only a minor influence on the arc structure of the solution as long as feeding is completed within the considered time horizon, it is not varied.

3
Comparison of dual based optimization methods for distributed… At first, different scenarios that are generated by varying the starting times of the reactors are considered. Since it is possible to generate scenarios without active overarching constraints, which do not require any coordination, only scenarios where the overarching input constraint is active in at least two intervals are considered.
In Fig. 2 the final distribution of the input between the different sub-systems and the corresponding trajectories of the states are shown for three reactors starting at the same time, i.e. [0, 0, 0]. The input is discretized into piece-wise constant intervals of t = 4 h and the final times for all of the reactors are fixed at 20 t . In Fig. 2b, the inputs u i are stacked on top of each other. Input u 1 is the difference between ū 1 and the baseline at 0, input u 2 is the difference between ū 2 and ū 1 , and u 3 is the difference between ū 3 and ū 2 . This plot shows how the resources are distributed between the different sub-systems over time and that the shared resource constraint can be satisfied. This is a special case due to the same starting time and the equal distribution of the feedrate ( u i ), wherefore all trajectories in response to the prices are the same. The corresponding state profile is displayed in Fig. 2a. It is worth mentioning that the structure of the optimal solutions of the sub-systems (reactors) changes in the distributed optimization. As the maximum feed rate is no  longer reached, the first arc now is a sensitivity seeking arc, and the constraint on the adiabatic temperature becomes only active at 44 h. So for most of the batch time, the solution of the subsystem problems is not at the constraints, the constraint on the feed is dualized and only enforced by the coordination via the price of the feed. Figure 3 shows the evolution of the Lagrange multipliers corresponding to the overarching constraints on the shared resources in the maximization of the dual for the three different methods. In Fig. 3a, the spikes ate the beginning of the evolution of the Lagrange multipliers for the sub-gradient method result from the adaptation of the stepsizes to the specific problem. Once the stepsizes are adequate, the prices converge to the values of the monolithic optimization. For ADMM, the prices converge more quickly towards the optimal ones compared to the sub-gradient method, however, the increasing number of active overarching constraints as well as the balancing of primal and dual feasibility result in oscillations towards the optimal Lagrange multipliers, as can be seen in Fig. 3b. In Fig. 3c, the prices are adapted according to the ALADIN method, which is based on the derivatives at the currently active set of local inequalities, i.e., the active input and path constraints of all reactors. After a few iterations, the approximated active set is close to the actual one, and the prices converge towards the optimal Lagrange multipliers quickly. Another interesting scenario that is further examined results for the starting times [0, 0, 2]. The optimized input trajectories are shown for the different methods in Fig. 4. It can be seen that the shared resource constraint, indicated by the dashed line is satisfied for all methods, however, not all methods yield the same trajectories. Nonetheless, the objective values agree up to the 5th significant digit. Thus, the difference in the trajectories can be explained as different local optima that all satisfy (within the specified accuracy) the necessary conditions of optimality.
In Fig. 5, the corresponding evolutions of the primal feasibilities (infinity norm) are shown. As can be seen in Fig. 5a, in this scenario the sub-gradient method first converges to a point at which a further adaptation of the Lagrange multipliers, shown in Fig. 6a, does not have an effect on the primal feasibility. This is due to a set For ADMM, the primal feasibility does not steadily decrease but oscillates. Due to the second optimality criterion of dual feasibility, the scheme continues to iterate even when the overarching constraints are satisfied for all intervals. As can be seen in Fig. 6b, starting from iteration 20, the Lagrange multipliers oscillate around their final values before they converge.
Similar to ADMM, ALADIN decreases the infinity norm of the primal feasibility quickly, cf. Fig. 5c. Thereafter, the primal feasibility spikes either when the active set in the QP approximation changes or when new overarching constraints become active. The latter can be seen in Fig. 6c  from QP being calculated based on the wrong active set in the coordinator level QP. Different from the [0, 0, 0] scenario, several iterations are necessary for the Lagrange multipliers, cf. Fig. 5c, to converge to the values that yield the inputs in Fig. 4d. In Table 3, the results for different scenarios are given. The scenarios are permutations of the starting times between 0 and 8 h. The objective value is scaled by a factor of − 1000, such that higher values correspond to better objectives. This table is continued in Table 5 with the remaining permutations of the starting times between 0 and 16 h. If the objective value of a distributed method is slightly better than that of the monolithic solution method, this results from the overarching constraints being satisfied only to the specified tolerance Feas . The monolithic solution is accurate to the precision of IPOPT, which is set to 10 −12 . Feeding slightly more into the reactors leads to these small differences in the objective values. Even though it is mostly not reflected in the objective values, the resulting trajectories for the inputs do not always have the same arc structure. In addition to the objective values, also the number of necessary iterations, with the value of the best distributed method highlighted in bold, as well as the numbers of coordinated intervals (# Coord. Ints.), i.e., intervals with active overarching constraints, are shown.
The number of coordinated intervals correlates with the objective value since if fewer intervals need to be coordinated, this means that for more intervals there is no active constraint on the usage of the resources. It is no surprise that the scenarios where the starting times are further apart as well as when the reactors start later, cf.
[0, 0, 2] and [0, 2, 2], have a better objective value. The latter results from the fact that n C,i ∕ t decreases once the path constraints on c B,i is active.
The influence of the granularity of the time discretization on the number of necessary iterations depends on several factors. In general, it can be said that increasing the discretization interval t leads on the average to fewer intervals that have to be coordinated and the number of reactor specific constraints that can be active decreases significantly. This can be seen by comparing the results of t = 4 h with the results in Tables 6 and 7 in the Appendix, where the time discretization is changed to t = 8 h and t = 16 h, respectively.
The average number of iterations for the sub-gradient method approximately halves if the time discretization interval is doubled. For all three discretizations, the sub-gradient method has the highest variance. For ADMM the number of iterations is similarly reduced, however, the method converges much more consistently, i.e., the different scenarios do not influence the number of iterations as much as for the other methods. ALADIN exhibits a larger spread for the time discretizations t = 4 h and t = 8 h, however, requires significantly fewer iterations for t = 16 h. An example, where ALADIN requires many iterations is scenario [0, 1, 4], where the evolution of the primal infeasibility and of the Lagrange multipliers are shown in Fig. 7. Between iteration 30 and 140, the stepsize is too large for the algorithm to find the correct QP approximation. Once this set is found, the algorithm converges, however with small steps, such that another 60 iterations are required.
Varying the number of reactors does not make the problem significantly harder. For instance, changing the number of reactors while maintaining their starting position (e.g., [0], [0, 0, 0] and [0, 0, 0, 0, 0, 0]) and adapting the available amount  Comparison of dual based optimization methods for distributed… u shared,max accordingly, i.e., by multiplication with the new number of reactors divided by the old one, does not influence the necessary number of iterations, except for the initial phase, since the solution has the same structure. This can be seen in Fig. 8a-c, where the structure of the imbalances is similar. The difference in the final numbers of iterations results from the difference in the initial imbalance and the different adaptation of the stepsizes . The evolution of the prices and of the final structure of the solution are similar.
If however u shared,max is kept constant while the number of reactors is changed, the structure of the solution and the trajectory of the multipliers change, cf. Fig. 8d, where the maximum amount of u shared,max = 0.05 l/h is distributed between only two reactors.

Comparison of the methods for variable final times
The three methods are also compared for problems with enforced terminal constraints. The final times are initialized as in the previous case but changed during the optimization.
Additionally to the properties from the previous subsection, the points in time when the final time changes can be evaluated. In Figs. 9 and 10, the evolution of the final times and of the Lagrange multipliers is shown for scenario [0, 0, 2]. It can be seen that the distributed methods converge in the considered cases to the same final times. In the case of the sub-gradient method, as a result of the high fluctuations in the Lagrange multipliers at the beginning, also the final times change significantly until the stepsizes are adjusted accordingly. The large numbers of required iterations are caused by infinitesimal stepsizes resulting from the automatic adaptation of the stepsizes. For ADMM significantly fewer changes can be observed. ALADIN finds the vicinity of the optimal Lagrange multipliers even more quickly, and fewer changes in the final times occur. The number of iterations stays in a similar range as for the case with fixed final times, which can be seen in Table 4 and the resulting distribution of the feed rate between the reactors can be seen in Fig. 11.
In Table 4, as an additional column, the satisfaction of the terminal constraint at convergence is added. The solutions deviate more from the monolithic optimization for the case with free final times. However, for the considered cases and also the ones in the Tables 8, 9, and 10, the heuristic finds feasible solutions. Similarly as with the fixed final times, the smaller t is, the more often the distributed solution methods converge to the monolithic solution. Comparison of dual based optimization methods for distributed…

Discussion
In the following, the results for the distributed optimization methods as well as for the free final times heuristic are discussed and analyzed. By coordinating the shared resource consumption between the individual reactors using Lagrange multipliers, the structure of the optimal solutions for the individual reactors may change. In this case, additional sensitivity seeking arcs arise instead of constraint seeking arcs. So the example shows that both constraint seeking arcs and sensitivity seeking arcs in the sub-problems can be handled. Since in most cases trajectory optimization is not convex and thus part of a problem class for which few general statements can be made, a qualitative evaluation of the suitability of the methods is done.

Comparison of the distributed optimization methods
While for ALADIN an extension exists that guarantees convergence to a local minimum using de facto monolithic optimization steps to determine 1 , 2 , and 3 , for ADMM and for the sub-gradient method no proofs exist that these methods necessarily converge in the non-convex case. Furthermore, it should be noted that all the considered methods based on dual decomposition are infeasible path methods such that only at convergence, the resulting trajectories satisfy the overarching constraints. Nonetheless, in all considered scenarios the distributed optimization methods found feasible solutions with respect to the overarching constraints and converged to at least a local minimum using the parameters given in Table 2.
The sub-gradient method for inequality constrained problems adapts the stepsizes automatically, which has the advantage that no prior information on the  different sub-systems is required. This comes at the cost that for each scenario and for each optimization run a significant number of iterations is required to determine the stepsizes, which are conservative enough such that the active set of the inequality constraints on the shared resources stays mostly the same. Once these stepsizes are found, the method converges slowly and the final number of iterations can vary significantly, especially if local constraints are active, which can prevent improvements of the solution for many iterations as seen for instance in Fig. 5a. While one might conclude from Tables 3 and 4 that the sub-gradient method requires significantly more iterations for the free final times, the continuation in Tables 5 and 8 shows that this is not true. High numbers of necessary iterations are caused by the small stepsizes resulting from the scheme in Eq. 15. The benefit of the sub-gradient method is its simplicity. The augmentation of the objective function can be interpreted economically as the cost for use of the shared resource and on the coordination layer, the update mechanism matches supply and demand via the prices.
While ADMM does not need more information from the different sub-systems than the sub-gradient method, it introduces artificial penalization terms to regularize the deviations from feasible solutions. This comes with the advantage of a significantly improved speed of convergence in the considered cases. Whether this is acceptable depends on the situation: if distribution is mostly used as a tool to distribute the computational load or to robustify the optimization, it will probably not matter. If the goal is to coordinate the sub-systems while they only optimize their local cost function, it may not be acceptable.
ALADIN uses much more information from the different sub-systems, including state variables as well as derivatives of objective and active constraints, to create QP approximations. As a consequence, ALADIN converges in most cases significantly faster than the other methods, which is also described in the literature by Engelmann and Faulwasser (2019) and Jiang et al. (2017). However, this works only when the QP approximations are accurate. In distributed trajectory optimization there are two factors that can make the approximation difficult: highly non-linear constraints and changing active sets. The former one is the result of the non-linear model equations. The second results from the fact that small changes in the Lagrange multipliers can completely change the active set or the arc structure of the solutions. As long as the active set is not correct, QP will not be optimal and ALADIN will not converge. Thus, an adaptive scheme for the stepsizes was presented in Algorithm 5 that, by decreasing the stepsize with the number of iterations, prevents oscillation between different active sets.
In the original ALADIN paper (Houska et al. 2016), the authors recommend that a sufficiently large penalty parameter has to be chosen for the method to converge with a super-linear rate. We found that for the considered problems, choosing too large resulted in indefinite Hessian matrices H i . The difficulty to choose and results from the following trade-off: If is chosen large, then the Lagrange multipliers i of the local constraints of the different sub-systems can become large, if the z i variables are not feasible, which in turn can lead to negative definite Hessian matrices H i . Since H i must be positive definite for the coordinator level QP to yield meaningful updates of and z i , the parameter , which increases the eigenvalues, must be selected sufficiently large. If however is large, then the coordinator level QP will yield very small z i , which again yields small steps towards the optimum. Thus an adaptive scheme was used to prevent indefinite Hessian matrices while eventually allowing larger changes in the reference variables z i .
These adaptations to ALADIN were made to ensure convergence for all considered scenarios, which is, of course, a trade-off since without these adaptations many scenarios converge much faster.
In general, in some real settings, sharing the gradients of the local objectives may not be acceptable, as this may allow the coordinator to decipher the local cost structure.

Evaluation of the heuristic for the satisfaction of the terminal constraints
In all considered scenarios with free final times, the terminal constraints were satisfied for the distributed solutions. This can in general not be guaranteed and including this check along with primal and dual feasibility as a convergence criterion can prevent convergence. We thus recommend for the application of distributed optimization with free final times to check the feasibility of all sub-problems at convergence of the distributed optimization method. If this is not satisfied upon convergence, a fallback to a search space with worse objectives can be implemented. For the considered case study, this could, for example, be to increase the final times for all sub-systems and re-optimize without adapting the final times, which would eventually guarantee the satisfaction of the terminal constraint. Other fallbacks could be to allocate 1/N of the shared resources to each reactor or to partly disaggregate the profiles.
With respect to the necessary number of required iterations to converge, it can be said that the proposed strategy to adapt the final times can be integrated into the iterative methods without a significant influence on the overall number of iterations.

Conclusions
In this contribution, different methods for distributed trajectory optimization, in which the objective values of the sub-systems are not shared, were investigated. As an example, the trajectories of semi-batch reactors that are connected via overarching constraints on the feed rates were optimized. We evaluated and compared different methods based on the optimization of the trajectory of a benchmark semi-batch reactor and showed that for the considered case, convergence to local minima was achieved. Furthermore, a heuristic was proposed to include the final times of the different trajectories as degrees of freedom. Since the considered problem is not convex, a quantitative analysis of the results was done and possible obstacles for the application of the distributed optimization methods to other trajectory optimization problems were pointed out. In the distributed optimization, the structure of the arcs of the optimal solution may be different from the structure of the solutions for the sub-system problems as constraints for the sub-problems are now dualized.
The three investigated methods, the sub-gradient method, ADMM, and ALA-DIN, provide different trade-offs between sharing of information and rate of convergence. As expected, in general it can be said that the more information is exchanged between the coordination level and the sub-system level, the faster the methods converge. Since ADMM showed a consistent rate of convergence and it requires no additional information from the sub-systems beyond the resource consumptions, it is recommended as the first choice for distributed trajectory optimization problems if confidentiality is of importance.
The results can also be applied in various other domains where resources have to be allocated or shared between different dynamic systems, e.g., in the coordination of plug-in electric vehicles, the coordination of autonomous robots, distributed control, etc.
Future work will focus on improving the convergence of ALADIN for problems with overarching inequality constraints by better exploiting the available information on the active sets from the sub-problems. There are other techniques than SQP, e.g. interior point or active set methods, which might be adapted to the application with ALADIN to enable faster convergence to the correct active set. Furthermore, the characteristics of the trajectory optimization problems which can be solved with the proposed methods, or more specifically, what structure of arcs and what type of terminal constraints can be coordinated, should be investigated. Comparison of dual based optimization methods for distributed… 1 3

Appendix B: Results for different t and variable final times
See Tables 8, 9 and 10. 1 3 Comparison of dual based optimization methods for distributed…