Stochastic programming with primal–dual dynamics: a mean-field game approach

This study addresses primal–dual dynamics for a stochastic programming problem for capacity network design. It is proven that consensus can be achieved on the here and now variables which represent the capacity of the network. The main contribution is a heuristic approach which involves the formulation of the problem as a mean-field game. Every agent in the mean-field game has control over its own primal–dual dynamics and seeks consensus with neighboring agents according to a communication topology. We obtain theoretical results concerning the existence of a mean-field equilibrium. Moreover, we prove that the consensus dynamics converge such that the agents agree on the capacity of the network. Lastly, we emphasize the ways in which penalties on control and state influence the dynamics of agents in the mean-field game.


Introduction
Motivated by a stochastic program for the network capacity design and flow optimization, we cast for the first time primal-dual dynamics in the framework of mean-field games. We consider a scenario where demand materializes at some geographic spots and goods are transported from warehouses to fulfill such demand. We first model the problem as a two stage stochastic program for supply chain network design in the same spirit as in Ch. 1.5 [28]. Demand materializes at some nodes and goods flow over the edges. The flow in each edge is subject to capacity constraints and the flows in and out of each node must satisfy flow conservation constraints. The resulting optimization model is a constrained stochastic nonlinear (quadratic) two-stage program.
Continuous-time primal-dual gradient dynamics, also known as saddle-point dynamics, were first introduced in [2,20]. Primal-dual dynamics is used in a number of application domains such as energy resource allocation [17], cyberphysical systems [26], wireless communications networks [11], smart grids [34], just to name a few. Recent works have been dedicated to exponential stability analysis of primal-dual dynamics [12,24,27,30]. Furthermore, analysis on robustness has been performed in [15,19,25]. Also, the global/local nature of the convexity-concavity properties of the primal-dual dynamics was examined. Explicit attention has been paid to asymptotic stability properties of the primal-dual dynamics in [13,14,16,29]. Primal-dual dynamics and stochastic gradient descent has been recently studied in [8]. The extension to large scale problems is possible through parallelization and using elastic averaging stochastic gradient descent algorithms [9].
There is less work dedicated to the combination of primal-dual dynamics with mean-field games. However, there have been studies concerning Lagrangian relaxation methods to solve mean-field games, such as in [1], where a planning problem is discussed. Additionally, in [32] and [33] Lagrangian relaxation is used in mean-field game power control. More recently a primal-dual approach has been used to solve constrained mean-field games [10]. Mean-field games in the form of coupled partial differential equations originated in the works of Lasry and Lions [21][22][23]. Explicit closed-form expressions for mean-field equilibria were first proposed for linear-quadratic mean-field games [3] and have been extended to more general cases in [18]. This paper mainly follows the linear-quadratic mean-field game formulation as in [4][5][6].
The main contribution is a heuristic approach which involves the reformulation of the original problem into a large number of stochastic primal-dual dynamics which are coupled in the same spirit as in mean-field games. To be more specific, we consider the instance of a large number of agents, each assigned with a primal-dual dynamics subject to a specific realization of the demand. The players of the mean-field game have to obtain consensus over the here and now decision variables: the capacity of each edge of the supply chain network. We also consider wait and see decision variables for flow and the Lagrange multipliers coming from the primal-dual dynamics. Note the analogy with a two-player 2 Stochastic programming for network design Let us consider a physical flow network (referred to as micro-network) denoted as G(V, E). Let the set of nodes be V ∈ {V 1 , V 2 , … , V n } , where each node corresponds to an agent in the network. The parameter n denotes the number of nodes present in the network. Let the set of edges be E ∈ {E 1 , E 2 , … , E m } . The parameter m denotes the number of edges in the network. The demand is described by a vector w ∈ ℝ n×1 and is an uncertain random parameter. The capacity of the edges are the here and now variables and are denoted by c ∈ ℝ m×1 . The transported goods in each edge are the wait and see variables and are denoted by u ∈ ℝ m×1 . A graphical representation of the micro-network can be found in Fig. 1.
Let f 1 (c) = 1 2 c TQ 1 c +f T 1 c be the cost associated to the capacity of the edges in the micro-network, and f 2 (u) = 1 2 u TQ 2 u +f T 2 u be the cost charged for using the edges to transport goods such that the demand is satisfied. In the above costs we have a quadratic and a linear term. The matrices Q 1 ,Q 2 ∈ ℝ m×m are the cost coefficients for the quadratic term, and f 1 ,f 2 ∈ ℝ m are the linear cost coefficients. We model the supply chain network design problem as a two-stage stochastic program as follows: Note that the objective function that we wish to minimize describes the total cost of transporting goods. The optimization variables in this problem are the capacity of the edges c ∈ ℝ m×1 , and the flow in the edges u ∈ ℝ m×1 . The capacities c are the first-stage decision variables, and they are obtained before the realization of the demand. The optimal capacities are independent of the demand, however they have to accommodate any realization of it. On the other hand, the flows u are the secondstage decision variables, they depend on the values of the first-stage variables and they are obtained after observing the realized demand. In addition, B ∈ ℝ n×m is the incidence matrix of the micro-network, w (.) denotes expectation with respect to w and Q (u, w) denotes the sub-optimal value of the second-stage problem, namely Observe that as Q (c, w) depends on the capacities of the edges c and on the random variable w, the optimal value of the second-stage problem is also a random variable. Hence, we can obtain its expected value with respect to the demand, namely wQ (c, w) , which is included in the objective function of the two-stage stochastic program (1).

Primal-dual dynamics
In this section we introduce the stochastic primal-dual dynamic related to the twostage stochastic program (1). Let be a constant demand vector randomly extracted from the same distribution of w. To obtain the primal-dual dynamics associated with this particular realization of the two-stage program (1) we first derive the Lagrangian function as follows: where ∈ ℝ m×1 and ∈ ℝ n×1 are the Lagrange multipliers.
The Lagrange dual problem can then be formulated as follows: By imposing that the gradient of the Lagrangian with respect to u and c vanishes, we obtain the following Karush-Kuhn-Tucker (KKT) conditions: The corresponding primal-dual dynamics are obtained from the gradient descent and gradient ascent dynamics over u and c respectively as shown next: In compact form, the above set of differential equation can be written as In the two-stage program (1) the capacity vector c needs to be the same for any realization of the demand w, namely for any value of in the support of w. This implies that the primal-dual dynamics (4) obtained for different realization need to reach consensus on c. We show that this is possible by developing an ad-hoc mean-field game model in the following section.

Construction of the mean-field game
Let us now introduce a two-layer network as in Fig. 2. Layer 1 involves the communication topology among different populations (henceforth referred to as macro-network) denoted as Ĝ (V,Ê), whereas layer 2 comprises the physical flow network (micro-network) as introduced in Sect. 2, denoted as G(V, E). Each node in the macro-network Ĝ (V,Ê) represents a different population. All populations in Ĝ have the same topology of their micro-network G. Hence, they are characterized by the same incidence matrix B and the same cost functions f 1 and f 2 . However, each agent in the population has a different realization of the demand. The aim is to obtain a feasible solution to the two-stage stochastic program (1) by reaching consensus on the capacities of the edges between any agent in the population and across all populations. We use the macro-network to model a distributed optimization setting. In this section we present a heuristic approach that turns the two-stage stochastic program (1) into a mean-field game, where each player corresponds to an agent in each population, and it is assigned to a primal-dual dynamics according to its demand realization. This new approach represents a distributed optimization problem, which enables the agents to reach consensus on the edges' capacities, and it provides a feasible solution to problem (1) by taking into account the communication between populations.
Consider |V| = p populations such that each agent in our game belongs to a population k ∈ {1, … , p} and is characterized by a primal-dual dynamics (4) obtained for a particular realization of the demand. The generic agent in population k is characterized by its state x(t) which involves the decision variables (flows and capacities) and the Lagrange multipliers of problem (1), namely

3
Stochastic programming with primal-dual dynamics: a mean-field… Here, is a constant sample randomly extracted from the same distribution of the demand in (1) and v is an additional control input that we use to force consensus on the edge capacities c. Note that (5) can be written in compact form as . Now let us consider a probability density function that describes the density of the agents in a population in state x at time t, m k (x, t) , with the property ∫ ℝ 3m+n m k (x, t)dx = 1 . Then the mean states can be computed following So far, we have considered the populations separately. To describe the interaction between populations we take into account the topology of the macro-network Ĝ . Let us associate population k with agent k. Then we can introduce an interaction topology between agents, say Ĝ = {V,Ê} , and we can define the neighbors of an agent k in Ĝ as: In order to reach consensus, each population is interested in knowing the local average state of its neighbours, which is given by: Here, |N(k)| denotes the cardinality of neighbor set k. Let us now consider a running cost function g(x, k , v) and a terminal cost function ( k , x) , which are defined as follows: The first term in the expression for g assigns a penalty to the state deviating from the mean k . The second term assigns a penalty on control. The matrices Q, S and R are diagonal matrices of compatible dimensions.
Every agent in population k wishes to solve the following problem: subject to The optimal control problem as in (7) has the structure of an optimal tracking problem. The objective function in (7) captures the minimization of the total expected cost. The aim is to reach consensus by minimizing the deviations of the state x from the local average of each population k . This is denoted by the quadratic error term in the running cost function g, and by S( k − x) 2 in the terminal cost function . The term 1 2 v T Rv in the cost g penalises the energy of the control. The affine dynamics ̇x = Ax + B + C in the constraints comes from the primal-dual dynamics explained in the previous section.
For every population k, denote the value of the optimization problem starting at time t and state x by k (x, t) . Let us also denote by m k (x, 0) = m k0 (x) ∈ ℝ the initial density of the agents of population k in state x. This results in the following meanfield game in k (x, t) and m k (x, t): Any solution of (8) is referred to as the mean-field equilibrium, which provides the sub-optimal values for the wait and see variables. Note that the second and fourth equation from (8) are the boundary conditions, while the third equation is the advection equation.
The optimal time-varying state-feedback control can be computed for every single agent in population k and is given by: In this expression, note that the Hamiltonian appears as the argument of the minimizer.

Mean-field equilibrium and convergence
In this section, we obtain an expression for the mean-field control and provide results for the mean-field equilibrium dynamics.

Lemma 1 The mean-field game takes the form:
Additionally, the optimal control is: In this set of equations, the first equation corresponds to the Hamilton-Jacobi-Isaacs equation, and the third equation corresponds to the Fokker-Planck-Kolmogorov equation. The proof of Lemma 1 can be found in Appendix A.
We assume that the time evolution of the common state is known and, subsequently, investigate the solution of the Hamilton-Jacobi equation. Consider the following problem with known k : where The next theoretical result presents the mean-field equilibrium control. In preparation to that, let us consider a probability density function that describes the density of the agents of a population in state c for the capacity of the edges at time t, m k (c, t) , with the property ∫ ℝ m m k (c, t)dc = 1 . Then the mean states can be computed following m c k (t) = ∫ ℝ m cm k (c, t)dc. We define an agent's objective in population k based on the aggregate kth state as: In the following, given a generic matrix A ∈ ℝ (3m+n)×(3m+n) we denote by A c ∈ ℝ m the matrix obtained extracting the rows and columns associated with the only variables c ∈ ℝ m×m . In addition, we consider the following value function: and denote Theorem 1 A mean-field equilibrium for the dynamics of (10) is obtained from the following set of equations: t). where

the inverse does not exist, and L is defined as the graph-Laplacian matrix where the kjth entry is the block matrix:
This result is relevant since we can solve (15) in closed form and we show that consensus is ultimately achieved among agents. The proof of Theorem 1 can be found in Appendix B.

Remark 1
Note that based on the way in which we have defined the mean-field game it is not possible to guarantee convergence to the optimal solution. We can only guarantee convergence of the agents to a feasible solution. It is important to highlight that the mean-field game introduced in this manuscript has similarities with the distributed subgradient algorithm used to solve distributed optimization problems [26]. From (19) we can observe that the term ̃ 1 m c (t) , which corresponds to the costs of the edges capacities, is related to the gradient dynamics of variable c.
On the other hand, the term Lm c (t) enforces consensus, as it aims to minimize the deviation of m c k from c k . It is worth mentioning that in the distributed subgradient method a step-size (t) is applied, and convergence to a feasible solution is only guaranteed for a diminishing step size. Nevertheless, based on (19), one can assume that the step-size for the approach introduced in this paper is always = 1 . There exist other algorithms that can be applied to solve distributed optimization problems. One of these algorithms is the gradient tracking method [26]. This method tracks the gradient of the cost function and not only the value of the optimization variables. By applying the gradient tracking algorithm it has been proven that it is possible to reach consensus to the optimal value. However, the study of the gradient tracking algorithm is outside of the scope of this research.

Simulation example
In this section we present a numerical example of a capacity network design for a supply chain network. We show that the agents in the macro-network reach consensus on the edges' capacities of their respective micro-networks. We analyse the ways in which penalties on control and state influence the dynamics of the agents in the mean-field game. Furthermore, we run various simulations under different scenarios to investigate the sub-optimality of the algorithm built on the mean-field game.
Consider a scale-free network where p = 1000 agents. The agents only consider the local average computed over its neighbors according to (6). Furthermore, we generate stochastic demand according to the normal distribution at every time instance for the nodes of the micro-networks displayed in Fig. 1. This makes the constant term vector C change over time since is now renewed at every time instance. Prior to the simulation, we parametrize the system as follows: Note that in our example, the penalty of using flow u 8 is twice the penalty on the other flows since we wish to keep the flows balancing w 4 and w 3 as much distinct as possible. In general, any route can be incentivised by adjusting penalties and this emphasizes the versatility of our approach. Furthermore, in Table 1 the following additional simulation parameters are defined: time step t , mean and standard deviation of initial states of the primal-dual dynamics ̃ 0 and ̃ 0 ; mean and standard deviation of demand at node 3, denoted ̃ 3 and ̃ 3 ; and mean and standard deviation of demand at node 4, denoted ̃ 4 and ̃ 4 . The multi-population considered allows the simulation algorithm to be initialized with different initial conditions and also allows to accommodate different realizations of the wait and see variables.
We use the parameters mentioned above as input for the following simulation algorithm by which we compute the states of each agent over time.
Figures 3a-c depict the evolution of the edges' capacities over time for all agents. Figure 3a shows a simulation where the penalty on control equals the penalty on state deviation, i.e., Q = R = 1 . Figure 3b presents a simulation of the mean-field  . 3 Simulation of the mean-field game for different values of the penalty on control R and the penalty on state deviations Q game where the penalty on state deviation is increased, namely Q = 10 , R = 1 . On the contrary, the results of a mean-field game simulation where Q = 1 , R = 10 can be found in Fig. 3c. By setting the penalty on state deviation sufficiently high as it was done in the simulation of Fig. 3b, the states converge to closer values compared to the case where Q and R are equal. This is in accordance to intuition as in the case of Fig. 3b states deviation from k incur in a higher penalty. Conversely, by setting the penalty on control sufficiently high, convergence is obtained quicker in comparison to the simulation where Q and R are equal. In this situation, the system is penalized more to make adjustments and this explains the thinner lines in the converged region. Another observation is that, in this case, the best strategy to pursue the minimization problem is to choose a different solution compared to the simulations of Fig. 3a, b.
From the examples presented above, let us consider the best strategy for the minimization, which is when the penalty on the states deviations is Q = 1 and the penalty on the control is R = 10 . To analyse the sub-optimality of the algorithm for the mean-field game, we compare the flows and capacities obtained from the consensus in the mean-field game with the values obtained by solving the primal-dual dynamics as it is explained in Sect. 3, and with the real optimal values obtained by solving (1). We compare the capacities and flows obtained from the different methods by evaluating the values of the objective function in (1). Furthermore, for the mean-field game we obtain a lower bound of the objective function by relaxing the constraint on consensus on the capacities of the edges.
In Fig. 4a we display the evolution over time of the objective function for the different methods applied to solve the two-stage stochastic program (1). Note that when the agents reach consensus on the capacities, the value of the objective function obtained from the mean-field game becomes an upper bound of the real optimal value. On the other hand, we can observe that the value of the objective function when we relax the constraint on consensus becomes a lower bound of the value of the objective function of the mean-field game when we take into account the consensus constraint. Let us denote by F MFG , F NC , F PD , F O the value of the objective function obtained from the mean field game, the value of the objective function obtained by relaxing the constraints on consensus, the value of the objective function obtained from the primal-dual dynamics and the value of the objective function obtained from the real optimum, respectively. To analyse the differences between the different approaches, we compute the percentage errors between the values of the objective function obtained from the mean-field game and the values of the objective function using the other methods as follows: In Fig. 4b we plot the percentage errors between the values of the objective function obtained from the mean-field game and the values of the objective function using the other methods. In the long term, the percentage error when we compare the results of the mean-field game with the results from the primal-dual dynamics is 8.47% . On the other hand, compared to the optimal value obtained by solving (1), the percentage error in the long term is 1.96%.
We develop the same procedure explained above for fifteen different scenarios. In these scenarios we consider different distributions of the demand, different topology for the micro-network and for the macro-network, different vectors f 1 and f 2 , and different values for the penalty on the states deviations and penalty on the control. We run fifty simulations of each scenario and we obtain the average of the percentage error over all simulations.
In the first three scenarios we change the mean and standard deviation of the demand at nodes 3 and 4, and we keep the rest of the simulation parameters as in the first example. For the first scenario the mean and standard deviation are ̃ 3 = 10 , ̃ 4 = 20 , ̃ 3 = 3 and ̃ 4 = 1 . For the second scenario the parameters are ̃ 3 = 30 , ̃ 4 = 30 , ̃ 3 = 1 and ̃ 4 = 3 . For the third scenario the mean and standard deviation are ̃ 3 = 5 , ̃ 4 = 5 , ̃ 3 = 3 and ̃ 4 = 3 . The percentage error for each scenario is depicted in Fig. 5. Observe that for lower mean and higher variance, the average percentage error increases.
For the next three scenarios we change the topology of the micro-network, and we keep the rest of the simulation parameters as in the first example. In the first scenario the network consists of n = 6 nodes and m = 11 edges. For the second scenario the micro-network consists of n = 6 nodes and m = 7 edges. For the third scenario we increase the number of nodes to n = 10 and the number of edges to m = 16 . The percentage error for each scenario is depicted in Fig. 6. Note that when we reduce the connections between the nodes, the average percentage error increases. Now we consider three different scenarios by changing the topology of the macro-network, and we keep the rest of the simulation parameters as in the first example. We design the topology of the macro-network by using a power law distribution, in which few nodes have high connectivity and most nodes have very low connectivity. In the first scenario the average degree of the nodes is four. In the second scenario the average degree is two. In the third scenario the average degree is six. The complex networks that we consider for these scenarios are depicted in Fig. 7a-c. As one can observe in Fig. 7d-f there is not a significant difference in the average percentage error between these three scenarios. The next three scenarios are generated by changing the cost coefficients f 1 and f 2 and by keeping the rest of the simulation parameters as in the first example. In the first scenario the cost coefficients are f We can observe in Fig. 8 that when we increase the cost coefficients for the capacities and flows, the average percentage error with respect to the primal-dual dynamics and with respect to the real optimal value are very similar.
The last three scenarios consist in changing the penalty on state deviation Q and the penalty on control R, and keep the rest of the simulation parameters as in the first example. In the first scenario the penalty on state deviation and the penalty on control are the same, namely R = Q = 1 . In the second scenario the  penalty on state deviation is Q = 10 and the penalty on control is R = 1 . In the third scenario we take the penalty on state deviation Q = 1 and the penalty on control R = 10 . In accordance with the first examples analysed at the beginning of this section, we observe that when the penalty on control is sufficiently high the convergence is obtained quicker, and the percentage error is smaller than in the other two scenarios (Fig. 9).
The average percentage errors, in the long-term, for the different scenarios when we compare the results of the mean-field game with the results from the primal-dual dynamics and with the optimal value obtained by solving (1) are summarized in Table 2.  In the long term, the total average of the percentage error over all scenarios and simulations when we compare the results of the mean-field game with the results from the primal-dual dynamics is 11.46% . On the other hand, compared to the optimal value obtained by solving (1), the average percentage error in the long term is 7.43%.

Concluding remarks and future work
In this paper, we have provided a heuristic method to solve a two-stage stochastic program. We have illustrated the method on a supply-chain network design problem. The underlying idea of the method is to replace the original stochastic program with a mean-field game with a large number of agents. Each agent controls a primal-dual dynamics with the double aim of moving along the gradient descent/ascent while at the same time reaching consensus on the here and now variables, which are represented by the edge capacities. We show that the resulting mean dynamics mimics elastic averaging stochastic gradient algorithms. The heuristics has been implemented on a numerical example for different values of the parameters such as the penalties on state deviation and control.
Future work involves the generalization of the method to nonlinear convex problems, the analysis of the sub-optimality performance at least for specific scenarios,  (39) otherwise.