Decomposition and Mean-Field approach to Mixed Integer Optimal Compensation Problems

Mixed integer optimal compensation deals with optimization problems with integer- and real-valued control variables to compensate disturbances in dynamic systems. The mixed integer nature of controls could lead to intractability in problems of large dimensions. To address this challenge, we introduce a decomposition method which turns the original n-dimensional optimization problem into n independent scalar problems of lot sizing form. Each of these problems can be viewed as a two-player zero-sum game, which introduces some element of conservatism. Each scalar problem is then reformulated as a shortest path one and solved through linear programming over a receding horizon, a step that mirrors a standard procedure in mixed integer programming. We apply the decomposition method to a mean-field coupled multi-agent system problem, where each agent seeks to compensate a combination of an exogenous signal and the local state average. We discuss a large population mean-field type of approximation and extend our study to opinion dynamics in social networks as a special case of interest.


Introduction
Mixed integer optimal compensation arises when optimizing a mix of integer- and real-valued control variables in order to compensate for disturbances in dynamic systems. Mixed integer control can be viewed as a specific sub-field of optimal hybrid control [1], addressed recently also in a receding horizon framework [2]. Optimal integer control problems have been receiving growing attention and are often categorized under different names (e.g., alphabet control [3,4]). Handling integer control requires more than standard convex optimization techniques. It is known that new structural properties of the problem play important roles in mixed integer control; as an example, see multimodularity presented as the counterpart of convexity in discrete action spaces [5]. We should note that there is vast literature on mixed integer programming [6], and it is in this context that we cast the problem addressed in this paper. For a survey of solution methods for mixed integer lot sizing models circa the early 1990s, we refer the reader to [7]. Mixed integer optimal control has been dealt with in [8][9][10][11].
Highlights of the main results and relationship with the relevant literature. We build on existing results in the lot sizing literature that convert lot sizing problems into shortest path problems. More details on this conversion can be found in [12, p. 98] and [13,14]. The underlying idea is summarized in a figure contrasting (middle) a one-shot production policy with (bottom) a two-period production policy.
Specifically, the paper makes three main contributions. First, we formulate the mixed integer optimal compensation problem. Second, we provide a performance analysis of the decomposition method that reformulates the n-dimensional mixed integer problem as n independent uncertain lot sizing systems. Each of these problems can be viewed as a two-player zero-sum game, which introduces some element of conservatism. Third, we view each decomposed mixed integer problem as a shortest path problem and solve the latter through linear programming.
The conservatism arising from the robust decomposition and approximation can be reduced if we operate in accordance with the predictive control technique: i) optimize controls for each independent system based on the prediction of other states, ii) apply the first control, iii) provide measurement updates of other states and re-iterate.
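The three steps above can be sketched as a generic receding-horizon loop. All helper names here (`solve_scalar_lot_sizing`, `predict_others`, `measure`) are hypothetical placeholders for the decomposed subproblem solver and the measurement update, not functions defined in the paper:

```python
def receding_horizon(x0, N, solve_scalar_lot_sizing, predict_others, measure):
    """Apply steps i)-iii): optimize per subsystem, apply first control, re-measure."""
    x = list(x0)
    n = len(x0)
    applied = []  # first controls actually applied at each stage
    for tau in range(N):
        controls = []
        for i in range(n):
            others = predict_others(i, x, tau)               # i) predict other states
            u_seq = solve_scalar_lot_sizing(i, x[i], others, N - tau)
            controls.append(u_seq[0])                        # ii) keep first control only
        applied.append(controls)
        x = measure(x, controls)                             # iii) measurement update
    return applied
```

The loop discards all but the first control of each open-loop plan, which is exactly how the conservatism of the worst-case decomposition is mitigated by fresh measurements.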
There are several differences between the problem treated and the approach adopted in this paper and those in the related literature. The difference from [2], for example, is that here we focus on a smaller class of problems that can be solved exactly by simply relaxing the integer constraints. In that respect, the lot sizing like model used in this paper has much to do with the inventory example briefly mentioned in [1]. There, the authors simply include the example in a large list of hybrid optimal control problems but do not address the issue of how to fit general methods to this specific problem. Here, however, we emphasize the computational benefits that can be derived from the "nice structure" of the lot sizing constraints matrix. Binary variables, used to model impulses, match linear programming in [15]. There, the linear reformulation is a straightforward derivation of the (inverse) dwell time conditions that have first appeared in [16]. Similarity with [15] is the use of total unimodularity to prove the exactness of the linear programming reformulation.
Differences are in the procedure itself upon which the linear program is built. The shortest path model is an additional new element which distinguishes the present approach from that of [15].
We also provide in the paper a discussion on a special case of interest where each agent seeks to compensate a combination of the exogenous signal and the local state average. Here, the model is suitable to capture opinion fluctuations (sawtooth waves) in social networks [17]. We assume that the opinion dynamics are influenced by three different factors: the media, whose influence is modeled as an exogenous signal; the presence of a stubborn agent who is able to reset other agent's opinions; and the interactions among the agents (the endogenous factor). An underlying assumption here is that a reset for a particular agent occurs whenever that agent chooses to meet with the stubborn agent, in which case the binary control is set to one. Also, the interactions among the agents are captured by an averaging process.
In the sense above, our decomposition idea is similar to mean-field methods in large population consensus. The mean-field theory of dynamical games with large but finite populations of asymptotically negligible agents (as the population size grows to infinity) originated in the work of M.Y. Huang, P.
E. Caines and R. Malhamé [18][19][20] and independently in that of J. M. Lasry and P.L. Lions [21][22][23], where the now standard terminology of Mean Field Games (MFG) was introduced. In addition to this, the closely related notion of Oblivious Equilibria for large population dynamic games was introduced by G.
Weintraub, C. Benkard, and B. Van Roy [24] in the framework of Markov Decision Processes. This theory is very versatile and is attracting an ever-increasing interest with several applications in economics, physics and biology (see [25][26][27]). From a mathematical point of view, the mean-field approach leads to the study of a system of partial differential equations (PDEs), where the classical Hamilton-Jacobi-Bellman equation is coupled with a Fokker-Planck equation for the density of the players, in a forward-backward fashion. The decomposition method proposed here requires that each agent i computes in advance the time evolution of the local average (see, e.g., the Fokker-Planck-Kolmogorov equation in [23,[28][29][30][31][32][33]). However, since this is practically impossible, we use here the predictive control method to approximate the computation of the solution.
The main contributions of this work can therefore be summarized as follows: First, we draw a connection between game theory and a class of mixed integer control problems by decomposing an n-dimensional optimization problem into n two-player zero-sum games. Second, by reformulating decomposed problems as shortest path problems, we show that mixed-integer optimal compensation problems are tractable under certain assumptions. Third, we leverage this connection to develop a mean-field game approach to study the large-scale optimization problem using a large population game framework.
A preliminary version of this paper was presented at the 2012 American Control Conference [34]. In addition to what was presented in [34], the current paper includes a detailed analysis of the case where a large number of agents interact and this interaction is described through a state averaging process.
For this case we provide a macroscopic description of the system in terms of consensus to the average mass distribution. This part of the paper includes an additional example (Example 6.3) that illustrates possible population evolutions. A further element, which is not present in [34], is an experimentally driven discussion on performance and complexity of the method provided in Example 6.1.
The paper is organized as follows. We present the problem statement in Section 2. We then move to present the decomposition method in Section 3.
In Section 4, we turn to introducing the shortest path reformulation and the linear program. In Section 5, we discuss the case where the local state average appears in the dynamics. In Section 6, we present three numerical examples to illustrate the results in the paper. We conclude the paper with a recap in Section 7.

Mixed Integer Optimal Compensation (MIPC)
In mixed integer optimal compensation problems, we have continuous states x(k) ∈ R n , continuous controls u(k) ∈ R n , discrete controls y(k) ∈ {0, 1} n , and continuous disturbances w(k) ∈ R n , where k = 0, 1, . . . is the time index.
Evolution of the state over a finite horizon of length N is described by a linear discrete-time (difference) equation in the general form (1) below, where A and E are matrices of compatible dimensions and x(0) = ξ 0 ≥ 0 is a given initial state. Continuous and discrete controls are linked through the general capacity constraints (2), where the (scalar) parameter c is an upper bound on control, with the inequalities in (1) and (2) to be interpreted component-wise.
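The displayed equations (1)-(2) are not reproduced above. A plausible form, offered here as an assumption consistent with the lot sizing rewriting used later (where x_i(k+1) = x_i(k) + u_i(k) − d_i(k)) and with the scalar capacity parameter c, is:

```latex
x(k+1) = A\,x(k) + u(k) + E\,w(k), \qquad x(k) \ge 0, \qquad x(0) = \xi_0, \tag{1}
\qquad
0 \le u(k) \le c\,y(k), \qquad y(k) \in \{0,1\}^n. \tag{2}
```

Under this reading, the binary variable y_i(k) switches the ith continuous control on or off, and c caps the control magnitude per component.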
The above dynamics are characterized by one discrete and one continuous control variable per each state. Starting from nonnegative initial states, we force the state to remain confined to the positive orthant, which may describe a safety region in engineering applications or reflect the desire to prevent shortfalls in inventory applications. The final state, x(N ), is forced to be equal to zero, which corresponds to saying that the control u(k) has to "compensate" the cumulative effects of the disturbances Ew(k) and term Ax(k) over the given horizon.
The following assumption describes the common situation where the disturbance seeks to push the state out of the desired region. The disturbance values are given at the outset and held fixed thereafter. Each column of matrix E establishes how each disturbance component influences the evolution of the state vector.

Assumption 1 (Unstabilizing disturbance effects)
where the inequality is to be interpreted component-wise.
The control actions push the state away from the boundaries into the positive orthant, thus counteracting the destabilizing effects of the disturbances. However, controlling the system has a cost, and "over-acting" on it is penalized; this is quantified through a cost/objective function. This function, to be minimized with respect to y(k) and u(k), is linear and includes proportional, holding, and fixed cost terms expressed by the parameters p_k, h_k, and f_k respectively, where ⟨·, ·⟩ denotes the Euclidean inner product. The problem of interest is thus completely characterized by (1)-(4). This hybrid minimization problem can be turned into a mixed integer linear program by using the standard method discussed next. Henceforth we refer to (1)-(4) as (MIPC).
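The displayed cost function is not reproduced above. A plausible form of (4), offered as an assumption consistent with the proportional, holding, and fixed parameters p_k, h_k, f_k, the inner-product notation, and the example costs used later in the paper, is:

```latex
J(u, y) \;=\; \sum_{k=0}^{N-1} \Big( \langle p_k,\, u(k) \rangle \;+\; \langle h_k,\, x(k) \rangle \;+\; \langle f_k,\, y(k) \rangle \Big). \tag{4}
```

Here the term in p_k prices the continuous control, the term in h_k prices holding a positive state, and the term in f_k charges a fixed cost whenever a binary control is switched on.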

Introducing some Structure on A
With regard to (1), we can isolate the dependence of each state component on the other ones and rewrite (1) in a way that establishes similarity with standard lot sizing models [7]. Equation (5) is a straightforward representation of (1). To preserve the nature of the problem, which has stabilizing control actions playing against unstabilizing disturbances, we assume that the influence of other states on state i is relatively "weak". In other words, we assume that the influence of Bx(k) is small compared with the unstabilizing effects of the disturbances captured by the term Ew(k).

Assumption 2 (Weak coupling)
where the inequality is again to be interpreted component-wise.
Essentially, the states' mutual dependence expressed by Bx(k) only emphasizes or reduces "weakly" the destabilizing effects of the disturbances. In the next section, we present a decomposition approach that translates dynamics (5) into n scalar dynamics in "lot sizing" form [7].

Robust Decomposition
By "robust decomposition" we mean a transformation through which dynamics (5) are replaced by n independent uncertain lot sizing models of the form (8), where x_i(k) is the inventory, d_i(k) the demand, u_i(k) the reordered quantity, and D_i^k ⊂ R the uncertainty set. Recall that in (5) the disturbance is given at the beginning and fixed that way.
We use those values of the disturbance to determine the set D_i^k in (8), where we denote by B_i• the ith row of the matrix B, with the same convention applying to E_i•. Following the decomposition, each lot sizing model is controlled by an agent i (whose state is x_i) playing against a virtual opponent which selects a worst-case demand; this can be viewed as a two-player game.
Our next step is to make the n dynamics in the form (8) mutually independent. Toward that end, we introduce X^k as the set of admissible x(k) and observe that this set is bounded for bounded d_i(k). The set X^k can be defined in two steps. First, we assume that the states never leave a given region and compute the worst-case vector x(k) in the region, namely the vector x(k) that, once substituted in (9), has the effect of pushing the ith state out of the safe region. Then, we check whether the resulting trajectory still lies within the region.
Boundedness of X^k means that there exists a scalar φ > 0 bounding the state components. In view of this, it is possible to decompose the system by replacing the current demand d_i(k) with the maximal or minimal demand computed in (10)-(11), where [B_ij]^+ denotes the positive part of B_ij, i.e., max{B_ij, 0}, and [B_ij]^- the negative part. In the following we write compactly d_i^e(k), e ∈ {+, −, nil}, to generically address the maximal demand (10) when e = +, the minimal demand (11) when e = −, and the exact demand (9) when e = nil. From the above preamble we derive the uncertainty set D_i^k. Equation (10) describes the demand that would push the state out of the positive orthant in the shortest time; likewise, (11) describes the demand that would do so in the longest time. To complete the decomposition, it remains to transform the objective function (4) into n independent ones. Note that because of the linear structure of J(u, y) in (4), the cost separates across components. Thus, we have transformed the original problem into n independent mixed integer minimization problems of the form (12)-(14) below.
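A hedged sketch of the worst-case demand bounds, under the assumption that the scalar demand has the form d_i(k) = −B_i•x(k) − E_i•w(k) and that every state component stays in [0, φ]; the exact expressions (10)-(11) are not reproduced above, so the signs below are an illustrative assumption:

```python
import numpy as np

def demand_bounds(B, E, w_k, phi, i):
    """Worst-case (maximal) and best-case (minimal) demand for subsystem i.

    Assumed model: d_i(k) = -(B[i, :] @ x(k) + E[i, :] @ w_k), with each
    state component confined to [0, phi]. Coupling terms j != i only.
    """
    exo = -E[i, :] @ w_k                  # exogenous part, fixed and known
    pos = np.maximum(B[i, :], 0.0)        # [B_ij]^+ = max{B_ij, 0}
    neg = np.maximum(-B[i, :], 0.0)       # [B_ij]^- = max{-B_ij, 0}
    pos[i] = neg[i] = 0.0                 # drop the self-term
    d_max = exo + phi * neg.sum()         # x_j chosen to inflate the demand
    d_min = exo - phi * pos.sum()         # x_j chosen to deflate the demand
    return d_min, d_max
```

The virtual opponent of the two-player game picks `d_max`; the minimal demand `d_min` is what a cooperative neighbor configuration would produce.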
In the spirit of predictive control, we solve the decomposed problem (12)-(14) for τ = 0, . . . , N − 1, with e(τ) = nil, e(k) = e for k > τ, e ∈ {nil, +, −}, and with ξ_i^τ being the measured state at time τ. When the superscript is e = nil, we simply write (MIPC_i).
Denote by (MIPC)^r the relaxation of (MIPC_i) where 0 ≤ y ≤ 1.
Lemma 3.1 The following relations hold:

Shortest Path and Linear Programming
What we establish here is that relaxing and massaging the problem at hand in a certain manner leads to a shortest path reformulation of the original problem. Shortest path formulations are based on the notion of regeneration interval, as discussed next.
Let us borrow from [7] the concept of regeneration interval and adapt it to the generic minimization problem i defined by (12)-(14).
Given a regeneration interval [α, β], we can define the accumulated demand over the interval, d_i^{αβ}, and the residual demand, r_i^{αβ}, as in (15). The path we take now is to reformulate problem (12) through a variable transformation: the variables y_i^{αβ}(k), together with their continuous companions, indicate in which periods full or partial batches are ordered. Then, we can use some well-known results from the lot sizing literature to convert the original mixed integer problem (12) into a linear program (LP_i^{αβ}). This model has been extensively used in the lot sizing context.
Equality constraints (17) and (19) tell us that the ordered quantity over the interval has to be equal to the accumulated demand over the same interval.
This makes sense as the initial and final states of a regeneration interval are null by definition. The inequality constraints (18) and (20) impose that the accumulated demand in any subinterval may not exceed the ordered quantity over the same subinterval. Again, this is due to the condition that the states are nonnegative in any period of a regeneration interval. Finally, the objective function (16) is simply a rearrangement of (12) induced by the variable transformation seen above and specialized to the regeneration interval [α, β] rather than being on the entire horizon [0, N ].
The solutions of (LP_i^{αβ}) that are binary are called "feasible". We are now in a position to recall the following "nice property" of (LP_i^{αβ}), presented first by Pochet and Wolsey in [7]. Proof. Note that the constraint matrix of (LP_i^{αβ}) is a 0-1 matrix. We can reorder the constraints so that the matrix has the consecutive-1's property on each column and is therefore totally unimodular.
It then follows that y_i^{α,β} and its continuous companion variables are 0-1 in any extreme solution.
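The total unimodularity argument can be checked numerically on small instances: a 0-1 matrix with the consecutive-1's property on each column has every square submatrix determinant in {−1, 0, 1}. The brute-force check below is illustrative only (it is exponential in the matrix size) and is not part of the paper's method:

```python
import itertools
import numpy as np

def is_totally_unimodular_bruteforce(A):
    """True iff every square submatrix of A has determinant in {-1, 0, 1}."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = round(np.linalg.det(A[np.ix_(rows, cols)]))
                if d not in (-1, 0, 1):
                    return False
    return True
```

Total unimodularity of the constraint matrix is exactly what guarantees that the extreme points of the linear relaxation are integral, so the LP solver returns 0-1 solutions for free.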

Shortest Path
We now resort to well-known results on lot sizing to arrive at a shortest path reformulation.
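The construction can be sketched as a Bellman recursion over regeneration intervals: nodes are the period boundaries 0, . . . , N, and an arc (a, b) represents a regeneration interval served by a single order at period a. The sketch below is a generic Wagner-Whitin-style dynamic program with constant cost parameters (an assumption), not the paper's exact linear program:

```python
def interval_cost(d, a, b, f, p, h):
    """Cost of a single order at period a covering demands d[a..b-1]."""
    qty = sum(d[a:b])
    # units for period t are held in stock for (t - a) periods
    holding = h * sum((t - a) * d[t] for t in range(a, b))
    return f + p * qty + holding

def shortest_path_lot_sizing(d, f, p, h):
    """Shortest path over regeneration intervals; returns the optimal cost."""
    N = len(d)
    V = [0.0] * (N + 1)              # V[b]: min cost to satisfy d[0..b-1]
    for b in range(1, N + 1):
        V[b] = min(V[a] + interval_cost(d, a, b, f, p, h) for a in range(b))
    return V[N]
```

Each arc cost is computable in closed form, so the mixed integer structure disappears: the integer decisions (when to order) are encoded as the choice of path through the nodes.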

Receding Horizon Implementation of (LP i )
The main difference between the lot sizing model [7] and the (LP_i) arrived at here is that in (LP_i) the initial state is nonzero. Successive linear programs (LP_i) are linked together by the initial state condition expressed in (13), which we rewrite below. To address this issue, we need to elaborate on how to compute the accumulated demand in (15). Take for [τ, t] any interval with x_i(τ) = ξ_i^τ > 0.
Then, condition (15) needs to be revised as in (22): the effective demand over an interval is the accumulated demand reduced by the inventory stored and initially available at the warehouse. From a computational standpoint, the revised expression (22) has a different effect depending on whether the accumulated demand exceeds the initial state or not, as discussed next.
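The revised accumulated demand can be sketched as follows; since (22) is not reproduced above, the flooring at zero is our reading of the two cases just mentioned (initial inventory smaller or larger than the accumulated demand):

```python
def effective_demand(d, tau, t, xi):
    """Accumulated demand over [tau, t], net of the initial inventory xi.

    If the inventory on hand already covers the interval, the effective
    demand is zero and no order is needed (assumed interpretation of (22)).
    """
    accumulated = sum(d[tau:t + 1])
    return max(0.0, accumulated - xi)
```

With this correction, each receding-horizon linear program can reuse the zero-initial-state shortest path machinery unchanged.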

Mean Field Coupling
In this section, we discuss a special case of interest where each agent seeks to compensate a combination of the exogenous signal and the local state average.
We assume that the worst-case demand introduced earlier takes into account also the mean-field influence of the population behavior on the ith dynamics. Thus, each agent plays his best-response against the population behavior.
The resulting model is a mean-field game, which is suitable to describe fluctuations (sawtooth waves) in opinion dynamics. Indeed, we can interpret the state of each agent as her opinion on a certain issue, the exogenous signal as the media influence, and the control as an instantaneous reset on the opinion subsequent to a meeting with a stubborn agent [17]. In addition, the dependence on endogenous factors, represented by the averaging process, is the result of the interactions among the agents. In this case, our decomposition methodology becomes similar to mean-field methods in large population consensus [20,33,35]. We discuss below the mean-field approximations as well as the application of predictive control methods to approximate the computation.

Multi-Agent System Model
Consider a graph G = (V, E) with a set of vertices V = {1, . . . , n} and a set of edges E ⊆ V × V. Denote by N_i the neighborhood of agent i, i.e., the set of agents j with (i, j) ∈ E. We can associate with the graph G the normalized graph Laplacian matrix L ∈ R^{n×n}.

A special case of interest is when B in (5) is B = −εL for some sufficiently small scalar ε > 0. In this case dynamics (5) become:
Essentially, the above dynamics together with the constraint x(N) = 0 arise in all those situations where each agent i = 1, . . . , n tries to compensate a combination of the exogenous signal w(k) and the local state average m̄_i(k). Elaborating along the lines of the robust decomposition (8), we can then compute the disturbance taking into account the influence of the local average on the exogenous signal. Note that Assumption 2 in this case says that the exogenous signal is dominant compared to the weak influence from neighbors.
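A minimal sketch of the coupling objects above, with illustrative helper names. The random-walk normalization L = I − D⁻¹A is an assumption (chosen because it makes W = I − L row stochastic, a property used later in the macroscopic analysis):

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1} A with D the diagonal degree matrix (row-normalized)."""
    deg = adj.sum(axis=1)
    return np.eye(len(adj)) - adj / deg[:, None]

def local_average(adj, x, i):
    """Average state over the neighborhood N_i of agent i."""
    nbrs = np.flatnonzero(adj[i])
    return x[nbrs].mean()
```

With this normalization, the coupling term −εL x(k) pulls each agent toward its local neighborhood average at rate ε.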
In principle, for the decomposition method to be exact, each agent i should know in advance the time evolution of the local average m̄_i(k), for k = 0, . . . , N. However, this may not be feasible. One way to approximate the local average m̄_i(k) is through mean-field methods. Under the further assumption that the number of agents is large and the agent dynamics are symmetric, the local average can be characterized through the finite-difference approximation of the continuity or advection equation that describes the transport of a conserved quantity [35]. Another way to deal with the problem is to use the predictive control method to approximate the computation. More specifically, when we solve the problem over the horizon from k̄ ≥ 0 to N, we assume that neighbor agents communicate their state, so at least the first sample m̄_i(k̄) is exact. In the later stages of the horizon, each agent approximates the local average by specializing (10)-(11) to our case. Note that the maximal and minimal demand can be obtained by assuming that all agents j ≠ i are at 0 or φ, respectively, which yields the corresponding expressions for agent i. Alternatively, this corresponds to an explicit expression for the uncertainty set D_i^k. The above setup includes the case where agents are homogeneous, as explained next.
Homogeneous agents. Within the realm of mean-field coupling, a particularly interesting case is the one where agents are homogeneous in the sense that they behave similarly when at the same state. For these problems, a main question is the asymptotic population behavior, i.e., the behavior of the population when the number of agents is large.
Suppose that all agents face the same disturbance comprised of a constant value plus a random walk, i.e., ω_i(k) is the sum of a constant term and a random walk with coefficient σ_i, for all agents i.

Denoting the saturation function by sat(·), the system dynamics take the form (23), where u_i(k) is an (s, S) strategy (see, e.g., [36]). Essentially, the control restores the original upper threshold S any time the stocked inventory (the state) goes below a lower threshold s. Such a policy has been proven optimal in the presence of fixed costs in a number of inventory applications. Note that the saturation function is used here only to avoid state oscillations when the agents are far enough from the local average.
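The (s, S) restocking rule described above can be sketched as a simple simulation. The order-before-demand timing within each period is an assumption made for illustration:

```python
def s_S_simulate(x0, demands, s, S):
    """Simulate an (s, S) policy: when x < s, order back up to S."""
    x, orders = x0, []
    for d in demands:
        u = (S - x) if x < s else 0.0    # restore upper threshold S below s
        orders.append(u)
        x = x + u - d                    # inventory balance
    return x, orders
```

The fixed cost charged per order is what makes this threshold structure attractive: ordering in occasional large batches amortizes the fixed cost, unlike a policy that reorders every period.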
Our goal is now to provide a macroscopic description of the system and analyze the corresponding behavior. To do this, we borrow from [37] a modeling approach based on stochastic matrices. Let W = I − L be a row stochastic matrix, i.e., W1 = 1. The system equation (23) can be rewritten accordingly. Given the distribution m(k) followed by x(k), denote the corresponding average as m̄(k) = (1/n)⟨1, x(k)⟩. Using the property 1^T W = 1^T, we can derive for the average a recursive equation, where ω(k) is the vector whose ith component is ω_i(k).
The above is a stochastic process whose first-order moment is generated by E[m̄(k+1)] = E[m̄(k)] + const. + (1/n)⟨1, u(k)⟩. Now, our aim is to analyze the convergence of the agents' opinions to their average. Toward that end, define M = (1/n) 1 ⊗ 1. Then for a given vector x(k) we have Mx(k) = ((1/n) 1 ⊗ 1)x(k) = m̄(k)1. With the above in mind, the deviation of each agent state x_i(k) from the average m̄(k) is captured by the vector z(k) = x(k) − Mx(k). If agents reach average-consensus, i.e., their opinions all converge to the average, then the variable z(k) goes to zero. After some transformations, we obtain for z(k) an iteration which, after a few recursions, relates z(k) to the initial discrepancy z(0) and to the sequences of inputs ω(k) and u(k). If there exists a τ > 0 such that ‖z(τ)‖ ≤ ε for a sufficiently small positive ε, the agents are regarded as having reached consensus. Proof. First, note that from σ_i = 0 for all i and from homogeneity, all agents evolve under identical dynamics.

A closer look at the first equation of dynamics (29) reveals that a higher velocity x_2(k) leads to a faster decrease of position x_1(k+1). Similarly, the second equation tells us that a higher position x_1(k) induces a faster increase of velocity x_2(k+1) because of some elastic reaction. In both equations, the nonpositive disturbances w_i(k) ≤ 0 seek to push the states x_i(k) out of the positive quadrant, in accordance with Assumption 3. Their effect is counterbalanced by the positive control actions u_i.
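Under the stated properties that W is row stochastic and 1^T W = 1^T, the deviation z(k) = x(k) − Mx(k) admits the closed-form update z(k+1) = Wz(k) + (I − M)(ω(k) + u(k)). This closed form is our reconstruction from the text, stated as an assumption, and it can be checked numerically:

```python
import numpy as np

def deviation_step(W, z, w_k, u_k):
    """One step of the deviation-from-average recursion (reconstructed form).

    Assumes W1 = 1 and 1^T W = 1^T, so that M W = W M = M
    with M = (1/n) * ones(n, n).
    """
    n = len(z)
    M = np.full((n, n), 1.0 / n)
    return W @ z + (np.eye(n) - M) @ (w_k + u_k)
```

The check below propagates x(k+1) = Wx(k) + ω(k) + u(k) directly and verifies that projecting out the average gives the same deviation as the recursion.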
Also, acting on the parameter κ we can easily guarantee the "weak coupling" condition given in Assumption 2.
Turning to the capacity constraints (2), for this two-dimensional example they can be rewritten in scalar form. The objective function (4) takes the form Σ_k (⟨1_n, u(k)⟩ + ⟨1_n, x(k)⟩ + 100⟨1_n, y(k)⟩). Next, we decompose dynamics (29) into the scalar lot sizing form (13), which we rewrite accordingly. As regards the estimated demand d_i^+, a natural choice is given in (31), where we have denoted by x̂_1(k) (respectively x̂_2(k)) the estimated value of state x_1(k) (respectively x_2(k)) in the dynamics of x_2(k) (respectively x_1(k)). Now, the question is: which expression should be used to represent the set of admissible state vectors, X^k, appearing in equation (10)? A possible answer is given by (32). Let us elaborate on these equations. The variable x̂_2(k) is used in the evolution of d_1^+(k), as in the first equation of (31). Because of the positive contribution of the term κx̂_2(k) to d_1^+(k), a conservative approach suggests taking for x̂_2(k) a possible upper bound of x_2(k), and this is exactly the spirit behind the evolution of x̂_2(k) as expressed in the second equation of (32). Here, x̄_1 is an average value for x_1. A similar reasoning applies to x̂_1(k), used in the evolution of d_2^+(k) as in the second equation of (31). We now observe a negative contribution of the term −κx̂_1(k) to d_2^+(k) and therefore take for x̂_1(k) a possible lower bound of x_1(k), as shown in the first equation of (32).
We can now move to show and comment on our simulated results.
We have carried out two different sets of experiments. In line with the weak coupling assumption (see Assumption 2), we have set κ small, in the range from 0.01 to 0.225. Such a range works well: as we will see, |κx_i| is always less than |w_i|, which also means Bx(k) + Ew(k) < 0. For the sake of simplicity and without loss of generality, we take capacity C = 3. In a second set of simulations, for a horizon length N = 6, we have studied how the percentage error below varies with the values of the elastic coefficient κ ∈ {0.01, 0.2, 0.225}:

%err = [optimal cost of (MIPC_i) − optimal cost of (MIPC)] / [optimal cost of (MIPC)] × 100%.
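For reference, the percentage error used in the experiments as a one-line helper (the cost arguments are placeholders for the optimal values of the decomposed and exact programs):

```python
def pct_error(cost_decomposed, cost_exact):
    """Relative suboptimality of the decomposition, in percent."""
    return 100.0 * (cost_decomposed - cost_exact) / cost_exact
```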
The role of κ is crucial: recall that κ describes the effective tightness of the coupling between the states x_1(k) and x_2(k). We expect that small values of the coefficient κ, which mean weak coupling of the state components, produce small approximation errors. This is in line with what we observe in Fig. 3, where we plot the error %err as a function of the coefficient κ. For relatively small values of κ in the range from 0 to 0.2, we observe a percentage error not exceeding 1 percent.
A discontinuity at around κ = 0.2 causes the error to jump from about 1% to 20%.
In Fig. 4, for a horizon length N = 6 and a value of κ = 0.225, we depict the exact solution (dashed squares) and the approximate solution (solid triangles) returned by (MIPC) and by (LP_i), respectively. Dotted lines represent predicted trajectories in earlier periods of the receding horizon. We note that the controls u_i(k) never exceed the capacity and are always associated with unitary control actions y_i(k). We also observe four control actions (four peaks at 1). We then compared exact and approximate solutions for a smaller value of κ = 0.2 and observed that, while notable differences remain in the plot of the continuous controls u_1(k), the percentage error reduces to about 1%. We concluded our simulations by noticing that the percentage error is around zero when we further reduce κ to 0.01.

Numerical examples on the mean-field
In this subsection, we present two numerical examples on the mean-field approximation. The objective function is again of the form Σ_k (⟨1_n, u(k)⟩ + ⟨1_n, x(k)⟩ + 100⟨1_n, y(k)⟩).
We also take φ = 13. We plot in Fig. 5 the time evolution of the state x(k).
As expected, the state is nonnegative for all k. Also, the state x(k) converges to a neighborhood of zero of size c − min_k{d_i^-(k)} = 2.

Conclusions

We have applied the decomposition method to a mean-field coupled multi-agent system problem, where each agent seeks to compensate a combination of an exogenous signal and the local state average.
We have discussed a large population mean-field type of approximation as well as the application of predictive control methods.
There are at least three possibilities for future developments. First, one needs to study connections between regeneration intervals and reverse dwell time conditions developed in hybrid/impulsive control. Second, we intend to zoom in on the exploitation of cutting plane methods to increase the efficiency of linear relaxation approximations. Third, it would be of interest to investigate the mean-field large population approximations that arise from the decomposition of the mixed-integer optimal compensation problem.