Accelerate the optimization of large-scale manufacturing planning using game theory

This paper studies a real-world manufacturing problem modeled as a bi-objective integer programming problem. The numbers of variables and constraints are usually large and vary dramatically with the manufacturing data, so it is very challenging to solve such large-scale problems directly with heuristic algorithms or commercial solvers. Since the decision space of such problems is usually sparse and has a block-like structure, we propose to use decomposition methods to accelerate the optimization process. However, existing decomposition methods require the problem to have a strict block structure, which our problem does not have. To deal with such block-like structures, we propose a game theory based decomposition algorithm. The new method overcomes the large-scale issue and, to some extent, guarantees convergence, as it narrows down the search space and accelerates convergence. Extensive experimental results on real-world industrial manufacturing planning problems show that our method is more effective than Gurobi, the world's fastest commercial solver. The results also indicate that our method is less sensitive to the problem scale than Gurobi.


Introduction
This paper focuses on solving large-scale manufacturing planning problems from industrial applications, which can be modeled as bi-objective Integer Programming (IP) problems. One objective is to maximize the order fillrate (i.e., maximize the satisfaction of requirements). The other is to minimize the total cost (i.e., the operation cost, including inventory, production, and transportation costs). Our goal is to schedule future manufacturing in factories every day. In practical scenarios, the IP problems can easily have over 1 billion decision variables and constraints. It is challenging for commercial solvers such as Gurobi and CPLEX to handle such large-scale problems efficiently, and the larger the problem, the more severe the difficulty.
1 Huawei Noah's Ark Lab, Hong Kong, China
On one hand, a natural approach to large-scale problems is decomposition. Many decomposition methods, such as Benders decomposition [5] and Dantzig–Wolfe decomposition [14], are well developed, and there are also many techniques to accelerate them [5,8,12]. However, these decomposition methods require the problems to have strict block structures, and most practical problems are too complicated to meet this requirement.
On the other hand, many heuristic methods have been proposed. The most closely related work is [6], which replaces part of the integer variables with heuristic constraints; experimental results on facility location tasks show its advantage over CPLEX. In this paper, we adopt a similar idea of heuristic constraints. Another related work is [7], in which data-driven algorithms are proposed to boost solvers. Nevertheless, these approaches target relatively small problems, and the problem scale in this paper is much larger. A new method based on symmetry-breaking constraints has also been used for both reformulation and model reduction in solving large-scale LP and IP problems [9,10]. In addition, there is related research in game theory [1,4] that combines different games with various decomposition algorithms. However, the problems considered there are still far smaller and simpler than practical ones. To the best of our knowledge, this paper is the first work to use a game-based decomposition algorithm to solve billion-scale manufacturing planning problems from industrial practice.
Considering the above challenges, we propose a game-based decomposition algorithm and apply it to big manufacturing planning problems. In this algorithm, we first reformulate the IP problem from a game perspective. The two objectives are regarded as two players: one is the leader and the other is the follower. Both players are of relatively small scale compared to the original problem. Optimizing the IP problem is then transformed into finding the equilibrium between the two players. Unlike other methods, our algorithm can flexibly deal with non-strict block structures.
This paper makes the following major contributions: (1) We propose a novel decomposition algorithm, inspired by game theory, for non-strict blocked problems. (2) We transform the optimization of the IP problem into finding the equilibrium between two players, which overcomes the large scale and guarantees convergence. (3) We construct heuristic constraints, which narrow down the search space and hence accelerate convergence.
Experiments are conducted on practical industrial manufacturing planning problems. The results show significant improvements over Gurobi, the best commercial solver. Moreover, the running time of our algorithm grows much more slowly than that of Gurobi as the problem size increases, which indicates that the proposed algorithm is better suited to large-scale problems.

Model for large-scale manufacturing planning
In general, the manufacturing planning problem can be modeled as the following bi-objective integer program:

$$\max_{z,m} F(z, m), \qquad \min_{x,y} G(x, y) = c_1^{T} x + c_2^{T} y, \qquad (1)$$

where x, y, z, and m are the decision variables; they are all nonnegative integers and must satisfy constraints (2a)–(2d). Here, G(x, y) is linear, but there are no extra requirements on F(z, m). Meanwhile, c_1, c_2, b, β, γ, and ζ are constant vectors, and A, U_1, U_2, P_1, P_2, N_1, N_2, and N_3 are constraint matrices, all of compatible dimensions. In practice, both the objectives and the constraints have specific meanings. Figure 1 illustrates the mathematical structure of model (1), which reveals two main characteristics of the original IP model (1): (1) It has separable objectives: one objective depends only on z and m, while the other depends only on x and y. This suggests designing an alternating, iterative decomposition method to replace direct optimization. (2) Its constraint matrix is non-strictly blocked, so the decomposition method must guarantee convergence to the same optima as the original problem, and conventional decomposition methods cannot be used directly. Based on these properties, this paper proposes a new decomposition algorithm.

New decomposition from game
We first decompose the original problem (1) into two subproblems, I and II, dividing both the objectives and the constraints into two parts. Note that, owing to the structure of the model, the constraints of the two subproblems overlap. Assuming that subproblem I is optimized first and subproblem II is optimized after subproblem I converges, the overall optimization becomes an alternating, iterative process. We now view the two subproblems from a game perspective and treat them as two competing players. Following the order of optimization, subproblem I acts as the leader and subproblem II as the follower. Their mathematical forms are as follows:
Leader: $\max F(z, m)$, s.t. …, $x \ge 0,\ y \ge 0,\ z \ge 0,\ m \ge 0$. (3)
Follower: $\min G(x, y) = c_1^{T} x + c_2^{T} y$, s.t. $Ax \le b$, … (4)
Compared with the original model, both problems (3) and (4) are of relatively small scale, so they are relatively easy for solvers to optimize sequentially. Owing to the leader and follower roles, we call this approach a game-based decomposition.
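To make the overlap between the two subproblems concrete, the following minimal Python sketch labels each row of a constraint matrix by the variable groups it touches. It assumes a small dense matrix given as nested lists and known column index sets for the (x, y) and (z, m) variables; the function name `classify_rows` and the example matrix are hypothetical and are not taken from the paper. Rows labeled "linking" are the ones shared by both subproblems.

```python
from typing import List, Sequence, Set


def classify_rows(matrix: Sequence[Sequence[float]],
                  cols_xy: Set[int],
                  cols_zm: Set[int]) -> List[str]:
    """Label each constraint row by the variable groups it touches.

    cols_xy: column indices of the (x, y) variables that enter G.
    cols_zm: column indices of the (z, m) variables that enter F.
    """
    labels = []
    for row in matrix:
        nonzero = {j for j, a in enumerate(row) if a != 0}
        touches_xy = bool(nonzero & cols_xy)
        touches_zm = bool(nonzero & cols_zm)
        if touches_xy and touches_zm:
            labels.append("linking")      # shared by both subproblems
        elif touches_xy:
            labels.append("xy-only")
        elif touches_zm:
            labels.append("zm-only")
        else:
            labels.append("empty")
    return labels


# Hypothetical 3-row example: columns 0-1 are (x, y), columns 2-3 are (z, m).
A = [
    [1, 1, 0, 0],    # involves only x and y
    [0, 0, 1, 1],    # involves only z and m
    [1, 0, -1, 0],   # couples x with z, so it overlaps both subproblems
]
labels = classify_rows(A, cols_xy={0, 1}, cols_zm={2, 3})
print(labels)  # ['xy-only', 'zm-only', 'linking']
```

For a real instance the matrix would be sparse, and the same test would be applied to each row's nonzero column indices.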

Optimization and convergence
Based on the decomposition and definitions above, the optimization of the original problem can naturally be converted into finding the equilibrium between the leader (3) and the follower (4). We use the following alternating, iterative process to optimize the leader and follower and find the equilibrium between them:
1) Solve problem (3) to obtain optimal solutions $(z^*, m^*)$. From the game perspective, the leader moves first.
2) Substitute $(z^*, m^*)$ into problem (4) and solve it to obtain optimal solutions $(x^*, s^*, v^*)$. That is, the follower finds a best response to the leader's decision.
3) Fix $(x^*, s^*, v^*)$, add an additional constraint given by the follower's response, and solve problem (3) again to obtain new solutions $(z_1^*, m_1^*)$. This is the leader's adjustment to the follower's response.
4) Substitute $(z_1^*, m_1^*)$ into problem (4) and solve it again. That is, the follower modifies its response according to the leader's latest strategy.
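The alternating process in steps 1)–4), together with the rule that the players stop once neither can improve, can be sketched as a best-response loop. This is a simplified illustration and not Algorithm 1 itself: each player's solver is replaced by a hypothetical integer best-response function, and the additional leader constraint derived from the follower's dual is omitted. The names `alternate`, `leader_br`, and `follower_br` are illustrative assumptions.

```python
from typing import Callable, Tuple


def alternate(leader_br: Callable[[int], int],
              follower_br: Callable[[int], int],
              follower_init: int,
              max_iter: int = 100) -> Tuple[int, int, int]:
    """Alternate leader and follower best responses until neither changes.

    Returns (leader_decision, follower_decision, iterations_used).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    follower = follower_init
    previous = None
    for it in range(1, max_iter + 1):
        leader = leader_br(follower)      # leader moves, given the follower's last response
        follower = follower_br(leader)    # follower best-responds to the leader
        if (leader, follower) == previous:
            return leader, follower, it   # no player found a better strategy
        previous = (leader, follower)
    return leader, follower, max_iter


# Hypothetical best-response maps standing in for solving (3) and (4).
result = alternate(lambda x: (x + 6) // 2, lambda z: (z + 2) // 2, follower_init=0)
print(result)  # (4, 3, 3)
```

The loop reaches the fixed point (4, 3) after three passes: the third pass reproduces the second pair exactly, which is the "no better strategy in the next loop" condition.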
The two players stop when neither can find a better strategy in the next loop. Note that after the follower's first response, we must add to the leader an additional constraint that does not exist in the original problem. In practice, we obtain this constraint as follows. We first write the approximate dual form (5) of problem (4), where $\sigma = \gamma + (\zeta - N_1 z)^{T}$, $\tilde{x}$ and $\tilde{y}$ are the dual variables corresponding to x and y, respectively, and z is fixed by the leader in the last loop. Assuming that $\tilde{x}^*$ and $\tilde{y}^*$ are optimal solutions of (5), we add constraints (6) to problem (3), where α is a hyperparameter of corresponding dimension that acts as an interface. From the optimization perspective, constraint (6) serves as a heuristic bound and is an additional constraint for the leader. It can be understood as merging only the intersections of all the dual forms. In practice, these constraints narrow the search space and accelerate convergence with high probability. Solid lines in Fig. 3 show the results with heuristic constraints: in a shorter time, heuristic constraints lead to larger F(z, m) and smaller G(x, y). Thus, we argue that the heuristic constraints can improve both the convergence speed and the solution quality.
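The exact dual (5) and constraint (6) are not reproduced in this text, so the sketch below shows the classical Benders-style analogue of the idea instead. It assumes the follower is a linear program $\min c^T x$ s.t. $Ax \ge h - Tz$, $x \ge 0$, with the leader's decision z fixed. By weak duality, any dual vector π ≥ 0 with $A^T \pi \le c$ gives the lower bound $(h - Tz)^T \pi$ on the follower's optimal cost for every z, which the leader can add as a linear constraint. The names `A`, `c`, `h`, `T`, `pi` and all numbers are hypothetical; this is not the paper's constraint (6), which also involves the hyperparameter α.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_dual_feasible(A: Matrix, c: Sequence[float], pi: Sequence[float],
                     tol: float = 1e-9) -> bool:
    """Check pi >= 0 and A^T pi <= c for the follower LP min c^T x, A x >= rhs, x >= 0."""
    if any(p < -tol for p in pi):
        return False
    for j, cj in enumerate(c):
        column_dot = sum(A[i][j] * pi[i] for i in range(len(pi)))
        if column_dot > cj + tol:
            return False
    return True


def cut_rhs(pi: Sequence[float], h: Sequence[float], T: Matrix,
            z: Sequence[float]) -> float:
    """Weak-duality bound (h - T z)^T pi on the follower's cost at leader decision z."""
    total = 0.0
    for i, p in enumerate(pi):
        rhs_i = h[i] - sum(T[i][k] * z[k] for k in range(len(z)))
        total += rhs_i * p
    return total


# Toy follower: min 2*x1 + 3*x2  s.t.  x1 + x2 >= 5 - z,  x >= 0.
A, c, h, T = [[1, 1]], [2, 3], [5], [[1]]
pi = [2]  # optimal dual for this toy instance
print(is_dual_feasible(A, c, pi))   # True
print(cut_rhs(pi, h, T, z=[1]))     # 8.0 (equals the follower optimum at z = 1)
```

A less favorable dual point such as pi = [1] is still feasible but yields the weaker bound 4.0, while pi = [3] violates $A^T \pi \le c$ and would not give a valid bound.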

Algorithm
Based on the above discussion, the algorithmic flow of the proposed game-based decomposition is given in Algorithm 1.

Illustrations on algorithmic advantage
Compared with other alternating and iterative algorithms for large-scale optimization [5,15], the proposed game-based decomposition differs in the following respects:
(a) Unlike existing decomposition algorithms, the order of the subproblems matters during optimization. Recall that the two subproblems are the leader and the follower. The leader's solution matters for the whole iterative process, since it provides the initial point for optimizing the follower; the follower's response is added to the leader and reflects the convergence efficiency. In practice, we prefer to choose the subproblem that is relatively easy to solve as the leader.
(b) Unlike existing alternating and iterative algorithms, we must add the follower's response as additional constraints during optimization. Figure 3 shows the effect of the additional constraints. The abscissa is F(z, m) and the ordinate is G(x, y), and both change along the directions indicated in the figure. Under these settings, higher quality solutions mean larger F(z, m) but smaller G(x, y). Assume that Gurobi optimizes the original problem (1) in a weighted-sum way; the dotted curve is obtained by Gurobi with different weights. Although this curve resembles a Pareto front, its Pareto optimality cannot be guaranteed. The dashed lines represent the results of the game-based decomposition. As F(z, m) increases and G(x, y) decreases, the game-based decomposition converges to a solution whose quality is no lower than that of the solutions given directly by Gurobi.
(c) Unlike existing decomposition algorithms, which aim at finding the Nash equilibrium between subproblems, our algorithm focuses on the Stackelberg equilibrium. According to game theory [3,11], the Stackelberg equilibrium between the leader and follower guarantees convergence and solution quality. More discussion is given in Sect. 4.

Theoretical guarantee
Preliminaries of game
Figure 4 shows the difference between Nash and Stackelberg equilibria.
Fig. 3 … The abscissa is F(z, m), while the ordinate is G(x, y). The dotted curve denotes the solutions given directly by Gurobi in a weighted-sum way. The dashed and solid lines are from our algorithm with and without heuristic constraints, respectively. It shows the convergence of the game-based decomposition. Moreover, heuristic constraints not only accelerate the convergence but also lead to higher quality solutions.
The point $(a^N, b^N)$ is a Nash equilibrium, since $F_A$ has a horizontal tangent at this point, while $F_B$ has a vertical tangent. That is, neither player can increase his payoff by unilaterally changing his own strategy, as long as the other sticks to the Nash equilibrium.
Back to our problem: the leader (3) (i.e., player A in Fig. 4) adopts strategy a, and the follower (4) (i.e., player B in Fig. 4) maximizes $F_B(a, b)$ by choosing a best reply $b^* = f(a)$; the goal of the leader is then to maximize $F_A(a, f(a))$. In other words, player A serves as the leader and announces his strategy in advance, and player B then makes his decision accordingly. In a Pareto optimum, neither player can strictly increase its own payoff without decreasing the payoff of the other.
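The difference between the two equilibria can be checked numerically on a textbook Cournot duopoly, used here only as a stand-in and not as the paper's manufacturing model. In the sketch below, `profit` plays the role of both $F_A$ and $F_B$, `best_reply` is the reply map f, and the constants ALPHA and COST are hypothetical. Nash play iterates simultaneous best replies, whereas the Stackelberg leader searches over its own strategy a while anticipating $b^* = f(a)$.

```python
from typing import Tuple

ALPHA, COST = 10.0, 1.0  # hypothetical inverse-demand intercept and unit cost


def profit(q_own: float, q_other: float) -> float:
    """Cournot profit when the market price is ALPHA - q_own - q_other."""
    return q_own * (ALPHA - q_own - q_other) - COST * q_own


def best_reply(q_other: float) -> float:
    """Maximizer of profit(q, q_other) over q >= 0 (first-order condition)."""
    return max(0.0, (ALPHA - COST - q_other) / 2.0)


def nash(iters: int = 200) -> Tuple[float, float]:
    """Simultaneous play: iterate best replies towards the fixed point."""
    a = b = 0.0
    for _ in range(iters):
        a, b = best_reply(b), best_reply(a)
    return a, b


def stackelberg(steps: int = 9000) -> Tuple[float, float]:
    """Leader maximizes F_A(a, f(a)) by grid search over a in [0, 9]."""
    best_a = max((k / 1000.0 for k in range(steps + 1)),
                 key=lambda a: profit(a, best_reply(a)))
    return best_a, best_reply(best_a)


print(nash())                     # approximately (3.0, 3.0)
print(stackelberg())              # (4.5, 2.25)
print(profit(*stackelberg()))     # 10.125
```

The leader earns 10.125 under Stackelberg play versus about 9 at the Nash equilibrium, which illustrates why announcing a strategy first, and letting the follower respond, can benefit the leader.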

Theoretical discussions
Theorem 1 first guarantees the solution quality of the game-based decomposition in a strict mathematical sense: under certain mathematical assumptions, the game-based decomposition converges to the optimal solution of the original problem, at which point the two players cannot find better strategies in the next loop. Proof. Suppose that the two players are differentiable; we label the follower $u_1$ and the leader $u_2$. Since they are differentiable, the cost functions of $u_1$ and $u_2$ can be given as … and …, respectively, where $Q_j$ and $R_{jk}$ are symmetric positive definite. Each player seeks to optimize its own cost function and to minimize its partner's. Owing to the different levels of $u_1$ and $u_2$, we first obtain the optimal solution of the follower as …, and the follower's Hamiltonian is …. Thus, the optimal controller is obtained from $\partial H_{u_1} / \partial u_1 = 0$, that is, …. Since the follower's optimal solution affects the leader's optimization, the optimal solution of the leader can be expressed as …, with $r_2(x, J_1^{*}, u_2) = r_2(x, u_1, u_2)$. Utilizing the gradient of the follower's Hamiltonian, …, we obtain the optimal cost function of the leader, …, and the leader's Hamiltonian, …, with γ a real vector. To get the optimal controller, we set $\partial H_{u_2} / \partial u_2 = 0$, which leads to …. We also have …, and the gradients of the Hamiltonians of the leader and follower satisfy …, i.e., the gradients of the follower's and leader's Hamiltonians converge to the equilibrium from the game perspective.
Considering the complicated circumstances in practice, Theorem 2 then guarantees the solution quality with a compact upper bound.

Theorem 2 The proposed game-based decomposition can approximate the optima of the original problem, since the optimal solutions of the leader can be bounded by the dual form and the optimal solutions of the follower.
Proof. To prove Theorem 2, we first construct a toy model (13) … to show why a constraint built from the follower's optimal solutions and dual form can be added to the leader. Here, x, y, and z are the decision variables, c and d are constant vectors, and A, M, and N are constant matrices of compatible dimensions. Following the settings of our game-based decomposition algorithm, problem (13) is decomposed into two problems: the leader (14) … and the follower (15) …, where x is given by the optimal solution of leader (14). Unlike [5], the follower contains two variables. A naive method is to derive the dual form in a nested way, with only one decision variable per loop, as in nested Benders decomposition [13]; however, we consider this method too time-consuming for large-scale problems. Next, we give the dual form of follower (15) in an approximate way …, where $\tilde{y}$ and $\tilde{z}$ are the dual variables corresponding to y and z, respectively. Then, for the original problem (13), we have …, where $\tilde{y}^*$ and $\tilde{z}^*$ are the optimal solutions of the corresponding dual problems. Thus, Theorem 2 is proved.

Experimental evaluation
Now, we take a practical manufacturing planning task from our company as an example to show the deployment details of the proposed game-based decomposition algorithm. In this example, $i \in I = I_P \cup I_{AI}$ denotes all kinds of products; p and p′ both denote plants; and t ∈ [0, T] refers to the time period. $I_P$ and $I_{AI}$ represent the sets of semi-finished and finished products, respectively. Other related notations are presented in Table 1.

Model
In this subsection, we omit the problem details and focus on the core problem structure. Here, the two objectives are the order fillrate F(z, m) and the total cost G(x, s, v), respectively, where $g_1(x)$, $g_2(s)$, and $g_3(x, s, v)$ refer to the manufacturing, transportation, and holding costs, respectively. Many constraints must also be considered. Three of them correspond to constraint (2a): constraint (19) limits the production capacity; constraint (20) enforces the production lot size, meaning that goods must be produced in pairs; and constraint (21) sets the minimal production of every plant. The limitation on delay corresponds to constraint (2b), limitations on inventory and inbound correspond to constraint (2c), and limitations on outbound and replacements correspond to constraint (2d). Note that constraint (26) states that item i must first satisfy its own parent node and only then serve other leaf nodes as a replacement. All the related variables are nonnegative.
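The pairing requirement in constraint (20) is commonly written with an auxiliary integer lot counter. The form below is one standard way to express it and is an assumption, since the paper's exact constraint (20) is not reproduced; the counter $k_{i,p,t}$ is a hypothetical variable, and the indices i, p, t follow the notation introduced above.

$$
x_{i,p,t} = 2\,k_{i,p,t}, \qquad k_{i,p,t} \in \mathbb{Z}_{\ge 0}.
$$

For example, a production quantity of 6 corresponds to $k = 3$ lots, whereas 5 admits no integer $k$ and is excluded. With a general lot size $L_i$, the factor 2 is replaced by $L_i$.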

Implementation details
According to Algorithm 1, the implementation details for this example are given below: … G(x, s, v), with all constraints apart from (22).
-Step 3.1. Compute an upper bound (UB) as …, where $\beta^{(\gamma)}_1$ and $\beta^{(\gamma)}_2$ are two hyperparameters.
-Step 3.2. Compute a lower bound (LB) as …. If (UB − LB)/LB < ε, stop: the optimal solutions are obtained. Otherwise, the algorithm continues to the next step.
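The stopping test can be sketched as a relative-gap check between the two bounds. This sketch assumes the criterion has the common form (UB − LB)/LB < ε; the exact UB and LB formulas, including the β hyperparameters, are not reproduced here, and the bound values in the example are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def relative_gap(upper: float, lower: float) -> float:
    """Relative gap (UB - LB) / |LB| between the upper and lower bounds."""
    if lower == 0:
        raise ValueError("relative gap is undefined for a zero lower bound")
    return (upper - lower) / abs(lower)


def first_converged_iteration(bounds: Iterable[Tuple[float, float]],
                              eps: float) -> Optional[int]:
    """Return the first 1-based iteration whose gap is below eps, else None."""
    for iteration, (ub, lb) in enumerate(bounds, start=1):
        if relative_gap(ub, lb) < eps:
            return iteration
    return None


# Hypothetical (UB, LB) pairs from successive leader-follower loops.
history = [(120.0, 100.0), (106.0, 100.0), (100.5, 100.0)]
print(first_converged_iteration(history, eps=0.01))  # 3
```

The gaps here are 0.2, 0.06, and 0.005, so only the third loop satisfies ε = 0.01; a tighter tolerance such as ε = 0.001 would not be met by any of these loops.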
It is worth mentioning that in practice, we always choose a relatively easy player as the leader.

Numerical results
In this section, we evaluate the game-based decomposition algorithm in industrial applications. All baselines are given by Gurobi 8.1 [2,16]. For simplicity, we only evaluate the linear relaxations of the IP problems.²

Datasets
The experimental datasets come from real-world manufacturing planning applications; their statistics are shown in Table 2. Column Item denotes the number of products to be manufactured, and column Scale lists the number of variables in the problem. We also evaluate our algorithm on problems with millions of variables to test its effectiveness on a variety of problems. Datasets I-1, II-1, and III come from normal manufacturing instances, where both the order demand and the production capacity of the related factories are of regular quantity. Datasets I-2 and II-2 come from force-majeure market and employment situations, with limited production capacity and large demand.³

Solution quality evaluation
We first evaluate solution quality, as shown in Figs. 5, 6, 7, 8, and 9. The horizontal axis is the order fillrate, and the vertical axis is the total cost. Our game-based approach leads to higher quality solutions, with a higher order fillrate and a smaller cost than those obtained from Gurobi. A higher fillrate naturally costs more, and there is no standard trade-off between fillrate and cost. In this sense, a higher quality solution has a higher fillrate at a smaller cost.
In most cases, our algorithm dominates Gurobi on solution quality. In addition, we find that in some periods our algorithm achieves a higher fillrate at a higher cost, e.g., 11/30-12/29 and 01/02-02/01 in Dataset I-1, and 11/21-12/20 and 11/28-12/27 in Dataset II-1. We consider such cases an additional advantage over Gurobi, since in some specific manufacturing planning tasks, maximizing the order fillrate is more important than minimizing the cost. Therefore, we claim that the proposed algorithm outperforms the best commercial solver Gurobi on solution quality.

Efficiency evaluation
Figure 10 compares computational efficiency. The upper part is from our algorithm, and the lower part is from Gurobi. When the problem is small, Gurobi is much more efficient than our algorithm. This is because computing the dual problems and constructing the heuristic bounds take a certain amount of time, which is relatively expensive for a small-scale problem. However, as the problem scale increases, the share of time spent on computing dual forms and constructing heuristic constraints drops, and our algorithm's efficiency improves sharply relative to Gurobi. Figure 11 compares the growth of running time as the problem scale increases: the running time of Gurobi increases 1682-fold, while that of our algorithm increases only 517-fold. Our algorithm's efficiency therefore degrades much more slowly than Gurobi's as the problem scale increases, i.e., our algorithm is less sensitive to the problem scale.

Conclusions
Big industrial manufacturing planning problems pose great challenges to commercial solvers. In practice, these problems have on the order of a billion decision variables and constraints. This paper has proposed a game-based decomposition algorithm to deal with such big problems. Unlike other decomposition algorithms, our algorithm can deal with non-strict blocked problems. Experiments on industrial datasets have shown improvements in solution quality and robustness. Furthermore, our algorithm's efficiency decreases much more slowly than Gurobi's as the problem scale increases. Our major contributions include: (1) a new decomposition algorithm inspired by game theory, which differs from previous works and can deal with non-strict blocked problems; (2) a new optimization process, which overcomes the large scale and converges to a solution; (3) the construction of heuristic constraints, which narrow down the search space and accelerate convergence.
To the best of our knowledge, this is the first work to apply a game-based decomposition algorithm to billion-scale industrial manufacturing planning problems. The algorithm demonstrates significant improvement over the state-of-the-art commercial solver Gurobi in solution quality, robustness, and extensibility to large-scale problems. In the future, we will continue to study efficient algorithms for industrial large-scale tasks in supply chain management and scheduling.