Exact and Metaheuristic Approaches for the Production Leveling Problem

In this paper we introduce a new problem in the field of production planning which we call the Production Leveling Problem. The task is to assign orders to production periods such that the load in each period and on each production resource is balanced, capacity limits are not exceeded and the orders' priorities are taken into account. Production Leveling is an important intermediate step between long-term planning and the final scheduling of orders within a production period, as it is responsible for selecting good subsets of orders to be scheduled within each period. A formal model of the problem is proposed and NP-hardness is shown by reduction from Bin Packing. As an exact method for solving moderately sized instances we introduce a MIP formulation. For solving large problem instances, metaheuristic local search is investigated. A greedy heuristic and two neighborhood structures for local search are proposed, in order to apply them using Variable Neighborhood Descent and Simulated Annealing. Regarding exact techniques, the main research question is up to which instance size the problem can be solved within a fixed amount of time. For the metaheuristic approaches the aim is to show that they produce near-optimal solutions for smaller instances, but also scale well to very large instances. A set of realistic problem instances from an industrial partner is contributed to the literature, as well as random instance generators. The experimental evaluation shows that the proposed MIP model works well for instances with up to 250 orders. Out of the investigated metaheuristic approaches, Simulated Annealing achieves the best results. It is shown to produce solutions with less than 3% average optimality gap on small instances and to scale well up to thousands of orders and dozens of periods and products. The presented metaheuristic methods are already being used in industry.


Introduction
Production systems have been subject to continuous and radical change over the last decades. The need for productivity improvements drives companies to invest heavily in automation on all levels. Production planning plays a major role in these developments, as the replacement of manual planning with software-assisted or even autonomous systems can lead to considerable efficiency gains.

arXiv:2006.08731v1 [cs.AI] 15 Jun 2020
We introduce a new problem in the field of production planning which arises from the needs of an industrial partner. It is a combinatorial optimization problem which treats the leveling of production and we therefore call it the Production Leveling Problem (PLP). It belongs to medium-term planning, which means it is intended to be embedded between long-term planning and the scheduling of the concrete production sequence.
The problem is concerned with assigning orders of certain product types and demand sizes to production periods such that the production volume of each product type is leveled across all periods. Furthermore, the overall amount produced in each period is subject to leveling as well. A solution is feasible if the production volumes to be leveled do not exceed given maximum values. The optimization part consists in minimizing the deviation of the production from the optimal balance, while at the same time making sure that the orders are assigned approximately in the order of their priorities. The idea behind this goal is that considering orders strictly in the order of decreasing priority, as is often done, frequently leads to spikes and idle times for certain resources involved in the production process. Leveling these highs and lows results in a smoother production process because a similar product mix is produced in every period. It is important to note that the solution to the PLP is not a schedule, since the orders are only assigned to production periods; the concrete execution sequence and the assignment to machines and workers are not part of this problem. The intention is rather to serve as a step between long-term planning and short-term production scheduling.
The main goal of this work consists of modeling the PLP formally and developing solution strategies for it. The primary method we propose is Simulated Annealing because of its capability to solve large-scale instances of the problem. To this end, we developed two move types for obtaining neighboring solutions, which are used in Simulated Annealing. Furthermore, we investigate Variable Neighborhood Descent (VND), a deterministic algorithm which also uses these two neighborhoods. We also propose a Mixed Integer Programming (MIP) model in order to obtain optimal solutions and lower bounds. We are interested in finding the border between instances which can be solved exactly and those where the exponential nature of the problem makes the usage of MIP impractical. Furthermore, we want to investigate how close the local search methods we propose can get to optimality. To sum up, the main contributions of this paper are:
• a mathematical model for the PLP,
• a proof of NP-hardness,
• a MIP model,
• two neighborhood structures for local search,
• realistic and randomly generated problem instances and
• an extensive evaluation of MIP and metaheuristic methods.
The rest of this paper is structured as follows: Section 2 presents the problem first informally, then a precise mathematical formulation is given and finally related work is discussed. In Section 3 the theoretical complexity of the PLP is analyzed, yielding an NP-hardness result. Section 4 introduces a MIP model. In Section 5 we finally turn to local search methods and present the variant of Simulated Annealing we apply. Experimental evaluation is performed in Section 6, providing answers to the research questions which we formulated above. Section 7 summarizes the results and presents ideas for future work.

Problem Statement and Related Work
In this section we first build up an intuition for the PLP by means of examples. Then we specify the parameters, constraints and objective function formally. Finally, we shed light on related problems presented in the literature.

Problem Description
The input to the PLP is a list of orders, each of them having a demand value, a priority and a product type. Furthermore, we are given a set of periods and the maximum production capacity per period, both for all product types together and for each one separately. We search for an assignment of orders to periods such that the production volume is balanced between the periods, while sticking to the sequence implied by the orders' priorities as well as possible. The objective function of the PLP can be seen as the task of finding a good trade-off between the following goals:
1. Minimize the sum of deviations of the planned production volume from the average demand (i.e. the target value) for each period, ignoring the product types. This makes sure that the overall production per period is leveled.
2. Minimize the sum of deviations of the production volume of each product type from its respective mean (target) value, making sure that the production of each product type is leveled.
3. Minimize the number of times a higher-prioritized order is planned for a later period than a lower-prioritized order, which we call a priority inversion. This objective makes sure that more important orders are scheduled in earlier periods.
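To make the third goal concrete, counting priority inversions can be sketched in a few lines of Python (function and parameter names are ours, for illustration only): a pair of orders counts as inverted exactly when the higher-priority order is planned for a strictly later period.

```python
# Count priority inversions: pairs (i, j) where order i has a higher priority
# than order j but is planned for a strictly later period. Orders in the same
# period are never counted as inverted.
def count_priority_inversions(periods, priorities):
    """periods[j]: assigned period of order j; priorities[j]: its priority."""
    k = len(periods)
    return sum(1
               for i in range(k) for j in range(k)
               if priorities[i] > priorities[j] and periods[i] > periods[j])
```

For the three-order example of Figure 3, reversing a priority-sorted assignment yields the maximum of k·(k−1)/2 = 3 inversions, while the priority-respecting assignment yields none.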
Let us now explore the three optimization goals by means of examples:
1. Figure 1 shows a tiny example instance with five orders, which are shown as boxes, where the box height corresponds to the order size. The orders should be assigned to the three periods such that the distance between each stack of orders and the dashed target line is minimized and no stack crosses the red line, which represents the capacity limit. It is easy to see that this solution is optimal w.r.t. the leveling objective.
2. Figure 2 shows a slightly larger problem instance which has three product types (blue, green and red). It visualizes the solution from the perspective of the second objective, which works the same way as the first but discriminates by product type. That is, for each of the three product types blue, green and red we seek to minimize the deviation from the dashed target line in each period.
3. Figure 3 comes back to the first example, but views it from the perspective of the third objective. The numbers inside the orders indicate the priorities, where a larger number means higher importance. Clearly, the red order should not be assigned to an earlier period than the yellow or the blue one. Such a state between red and blue/yellow is what we call a priority inversion, and the number of priority inversions is what the third goal minimizes. In the example, a better solution can easily be constructed, for example by swapping the red order with the yellow one, which makes both priority inversions disappear.

We have seen in the examples that an optimal solution w.r.t. one objective is not necessarily optimal w.r.t. another. As we combine the three objectives into one by a weighted sum, the location of the optima clearly depends on the weights. These weights must be determined on the basis of a specific use case because there is no general way to decide without domain knowledge e.g. how important one priority inversion is compared to one unit more of imbalance. We worked out a sensible default weighting in cooperation with our industrial partner based on their real-life data and use it for all experiments throughout the paper. How the objective function is formally stated and how the weighting works is described in more detail in the following section.

Mathematical Formulation
Now we turn towards a formal description of the problem, consisting of parameters, variables, constraints and the objective function.

Input parameters
• n, the number of periods
• a_i ∈ R+, for each objective function component i ∈ {1, 2, 3} the associated weight
• c ∈ R+, the maximum overall production volume per period
• c_t ∈ R+, for each product type t ∈ M the maximum production volume per period
• d_j ∈ Z+, for each order j ∈ K its associated demand
• p_j ∈ Z+, for each order j ∈ K its associated priority
• t_j ∈ Z+, for each order j ∈ K the product type
• d* ∈ Z+, the target production volume per period, i.e. d* = (1/n) Σ_{j∈K} d_j
• d*_t ∈ Z+, the target production volume per period for each product type t ∈ M, i.e. d*_t = (1/n) Σ_{j∈K | t_j = t} d_j

Variables
• y_j ∈ N, for each order j ∈ K the production period for which it is planned
• w_i = Σ_{j∈K | y_j = i} d_j, the production volume for each period i ∈ N (helper variable)
• w_{i,t} = Σ_{j∈K | y_j = i, t_j = t} d_j, the production volume for each product type t ∈ M and period i ∈ N (helper variable)

Hard constraints
• The limit for the overall production volume is satisfied for each period: ∀i ∈ N: w_i ≤ c
• The limit for the production volume of each product type is satisfied for each period: ∀i ∈ N, ∀t ∈ M: w_{i,t} ≤ c_t

Objective function
The following three objective functions represent the three targets to minimize: Function f_1 represents the sum over all periods of the deviations from the overall target production volume (i.e. all product types at once). Function f_2 states the sum over all product types of the sums over all periods of the deviations from the target production volume for that product type, normalized by the respective target value. The normalization is done so that every product has the same influence on the objective function regardless of whether its average demand is high or low. Function f_3 counts the number of priority inversions in the assignment, or in other words the number of order pairs (i, j) for which i is planned after j even though i has a higher priority than j.
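The displayed formulas for f_1, f_2 and f_3 did not survive text extraction; a reconstruction consistent with the surrounding prose and the definitions of Section 2.2 (the original layout and numbering may differ) is:

```latex
\begin{align}
f_1 &= \sum_{i \in N} \left| d^* - w_i \right| \\
f_2 &= \sum_{t \in M} \frac{1}{d^*_t} \sum_{i \in N} \left| d^*_t - w_{i,t} \right| \\
f_3 &= \bigl| \{ (i, j) \in K^2 \mid p_i > p_j \wedge y_i > y_j \} \bigr|
\end{align}
```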
In order to combine these three objectives into a single objective function and achieve a weighting which does not change its behavior between instances with different number of orders, periods or product types, the cost components need to be normalized.
The normalization ensures that g_1 and g_2 stay between 0 and 1 with high probability. Only for degenerate instances, where even in good solutions the target is exceeded by a factor of 2 or more, are higher values possible for g_1 and g_2. The value of g_3 is guaranteed to be ≤ 1 because the maximum number of inversions in a permutation of length k is k · (k − 1)/2.
The final objective function is then a weighted sum of the three normalized objective functions, where the weight a_i of an objective can be seen approximately as its relative importance.
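For illustration, the combined objective can be sketched in a few lines of Python. The normalization constants used here are our assumption, chosen only to match the bounds stated above (g_1 and g_2 stay ≤ 1 unless a period's deviation exceeds its target, g_3 ≤ 1 always); the concrete factors in the model may differ.

```python
# Sketch of g = a1*g1 + a2*g2 + a3*g3 with assumed normalization constants.
def objective(assignment, demands, types, priorities, n_periods, weights):
    """assignment[j]: 0-based period of order j; weights = (a1, a2, a3)."""
    a1, a2, a3 = weights
    k = len(demands)
    products = sorted(set(types))
    d_star = sum(demands) / n_periods                        # overall target
    d_star_t = {t: sum(d for d, u in zip(demands, types) if u == t) / n_periods
                for t in products}                            # per-product targets
    w = [0.0] * n_periods                                     # load per period
    wt = {(i, t): 0.0 for i in range(n_periods) for t in products}
    for j, i in enumerate(assignment):
        w[i] += demands[j]
        wt[(i, types[j])] += demands[j]
    f1 = sum(abs(d_star - w[i]) for i in range(n_periods))
    f2 = sum(abs(d_star_t[t] - wt[(i, t)]) / d_star_t[t]
             for i in range(n_periods) for t in products)
    f3 = sum(1 for i in range(k) for j in range(k)
             if priorities[i] > priorities[j] and assignment[i] > assignment[j])
    g1 = f1 / (n_periods * d_star)           # assumed normalization
    g2 = f2 / (n_periods * len(products))    # assumed normalization
    g3 = f3 / (k * (k - 1) / 2) if k > 1 else 0.0
    return a1 * g1 + a2 * g2 + a3 * g3
```

A perfectly leveled, priority-respecting assignment evaluates to 0, while reversing the priority order of an otherwise balanced solution drives g_3 to its maximum of 1.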

Quadratic objective function
For some instances of the PLP the objective function presented above, which is based on absolute differences, may not be well suited. Intuitively, one could argue that a solution containing i periods with a missing demand of 1 each is better than an otherwise equal one containing one period with missing demand i. However, the above objective penalizes the two scenarios in exactly the same way. In order to penalize large deviations more than small ones, we introduce an alternative variant of the objective function which calculates the penalty using squared differences. This implies that the normalization factors need to be adapted as well.
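The intuition can be verified with a two-line computation: spreading a total shortfall of 4 units over four periods versus concentrating it in a single period.

```python
# Absolute (L1) vs. squared (L2) deviation penalties for two solutions that
# miss the same total demand: 4 periods short by 1 each, vs. 1 period short by 4.
spread = [1, 1, 1, 1]
concentrated = [4, 0, 0, 0]

l1 = lambda devs: sum(abs(d) for d in devs)
l2 = lambda devs: sum(d * d for d in devs)

assert l1(spread) == l1(concentrated) == 4          # L1 cannot tell them apart
assert l2(spread) == 4 and l2(concentrated) == 16   # L2 prefers the spread solution
```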
The final objective function using squared differences is again a weighted sum of the three normalized objective functions: minimize g = a_1 · g_1 + a_2 · g_2 + a_3 · g_3. In our experience the two variants of the objective function behave very similarly for the vast majority of our instances. Only in some rare cases, where no well-balanced solution is possible and large trade-offs have to be made, we found that the quadratic objective function produces results which intuitively look better than the ones produced by the absolute objective. However, there are also disadvantages to consider when using the alternative objective:
• It renders the otherwise linear problem a quadratic one, which makes it harder to solve using MIP.
• Due to the squares in the denominators of the normalization factors, delta costs are often very small which increases the risk of numerical instabilities when using exact solvers.
There is also evidence in the literature that there is no a priori reason to prefer one of the two objectives. [15] investigated balancing objectives in a more general form and stated that a set of violation measures for perfect balance is given by the L_p-norm of a vector of variables X minus its mean m, for p ≥ 0. For p = 1 this corresponds to Σ_{X∈X} |X − m|, which is the same as our linear objective f_1. Similarly, the variant with p = 2 corresponds to the quadratic-difference-based objective of the PLP. As a conclusion of the study of the different variants the authors state that neither criterion subsumes the others [15], which means that there is no reason to commit oneself to only one.

Related Work
The term production leveling is commonly associated with the Toyota Production System (TPS), where it is also called Heijunka. It is a concept which aims to increase efficiency and flexibility of mass-production by leveling the production in order to keep the stock size low and reduce waste. Ideally the result of applying Heijunka is zero fluctuation at the final assembly line. Heijunka can mean both the leveling of volume at the final assembly line and the leveling of the production of intermediary materials [12].
The PLP is clearly inspired by Heijunka in the sense that the usage of resources should be leveled in order to increase production efficiency, but its concept differs quite substantially from the classical implementation of Heijunka (in the TPS) in the following points: • The PLP does not operate on the level of schedules but disregards the order in which the items are produced within a period. In other words, it is concerned with planning and not scheduling, which is performed subsequently for each production period.
• Intermediate materials are not part of the PLP. While Heijunka aims to level also their production to keep stock sizes of intermediary products small, the PLP is currently only concerned with one level.
There exists a whole research area concerning scheduling problems inspired by ideas from the TPS and especially Heijunka. Under the umbrella term level scheduling there exist several problems such as the Output Variation Problem and the Product Rate Variation Problem [9,3]. They have in common that they aim to find the best schedule for production at the final assembly line so that the demand for intermediary materials and their production is leveled which keeps the necessary stock sizes low. However, these problems are quite different from the PLP due to the same reasons presented above with respect to Heijunka.
Under the term Balancing Problems several other problems are known in the literature which are more closely related to the PLP: • The Balanced Academic Curriculum Problem (BACP): This problem deals with assigning courses to semesters such that the students' load is balanced and prerequisites are fulfilled [4]. The balancing of the sum of course sizes assigned to a semester is equivalent to the balancing of production load which we are confronted with in the PLP. There also exists a variant called the Generalized BACP which introduces so-called curricula, each of which should be leveled [5]. The concept is similar to the product types of the PLP, except that an order has only one product type while a course can be in multiple curricula. The big difference to our problem are the additional constraints of the BACP, which enforce prerequisites between courses, a concept appearing frequently in the diverse balancing problems. Prerequisites differ from the PLP's priorities in the following aspects: Prerequisites are hard constraints while priorities are soft constraints, which makes a difference especially for exact solvers. Furthermore, prerequisites require one course to be finished strictly before another starts, while for the PLP it does not seem sensible to impose a penalty in case two differently prioritized orders are scheduled in the same period. We only want to penalize when the ordering implied by the priorities is inverted, hence the term priority inversion. This is a fundamental difference which makes it impossible to solve one problem by converting it to the other.
• Nurse scheduling problems have been an active field of research since their introduction in the 70s [18]. While most of the contributions do not consider workload balancing, a few of them, starting with [11], also consider a fair distribution of the nurses' workload. The authors of [11] propose an Integer Programming model for the Nurse to Patient Assignment Problem in neonatal intensive care, which is concerned with finding the optimal assignment of patients to a set of working nurses such that the workload of the team is balanced and a number of restrictions are fulfilled. The main difficulty is the variability of the infants' conditions, which greatly influences the amount of work needed. The problem is often solved in two steps by first assigning nurses to zones of the nursery and then assigning infants to nurses. More recent work in this area includes a Constraint Programming (CP) approach by Schaus et al. which uses the spread constraint for balancing [16]. Furthermore, stochastic programming based approaches with Benders decomposition have been proposed [13].
The balancing objective of the Nurse to Patient Assignment Problem is again very similar to the objective function which we introduced for the PLP. However, we cannot compare directly to those results because the priorities of the PLP have no equivalent in this problem and, vice versa, some of its side constraints and the zone assignment cannot be expressed in the PLP.
• Simple Assembly Line Balancing (SALB): An assembly line consists of identical work stations aligned along a conveyor belt. Workpieces move along the conveyor belt and at each station a set of (assembly) tasks is carried out, where each of them has a task time. By the cycle time we denote the time after which workpieces are moved on to the next station. The goal is either to minimize the number of work stations needed given a fixed cycle time or to minimize the cycle time given a fixed number of work stations.
The SALB problem is the simplest and most intensively studied variant of Assembly Line Balancing. A comprehensive overview of the different variants is provided by [2]. When comparing the SALB problem to the PLP, tasks map to orders, task times to order sizes and the fixed cycle time to the maximum capacity per production period. Hence, minimizing the cycle time is equivalent to minimizing the maximum load of a production period of the PLP, which would also be an admissible balancing objective. There is also recent work by [1] where the sum of squared deviations of the workstation loads is minimized, which is equivalent to the second variant of the objective which we proposed. However, the difference between precedence relations on the one hand and priority inversion minimization on the other hand once again precludes a direct comparison between the problems.
For a more extensive list of Balancing Problems please refer to the dissertation of Pierre Schaus which investigates CP modeling approaches for a very diverse set of Balancing and Bin-Packing Problems [14].

Complexity analysis
As we are studying a new problem, we are interested in its computational complexity. In this section we provide an NP-completeness proof for a decision variant of the PLP, followed by an argument for the NP-hardness of the PLP optimization problem presented previously.
In order to prove NP-completeness, we consider the following decision variant of the problem, where the objective function is dropped completely. Hence the task is solely to find a feasible assignment of orders to periods:

Instance:
A set of orders K, a set of products M and a set of periods N. For each order j ∈ K its demand d_j with d_j > 0, priority p_j and product type t_j. The maximum production capacity per period c and for each product type t ∈ M its associated maximum production capacity per period c_t.
Question: Does there exist an assignment {y_j ∈ N | j ∈ K} of orders to periods such that the capacity limit c and the capacity limits {c_t | t ∈ M} are not exceeded in any period?

Theorem 1. The Production Leveling decision problem is NP-complete even on instances with |M| = 1, i.e., a single product type.
Proof. In order to prove NP-hardness we give a polynomial-time reduction from the NP-complete Bin Packing decision problem [17], which is defined as follows:

Instance:
A set of n bins S_1, S_2, …, S_n of size V and a list of k items of respective sizes a_1, a_2, …, a_k.
Question: Can the items be packed into the bins?
The construction of the PLP instance is straightforward: bins are converted to periods, each item of size a_i becomes an order with demand d_i = a_i, and the bin capacity V becomes the maximum capacity per period c. There is only one product type and all order priorities are set to an arbitrary constant.
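For concreteness, the conversion can be sketched as follows (the data layout and field names are illustrative, not part of the formal model):

```python
# Sketch of the reduction: a Bin Packing instance (n bins of size V,
# item sizes a_1..a_k) becomes a PLP decision instance with one product type.
def bin_packing_to_plp(n_bins, bin_size, item_sizes):
    return {
        "periods": n_bins,                   # one period per bin
        "capacity": bin_size,                # overall capacity c := V
        "capacity_per_type": {0: bin_size},  # single product type, same limit
        "orders": [
            {"demand": a, "priority": 0, "type": 0}  # constant priority
            for a in item_sizes
        ],
    }
```

The conversion clearly runs in linear time, as required for the reduction.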
If there exists a feasible solution to this instance of the Production Leveling decision problem (i.e. an assignment of orders to periods such that the capacity limit is obeyed), it follows that there also exists a valid bin packing into n bins, because each bin of size V corresponds exactly to a period with the same capacity. Analogously, if no feasible solution of the PLP exists, we know that the corresponding instance of Bin Packing is infeasible as well. Hence any instance of Bin Packing can be solved by converting it to an instance of the Production Leveling decision problem and solving that one. As the conversion is possible in linear time, the Production Leveling decision problem must be at least as hard as Bin Packing. Consequently, we have proven that it is NP-hard, even when considering only a single product type (|M| = 1).
In order to prove NP-membership, let us consider an assignment {y_j ∈ N | j ∈ K} of orders to periods. In order to verify whether this assignment is a valid solution, we need to check whether all capacity constraints are fulfilled: • Is the overall capacity limit satisfied for each period?
• Are the capacity bounds per period and product type satisfied?
In total, the number of inequalities that need to be checked is |N| + |N| · |M|, which is clearly polynomial in the size of the instance.
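The polynomial-time verification can be sketched directly (names are illustrative); it performs exactly the |N| overall checks and at most |N| · |M| per-type checks counted above:

```python
# Polynomial-time feasibility check for a candidate assignment.
def is_feasible(assignment, demands, types, c, c_t, n_periods):
    """assignment[j]: 0-based period of order j; c_t: capacity per product type."""
    w = [0] * n_periods        # load per period
    wt = {}                    # load per (period, product type)
    for j, period in enumerate(assignment):
        w[period] += demands[j]
        wt[(period, types[j])] = wt.get((period, types[j]), 0) + demands[j]
    if any(w[i] > c for i in range(n_periods)):            # |N| checks
        return False
    return all(v <= c_t[t] for (i, t), v in wt.items())    # up to |N|*|M| checks
```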
As the Production Leveling decision problem with only one product type is both NP-hard and in NP, it is NP-complete.
The Production Leveling optimization problem presented in Section 2.2 differs from the decision variant in that we do not only search for a feasible assignment of orders to periods but for the best possible one according to an objective function. The optimization problem is clearly NP-hard as well because it has exactly the same set of hard constraints. Membership in NP, however, does not carry over, as it is not known how to verify in polynomial time that a given solution is optimal.

Integer programming model for the PLP
The PLP is a weakly constrained optimization problem as the only existing constraints are upper bounds on the planned production volume per period and product. There are no constraints involved in the prioritization and the objective function is a trade-off which means for example that pruning solutions with a bad priority objective is not immediately possible as long as there is enough room for improvement in the balancing objectives. Hence the feasible solution space is very large. We propose an integer programming model for the PLP which is capable of providing exact solutions to the optimization problem for moderately sized instances.
The model is based on the mathematical formulation presented in Section 2.2. The problem input parameters are exactly the same, which is why they are not repeated in this section. However, for performance reasons and due to syntactical restrictions we operate on more, and in a few places different, variables. We introduce a binary view onto the order-period assignment through the variables x_{ij} and replace the helper variables w_i and w_{i,t} by pairs of variables representing missing and surplus demand.
The variables of the model are:
• x_{ij} ∈ {0, 1} for each period i ∈ N and order j ∈ K, stating whether order j is planned for period i
• y_j ∈ N for each j ∈ K, whose value is the assigned period of order j
• z_{ij} ∈ {0, 1} for orders i, j ∈ K where p_i > p_j, stating the existence of a priority inversion between i and j
• s_i^+ ∈ R+ for each i ∈ N, the surplus demand for period i
• s_i^- ∈ R+ for each i ∈ N, the missing demand for period i
• s_{it}^+ ∈ R+ for each i ∈ N, t ∈ M, the surplus demand for period i and product t
• s_{it}^- ∈ R+ for each i ∈ N, t ∈ M, the missing demand for period i and product t

Constraints (13) to (17) are the model's required helper constraints. Constraint (13) makes sure that there is exactly one period to which an order is assigned. Constraint (14) links the x_{ij} to the y_j variables. Constraint (15) links the y_j to the z_{ij} variables: it makes sure that for every pair of orders i, j where i has a higher priority than j, z_{ij} is 1 (representing an inversion) if i is planned later than j. Constraint (16) states for each period that the total demand planned plus the missing demand s_i^- minus the surplus s_i^+ equals d*. As both variables have positive domains and are subject to minimization, at most one of them will be non-zero in any optimal solution. Constraint (17) repeats this relationship over the variables s_{it}^+ and s_{it}^- for each product type t. Constraint (18) ensures that the capacity bound per period is satisfied. This is elegantly achieved by stating that the sum of the target demand d* and the surplus variable s_i^+ does not exceed the threshold. Analogously, Constraint (19) enforces the capacity limit per period and product type.
Finally, there are two redundant constraints for strengthening the formulation: Constraint (20) enforces a dominance relation for all pairs of orders which have the same product type and demand value. The constraint requires that the higher-prioritized order is planned no later than the lower-prioritized one, which is sensible because otherwise we could swap the two orders to obtain a better solution. This cuts off parts of the search space in which the optimal solution cannot reside. Constraint (21) links the s variables together, which also leads to improvements in the average runtime.
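The displayed constraints (13) to (21) were lost in text extraction; the following is a reconstruction from the prose above. It is a sketch only: in particular, the big-M form of the linking constraint (15) and the exact shape of (21) are our assumptions, and the original notation may differ.

```latex
\begin{align}
\sum_{i \in N} x_{ij} &= 1 && \forall j \in K \tag{13}\\
y_j &= \sum_{i \in N} i \cdot x_{ij} && \forall j \in K \tag{14}\\
z_{ij} &\ge \frac{y_i - y_j}{n - 1} && \forall i, j \in K : p_i > p_j \tag{15}\\
\sum_{j \in K} d_j \, x_{ij} + s_i^- - s_i^+ &= d^* && \forall i \in N \tag{16}\\
\sum_{j \in K : t_j = t} d_j \, x_{ij} + s_{it}^- - s_{it}^+ &= d^*_t && \forall i \in N,\ t \in M \tag{17}\\
d^* + s_i^+ &\le c && \forall i \in N \tag{18}\\
d^*_t + s_{it}^+ &\le c_t && \forall i \in N,\ t \in M \tag{19}\\
y_i &\le y_j && \forall i, j \in K : t_i = t_j \wedge d_i = d_j \wedge p_i > p_j \tag{20}\\
s_i^+ - s_i^- &= \sum_{t \in M} \bigl( s_{it}^+ - s_{it}^- \bigr) && \forall i \in N \tag{21}
\end{align}
```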

Absolute-difference-based objective
The following objective function is equivalent to the one presented in Section 2.2, but here it is stated on the variable set of the MIP formulation. It is not hard to see that in function g_1 the sum of the missing and surplus variables (s_i^+ + s_i^-) is equivalent to the absolute difference between planned and target demand, |d* − w_i|, because at least one of s_i^+ and s_i^- will be 0 in any optimal solution and the other one holds the absolute difference. The same holds true for the analogous variables in g_2.

Squared-difference-based objective
The second variant of the objective function which uses squared differences can also be easily expressed: The main difference to the first version is that the surplus and missing demand appears squared in the objective function. Furthermore, the normalization factors have been adapted.
Obviously, this formulation is no longer a linear program. However, as many state-of-the-art solvers also support quadratic optimization, we consider this variant worth reporting.

Local search for the PLP
As shown earlier, the PLP is an NP-hard optimization problem, i.e. it belongs to a class of problems for which no polynomial-time algorithms have been found so far and for which it is even unclear whether such algorithms exist. Consequently, there exist instances for which the running time required by any known exact algorithm grows exponentially. This also means that exact solution approaches are impractical for solving large instances of the PLP, and we need to take heuristic methods into account.
In this section we present metaheuristic local search techniques to solve the PLP. To obtain initial solutions we present two different approaches. Afterwards two neighborhood structures for the PLP are described and finally we explain the local search algorithms.

Construction of initial solutions
We developed a greedy construction heuristic which is capable of constructing good initial solutions in a very small amount of time. The parameters of the algorithm are a list of orders, the number of periods n and the random selection size r. The first step of the algorithm is sorting the orders decreasingly by priority, which already handles objective 3 approximately. Then we loop over all periods i from 1 to n, performing the following steps:
1. Examine sequentially the orders from the head of the sorted order list: for each of them, if it still fits into this period obeying the capacity limits, calculate the delta cost for g_1 and g_2 (as defined in (4) and (5) in Section 2.2) which the inclusion of this order into the period would incur. If the delta cost is smaller than zero (i.e. including the order improves the objectives), the order is added to a list of suitable orders.
2. Afterwards, if the list is not empty, select randomly one of the r best suitable orders, plan it for period i, remove it from the sorted order list and go back to 1.
3. Otherwise (if there was no suitable order) repeat with i := i + 1.
Finally, we check whether there are any orders left which could not be assigned due to the capacity limits. If that is the case, they get assigned one by one to the period with maximal remaining capacity. This way especially those periods which are not filled well get assigned the remaining orders and the probability of a hard constraint violation is minimized. However, violating the maximum capacity constraint is allowed in this step because a complete assignment is required for the subsequent local search.
The parameter r controls the random selection size of step 2. If we set it to 1, the algorithm is deterministic. When using values greater than 1, the construction heuristic is randomized, which can be useful for some local search techniques (e.g. GRASP).
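To make the procedure concrete, the following is a minimal Python sketch of the construction heuristic under simplifying assumptions: orders are (id, demand, priority) tuples, there is a single uniform capacity limit per period, and the delta cost only measures the distance of the period load to a common target, as a stand-in for the full g1/g2 evaluation. All names are illustrative.

```python
import random

def greedy_construct(orders, n_periods, capacity, r=1):
    """Greedy construction sketch. `orders` is a list of (id, demand, priority)."""
    # Step 0: sort by priority decreasingly (approximate handling of objective 3).
    pending = sorted(orders, key=lambda o: -o[2])
    target = sum(d for _, d, _ in orders) / n_periods
    loads = [0.0] * n_periods
    assignment = {}
    for i in range(n_periods):
        while True:
            suitable = []
            for o in pending:
                _, demand, _ = o
                if loads[i] + demand > capacity:
                    continue  # order does not fit into this period
                # Delta cost: change of |load - target| if the order is added
                # (stand-in for the g1/g2 delta evaluation of the paper).
                delta = abs(loads[i] + demand - target) - abs(loads[i] - target)
                if delta < 0:
                    suitable.append((delta, o))
            if not suitable:
                break  # no improving order fits; continue with the next period
            suitable.sort(key=lambda x: x[0])
            _, chosen = random.choice(suitable[:r])  # r = 1 -> deterministic
            assignment[chosen[0]] = i
            loads[i] += chosen[1]
            pending.remove(chosen)
    # Leftover orders: assign to the period with maximal remaining capacity
    # (minimal load, capacity being uniform); violations are tolerated here
    # because a complete assignment is needed for the subsequent local search.
    for o in pending:
        i = min(range(n_periods), key=lambda p: loads[p])
        assignment[o[0]] = i
        loads[i] += o[1]
    return assignment, loads
```

With r = 1 the sketch is deterministic, mirroring the behavior described above.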

Neighborhood structures
We devised two types of moves for generating different neighborhoods of a solution which will be introduced in the following subsections. Furthermore, we briefly describe the delta evaluation approach and the methods of neighborhood exploration.

Move-order neighborhood
The move-order neighborhood (or simply move neighborhood) of a solution s consists of all solutions whose only difference to s is that one order has been moved to a different period. Figure 4 visualizes such a move. The figure on the left shows the leveling objective per product type before the move, and the right side shows the result of applying the move. Order 2 is moved from P2 to P3, which in this case yields a better solution.
Enumerating the move neighborhood involves iterating over k orders for each of n − 1 possible target periods, i.e. the neighborhood size is exactly k · (n − 1).

Swap orders neighborhood
The swap-orders neighborhood (or simply swap neighborhood) of a solution s consists of all solutions s′ whose only difference to s is that two orders not assigned to the same period in s appear with swapped period assignments in s′. Figure 5 visualizes such a move. Order 1 is swapped with order 2, which in this case again yields a better solution.
Enumerating the swap neighborhood involves iterating over all pairs of orders not assigned to the same period. Hence the neighborhood size is in O(k^2).
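Under a simplified solution representation (a dict mapping order ids to periods, an assumption made for illustration), the two neighborhoods can be enumerated as generators:

```python
def move_neighborhood(assignment, n_periods):
    """Yield (order, target_period) moves: exactly k * (n - 1) neighbors."""
    for order, period in assignment.items():
        for target in range(n_periods):
            if target != period:
                yield (order, target)

def swap_neighborhood(assignment):
    """Yield (order_a, order_b) swaps for orders in different periods: O(k^2)."""
    orders = list(assignment)
    for i, a in enumerate(orders):
        for b in orders[i + 1:]:
            if assignment[a] != assignment[b]:
                yield (a, b)
```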

Neighborhood exploration
[Figure 5: Example of a swap-orders move — orders per period (P1-P3) and product (blue, yellow), state before (left) and after (right) the move]
We investigated three types of neighborhood exploration:
• First Improvement: Generate and evaluate moves until the first move is found that would improve the current solution. In order to prevent a bias towards the start of the neighborhood (e.g. the first orders in our input), the neighborhood traversal is performed in a cyclic way. That is, instead of starting every time at the same point, we start right after the position where we found the first improving move the last time and search until either an improving move is found or we arrive again at the point where we started.
• Best Improvement: Generate and evaluate the complete neighborhood of a solution and select the move which leads to the biggest improvement. Ties are broken randomly.
• Random Neighbor: Generate and evaluate a random neighbor of the given solution.
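The cyclic first-improvement exploration can be sketched over an abstract list of candidate moves with a delta-cost callback; the function name and representation are illustrative:

```python
def first_improvement_cyclic(neighbors, delta, start=0):
    """Scan `neighbors` cyclically, beginning at `start` (the position right
    after the previous success), and return (index, move) of the first move
    with negative delta cost, or None if no move improves the solution."""
    n = len(neighbors)
    for step in range(n):
        idx = (start + step) % n
        if delta(neighbors[idx]) < 0:
            return idx, neighbors[idx]
    return None  # full cycle completed without improvement: local optimum
```

On the next call one would pass `start=idx + 1` to continue right after the last improving move, avoiding the bias towards the start of the neighborhood.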

Move evaluation
In order to explore a neighborhood systematically, we need to be able to compare moves with respect to their quality. Given two moves a and b, the first criterion to check is the number of hard constraint violations which each of them introduces or resolves. If a introduces fewer or resolves more of them, we say that a is better than b. Otherwise, if the number of hard constraint violations is equal, we prefer the move with the lower move cost, which is defined as the change of the current solution's objective value if the move were performed.
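If moves are represented as (delta hard-constraint violations, delta cost) pairs — an assumption made for illustration — this lexicographic comparison is exactly Python's built-in tuple ordering:

```python
def better_move(a, b):
    """Lexicographic move comparison: a move that introduces fewer (or
    resolves more) hard constraint violations always wins; ties are decided
    by the lower move cost. Moves are (delta_violations, delta_cost) tuples."""
    return a < b  # tuple comparison: first element, then second on ties
```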
To avoid costly complete evaluations of whole solutions we propose a delta evaluation that efficiently computes how much the objective value changes for a given move. The delta evaluation implementations for the two move types both use the same primitive for evaluating the cost of moving one order to a different period. When performing swaps, we calculate the cost of moving order one to the period of order two, the cost of moving order two to the period of order one, and compensate the error which results from assuming in both calculations that the respective other order remains unchanged. The delta cost of moving an order is calculated for the three objective function components separately:
1. For the leveling objective we only need to keep track of the planned production volume for each period, so that we can calculate the effect of the move on the difference to the target value.
2. For the per-product leveling objective we can do the same thing, given that we keep track of the planned production volume for each period and product.
3. The priority objective is the hardest and most time-consuming part of delta evaluation because moving an order from period i to j can introduce or resolve inversions between the moved order and every order assigned to a period between i and j. When the number of orders is very large it is inefficient to iterate over all such orders and perform comparisons because we need to do that for every candidate move. Our idea for optimizing this evaluation is based on the insight that the only thing we care about when moving an order past a period is the number of orders in that period which have smaller and larger priorities, respectively, not the actual priority values. Therefore, we maintain the priority values of all orders assigned to a certain period in a sorted list (one for each period), so that we can efficiently retrieve via binary search how many orders have smaller / larger priorities than the order which we currently want to move.
The delta cost of the three objective function components is aggregated to a single value by the usual formula for the objective value (7).
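The sorted-list idea for the priority objective can be sketched as follows; the handling of the destination period and of equal priorities is simplified here, and the function name is illustrative:

```python
from bisect import bisect_left, bisect_right

def priority_delta(sorted_priorities, i, j, prio):
    """Change in the number of priority inversions when an order with
    priority `prio` is moved from period i to a later period j (i < j).

    `sorted_priorities[t]` is the sorted list of priorities of the orders
    in period t. Moving the order past a period t adds an inversion for
    every order in t with a smaller priority (it now precedes a
    higher-priority order) and resolves one for every order with a larger
    priority. The destination period itself and equal priorities are
    ignored in this simplified sketch.
    """
    delta = 0
    for t in range(i + 1, j):
        prios = sorted_priorities[t]
        smaller = bisect_left(prios, prio)                # priorities < prio
        larger = len(prios) - bisect_right(prios, prio)   # priorities > prio
        delta += smaller - larger
    return delta
```

Thanks to the binary searches, each traversed period costs O(log k) instead of a full scan over its orders.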

Algorithms
In this section we present details of the metaheuristic local search methods which we investigated for solving the PLP, namely the simple and deterministic VND as well as Simulated Annealing.

Variable Neighborhood Descent
VND is a deterministic local search technique which can be seen as an extension of hill climbing to multiple neighborhoods. The general idea is to go on to the next neighborhood if the current one gets stuck in a local optimum and to return to the first one as soon as a further improvement is found. The selection of an improving move in the neighborhood is usually done by using a deterministic exploration technique, i.e. either first or best improvement. Algorithm 1 shows the details in pseudo code, as given in the Handbook of Metaheuristics [7]. The idea of using multiple neighborhoods is based on the following insights [7]:
• A local optimum w.r.t. one neighborhood structure is not necessarily a local optimum w.r.t. another.
• A global optimum is a local optimum w.r.t. all possible neighborhood structures.
That implies it is beneficial to use several complementary neighborhoods and try to escape local optima of one neighborhood by switching to another.
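A generic VND skeleton corresponding to this scheme might look as follows; the `improve` callback abstracts the first/best-improvement exploration, and all names are illustrative:

```python
def vnd(solution, neighborhoods, improve):
    """Variable Neighborhood Descent sketch. `neighborhoods` is an ordered
    list; `improve(solution, nbh)` returns an improved solution taken from
    that neighborhood, or None if the solution is locally optimal w.r.t. it."""
    k = 0
    while k < len(neighborhoods):
        better = improve(solution, neighborhoods[k])
        if better is not None:
            solution = better
            k = 0  # return to the first neighborhood after any improvement
        else:
            k += 1  # locally optimal here: go on to the next neighborhood
    return solution  # locally optimal w.r.t. all neighborhoods
```

As a toy usage example, minimizing f(x) = (x - 3)^2 over the integers with step-size neighborhoods 1 and 2 converges to x = 3.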

Simulated Annealing
Simulated Annealing is a metaheuristic optimization method introduced by [8]. It resembles the physical process of annealing in metallurgy insofar as both methods use a cooling schedule in order to control the amount of random movements in the process, which in theory allows for convergence to the optimal state. Even though convergence to the optimal solution is usually not achieved in practical settings, Simulated Annealing is still one of the most widely used metaheuristic optimization methods.
Given an initial solution, a set of neighborhoods N_i with associated probabilities p_i, the starting temperature t_max, the minimum temperature t_min, the number of iterations per temperature w, a time limit and an iteration limit, the version of Simulated Annealing we propose works as shown in Algorithm 2.
The pseudo code makes use of two functions, Accept, standing for the acceptance criterion, and Cool-Off, defining the cooling schedule, which we discuss in the following:

Algorithm 2: Simulated Annealing
Data: initialSolution, neighborhoods N_i with probabilities p_i, t_max, t_min, iterations per temperature w, timeLimit, iterationLimit
Result: a solution at least as good as initialSolution
1 currentSolution ← initialSolution;
2 bestSolution ← currentSolution;
3 t ← t_max;
4 while t ≥ t_min and ¬ time limit reached and ¬ iteration limit reached do
  …

• Acceptance Criterion: We use the metropolis criterion as acceptance function, which was introduced in the original paper by [8]. The probability of acceptance P(i ⇒ j) of a move from solution i to solution j (for the case of minimization), with f(x) standing for the objective value of solution x, can be defined as follows:

P(i ⇒ j) = 1 if f(j) ≤ f(i), and exp(−(f(j) − f(i)) / t) otherwise.

If the candidate solution j is at least as good as the current solution i, it is accepted unconditionally. Otherwise it is accepted with a probability which decreases exponentially as a function of the negative delta cost divided by the current temperature. That means, if a candidate solution is much worse than the current one, it will be accepted with a lower probability than a solution which is just a little bit worse.
• Cooling schedule: The temperature is decreased during the search process by means of a cooling schedule, which is usually a geometric progression. In our case it depends on the cooling rate α and the iterations per temperature level w. The function Cool-Off() reduces the temperature after every w iterations by the following formula:

t ← α · t

We want to stress briefly how α and w interact. Assuming we are given an iteration limit l, the initial temperature t_max and the final temperature t_min, there exist many different options to reach t_min after l iterations, namely all combinations of α and w such that t_min = α^n · t_max, where the number of temperature steps is n = l/w. Two examples of schedules following that formula with l = 30000, t_max = 1 and t_min = 0.001 are depicted in Figure 6. Observe that for both options depicted in the figure the temperature at each time is approximately the same, as the different step sizes and widths compensate each other. Therefore, it is sufficient to fix the cooling rate α when tuning the parameters of Simulated Annealing and let the cooling schedule be determined only by the variation of w. If we do not know the number l, we can also derive a formula which relates two cooling schedules (α_1, w_1) and (α_2, w_2) that have the same average slope:

α_1^(1/w_1) = α_2^(1/w_2), i.e. w_2 = w_1 · ln α_2 / ln α_1

Using this relationship one can construct alternative cooling schedules which decrease equally fast on average.
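Putting acceptance criterion and cooling schedule together, a compact sketch of the described Simulated Annealing variant could look like this. For brevity it uses full cost evaluations instead of the delta evaluation from above and a single neighborhood callback; all parameter defaults are illustrative.

```python
import math
import random

def simulated_annealing(initial, random_neighbor, cost, t_max=1.0, t_min=1e-3,
                        alpha=0.95, w=1000, max_iters=30000):
    """Simulated Annealing sketch: metropolis acceptance and a geometric
    cooling schedule (after every w iterations, t <- alpha * t)."""
    current, best = initial, initial
    t, iters = t_max, 0
    while t >= t_min and iters < max_iters:
        for _ in range(w):
            cand = random_neighbor(current)
            delta = cost(cand) - cost(current)
            # Metropolis criterion: accept improvements unconditionally,
            # worse candidates with probability exp(-delta / t).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current = cand
                if cost(current) < cost(best):
                    best = current
            iters += 1
            if iters >= max_iters:
                break
        t *= alpha  # Cool-Off step of the geometric schedule
    return best
```

A real implementation would replace the repeated `cost` calls with the delta evaluation described in the neighborhood section.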

Experimental Evaluation
In this section we evaluate the practical contributions of our work and provide answers to the questions which have been raised. As the PLP is a new problem we initially elaborate on the problem instances and propose two instance generation procedures. Next we describe properties of the test set, define parameters and describe the processing environment. After that we turn towards the actual evaluation and look at the MIP model in detail. Ultimately the metaheuristic approaches are extensively evaluated.

Problem Instances
The problem we describe emerges from a real-life use case of our industrial partner, who also provided us with data from the production system. In total we received 27 PLP instances which all have 20 periods, 4 to 8 product types and 79 to 1585 orders. This set of instances will from now on be called R1. As these instances do not suffice for a thorough evaluation and we do not want to restrict ourselves to their size, we also designed two random instance generation procedures, which are described in the following.

Perfectly solvable instances
We devised a method of generating instances which allow for a perfectly balanced solution with zero cost, which we know from the construction process. That is, of course, a restriction of generality, but it is extremely useful as a means of evaluating the optimality gap for large instances, which would otherwise be impossible as we currently have no way of solving them exactly with usual compute resources. Despite the existence of a perfectly balanced solution with no priority inversions, the instances are still not easy to solve to optimality, at least as long as the solvers are not given the information that a perfect solution exists.
The instance generation process relies on the subroutine for random integer partitioning shown in Algorithm 3. It takes as arguments the integer to partition, the number of partitions and a minimum value for each partition. The main idea is to represent the number n − k · minV as an array of zeros and then insert k − 1 ones at random positions. In the resulting array an integer partition of the number n − k · minV into k parts can be found by looking at the number of zeros between every two neighboring ones. Finally we add minV to every element of the result array to obtain the requested partition with minimum value.
Algorithm 3: Integer partitioning algorithm
Data: n, k, minV
Result: An array with k integers whose sum equals n, each of which being ≥ minV
1 let array ← an array consisting of n − (k · minV) zeros;
2 insert k − 1 ones into array at random positions;
3 let spaces ← number of zeros between the ones in array;
4 add minV to every element of spaces;
5 return spaces;

Using this partitioning algorithm, Algorithm 4 defines the procedure for generating random instances with a fixed number of orders, periods and products. First the total number of orders is partitioned into one part per period, where each part has to contain at least as many orders as there are products. This is important because every product needs to meet its target in every period in order to achieve an objective value of 0. The same is then done within each period to decide upon the number of orders of each product type.
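A possible Python implementation of Algorithm 3, drawing the k − 1 separator positions directly instead of inserting ones into an explicit zero array (equivalent by construction; the function name is illustrative):

```python
import random

def partition(n, k, min_v):
    """Random integer partition of n into k parts, each >= min_v.

    Conceptually, n - k*min_v zeros and k - 1 ones are arranged at random;
    the lengths of the zero runs between the ones form a partition of
    n - k*min_v into k parts, and min_v is added to every part."""
    assert n >= k * min_v, "no feasible partition exists"
    zeros = n - k * min_v
    # choose the k-1 separator positions among zeros + k - 1 slots
    seps = sorted(random.sample(range(zeros + k - 1), k - 1))
    parts, prev = [], -1
    for s in seps + [zeros + k - 1]:  # sentinel closes the last run
        parts.append(s - prev - 1 + min_v)
        prev = s
    return parts
```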
Next we draw the overall target value for the production volume (which is the same for each period) by taking the desired avgDemandPerOrder and multiplying with the average number of orders per period plus a random deviation of at most 10%. Then we partition that value into one part for each product, which is the demand for each product per period.
Finally we need to partition the demand for each product, which we decided upon in line 4, into the number of orders for each period and product which we calculated in line 2. The priorities must be chosen such that no inversion can exist, which is achieved by assigning each period a range of priority values decreasingly such that the ranges do not overlap, and choosing for each order randomly one of the allowed values. From this data the order and product type list can be built, which completes the instance. The optimal solution is known as well from the construction process.
Algorithm 4: Procedure for the creation of perfectly solvable instances
Data: m, n, k, avgDemandPerOrder
Result: A realizable instance with m products, n periods, k orders and the optimal solution
1 let ordersPerPeriod ← partition(k, n, m);
2 let ordersPerPeriodAndProduct ← partition(ordersPerPeriod[t], m, 1) for every period t;
3 let plannedDemand ← (k · avgDemandPerOrder) / n ± 10%;
4 let plannedDemandPerProduct ← partition(plannedDemand, m, max(ordersPerPeriodAndProduct));
5 let orderDemands ← partition(plannedDemandPerProduct[p], ordersPerPeriodAndProduct[t, p], 1) for every period t and product p;
6 let allowedPriorities ← for each period t a distinct range of priorities such that they decrease with increasing t;
7 let orderPriorities ← choose for each order one of the priorities which are allowed according to the period of the order;
8 build the list of orders and products and shuffle them;
9 assign random product names;
10 return a new instance built from the list of orders and products, together with the optimal solution;

Using Algorithm 4 we generated 1000 instances, sampling the parameters for each one independently as follows: the number of orders k is chosen from 100…4000, the number of periods n from 2…80, the number of products m from 1…20 and avgDemandPerOrder from 5…500. The resulting set of instances is subsequently called R2.

Random instances
We also devised a second instance generation procedure where the optimal solutions are not known by design and we cannot even guarantee that a feasible one exists, which is surely a more practice-oriented approach. The instances are designed to share some properties of the 27 realistic instances:
• There exists only a limited number l_k of different order demand values. This means we frequently see repeated orders, which may however differ in priority.
• Orders of different products draw their demand data from different distributions. Whereas product a may have demand values between 0 and 1000, product b may have it between 0 and 5000.
• Sometimes there exist product types whose number of orders is smaller than the number of periods which implies that the demand for some periods will exceed the target while for others it must be zero.
The actual generation process is very simple. Given a number of orders k, periods n and product types m, the algorithm works as follows:
1. Partition the number of orders k into m parts c_1, …, c_m.

Experimental Setting
The instances which are described above are split into training and test set so that the parameter tuning is not executed on the same instances as the validation. The test set consists of the whole set of realistic instances R 1 , 50 instances of R 2 , 50 instances of R 3 and all 10 instances in R 4 . Table 1 provides an overview of the instance sets and the way they were split.
We chose to build the test set out of four different instance types because we wanted to make sure that our algorithms can cope with different characteristics and sizes. The size distribution is shown by Table 2 which states for each instance parameter -k (number of orders), m (number of product types), and n (number of periods) -the minimum, maximum and mean value on each part of the test set. The smallest instances are R 4 , followed by the realistic instances R 1 . The instances coming from R 2 and R 3 are much larger on average as we want to evaluate also the scalability of our algorithms. The set of test instances is publicly available on the following web-page: https://dbai.tuwien.ac.at/staff/jvass/productionleveling.
We use the absolute-difference-based objective function (7) to produce all the subsequent results. The reason is that in our setting the advantages over the objective with squares outweigh the disadvantages, especially because it enables us to solve many more instances exactly. The evaluation of the metaheuristics could just as well be done using the quadratic objective function (11), but as we want to compare to the exact results we use formula (7) as well. Hard constraint violations are not part of the objective function but are, where possible, treated separately by reporting the number of violated constraints as a separate number or separate plot. In some cases, e.g. statistical significance tests, we handle objective and constraint violations at once by adding them up. Due to the small magnitude of the objective a penalization factor for hard constraint violations is not necessary.
As indicated during the problem statement, we worked out default values for the weights of the objective function components a_1, a_2 and a_3 in cooperation with our industrial partner, namely 1, 1 and 1/3, respectively. All experiments of the evaluation use this weighting.
Wherever nothing different is stated, the algorithm parameters are defined as follows:
• The greedy heuristic has only one parameter, r, controlling the amount of randomness, which we set to 1 (i.e. deterministic) for all experiments. The parameter is only necessary for local search techniques like GRASP which require a randomized construction heuristic.
• For VND the move neighborhood is used first because it can be enumerated very quickly. Only when no improving move can be found any more is the larger swap neighborhood employed. This ordering leads to much quicker termination because the first neighborhood is searched much more often than the second. As the neighborhood exploration strategy we use First Improvement with restart at the last position (as defined in Section 5.2.3), which showed at least equal performance to Best Improvement in preliminary experiments.

[Figure 7: Share of solution statuses of MIP for each subset of the test set. Optimal means proven optimal. Suboptimal implies that an integer solution has been found but it has not been proven that it is optimal. Unsolved means that within the time limit no integer solution has been found and it is thus unclear whether there exists a feasible solution at all. Infeasible means that the solver proved that no feasible solution exists.]

Evaluation of the MIP model
In this subsection we will examine the MIP model presented in Section 4 with respect to its empirical performance. We first break down the results by the different instance sets of which the test set is composed. Afterwards we will investigate how the instance size affects the solution quality.
First, we want to investigate how well each part of the test set can be solved using MIP. For a description of the different parts please refer to Table 2. Figure 7 visualizes the shares of optimally solved, feasibly but not optimally solved, infeasible and unsolved instances per group R1 to R4. The most noticeable difference between the sets is that in R2 and R3 the vast majority of the instances are unsolved, while for the other two sets most of the instances are solved (but not proven optimal). Presumably the reason is that most of the instances in R2 and R3 are very large. An interesting fact is, though, that about 10% of the instances in R2 could be solved to proven optimality but not a single one in R3, even though the instance sizes of the two sets have been sampled from the same distribution. One potential reason could be that the instances in R2 are designed to have optimal solutions with objective value 0. That should make optimality proofs easy for the solver once the optimal solution has been found, because no part of the objective function can be negative.
For the instance sets R1 and R4, over 70% of the instances end up with a solution that is not proven optimal. We want to investigate how good these solutions are and use for that purpose the relative optimality gap with respect to the best lower bound. It is calculated as the percentage corresponding to one minus the ratio between bound cost and incumbent cost. Table 3 shows the minimum, maximum and mean optimality gap as well as the standard deviation for all suboptimal solutions in R1 and R4. With an average gap of only 3.96%, the realistic instance set R1 is solved really well, so that the MIP model might be usable in practice when the instances are not too large and a runtime of one hour is not an issue. On the other hand, R4 has a low minimum and a high maximum gap as well as a large standard deviation. That means that the randomly generated instances are quite difficult to solve using MIP, even though the instances in R4 are mostly smaller than the realistic ones.

[Figure 8: Solution statuses of MIP on the test set, grouped by value ranges of instance features. Optimal means proven optimal. Suboptimal implies that an integer solution has been found but it has not been proven that it is optimal. Unsolved means that within the time limit no integer solution has been found and it is thus unclear whether there exists a feasible solution at all. Infeasible means that the solver proved that no feasible solution exists.]
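As a small illustration, the relative optimality gap as defined above amounts to:

```python
def optimality_gap(incumbent, bound):
    """Relative optimality gap in percent: one minus the ratio between the
    best lower bound and the incumbent objective value."""
    return 100.0 * (1.0 - bound / incumbent)
```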
Finally, we investigate in more detail how the instance size correlates with the results of the MIP model. Figure 8 visualizes the solution statuses of all instances in the test set, grouped by the number of orders k, the number of products m and the number of periods n, from left to right. The most apparent relationship is a correlation between the number of orders k and the percentage of unsolved instances. While below 250 orders almost every instance has either been proven feasible or infeasible, the share of unsolved instances increases drastically with growing k. Looking at the middle and right-hand-side figures, we can see that the share of unsolved instances also increases with the number of periods and product types, but it starts already quite high in the smallest bin. We can conclude that instances with 250 orders or less can be solved with a high probability by the MIP model, but no such bound can be stated on the number of periods or product types. While increasing n and m clearly complicates the problem, making them small does not automatically make the problem easy to solve.

Evaluation of metaheuristics
As we have seen, the MIP formulation is not suitable for solving large instances, which is why we developed local search methods for the PLP as well. In this subsection we will first deal with the automatic parameter tuning for Simulated Annealing and validate the claim that it is sound to fix the cooling rate. Then we analyze benefits and shortcomings of our two approaches VND and Simulated Annealing in detail on the basis of results on our test set, comparing also against the greedy heuristic. Thereafter we examine the sensitivity of Simulated Annealing to variations in the weighting of the neighborhoods. Finally we examine how close the metaheuristic solutions get to the global optima by using dual bounds obtained through MIP and the perfectly solvable instance set R 2 .

Algorithm Configuration
As described in Section 5.3, Simulated Annealing depends on parameters whose setting has a huge influence on the algorithm's efficiency and effectiveness. We deal with their configuration by means of Sequential Model-based Algorithm Configuration (SMAC), an automatic algorithm configuration tool written in Python. It relies on Bayesian Optimization in combination with an aggressive racing mechanism in order to efficiently search through huge configuration spaces [10].
We applied SMAC to tune the parameters of Simulated Annealing as presented above. The set of instances used for the tuning can be found in the column Training Set Selection of Table 1. The parameter optimization was executed for 24 hours on 24 cores in parallel. We used a time limit of five minutes per run and no iteration limit. The cooling rate was not tuned but set to a value of 0.95, which is not a restriction of generality as long as the number of iterations per temperature can still be adjusted (see Figure 6). This claim will be verified in a separate experiment later on. We tuned the initial temperature t_max, the number of iterations per temperature w and the probability p that the move neighborhood is used to generate the next random move (hence 1 − p is the probability of the swap neighborhood).
Tuning the minimum temperature t_min is not necessary because the results cannot get worse when Simulated Annealing runs until the time limit instead of aborting when the minimum temperature is reached. Indeed, preliminary results showed that setting the minimum temperature to zero instead of using the tuning result of SMAC improves results to a small but significant extent. The configuration space with minimum and maximum values as well as the defaults and the tuning result is shown in Table 4.

Experiments about fixing the cooling rate
We claimed in Section 5.3.2 that the cooling rate α could be set to a constant value because it was redundant as long as the number of iterations per temperature w is free. During algorithm configuration we did exactly that and set α ← 0.95. Now we want to verify this claim by means of an experiment. We derive four more cooling schedules from the one defined by the result of parameter tuning whose temperature profile follows the same slope. Then we benchmark each configuration on the whole test set ten times with different random seeds and take the median of the objective values and number of constraint violations for each instance.
In Section 5.3.2 we already introduced an equation which allows deriving cooling schedules with equal average slope. We selected the alternative cooling rates 0.5, 0.75, 0.9 and 0.99 and computed the associated values of w. A summary of the resulting cooling schedules is shown in Table 5. Figure 9 shows on the left-hand side a box plot for each of the schedules, each plotting, per instance, the median objective value obtained with the derived schedule divided by the median objective value obtained with the original schedule. The original schedule does, of course, not differ from itself, while the other schedules bring an improvement for some instances and worse results for others. For the three higher values of α more than 50% of the data lies in the range ±1% and nearly all the rest in ±2% (except for a couple of outliers outside the plotting range). The median shown by the box plots of the schedules α = 0.9 and α = 0.99 is almost exactly at 1, whereas the other two variants have medians slightly higher than one and whiskers which reach farther, although the differences are still very small. The right-hand side of Figure 9 visualizes the same for the number of hard constraint violations, except that we do not divide but take the difference between the alternative schedules and the original one, because the interpretation is more intuitive in this case. Apart from a few outliers the whole box plot is at zero, which means that for most of the instances there is no change in the number of hard constraint violations. However, there are a few outliers going down to −1, which means that for these instances the alternative schedules have one violation less. Compared to the 137 instances of the test set the number of outliers is very small, though.
In order to clarify whether the differences which we see in the median of α = 0.5 and α = 0.75 are statistically significant, we conducted a Wilcoxon signed-rank test between the original schedule and each of the derived ones. To take the hard constraint violations into account as well, their number was added to the objective value as a penalty. The null hypothesis was that the median of the differences is zero and the alternative that it is different from zero. We reject the null hypothesis in case we find a p-value smaller than 0.05. The tests yielded a p-value of 2.6 · 10^−8 for α = 0.5, 0.041 for α = 0.75, 0.966 for α = 0.9 and 0.26 for α = 0.99. That implies that we must reject the null hypothesis for the schedules with α ≤ 0.75. For the other two schedules the statistical test does not let us reject the null hypothesis that the differences which we see are the result of chance.
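Such a test can be carried out, for example, with SciPy's implementation of the Wilcoxon signed-rank test; the per-instance penalized objective values below are made up for illustration:

```python
from scipy.stats import wilcoxon

# Median penalized objective (objective value + number of hard constraint
# violations) per instance -- hypothetical values for illustration only.
original_schedule = [10.2, 11.5, 9.8, 14.1, 12.0]
derived_schedule = [10.3, 11.6, 9.9, 14.3, 12.1]

# H0: the median of the pairwise differences is zero.
stat, p_value = wilcoxon(original_schedule, derived_schedule,
                         alternative="two-sided")
print(f"p = {p_value:.4f}")  # reject H0 if p < 0.05
```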
To sum up, the claim that we can change α without changing the result is valid in our setting as long as α is set high enough (i.e. in our experiment at least 0.9). That implies that we were on the safe side when fixing it to 0.95 during parameter tuning. For values α ≤ 0.75 there exists a tiny but statistically significant difference, which corresponds to a median objective value about 0.1% above the one of the default configuration.

Comparison of metaheuristic techniques
We now want to compare the presented heuristic and metaheuristic techniques for solving the PLP, namely the greedy algorithm presented in Section 5.1 as well as VND and Simulated Annealing, which have been presented in Section 5.3. To this end, the three algorithms were benchmarked on the test set with the usual settings. For Simulated Annealing we conducted 10 runs to account for randomness in the search process and aggregated the runs by taking the median of each measure. In order to compare objective values and hard constraint violations visually, we report for each instance the difference to the best solution we ever obtained using any method and time limit. 1 The result is shown in Figure 10: on the left one can see the objective values of the three approaches. The median difference between the greedy heuristic's objective values and the best known ones is about 0.07. Furthermore, we can see in the center figure that a considerable number of instances could not be solved without hard constraint violations by the greedy heuristic; this is the case for 63 out of 137 instances, or about 46%. Compared to the greedy heuristic, VND delivers solutions with much better objective values and fewer constraint violations, but there are still 33 solutions (24%) where at least one capacity constraint is violated. Simulated Annealing achieves the best median objective value of all methods and furthermore the fewest instances with constraint violations (22 out of 137, 16%). The non-overlapping notches of the box plot in the left figure indicate that the difference to VND is significant. The rightmost plot shows the solving time of the different methods in seconds. The greedy heuristic always needs less than a second. VND is also mostly fast, because the search continues only until a local optimum w.r.t. all neighborhood structures is found, which takes long only for the very largest of our instances.
Simulated Annealing always uses the complete available time because we do not stop at a minimum temperature, in order to maximize solution quality.
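A minimal sketch of such a time-limit-driven Simulated Annealing loop follows; the `objective` and `random_neighbor` callbacks are generic placeholders, not the PLP-specific implementations, and the default parameters are illustrative only:

```python
import math
import random
import time

def simulated_annealing(objective, random_neighbor, initial, time_limit,
                        t0=1.0, alpha=0.95, steps_per_temp=100):
    """Time-limit-driven SA main loop: there is no minimum-temperature
    stopping criterion; cooling continues until the budget is spent."""
    current, f_current = initial, objective(initial)
    best, f_best = current, f_current
    t = t0
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:
        for _ in range(steps_per_temp):
            cand = random_neighbor(current)
            f_cand = objective(cand)
            delta = f_cand - f_current
            # Accept improvements always, worsenings with prob. exp(-delta/t).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current, f_current = cand, f_cand
                if f_current < f_best:
                    best, f_best = current, f_current
        t = max(t * alpha, 1e-12)  # geometric cooling, bounded away from 0
    return best, f_best
```

Clamping the temperature away from zero lets the loop run arbitrarily long without numerical issues, which is what the "no minimum temperature" policy requires.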
The supplementary materials contain a table with the objective value and the number of hard constraint violations on each instance of the test set for all reported algorithms.

Sensitivity to neighborhood weightings in Simulated Annealing
Before selecting the next random move in the Simulated Annealing algorithm, the neighborhood is chosen randomly according to some weighting. This weighting was tuned by SMAC, resulting in a probability of 0.4 for the move neighborhood and 0.6 for the swap neighborhood. The tuning progress of SMAC revealed quite large fluctuations in this weighting. Therefore, we conduct a sensitivity analysis in order to find out what impact different weightings have on the results.
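The weighted random choice between the two neighborhoods can be sketched as follows (the dictionary name and helper are illustrative, not identifiers from our implementation):

```python
import random

# Tuned weighting from SMAC: 0.4 for the move neighborhood, 0.6 for swap.
NEIGHBORHOOD_WEIGHTS = {"move": 0.4, "swap": 0.6}

def choose_neighborhood(rng=random):
    """Pick the neighborhood for the next random move according to the
    probability weighting."""
    names = list(NEIGHBORHOOD_WEIGHTS)
    weights = list(NEIGHBORHOOD_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

The sensitivity analysis below amounts to varying these two weights while keeping everything else fixed.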
We evaluate 6 different weightings, one of which is the result of SMAC. The probability p for the move neighborhood in the 6 scenarios ranges from 0 to 1 in steps of 0.2, and the probability of the swap neighborhood is the complementary probability 1 − p. Each configuration is executed on the test set 10 times with the usual time limit of five minutes. The runs are again aggregated using the median. Figure 11 shows on the left a box plot for each of the alternative weightings, each plotting the associated objective value divided by the objective value of the original weighting. The labels 'x - y' mean that the move neighborhood has weight x and the swap neighborhood has weight y. The objective value gets worse in the extreme cases, which can be seen from the fact that the leftmost and the two rightmost boxes lie completely above the dashed line. The other cases are practically equal, which means that the objective value does not change compared to the reference weighting.
In the right plot of Figure 11 the difference in the number of hard constraint violations to the respective number of the reference weighting '40 - 60' is shown. All boxes are completely flat and contained in {0}, which means that the interquartile range is equal for all weightings. The outliers show that a few instances of the extreme configurations have one or two more hard constraint violations than the reference configuration, while the weightings '20 - 80' and '60 - 40' tend to have slightly fewer. However, compared to the 137 instances contained in the test set, the number of outliers is very small, so the significance of this difference is doubtful.
To sum up, the tested neighborhood weightings between '20 - 80' and '60 - 40' are equally good; it is therefore reasonable to assume that the untested weightings in between perform well too. Using one of the other weightings leads to worse results, especially in the case where only the move neighborhood is used.

Optimality gap of metaheuristic solutions
An interesting question during the evaluation of metaheuristic techniques is how far the solutions deviate from the optimal solution. As stated previously, MIP solvers measure this property in terms of the optimality gap, which is calculated as one minus the lower bound divided by the objective value. In the following we assess how large the optimality gap of the metaheuristic solutions is, which can be done using the lower bounds obtained through MIP. However, since only a small fraction of the instances could be solved well enough for this kind of evaluation to make sense, we afterwards also present an analysis based on the randomly generated instances with known optimal solutions.
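As a concrete illustration of this gap definition for a minimization problem:

```python
def optimality_gap(objective_value, lower_bound):
    """Relative optimality gap as reported by MIP solvers for a
    minimization problem: 1 - lower_bound / objective_value."""
    return 1.0 - lower_bound / objective_value

# A solution of value 103 against a dual bound of 100 is at most
# ~2.9% worse than the (unknown) optimum:
gap = optimality_gap(103.0, 100.0)
```

The same formula is applied below to metaheuristic solutions, using the best dual bound the MIP solver found for the instance.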

1. Optimality gap for small instances: We want to analyze the optimality gap of the solutions produced by our metaheuristic approaches. To this end, we use the best dual bound found by the MIP solver. In order to get even better bounds, we executed the solver again with a time limit of 10 hours and used for each instance the best available bound. We restrict this evaluation to the instances in R1 and R4 because they are the only sets where the instances are small enough that we could obtain mostly good bounds. Furthermore, we select the subset of instances whose optimality gap of the best MIP solution is below 10%, because we can only safely assume that the dual bound is good if the gap is small. This step eliminates 8 instances of R1 (30%) and 4 of R4 (40%), leaving 25 instances for our evaluation. Using the best bound, the optimality gap for each of the metaheuristic approaches is calculated on the selected instances. Figure 12 shows the optimality gap for each instance in the reduced set for the greedy heuristic, VND, Simulated Annealing and MIP. For solutions which are not valid because of constraint violations, no mark is shown. The figure conveys that the solutions found by the greedy construction heuristic have gaps between 10% and 25% and that a considerable number of instances is not solved to feasibility at all. VND already achieves drastic improvements by solving all instances but four with a maximum gap of 10% and mostly around 5%. Simulated Annealing is clearly the best metaheuristic for this restricted set of small instances, as it always produced valid solutions with an optimality gap similar to that of the MIP solutions, in several cases even better. The gap is almost always below 5%, with an average of about 3%. The resulting numbers state how large the relative difference between metaheuristic and optimal solutions is at most.²
This result is interesting because it proves that our Simulated Annealing approach solves the majority of the small instances (which include most of the realistic instances) extremely well. On average the solutions are at most 3% above the optimum.
2. Comparison on the large perfectly solvable random instances R2: The instances in R2 have been constructed in such a way that the optimal solutions are known and have objective value 0. This enables us to draw some conclusions about the optimality gap also for larger instances than those for which we can obtain bounds using MIP. However, the optimality gap as defined above cannot be computed for the instances in R2 because the best known bound is zero, which would lead to a division by zero. Instead, we can say something similar by looking at the objective function. The first objective function component g1, defined in Equation (4), states, informally speaking, the gap to a hypothetical perfectly leveled solution averaged over all periods. In the case of the instances R2 this hypothetical solution actually exists, and the value g1 can be interpreted as the average percentage by which planned demand for each period exceeds or falls short of the target. The same argument holds for g2, defined in Equation (5), which states the average percentage by which planned demand for each period exceeds or falls short of the target, averaged over the different products. The objective component g3, defined in Equation (6), can be interpreted as the percentage of actual priority inversions measured against the theoretical maximum number of inversions k·(k−1)/2. Table 6 shows the values of g1, g2 and g3 multiplied by 100, so that they can be interpreted as percentages as explained above, for each algorithm and for each instance of R2 where a valid solution has been reached. The fourth column reports the percentage of valid solutions.
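The interpretation of g3 as a percentage of the maximum number of inversions can be sketched as follows; this is an illustrative O(k²) count consistent with the description above, while the exact definition of g3 is Equation (6) in the paper:

```python
def inversion_ratio(priorities):
    """Fraction of priority inversions relative to the theoretical maximum
    k*(k-1)/2. `priorities` holds the priority of each order, listed in the
    sequence of the orders' assigned periods (higher value = more urgent);
    a pair is counted as inverted if a more urgent order comes later."""
    k = len(priorities)
    inversions = sum(1 for i in range(k) for j in range(i + 1, k)
                     if priorities[j] > priorities[i])
    return inversions / (k * (k - 1) / 2)
```

For example, a plan that schedules orders in strictly decreasing urgency has ratio 0, while a completely reversed plan has ratio 1.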
We can see that Simulated Annealing and VND reach negligible mean deviations for the first two objectives. However, the greedy heuristic also produces levelings which deviate from the target by only 2% on average, so we can assume that this part of the task is not very hard for this instance set. With respect to the priority objective the results leave some more room for improvement. The greedy heuristic reaches the best value here (at the cost of fewer valid solutions and a worse leveling), followed by VND and Simulated Annealing. The best total results for this instance set are clearly reached by VND.
[Figure 12: Optimality gaps per instance. The evaluation contains all instances of R1 and R4 which could be solved with an optimality gap of 10% or lower using MIP.]
[Table 6: Results for metaheuristic methods on R2. The values of the objective function components g1, g2 and g3 are multiplied by 100 so that they can be interpreted as percentages, for each algorithm and each instance of R2 where a valid solution has been reached. The rightmost column states for what percentage of instances each algorithm reached a valid solution.]
It is not entirely clear why the results on R2 differ so much from those obtained on the other parts of the test set, but we suspect that it might be related to the fact that the greedy heuristic finds very good initial solutions on this instance set. That might help VND more than Simulated Annealing, because the latter starts with a random search which destroys some of the good structure that was already there.
The above analysis of the optimality gap on the small, realistic instances and the larger perfectly solvable ones revealed that our metaheuristic methods produce solutions which are only a few percentage points away from the optimum. Taking also into account the comparison of the different metaheuristics on the whole test set, presented in the previous section, we conclude that Simulated Annealing is overall the best of our algorithms for the PLP, because it solves small instances excellently and still scales to the largest instances we have.

Conclusion
We introduced a new combinatorial optimization problem in the area of production planning, which concerns the assignment of orders to production periods. Thereby a number of production capacity constraints need to be fulfilled and a work balancing objective as well as the prioritization of the orders must be optimized.
We started with modeling the problem formally and provided a proof of NP-hardness. Then we turned towards solution methods and introduced a new MIP formulation. Finally we investigated local search methods and defined two neighborhood structures for the problem, which we evaluated using VND and Simulated Annealing.
The main results of this work are:
• The PLP is NP-hard, which was shown by an NP-completeness proof of the associated decision problem via a reduction from Bin Packing.
• The proposed MIP model solves instances of up to medium size. The complexity of solving grows with every dimension of the problem, but most notably with the number of orders k. For instances with fewer than 250 orders we can expect either a feasible solution or a proof of infeasibility within one hour, while we cannot count on finding any solution for instances with k ≥ 300.
• With Simulated Annealing, very good solutions can be obtained within five minutes. The real-life instances can all be solved well, and for most of them we can show through the use of dual bounds that the solutions are within 3% of optimality on average. Experiments based on the set of instances with perfectly leveled solutions indicate that Simulated Annealing is capable of providing very good solutions also for much larger instances.
• Another metaheuristic option studied in this work is VND. While the objective value and the number of hard constraint violations are somewhat higher on average compared to Simulated Annealing, it still finds good solutions most of the time. The instances with perfectly leveled solutions (R2) are even solved better by VND than by Simulated Annealing.
• An experiment regarding the weighting of the two neighborhoods in Simulated Annealing showed that it is clearly advantageous to use both neighborhoods instead of either of them alone. The best weightings between the move and the swap neighborhood lie between '20 - 80' and '60 - 40'.
Future work could include improvements of the MIP model, e.g. by decomposition-based techniques, so that more instances can be solved optimally or better bounds are obtained. That would open up ways to assess the quality of metaheuristic solutions for large instances and to see more clearly where they fall short.