Sensitivity analysis of combinatorial optimization problems using evolutionary bilevel optimization and data mining

Sensitivity analysis in general deals with the question of how changes in input data of a model affect its output data. In the context of optimization problems, such an analysis could, for instance, address how changes in capacity constraints affect the optimal solution value. Although well established in the domain of linear programming, sensitivity analysis approaches for combinatorial optimization problems are model-specific, limited in scope and not applicable to practical optimization problems. To overcome these limitations, Schulte et al. developed the concept of bilevel innovization. By using evolutionary bilevel optimization in combination with data mining and visualization techniques, bilevel innovization provides decision-makers with deeper insights into the behavior of the optimization model and supports decision-making related to model building and configuration. Originally introduced in the field of evolutionary computation, most recently bilevel innovization has been proposed as an approach to sensitivity analysis for combinatorial problems in general. Based on previous work on bilevel innovization, our paper illustrates this concept as a tool for sensitivity analysis by providing a comprehensive analysis of the generalized assignment problem. Furthermore, it is investigated how different algorithms for solving the combinatorial problem affect the insights gained by the sensitivity analysis, thus evaluating the robustness and reliability of the sensitivity analysis results.


Introduction
Combinatorial optimization problems arise in a variety of practical applications, most of which are subject to constraints, such as resource limitations or organizational and regulatory requirements [24,26]. Analyzing the behavior of an optimization model and answering the question of how changes in constraints and other model parameters affect the optimal solution are the main focus of sensitivity analysis (SA) [13,14].
In linear optimization, SA is well established and an important part of the decision-making process [11,15]. In the domain of combinatorial optimization, however, approaches to SA are model-specific, limited in scope, and not applicable to practical optimization problems (see [1,10,19] for recent works and reviews on SA in combinatorial optimization). To be useful in practice, an SA approach should at least be independent of the optimization problem under consideration and the algorithm used to solve it, as well as allow the analysis of simultaneous independent changes in multiple parameters [14].
To overcome the existing limitations of SA in combinatorial optimization and to provide decision-makers with deeper insights into the behavior of the optimization model, Schulte et al. [29] developed the concept of bilevel innovization (BLI). The concept was originally demonstrated in the context of personnel scheduling by analyzing how variations in the number and qualifications of employees and the use of different shifts affect the objective function value. More recently, BLI has been proposed as a general approach to SA on constraints in evolutionary computation [31] and most recently to SA in combinatorial optimization [32].
In previous works on BLI, the problems under consideration were analyzed only briefly and exemplarily. Moreover, each problem was analyzed using only a single algorithm for solving the combinatorial problem. This makes it impossible to draw conclusions about the extent to which the insights gained by the SA depend on the corresponding algorithm. Therefore, the objective of this publication is twofold. First, building on the most recent work on BLI [32], we provide an in-depth analysis of the generalized assignment problem (GAP) to illustrate the capabilities of BLI in gaining insight into the behavior of an optimization model and supporting decision-making. Second, we investigate how different algorithms for solving the combinatorial optimization problem affect the results of the SA. To this end, experiments are conducted using three types of algorithms, specifically, an exact branch-and-bound algorithm, a simple greedy heuristic, and a Genetic Algorithm (GA). Moreover, for the GA, we use two different configurations resulting in different levels of solution quality. Thus, in total four algorithms with different levels of performance in terms of solution quality are analyzed.
The remainder of this paper is structured as follows: In Section 2, the general concept of bilevel innovization is presented. In Sections 3 and 4, the single steps of the BLI process are demonstrated by the example of the generalized assignment problem. Here, Section 3 focuses on the bilevel optimization aspect of BLI, whereas Section 4 illustrates the application of visualization and data mining methods within the BLI process. In Section 5, we compare the results of the SA by using the data generated by the four different algorithms. Conclusions and suggestions for further research are presented in Section 6.

Bilevel innovization
Bilevel innovization comprises the concepts of evolutionary bilevel optimization [33,34] and innovization [7], an SA approach in the domain of evolutionary multi-objective optimization. In general, the idea of BLI is to repeatedly solve the combinatorial problem under consideration with different model parameters (data generation), e.g., different constraint coefficients of the corresponding (mathematical programming) formulation, and subsequently analyze the resulting changes in the objective function value (data analysis) (see Fig. 1). The use of BLI for SA in combinatorial optimization is based on the core assumption that there exist trade-offs between the optimal solution value and the constraints (e.g., resource limitations). Therefore, the task of SA is formulated as a multi-objective bilevel optimization problem with the conflicting objectives of the optimal solution value for the combinatorial problem on the one hand and the constraint values on the other hand. In terms of bilevel optimization [5,23], the multi-objective problem represents the upper-level problem and the combinatorial problem under consideration the lower-level problem. Here, the lower-level problem is provided with parameters (e.g., constraint coefficients) by the upper-level problem, which in turn receives the lower-level objective function value in order to solve the corresponding multi-objective problem regarding the trade-off between optimizing the combinatorial problem and tightening its constraints. For solving the upper-level problem, an evolutionary multi-objective optimization algorithm is used. One core strength of BLI is that any algorithm suitable for solving the combinatorial problem can be used for the lower-level problem. This is made possible by the nested structure of the upper-level algorithm, which treats the lower-level problem as a black box and only provides the parameters to be analyzed and in turn receives the objective value of the lower-level problem. For more details on the nested evolutionary algorithm in general we refer to [34] and for a practical implementation to [30].
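To make the nested structure concrete, the following minimal Python sketch shows how a single upper-level individual could be evaluated; the function and parameter names are illustrative and not taken from the original implementation, and solve_lower_level stands for any black-box algorithm that solves the combinatorial problem for the given constraint values.

from typing import Callable, Sequence, Tuple

def evaluate_upper_level_individual(
    b: Sequence[float],
    solve_lower_level: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Evaluate one upper-level individual (a vector of constraint values b_i).

    Returns the two conflicting upper-level objectives:
    1) the total capacity granted to the constraints (to be minimized), and
    2) the best objective value of the combinatorial lower-level problem
       solved with exactly these constraint values (also to be minimized).
    The lower-level solver is treated as a black box, so any exact algorithm,
    heuristic, or metaheuristic can be plugged in.
    """
    total_capacity = sum(b)
    lower_level_cost = solve_lower_level(b)
    return total_capacity, lower_level_cost

An evolutionary multi-objective algorithm at the upper level then varies b and uses these two values for its selection (e.g., non-dominated sorting).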
The first part of the BLI process shown in Fig. 1 serves the purpose of generating the data set for the subsequent analysis by repeatedly solving the investigated bilevel optimization problem. The data analysis part is based on the visual analytics process [18] and aims to gain insights into the model behavior and to extract rules that support decision-making regarding possible modifications of parameters (e.g., resource allocation) by using data mining and visualization techniques. The data sets generated in the bilevel optimization process consist of the evaluated individuals of the evolutionary multi-objective algorithm at the upper level. Each individual (in the following also referred to as solution) is a vector containing the objective function value of the lower-level problem (output data) as well as the decision variables of the upper-level problem (input data). Since the data analysis is based on the visual analytics process, it can be viewed as an iterative loop of data processing (see data generation), visualization (e.g., scatter plots or boxplots), and the application of data mining methods (e.g., regression or decision trees). The first step is to visualize the output data and to identify a region of interest for deeper analysis, for example by zooming or filtering the data in a selected region around the Pareto-optimal front. The filtered data can then be visualized again and segmented into different areas of interest. This can be done, for example, manually, by applying a clustering algorithm, or based on the distance from the Pareto-optimal front. These target areas provide the basis for further analysis. Subsequently, data mining methods can be applied to the target areas and the results can be visualized again. Each of the aforementioned steps can lead to insights, which in turn can be used to restart the data generation process at any step (e.g., adjusting the lower-level model, adding or removing decision variables, selecting a new upper-level algorithm, performing further optimization runs, or creating new data fields). For a detailed step-by-step description and exemplary application of the BLI process, we refer to [31,32].

Data generation
The combinatorial optimization problem considered for SA here is the GAP, which is concerned with the optimal assignment of n tasks to m agents and can be formulated as follows [25,27]:

\min \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \quad (1a)

subject to:

\sum_{i=1}^{m} x_{ij} = 1, \quad j = 1, \dots, n \quad (1b)
\sum_{j=1}^{n} r_{ij} x_{ij} \le b_i, \quad i = 1, \dots, m \quad (1c)
x_{ij} \in \{0, 1\}, \quad i = 1, \dots, m, \; j = 1, \dots, n \quad (1d)

where c_{ij} is the cost and r_{ij} the capacity requirement of assigning task j to agent i. Each task j has to be assigned to exactly one agent i (1b), and one agent can be assigned multiple tasks while not exceeding the agent's capacity b_i (1c). Whether a task j is assigned to agent i is indicated by the binary decision variable x_{ij} (1d). The objective is to minimize the overall assignment cost of tasks to agents (1a). For more details on the GAP we refer to [25,27]. For solving the GAP, in total three different lower-level algorithms are used. First, an exact algorithm is applied using the standard branch-and-bound [20] implementation of the open-source MILP solver Cbc [17]. Second, we use a simple greedy heuristic, which iteratively considers all tasks that have not yet been assigned. In each iteration, for each unassigned task j, the costs c_{ij} of assigning it to each agent i who still has sufficient capacity for the task are calculated and sorted in ascending order. Then, for each task, the difference between the two lowest-cost assignments is calculated as a measure of regret. Finally, the task with the highest regret value is assigned to its lowest-cost agent. These steps are repeated until all tasks have been assigned or no more agents with free capacity are available [3,21,28]. Third, a GA described in [4] is applied. The algorithm uses integer-encoded chromosomes, where the length corresponds to the number of tasks and the integer values represent the assigned agents. In addition, we implemented a stopping criterion based on the difference between the current objective value and the average of the best objective values over the last t_last generations. The GA stops when this difference is less than or equal to a predefined threshold [12,16]. However, the stopping criterion is not activated until a feasible solution has been found.
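The regret-based greedy heuristic described above could be implemented roughly as in the following sketch; the data layout (cost and requirement matrices indexed as cost[i][j] and req[i][j]) is an assumption for illustration, not the code used in the paper.

import math

def greedy_regret_gap(cost, req, capacity):
    """Regret-based greedy heuristic for the GAP (minimal sketch).

    cost[i][j]  -- cost of assigning task j to agent i
    req[i][j]   -- capacity requirement of task j at agent i
    capacity[i] -- capacity of agent i
    Returns a dict task -> agent, or None if some task cannot be assigned.
    """
    m, n = len(cost), len(cost[0])
    remaining = list(capacity)
    unassigned = set(range(n))
    assignment = {}
    while unassigned:
        best_task, best_agent, best_regret = None, None, -1.0
        for j in unassigned:
            # agents that still have enough capacity for task j, sorted by cost
            feasible = sorted(
                (cost[i][j], i) for i in range(m) if req[i][j] <= remaining[i]
            )
            if not feasible:
                return None  # no agent with free capacity left for this task
            # regret = difference between the two cheapest possible assignments
            regret = (feasible[1][0] - feasible[0][0]) if len(feasible) > 1 else math.inf
            if regret > best_regret:
                best_task, best_agent, best_regret = j, feasible[0][1], regret
        # assign the task with the highest regret to its lowest-cost agent
        assignment[best_task] = best_agent
        remaining[best_agent] -= req[best_agent][best_task]
        unassigned.remove(best_task)
    return assignment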
The upper-level problem corresponds to the trade-off decision regarding cost and constraints. Here, the issue to be analyzed is how variations in the resource constraints (see (1c)) affect the overall assignment cost (see (1a)). The overall capacity of all agents is

\sum_{i=1}^{m} b_i \quad (2)

and the multi-objective upper-level problem can be stated as

\min \; \left( \sum_{i=1}^{m} b_i, \; C^{*}(b_1, \dots, b_m) \right) \quad (3a)
\text{subject to: } LB \le b_i \le UB, \quad i = 1, \dots, m \quad (3b)
\text{with } C^{*}(b_1, \dots, b_m) \text{ denoting the optimal value of (1a) obtained by solving the lower-level problem (1a)-(1d)} \quad (3c)

Therefore, the first part of the objective function (3a) of the multi-objective upper-level problem corresponds to the overall capacity of all agents (2), and the second part corresponds to the overall assignment cost that is obtained from solving the lower-level problem (3c). The value range of the upper-level decision variables b_i is limited by predefined upper (UB) and lower (LB) bounds (3b). For further information regarding the upper-level algorithm and decision variables, we refer to [32].
For the actual data generation, we used the benchmark data sets for the GAP presented in [4]. Here, we chose the first instance of the set "gapa" with m = 5 agents and n = 100 tasks. The selection was arbitrary; however, experiments on other instances showed similar behavior. For the upper-level evolutionary multi-objective algorithm, a population size of 80, a generation number of 100, and 30 restarts, each with a random initial population, were chosen. The lower bound of the decision variables b_i is set to zero, the upper bound to the maximum capacity that would be required to assign all tasks to a single agent. Moreover, a stopping criterion was implemented that measures convergence based on how many non-dominated individuals of the current iteration dominate those of the previous iteration [22]. All 30 restarts of the upper level were repeated for each of the lower-level algorithms. For the MILP solver a time limit of 10 seconds was used. Experiments on the actual benchmark instances with larger time limits did not yield improvements that justified the considerably larger computation times. The greedy heuristic did not require any further configuration. For the lower-level GA, the configuration used in [4] was applied, with a population size of 100, a mutation rate of 2/v, with v being the number of genes of the encoded lower-level individual, and a random initial population. For the stopping criterion, we used two different configurations to simulate algorithms with different performance. While both algorithms use a threshold of 0.005, the better performing algorithm (further referred to as GA 50) uses t_last = 50 and the worse performing algorithm (GA 15) t_last = 15. However, to ensure algorithm termination, a maximum number of 250 generations was selected.
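As an illustration of the exact lower-level solve with the 10-second time limit, the following sketch models (1a)-(1d) with the PuLP wrapper and the bundled Cbc solver; the paper interfaces Cbc directly, so this is only an assumed, roughly equivalent setup (parameter names follow recent PuLP versions).

import pulp

def solve_gap_with_cbc(cost, req, capacity, time_limit=10):
    """Solve the GAP (1a)-(1d) with Cbc and return the best cost found (or None)."""
    m, n = len(cost), len(cost[0])
    prob = pulp.LpProblem("GAP", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (range(m), range(n)), cat="Binary")            # (1d)
    prob += pulp.lpSum(cost[i][j] * x[i][j] for i in range(m) for j in range(n))  # (1a)
    for j in range(n):                                                            # (1b): each task to one agent
        prob += pulp.lpSum(x[i][j] for i in range(m)) == 1
    for i in range(m):                                                            # (1c): agent capacities b_i
        prob += pulp.lpSum(req[i][j] * x[i][j] for j in range(n)) <= capacity[i]
    status = prob.solve(pulp.PULP_CBC_CMD(msg=False, timeLimit=time_limit))
    if pulp.LpStatus[status] == "Infeasible":
        return None  # the constraint values b_i are too tight for this instance
    return pulp.value(prob.objective)  # best objective value found within the time limit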
The data sets to be analyzed comprise all evaluated upper-level individuals of the 30 optimization runs S. For each lower-level algorithm, an individual data set is considered. Each individual or solution is composed of the lower-level objective function value in (1a), the overall capacity of all agents (2), as well as the m = 5 resource constraint values or upper-level decision variables b_i (see (1c)). Furthermore, we created an additional attribute which serves as an efficiency indicator for the different solutions:

\text{efficiency}(s) = \min_{p \in P} \sqrt{\sum_{d} \left( f_d(s) - f_d(p) \right)^2} \quad (4)

This indicator measures the Euclidean distance of each solution s ∈ S to the nearest Pareto-optimal solution p ∈ P, with f_d representing each upper-level objective in a d-dimensional objective space (see [9] for more information on this efficiency measure). Here, a 2-dimensional objective space is considered, with f_1 corresponding to the lower-level objective function value in (1a) and f_2 corresponding to the sum of all right-hand side constraint values (2). In order to measure the distance, P was created by combining all non-dominated solutions from the 30 restarts and calculating a new "combined" Pareto-optimal front (see Fig. 8). In addition, the attribute of relative constraint values

b_i^{rel} = b_i / \sum_{k=1}^{m} b_k \quad (5)

is introduced, which measures the proportion of an individual resource constraint b_i on the total capacity of the considered solution. Finally, we removed all upper-level individuals with identical decision variables. For the experiments with the lower-level GA, we kept those duplicated solutions with the best objective function value.
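Assuming the solutions are stored as NumPy arrays, the two derived attributes (4) and (5) could be computed as in the following sketch; the array names and shapes are assumptions for illustration.

import numpy as np

def add_derived_attributes(F, B, pareto_F):
    """Compute the efficiency indicator (4) and relative constraint values (5).

    F        -- array of shape (num_solutions, 2): [cost (1a), total capacity (2)]
    B        -- array of shape (num_solutions, m): constraint values b_i
    pareto_F -- objective vectors of the combined Pareto-optimal front
    """
    # (4): Euclidean distance of each solution to its nearest Pareto-optimal solution
    diffs = F[:, None, :] - pareto_F[None, :, :]
    efficiency_distance = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    # (5): share of each individual constraint b_i on the total capacity of the solution
    b_rel = B / B.sum(axis=1, keepdims=True)
    return efficiency_distance, b_rel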

Data analysis
In this analysis, the data set obtained by the exact lower-level algorithm is used. A comparison between the SA results of the different lower-level algorithms is shown in Section 5. Excluding duplicate solutions, a total of 111,202 upper-level individuals (solutions) were evaluated in the 30 optimization runs. For 1,654 of these individuals, no feasible solution of (1a)-(1d) was found, i.e., the constraints were set too tight or the time limit was too short. The left plot in Fig. 2 shows all 109,548 feasible solutions as well as the 2,918 Pareto-optimal solutions (in the following also referred to as efficient solutions) of the different optimization runs. To narrow the analysis and to focus on efficient trade-offs, we further filtered the data by determining cut-off points in each dimension of the objective space. The capacity cut-off was chosen by first selecting all solutions with the minimum cost value (as expressed by (1a)) and subsequently selecting the smallest capacity value (as expressed by (2)) among the remaining solutions. The same procedure was applied for the cost cut-off point. These (edge) cut-off points especially aim at removing solutions with the smallest possible cost value but larger capacity values along the y-axis. The capacity and cost cut-off values were set to 1,531 and 3,175, respectively. As a result, 70,552 solutions remain in the area of interest for subsequent analysis (see Fig. 2 right). By visualizing the output data, we can already see that, whilst retaining feasible solutions, the minimum cost of 1,693 requires a capacity of at least 1,531. Also, a total capacity of at least 809 must be provided to obtain feasible solutions with a maximum cost of 3,175.
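This filtering step can be reproduced compactly, for instance with pandas, as sketched below; the column names 'cost' and 'capacity' are assumptions.

import pandas as pd

def filter_area_of_interest(df: pd.DataFrame) -> pd.DataFrame:
    """Determine the edge cut-off points described above and filter the feasible solutions."""
    # capacity cut-off: smallest capacity among the minimum-cost solutions
    capacity_cutoff = df.loc[df["cost"] == df["cost"].min(), "capacity"].min()
    # cost cut-off: smallest cost among the minimum-capacity solutions
    cost_cutoff = df.loc[df["capacity"] == df["capacity"].min(), "cost"].min()
    # keep only solutions below both cut-off points (the area of interest)
    return df[(df["capacity"] <= capacity_cutoff) & (df["cost"] <= cost_cutoff)]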
In order to perform a more detailed examination of the input data, we further segmented the selected area of interest into multiple target areas. Here, we used a combination of the attribute efficiency distance, introduced in (4), and a clustering algorithm. For the three efficiency-based target areas TAP1 to TAP3 (see Fig. 3 top left), the attribute efficiency distance was normalized to a 0-1 scale and (6) was applied. For the clustering-based target areas TAC1 to TAC4, a k-means clustering algorithm was used with k = 4 and with capacity and cost as feature variables (see Fig. 3 top right). The number of clusters was selected based on the number of desired target areas. The clustered target areas allow an analysis with respect to the different objectives, e.g., low-cost and high-cost solutions.
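The clustering-based segmentation could, for instance, be done with scikit-learn as sketched below; the column names and the fixed random seed are assumptions, and the efficiency-based binning via (6) is omitted.

from sklearn.cluster import KMeans

def cluster_target_areas(df, k=4):
    """Segment the area of interest into k clustering-based target areas (TAC1..TACk)."""
    features = df[["capacity", "cost"]].to_numpy()
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    out = df.copy()
    out["target_area"] = labels  # cluster index per solution
    return out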
In the further course of our analysis, we focus on the trade-off between cost and capacity. Therefore, we chose the clustered target areas as the basis. Furthermore, we focus on the most efficient solutions; thus, only solutions in the efficiency-based target area TAP1 are analyzed. Figure 3 (bottom) shows the resulting target areas TA1 to TA4, which combine the clustered areas TAC1 to TAC4 with TAP1. The analyzed trade-off between cost and capacity can also be considered as a trade-off between objective function value and resource utilization in the solution of a GAP instance. Therefore, target areas with a good objective function value with respect to the GAP, i.e., low cost, and high capacity are further referred to as low-cost areas. Consequently, TA1 is the lowest and TA4 the highest cost area. In terms of efficiency, TA1 to TA4 are considered equivalent. It should be noted, however, that in terms of SA low-cost areas are not superior to high-cost areas, because the former require greater capacity. Hence, a decision-maker may be equally interested in analyzing low- and high-cost areas, depending on their preference.
To get a better understanding of how the model is affected by changes of its input parameters, the input data (or decision variables) are subsequently visualized. Figure 4 shows the average values of each capacity constraint within the different target areas, both the absolute constraint value b_i (Fig. 4 top) and the relative constraint value b_i^rel (Fig. 4 bottom). The average values are calculated as the sum of all b_i or b_i^rel divided by the number of solutions in the corresponding target area. What can be seen from the absolute constraint values is that all target areas TA1-TA4 have similar curve shapes for constraints b1 to b4. The distances between the average absolute constraint values (i.e., average capacity) within the individual constraints can be explained by the positions of the target areas in Fig. 3, since the cost of a target area is generally driven by the overall capacity. However, in the lowest cost area TA1 we can see a strong increase in b5 compared to TA2, TA3, and TA4. This indicates that if the primary interest is to achieve low cost, i.e., a good lower-level objective value, the emphasis should be on increasing b5. The parallel coordinate diagram with the relative constraint values emphasizes the equally low proportion of b1 across all target areas. Moreover, it shows that the higher cost target areas TA3 and TA4 have an equally low relative constraint value of b5. Also, high proportions of b3 and b4 are required for a solution to be located in these target areas. In general, both the absolute and relative constraint values provide useful insights. The absolute values show which amount of each resource is needed in the different target areas, while the relative values can be used to analyze how saturated each constraint is. For instance, if a constraint is not saturated (i.e., the ratio is low), one might decide that the corresponding b_i can be lowered, which would be particularly useful in areas with (too) large capacity, such as TA1 and TA2. As was just shown, at this stage some conclusions can already be drawn about how the model is affected by changes of its input parameters.
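The average constraint profiles shown in Fig. 4 correspond to a simple group-wise mean; a sketch with pandas (column names assumed) could look as follows.

# Average absolute and relative constraint values per target area,
# assuming a DataFrame df with a 'target_area' column and b1..b5 / b1_rel..b5_rel columns.
abs_profile = df.groupby("target_area")[["b1", "b2", "b3", "b4", "b5"]].mean()
rel_profile = df.groupby("target_area")[["b1_rel", "b2_rel", "b3_rel", "b4_rel", "b5_rel"]].mean()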
In the next step, we focus on gaining deeper insights into the structure of TA1. For this purpose, we apply a k-means clustering algorithm to the constraint vector y = (b1, b2, b3, b4, b5). Based on a silhouette score analysis, the number of clusters was set to k = 2. Figure 5 shows the two resulting clusters TA1C1 (8,706 solutions) and TA1C2 (5,702 solutions). It becomes apparent that the primary difference between the two clusters lies in b5 and, to a smaller extent, in b1. To get more detailed information about the distributions of the individual constraints, Fig. 6 shows the distribution of the absolute (top) and relative (bottom) constraint values. In contrast to the previous assumption of a generally high constraint value of b5 based on Fig. 4, the clusters reveal that a rather even distribution of the different constraints is also possible. However, if one favors a large proportion of b5, this should be achieved at the expense of a lower proportion of b1. Now that there is an idea of which constraints influence the cost of the solutions, the next step is to obtain more precise rules that support decision-making regarding the design or configuration (e.g., allocation of additional resources) of the constraint values. In order to identify relationships between the decision variables and the output of the lower-level model, i.e., cost, as well as to extract rules to support decision-making, in this study we applied a decision tree algorithm in combination with Generalized Pythagoras Trees (GPTs). GPTs allow the compact visualization of decision trees and the extraction of rules and are particularly powerful when combined with interactive visualization [2]. To train the decision tree model, the elements of the constraint vector y = (b1, b2, b3, b4, b5) were used as feature variables. In addition, three different sets of target areas S1 = {TA1, TA2, TA3, TA4}, S2 = {TA1, TA2} and S3 = {TA3, TA4} were created and set as classification targets. The classification studies were modeled as binary prediction problems, where the model is trained to predict the label True if a solution lies inside the considered set of target areas. For example, for classification target S1, a solution receives the label True if it belongs to either TA1, TA2, TA3, or TA4. Both the decision tree algorithm and the GPT were run using Orange, a Python-based open-source toolbox for interactive visualization and data mining [8].
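The paper uses Orange for the decision tree and the GPT visualization; as an assumed, roughly equivalent sketch, a tree can be trained and its rules inspected with scikit-learn as follows (the feature names and the construction of the binary labels are assumptions).

from sklearn.tree import DecisionTreeClassifier, export_text

def extract_rules(B, in_target_set, max_depth=100):
    """Train a decision tree on the constraint vectors and print its decision rules.

    B             -- array-like of shape (num_solutions, 5): constraint values b1..b5
    in_target_set -- Boolean labels: True if the solution lies in the considered set of target areas
    """
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(B, in_target_set)
    # textual representation of the tree paths; the paper visualizes them as a Generalized Pythagoras Tree
    print(export_text(tree, feature_names=["b1", "b2", "b3", "b4", "b5"]))
    return tree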
For each of the target area sets S1 to S3, an individual classification study was performed, each resulting in its own rule set R1 to R3. The corresponding rule sets are shown in Table 1 and indicate that one is likely to end up in a particular set of target areas if the constraint coefficients are set as in the rows of the corresponding rule. For instance, R1 can help determine coefficients that lead to solutions that are generally close to the Pareto-front, while R2 helps determine good constraint coefficients b_i that lead to solutions that are either in TA1 or TA2. Looking at R1, for example, it can be seen that there is a wide range of possible constraint values for b1, b2, and especially b5, while the ranges for b4 and especially b3 are narrower. Moreover, the value ranges of b3 and b4 are quite similar in the lower and higher cost regions (R2 and R3). This suggests that to achieve an efficient setting, one should focus primarily on b4 and especially b3. Furthermore, b5, although seemingly having little effect on Pareto optimality, appears to have a large impact on the transition from high to low cost areas, as already indicated in Fig. 4. It is important to note that, since the data set is limited to these values, the rules imply the previously mentioned minimum and maximum capacity of 809 and 1,531, respectively.
When selecting the rule sets, in this study a special focus was put on precision in order to give the decision-maker a high level of confidence that their parameter configuration lies within the desired target area. Hence, the strictest rules at the end of the GPT were chosen. Table 2 and Fig. 7 show the confusion matrix results of the different rule sets. The number of true and false positives shows that for all rules precision is close to 100%. However, for all rules there is also a large number of false negatives. Depending on the actual study at hand, the rules can be chosen less strictly to achieve a desired balance of true positives and false negatives.
Finally, we used the obtained rules to give more precise information about the influence of the constraint values on the cost. In doing so, we trained a linear regression model on the solution subsets corresponding to the three rule sets shown in Table 1. For the linear model we used least squares linear regression, with the constraint values as input features and the cost as target variable. Table 3 summarizes the results of the linear regression. The p-values are not shown because they were close to zero, and R² for all rule sets is between 0.9 and 0.95. For rule set R1, 50,743 solutions were used to build the regression model, for R2 9,701 solutions, and for R3 9,421 solutions. All displayed values are rounded to two decimal digits. The resulting linear model makes it possible to give detailed information on how the cost is affected by variations of the constraint values, under the assumption that the underlying rules are met. The regression coefficients of the constraints shown in Table 3 can help the decision-maker to prioritize resource utilization. For rule set R1, for instance, the coefficients show that b4, b3, and b2 have the largest impact on cost reduction and thus should be fully utilized before increasing b5 and b1 (as long as the corresponding rules are followed).
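A regression of this kind can be reproduced with scikit-learn as in the sketch below, assuming B holds the constraint values of the solutions satisfying a rule set and cost the corresponding lower-level objective values.

from sklearn.linear_model import LinearRegression

def fit_cost_model(B, cost):
    """Least-squares regression of cost on the constraint values, analogous to Table 3.

    B    -- constraint values b1..b5 of the solutions satisfying a rule set
    cost -- corresponding lower-level objective values (1a)
    """
    model = LinearRegression().fit(B, cost)
    r2 = model.score(B, cost)  # coefficient of determination
    # the coefficients indicate how strongly an additional unit of capacity b_i affects the cost
    coefficients = dict(zip(["b1", "b2", "b3", "b4", "b5"], model.coef_))
    return coefficients, model.intercept_, r2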

Comparison of lower-level algorithms
In this part of our study, we investigate how different lower-level algorithms affect the results of the SA. This is particularly important to assess the robustness and reliability of the SA and thus the confidence it should be given. As described in Section 3, all experiments were repeated with four different algorithms, each of which yields its own data set. Table 4 shows a summary of the upper-level optimization runs. Except for the greedy heuristic, all lower-level algorithms result in similar upper-level algorithm behavior with respect to the number of iterations as well as the percentage of non-dominated and infeasible solutions. The combined Pareto-fronts of the upper-level algorithms, as described in Section 3, clearly show the expected difference in lower-level algorithm performance (see Fig. 8). Most pronounced here is the fact that, despite the large number of evaluated solutions, the greedy heuristic has only very few solutions on its combined Pareto-front. The selected area of interest (see Fig. 9, comparison of the filtered objective space) emphasizes this issue. It becomes apparent that the heuristic fails to find feasible solutions, especially below a capacity level of around 1,380, i.e., for lower-level problems with tighter constraints. While the exact algorithm and GA 50 show a similar objective space, GA 15 has a large cluster of solutions in the higher cost area. This bulk of solutions can be explained by the tight stopping criterion. Thus, it can be assumed that these solutions are barely feasible, but the algorithm did not have enough time to improve them.
For the actual comparison of the SA results, the objective space was segmented into target areas by the same measures as described in Section 4 (see Fig. 10). Since the greedy heuristic only covers the area of TAC1, it only yields one target area for further analysis. In analogy to Fig. 6 (top), Fig. 11 shows the distribution of the absolute constraint values across the individual target areas. Although the spread of the data slightly differs, all charts show the same pattern and would lead to the same insights as discussed in Section 4. This is particularly interesting for the greedy heuristic, which initially did not seem very promising when only the output data was considered.
In the last step of the comparison we analyze how reliable the rules obtained from the different data sets are. For this purpose, decision tree models were trained with the data set of each algorithm and evaluated with respect to precision (7) and recall (8) [6]. However, since the greedy heuristic covers only such a limited region, it was excluded from this analysis. To obtain the testing data, the exact data set was split into the results of 20 optimization runs for training and 10 for testing. These 10 optimization runs served as the benchmark and were used to test both the exact algorithm and the GAs. For the training data of the GAs, all 30 optimization runs were used. In the course of the classification study, we randomly selected a subset of 10 optimization runs of each training data set, i.e., 10 out of 20 for the exact algorithm and 10 out of 30 for the GAs. This step was repeated 10 times and the average values of the resulting performance metrics were taken. In total the experiments were executed for the target area sets S1 to S3 as defined in Section 4. The classification study was modeled as a binary prediction problem, and for all three data sets a decision tree with a depth of 10 yielded overall good results. However, since the objective was to compare the data sets and not predictive models, we did not further optimize or tune the decision tree model. Table 5 shows the precision (proportion of how many elements labeled positive are actually positive) and recall (proportion of how many of the actual positive elements were identified) values for the different lower-level algorithms across the three target area sets. The first observation here is that for S1 and S2, all predictors associated with the algorithms have an equally high precision, with the GAs even slightly outperforming the exact algorithm. However, in terms of recall, the predictive performance decreases as the performance of the lower-level algorithms decreases (see also Fig. 8). Target area set S3 shows a different behavior for the GAs. In terms of precision, the predictive performance still decreases with decreasing performance of the lower-level algorithms, and is generally lower than for S1 and S2. However, all algorithms show a similar level of recall.
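The evaluation protocol described above essentially amounts to training a depth-10 tree on one data set and scoring it on the benchmark runs of the exact algorithm; a minimal sketch (with assumed array names) is given below.

from sklearn.metrics import precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_data_set(B_train, y_train, B_test, y_test):
    """Precision (7) and recall (8) of a depth-10 decision tree on the benchmark test runs."""
    tree = DecisionTreeClassifier(max_depth=10, random_state=0).fit(B_train, y_train)
    y_pred = tree.predict(B_test)
    return precision_score(y_test, y_pred), recall_score(y_test, y_pred)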
For a better interpretation of the predictive performance, Fig. 12 visualizes the confusion matrix values of an exemplary prediction result in the objective space. The most important observation here is that for all target areas, the models trained on the GA data succeed in identifying those solutions that are close to the Pareto-optimal front. This is particularly noteworthy since many of these solutions lie outside the space originally covered by the GAs (see Fig. 8). Moreover, for target area set S3 it becomes apparent for which type of solutions the models tend to report false positives. However, all falsely reported solutions still lie close to the Pareto-optimal front. In the context of a practical application, the issue with false positives for S3 could be solved by adding an additional rule for the maximum capacity.
When comparing the classification results for the different rule sets in Section 4 with the results of the exact algorithm in Section 5, it is noticeable that the number of false positives is much larger for the classification in Section 5 than in Section 4, even though both are based on the same data set. This can be explained by the fact that in Section 4 the goal was to obtain reliable rules and thus to prioritize precision. There, the decision tree model had a maximum depth of 100 and the strictest rules at the end of the GPT were selected, which correspond to the lower end of the decision tree. In contrast, the goal of the classification in Section 5 was to conduct an actual prediction study, taking into account overfitting and generalization to unseen data, which is why a much shallower tree (maximum depth of 10) was used.

Conclusions
In this paper, the concept of bilevel innovization as a tool for sensitivity analysis in combinatorial optimization was illustrated by an in-depth analysis of the generalized assignment problem. By applying bilevel innovization and iteratively looping through its individual steps, several insights into the model behavior could be gained. It was shown that the objective function value (i.e., cost) of the problem can generally be decreased by increasing the capacity constraints. Moreover, solutions with lower cost are primarily driven by an increase of constraint b5, which is particularly important for the transition from target area TA3 to TA1. However, in the course of a more detailed investigation of target area TA1 by applying a clustering algorithm, it became apparent that not only an increase of b5 yields solutions in TA1, but also an even distribution of constraint values. Furthermore, the application of a decision tree algorithm in combination with Generalized Pythagoras Trees made it possible to obtain rule sets that have to be fulfilled to achieve certain efficiency and cost levels. These rule sets suggest, among other things, that especially b3 and b4 are responsible for solutions being close to the Pareto-front. Subsequently, the rules were used to formulate a linear model to quantify the influence of the individual constraint values on the overall assignment cost, i.e., the objective function value of the GAP.
In the second part of the analysis, the effect of different lower-level algorithms on the SA results was analyzed. In terms of general visual patterns, it was shown that all algorithms would lead to similar conclusions and are therefore equally suitable for SA. However, a crucial requirement here is that the algorithm is able to find feasible solutions for the lower-level problem. Furthermore, the predictive performance of the lower-level algorithms regarding the extracted rules was evaluated. If the focus of the analysis is primarily on precision, i.e., on providing reliable rules for the decision-maker, algorithms with lower solution quality also provide suitable results. This can be especially useful in an environment with frequently changing framework conditions, where the SA has to be repeated more often. Here, for example, an algorithm configuration with a shorter runtime could be used to gain quick insights. However, when computation time is not a primary concern, the solution quality of the lower-level algorithm configuration should be as good as possible to increase the accuracy of the SA.
Further research should investigate the use of bilevel innovization for different types of problems, both benchmark problems as well as real world problems. For example, it should be examined how sensitivity analysis can be applied to problems with numerous constraints by initially applying dimensionality reduction techniques, such as principal component analysis. In the context of rule extraction and prediction, other classification methods such as random forests or artificial neural networks could be investigated. Furthermore, it could be investigated how the solution groups of true positives and false negatives differ with respect to their decision variables and constraint values, respectively. Finally, BLI should be applied to other parameters of an optimization model, such as constraint or objective function coefficients, and it could be investigated how different upper-level algorithms affect the SA results.