Instance Space Analysis for the Car Sequencing Problem

We investigate an important research question for solving the car sequencing problem, that is, which characteristics make an instance hard to solve? To do so, we carry out an instance space analysis for the car sequencing problem, by extracting a vector of problem features to characterize an instance. In order to visualize the instance space, the feature vectors are projected onto a two-dimensional space using dimensionality reduction techniques. The resulting two-dimensional visualizations provide new insights into the characteristics of the instances used for testing and how these characteristics influence the behaviours of an optimization algorithm. This analysis guides us in constructing a new set of benchmark instances with a range of instance properties. We demonstrate that these new instances are more diverse than the previous benchmarks, including some instances that are significantly more difficult to solve. We introduce two new algorithms for solving the car sequencing problem and compare them with four existing methods from the literature. Our new algorithms are shown to perform competitively for this problem but no single algorithm can outperform all others over all instances. This observation motivates us to build an algorithm selection model based on machine learning, to identify the niche in the instance space that an algorithm is expected to perform well on. Our analysis helps to understand problem hardness and select an appropriate algorithm for solving a given car sequencing problem instance.


Introduction
The production of cars in Europe rely on a "built-to-order" system, where cars with specific options need to be assembled in relatively short periods of time.The cars to be assembled are typically diverse in respect to their required options, and hence, the assembly process is necessarily slow.Since 1993, the car manufacturer, Renault, has been tackling a variant of "car sequencing" with the aim of assembling a range of diverse cars as efficiently as possible (Solnon et al., 2008).In 2005, the subject of the ROADEF challenge was the same car sequencing problem (Nguyen and Cung, 2005), which is one of finding an optimal order, or sequence, in which cars are to be processed on a production line.Each car has a set of options such as paint colour, air conditioning, etc.Each option has an associated constraint that gives the maximum number of cars that can have this option applied in any contiguous subsequence.The car sequencing problem is known to be NP-hard (Kis, 2004).
There are many variants of car sequencing problems that all share common characteristics but with small differences in the objective function or constraint.A comprehensive study of all variants is beyond the scope of a single paper.Here we will focus on one popular optimization variant of the car sequencing problem which minimizes the violation of constraints on the occurrence of options within subsequences (Bautista et al., 2008;Thiruvady et al., 2011Thiruvady et al., , 2014)).The first study in this direction was conducted by Bautista et al. (2008), who showed that a beam search approach can be very effective.Following this, Thiruvady et al. (2011) showed that a hybrid of beam search, ant colony optimization (ACO) and constraint programming can substantially improve solution qualities while reducing run-times.Thiruvady et al. (2014) proposed a hybrid of Lagrangian relaxation and ACO, and they showed the hybrid can find excellent solutions and at the same time very good lower bounds.A recent study in this direction is by Thiruvady et al. (2020), who showed that a large neighbourhood search (LNS) finds high quality solutions in relatively low run-times.Other methods used for this type of problem include beam search (Golle et al., 2015) and a hybrid neighbourhood search method combining tabu search and LNS (Zhang et al., 2018).
In the literature, there are three commonly used benchmarks (Gent, 1998;Gravel et al., 2005;Perron and Shaw, 2004a).The benchmark instances from Perron and Shaw (2004a) consisted of up to 500 cars with relatively high utilisation of options, and were considered as the "hardest" set of problem instances available to test algorithms against.Though, since then, studies (Thiruvady et al., 2011(Thiruvady et al., , 2014(Thiruvady et al., , 2020) ) have demonstrated different solvers (including those based on mixed integer programming (MIP), constraint programming and metaheuristics) have variable performance, and hence it is unclear as to which set of problem instances are in fact the most complex.Moreover, the specific characteristics of hard problem instances have not clearly been identified.
The aims of this study are twofold: (a) to address the gap of a lack of knowledge as to which characteristics of the car sequencing problem make finding high quality solutions hard, and (b) to use machine learning to build an algorithm selection model.The first aim is achieved by performing an instance space analysis for the problem.The methodology of instance space analysis was originally proposed by Smith-Miles et al. (2014), and has been successfully applied to many problems including combinatorial optimization (Smith-Miles et al., 2014), continuous optimization (Muñoz and Smith-Miles, 2017), machine learning classification (Muñoz et al., 2018), regression (Muñoz et al., 2021) and facial age estimation (Smith-Miles and Geng, 2020).However, it still requires problem-specific design to apply this general approach to the specific problem of car sequencing, such as feature extraction and instance generation.
To carry out an instance space analysis, we extract a vector of problem features to characterise a car sequence problem instance, and use principal component analysis to reduce the dimension of feature vectors into two.By doing so, an instance is projected to be a point in the two-dimensional instance space, allowing a clear visualization of the distribution of instances, their feature values and algorithms' performance.Our analysis reveals two problem characteristics that make a car sequencing instance hard to solve, i.e., high utilisation of options and large average number of options per car class.Further, this analysis allows us to develop a problem instance generator, which can generate problem instances that can be considered extremely complex irrespective of the type of solver.
The second aim, building an algorithm selection model, is achieved by using machine learning (Rice, 1976;Smith-Miles, 2009;Muñoz et al., 2015) to identify which algorithm is most effective in a particular region of the instance space.We select three state-of-the-art algorithms from the literature as our candidates: the Lagrangian relaxation and ACO hybrid (Thiruvady et al., 2014) and two LNS algorithms proposed by Thiruvady et al. (2020).Additionally, given the vast improvement in commercial solvers, we investigate three algorithms not previously successful or attempted in the literature.These algorithms are MIP, MIP based on lazy constraints, and an adaptive LNS that builds on algorithms studied in Thiruvady et al. (2020).In summary, the contributions of this study are: 1. Introducing two new optimization algorithms for the car sequencing problem: an adaptive LNS method and a MIP method with lazy constraints.We also demonstrate for the first time that with suitable tuning of parameters, modern MIP solvers are competitive for this problem.2. Performing an instance space analysis for the car sequence problem and successfully identifying the characteristics that make an instance hard to solve.3. Designing a benchmark to allow systematical generation of problem instances with controllable properties and showing that they complement the existing benchmark instances well in the instance space.4. Developing an algorithm selection model based on machine learning, which shows that our adaptive LNS and the MIP solver based on Gurobi become the new state-of-the-art for solving the car sequencing problem.Further, the adaptive LNS performs the best for solving hard problem instances, while the MIP solver is the best for medium-hard instances.
This study is different to many others published in the literature in not simply identifying one new or improved algorithm but advancing the state-of-the-art while giving insight into what types of algorithms are best suited for different types of instances.The analysis of the instance space and new benchmark instances are expected to pave the way for future research in designing better algorithms for car sequencing problems.The remainder of this paper is organized as follows.In Section 2, we introduce the methodology of algorithm selection and instance space analysis.In Section 3, we describe the car sequencing problem and present the MIP formulation used.Section 4 introduces the solution algorithms for the car sequencing problem, and Section 5 describes feature extraction and instance generation.Section 6 presents and analyses the experimental results, and the last section concludes the paper.

Algorithm Selection and Instance Space Analysis
The algorithm selection framework was originally proposed by (Rice, 1976).It has four key components: • Problem space (P ) consists of instances from the problem of interest.
• Algorithm space (A) contains algorithms available to solve the problem.
• Feature space (F ) contains measurable features extracted to characterise a problem instance.
• Performance space (Y ) consists of measures to evaluate the performance of algorithms when solving the instances, e.g., the optimality gap generated by an algorithm within a time limit.For an instance x ∈ P , a set of features f (x) can be first extracted.The goal of algorithm selection is to learn a mapping from the feature vector f (x) ∈ F to an algorithm α * ∈ A, such that the algorithm α * performs the best on the instance x when compared to other algorithms in A. This is typically achieved by training a machine learning model on the data composed of the feature vectors of training instances and the performances of candidate algorithms on the training instances.Algorithm selection is closely related to meta-learning (Vanschoren, 2018) and auto machine learning (He et al., 2021), which aims to select the best parameter settings for machine learning models.Algorithm selection is often beneficial for problem solving.For example the algorithm selection model, SATzilla-07, was able to solve more than 20% instances in the 2007 SAT competition than any of its component solver (Xu et al., 2008).Apart from the SAT problem, algorithm selection has been demonstrated successful on quantified Boolean formulas (Pulina and Tacchella, 2009), constraint solving (O'Mahony et al., 2008), answer set programming (Hoos et al., 2014), continuous black-box optimization (Kerschke and Trautmann, 2019) and clustering (Wang et al., 2020), to name a few.Some of those algorithm selection models were collected in the ASlib benchmark library (Bischl et al., 2016), and more comprehensive literature reviews are available in (Smith-Miles, 2009;Muñoz et al., 2015;Kerschke and Trautmann, 2019).Smith-Miles et al. (2014) further extended Rice's algorithm selection framework by including a visualization component (See Fig. 1).A significant advance of the extended algorithm selection framework is that it provides great insights into the strengths and weaknesses of an algorithm as well as problem hardness.A two-dimensional instance space can be created and visualised by using dimension reduction techniques such as principal component analysis to reduce the dimension of feature vectors to two.Importantly, the performance of an algorithm can also be visualized in the two-dimensional space, clearly showing in which region of the instance space the algorithm performs well or badly.Finally, by visualizing the values of features in the same space, one can identify the features that are mostly correlated with the performance of algorithms, indicating which features make an instance hard to solve.
Despite the existence of many studies in this area, the existing algorithm selection models are all problem-specific, and thus cannot be directly applied to car sequencing problems.In the following, we will perform an instance space analysis and build algorithm selection models for the car sequencing problem.

The Car Sequencing Problem
In this section, we describe the optimization version of the car sequencing problem, originally introduced by Bautista et al. (2008).
Consider D cars, each of which belongs to a class C = {c 1 , . . ., c K }, where K is the number of classes.The cars require one or more of a set of options O = {o 1 , o 2 , . . ., o O } that need to be installed.All cars within a class must be fitted with the same options.For this purpose, each car 1, . . ., D has an indicator vector associated with it Cars in the same class have the same indicator vector, or alternatively two cars i and k are in the same class if r i = r k .Moreover, the number of cars in class j is denoted as d j .
Typically, each station that installs the options along the assembly line has a certain capacity, which if not met, will slow down the assembly line.Hence, to satisfy aspirational targets, we introduce for each option p j and q j (j = 1, . . ., O), where in any subsequence of q j cars, there should only be p j cars that require option j.
Any sequence or ordering of the cars is a potential solution to the problem.We denote such a sequence by π, and π k is used to represent car class c in position k of π.As any permutation between cars in the same class produces an equivalent solution, we eliminate such symmetries in the problem in order to reduce the search space.
A sequence that satisfies all subsequence constraints for all options can be considered as ideal.Though, such a sequence can be hard to find, and hence the optimization version of the problem focuses on modulating option utilisation across the sequence.The measures that capture the modulation of options are presented next.

Assignment measures
An option o i with subsequence length q i will have D − q i + 1 subsequences.Those subsequences which have used o i in excess of the allowable p i options, can be computed as follows: where u i (j) = max(1, j + 1 − q i ).Here u i (j) is the starting position of a subsequence q i , which ends at position j.The inequality j < q i implies that the subsequence starts at position 1, resulting in the subsequence size being less than q i .We now define upper over-assignment (uoa) (Bautista et al., 2008), which computes the total number of times within a sequence π that an option o i violated the limit p i : where a ij is a penalty imposed for exceeding p j in subsequence positions j + 1 − q i to j (if j < q i , positions 1 to j).If p j is never violated, then uoa(π) = 0, and π is considered as a feasible solution.
The upper under-assignment (uua) is a second measure that determines the number of times, and by how much, the utilisation of options within the subsequences are under p j : where b ij is a penalty imposed for having fewer than p i cars in subsequence positions j + 1 − q i to j (if j < q i , positions 1 to j).
Penalties for over assignment are typically different to that of under assignment, and the penalties in both cases can vary across a sequence.If the penalties were set as constants, it would suffice to measure only over assignments (and hence over assignment penalties), as the total increase in over assignments leads to an equivalent decrease in under assignments in other parts of the sequence.Bautista et al. (2008) provided an efficient MIP formulation for this problem.We use this formulation for the MIP-based methods that follow in this study.

Mixed Integer Programming Model
As introduced earlier, y and z are chosen as variables (Section 3.1).Moreover, we define binary variables x where x it = 1 if car class i is chosen in position t.The objective is to minimize uoa(π) + uua(π) corresponding to Equations ( 3) and ( 4).The problem can be defined as follows: This formulation differs slightly from the discussion of the problem in the previous section.Observing that two cars of the same class are interchangeable (i.e., swapping two cars of the same class leads to the same solution), this formulation focuses on selecting car classes in each position.It has the added advantage of reducing the solution space by removing symmetries.

Algorithms
A large number of solution methods have been proposed to tackle at least some variants of the car sequencing problem, including exact methods such as constraint programming (Dincbas et al., 1988), MIP (Gravel et al., 2005), meta-heuristics (Gottlieb et al., 2003;Sun and Fan, 2018) and hybrid methods (Khichane et al., 2008).A comprehensive review of this literature is beyond the scope of this paper.Here, we focus on techniques for solving the optimization variant of the car sequencing problem, including Lagrangian relaxation and ant colony optimization (LR-ACO) (Thiruvady et al., 2014), and large neighbourhood search (LNS) (Thiruvady et al., 2020).Further, we introduce a new adpative LNS algorithm and investigate a MIP method with lazy constraints.

Lagrangian Relaxation and Ant Colony Optimization
An effective Lagrangian relaxation procedure from the MIP model (Section 3.2) was proposed by Thiruvady et al. (2014).Among the different sets of constraints that can be relaxed, they showed that Constraint (7) leads to the most effective Lagrangian relaxation.Hence, the resulting problem consists of all other constraints and the objective is modified as follows: Solving this relaxation leads to a solution that is potentially infeasible.Specifically, certain positions (i in x it ) may have no car class assigned and other positions may have two or more cars assigned.To rectify this infeasibility, Lagrangian multipliers (λ) are introduced, which penalise positions in the sequence where either zero or more than one car have been assigned.The multipliers are updated via subgradient optimization (Bertsekas, 1995), which aims to find the "ideal" set of multipliers over a number of iterations.The ACO algorithm (Dorigo and Stűtzle, 2004) is incorporated within Lagrangian relaxation as a repair method, leading to the LR-ACO heuristic.The flowchart of the LR-ACO method is shown in Fig. 2 and its pseudocode and associated details are presented in Appendix A. Figure 3: An illustration of one iteration of the LNS method.In this example, there are ten cars and four car classes, indicated by the colours: blue, green, yellow and orange.The first four cars are considered in the MIP, and solved leading to swapping all four car classes.Following this step, there is a shift of two cars, and the next four cars are considered to be solved.The search in this iteration continues until reaching the end of the sequence.

Large Neighbourhood Search
The LNS of Thiruvady et al. (2020) is used as the basis for the LNS implementations in this study.At a high-level the procedure works by starting with a (good) feasible solution, considering subsequences within the sequence, solving a MIP of the subsequences, and repeating this process until some termination criterion is met.The motivation for this approach is that solving the original MIP can be very time consuming, and hence solving smaller part of the original problem and piecing them together can be achieved efficiently.
The procedure works as follows.First, a high quality solution is obtained in a short timeframe.Following this step, subsequences within the solution are considered sequentially.For each subsequence, a MIP (Section 3.2) is generated (i.e., subproblem) that fixes all variables in the original solution, but leaves those solution components associated with the subsequence free.The MIP is then solved, possibly to a time limit or gap for efficiency purposes.1If the subsequence sizes are small, all subproblems can be solved efficiently.In some cases large subproblems can also be efficiently solved.In one iteration of the algorithm, the search progresses by solving a subsequence, defining a new subsequence by moving forward a few positions, and continuing this process until the end of the subsequence is reached (see Fig. 3 for an example).Key to the success of the method is to move forward a few positions by ensuring at least some overlap with the previous subsequence.The above process is repeated for multiple iterations until the termination criterion is reached.The pseudocode of LNS and its associated details are presented in Appendix B.
The size of the subsequence (i.e., window size) has a large impact on the performance of LNS.In our experiments, we will test two versions of LNS with different window sizes w: (1) 10-LNS with w = 10, and (2) LCM-LNS with w equal to the lowest common multiple of q values.

A New Adaptive LNS Approach
We make three modifications to the LNS algorithm proposed by Thiruvady et al. (2020) to improve its performance, creating a new adaptive LNS.
Firstly, we note that it is a waste of computational resources when solving consecutive subsequences of cars repeatedly.Considering a car sequence π to be improved, if a number of initial subsequences of π are already solved to optimality, there is no need to re-optimize those subsequences in the following iterations.To address this issue, we propose to optimize non-consecutive subsequences of π so that different subsequences are optimized in each iteration of the algorithm.More specifically, we generate a random permutation (P ) of integers from 1 to D in each iteration, and solve non-consecutive subsequences of π indexed from P [ŝ] to P [ŝ + w], where w is the window size, and ŝ is the starting point, which increments by an interval of w/2 to generate multiple subsequences.Note that the randomization of P has the effect that any large neighbourhood consisting of w cars may be searched, rather than only subsequences of w consecutive cars as was the case in the original LNS.
Secondly, the original LNS algorithm uses a fixed window size w for an instance, i.e., setting w to a constant (e.g., ten) or the lowest common multiple of q values.However, we observe that for some instances, the algorithm can solve subsequences of small sizes very quickly and may get stuck in a local optimum in an early stage of the search.To address this potential issue, we introduce an adaptive setting for the window size, that is, starting with a window size of thirty and increasing the window size by one if the improvement of the optimality gap in an iteration is less than 0.005.By doing so, the algorithm can escape from a local optimum by using a larger window size.If computational time allowance is sufficient and the window size is eventually increased to D, solving the subsequence is equivalent to solving the whole sequence (i.e., the original problem).
In addition, given the vast improvement in commercial solvers, we use Gurobi to solve the MIP within a cutoff time to generate an initial feasible solution for the LNS algorithm, instead of using LR-ACO.This modification makes the algorithm simpler and more straightforward to implement.

MIP with Lazy Constraints
In similar fashion to Lagrangian relaxation, the lazy constraint method relaxes one or more of the sets of constraints.As we have previously motivated, solving the relaxed problem is typically time efficient, and the aim in the lazy method is to exploit this potential advantage.The key idea behind this procedure is that solutions typically violate a small number of constraints within the constraint set.If chosen carefully, it is possible to find high quality solutions with little overhead.
The procedure works as follows.Prior to the execution of the algorithm, a constraint in the MIP formulation is chosen to be relaxed.The algorithm initializes the MIP model and executes without considering the chosen constraint.When a feasible solution is found, it is tested for feasibility against the constraint set that is relaxed.If violated, the specific constraint is introduced to the model as a "lazy" constraint (note, not the whole constraint set), and the procedure executes again.This process repeats until a provably optimal solution (to the original problem) is found or until a time limit expires.
For the purposes of this study, and in preliminary experimentation, we found that Constraints (9) are the most effective as implemented within the lazy method.This finding is not surprising, since relaxing this constraint set leads to a MIP that is equivalent to an assignment or bi-partite matching problem (Manlove, 2013).These problems are known to be pseudo-polynomial, and can hence be solved efficiently.Note, there is a potential disadvantage to this approach, which is that the link between the x and y/z variables are broken.It means for specific problem instances, an initial feasible solution can be hard to find (i.e.too many lazy constraints are needed to find a feasible solution).

Features
We seek to determine the features of the car sequencing problem that will allow generating hard problem instances, i.e., ones that are considered hard for existing and newly proposed methods.There are certain features that are shown to lead to problem hardness as identified in previous studies (e.g.number of cars), and we also consider other features which could lead to creating hard instances (Gent and Walsh, 1999;Gravel et al., 2004;Perron and Shaw, 2004b).The following are the features chosen using past studies as a guide.8. Average number of options per car class 9-12.p/q ratio (minimum, average, maximum, standard deviation) 13.Lowest common multiple of q values 14.Standard deviation of car class populations Note, the utilisation of an option o is defined as: µ o ( D i=1 r i )/D, where r i = 1 if and only if car i requires option o and D i=1 r i is the total number of cars that require option o, and µ o (t) is the minimum length sequence needed to accommodate option o occurring t times, i.e., µ o (t) = p o ((t−1) div q o ) + ((t − 1) mod q o ) + 1 (Perron and Shaw, 2004a).
The features 1-12 have already been used in different ways to create current benchmark problem instances (Gent and Walsh, 1999;Gravel et al., 2004;Perron and Shaw, 2004b).Hence, we use them as part of our feature set.Moreover, we have seen in Thiruvady et al. (2020) that the LCM of q values typically leads to effective subsequences that can be solved efficiently, and hence we make use of this feature 13.Finally, all known problem instances have been obtained by using uniform distributions to assign cars to classes, whereas allowing variances in these assignments could impact the hardness of a problem.Hence, we also include feature 14 in our feature set.
These features will be used to perform instance space analysis and also used as inputs into machine learning models to build algorithm selection models.Hence, they form the basis of this study.Moreover, we will systematically generate new problem instances by varying these features.

Instances
We use a total of 247 problem instances in this study, including 121 existing benchmark instances and 126 newly generated instances.
The existing benchmark instances we have chosen consist of the nine instances available from CSPLIB (Gent and Walsh, 1999), 82 from (Perron and Shaw, 2004a), and 30 from (Gravel et al., 2005).These benchmark instances were previously considered "hard" to solve, though, the last decade and a half has seen rapid development of commercial solvers and extensive research in heuristic/meta-heuristic methods, a number of these problem instances can now be solved efficiently.Hence, we also generate new problem instances with various problem characteristics for stress testing of algorithm performance.
The new problem instances are generated as follows.To generate an instance specified with n cars, o options, and k car classes, we generate o-dimensional option vectors, by including option i with probability p i /q i , until we reach k unique vectors.We then assign each car to a car class with uniform probability.Each of the following fourteen sets contain nine instances, three for each size of 100, 300, and 500 cars, for an overall total of 126 new instances: 1. noBhiU: no bias and high utilisation.This is the "default" set with o = 5, k = 25, 1 p 3, and 1 q − p 2. The car classes are selected uniformly on a per-car basis, and utilisation rate for all options is between 90 − 100%.2. negBhiU: negative bias and high utilisation.It is the same as noBhiU, except that car classes with fewer options receive more cars.In particular, when a car is assigned to a given class k containing o k options, we randomly reassign it with probability o k /o. 3. posBhiU: positive bias and high utilisation.It is the same as noBhiU, except that car classes with more options receive more cars.In particular, when a car is a assigned to a given class k containing o k options, we randomly reassign it with probability 1 − o k /o. 4. hiPQhiU: high p/q ratios and high utilisation.It is the same as noBhiU, except that 2 p 4. 5. hiPQmedU: high p/q ratios and medium utilisation.It is the same as hiPQhiU, except that all option utilisation rates are between 70 − 80%.6. negBhiPQloU: negative bias, high p/q ratios, and low utilisation.It is the same as hiPQhiU, except that all option utilisation rates are between 50 − 60%, and negative bias is added.It is difficult to generate such low-utilisation instances using p/q option probabilities without negative bias. 7. loPQ: low p/q ratios.The settings for loPQ are: o = 8, k = 20, 1 p 2, 2 q − p 3, and no lower bound on utilisation.It is difficult to generate low-p/q-ratio instances with o = 5 and k = 25.8. negBfixedPQ: negative bias and fixed p and q.The parameter p is fixed to (3, 2, 1, 2, 1); q is fixed to (4, 3, 4, 5, 2); and there is no lower bound on utilisation.It is difficult to generate these instances (presumably again because of the low p/q ratio included, namely 1/4) without negative bias.9. randN: random configuration of car classes.It is the same as noBhiU, except that we select uniformly from the n+k−1 k−1 possible assignments of cars to car classes, instead of uniform distribution of cars to classes.Doing so increases the variance in car class populations.10.RloU: random car classes and low utilisation (with option utilisation rate only restricted to > 50%).11.RanyU: random car classes, no utilisation restrictions, and 10 options.
While the above sets of parameter choices may seem arbitrary, they were selected after significant experimentation to provide a good coverage of the instance space, as will be shown in Section 6.1.

Experimental Results
This section presents the experimental results of the instance space analysis and algorithm selection for the car sequencing problem.The solution algorithms were implemented in C++ and compiled with GCC-5.2.0, and the algorithm selection models were implemented in Python with the scikit-learn library (Pedregosa et al., 2011).The experiments were conducted on the Monash University campus cluster, MonARCH.The source code and problem instances will be made publicly available when the paper is published.

Visualizing Instances
We create a two-dimensional instance space to visualize how well the newly generated problem instances complement the existing instances available in the literature.Each instance is represented by a vector of fourteen features listed in Section 5.1.For features that are not in the range of 0 and 1 (i.e., number of cars, number of options, number of car classes, and lowest common multiple of q values), we normalize them by their maximum value.For better visualization, we select eight features that are mostly correlated with the algorithm performances using information theoretic feature selection methods (Sun et al., 2020).The eight features selected are the option utilisation (minimum, average, standard deviation), number of car classes, number of options, average number of options per car class and p/q ratio (average and maximum).
We then use dimensionality reduction techniques to project the eight-dimensional feature vectors onto a two-dimensional plane.The techniques used are principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008) and uniform manifold approximation and projection (UMAP) (McInnes et al., 2018).The created twodimensional instance spaces are shown in Fig. 4. The instances are more evenly spread in the space created by PCA compared to the other two, and therefore we will only use PCA for visualization in the rest of the paper.The percentage of variance explained versus the number of components selected in PCA is shown in Fig. 5.The first two principal components explain 88% of the data variance, and the corresponding transformation are listed as follows: num-options num-classes min-utilisation ave-utilisation std-utilisation ave-options ave-pq-ratio max-pq-ratio We can observe that the existing problem instances, i.e., CSPLIB, Gravel, and Perron & Shaw, mainly lie at the bottom of the two-dimensional instance space created via PCA.The newly generated instances complement the existing instances, and they together fill the instance space well.We also visualize where each category of the newly generated instances lies in the twodimensional instance space in Fig. 6.

Visualizing Algorithm Performance
We test the performance of six algorithms on the 247 problem instances.The algorithms are LR-ACO, 10-LNS (LNS with a fixed window size of 10), LCM-LNS (LNS with a window size computed by the lowest common multiple of q values), and MIP (solved by Gurobi), as well as two new algorithms i.e., Adp-LNS (the adaptive version of LNS) and MIP with lazy constraints (solved by Gurobi).The parameter setting for LR-ACO is consistent with the original paper (Thiruvady et al., 2014).Gurobi 6.5.1 is used to solve the MIPs with 4 cores and 10GB of memory given.The parameter settings for Gurobi are tuned by hand: barrier iteration limit = 30; MIPgap = 0.005; Presolve = 1 (conservative); barrier ordering = nested dissection; cuts = -1 (aggressive); cut passes = 200 (large); barrier convergence tolerance = 0.0001; presparsify = 1 (turn on always); pre pass limit = 2.The time limit for each algorithm to solve an instance is set to one hour.The objective value (y b ) generated by an algorithm for an instance is compared against the best known lower bound (y lb ) for that instance to compute an optimality gap (y b −y lb )/y b ∈ [0, 1).The gap generated by an algorithm for an instance indicates how well the algorithm performs on that instance.
The performance of each algorithm is visualized in the two-dimensional instance space in Fig. 7, where each dot represents an instance and its color indicates the optimality gap generated by the corresponding algorithm for that instance.We observe that most of the hard instances lie in the bottom left region of the instance space, which corresponds to five categories of newly generated instances (as shown in the first graph of Fig. 6), as well as part of the CSPLIB and Gravel instances.Some of the newly generated RanyU and loPQ instances are also hard.The Perron & Shaw instances were considered to be among the hardest problem instances available in the literature (Perron and Shaw, 2004a), but we observe that these instances are relatively easy for the six algorithms to solve.The LR-ACO algorithm is clearly outperformed by other algorithms, thus we will leave LR-ACO out in the subsequent analysis.

Visualizing Features
We visualize the eight features in the two-dimensional instance space in Fig. 8, where each dot represents an instance and its color represents the value of corresponding feature.The aim is to identify which features make an instance hard to solve.From Fig. 7 and Fig. 8, we can observe that the minimum utilisation (or average utilisation) is highly positively correlated with the optimality gap generated by the algorithms, meaning that high utilisation makes an instance hard to solve.This result is not surprising since higher utilisation effectively means that satisfying the subsequence constraints is a lot more complex, and as a consequence, the modulation of the use of option is also more complex.Instances with a large average number of options per car class are mostly hard to solve.For the same reasons previously discussed, this result is also expected.However, the number of options is negatively correlated with the optimality gap, indicating that a large number of options makes an instance easy to solve.This is an interesting outcome which shows that a large number of options allows less room for car classes to be sequenced in different positions.There is no significant correlation between the algorithm performance and other three features, i.e., number of car classes, average p/q ratio and maximum p/q ratio.

Algorithm Selection
We build our algorithm selection model based on machine learning using the full set of fourteen features (Section 5.1) to select the best algorithm for solving a given instance.We perform four   comparisons between the algorithms: 1) MIP vs Lazy, 2) Adp-LNS vs LCM-LNS, 3) Adp-LNS vs 10-LNS, and 4) MIP vs Adp-LNS.For each pair of algorithms (denoted as A and B), we divide the instances into three classes: 1) algorithm A performs better; 2) algorithm B performs better; or 3) algorithms A and B perform equally well if the difference between the optimality gaps generated are less than 0.005.It then becomes the standard three-class classification problem, and any classification algorithm can be used for this task.We have tested support vector machine (SVM) with a radial basis function kernel, k nearest neighbour (KNN), decision tree (DT) and logistic regression (LR) models.We have tuned the regularization parameter (C) of SVM and LR, number of neighbors (k) of KNN, and maximum tree depth (d) of DT, with the ten-fold cross-validation accuracy reported in Table 1.The best parameter values identified are C = 1000 for SVM, k = 3 for KNN, d = 2 for DT and C = 10 for LR.The other parameter settings are consistent with the default of the scikit-learn library.The average ten-fold cross-validation accuracy of algorithm selection over 100 runs using each machine learning model is reported in Table 2.The results show that the machine learning models achieve a reasonably good accuracy for the algorithm selection tasks, comparing to the ZeroR method that simply predicts the majority class.KNN is clearly outperformed by the other three machine learning models.
To gain a deeper insight into the algorithm performance, we visualize the decision boundary identified by SVM in the two-dimensional instance space in Fig. 9.The color of a dot represents which algorithm performs better on that instance and the color of a region indicates which algorithm performs better for that region based on the SVM prediction.We can clearly see that MIP performs equally well as Lazy on easy instances, and outperforms Lazy on other instances.The Adp-LNS algorithm outperforms the other two LNS algorithms on most of the instances except RanyU instances.We then take the best performing exact algorithm, MIP, and compare it against the best performing local search method, Adp-LNS.Interestingly, by comparing Fig. 9 with Fig. 7, we can see a clear pattern that (1) Adp-LNS outperforms MIP on hard instances; (2) MIP outperforms Adp-LNS on medium-hard instances; and (3) they do equally well on easy instances.Fig. 10 presents the decision tree learned by the DT classifier to select the best algorithm between MIP and Adp-LNS for solving a problem instance.These results provide great insights into the algorithms' performance, and a simple guideline on which algorithm should be selected to solve a given car sequencing problem instance in practice.Finally, we report the AUC (area under the receiver operating characteristic curve), precision, recall and F1-score generated by each machine learning model for the algorithm selection tasks in Table 3.The machine learning models achieve a reasonably good performance in classifying majority class, while they may have trouble in predicting minority class when the data for the classification task is extremely unbalanced.For example, the machine learning models do not perform well in classifying the instances where 10-LNS outperforms Adp-LNS.However, this result is acceptable in practice because Adp-LNS generally performs much better than 10-LNS, and simply selecting Adp-LNS to solve all the instances would generate the best results for 94% of the instances when compared to 10-LNS.

Conclusion
Car sequencing problems are NP-hard, and as such the quest for better solution methods is always an ongoing one.In this paper, we have performed an instance space analysis for the car sequencing problem, which enabled us to gain deeper insights into problem hardness and algorithm performance.This analysis was achieved by extracting a vector of features to represent an instance and projecting the feature vectors onto a two-dimensional space via principal component analysis.The instance space provided a clear visualization for the distribution of instances, their feature values and the performance of algorithms.We systematically generated instances with controllable properties and showed that the newly generated instances complement the existing benchmark instances well in the instance space.Further, our analysis showed that the instances with a high utilisation or a large average number of options per car class are mostly hard to solve.Finally, we built algorithm selection models using machine learning to select the best algorithm for solving an instance, and found that our new adaptive large neighborhood search algorithm performed the best on hard problem instances, while the MIP solver, Gurobi, performed better on medium-hard instances.Our result based on decision tree provides a simple and effective guideline on which algorithm should be selected to solve a given instance.The new benchmark instances created here, and the insight as to which parts of the instance space are most amenable to integer programming versus local search approaches is expected to provide a useful basis for any future research in designing better algorithms for car sequencing problems.

Figure 1 :
Figure1: The methodological framework of algorithm selection and instance space analysis.Note that the reduced 2-D space is only used for visualization, and the algorithm selection models are built on the feature space F .

Figure 2 :
Figure2: A high-level view of the LR-ACO heuristic, where Lagrangian relaxation is first applied, and ACO is then used as a repair method.

Figure 4 :Figure 5 :
Figure 4: The two-dimensional instance spaces created by PCA, t-SNE and UMAP.Here each subfigure, a dot represents a problem instance, and its color indicates which category the instance belongs to.

Figure 6 :
Figure 6: The distribution of the newly generated instances.The axes of the figures are the first two principal components from PCA, and each dot represents a problem instance.

Figure 7 :
Figure 7: The optimality gap generated by each algorithm for the problem instances tested.The axes of the figures are the first two principal components of the feature vectors; each dot represents a problem instance in the space spanned by the first two principal components; and the color of the dots indicates the optimality gap (red is larger).

Figure 8 :
Figure 8: The distribution of each feature in the two-dimensional instance space created via PCA.Each dot represents a problem instance, and its color indicates the value of the corresponding feature of that instance.

Figure 9 :Figure 10 :
Figure 9: Pairwise comparison between the algorithms in terms of the optimality gap generated.The axes of the figures are the first two principal components of the feature vectors.Each dot represents a problem instance, and the color of the dots indicates which algorithm performs better.The color of a region indicates which algorithm performs better in that region predicted by SVM.

Table 1 :
The ten-fold cross-validation accuracy of algorithm selection generated by the classifiers with different parameter settings.

Table 2 :
The average ten-fold cross-validation accuracy of algorithm selection.