A Hybrid Evolutionary Algorithm for Reliable Facility Location Problem

The reliable facility location problem (RFLP) is an important research topic of operational research and plays a vital role in the decision-making and management of modern supply chain and logistics. Through solving RFLP, the decision-maker can obtain reliable location decisions under the risk of facilities' disruptions or failures. In this paper, we propose a novel model for the RFLP. Instead of assuming allocating a fixed number of facilities to each customer as in the existing works, we set the number of allocated facilities as an independent variable in our proposed model, which makes our model closer to the scenarios in real life but more difficult to be solved by traditional methods. To handle it, we propose EAMLS, a hybrid evolutionary algorithm, which combines a memorable local search (MLS) method and an evolutionary algorithm (EA). Additionally, a novel metric called l3-value is proposed to assist the analysis of the algorithm's convergence speed and exam the process of evolution. The experimental results show the effectiveness and superior performance of our EAMLS, compared to a CPLEX solver and a Genetic Algorithm (GA), on large-scale problems.


Introduction
The facility location problem aims at finding the optimal locations for facilities from a set of candidate location nodes in order to minimize the cost such as the fixed facility cost and the transposition cost, or to maximize the total revenue. In general, there are also some constraints to be considered, such as satisfying all customers' demands, etc. It is an NP-hard optimization problem [1,2,3] and has attracted much attention from researchers in both the scientific community and engineering field due to its wide application in real world. The facilities could be hospitals, restaurants, post stations, bus stations, industrial plants, banks, warehouses, and distribution centers, etc. The facility location decision has high precedence in the whole logistics decisions and has a great influence on subsequent operation level decisions [4]. Daskin et al. [1] regards the location decisions as the most critical and most difficult of the decisions needed to realize an efficient supply chain.
In RFLP, the facility is not always available all the time [1]. One or more of them may not work from time to time because of disruptions, examples include natural disasters, inclement weather, destruction of facilities by fire or flood, expiration of the contract, and any other force majeure factors. In such a situation, these are facility failures. The failures of the facilities will result in excessive transportation costs because the customers that were considered to be served by them must be served by other, usually more distant, facilities [1]. Therefore, by solving RFLP, we can get a location decision which can ensure a certain level of reliability to guarantee customers can get service when facilities' failures occur.
Many models have been proposed for RFLP, in which all kinds of factors were taken into account and many of them are formulated for specific applications in real life. In addition, large-scale RFLP problems have rarely been considered. The algorithms studied in literature were mainly tested on problems of small size.
This paper focuses on two aspects: the problem formulation and the algorithm. Based on the work of [5,6], we propose a new reliable facility locationallocation problem (RFLP) formulation, which does not fix the number of allocated facilities to each customer as a constant and is more close to reality. The resulted model is a nonlinear 0-1 integer programming model which is more complicated for traditional methods. In this paper, a hybrid evolutionary algorithm called EAMLS is proposed to solve it. EAMLS combines a memorable local search method with an evolutionary algorithm, which has a good performance on both small-scale and large-scale problems considered in this paper. It is worth mentioning that the instances used in our experiments are much larger than the ones used in previous work. Furthermore, a convergence metric l3-value is proposed for analyzing the algorithm and observing the evolutionary process.
The rest of this paper is organized as follows. Section 2 briefly reviews the related work of RFLP. In Section 3, our new RFLP formulation is introduced. We proposed a hybrid evolutionary algorithm EAMLS in Section 4. Section 5 presents computational studies, and Section 6 concludes.

Related Work
By solving a specific RFLP, decision-makers expect to get a robust location decision which is still economical when some facilities fail under various disruptions. The research can be divided into two categories according to the method used to handle facility failure or ensure reliability. Some works [7,8,9] use a disruptive scenarios approach to describe facility failure. In this approach, scenarios contain facility failure information, e.g., simultaneously disrupted facility sites, modified customer demands, and facility costs, etc. The disruptive scenarios approach can describe the facility failure information well, but it usually requires plenty of scenarios to cover different disruptive situations, which implies large computational cost, especially for largescale problems.
Another approach to ensure reliability is to allocate two or more facilities to serve each customer [5,6,11,12]. In this approach, the method for reliability is intuitive and easy to understand. Both a location decision (which contains how many facilities needed to build and where to build them) and an allocation decision (which shows how to allocate facilities to serve customers ) are determined before the occurrences of facilities' disruptions/failures. Some RFLP models have been proposed, e.g., models proposed by Li et al. [5] and Snyder and Darskin [6]. Table 1 summarizes the notations used in the models. Besides, there are two sets of decision variables: location decision variables (X) and allocation decision variables (Y): Y ijr = 1, if j is allocated as the level-r facility to serve i; 0, otherwise.
In Eq. (2), the "level-r" facility j for customer i means the facility j will provide service only when the front r allocated facilities (from level-0 to level-(r-1)) fail. A classical RFLP model in [6] is as follows. Min Subject to: Y ijr ∈ {0, 1} ∀i ∈ I; ∀j ∈ J; r = 0, . . . , m − 1 (12) In this model, there are two objectives in the objective function, w 1 is the operating cost and w 2 is the expected failure cost. The objective of the model is to minimize the weighted sum of the two objectives. Besides, there is an emergency facility u which will always be selected and not fail, and all customers can get service from it. Several shortcomings are observed in the literature: (1) The number of facilities allocated to each customer (i.e., m in Eq. (9)) is fixed in models of most literature, e.g., m = 2 (i.e., Y ij0 and Y ij1 ) in [5] and m = |J| in [6]. One issue of this allocation setting is the determination of an appropriate value of m. If m is bigger than the number of selected candidate location sites, i.e., j∈J X j , it is not in line with the actual situation because we cannot allocate nonexistent facilities to customers. If we set the value of m smaller than j∈J X j , the value of j∈J X j is changed during the exploration in solution space, therefore it is hard for us to set a suitable m value. If we set m = 2 directly, which means allocate just one primary facility and one backup facility to serve each customer, the reliability is a bit weak intuitively.
(2) To our best knowledge, there is a lack of research on the large-scale problem. The largest problem instance in the related research is 150-node and the optimization solver such as CPLEX can find near-optimal or even optimal solutions for the problem.
(3) There is a lack of research on the algorithm which can solve the large-scale problems efficiently as well.
Correspondingly, this paper: (1) constructs a new formulation in which a non-fixed allocation setting, i.e., m = j∈J X j , is used; (2) proposes a hybrid evolutionary algorithm EAMLS which combines a local search method with an evolutionary algorithm and performs well on both smallscale and large-scale problems; (3) performs experimental studies on large-scale problems whose scale is much larger than any related literature; (4) proposes a convergence metric l3-value to help observe the evolutionary process, adjust parameters and further improve the algorithm.

Problem Formulation
We propose a new RFLP formulation in which we set the number of allocated facilities to each customer as an variable instead of a fixed constant.
The mathematical formulation of our model is as follows, formulated based on [5,6]. The decision variables are defined by Eqs. (1) and (2).
The objective function of the model is to minimize the total cost associate with facilities construction (i.e., the term j∈J f j X j ) and transportation between the facilities and customers (i.e., the term i∈I j∈J Constraint (14) makes the number of facilities allocated to each customer (i.e., m) a variable and its value is related to location decision variables (i.e., X). Constraint (15) represents at lease two facilities are constructed to ensure reliability. Constraint (16) assures only one facility can be the level-r supplier of customer i. Constraint (17) means candidate location site j can be allocated to customer as a supplier only when it is selected. Constraint (18) and (19) are standard integrality constraints.
Compared with classical models shown in Section 2, the significant difference in our model is the new non-fixed facility allocation setting, i.e., constraint (14). In our model, the value of m is not fixed but varies with decision variables X, therefore it is more realistic, ensures reliability, but makes our model much more complex and difficult to solve by traditional methods as well.

A Hybrid Evolutionary Algorithm: EAMLS
This paper develops a new hybrid evolutionary algorithm EAMLS (Evolutionary Algorithm with Memorable Local Search) which combines a memorable local search method and an EA, and a convergence metric l3-value is proposed. In this section, the structure of EAMLS is explained first, then the design of operators of the Genetic Algorithm (GA) and EAMLS is introduced. Finally, the details of l3-value are described.

EAMLS
Algorithm 1 is the pseudo-code of EAMLS. Compared with the GA, the main characters of EAMLS contain: (1) no crossover operation; (2) population size self-adaptation; (3) the combination of a memorable local search (MLS) and EA; and (4) the adoption of convergence metric l3-value.
In Algorithm 1, variable allN eighborInds stores all non-repeating neighborhood individuals generated by MLS before current generation and is updated at the end of every generation (Algorithm 1, Line 2 and Line 13). In the evolutionary process, a new population is generated from the current population after mutation, MLS, and survival selection (Algorithm 1, Lines 5-8), and convergence metric l3-value is calculated (Algorithm 1, Line 9). If l3-value is bigger than a pre-set threshold β, population size is increased by a pre-set step size p (Algorithm 1, Lines 10-12). The description of the l3-value will be shown in Section 4.3. add neighborInds to of f springLS; 9: i ← i + 1; 10: if i > n then 11: break; 12: end if 13: end if 14: end for 15: Return of f springLS Algorithm 2 is the pseudo-code of the memorable local search (MLS). First, we will introduce the definition of the neighborhood. The neighborhood of an individual is the set of individuals whose Hamming distance is 1 from that individual. In MLS, sort (µ+λ) population (variable sortedP arentP op in Algorithm 2) in decreasing order, i.e., good individuals are in the front. Then check individuals one by one in sorted (µ + λ) population whether it has been local-searched before this generation, and do local-search for those have not been local-searched (Lines 5-7 in Algorithm 2. It looks like that the algorithm remembers all localsearched individuals and that's why we name it Memorable Local Search). Exit the loop until the number of new individuals which have been local-searched in this generation reaches n (Lines 9-12 in Algorithm 2).

Operator Design of GA and EAMLS
In Section 5, we use a GA for comparison. Here some operators' design for GA and EAMLS is as follows 1 : Representation This paper uses binary representation. Every bit represents a location decision variable X j , j ∈ J.
Population Initialization Stochastic initialization is used in GA and EAMLS. Every gene of an individual takes 0 or 1 with equal probability.
Fitness Function In general, the bigger the fitness value is, the better the individual will be. Therefore, the reciprocal of the objective value of the individual is used as the fitness function.
Selection Operator In GA, roulette wheel selection is used to select parents to do crossover operation.
Crossover Operator In GA, a one-point crossover operator is used. For two parent individuals selected by the selection operator, do crossover operation according to a pre-set crossover rate.
Mutation Operator The bit-flipping mutation is used in GA and EAMLS. During mutation, every gene/bit of one individual mutates with a pre-set mutation rate.
Survival Selection Strategy We adopt (µ + λ) strategy to select next generation population from (µ + λ) population, i.e., the mixed population of the current generation population and the offspring.
Repair Strategy Repair strategy is working when there are individuals which do not satisfy the constraint (15). For an individual needed repair, check every gene in ascending order of fixed cost and change the gene with 0-value to 1 until the individual satisfies the constraint (15).
How to determine Y For one customer, the selected candidate locations (i.e., locations whose X j = 1) are allocated to it in ascending order of distance, which has been proved the optimal allocation pattern under a certain solution X [6] and can satisfy the constraints (12), (13), and (15).

Convergence Metric l3-value
In order to observe the evolutionary process, a convergence metric l3-value is proposed.
Algorithm 3 is the pseudo-code of the calculation method of l3-value. The new population generated after survival selection is checked, and the number of individuals which also belong to the set allN eighborInds is counted (Lines 2-6 in Algorithm 3). Then we calculate the proportion of these individuals in the population as l3-value (Line 7 in Algorithm 3). l3-value can be used to measure the convergence during the evolutionary process. The bigger the l3-value is, the stronger the evolution converges.

Computational Studies
Because this paper proposes a new problem, and there are not any algorithms like EAMLS can be used to compare directly, we compare EAMLS with a GA and CPLEX (a commercial optimization solver of IBM) on two models: m=2 and m= j∈J X j models. The difference between the two models is the allocation setting. In the m=2 model, the number of facilities allocated to each customer, i.e. m, is fixed to 2, which is adopted in much literature. The m= j∈J X j model is proposed by us in this paper and m varies with decision variables X during the search process. Section 5.1 shows the experimental design, including instances generation, parameters setting, and experimental environment. The experiments and results of the m=2 and m= j∈J X j models are presented in Sections 5.2. Analyses and discussions are given in Section 5.3.

Experimental Design
Instance Generation This paper generates problem instances uniformly at random on different scales. The parameters used to generate instances are shown in Table 2. There are eight 10-node instances, eight 50-node instances, eight 100node instances, and four 600-node instances. Parameter Setting of Algorithms Some parameters values of GA and EAMLS are shown in Table 3. Table 4 presents the generation number and population size of GA and EAMLS, which associate with the scale of problem instances. The values of parameters in Tables 3 and 4 are chosen arbitrarily on the basis of meeting the following conditions: (1) EAMLS converges at the end of evolution; (2) the number of fitness evaluations (FEs) of GA is not lower than EAMLS. Besides, the default parameters of CPLEX are used. Experimental Environment The algorithms are implemented in Python 3.7 and run on Dell R370 server which has 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz CPU, 128G RAM, and CentOS 7.6 operating system.
Statistical Test We use the Wilcoxon sign rank test to determine whether the results between EAMLS and other methods have statistically significant differences. The Wilcoxon sign rank test is a non-parameter test which is suitable for two related or matched samples and compares data in pair, hence it is suitable to use here.

Experiments on the m=2 and m= j∈J X j Models
For the m=2 model, We compare EAMLS with the GA and CPLEX on smallscale (10-node), mid-scale (100-node), and large-scale (600-node) instances. There are 30 runs on small and mid-scale instances and 10 runs on large-scale instances because of time. The computational results are shown in Table 5.
For the m= j∈J X j model, we compare EAMLS with the GA and CPLEX on 50 and 100-node instances, and there are 30 runs on each instance. Table 6 is the computational results.

Analyses and Discussions
We compare GA, CPLEX, and EAMLS on different scale (10, 100, and 600node) problem instances for m=2 model whose allocation setting is often used in literature, and the experimental results are shown in Table 5. Experimental results on 50 and 100-node instances of the new complicated m= j∈J X j model are presented in Table 6.
For m=2 model, from Table 5, we can see that CPLEX performs the best on both solution quality and time for small and mid-scale (10 and 100-node) instances. EAMLS can find solutions as good as CPLEX but need more time. Although CPLEX can solve small and mid-scale instances fast, it needs more RAM space as the problem scale increases. For large-scale problem (600-node) instances, EAMLS can find better solutions in less time compared with GA, while the CPLEX cannot find a solution.
The new m= j∈J X j model is more complicated to solve, especially for CPLEX. Table 6 demonstrates that the performance of EAMLS is better than GA and CPLEX on both solution quality and time.
According to the observation of computational results, we can get three features of EAMLS: (1) For small-and mid-scale problems, the solutions found Table 5. Computational results on m=2 model 10 (30 runs),100 (30 runs), and 600 (10 runs)-node instances. AOV is Average Objective Value. OR is the Optimal Rate and calculated by (# runs which finding the optimal solution)/(# all runs). Gap is calculated by (AOV(other method)-AOV(EAMLS))/AOV(EAMLS). When Gap is positive, the performance of other methods is worse than EAMLS, otherwise better. The symbol * in AOV represents the results between EAMLS and that method have statistically significant differences. The symbol -represents CPLEX cannot solve the instance or the optimal solution is unknown so no results can be given.

Instance
No. by EAMLS are comparable to those found by other methods; (2) For largescale problems, EAMLS significantly outperforms other methods; (3) EAMLS especially performs well on a) the new complicated model and b) large-scale problems. So why is EAMLS effective? Through combining MLS with EA and using l3-value to guide the population size to grow gradually, EAMLS performs a full local search while performing a global search, maintains good population diversity, as well as speeds up the convergence.
Our algorithm EAMLS performes well on large-scale problem instances of both m=2 and m= j∈J X j models, and its advantage will become more apparent as the problem scale increases. However, the larger the problem, the greater the number of FEs needed for EAMLS to converge.

Conclusion
This paper proposes a new RFLP formulation in which the number of facilities allocated to each customer (i.e., m) is not fixed but varies with decision variables X. This non-fixed allocation setting makes the model more close to scenarios in real life.
A hybrid evolutionary algorithm EAMLS (which can also be viewed as a memetic algorithm) is proposed to solve the model. Combining a memorable local search method and EA, EAMLS performs well on the new complicated model and large-scale problems considered in this paper, and its advantage will become more obvious as the problem scale increases. Besides, a convergence metric l3-value is proposed to analyze the algorithm's convergence speed and exam the evolutionary process.
Finally, we explore the large-scale problems of the two models. Under what conditions is a problem a large-scale problem? It is related to the model and whether the problem can be solved by the exact algorithm efficiently. For the m=2 model which allocates a fixed number of facilities to each customer as in the existing research, we solve large-scale problem instances (600-node) whose scale is much larger than other literature. For the new complicated m = j∈J X j model, 100-node instances can be treated as large-scale problems because the exact algorithm or optimization solver cannot solve them effectively. And our algorithm EAMLS has good performance on large-scale problems considered in this paper.
In the future, the model which integrates various factors should be studied, and more complicated FLPs, such as dynamic FLP and FLP under uncertain environments, should be focused. Furthermore, effective meta-heuristic algorithms for large-scale problems should be studied as well.