Generalised Pattern Search with Restarting Fitness Landscape Analysis

Fitness landscape analysis for optimisation is a technique that involves analysing black-box optimisation problems to extract information about the problem, which can beneficially inform the design of the optimiser. Thus, the design of the algorithm aims to address the specific features detected during the analysis of the problem. Similarly, the designer aims to understand the behaviour of the algorithm, even though the problem is unknown and the optimisation is performed via a metaheuristic method. Thus, algorithmic design making use of fitness landscape analysis can be seen as an example of explainable AI in the optimisation domain. The present paper proposes a framework that performs fitness landscape analysis and designs a Pattern Search (PS) algorithm on the basis of the results of the analysis. The algorithm is implemented in a restarting fashion: at each restart, the fitness landscape analysis refines the analysis of the problem and updates the pattern matrix used by PS. A computationally efficient implementation is also presented in this study. Numerical results show that the proposed framework clearly outperforms standard PS and another PS implementation based on fitness landscape analysis. Furthermore, the two instances of the proposed framework considered in this study are competitive with popular algorithms in the literature.


Introduction
Although findings in the continuous domain are not entirely conclusive [2], the No Free Lunch Theorems [42] suggest that algorithms should be designed to address specific optimisation problems.
Many real-world problems can be formulated as black-box optimisation problems [3]. In these cases, information about the problem is not available a priori; thus, modern implementations include mechanisms to make the algorithm suitable to the specific features of the problem. We can broadly distinguish two approaches to designing algorithms. These two algorithmic philosophies, albeit ideologically different, overlap in their practical implementations.

- Adaptive algorithms: feedback on the algorithmic behaviour regarding the specific problem is collected and used to adjust the algorithm; see [6,7,36].
- Fitness landscape analysis: the optimisation problem is analysed by a method, e.g., an artificial intelligence tool, and the results are used to design the algorithm; see [17,23,24,33,34].

The feedback used by adaptive algorithms can be categorised into the following two groups.

- Performance-based feedback: the most successful parameter setting and/or algorithmic operator(s) are likely to be selected for the subsequent stages of the optimisation process. This is the case for many hyperheuristic [4,8] and self-adaptive [19] schemes.
- Behaviour-based feedback: some metrics associated with the functioning of the algorithm are monitored and fed back to update parameter settings and/or algorithmic operator(s). Some examples are diversity-based adaptation [26,32] and super-fit adaptation for swarm intelligence algorithms [5-7,16].
"This article is part of the topical collection "Applications of bioinspired computing (to real world problems)" guest edited by Aniko Ekart, Pedro Castillo, and Juanlu Jiménez-Laredo"

This article focuses on fitness landscape analysis. This study proposes a novel method for analysing fitness landscapes and designs the optimiser on the basis of the proposed analysis method. The section "Related Works: Algorithmic Design based on Fitness Landscape Analysis" provides some background about algorithmic design informed by fitness landscape analysis. The section "Proposal of this Article" briefly outlines the proposed technique, explains its motivation, and describes the content of the remainder of this article.

Related Works: Algorithmic Design Based on Fitness Landscape Analysis
A fitness landscape is a tuple composed of a set/domain of candidate solutions, a notion of neighbourhood, and a fitness/objective function; see [38]. Fitness landscape analysis is a popular topic that has attracted the attention of researchers in optimisation over the past two decades; see [22,24]. Although the majority of studies on fitness landscape analysis focus on the combinatorial domain [35], recent studies have proposed valuable contributions to the continuous domain [23,25]. For example, an analysis of separability performed using Pearson's coefficient was proposed in [9]. The use of an interaction matrix to identify groups of strongly interacting variables was proposed in [39]. The study in [1] proposes the construction of a graph-based abstraction of the search space representing the optimisation problem, known as a local optima network. In this graph, each node is a local optimum, while the edges between nodes represent the adjacency of the basins of the optima. A special mention should be given to the Covariance Matrix Adaptation Evolution Strategy (CMAES) [13,14]. This popular algorithm progressively adapts a multi-variate Gaussian distribution from which candidate solutions are sampled. This adaptation is performed to increase the likelihood of previously successful candidate solutions. While CMAES runs, its distribution adapts to the geometry of the problem/local optimum. Thus, CMAES can be considered both an adaptive algorithm belonging to the performance-based feedback group and an algorithm designed on the basis of a fitness landscape analysis.
Another recent example of an algorithm designed on the basis of a fitness landscape analysis for the continuous domain is Covariance Pattern Search (CPS) [30,31]. This algorithm characterises the geometry of the problem by sampling points whose objective function value is below a certain threshold. The covariance matrix associated with the sampled points and its eigenvectors are then calculated. These eigenvectors are then used as the search directions of Generalised Pattern Search (GPS); see [40]. The results in [30,31] clearly show that the pattern based on the eigenvectors of a well-estimated covariance matrix outperforms the classical pattern based on the fundamental orthonormal basis (the directions of the variables). On the other hand, the application of CPS is impractical, since it requires the setting of the above-mentioned threshold parameter for each optimisation problem. This setting is performed empirically and thus requires considerable computational effort, especially in high-dimensional cases. This feature makes CPS neither versatile (over various problems) nor easily scalable.
The study in [28] overcomes this limitation by means of a restarting scheme that divides the run into local runs. The resulting algorithm, Adaptive Covariance Pattern Search (ACPS), uses the best objective function value found at each restart as the threshold for the following local run.

Proposal of this Article
The present article extends the concept of CPS by enhancing its fitness landscape analysis. Besides determining the search directions of Pattern Search (PS), the present study also assigns a step size to each search direction. Each step size is calculated on the basis of an estimation of the directional derivative along the associated search direction: the proposed method performs large steps when the directional derivative is low (the fitness landscape is flat) and small steps when the directional derivative is high (the fitness landscape is steep). Furthermore, the present study makes use of the restarting strategy proposed in [28] to overcome the CPS limitation of setting a threshold for each problem. Thus, the present article can be considered a generalisation of ACPS to an algorithmic framework, which is referred to as Generalised Pattern Search with Restarting Fitness Landscape Analysis (GPSRFLA).
The remainder of this article is organised as follows. The section "Basic Notation and Generalised Pattern Search" introduces the notation and describes the basics of PS and GPS. The section "Generalised Pattern Search with Restarting Fitness Landscape Analysis" describes the proposed framework and provides a pertinent theoretical justification for the fitness landscape analysis. The section "A Computationally Efficient Instance of GPSRFLA" describes ACPS and presents it as a computationally efficient instance of GPSRFLA. The section "Numerical Results" provides the numerical results of this work. Finally, the section "Conclusion" presents the concluding remarks of the study.

Basic Notation and Generalised Pattern Search
Before entering the description of the algorithms, let us introduce the notation used throughout this paper. Let us indicate with x an n-dimensional vector of real numbers (x ∈ ℝ^n). We will refer to a numerical optimisation problem, that is, the minimisation of a function f : D → Y, where D ⊆ ℝ^n and Y ⊆ ℝ. In this study, we will focus on the box-constrained case (D = [a_1, b_1] × [a_2, b_2] × … × [a_n, b_n], with × indicating the Cartesian product), which includes the unconstrained case ]−∞, +∞[^n = ℝ^n.
We will call the set D the "decision space". Also, we will refer to the n-dimensional vector x as "vector", "point", or "candidate solution", while we will refer to its components as "design variables".
The PS algorithms are a family of deterministic direct search methods [40], i.e., deterministic optimisation algorithms that do not require gradient calculations. The algorithms that belong to this family have been conceptualised by means of a generalised scheme, namely GPS [40]. GPS is characterised by two elements: -a set of search directions (a basis of vectors) spanning the decision space D; -a trial step vector endowed with a step variation rule.
From an initial point x^0, the PS algorithms perturb the solution along the search directions in an iterative manner. Let us indicate with k the iteration index. Formally, the search directions are determined by two matrices. The first is a non-singular matrix, namely the basis matrix, indicated by B ∈ ℝ^(n×n), where ℝ^(n×n) is the set of square matrices of real numbers of order n. The second is a rectangular matrix, namely the generating matrix, indicated by C_k ∈ ℤ^(n×p), where ℤ^(n×p) is the set of matrices of integers of size n by p, with p > 2n and rank n.
The search directions are given by the columns of the matrix P_k = B C_k, which is referred to as the pattern (and has size n × p). Thus, a pattern can be seen as a repository of search directions, with n of them being in the directions of a basis of ℝ^n and n of them being in the same directions but with opposite orientation. There may potentially be some additional directions.
The GPS k-th trial step along the i-th direction is the vector s_i^k, defined as

s_i^k = Δ_k B c_i^k,

where Δ_k is a positive real number and c_i^k is the i-th column of the matrix C_k. The parameter Δ_k determines the step size, while B c_i^k is the direction of the trial step. If x^k is the current best solution at iteration k, the trial point generated by means of the trial step would be

x^k + s_i^k.

The set of operations that yields a current best point is called the exploratory move. The exploratory move succeeds when a solution with better performance is detected, and fails when no update of the current best is found. Within the GPS family, various PS implementations employ different strategies, e.g., attempting only one trial vector per step or exploring all the columns of B C_k.
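To make the exploratory move concrete, the following sketch polls the trial points x^k + Δ_k B c_i^k and returns the first improvement. The function name and the first-improvement polling strategy are our illustrative choices, not prescribed by [40]:

```python
import numpy as np

def exploratory_move(f, x_best, B, C, delta):
    """Poll the trial points x_best + delta * B @ C[:, i]; return the first improvement.

    B: (n, n) basis matrix; C: (n, p) generating matrix; delta: step size Delta_k.
    """
    f_best = f(x_best)
    for i in range(C.shape[1]):
        trial = x_best + delta * (B @ C[:, i])   # trial step s_i^k = Delta_k * B c_i^k
        if f(trial) < f_best:                    # exploratory move succeeds
            return trial, True
    return x_best, False                         # all trial steps failed

# toy usage: one move on a sphere with the fundamental pattern [I, -I]
n = 2
B = np.eye(n)
C = np.hstack([np.eye(n), -np.eye(n)])
x, ok = exploratory_move(lambda v: np.sum(v**2), np.array([1.0, 1.0]), B, C, 0.5)
```

Exploring all columns and keeping the best, instead of returning on the first success, is the other strategy mentioned above.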
The pseudocode of GPS is given in Algorithm 1.

An Implementation of Pattern Search
Although GPS in [40] refers to a generic basis matrix B, most PS implementations use the identity matrix I as the basis matrix, thus moving along the directions of the variables. Furthermore, in the absence of specific information, the elements of the generating matrix C_k are selected to explore each direction in the same way.
One example is the greedy implementation proposed in [41]: for each design variable, a trial solution is sampled and, if this first move fails, the opposite direction is explored. This greedy approach appears to be especially effective for multi-variate problems, as it allows a quick enhancement of the initial solution; see [41]. Let us indicate with e_1, e_2, …, e_n the fundamental orthonormal basis of ℝ^n (each e_i has modulus 1) and with ρ a scalar radius. This greedy PS first samples x^k − ρ e_i (minus move) and, if this trial point is worse than the current best x^k, it attempts to sample x^k + (ρ/2) e_i (plus move) before moving to the following design variable. We will say that a move succeeded if the objective function f of the trial point is better than that of x^k. The number of successful and failed moves determines the cost of a full scan along all directions: each scan requires between n and 2n objective function calls.
It can be remarked that an asymmetric exploration is carried out to avoid revisiting the same solution multiple times; see [31]. If the moves in all directions fail, then the radius ρ is halved. The algorithm is stopped either when the radius is smaller than a tolerance value or when the computational budget is exceeded. The pseudocode of PS is reported in Algorithm 2.
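A minimal sketch of this greedy PS follows; names are ours, and the plus move of length ρ/2 is an assumption reflecting the asymmetric exploration described above:

```python
import numpy as np

def greedy_scan(f, x, rho):
    """One full scan along the n coordinate directions (greedy minus/plus moves)."""
    x = x.copy()
    f_x = f(x)
    improved = False
    for i in range(len(x)):
        trial = x.copy(); trial[i] -= rho           # minus move
        if f(trial) < f_x:
            x, f_x, improved = trial, f(trial), True
            continue
        trial = x.copy(); trial[i] += rho / 2       # plus move (asymmetric, assumed rho/2)
        if f(trial) < f_x:
            x, f_x, improved = trial, f(trial), True
    return x, f_x, improved

def greedy_ps(f, x0, rho, tol=1e-15, budget=10_000):
    """Greedy PS: halve the radius whenever a full scan fails."""
    x = x0.copy()
    while rho > tol and budget > 0:
        x, f_x, improved = greedy_scan(f, x, rho)
        budget -= 2 * len(x)                        # a scan costs between n and 2n calls
        if not improved:
            rho /= 2                                # all moves failed: halve the radius
    return x, f(x)

x, fx = greedy_ps(lambda v: np.sum(v**2), np.array([3.0, -2.0]), rho=1.0)
```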
In terms of GPS notation, this PS implementation in two dimensions (n = 2) is characterised by the basis matrix B = I and by a generating matrix C_k whose columns enumerate the possible outcomes of the scan, with Δ_k = ρ. Each row of the matrix C_k represents a variable in the system of coordinates identified by the basis B. Each column contains the information of a possible outcome of the for loop in Algorithm 2. For example, the first column represents a scenario in which, along the first variable, the minus move failed and the plus move succeeded and, along the second variable, both moves failed. In a similar way, the sixth column indicates that, along the first variable, the minus move failed and the plus move succeeded, while along the second variable, the minus move succeeded. In other words, each column of C_k is a possible linear combination of plus and minus moves that can potentially be performed within the for loop in Algorithm 2.
Generalised Pattern Search with Restarting Fitness Landscape Analysis

The proposed GPSRFLA framework is composed of two algorithmic components, which are periodically restarted:

- Fitness Landscape Analysis
- (Generalised) Pattern Search

At each restart, the fitness landscape is analysed and the analysis informs the setting of the pattern of PS, which is then run. At each restart, the fitness landscape analysis uses progressively more updated data to make progressively more accurate decisions about the pattern that enhances the performance of the algorithm. Thus, the proposed framework is designed to progressively learn the optimisation problem and train Pattern Search accordingly. At the beginning of the optimisation, one point x is sampled within the decision space D. A total budget t_b is allocated to the entire optimisation process, including the fitness landscape analysis. Then, a maximum local budget l_b is allocated to the fitness landscape analysis and PS between two consecutive restarts, which is referred to as a local run. The inputs of the fitness landscape analysis are the domain D, the objective function f, and the current best point x. The outputs of the fitness landscape analysis are the basis matrix B and the generating matrix C_k. The latter two matrices, which compose the pattern matrix, are then used as inputs, together with the current best solution, for the Pattern Search local run. The output of the Pattern Search local run is the current best solution x, which is then inputted into the fitness landscape analysis component again. At each restart, the radius ρ is reinitialised to search for the optimum with the new pattern matrix P_k in the following local run. Algorithm 3 describes the external framework of the proposed GPSRFLA. The following two subsections describe in detail the functioning of the fitness landscape analysis and PS, respectively.

Fitness Landscape Analysis
The fitness landscape analysis component makes use of a data structure, which can contain up to n_v candidate solutions. At the first local run, n_s points (with n_s > n_v) are sampled within the decision space D. The objective function value of these n_s points is calculated, and the n_v points with the best objective function values are saved in the data structure. In the following local runs, the n_s points are sampled in the neighbourhood of the current best solution x. If x = (x_1, x_2, …, x_n) is the candidate solution, the neighbourhood is the hyper-cube where each side is the interval [x_i − k_v ρ, x_i + k_v ρ], where k_v is a parameter to set and ρ is the radius of PS; see section 2.1.
The data structure can be represented as a list of n_v points x^(1), x^(2), …, x^(n_v). Using these points (vectors), the mean vector μ and the covariance matrix Σ are calculated. The mean vector is calculated as

μ = (1/n_v) Σ_{m=1…n_v} x^(m),

and the generic element c_{j,l} of the covariance matrix Σ is

c_{j,l} = (1/n_v) Σ_{m=1…n_v} (x_j^(m) − μ_j)(x_l^(m) − μ_l).

Then, the eigenvectors of Σ are calculated. Since Σ is symmetric, it is diagonalisable, and an orthogonal basis of its eigenvectors can be found; see [27]. These eigenvectors are used as the basis to explore the space in the PS logic. In other words, the matrix V, whose columns are the eigenvectors of Σ, is used as the basis matrix B of GPS; see [29]. The eigenvalues λ_1, λ_2, …, λ_n associated with the eigenvectors v_1, v_2, …, v_n, respectively, are used to update the generating matrix C_k of GPS. More specifically, the matrix C_k can be represented as a vector of row vectors, each of them associated with a design variable of the optimisation problem. The generating matrix C_k is then updated by multiplying each row i by the square root of the corresponding eigenvalue:

C_k ← diag(√λ_1, …, √λ_n) C_k.    (6)

PS is then run on the current best solution x, with the pattern calculated as P_k = V C_k. Algorithm 4 displays the pseudocode of the Fitness Landscape Analysis.
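The analysis step can be sketched as follows. Names are ours; `np.cov` uses the sample normalisation 1/(n_v − 1), which only rescales the eigenvalues uniformly and leaves the eigenvectors unchanged:

```python
import numpy as np

def analyse_landscape(f, lower, upper, n_s, n_v, rng):
    """Sample n_s points, keep the n_v best, and return the eigenpairs of
    the covariance matrix of the kept points."""
    pts = rng.uniform(lower, upper, size=(n_s, len(lower)))
    best = pts[np.argsort([f(p) for p in pts])[:n_v]]      # the data structure
    eigvals, eigvecs = np.linalg.eigh(np.cov(best, rowvar=False))
    return eigvals, eigvecs                                # ascending eigenvalues

def scaled_pattern(eigvals, eigvecs, C):
    """Basis matrix = eigenvector matrix; row i of the generating matrix C is
    multiplied by sqrt(lambda_i); the pattern is their product."""
    return eigvecs @ (np.sqrt(eigvals)[:, None] * C)

rng = np.random.default_rng(0)
f = lambda v: v[0]**2 + 1e6 * v[1]**2                      # a bent-cigar-like function
vals, vecs = analyse_landscape(f, np.array([-100.0, -100.0]),
                               np.array([100.0, 100.0]), 400, 40, rng)
P = scaled_pattern(vals, vecs, np.hstack([np.eye(2), -np.eye(2)]))
```

On this function, the best samples cluster along the flat first axis, so the dominant eigenvector is (nearly) aligned with it and carries the much larger eigenvalue.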

Rationale of Fitness Landscape Analysis
This section explains the rationale behind the choices made above, i.e., what the fitness landscape analysis measures and how it informs the algorithmic design of PS. First, it is important to visualise the information contained in the data structure. Let us consider four shifted and rotated objective functions in two dimensions within [−100, 100]^2 (see [20]), of the form f(x) = g(z) with z = R(x − o), where o is a shift vector and R is a random rotation matrix. Figure 1 displays the plot of the bi-dimensional vectors contained in the data structure and the directions of the eigenvectors of the covariance matrix associated with the sampled points. Figure 1 shows that the data structure contains pieces of information about the geometry of the problem and that the eigenvectors of the covariance matrix identify important directions for such a problem. For example, for the bent cigar problem, the eigenvectors identify the longitudinal and transverse axes of the line detected by the points.
As reported in [31], the rationale behind the choice to use the eigenvectors v_i is that the matrix V, whose columns are the eigenvectors v_i, is the transformation matrix that diagonalises the covariance matrix Σ, that is,

V^T Σ V = Λ,

where Λ is a diagonal matrix whose diagonal elements are the eigenvalues of Σ, and V^(−1) = V^T since V is an orthogonal matrix (V^T is the transpose of the matrix V). The directions of the eigenvectors can be interpreted as a new reference system characterised by a lack of correlation between pairs of variables. Thus, the new reference system exploits the available information about the geometry of the problem. This concept is broadly used in other contexts, especially in data science, and is closely related to principal component analysis [18].
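This diagonalisation can be checked numerically: with the orthonormal eigenvectors of a symmetric matrix as columns of a matrix, transposing it, multiplying by the covariance matrix and by the eigenvector matrix yields the diagonal matrix of eigenvalues. A minimal sketch with an arbitrary symmetric (covariance-like) matrix:

```python
import numpy as np

C = np.array([[4.0, 2.0],
              [2.0, 3.0]])                 # a symmetric (covariance-like) matrix
eigvals, V = np.linalg.eigh(C)             # orthonormal eigenvectors as columns of V

Lambda = V.T @ C @ V                       # V^{-1} = V^T, so this is the diagonal form
assert np.allclose(Lambda, np.diag(eigvals))
assert np.allclose(V.T @ V, np.eye(2))     # V is orthogonal
```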
Furthermore, as reported in [31], the directions of the eigenvectors of the covariance matrix identify the maximum and minimum directional derivatives. Thus, these eigenvectors are an efficient basis for Patter Search. For example, let us consider the shifted and rotated bent cigar function in two variables f ( ) = z 2 1 + 10 6 z 2 2 with = ( − ) . The shift vector is and is the rotation matrix Figure 2 shows the estimated directional derivative along the directions of the variables, as in the case of PS (see Algorithm 2), and along the directions of the eigenvectors i of the covariance matrix. Figure 2 implicitly provides an interpretation of the search along the directions of the eigenvectors i : the directions of these eigenvectors identify the steepest and flattest directions of the fitness landscape.
To enhance the performance of PS, it is here proposed to use large step sizes along those directions corresponding to a flat fitness landscape and small step sizes along those directions corresponding to a steep fitness landscape. Although we cannot know the values of the directional derivatives in advance, we can estimate them by means of the eigenvalues of the covariance matrix. This is the main motivation for the proposed update of the generating matrix C_k.
Let us consider again the covariance matrix Σ calculated as shown above. Let V = (v_1, v_2, …, v_n) be a matrix whose columns are the eigenvectors of Σ, and let us indicate with λ_1, λ_2, …, λ_n the corresponding eigenvalues.
It must be observed that, since Σ is symmetric, the eigenvalues are all real numbers; see [27]. Furthermore, the eigenvectors can be chosen as an orthonormal basis (every pair of vectors is orthogonal and each vector has modulus equal to 1) of a vector space, which we refer to as the eigenspace. These eigenvectors span the domain D.
Thus, if we consider a vector x ∈ D expressed in the basis B = {e_1, e_2, …, e_n}, we may express it through the corresponding vector y = V^T x in the reference system/basis of the eigenvectors. Since the mean vector μ calculated from the vectors in the data structure is also an element of D, we can express it via the eigenvectors as μ' = V^T μ. Let us now introduce the covariance matrix Σ' of the data in the data structure in the reference system identified by the eigenvectors. This is expressed by

Σ' = V^T Σ V,

from which it follows that Σ' = Λ. Thus, the diagonal elements of Σ' are the eigenvalues of Σ, while the extradiagonal elements are zero. Since the diagonal elements of a covariance matrix are the variances σ_i^2 of the data along the directions v_i, it follows that σ_i^2 = λ_i.
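The identity σ_i^2 = λ_i can be verified numerically: projecting the (centred) data onto the eigenvector basis yields uncorrelated components whose variances are the eigenvalues. The data below are synthetic and only play the role of the points in the data structure:

```python
import numpy as np

rng = np.random.default_rng(1)
# correlated 2-D samples playing the role of the stored points
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
C = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(C)

Y = (X - X.mean(axis=0)) @ V            # data re-expressed in the eigenvector basis
var_along = Y.var(axis=0, ddof=1)       # sample variances along each eigenvector
assert np.allclose(var_along, eigvals)  # sigma_i^2 = lambda_i
assert abs(np.cov(Y, rowvar=False)[0, 1]) < 1e-8  # no correlation in the new basis
```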

SN Computer Science
The selection of the best n_v points out of the n_s available samples can be conceptually considered the selection of points whose objective function value is below a certain threshold thre, where thre is the objective function value of the point in the data structure with the worst (highest) objective function value. Thus, we may say that the fitness landscape analysis selects those points x such that f(x) ≤ thre. In a basin of attraction, these samples would be distributed around a local optimum. Let us suppose, for simplicity of notation, that the optimum x* is in the null vector (i.e., let us apply an operation of translation). The directional derivative of f along some direction p can be estimated as

D_p f(x) ≈ (f(x) − f(x*)) / l,    (11)

where x = l · p, with l the modulus of x and |p| = 1, since p is a versor (i.e., a vector with modulus 1), and f(x*) is a constant. When we pose f(x) = thre, we find that f(x) − f(x*) = thre* is also a constant. Thus, the directional derivative along the direction p is inversely related to the modulus l.
Along the direction of an eigenvector v_i, there exists a correlation between the modulus l and the corresponding eigenvalue λ_i. Let us consider the two points x and −x belonging to the direction of v_i, and let us assume that the objective function values of these points are thre. The distance between x and −x estimates the width of the distribution along the direction of v_i, and the standard deviation estimates the average modulus of the points in the data structure. Considering that the standard deviation calculated along the direction of v_i is the square root of λ_i, it follows that

l ≈ √λ_i.    (12)

By combining Eqs. (11) and (12), we may conclude that the directional derivative in the direction of an eigenvector v_i of the covariance matrix Σ, as calculated above, is inversely proportional to the square root of the corresponding eigenvalue:

D_{v_i} f(x) ≈ thre* / √λ_i.    (13)

Figure 3 reports an example that is useful in visualising the meaning of Eq. (12). The points belonging to the data structure associated with the samples for the shifted and rotated ellipsoid in two dimensions are reported as blue dots. The dashed lines indicate the directions of the eigenvectors of the covariance matrix. The two associated eigenvalues are λ_1 = 1.5221 and λ_2 = 4324.1, respectively. These numbers reflect the distribution of the points, which appear as a thin and long line. We may observe that λ_2 ≫ λ_1 and that the points span a much wider range along the direction of v_2 (in black) than along the direction of v_1 (in red). Therefore, we can see that the eigenvalues estimate the extent of the distribution of points along the directions of the corresponding eigenvectors.
Furthermore, as shown by the contour lines, along the direction of the first eigenvector the fitness landscape is very steep, while along the direction of the second eigenvector the landscape is nearly flat. This observation intuitively explains the meaning of Eq. (13).
Since the square roots of the eigenvalues are inversely proportional to the directional derivatives, it is proposed to use them as direct multipliers to set the step sizes along each search direction of the basis of eigenvectors. With reference to Fig. 3, along the direction of v_1 the landscape is steep and the corresponding eigenvalue λ_1 is small. Therefore, we use √λ_1 as a multiplier to ensure that small steps are performed. Conversely, along the direction of v_2 the landscape is flat and the corresponding eigenvalue λ_2 is large. Thus, we use √λ_2 as a multiplier to enable large steps along that direction. This logic explains the proposed way of modifying C_k in Eq. (6).

Pattern Search Designed on the Basis of Fitness Landscape Analysis
This article proposes a restarting algorithm based on the PS logic presented in Algorithm 2. At each local run, the fitness landscape analysis returns the pattern P_k = V C_k. This means that the minus move along the i-th direction is

x^k − ρ √λ_i v_i,    (14)

and the plus move is

x^k + (ρ/2) √λ_i v_i.    (15)

We may express the same concept in terms of GPS notation using the two-dimensional example of Section 2.1. The proposed PS in two dimensions (n = 2) is characterised by the basis matrix B = V and by the generating matrix C_k scaled row-wise by √λ_1 and √λ_2, with Δ_k = ρ. The trial step would be

s_i^k = Δ_k B c_i^k,

where c_i^k is the i-th column of the matrix C_k. We may easily verify that, by combining the moves in Eqs. (14) and (15), all the potential Δ_k B c_i^k can be generated. The main parameters of GPSRFLA are reported in the following.
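One scan of the redesigned PS can be sketched as follows; the names and the ρ/2 plus move are the same assumptions as before. Note how the step along each eigenvector is scaled by √λ_i:

```python
import numpy as np

def scan_eigen_pattern(f, x, rho, V, eigvals):
    """One scan of PS along the eigenvector directions, with step sizes
    scaled by sqrt(lambda_i): small steps where the landscape is steep."""
    x, f_x = x.copy(), f(x)
    for i in range(len(x)):
        step = rho * np.sqrt(eigvals[i]) * V[:, i]
        if f(x - step) < f_x:                     # minus move
            x, f_x = x - step, f(x - step)
        elif f(x + step / 2) < f_x:               # plus move (assumed half length)
            x, f_x = x + step / 2, f(x + step / 2)
    return x, f_x

# toy usage on an ellipsoid for which the analysis returned V = I
f = lambda v: v[0]**2 + 100.0 * v[1]**2
x, fx = scan_eigen_pattern(f, np.array([10.0, 1.0]), 0.5,
                           np.eye(2), np.array([4.0, 0.04]))
```

In the toy call, the flat direction (λ = 4) receives a step ten times longer than the steep one (λ = 0.04).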

A Computationally Efficient Instance of GPSRFLA
A characterising feature of algorithms based on fitness landscape analysis is the requirement of objective function calls for the analysis, which impacts the budget of the optimiser. This section presents a PS implementation that, although it fits within the GPSRFLA framework in Algorithm 3, fills the data structure with the points visited during the search.
More specifically, the data structure is an output of the GPS and an input for the fitness landscape analysis. In the first local run, the basis matrix B is initialised to the identity matrix I. GPS moves along the directions of the variables as shown in Algorithm 2. During each local run, GPS saves all the successful trial points in the data structure, i.e., all the points that have been current best points during the local run. The filled data structure is then passed to the fitness landscape analysis component, which calculates the covariance matrix Σ and its eigenvector matrix V = (v_1, v_2, …, v_n). The matrix V is then used as the basis matrix for the following local run. It must be observed that, at each restart, the data structure contains the points below a threshold identified by f(x^k), where x^k is the starting point of that local run. Algorithm 6 illustrates the resulting algorithm, ACPS, which was presented for the first time in [28].
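A compact sketch of this scheme follows (not the authors' code; the greedy scan and the ρ/2 plus move are the same assumptions as before). The only addition with respect to plain PS is the storage of the current bests and the eigendecomposition at each restart:

```python
import numpy as np

def acps(f, x0, rho0, local_budget, n_restarts, tol=1e-15):
    """Sketch of ACPS: each local run is a greedy PS; the successful points are
    stored and the eigenvectors of their covariance matrix become the basis of
    the next local run."""
    n = len(x0)
    x, f_x = np.array(x0, dtype=float), f(x0)
    B = np.eye(n)                                 # first run: identity basis
    for _ in range(n_restarts):
        rho, budget = rho0, local_budget          # the radius is reinitialised
        visited = [x.copy()]                      # data structure of current bests
        while rho > tol and budget > 0:
            scan_improved = False
            for i in range(n):
                budget -= 2
                for trial in (x - rho * B[:, i], x + rho / 2 * B[:, i]):
                    if f(trial) < f_x:
                        x, f_x = trial, f(trial)
                        visited.append(x.copy())
                        scan_improved = True
                        break
            if not scan_improved:
                rho /= 2                          # all moves failed: halve the radius
        if len(visited) > n:                      # enough points for a covariance
            cov = np.cov(np.array(visited), rowvar=False)
            _, B = np.linalg.eigh(cov)            # eigenvectors -> next basis
    return x, f_x

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
f = lambda v: (R @ v)[0]**2 + 100.0 * (R @ v)[1]**2   # rotated ellipsoid
x, fx = acps(f, [5.0, 0.0], rho0=1.0, local_budget=2000, n_restarts=5)
```

After the first restart, the basis tends to align with the valley of the rotated ellipsoid, which is what makes the subsequent local runs efficient.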

The main advantage of the ACPS implementation in Algorithm 6 is that no extra objective function calls are required to perform the fitness landscape analysis. As shown above, ACPS uses the points visited during the search to perform the fitness landscape analysis. This means that, unlike the general GPSRFLA framework, ACPS may use the entire computational budget to optimise the objective function. ACPS uses some of these objective function calls to progressively analyse and learn the fitness landscape, while it refines the adaptation of the pattern matrix P_k to the optimisation problem.
It must be observed that the proposed ACPS resembles the Rosenbrock method [37], as both use a basis of vectors that is progressively adapted during the run (the Rosenbrock method belongs to the GPS family). However, the two algorithms are radically different in terms of the way the basis is selected and updated. More specifically, while ACPS makes use of the eigenvectors of the covariance matrix of a set of samples, the Rosenbrock algorithm stores the successful moves and determines a new orthonormal basis guided by the previous successful moves.
It must be remarked that, although ACPS can be considered an instance of the GPSRFLA framework, it does not make use of the eigenvalues to update the generating matrix C_k. This decision was made on the basis of the preliminary results that we obtained. The data structure is likely to contain a small number of points. On the one hand, these points are enough to correctly estimate, through the eigenvectors of the associated covariance matrix, the directions with maximum and minimum directional derivatives (over multiple steps, as illustrated in Fig. 4). On the other hand, these points are usually not enough to correctly estimate the values of the directional derivatives through the calculation of the eigenvalues. Since these wrong estimations tend to jeopardise the performance of the algorithm, it was decided to exclude the update of the generating matrix C_k from ACPS.

Numerical Results
To test and compare the performance of the GPSRFLA framework and ACPS, a set of functions from the IEEE CEC2013 benchmark [20] was selected and adapted. Since PS is a local search, we selected all the unimodal problems, hence reproducing the testbed used for CPS in [30]. We also reproduced both versions of the ellipsoid presented in [30] (f_2 and f_3); the condition numbers of these two ellipsoids worsen with dimensionality at different speeds. In this paper, alongside the bent cigar and discus functions, we included their modified versions.
Finally, to show that GPSRFLA is capable, to some extent, of handling multimodal fitness landscapes, we included two simple multimodal functions from [20]. The list of the functions used in this study is displayed in Table 1. As shown in Table 1, each problem has been shifted and rotated; the vector x is transformed into z. The shift vector of [20] has been used, and the rotation matrices have been randomly generated. The problems in Table 1 have been considered in 10, 30, and 50 dimensions. For each problem in Table 1, each dimensionality level, and each algorithm in this study, 51 independent runs were performed. For each run, the algorithms under consideration, if single-solution, have been run with the same initial solution. All the algorithms in this paper have been executed with a budget of 10000 ⋅ n function calls, where n is the problem dimensionality. The results for each algorithm and problem are expressed in terms of mean value ± standard deviation over the 51 independent runs performed (Table 4 reports the average error ± standard deviation over 51 runs for the problems listed in Table 1 for the BFGS algorithm [11], CMAES [14], GPSRFLA as shown in Algorithms 3, 4, and 5, and ACPS as shown in Algorithm 6). Furthermore, to statistically investigate whether the application of the proposed method results in a performance gain, the Wilcoxon rank-sum test was applied; see [12]. In the tables in this section, a "+" indicates that the proposed algorithm (GPSRFLA/ACPS) significantly outperformed its competitor, a "−" indicates that the competitor significantly outperformed the proposed algorithm, and a "=" indicates that there is no significant difference in performance.
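The statistical protocol can be sketched as follows. The rank-sum z-statistic with the 1.96 threshold (5% two-sided, normal approximation, no ties) is a standard formulation of the Wilcoxon rank-sum test; the error samples below are synthetic:

```python
import numpy as np

def ranksum_z(a, b):
    """Wilcoxon rank-sum: z-statistic of the rank sum of sample a (normal approx.)."""
    combined = np.concatenate([a, b])
    ranks = np.argsort(np.argsort(combined)) + 1.0   # ranks 1..n+m (no ties assumed)
    n, m = len(a), len(b)
    W = ranks[:n].sum()                              # rank sum of the first sample
    mu = n * (n + m + 1) / 2.0                       # mean of W under H0
    sigma = np.sqrt(n * m * (n + m + 1) / 12.0)      # std of W under H0
    return (W - mu) / sigma

rng = np.random.default_rng(42)
err_proposed = rng.normal(1.0, 0.1, size=51)         # hypothetical error samples
err_competitor = rng.normal(2.0, 0.1, size=51)
z = ranksum_z(err_proposed, err_competitor)
# lower errors -> lower ranks -> negative z -> the proposed algorithm is better
mark = "+" if z < -1.96 else ("-" if z > 1.96 else "=")
```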

Comparison among Pattern Search Algorithms
This section highlights the benefits of fitness landscape analysis on PS algorithms. To achieve this aim, the following algorithms have been tested on the problems in Table 1:
- the original PS, according to the implementation in [41] and the implementation in Algorithm 2
- CPS, presented in [30, 31]
- GPSRFLA, according to the framework in Algorithm 3, the analysis component in Algorithm 4, and the GPS implementation in Algorithm 5
- ACPS [28], as reported in Algorithm 6

SN Computer Science
All PS variants in this article have been run with the initial radius set to 0.1 ⋅ (domain width) = 20. This parameter has been chosen following the indication in [41] and then tuned on our testbed.
As reported in [30], the budget of CPS has been split into two parts: 5000 ⋅ n function calls have been used to build the covariance matrix, while the remaining 5000 ⋅ n function calls have been spent to execute the algorithm along the directions of its eigenvectors.
The threshold values thre for the problems in Table 1 are reported in Table 2. The thresholds were set empirically, by testing values of the codomain that allowed some points to be stored in the data structure while others were discarded; see [31].
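As a rough illustration of the threshold-based sampling described for CPS, the sketch below stores sampled points whose objective value falls below `thre` and returns the eigen-decomposition of their covariance matrix, whose eigenvectors serve as candidate search directions. All names are hypothetical; this is not the implementation of [30, 31].

```python
import numpy as np

def covariance_directions(f, bounds, thre, budget, rng=None):
    """Sample the domain uniformly, keep points whose objective value
    is below the threshold `thre`, and return the eigenvalues and
    eigenvectors of the covariance matrix of the stored points.
    (Illustrative sketch, not the CPS implementation.)"""
    rng = np.random.default_rng(rng)
    low, high = bounds
    stored = []
    for _ in range(budget):
        x = rng.uniform(low, high)
        if f(x) < thre:
            stored.append(x)
    C = np.cov(np.array(stored), rowvar=False)      # n-by-n covariance
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # columns = directions
    return eigenvalues, eigenvectors
```

The eigenvector columns returned here are what a CPS-style algorithm would then use as its basis of search moves.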
GPSRFLA uses a local budget of l_b = 1000 ⋅ n objective function calls, n_s = 200 ⋅ n slots (and thus GPS is run with a budget of 800 ⋅ n objective function calls), n_v = 5 ⋅ n slots, and k_v = 100. GPS is also stopped if the radius falls below 10⁻¹⁵.
ACPS is run with a maximum local budget l_b = 1000 ⋅ n and is stopped if the radius falls below 10⁻¹⁵. Table 3 shows the numerical results of the four PS variants; the best results for each problem are highlighted in bold.
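For readers unfamiliar with the baseline, a minimal greedy pattern search consistent with the settings above (an initial radius, a fixed budget of function calls, and a stop when the radius falls below 10⁻¹⁵) might look as follows. This is a generic sketch, not the paper's Algorithm 2 or 5; the pattern matrix defaults to the identity, as in standard PS.

```python
import numpy as np

def pattern_search(f, x0, radius, budget, pattern=None, tol=1e-15):
    """Greedy pattern search: try moves along +/- each column of the
    pattern matrix scaled by `radius`; halve the radius when no move
    improves. Stops on budget exhaustion or when radius <= tol."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    P = np.eye(n) if pattern is None else pattern  # columns are moves
    fx = f(x)
    calls = 1
    while calls < budget and radius > tol:
        improved = False
        for j in range(P.shape[1]):
            for sign in (1.0, -1.0):
                trial = x + sign * radius * P[:, j]
                f_trial = f(trial)
                calls += 1
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                if calls >= budget:
                    break
            if calls >= budget:
                break
        if not improved:
            radius *= 0.5  # refine the search resolution
    return x, fx
```

Swapping the identity matrix for the eigenvectors produced by a fitness landscape analysis is precisely the design lever that the compared algorithms exploit.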
The results in Table 3 show that both GPSRFLA and ACPS significantly outperform PS in the vast majority of cases. Only for the sphere function f1 does PS display an excellent performance; this is due to the fact that the identity matrix is already the ideal choice of basis matrix. For all other problems, PS performs either slightly or much worse than GPSRFLA and ACPS. The comparisons of GPSRFLA and ACPS against CPS also show that the proposed algorithms outperform their predecessor for almost all the problems considered. This suggests that the restarting fitness landscape analysis logic is beneficial to the performance of the algorithm. Finally, the comparison of ACPS against GPSRFLA shows that, in most cases, the computationally efficient logic embedded in ACPS yields significantly better results than GPSRFLA, which uses part of the budget solely to analyse the problem. However, the exploitation of the information regarding the directional gradients is potentially very powerful, as shown in the case of the rotated discus function f6.
Three examples of performance trends for the variants of PS included in this study are illustrated in Figs. 5, 6, and 7.

Comparison Against Other Algorithms
We have compared GPSRFLA and ACPS against the following two algorithms:
- the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [11], with an estimation of the gradient that may make it applicable to black-box problems;
- the Covariance Matrix Adaptation Evolution Strategy (CMAES) [14].
The motivations behind these two competitors are as follows: (1) BFGS is a quasi-Newton algorithm (i.e., based on a solid theoretical foundation) that estimates the gradient, and it is used here as a benchmark algorithm; (2) CMAES is a popular algorithm that, like GPSRFLA and ACPS, is based on theoretical considerations about multivariate distributions and the covariance matrix. Table 4 lists the results of this comparison (the best results are highlighted in bold).
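The BFGS setting described above (gradient estimated numerically, making the method usable on black-box problems) corresponds to what SciPy does when `minimize` is called with `method="BFGS"` and no analytic Jacobian: the gradient is approximated by finite differences. A Rosenbrock-style function stands in for the benchmark problems here.

```python
import numpy as np
from scipy.optimize import minimize

# Black-box objective: no analytic gradient is supplied, so SciPy's
# BFGS falls back to a finite-difference gradient estimate.
def black_box(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

result = minimize(black_box, x0=np.array([-1.2, 1.0]), method="BFGS")
# result.x should end up close to the optimum at (1, 1)
```

This mirrors the experimental setup in spirit only: the actual testbed uses the shifted and rotated CEC2013 functions of Table 1.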
The numerical results in Table 4 show that, on average, CMAES, GPSRFLA, and ACPS are better suited than BFGS to address black-box problems (without information on the gradient). However, BFGS is an excellent algorithm for several problems, especially the multivariate ellipsoid function. The results show that CMAES and GPSRFLA have almost comparable performances, with CMAES performing better than GPSRFLA on average for seven problems and worse for four problems. ACPS is more competitive with CMAES, as ACPS outperforms CMAES in 16 cases out of 33, while it was outperformed in 15; the two algorithms have the same performance on the remaining two problems. Overall, we may conclude that ACPS and CMAES have comparable performance on the problems under investigation.
Some further considerations can be made regarding the scalability of the algorithms. In the low-dimensional case (n = 10), both CMAES and ACPS detect solutions very close to the optimum for the nine unimodal problems (f1-f9), finding the global optimum in several runs, and detect a local minimum for the two multimodal problems (f10-f11). In higher dimensions, we observe that the performances of both CMAES and ACPS deteriorate for some problems and remain excellent for others. Alongside Table 4, Figs. 8, 9, 10, and 11 show some examples of the performance trends of GPSRFLA and ACPS against BFGS and CMAES. These plots confirm the observations made on Table 4: at higher dimensions, CMAES and ACPS both appear inadequate for some problems but very well suited for others. Figure 10 shows an example for which GPSRFLA and ACPS perform poorly compared to CMAES; conversely, Fig. 11 shows an example for which both GPSRFLA and ACPS display an excellent performance, while CMAES appears to be inadequate.

Statistical Ranking via the Holm-Bonferroni Procedure
To further strengthen the statistical analysis of the presented results, we applied the Holm-Bonferroni procedure [15] to the six algorithms and 33 problems (11 objective functions × 3 levels of dimensionality) under investigation. The results of the procedure are presented in Table 5. A score R_j for j = 1, …, N_A (where N_A is the number of algorithms under analysis, N_A = 6 in this paper) has been assigned as follows: for each problem, a score of 6 is assigned to the algorithm displaying the best performance, 5 to the second best, 4 to the third, and so on. For each algorithm, the scores obtained on the individual problems are summed up and averaged over the 33 test problems. With the calculated R_j values, ACPS has been taken as the reference algorithm: R_0 indicates the rank of ACPS, and R_j for j = 1, …, N_A − 1 the ranks of the remaining five algorithms. Letting j be the index of an algorithm, the values z_j have been calculated as

z_j = (R_0 − R_j) / sqrt( N_A (N_A + 1) / (6 N_TP) ),

where N_TP is the number of test problems (33 in this study). By means of the z_j values, the corresponding cumulative normal distribution values p_j have been calculated; see [10]. These p_j values have then been compared with the corresponding α/j, where α is the level of confidence, set to 0.05 in this case. Table 5 displays the ranks, z_j values, p_j values, and corresponding α/j thresholds. Moreover, it is indicated whether the null hypothesis (which states that the two algorithms have indistinguishable performance) is 'Rejected', i.e., the algorithms have statistically different performance, or 'Failed to Reject', meaning that the test could not establish a difference in performance (one does not outperform the other).
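Under the conventions just described (and reading the α/j comparison literally, with j indexing the p-values sorted in ascending order), the procedure can be sketched as follows. The helper name and the ranks passed in are illustrative, not the paper's data.

```python
import math

def holm_bonferroni(ranks, reference, n_problems, alpha=0.05):
    """Holm-Bonferroni procedure as described in the text: compare the
    reference algorithm's average rank R0 against every other R_j.
    `ranks` maps algorithm name -> average rank over the problems.
    (Hypothetical helper, not the paper's implementation.)"""
    n_alg = len(ranks)
    r0 = ranks[reference]
    denom = math.sqrt(n_alg * (n_alg + 1) / (6.0 * n_problems))
    rows = []
    for name, r_j in ranks.items():
        if name == reference:
            continue
        z = (r0 - r_j) / denom
        # one-sided p-value from the standard normal CDF
        p = 0.5 * (1.0 + math.erf(-abs(z) / math.sqrt(2.0)))
        rows.append((name, z, p))
    rows.sort(key=lambda row: row[2])        # most significant first
    results = []
    for j, (name, z, p) in enumerate(rows, start=1):
        verdict = "Rejected" if p < alpha / j else "Failed to Reject"
        results.append((name, round(z, 3), verdict))
    return results
```

Feeding in six average ranks with 33 problems reproduces the shape of Table 5: one row per non-reference algorithm, with its z value and the outcome of the hypothesis test.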
The results of the Holm-Bonferroni procedure in Table 5 show that ACPS achieved the highest rank. ACPS and CMAES have comparable performances, while ACPS has a better performance than the other four algorithms in this study.

Conclusion
GPS is a family of single-solution deterministic algorithms that search for the optimum by exploring the trial moves stored in the pattern matrix. The choice of the pattern matrix is still an open issue. This article proposes a restarting scheme where, at each restart, the pattern matrix is updated on the basis of a fitness landscape analysis.
Two algorithmic implementations encompassing the two novel contributions of this study with respect to the literature are proposed. The first contribution is a criterion to estimate the directional derivatives (steep and flat directions) through the eigenvalues of the covariance matrix associated with a sample of points, together with a mechanism to exploit this information within the search. The second is an algorithmic mechanism that reuses the objective function calls performed during the search for the purpose of the fitness landscape analysis. The latter contribution enhances the computational efficiency and, consequently, the performance of the algorithm.
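The second contribution, reusing the objective function calls made during the search for the landscape analysis, can be illustrated by wrapping the objective so that every evaluation is archived as a side effect. The class and method names below are hypothetical and do not reproduce the paper's data structure.

```python
import numpy as np

class ArchivingObjective:
    """Wrap a black-box objective so that every call made by the search
    is also stored for later fitness landscape analysis, instead of
    spending a separate analysis budget. (Hypothetical helper.)"""
    def __init__(self, f):
        self.f = f
        self.archive = []   # list of (point, value) pairs
        self.calls = 0
    def __call__(self, x):
        value = self.f(np.asarray(x, dtype=float))
        self.archive.append((np.array(x, dtype=float), value))
        self.calls += 1
        return value
    def best_fraction(self, fraction=0.25):
        """Return the points with the lowest objective values, e.g. to
        build the covariance matrix that updates the pattern matrix."""
        ranked = sorted(self.archive, key=lambda pair: pair[1])
        k = max(2, int(fraction * len(ranked)))
        return np.array([x for x, _ in ranked[:k]])
```

Because the optimiser calls the wrapper exactly as it would call the raw objective, the analysis sample grows for free during the search, which is the efficiency gain the text attributes to the second mechanism.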
The two proposed implementations outperform their predecessors and are competitive with a quasi-Newton method and a popular high-performing metaheuristic. The second implementation proved to have remarkably good performance. While the two novel contributions of this paper appear to be separately effective, more work is required to combine them effectively so as to superpose their respective benefits.
Funding This study was funded by the institutions indicated in the list of affiliations.

Conflict of Interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.