Introduction

Although findings in the continuous domain are not entirely conclusive [2], the No Free Lunch Theorems [42] suggest that algorithms should be designed to address specific optimisation problems.

Many real-world problems can be formulated as black-box optimisation problems [3]. In these cases, information about the problem is not available a priori; thus, modern implementations include mechanisms that make the algorithm suitable to the specific features of the problem. We can broadly distinguish two approaches to designing algorithms, listed below. These two algorithmic philosophies, albeit ideologically different, overlap in their practical implementations.

  • adaptive algorithms: feedback on the behaviour of the algorithm on the specific problem is collected and used to adjust the algorithm; see [6, 7, 36]

  • fitness landscape analysis: the optimisation problem is analysed by a method, e.g., an artificial intelligence tool, and the results are used to design the algorithm; see [17, 23, 24, 33, 34]

The feedback used by adaptive algorithms can be categorised into the following two groups.

  • Performance-based feedback: the most successful parameter settings and/or algorithmic operators are more likely to be selected for the subsequent stages of the optimisation process. This is the case for many hyper-heuristic [4, 8] and self-adaptive [19] schemes.

  • Behaviour-based feedback: some metrics associated with the functioning of the algorithm are monitored and fed back to update parameter settings and/or algorithmic operator(s). Some examples are diversity-based adaptation [26, 32] and super-fit adaptation for swarm intelligence algorithms [5,6,7, 16].

This article focuses on fitness landscape analysis. This study proposes a novel method for analysing fitness landscapes and designs the optimiser on the basis of the proposed analysis method. The section “Related Works: Algorithmic Design Based on Fitness Landscape Analysis” provides some background about algorithmic design informed by fitness landscape analysis. The section “Proposal of this Article” briefly outlines the proposed technique, explains its motivation, and describes the content of the remainder of this article.

Related Works: Algorithmic Design Based on Fitness Landscape Analysis

A fitness landscape is a tuple composed of a set/domain of candidate solutions, a notion of neighborhood, and a fitness/objective function; see [38]. Fitness landscape analysis is a popular topic that has attracted the attention of researchers in optimisation over the past two decades; see [22, 24]. Although the majority of studies on fitness landscape analysis focus on the combinatorial domain [35], recent studies have proposed valuable contributions to the continuous domain [23, 25]. For example, an analysis of separability based on the Pearson correlation coefficient was proposed in [9]. The use of an interaction matrix to identify groups of strongly interacting variables was proposed in [39]. The study in [1] proposes the construction of a graph-based abstraction of the search space representing the optimisation problem, known as a local optima network. In this graph, each node is a local optimum, while the edges between nodes represent the adjacency of the basins of attraction of the optima.

A special mention should be given to the Covariance Matrix Adaptation Evolution Strategy (CMAES) [13, 14]. This popular algorithm progressively adapts a multi-variate Gaussian distribution from which candidate solutions are sampled. The adaptation is performed to increase the likelihood of previously successful candidate solutions. While CMAES runs, its distribution adapts to the geometry of the problem/local optimum. Thus, CMAES can be considered both an adaptive algorithm belonging to the performance-based feedback group and an algorithm designed on the basis of a fitness landscape analysis.

Another recent example of an algorithm designed on the basis of a fitness landscape analysis for the continuous domain is the Covariance Pattern Search (CPS) [30, 31]. This algorithm characterises the geometry of the problem by sampling points whose objective function value is below a certain threshold. The covariance matrix associated with the sampled points and its eigenvectors are then calculated. These eigenvectors are then used as the search directions of the Generalised Pattern Search (GPS); see [40]. The results in [30, 31] clearly show that the pattern based on the eigenvectors of a well-estimated covariance matrix outperforms the classical pattern based on the fundamental orthonormal basis (the directions of the variables). On the other hand, the application of CPS is impractical, since it requires setting the above-mentioned threshold parameter for each optimisation problem. This setting is performed empirically and thus requires considerable computational effort, especially in high-dimensional cases. This feature makes CPS neither versatile (over various problems) nor easily scalable.

The study in [28] overcomes this limitation by using a restarting scheme that divides the run into local runs. The resulting algorithm, Adaptive Covariance Pattern Search (ACPS), uses the best objective function value at each restart as the threshold for the following local run.

Proposal of this Article

The present article extends the concept of CPS by enhancing its fitness landscape analysis. Besides determining the search directions of Pattern Search (PS), the present study also assigns a step size to each search direction. Each step size is calculated on the basis of an estimation of the directional derivative along the associated search direction: the proposed method performs large steps when the directional derivative is low (the fitness landscape is flat) and small steps when the directional derivative is high (the fitness landscape is steep). Furthermore, the present study makes use of the restarting strategy proposed in [28] to overcome the CPS limitation of setting a threshold for each problem. Thus, the present article can be considered a generalisation of ACPS into an algorithmic framework, which is referred to as Generalised Pattern Search with Restarting Fitness Landscape Analysis (GPSRFLA).

The remainder of this article is organised as follows: The section “Basic Notation and Generalised Pattern Search” introduces the notation and describes the basics of PS and GPS. The section “Generalised Pattern Search with Restarting Fitness Landscape Analysis” describes the proposed framework and provides a pertinent theoretical justification for the fitness landscape analysis. The section “A Computationally Efficient Instance of GPSRFLA” describes ACPS and presents it as a computationally efficient instance of GPSRFLA. The section “Numerical Results” provides the numerical results of this work. Finally, the section “Conclusion” presents the concluding remarks of the study.

Basic Notation and Generalised Pattern Search

Before describing the algorithms, let us introduce the notation used throughout this paper. Let us indicate with \({\mathbf {x}}\) an n-dimensional vector of real numbers (\({\mathbf {x}}\in {\mathbb {R}}^n\)). We will refer to a numerical optimisation problem as the minimisation of a function \(f: D \rightarrow Y\), where \(D \subseteq {\mathbb {R}}^n\) and \(Y \subseteq {\mathbb {R}}\):

$$\begin{aligned} \min _{{\mathbf {x}}\in D} f\left( {\mathbf {x}}\right) . \end{aligned}$$

We will focus on the box-constrained case (\(D = \left[ a_1,b_1\right] \times \left[ a_2,b_2\right] \times \ldots \times \left[ a_n,b_n\right]\), with \(\times\) indicating the Cartesian product), which includes the unconstrained case \(\left]-\infty ,+\infty \right[^n = {\mathbb {R}}^n\).

We will call the set D “decision space”. Also, we will refer to the n-dimensional vector \({\mathbf {x}}\) as “vector”, “point”, or “candidate solution”, while we will refer to its components as “design variables”.

The PS algorithms are a family of deterministic direct search methods [40], i.e., deterministic optimisation algorithms that do not require gradient calculations. The algorithms that belong to this family have been conceptualised by means of a generalised scheme, namely GPS [40]. GPS is characterised by two elements:

  • a set of search directions (a basis of vectors) spanning the decision space D;

  • a trial step vector endowed with a step variation rule.

From an initial point \({\mathbf {x}}\), the PS algorithms perturb the solution along the search directions in an iterative manner. Let us indicate with k the iteration index. Formally, the search directions are determined by two matrices. The first is a non-singular matrix, namely the basis matrix, indicated by \({\mathbf {B}} \in {\mathbb {R}}^{n\times n}\), where \({\mathbb {R}}^{n\times n}\) is the set of square matrices of real numbers of order n. The second is a rectangular matrix, namely the generating matrix, indicated by \({\mathbf {G}}_k\in {\mathbb {Z}}^{n \times p}\), where \({\mathbb {Z}}^{n \times p}\) is the set of integer matrices of size n by p, with \(p>2n\) and rank n.

The search directions are given by the columns of the matrix

$$\begin{aligned} {\mathbf {P}}_k=\mathbf {BG}_k \end{aligned}$$
(1)

that is referred to as the pattern (and has size \(n \times p\)). Thus, a pattern can be seen as a repository of search directions: n of them are the directions of a basis of \({\mathbb {R}}^n\), n of them are the same directions with opposite orientation, and there may potentially be some additional directions.

The GPS \(k^{th}\) trial step along the \(i^{th}\) direction is the vector \({\mathbf {s}}_k\), defined as

$$\begin{aligned} {\mathbf {s}}_k=\varDelta _k \mathbf {Bg}_k^i, \end{aligned}$$
(2)

where \(\varDelta _k\) is a positive real number and \({\mathbf {g}}_k^i\) is the \(i^{th}\) column of the matrix \({\mathbf {G}}_k\). The parameter \(\varDelta _k\) determines the step size, while \(\mathbf {Bg}_k^i\) is the direction of the trial step.

If \({\mathbf {x}}_k\) is the current best solution at the iteration k, the trial point generated by means of the trial step would be

$$\begin{aligned} \mathbf {x^t}_k={\mathbf {x}}_k+{\mathbf {s}}_k. \end{aligned}$$
(3)

The set of operations that yields a current best point is called the exploratory move. The exploratory move succeeds when a solution with better performance is detected, and fails when no update of the current best is found. Within the GPS family, various PS implementations employ different strategies, e.g., attempting only one trial vector per step or exploring all the columns of \(\varDelta _k{\mathbf {P}}_k\).

The pseudocode of GPS is given in Algorithm 1.

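For concreteness, a minimal Python sketch of the GPS loop of Algorithm 1 is given below. It assumes a greedy acceptance rule (the first improving column of the generating matrix is taken) and a halving step-size update; the names and default values are illustrative rather than the exact implementation of [40].

```python
import numpy as np

def gps(f, x0, B, G, delta=1.0, budget=10_000, tol=1e-15):
    """Generic Pattern Search skeleton: B is the basis matrix, G the generating matrix."""
    x = np.asarray(x0, dtype=float)
    fx, calls = f(x), 1
    while calls < budget and delta > tol:
        improved = False
        for g in G.T:                        # each column g of G encodes a candidate move
            xt = x + delta * (B @ g)         # trial point x_t = x_k + Delta_k * B g_k^i, Eqs. (2)-(3)
            fxt = f(xt)
            calls += 1
            if fxt < fx:                     # exploratory move succeeded
                x, fx, improved = xt, fxt, True
                break
        if not improved:                     # exploratory move failed: reduce the step size
            delta *= 0.5
    return x, fx
```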

An Implementation of Pattern Search

Although GPS in [40] refers to a generic basis matrix \({\mathbf {B}}\), most PS implementations use the identity matrix \({\mathbf {I}}\) as the basis matrix, that is, they move along the directions of the variables of the problem. Furthermore, in the absence of specific information, the elements of the generating matrix \({\mathbf {G}}_k\) are selected so that each direction is explored in the same way.

One example is the greedy implementation proposed in [41]: for each design variable, a trial solution is sampled and, if this first move fails, the opposite direction is explored. This greedy approach appears to be especially effective for multi-variate problems, as it allows a quick enhancement of the initial solution; see [41]. Let us indicate with \({\mathbf {e}}^1\), \({\mathbf {e}}^2, \ldots {\mathbf {e}}^n\) the orthonormal basis of \({\mathbb {R}}^n\)

$$\begin{aligned}&{\mathbf {e}}^1=\left( 1,0,\ldots 0\right) ^{{\mathbf {T}}} \\&{\mathbf {e}}^2=\left( 0, 1, \ldots 0\right) ^{{\mathbf {T}}} \\&\ldots \\&{\mathbf {e}}^n=\left( 0, 0, \ldots 1\right) ^{{\mathbf {T}}}, \end{aligned}$$

where the apex \(^{{\mathbf {T}}}\) indicates the transpose. Let \(\rho\) be a scalar step size (\(\rho =\varDelta _1\)). This greedy PS first samples (minus move)

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k-\rho \cdot {\mathbf {e}}^i, \end{aligned}$$
(4)

and if this trial point is worse than the current best \({\mathbf {x}}_k\), it attempts to sample (plus move)

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k+\frac{\rho }{2}\cdot {\mathbf {e}}^i \end{aligned}$$
(5)

before moving to the following design variable. We will say that a move succeeded if the objective function value of the trial point \(\mathbf {x^t}\) is better than that of \({\mathbf {x}}_k\). The number of successful and failed moves determines the cost of a full scan along all the directions: each scan requires between n and 2n objective function calls.

It can be remarked that an asymmetric exploration is carried out to avoid revisiting the same solution multiple times; see [31]. If the moves in all directions fail, then the radius \(\rho\) is halved. The algorithm is stopped either when the radius \(\rho\) is smaller than a tolerance value or when the computational budget is exceeded. The pseudocode of PS is reported in Algorithm 2.

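The greedy scan of Algorithm 2 can be sketched in a few lines. The snippet below assumes the asymmetric minus/plus moves of Eqs. (4) and (5) and the halving update of the radius; the parameter defaults are illustrative.

```python
import numpy as np

def greedy_ps(f, x0, rho=20.0, budget=10_000, tol=1e-15):
    x = np.asarray(x0, dtype=float)
    fx, calls, n = f(x), 1, x.size
    while calls < budget and rho > tol:
        improved = False
        for i in range(n):                       # scan every design variable
            e_i = np.zeros(n); e_i[i] = 1.0
            xt = x - rho * e_i                   # minus move, Eq. (4)
            fxt = f(xt); calls += 1
            if fxt < fx:
                x, fx, improved = xt, fxt, True
                continue                         # minus move succeeded: skip the plus move
            xt = x + 0.5 * rho * e_i             # plus move, Eq. (5)
            fxt = f(xt); calls += 1
            if fxt < fx:
                x, fx, improved = xt, fxt, True
        if not improved:
            rho *= 0.5                           # a full scan failed: halve the radius
    return x, fx
```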

In terms of GPS notation, this PS implementation in two dimensions (\(n=2\)) is characterised by the basis matrix

$$\begin{aligned} {\mathbf {B}}=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 1 \end{array}\right) , \end{aligned}$$

while the generating matrix \({\mathbf {G}}_k\) is

$$\begin{aligned} {\mathbf {G}}_k=\left( \begin{array}{ccccccccc} \frac{1}{2} &{} 0 &{} -1 &{} 0 &{} \frac{1}{2} &{} \frac{1}{2} &{}-1 &{}-1 &{} 0 \\ 0 &{} \frac{1}{2} &{} 0 &{} -1 &{} \frac{1}{2} &{} -1 &{}-1 &{} \frac{1}{2} &{} 0 \end{array} \right) , \end{aligned}$$

and \(\varDelta _k=\rho\). Each row of the matrix \({\mathbf {G}}_k\) represents a variable in the system of coordinates identified by the basis \({\mathbf {B}}\). Each column encodes a possible outcome of the for loop in Algorithm 2. For example, the first column represents a scenario in which, along the first variable, the minus move failed and the plus move succeeded, while, along the second variable, both moves failed. In a similar way, the sixth column indicates that, along the first variable, the minus move failed and the plus move succeeded, while, along the second variable, the minus move succeeded. In other words, each column of \({\mathbf {G}}_k\) is a possible combination of plus and minus moves that can potentially be performed within the for loop in Algorithm 2.
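This encoding can be checked numerically with the 2-D matrices above; the snippet is purely illustrative.

```python
import numpy as np

B = np.eye(2)                                     # identity basis: directions of the variables
G = np.array([[0.5, 0.0, -1.0,  0.0, 0.5,  0.5, -1.0, -1.0, 0.0],
              [0.0, 0.5,  0.0, -1.0, 0.5, -1.0, -1.0,  0.5, 0.0]])
P = B @ G                                         # pattern P_k = B G_k, Eq. (1)
rho = 1.0
print(rho * P[:, 0])   # [ 0.5  0. ]: plus move on variable 1, no move on variable 2
print(rho * P[:, 5])   # [ 0.5 -1. ]: plus move on variable 1, minus move on variable 2
```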

Generalised Pattern Search with Restarting Fitness Landscape Analysis

The proposed GPSRFLA framework is composed of two algorithmic components

  • Fitness Landscape Analysis

  • (Generalised) Pattern Search,

which are periodically restarted. At each restart, the fitness landscape is analysed and the analysis informs the setting of the pattern of PS, which is then run. At each restart, the fitness landscape analysis uses progressively updated data to make progressively more accurate decisions about the pattern, thus enhancing the performance of the algorithm. The proposed framework is therefore designed to progressively learn the optimisation problem and to train Pattern Search accordingly.

At the beginning of the optimisation, one point \({\mathbf {x}}\) is sampled within the decision space D. A total budget \(t_b\) is allocated to the entire optimisation process, including the fitness landscape analysis. Then, a maximum local budget \(l_b\) is allocated to the fitness landscape analysis and PS between two consecutive restarts, which is referred to as a local run. The inputs of the fitness landscape analysis are the domain D, the objective function f, and the current best point \({\mathbf {x}}\). The outputs of the fitness landscape analysis are the basis matrix \({\mathbf {B}}\) and the generating matrix \({\mathbf {G}}_k\). The latter two matrices, which compose the pattern matrix, are then used as inputs with the current best solution \({\mathbf {x}}\) for the Pattern Search local run. The output of the Pattern Search local run is the current best solution \({\mathbf {x}}\), which is then fed back into the fitness landscape analysis component. At each restart, the radius \(\varDelta _k\) is reinitialised to search for the optimum with the new pattern matrix \({\mathbf {P}}_k\) in the following local run. Algorithm 3 describes the external framework of the proposed GPSRFLA.

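As a structural sketch of Algorithm 3, the outer restart loop can be written as below, under the assumption that the two components are supplied as callables that also report how many objective function calls they consumed; the names and signatures are illustrative and do not reproduce the exact pseudocode.

```python
import numpy as np

def gpsrfla(f, lower, upper, analyse, ps_run, t_b, l_b, rng=None):
    """Outer restart loop; `analyse` and `ps_run` stand for the components of Algorithms 4 and 5."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(lower, upper)                    # initial point sampled within D
    spent = 0
    while spent < t_b:
        local_budget = min(l_b, t_b - spent)         # budget of the current local run
        P, Q, used_fla = analyse(f, x)               # basis matrix and generating matrix
        x, used_ps = ps_run(f, x, P, Q, local_budget - used_fla)  # radius re-initialised inside ps_run
        spent += used_fla + used_ps
    return x
```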

The following two subsections describe in detail the functioning of the fitness landscape analysis and PS, respectively.

Fitness Landscape Analysis

The fitness landscape analysis component makes use of a data structure \({\mathbf {V}}\), which can contain up to \(n_v\) candidate solutions. At the first local run, \(n_s\) points (with \(n_s > n_v\)) are sampled within the decision space D. The objective function values of these \(n_s\) points are calculated, and the \(n_v\) points with the best objective function values are saved in the data structure \({\mathbf {V}}\). In the following local runs, the \(n_s\) points are sampled in the neighborhood of the current best solution \({\mathbf {x}}\). If the candidate solution \({\mathbf {x}}\) is

$$\begin{aligned} {\mathbf {x}}=\left( x_1,x_2,\ldots ,x_n\right) , \end{aligned}$$

the neighborhood is the hyper-cube whose side along the \(i^{th}\) variable is the interval \(\left[ x_i-\delta , x_i+\delta \right]\), where \(\delta =k_v\cdot \rho\), with \(k_v\) a parameter to set and \(\rho\) the radius of PS; see the section “An Implementation of Pattern Search”.

The data structure \({\mathbf {V}}\) can be represented as

$$\begin{aligned} {\mathbf {V}}=\left( \begin{array}{cccc} x_{1,1} &{} x_{1,2} &{} \ldots &{} x_{1,n} \\ x_{2,1} &{} x_{2,2} &{} \ldots &{} x_{2,n} \\ \ldots &{} \ldots &{} \ldots &{} \ldots \\ x_{m,1} &{} x_{m,2} &{} \ldots &{} x_{m,n} \\ \end{array} \right) . \end{aligned}$$

Using the points (vectors) in \({\mathbf {V}}\), the mean vector and covariance matrix \({\mathbf {C}}\) are calculated. The mean vector \(\mu\) is calculated as

$$\begin{aligned} \mathbf {\mu }=\left( \mu _1,\mu _2,\ldots ,\mu _n\right) =\frac{1}{m}\left( \sum _{i=1}^m x_{i,1}, \sum _{i=1}^m x_{i,2}, \ldots , \sum _{i=1}^m x_{i,n} \right) ^{\mathbf {T}} \end{aligned}$$

and the generic element \(c_{j,l}\) of the covariance matrix \({\mathbf {C}}\) is:

$$\begin{aligned} c_{j,l}= \frac{1}{m}\sum _{i=1}^m\left( \left( x_{i,j}-\mu _j\right) \left( x_{i,l}-\mu _l\right) \right) . \end{aligned}$$

Then, the eigenvectors

$$\begin{aligned} {\mathbf {P}}=\left( {\mathbf {p}}^1,{\mathbf {p}}^2,\ldots ,{\mathbf {p}}^n\right) \end{aligned}$$

of \({\mathbf {C}}\) are then calculated by means of an eigendecomposition. Since \({\mathbf {C}}\) is symmetric, it is diagonalisable, and an orthogonal basis of its eigenvectors can be found; see [27]. These eigenvectors are used as the basis to explore the space following the PS logic. In other words, the matrix \({\mathbf {P}}\), whose columns are the eigenvectors of \({\mathbf {C}}\), is used as the basis matrix \({\mathbf {B}}\) of GPS; see [29].

The eigenvalues

$$\begin{aligned} \lambda _1,\lambda _2,\ldots \lambda _n \end{aligned}$$

associated with the eigenvectors \({\mathbf {p}}^1,{\mathbf {p}}^2,\ldots ,{\mathbf {p}}^n\), respectively, are used to update the generating matrix \({\mathbf {G}}_k\) of GPS. More specifically, the matrix \({\mathbf {G}}_k\) can be represented as a vector of row vectors; each of them associated with a design variable of the optimisation problem

$$\begin{aligned} {\mathbf {G}}_k = \left( \begin{array}{c} {\mathbf {g}}_{k1} \\ {\mathbf {g}}_{k2} \\ \ldots \\ {\mathbf {g}}_{kn} \\ \end{array}\right) . \end{aligned}$$

The generating matrix \({\mathbf {G}}_k\) is then updated by multiplying each row by the square root of the corresponding eigenvalue

$$\begin{aligned} {\mathbf {Q}}_k = \left( \begin{array}{c} \sqrt{\lambda _1}\cdot {\mathbf {g}}_{k1} \\ \sqrt{\lambda _2} \cdot {\mathbf {g}}_{k2} \\ \ldots \\ \sqrt{\lambda _n}\cdot {\mathbf {g}}_{kn} \\ \end{array}\right) . \end{aligned}$$
(6)

The PS is then run from the current best solution \({\mathbf {x}}\), with the pattern \({\mathbf {P}}_{k}\) calculated as

$$\begin{aligned} {\mathbf {P}}_k= {\mathbf {P}}{\mathbf {Q}}_k. \end{aligned}$$
(7)

Algorithm 4 displays the pseudocode of the Fitness Landscape Analysis.

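A compact sketch of the analysis in Algorithm 4 is given below for the case of sampling in the neighborhood of the current best solution (at the first local run, the points are sampled within D instead). The helper names are illustrative, and the eigenvalues are clipped at zero to guard against round-off.

```python
import numpy as np

def fitness_landscape_analysis(f, x, rho, G, n_s, n_v, k_v, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    delta = k_v * rho
    S = rng.uniform(x - delta, x + delta, size=(n_s, n))     # sample n_s points around x
    values = np.apply_along_axis(f, 1, S)
    V = S[np.argsort(values)[:n_v]]                          # keep the n_v best points (data structure V)
    C = np.cov(V, rowvar=False, bias=True)                   # covariance matrix with the 1/m normalisation
    lam, P = np.linalg.eigh(C)                               # eigenvalues and eigenvectors (columns of P)
    lam = np.clip(lam, 0.0, None)                            # guard against tiny negative round-off
    Q = np.sqrt(lam)[:, None] * G                            # scale the i-th row of G by sqrt(lambda_i), Eq. (6)
    return P, Q                                              # pattern P_k = P @ Q, Eq. (7)
```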

Rationale of Fitness Landscape Analysis

This section explains the rationale behind the choices made above, i.e., what the fitness landscape analysis measures and how it informs the algorithmic design of PS. First, it is important to visualise the information contained in the data structure \({\mathbf {V}}\). Let us consider the following four shifted and rotated objective functions in two dimensions within \(\left[ -100, 100\right] ^2\); see [20]:

$$\begin{aligned} \text {Sphere} \quad f\left( {\mathbf {x}}\right)&= z_1^2+z_2^2 \\ \text {Ellipsoid}\quad f\left( {\mathbf {x}}\right)&= 50z_1^2+200 z_2^2\\ \text {Bent Cigar} \quad f\left( {\mathbf {x}}\right)&= z_1^2+10^6z_2^2\\ \text {Rosenbrock}\quad f\left( {\mathbf {x}}\right)&= 100\left( z_1^2-z_{2}\right) ^2+\left( z_1-1\right) ^2, \end{aligned}$$

where \({\mathbf {z}}= {\mathbf {R}}\left( {\mathbf {x}}-{\mathbf {o}}\right)\). The shift vector is

$$\begin{aligned} {\mathbf {o}}=\left( \begin{array}{c} -21.98 \\ 11.55 \end{array}\right) \end{aligned}$$

and \({\mathbf {R}}\) is a random rotation matrix. Figure 1 displays the plot of the bi-dimensional vectors contained in the data structure \({\mathbf {V}}\) and the directions of the eigenvectors of the covariance matrix associated with the sampled points.

Fig. 1

Sampling of points (blue dots) within \(\left[ -100,100\right] ^2\) for shifted and rotated sphere, ellipsoid, bent cigar, and Rosenbrock functions below the threshold values \(10^3\), \(3\times 10^4\), \(10^6\) and \(5\times 10^3\), respectively. Each sub-figure shows the contour of the function under consideration. The pairs of dashed lines in each sub-figure indicate the directions of the eigenvectors of the Covariance matrix associated with the sampled points

Figure 1 shows that the data structure \({\mathbf {V}}\) contains pieces of information about the geometry of the problem and that the eigenvectors of the covariance matrix identify important directions for the problem. For example, for the bent cigar problem, the eigenvectors identify the longitudinal and transverse axes of the line described by the points.
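The sampling behind Fig. 1 can be reproduced in a few lines. The snippet below uses an illustrative rotation angle (the rotation matrix is random in the paper) and the bent cigar threshold \(10^6\); it simply prints the eigenvalues and eigenvectors of the covariance matrix of the retained points.

```python
import numpy as np

rng = np.random.default_rng(1)
o = np.array([-21.98, 11.55])                          # shift vector from the text
theta = 0.7                                            # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def bent_cigar(x):
    z = R @ (x - o)
    return z[0] ** 2 + 1e6 * z[1] ** 2

S = rng.uniform(-100.0, 100.0, size=(100_000, 2))      # uniform samples in [-100, 100]^2
V = S[np.array([bent_cigar(x) for x in S]) < 1e6]      # keep the points below the threshold 10^6
C = np.cov(V, rowvar=False, bias=True)
lam, P = np.linalg.eigh(C)
print(lam)   # one small and one large eigenvalue: narrow vs. long axis of the point cloud
print(P)     # columns: eigenvector directions (cf. the dashed lines in Fig. 1)
```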

As reported in [31], the rationale behind the choice of the eigenvectors \({\mathbf {p}}^i\) is that the matrix \({\mathbf {P}}\), whose columns are the eigenvectors \({\mathbf {p}}^i\), is the transformation matrix that diagonalises the matrix \({\mathbf {C}}\), that is

$$\begin{aligned} \varvec{\Lambda }=\mathbf {P^{-1}CP}=\mathbf {P^{T}CP}, \end{aligned}$$
(8)

where \(\varvec{\Lambda }\) is a diagonal matrix whose diagonal elements are the eigenvalues of \({\mathbf {C}}\) and \(\mathbf {P^{-1}}=\mathbf {P^T}\) as \({\mathbf {P}}\) is an orthogonal matrix (\(\mathbf {P^T}\) is the transpose of the matrix \({\mathbf {P}}\)). The directions of the eigenvectors can be interpreted as a new reference system characterised by a lack of correlation between pairs of variables. Thus, the new reference system exploits the available information about the geometry of the problem. This concept is broadly used in other contexts, especially in data science, and is closely related to principal component analysis [18].

Furthermore, as reported in [31], the directions of the eigenvectors of the covariance matrix identify the maximum and minimum directional derivatives. Thus, these eigenvectors are an efficient basis for Pattern Search. For example, let us consider the shifted and rotated bent cigar function in two variables \(f\left( {\mathbf {x}}\right) =z_1^2+10^6z_2^2\) with \({\mathbf {z}}= {\mathbf {R}}\left( {\mathbf {x}}-{\mathbf {o}}\right)\). The shift vector is

$$\begin{aligned} {\mathbf {o}}=\left( \begin{array}{c} -21.98 \\ 11.55 \end{array}\right) \end{aligned}$$

and \({\mathbf {R}}\) is the rotation matrix

$$\begin{aligned} {\mathbf {R}}=\left( \begin{array}{cc} -0.45408 &{} -0.89096\\ -0.89096 &{} 0.45408 \\ \end{array} \right) . \end{aligned}$$

Figure 2 shows the estimated directional derivative along the directions of the variables, as in the case of PS (see Algorithm 2), and along the directions of the eigenvectors \({\mathbf {p}}^i\) of the covariance matrix.

Fig. 2

Plot of the directional derivatives from the optimum of the bent cigar in two variables along the directions of the variables \({\mathbf {e}}^i\) (red dashed line) and along the directions of the eigenvectors \({\mathbf {p}}^i\) (blue solid line)

Figure 2 implicitly provides an interpretation of the search along the directions of the eigenvectors \({\mathbf {p}}^i\): the directions of these eigenvectors identify the steepest and flattest directions of the fitness landscape.

To enhance the performance of PS, it is here proposed to use large step sizes along the directions where the fitness landscape is flat and small step sizes along the directions where it is steep. Although we cannot know the values of the directional derivatives in advance, we can estimate them by means of the eigenvalues of the covariance matrix. This is the main motivation for the proposed update of the generating matrix \({\mathbf {G}}_k\).

Let us consider again the covariance matrix \({\mathbf {C}}\) calculated as shown above. Let \({\mathbf {P}}=\left( \mathbf {p^1},\mathbf {p^2},\ldots ,\mathbf {p^n}\right)\) be a matrix whose columns are the eigenvectors of \({\mathbf {C}}\) and let us indicate with

$$\begin{aligned} diag(\varvec{\Lambda })=\left( \lambda _1,\lambda _2,\ldots ,\lambda _n\right) \end{aligned}$$

the corresponding eigenvalues.

It must be observed that, since \({\mathbf {C}}\) is symmetric, the eigenvalues are all real numbers; see [27]. Furthermore, the eigenvectors can be chosen as an orthonormal basis (every pair of vectors is orthogonal and each vector has modulus equal to 1) of a vector space, which we refer to as eigenspace. These eigenvectors span the domain D.

Thus, if we consider a vector \({\mathbf {x}}\in D\) expressed in the basis \(B_{{\mathbf {e}}}=\lbrace {\mathbf {e}}^1, {\mathbf {e}}^2, \ldots , {\mathbf {e}}^n\rbrace\), we may express it through the corresponding vector \({\mathbf {y}}\) in the reference system/basis of the eigenvectors

$$\begin{aligned} {\mathbf {y}} = \mathbf {P^Tx}. \end{aligned}$$

Since the mean vector \(\mu\) calculated from the vectors in \({\mathbf {V}}\) is also an element of D, we can express it via eigenvectors

$$\begin{aligned} \mu _\mathbf {y} = \mathbf {P^T}\mu . \end{aligned}$$

Let us now introduce the covariance matrix \(\mathbf {C_y}\) of the data in \({\mathbf {V}}\) in the reference system identified by the eigenvectors. Omitting the common normalisation factor \(\frac{1}{m}\), this is expressed by

$$\begin{aligned} \mathbf {C_y} =(\mathbf {P^TX_c})\mathbf {(P^TX_c)^T} = \mathbf {P^TX_cX_c^TP} = \mathbf {P^TCP}, \end{aligned}$$
(9)

where

$$\begin{aligned} \mathbf {X_c}= (\mathbf {x_1}-{{\mu }}, \mathbf {x_2}-{{\mu }},...,\mathbf {x_m}-{{\mu }}). \end{aligned}$$

From Eqs. (8) and (9), it follows that:

$$\begin{aligned} \mathbf {C_y} = \varvec{\Lambda }. \end{aligned}$$
(10)

Thus, the diagonal elements of \(\mathbf {C_y}\) are the eigenvalues of \({\mathbf {C}}\), while the extradiagonal elements are zero. Since the diagonal elements of a covariance matrix are the variances \(\sigma _i^2\) of the data along the direction \(\mathbf {p^i}\), it follows that:

$$\begin{aligned} \sigma _i^2=\lambda _i. \end{aligned}$$
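This chain of equalities can be verified numerically. The following snippet, with arbitrary illustrative data, checks that \(\mathbf {P^TCP}\) is diagonal and that the variances of the data along the eigenvector directions equal the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.2, 0.5]])   # correlated sample data
C = np.cov(X, rowvar=False, bias=True)
lam, P = np.linalg.eigh(C)

print(np.allclose(P.T @ C @ P, np.diag(lam)))   # Eq. (8): P^T C P is diagonal with the eigenvalues
Y = (X - X.mean(axis=0)) @ P                    # data expressed in the eigenvector basis
print(np.allclose(Y.var(axis=0), lam))          # Eq. (10): variances along p^i equal lambda_i
```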

The selection of the best \(n_v\) points out of the \(n_s\) available samples can be conceptually considered as the selection of the points whose objective function value is below a certain threshold thre, where thre is the objective function value of the point in the data structure \({\mathbf {V}}\) with the worst (highest) objective function value. Thus, we may say that the fitness landscape analysis selects the points such that \(f({\mathbf {x}}) \le thre\). In a basin of attraction, these samples would be distributed around a local optimum. Let us suppose, for simplicity of notation, that the optimum is the null vector \({\mathbf {o}}\) (i.e., let us apply a translation). The directional derivative along a direction \(\mathbf {p^i}\) is

$$\begin{aligned} \frac{\partial f\left( {\mathbf {x}}\right) }{\partial \mathbf {p^i}} \approx \frac{f(\mathbf {x^i})-f({\mathbf {o}})}{|\mathbf {x^i}-{\mathbf {o}}|} = \frac{ f(\mathbf {x^i})-f({\mathbf {o}})}{|\mathbf {x^i}|}. \end{aligned}$$

Let us observe that \(f({\mathbf {o}})\) is a constant and that \(\mathbf {x^i} = l\cdot \mathbf {p^i}\), with l the modulus of \(\mathbf {x^{i}}\) and \(|\mathbf {p^{i}}|=1\), since \(\mathbf {p^{i}}\) is a versor (i.e., a vector with modulus 1). If we set \(f(\mathbf {x^i}) = thre\), then \(f(\mathbf {x^i})-f({\mathbf {o}})=thre^{*}\) is also a constant. Thus

$$\begin{aligned} \frac{\partial f\left( {\mathbf {x}}\right) }{\partial \mathbf {p^i}} \approx \frac{thre^{*}}{ |l|} \end{aligned}$$
(11)

that is, the directional derivative along the eigenvector \(\mathbf {p^{i}}\) is inversely proportional to the modulus l.

Along the direction of \({\mathbf {p}}^i\), there exists a relationship between the modulus l and the corresponding eigenvalue \(\lambda _i\). Let us consider the two points \(\mathbf {x^i}\) and \(\mathbf {-x^i}\) belonging to the direction of \(\mathbf {p^i}\), and let us assume that the objective function value of both points is thre. The distance between \(\mathbf {x^i}\) and \(\mathbf {-x^i}\) estimates the width of the distribution along the direction of \({\mathbf {p}}^i\), and the standard deviation estimates the average modulus of the points in \({\mathbf {V}}\). Considering that the standard deviation calculated along the direction of \({\mathbf {p}}^i\) is the square root of \(\lambda _i\), it follows that:

$$\begin{aligned} \sqrt{\lambda _i}=\sigma _i=\sqrt{\frac{1}{2}\left( |\mathbf {x^i}-{\mathbf {o}}|^2+|\mathbf {-x^i}-{\mathbf {o}}|^2\right) } = l. \end{aligned}$$
(12)

By combining Eq. (11) and (12), we may conclude that the directional derivative in the direction of an eigenvector \({\mathbf {p}}^i\) of the covariance matrix \({\mathbf {C}}\), as calculated above, is inversely proportional to the square root of the corresponding eigenvalue

$$\begin{aligned} \frac{\partial f\left( {\mathbf {x}}\right) }{\partial \mathbf {p^i}} \approx \frac{thre^{*}}{ \sqrt{\lambda _i}}. \end{aligned}$$
(13)

Figure 3 reports an example that is useful in visualising the meaning of Eq. (12). The points belonging to the data structure \({\mathbf {V}}\), associated with the samples for the shifted and rotated ellipsoid in two dimensions, are reported as blue dots. The dashed lines indicate the directions of the eigenvectors of the covariance matrix. The two associated eigenvalues are \(\lambda _1=1.5221\) and \(\lambda _2=4324.1\), respectively. These numbers reflect the distribution of the points, which appears as a thin and long line. We may observe that \(\lambda _2 \gg \lambda _1\) and that the points in \({\mathbf {V}}\) span a much wider range along the direction of \({\mathbf {p}}^2\) (in black) than along the direction of \({\mathbf {p}}^1\) (in red). Therefore, the eigenvalues estimate the extent of the distribution of the points along the directions of the corresponding eigenvectors.

Fig. 3

Distribution of points in \({\mathbf {V}}\), directions of the eigenvectors and meaning of eigenvalues (\(\lambda _1=1.5221\) and \(\lambda _2=4324.1\)) in the domain for the rotated and shifted ellipsoid in two dimensions

Furthermore, as shown by the contour, the fitness landscape is very steep along the direction of the first eigenvector, while it is nearly flat along the direction of the second eigenvector. This observation intuitively explains the meaning of Eq. (13).

Since the square roots of the eigenvalues are inversely proportional to the directional derivatives, it is proposed to use them as direct multipliers to set the step sizes along each search direction of the basis of eigenvectors. With reference to Fig. 3, along the direction of \({\mathbf {p}}^1\), the landscape is steep and the corresponding eigenvalue \(\lambda _1\) is small. Therefore, we use \(\sqrt{\lambda _1}\) as a multiplier to ensure that small steps are performed. Conversely, along the direction of \({\mathbf {p}}^2\), the landscape is flat and the corresponding eigenvalue \(\lambda _2\) is large. Thus, we use \(\sqrt{\lambda _2}\) as a multiplier to enable large steps along that direction. This logic explains the proposed construction of \({\mathbf {Q}}_k\) in Eq. (6).

Pattern Search Designed on the Basis of Fitness Landscape Analysis

This article proposes a restarting algorithm based on the PS logic presented in Algorithm 2. At each local run, the fitness landscape analysis returns the pattern \({\mathbf {P}}_k={\mathbf {P}}{\mathbf {Q}}_k\). This means that the minus move along the \(i^{th}\) direction is

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k-\rho \cdot \sqrt{\lambda _i}\cdot {\mathbf {p}}^i \end{aligned}$$
(14)

and the plus move is

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k+\frac{\rho }{2}\cdot \sqrt{\lambda _i}\cdot {\mathbf {p}}^i. \end{aligned}$$
(15)

We may express the same concept in terms of GPS notation using the two-dimensional example of the section “An Implementation of Pattern Search”. The proposed PS in two dimensions (\(n=2\)) is characterised by the basis matrix

$$\begin{aligned} {\mathbf {P}}=\left( \begin{array}{cc} {\mathbf {p}}^1&{\mathbf {p}}^2 \end{array}\right) \end{aligned}$$

and the generating matrix \({\mathbf {Q}}_k\)

$$\begin{aligned} {\mathbf {Q}}_k=\left( \begin{array}{ccccccccc} \frac{\sqrt{\lambda _1}}{2} &{} 0 &{} -\sqrt{\lambda _1} &{} 0 &{} \frac{\sqrt{\lambda _1}}{2} &{} \frac{\sqrt{\lambda _1}}{2} &{}-\sqrt{\lambda _1} &{}-\sqrt{\lambda _1} &{} 0 \\ 0 &{} \frac{\sqrt{\lambda _2}}{2} &{} 0 &{} -\sqrt{\lambda _2} &{} \frac{\sqrt{\lambda _2}}{2} &{} -\sqrt{\lambda _2} &{}-\sqrt{\lambda _2} &{} \frac{\sqrt{\lambda _2}}{2} &{} 0 \end{array} \right) \end{aligned}$$

and \(\varDelta _k=\rho\). The trial step would be

$$\begin{aligned} {\mathbf {s}}_k=\varDelta _k \mathbf {Pq}_k^i, \end{aligned}$$
(16)

where \(\varDelta _k=\rho\) and \({\mathbf {q}}_k^i\) is the \(i^{th}\) column of the matrix \({\mathbf {Q}}_k\). We may easily verify that, by combining the moves in Eqs. (14) and (15), all the potential \(\varDelta _k \mathbf {Pq}_k^i\) can be generated.
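A sketch of one scan with the scaled moves of Eqs. (14) and (15) is reported below, assuming that the columns of P are the eigenvectors and lam the corresponding eigenvalues; it is illustrative and omits budget bookkeeping.

```python
import numpy as np

def scaled_scan(f, x, fx, P, lam, rho):
    """One scan along the eigenvector directions with eigenvalue-scaled steps."""
    for i in range(x.size):
        step = rho * np.sqrt(lam[i]) * P[:, i]
        xt = x - step                          # minus move, Eq. (14)
        fxt = f(xt)
        if fxt < fx:
            x, fx = xt, fxt
            continue                           # minus move succeeded: skip the plus move
        xt = x + 0.5 * step                    # plus move, Eq. (15)
        fxt = f(xt)
        if fxt < fx:
            x, fx = xt, fxt
    return x, fx
```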

The main parameters of GPSRFLA are reported in the following.

  • \(f\): objective function

  • \({\mathbf {x}}\): candidate solution

  • \(\mathbf {x^t}\): trial solution

  • \(n_s\): number of samples for the analysis

  • \(k_v\): size of the space where the points are sampled

  • \({\mathbf {V}}\): data set for the analysis (\(n_v\) is its number of rows)

  • \({\mathbf {C}}\): covariance matrix of the points in \({\mathbf {V}}\)

  • \({\mathbf {P}}\): eigenvector (basis) matrix of \({\mathbf {C}}\)

  • \(\lambda _i\): \(i^{th}\) eigenvalue of \({\mathbf {C}}\)

  • \({\mathbf {Q}}_k\): generating matrix

  • \({\mathbf {P}}_k\): pattern matrix

  • \(\rho\): radius

Algorithm 5 shows the pseudocode of the proposed GPS designed on the basis of the fitness landscape analysis.


A Computationally Efficient Instance of GPSRFLA

A characterising feature of algorithms based on fitness landscape analysis is that the analysis itself requires objective function calls, which reduces the budget available to the optimiser. This section presents a PS implementation that, although it fits within the GPSRFLA framework in Algorithm 3, fills the data structure \({\mathbf {V}}\) with the points visited during the search.

More specifically, the data structure \({\mathbf {V}}\) is an output of the GPS and an input for the fitness landscape analysis. In the first local run, the basis matrix \({\mathbf {B}}\) is initialised to the identity matrix \({\mathbf {I}}\), and GPS moves along the directions of the variables as shown in Algorithm 2. During each local run, GPS saves in the data structure \({\mathbf {V}}\) all the successful trial points, i.e., all the points that have been current best points during the local run. The filled data structure \({\mathbf {V}}\) is then passed to the fitness landscape analysis component, which calculates the covariance matrix \({\mathbf {C}}\) and its eigenvector matrix \({\mathbf {P}}=\left( {\mathbf {p}}^1,{\mathbf {p}}^2,\ldots ,{\mathbf {p}}^n\right)\). The matrix \({\mathbf {P}}\) is then used as the basis matrix \({\mathbf {B}}\) for the following local run. It must be observed that, at each restart, the data structure \({\mathbf {V}}\) contains the points below a threshold identified by \(f\left( {\mathbf {x}}_k\right)\), where \({\mathbf {x}}_k\) is the starting point of that local run.

Algorithm 6 illustrates the resulting algorithm, ACPS, which was presented for the first time in [28].

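A hedged sketch of the ACPS logic of Algorithm 6 follows: the points that become current best during a local run are reused as the data structure V, so the analysis consumes no extra objective function calls. The parameter defaults and the guard on the size of V are illustrative choices, not necessarily those of [28].

```python
import numpy as np

def acps(f, x0, t_b, l_b, rho0=20.0, tol=1e-15):
    x = np.asarray(x0, dtype=float)
    fx, calls, n = f(x), 1, x.size
    B = np.eye(n)                                   # first local run: directions of the variables
    while calls < t_b:
        rho, local, V = rho0, 0, [x.copy()]         # radius re-initialised at each restart
        while calls < t_b and local < l_b and rho > tol:
            improved = False
            for i in range(n):
                xt = x - rho * B[:, i]              # minus move along the i-th basis vector
                fxt = f(xt); calls += 1; local += 1
                if fxt < fx:
                    x, fx, improved = xt, fxt, True
                    V.append(x.copy())              # store every new current best in V
                    continue
                xt = x + 0.5 * rho * B[:, i]        # plus move
                fxt = f(xt); calls += 1; local += 1
                if fxt < fx:
                    x, fx, improved = xt, fxt, True
                    V.append(x.copy())
            if not improved:
                rho *= 0.5                          # full scan failed: halve the radius
        if len(V) > n:                              # enough visited points to estimate C
            C = np.cov(np.array(V), rowvar=False, bias=True)
            _, B = np.linalg.eigh(C)                # eigenvectors become the basis of the next run
    return x, fx
```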

The main advantage of the ACPS implementation in Algorithm 6 is that no extra objective function calls are required to perform the fitness landscape analysis. As shown above, ACPS uses the points visited during the search to perform the fitness landscape analysis. This means that, unlike the general GPSRFLA framework, ACPS may use the entire computational budget to optimise the objective function. ACPS uses some of these objective function calls to progressively analyse and learn the fitness landscape, while it refines the adaptation of the pattern matrix \({\mathbf {P}}_k\) to the optimisation problem.

To illustrate the functioning of the proposed ACPS, Fig. 4 shows the trajectory of the algorithm in four consecutive local runs. With the term trajectory, we mean the current best solutions visited by ACPS. Figure 4 refers to the shifted and rotated ellipsoid in two dimensions

$$\begin{aligned}&{\mathbf {z}}= {\mathbf {R}}\left( {\mathbf {x}}-{\mathbf {o}}\right) \\&f\left( {\mathbf {x}}\right) = \sum _{i=1}^2 \left( 10^6\right) ^{i-1} z_i^2, \end{aligned}$$

where the shift vector is

$$\begin{aligned} {\mathbf {o}}=\left( \begin{array}{c} -21.98 \\ 11.55 \end{array}\right) \end{aligned}$$

and the rotation matrix is

$$\begin{aligned} {\mathbf {R}}=\left( \begin{array}{cc} -0.6358 &{} -0.7718\\ -0.7718 &{} 0.6358 \end{array}\right) . \end{aligned}$$

A random point \({\mathbf {x}}\) has been sampled within the domain. The objective function value of this starting point is \(7.4385\times 10^{9}\).

Fig. 4

Trajectory of ACPS in four consecutive local runs on the rotated and shifted ellipsoid. The red dots are current best solutions and the trajectory of the ACPS is shown as a blue solid line. The black and red dashed lines indicate the eigenvectors. The best objective function values are at the top of each figure

Figure 4 shows that, in the first local run, the algorithm moves along the directions of the variables (black and red dashed lines) and approaches the optimum, but still remains far from it. After the restart, ACPS uses the new search directions, i.e., the eigenvectors of the covariance matrix of the distribution of samples collected during the first local run. This system of reference appears to be ineffective: during the second local run, only a marginal improvement is achieved. However, the budget spent in the second local run is not wasted; the points sampled during the second local run enable the detection of an effective reference system (the eigenvectors used in the third local run). During the third local run, ACPS exploits the benefits of the fitness landscape analysis and quickly detects a solution close to the optimum. The result is then refined in the fourth local run, where the eigenvectors are slightly corrected.

It must be observed that the proposed ACPS resembles the Rosenbrock method [37], as both use a basis of vectors that is progressively adapted during the run (the Rosenbrock method belongs to the GPS family). However, the two algorithms are radically different in the way the basis is selected and updated. More specifically, while ACPS makes use of the eigenvectors of the covariance matrix of a set of samples, the Rosenbrock algorithm stores the successful moves and determines a new orthonormal basis guided by previous successful moves.

It must be remarked that, although ACPS can be considered an instance of the GPSRFLA framework, it does not make use of the eigenvalues to update the generating matrix \({\mathbf {G}}_k\). This decision has been made considering the preliminary results that we obtained. The data structure \({\mathbf {V}}\) is likely to contain a small number of points. On one hand, these points are enough to correctly estimate, through the eigenvectors of the associated covariance matrix, the directions with maximum and minimum directional derivatives (over multiple local runs, as illustrated in Fig. 4). On the other hand, these points are usually not enough to correctly estimate the values of the directional derivatives through the calculation of the eigenvalues. Since these wrong estimations tend to jeopardise the performance of the algorithm, it was decided to exclude the update of the generating matrix \({\mathbf {G}}_k\) from ACPS.

Numerical Results

To test and compare the performance of the GPSRFLA framework and ACPS, a set of functions from the IEEE CEC2013 benchmark [20] was selected and adapted. Since PS is a local search, we selected all the unimodal problems, hence reproducing the CPS testbed used in [30]. We also reproduced both versions of the ellipsoid presented in [30] (\(f_2\) and \(f_3\)); the condition number of these two ellipsoids worsens with the dimensionality at different rates. In this paper, alongside the bent cigar and discus functions, we included their modified versions.

Finally, to show that GPSRFLA is capable, to some extent, of handling multimodal fitness landscapes, we included two simple multimodal functions from [20]. The list of the functions used in this study is displayed in Table 1. As shown in Table 1, each problem has been shifted and rotated; the vector \({\mathbf {x}}\) is transformed into \({\mathbf {z}}\). The shift vector \({\mathbf {o}}\) of [20] has been used, while the rotation matrices \({\mathbf {R}}\) have been randomly generated: one rotation matrix has been generated for each problem and dimensionality value.

Table 1 Objective functions used in this study

The results are divided into the following categories:

  • Comparison among PS algorithms.

  • Comparison against other algorithms.

The problems in Table 1 have been considered in 10, 30, and 50 dimensions. For each problem in Table 1, each dimensionality level, and each algorithm in this study, 51 independent runs were performed. For each run, the algorithms under consideration, if single-solution, have been run from the same initial solution. All the algorithms in this paper have been executed with a budget of \(10000 \cdot n\) function calls, where n is the problem dimensionality. The results for each algorithm and problem are expressed as the mean value ± standard deviation over the 51 independent runs performed. Furthermore, to statistically investigate whether the application of the proposed method results in a performance gain, the Wilcoxon rank-sum test was applied; see [12]. In the tables in this section, a “+” indicates that the proposed algorithm (GPSRFLA/ACPS) significantly outperformed its competitor, a “−” indicates that the competitor significantly outperformed the proposed algorithm, and a “=” indicates that there is no significant difference in performance.

Comparison among Pattern Search Algorithms

This section highlights the benefits of fitness landscape analysis on PS algorithms. To achieve this aim, the following algorithms have been tested on the problems in Table 1:

  • the original PS according to the implementation in [41], as shown in Algorithm 2

  • CPS presented in [30, 31]

  • GPSRFLA according to the framework in Algorithm 3, the analysis component in Algorithm 4 and the GPS implementation in Algorithm 5

  • ACPS [28] as reported in Algorithm 6

All PS variants in this article have been run with the initial radius \(\rho = 0.1 \cdot\) domain width \(= 20\). This parameter has been set following the indication in [41] and then tuned on our testbed.

As reported in [30], the budget of CPS has been split in two parts: \(5000 \cdot n\) function calls have been used to build the covariance matrix \({\mathbf {C}}\), while \(5000 \cdot n\) function calls have been spent to execute the algorithm along the directions of its eigenvectors.

The thresholds thre for the problems in Table 1 are reported in Table 2. The threshold values were set empirically by testing values of the codomain that allowed some points to be stored in the data structure \({\mathbf {V}}\), while some others were discarded; see [31].

Table 2 Thresholds thre for CPS in 10, 30, and 50 dimensions as reported in [31]

GPSRFLA uses a local budget of \(l_b=1000 \cdot n\) objective function calls, \(n_s=200 \cdot n\) samples (and thus GPS is run with a budget of \(800 \cdot n\) objective function calls), \(n_v=5 \cdot n\) slots, and \(k_v=100\). GPS is also stopped if \(\rho \le 10^{-15}\).

ACPS is run with a maximum local budget \(l_b=1000 \cdot n\) and is stopped if \(\rho \le 10^{-15}\).

Table 3 shows the numerical results of the four PS variants. The best results for each problem are highlighted in bold.

Table 3 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1: PS according to the implementation in [41], CPS proposed in [31], and the implementation of GPSRFLA, as shown in Algorithms 3, 4, and 5, ACPS, as shown in Algorithm 6. GPSRFLA and ACPS, respectively, are taken as references for Wilcoxon for the comparison against PS and CPS. ACPS is taken as reference for Wilcoxon for comparison against GPSRFLA

The results in Table 3 show that both GPSRFLA and ACPS significantly outperform PS in the vast majority of cases. Only for the sphere function \(f_1\) does PS display an excellent performance. This is due to the fact that, for this function, the identity matrix is already the ideal choice of basis matrix. For all other problems, PS performs either slightly worse or much worse than GPSRFLA and ACPS. The comparisons of GPSRFLA and ACPS against CPS also show that the proposed algorithms outperform their predecessor CPS for almost all the problems considered. This suggests that the restarting fitness landscape analysis logic is beneficial to the performance of the algorithm. Finally, the comparison of ACPS to GPSRFLA shows that, in most cases, the computationally efficient logic embedded in ACPS yields significantly better results than GPSRFLA, which uses part of the budget solely to analyse the problem. However, the exploitation of the information regarding the directional derivatives is potentially very powerful, as shown in the case of the rotated discus function \(f_6\).

Three examples of performance trends for the variants of PS included in this study are illustrated in Figs. 5, 6, and 7.

Fig. 5

Performance trend (logarithmic scale) of the PS variants in this study for the ellipsoid \(f_3\) in 10 dimensions

Fig. 6

Performance trend (logarithmic scale) of the PS variants in this study for the discus \(f_6\) in 30 dimensions

Fig. 7

Performance trend (logarithmic scale) of the PS variants in this study for Rosenbrock \(f_{10}\) in 50 dimensions

Comparison Against Other Algorithms

We have compared GPSRFLA and ACPS against the following two algorithms:

  • Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [11] with an estimation of the gradient that may make it applicable to black-box problems;

  • Covariance Matrix Adaptation Evolution Strategy (CMAES) [14].

The motivations behind these two competitors are as follows: (1) BFGS is a quasi-Newton algorithm (i.e., one based on a solid theoretical foundation) that estimates the gradient and is here used as a benchmark; and (2) CMAES is a popular algorithm that, like GPSRFLA and ACPS, is based on theoretical considerations about multi-variate distributions and the covariance matrix. Table 4 lists the results of this comparison (the best results are highlighted in bold).

Table 4 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1: BFGS algorithm [11], CMAES [14], the implementation of GPSRFLA as shown in Algorithms 3, 4, and 5, and ACPS, as shown in Algorithm 6. GPSRFLA and ACPS are taken as references for Wilcoxon in the comparisons against BFGS and CMAES; ACPS is taken as the reference for Wilcoxon in the comparison against GPSRFLA

The numerical results in Table 4 show that, on average, CMAES, GPSRFLA, and ACPS are better suited than BFGS to address these black-box problems (for which no information on the gradient is available). However, BFGS is an excellent algorithm for several problems, especially the multi-variate ellipsoid function. The results show that CMAES and GPSRFLA have almost comparable performances, with CMAES performing better than GPSRFLA on average for seven problems and worse for four problems. ACPS is more competitive with CMAES, as ACPS outperforms CMAES in 16 cases out of 33, while it is outperformed in 15. The algorithms have the same performance for the remaining two problems. Overall, we may conclude that ACPS and CMAES have comparable performance for the problems under investigation.

Some further considerations can be made regarding the scalability of the algorithms. In the low-dimensional case (\(n=10\)), both the algorithms detect solutions very close to the optimum for the nine unimodal problems (\(f_{1}-f_9\)) and detect the global optimum in several runs. They also detect a local minimum for the two multimodal problems (\(f_{10}-f_{11}\)). In higher dimensions, we observe that the performances of both CMAES and ACPS deteriorate for some problems and remain excellent for others. For example, CMAES performs extremely well on \(f_2-f_4\) regardless of the number of variables, while ACPS deteriorates as the number of dimensions increases. Conversely, ACPS handles the \(f_5-f_7\) problems better than CMAES.

With reference to the results in Table 4, Figs. 8, 9, 10, and 11 show some examples of performance trends of GPSRFLA and ACPS against BFGS and CMAES. These plots confirm the findings reported in Table 4: at higher dimensions, CMAES and ACPS both appear inadequate for some problems but very well suited for others. Figure 10 shows an example for which GPSRFLA and ACPS perform poorly compared to CMAES. Conversely, Fig. 11 shows an example for which both GPSRFLA and ACPS display an excellent performance, while CMAES appears to be inadequate.

Fig. 8

Performance trends (logarithmic scale) of BFGS, CMAES, GPSRFLA, and ACPS for the modified bent cigar \(f_5\) in 10 dimensions

Fig. 9

Performance trends (logarithmic scale) of BFGS, CMAES, GPSRFLA, and ACPS for the modified discus \(f_7\) in 30 dimensions

Fig. 10

Performance trends (logarithmic scale) of BFGS, CMAES, GPSRFLA, and ACPS for ellipsoid \(f_{2}\) in 50 dimensions

Fig. 11

Performance trends (logarithmic scale) of BFGS, CMAES, GPSRFLA, and ACPS for modified bent cigar \(f_{5}\) in 50 dimensions

Statistical Ranking via the Holm–Bonferroni Procedure

To further strengthen the statistical analysis of the presented results, we performed the Holm–Bonferroni procedure [15] for the six algorithms and 33 problems (11 objective functions \(\times\) 3 levels of dimensionality) under investigation. The results of the Holm–Bonferroni procedure are presented in Table 5. A score \(R_j\) for \(j = 1,\dots ,N_A\) (where \(N_A\) is the number of algorithms under analysis, \(N_A = 6\) in this paper) has been assigned. The score has been assigned in the following way: for each problem, a score of 6 is assigned to the algorithm displaying the best performance, 5 is assigned to the second best, 4 to the third, and so on. For each algorithm, the scores obtained on the single problems are summed up and averaged over the 33 test problems. With the calculated \(R_j\) values, ACPS has been taken as the reference algorithm: \(R_0\) indicates the rank of ACPS, while \(R_j\) for \(j = 1,\dots ,N_A-1\) indicates the rank of each of the remaining five algorithms. Let j be the index of one of these algorithms. The values \(z_j\) have been calculated as

$$\begin{aligned} z_j = \frac{R_j - R_0}{\sqrt{\frac{N_A(N_A+1)}{6N_{TP}}}}, \end{aligned}$$

where \(N_{TP}\) is the number of test problems (33 in this study). By means of the \(z_j\) values, the corresponding cumulative normal distribution values \(p_j\) have been calculated; see [10]

$$\begin{aligned} p_j=\frac{2}{\sqrt{\pi }}\int _{\frac{-z_j}{\sqrt{2}}}^\infty e^{-t^2}dt. \end{aligned}$$

These \(p_j\) values have then been compared with the corresponding \(\delta /j\), where \(\delta\) is the level of confidence, set to 0.05 in this case. Table 5 displays the ranks, \(z_j\) values, \(p_j\) values, and the corresponding \(\delta /j\) obtained. Moreover, it is indicated whether the null hypothesis (which states that the two algorithms have indistinguishable performance) is “Rejected”, i.e., the algorithms have statistically different performance, or “Failed to Reject”, meaning that the test could not detect a difference in performance (neither algorithm outperforms the other).
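A sketch of this ranking procedure is given below. It assumes that the per-problem scores are arranged with the reference algorithm (ACPS) in the first column and applies the standard sequential Holm thresholds corresponding to the text's \(\delta /j\); the function name and layout are illustrative.

```python
import numpy as np
from math import erfc, sqrt

def holm_bonferroni(scores, delta=0.05):
    """scores: (N_TP x N_A) array of per-problem scores, reference algorithm in column 0."""
    n_tp, n_a = scores.shape
    R = scores.mean(axis=0)                              # average score R_j of each algorithm
    z = (R[1:] - R[0]) / sqrt(n_a * (n_a + 1) / (6 * n_tp))
    p = np.array([erfc(-zj / sqrt(2)) for zj in z])      # p_j from the cumulative normal, as in the text
    order = np.argsort(p)                                # most significant comparison first
    results, rejecting = [], True
    for i, j in enumerate(order):
        threshold = delta / (len(p) - i)                 # Holm's sequence: delta/5, delta/4, ..., delta/1
        rejecting = rejecting and p[j] < threshold       # Holm is sequential: stop rejecting at first failure
        results.append((j + 1, float(p[j]), threshold,
                        "Rejected" if rejecting else "Failed to Reject"))
    return results
```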

Table 5 Holm–Bonferroni Procedure with ACPS as a reference (Rank 5.1515e+00)

The results of the Holm–Bonferroni procedure in Table 5 show that ACPS achieved the highest rank. ACPS and CMAES have comparable performances, while ACPS has a better performance than the other four algorithms in this study.

Conclusion

GPS is a family of single-solution deterministic algorithms that search for the optimum by exploring a set of moves encoded in the pattern matrix. The choice of the pattern matrix is still an open issue. This article proposes a restarting scheme where, at each restart, the pattern matrix is updated following a fitness landscape analysis.

Two algorithmic implementations encompassing the two novel contributions of this study with respect to the literature are proposed. The first is the development of a criterion to estimate the directional derivatives (steep and flat directions) through the eigenvalues of the covariance matrix associated with a sample of points, as well as a mechanism to exploit this information within the search. The second is an algorithmic mechanism that uses the objective function calls performed during the search for the purpose of the fitness landscape analysis. The latter contribution enhances the computational efficiency and consequently the performance of the algorithm.

The two proposed implementations outperform their predecessors and are competitive with a quasi-Newton method and a popular high-performing metaheuristic. The second implementation proved to have remarkably good performance. While the two novel contributions of this paper appear to be effective separately, more work is required to combine them effectively so as to superpose their respective benefits.