Introduction

Modern numerical optimisation problems are often complex and do not allow the application of gradient-based algorithms, see [4]. Some of the main reasons why a gradient-based approach may be infeasible or impractical are the following (see [10]):

  • the first derivatives of the objective function are not available, e.g. when the function is the result of a simulation or an experiment;

  • the numerical approximation of the gradient is computationally expensive and leads to an unacceptable overhead;

  • the objective function is noisy, e.g. coming from measurements or uncertain data, and the gradient is therefore unreliable.

In these cases, a derivative-free method may be the only feasible option [3, 10]. Derivative-free methods are algorithms that search for a null-gradient point, i.e. the closest local optimum, without calculating the gradient of the objective function.

Derivative-free methods can be divided into two macro-categories:

  • model-based methods construct a surrogate model of the objective function, e.g. by sampling points and interpolating them, and then calculate the gradient of the surrogate model to search for the optimum (some notable modern examples are reported in [8, 9]);

  • direct search methods use only the objective function values to explore the domain and search for the optimum.

Direct search methods may employ various strategies to sample and test new points [7]. Amongst the many strategies, we can consider the following to be some of the most prevalent in the literature:

  • linesearch methods iteratively identify a one-dimensional search direction within the n-dimensional space, and then optimise the objective function along this direction, see [2];

  • pattern search methods use multiple directions to generate a set of potential moves (a pattern) to cover the entire domain, see [23, 27];

  • simplex methods define a geometric figure (a simplex, i.e. the generalisation of a triangle to n dimensions) and make use of its vertices to set the exploratory rules, see [37];

  • population-based metaheuristics define a large category of methods where multiple candidate solutions are stored and recombined to sample new points. Some methods refer to an evolutionary metaphor, e.g. the classical examples of Genetic Algorithms and Evolution Strategies [13], while some others refer to the motion of animals/particles, e.g. the Particle Swarm Optimisation [28] and Differential Evolution [43] (the latter is not associated with a swarm metaphor but is often considered a swarm intelligence algorithm in terms of its operation, see [50]).

A scheme summarising this partial taxonomy is reported in Fig. 1.

Fig. 1 Overview of gradient-free optimisation methods

Pattern search methods have been in use for the past seventy years, and there have been various implementations, spanning from early versions preceding electronic computers to modern hybrid versions embedding global search mechanisms [45, 46], as well as studies on dynamic optimisation [36]. This article focuses on pattern search and proposes a novel method belonging to this family. The design of the proposed method is based on a fitness landscape analysis approach: in order to obtain a problem-specific solver, the optimisation problem is analysed by an Artificial Intelligence tool and the results of the analysis are used to design the algorithm, see [7, 25, 32, 33]. In this specific case, the proposed pattern search aims to detect a preferential pattern that suits the specific problem. This article extends our previous work presented in [42]. More specifically, while in [42] we introduced a specific implementation of a pattern search method that uses a fitness landscape analysis, here we extensively present the fitness landscape analysis method and its applicability to the entire pattern search family. Furthermore, we investigate for the first time the theoretical underpinnings of the proposed class of methods and provide a theoretical justification for the method.

The remainder of this article is organised in the following way. "Basic Notation and Pattern Search" section provides the background of the pattern search method, describes its evolution over the years, and highlights the open problem of selecting the pattern. "The Proposed Covariance Pattern Search" section describes the proposed pattern search variant from an implementation perspective. "Theoretical Justification of Covariance Pattern Search" section analyses the theoretical foundations of the proposed pattern search and justifies the conjecture on which the method is based. "Numerical Results" section experimentally validates the proposed method by comparing it against the pattern search method (without the fitness landscape analysis approach) and against modern algorithms. "Limitations of Covariance Pattern Search and Future Developments" section highlights the limitations and points out the opportunities for future work following this preliminary study. Finally, "Conclusion" section provides the conclusions to this study.

Basic Notation and Pattern Search

To clarify the notation employed, we will refer to the minimisation problem of an objective function \(f\left( {\mathbf {x}}\right)\) in the continuous domain. The candidate solution \({\mathbf {x}}\) is a vector of n design variables in a hyper-cubical decision space \(D\subset {\mathbb {R}}^n\):

$$\begin{aligned}{\mathbf {x}}=\left( x_1,x_2,\ldots ,x_n\right)\end{aligned}.$$

In its original implementation, Pattern Search (PS) maintains a single solution and, while moving along the axes, improves upon its performance (objective function value). This procedure was part of the experimental work carried out by Enrico Fermi and Nicholas Metropolis, see [11, 29]. They were identifying the parameters of a physical model to minimise the error between the model and the experimental data. For this purpose, they varied one parameter of the model at a time by using a certain step size, and when no improvement in this error function was detected the step size was halved, until it became smaller than a predefined threshold, see [1].

More formally, a starting solution \({\mathbf {x}}\) is provided as input. Subsequently, the current best solution \({\mathbf {x}}\) is varied and a trial solution \(\mathbf {x^t}\) is generated. The trial solution:

$$\begin{aligned} \mathbf {x^t}=\left( x^t_1,x^t_2,\ldots ,x^t_n\right) \end{aligned}$$

is generated by applying the update rules/search moves of a PS for each design variable \(\forall i=1,2,\ldots ,n\):

$$\begin{aligned} x^t_i=x_i\pm \rho \end{aligned}$$

where \(\rho\) is the step size or exploratory radius. The expression \(x_i\pm \rho\) means that, according to the specific PS implementation, either or both the points \(x^t_i=x_i+\rho\) and \(x^t_i=x_i-\rho\) are visited. Let us consider the orthonormal basis of \({\mathbb {R}}^n\)

$$\begin{aligned} \begin{array}{l} {\mathbf {e}}^1=\left( 1,0,\ldots 0\right) \\ {\mathbf {e}}^2=\left( 0, 1, \ldots 0\right) \\ \ldots \\ {\mathbf {e}}^n=\left( 0, 0, \ldots 1\right) \end{array} \end{aligned}$$

and rewrite the update equations for the \(i\)th design variable in the vector form

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}} \pm \rho \cdot {\mathbf {e}}^i \end{aligned}$$

where \(\rho \cdot {\mathbf {e}}^i\) indicates the product of the scalar step size \(\rho\) and the versor (unit vector) \({\mathbf {e}}^i\).

If a trial solution is found to have outperformed the current best solution, it replaces it, i.e. if \(f\left( \mathbf {x^t}\right) \le f\left( {\mathbf {x}}\right)\) then \({\mathbf {x}}=\mathbf {x^t}\) (or equivalently \({\mathbf {x}}\leftarrow \mathbf {x^t}\)).

If we modify the notation to include the iteration index, k, where \({\mathbf {x}}_k\) is the candidate solution at the \(k\)th iteration, the trial solution can be generated by an equation of the type

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k \pm \rho \cdot {\mathbf {e}}^i \end{aligned}$$
(1)

and if \(f\left( \mathbf {x^t}\right) \le f\left( {\mathbf {x}}_k\right)\) then \({\mathbf {x}}_{k+1}=\mathbf {x^t}\). As a note that applies to all the Pattern Search implementations in this section, the scalar \(\rho\) could be replaced by a vector if it is desired to assign a different step size to each design variable individually. This can happen when the algorithmic designer possesses auxiliary knowledge of the problem or if the range of variability differs from one design variable to another (i.e. D is hyper-rectangular rather than hyper-cubical).

Original Pattern Search

The early Pattern Search (PS) implementations were steepest descent methods [21]: all the variables are explored before a new current solution is selected. For this reason, 2n trial vectors are sampled before the best trial vector is selected as the new current best.

Let us consider two trial solutions \(\mathbf {x^{t\textit{i}+}}\) and \(\mathbf {x^{t\textit{i}-}}\) which have been generated at the iteration k for the design variable i, where:

$$\begin{aligned} \begin{array}{l} \mathbf {x^{t\textit{i}+}}={\mathbf {x}}_k + \rho \cdot {\mathbf {e}}^i\\ \mathbf {x^{t\textit{i}-}}={\mathbf {x}}_k - \rho \cdot {\mathbf {e}}^i \end{array} \end{aligned}$$
(2)

The mechanism makes the algorithm revisit the same solutions on a regular basis. For example, in \({\mathbb {R}}^2\), an exploration involves four trial solutions. When a new current best is selected the new exploration includes one already calculated point. Fig. 2 illustrates this issue of revisiting solutions, where it is shown that if the successful trial point is \(\mathbf {x^{t\textit{i}+}}\), one search move would revisit the solution \({\mathbf {x}}_k\).

Fig. 2 The issue of revisiting solutions in Pattern Search. Let \({\mathbf {x}}_k\) be the current best, and \(\mathbf {x^{t\textit{i}+}}\) and \(\mathbf {x^{t\textit{i}-}}\) (\(i=1,2\)) (marked in blue) be the points visited during the exploration. If, for example, \(\mathbf {x^{t\textit{1}+}}\) is the new current best then the following exploration visits the four points marked in red. Hence, the algorithm would revisit \({\mathbf {x}}_k\)

To avoid wasting a large portion of the computational budget by revisiting previously calculated candidate solutions, a memory-based mechanism is often employed. The original naive version of Pattern Search, taking into account this revisiting issue, is reported in Algorithm 1.

Algorithm 1: Original (naive) Pattern Search (pseudocode)
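For illustration, the following minimal Python sketch reproduces this exploration logic; the function name original_ps, the default parameters, and the evaluation budget are our own illustrative choices, and the memory-based bookkeeping that avoids revisits is omitted for brevity.

```python
import numpy as np

def original_ps(f, x0, rho=1.0, rho_min=1e-8, max_evals=10_000):
    """Sketch of the original (naive) Pattern Search: evaluate all 2n axis
    moves, accept the best one, and halve the step size when none improves."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    evals = 1
    n = x.size
    while rho > rho_min and evals < max_evals:
        best_x, best_f = x, fx
        for i in range(n):
            for sign in (+1.0, -1.0):
                xt = x.copy()
                xt[i] += sign * rho          # x^t = x ± rho * e^i
                ft = f(xt)
                evals += 1
                if ft < best_f:
                    best_x, best_f = xt, ft
        if best_f < fx:                      # exploration succeeded
            x, fx = best_x, best_f
        else:                                # all 2n moves failed: halve rho
            rho /= 2.0
    return x, fx
```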

As highlighted in [29], this approach can be accurate since it extensively explores the neighborhood before deciding on a search direction. However, this implementation is likely to require a large number of objective function evaluations before it detects a solution close to the optimum. Consequently, this approach can be impractical in high-dimensional domains.

At the opposite end of the spectrum, there are implementations of Pattern Search that are designed not to explore the entire neighborhood. In these implementations, Pattern Search can select and move as soon as a trial solution with a lower objective function value than that of the current best is detected. A greedy approach can yield good results when addressing large-scale optimisation problems. For example, one search operator used in the large-scale framework in [49] is fundamentally a greedy Pattern Search implementation that, for each design variable, samples a trial solution and, if the first move fails, attempts to explore the opposite direction. More specifically, this greedy Pattern Search, herein referred to as gPS, first samples:

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k-\rho \cdot {\mathbf {e}}^i \end{aligned}$$
(3)

and if this trial point is worse than the current best \({\mathbf {x}}_k\), it attempts to sample:

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k+\frac{\rho }{2}\cdot {\mathbf {e}}^i \end{aligned}$$
(4)

before moving to the following design variable. It must be observed that the step size is asymmetric in the two directions. This is a preventative measure against revisiting previously sampled solutions. This idea has also been employed within Memetic Computing frameworks, see [5, 7, 24, 52].

To illustrate the functioning of gPS, a graphical representation of the moves in the two-dimensional space \({\mathbb {R}}^2\) is shown in Fig. 3. In two dimensions there are three potential outcomes: (1) for each design variable, one move is successful: two trial solutions are accepted and then a diagonal move is performed; (2) one move is successful: the move is along one axis; (3) all the moves are unsuccessful: there is no successful move and the step size is halved. In more than two dimensions, a diagonal move is produced by at least two successes while the other two potential outcomes remain identical. Figure 3 displays the first outcome with a diagonal move.

Fig. 3 Search moves of gPS: two (n) successes lead to a diagonal move. Dashed lines indicate failed moves, solid lines successful moves, and the red line the move from the previous current best to the new current best

The details of the Pattern Search implementation in [49] are reported in Algorithm 2.

Algorithm 2: greedy Pattern Search (gPS) (pseudocode)
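A compact Python sketch of this greedy exploration (Eqs. 3 and 4) follows; the basis argument is our own addition, introduced so that the same routine can later be reused with a different set of search directions, and all names and defaults are illustrative rather than taken from [49].

```python
import numpy as np

def gps(f, x0, rho=20.0, rho_min=1e-8, max_evals=100_000, basis=None):
    """Sketch of the greedy Pattern Search (gPS) of Algorithm 2: for each
    design variable try x - rho*b^i (Eq. 3) and, if it fails, x + (rho/2)*b^i
    (Eq. 4), where b^i is the i-th column of `basis` (identity => e^i)."""
    x = np.asarray(x0, dtype=float)
    fx, evals = f(x), 1
    B = np.eye(x.size) if basis is None else np.asarray(basis, dtype=float)
    while rho > rho_min and evals < max_evals:
        improved = False
        for i in range(x.size):
            xt = x - rho * B[:, i]                 # Eq. (3)
            ft, evals = f(xt), evals + 1
            if ft > fx:                            # first move failed
                xt = x + 0.5 * rho * B[:, i]       # Eq. (4)
                ft, evals = f(xt), evals + 1
            if ft <= fx:
                improved = improved or (ft < fx)   # count only strict improvements
                x, fx = xt, ft
        if not improved:
            rho /= 2.0                             # exploration failed: halve the radius
    return x, fx
```

For example, gps(lambda x: float(np.sum(x**2)), np.full(10, 50.0)) minimises a simple sphere function starting from a corner of the domain.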

Hooke–Jeeves Pattern Search

Although this paper and several recent studies [48] identify Pattern Search as a family of optimisation methods which dates back to the 1940s with Metropolis and Fermi’s method, the term “Pattern Search” was only coined in the 1960s by Hooke and Jeeves [23]. In fact, Hooke and Jeeves referred to their method as “Direct Search”, but it is more commonly known today as the Hooke–Jeeves Method or Hooke–Jeeves Pattern Search, see [4, 47].

The Hooke–Jeeves Pattern Search (HJPS) [23] is composed of two search moves, namely the exploratory and pattern moves, the latter of which subsequently gave the method its name. The exploratory move scans all the decision variables and, in an approach similar to that of gPS in Algorithm 2, attempts one direction and, if it fails, attempts the opposite direction. More specifically, at the generic \(k\)th iteration HJPS samples \(\forall i=1,2\ldots ,n\) at first the trial solution:

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k+\rho \cdot {\mathbf {e}}^i \end{aligned}$$
(5)

and if this move fails (\(f\left( \mathbf {x^t}\right) >f\left( {\mathbf {x}}_k\right)\)), it attempts to sample:

$$\begin{aligned} \mathbf {x^t}={\mathbf {x}}_k-\rho \cdot {\mathbf {e}}^i. \end{aligned}$$
(6)

If the moves in all directions fail, then the radius (step size) is halved. HJPS addresses the revisiting issue by applying the pattern move: if the exploratory move succeeded and a new current best solution is selected, a move along the direction identified by the previous and current best (the direction connecting the two points) is attempted. Let us indicate the previous best solution with \(\mathbf {x^{old}}\) and the current best solution with \({\mathbf {x}}_k\); then the pattern move generates a new trial solution with:

$$\begin{aligned} \mathbf {x^p}=\mathbf {x^{old}}+\alpha \left( {\mathbf {x}}_k-\mathbf {x^{old}}\right) \end{aligned}$$
(7)

where \(\alpha\) is a scalar named acceleration factor and usually set equal to 2. A new exploration is then carried out around the point \(\mathbf {x^p}\). If the exploration fails, the following exploration is centered on \({\mathbf {x}}_k\), while if the exploration succeeds, it continues. Algorithm 3 displays the working principles of HJPS.

Algorithm 3: Hooke–Jeeves Pattern Search (HJPS) (pseudocode)
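A hedged Python sketch of the HJPS logic (exploratory move of Eqs. 5 and 6 plus the pattern move of Eq. 7) is reported below; the nested explore helper and the acceptance rule for the point found around \(\mathbf {x^p}\) are our own reading of the description above, not the authors' reference implementation.

```python
import numpy as np

def hjps(f, x0, rho=20.0, alpha=2.0, rho_min=1e-8):
    """Sketch of Hooke-Jeeves Pattern Search: exploratory move (Eqs. 5-6)
    followed by the pattern move (Eq. 7) with acceleration factor alpha."""

    def explore(base, f_base):
        """Try +rho then -rho along each axis, keeping improvements (Eqs. 5-6)."""
        x, fx = base.copy(), f_base
        for i in range(x.size):
            for step in (+rho, -rho):
                xt = x.copy()
                xt[i] += step
                ft = f(xt)
                if ft < fx:
                    x, fx = xt, ft
                    break
        return x, fx

    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while rho > rho_min:
        x_new, f_new = explore(x, fx)
        if f_new < fx:                                  # exploratory move succeeded
            x_old, x_best, f_best = x, x_new, f_new
            while True:
                xp = x_old + alpha * (x_best - x_old)   # pattern move, Eq. (7)
                xq, fq = explore(xp, f(xp))             # exploration around x^p
                if fq < f_best:                         # success: keep following the pattern
                    x_old, x_best, f_best = x_best, xq, fq
                else:                                   # failure: recentre on the current best
                    break
            x, fx = x_best, f_best
        else:
            rho /= 2.0                                  # all directions failed: halve the radius
    return x, fx
```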

An illustration of the HJPS search logic is shown in Fig. 4. It can be seen that HJPS and gPS perform very similar search moves, see Figs. 4 and 3. The main difference is the double (possibly diagonal) step in HJPS. The latter can be viewed as an attempt to exploit a promising search direction.

Fig. 4 Search moves of HJPS: diagonal move due to the exploratory move and double step due to the pattern move. Dashed lines indicate failed moves, solid lines successful moves, the red line the exploratory move from the previous current best to the new current best, and the black line the pattern move

Generalised Pattern Search

Virginia Torczon in [48] conceptualised Pattern Search as a family of direct search methods, i.e. optimisation algorithms that do not require calculations of the gradient. Hence, Pattern Search is a family of algorithms characterised by two elements:

  • a set of search directions spanning the decision space;

  • a trial step vector endowed with a step variation rule.

For example, in the original Pattern Search and gPS methods, the directions are given by the orthonormal basis \(B_{{\mathbf {e}}}=\lbrace {\mathbf {e}}^1, {\mathbf {e}}^2, \ldots , {\mathbf {e}}^n\rbrace\) in both the possible orientations, e.g. \({\mathbf {e}}^i\) and \(-{\mathbf {e}}^{i}\). The vector of step sizes is a vector of length 2n, whose elements are all \(\rho\) for the original Pattern Search method, or \(\frac{\rho }{2}\) and \(\rho\) for the gPS method. Pattern Search algorithms differ in their implementations of the exploratory moves, e.g. a steepest descent or greedy logic.

This generalisation of Pattern Search should be considered in relation to the definition of the multidirectional Pattern Search introduced in [12], where directions different from those identified by the design variables (axes of the problem) are used during the search.

The Generalised Pattern Search (GPS) introduced in [48] is a framework that includes all the algorithms within this category. Formally, with reference to the \(k\)th iteration, the search directions are determined by two matrices. The first is a non-singular matrix, namely the basis matrix, and it is indicated with \({\mathbf {B}} \in {\mathbb {R}}^{n\times n}\) where \({\mathbb {R}}^{n\times n}\) is the set of square matrices of real numbers of order n. The second is a rectangular matrix, namely the generating matrix, and it is indicated with \({\mathbf {C}}_k\in {\mathbb {Z}}^{n \times p}\) where \({\mathbb {Z}}^{n \times p}\) is the set of integer matrices of size \(n \times p\) with \(p>2n\) and rank n. The matrix \({\mathbf {C}}_k\) can be partitioned as:

$$\begin{aligned} {\mathbf {C}}_k=\left( {\mathbf {M}}_k,-{\mathbf {M}}_k,{\mathbf {L}}_k\right) \end{aligned}$$

where \({\mathbf {M}}_k\) is a non-singular matrix of order n, \(-{\mathbf {M}}_k\) is the negative of \({\mathbf {M}}_k\), and \({\mathbf {L}}_k\) is an \(n\times \left( p-2n\right)\) matrix that contains at least the null column vector \({\mathbf {o}}\). The search directions are given by the columns of the matrix:

$$\begin{aligned} {\mathbf {P}}_k=\mathbf {BC}_k=\left( \mathbf {BM}_k,-\mathbf {BM}_k,\mathbf {BL}_k\right) \end{aligned}$$
(8)

that is referred to as the pattern. Thus a pattern can be seen as a repository of search directions, with n of them being the directions of a basis of \({\mathbb {R}}^n\), n of them being the same directions with opposite orientation, and potentially some additional directions.

The GPS trial step at the \(k\)th iteration along the \(i\)th direction is the vector \({\mathbf {s}}_k\), defined as:

$$\begin{aligned} {\mathbf {s}}_k=\varDelta _k \mathbf {Bc}_k^i \end{aligned}$$
(9)

where \(\varDelta _k\) is a positive real number and \({\mathbf {c}}_k^i\) is the \(i\)th column of the matrix \({\mathbf {C}}_k\). The parameter \(\varDelta _k\) determines the step size while \(\mathbf {Bc}_k^i\) is the direction of the trial step.

If \({\mathbf {x}}_k\) is the current best solution at the iteration k, the trial point generated by means of the trial step would be:

$$\begin{aligned} \mathbf {x^t}_k={\mathbf {x}}_k+{\mathbf {s}}_k. \end{aligned}$$
(10)

The set of operations that yields a current best point is called the exploratory move (consistently with the implementations seen above). The exploratory move succeeds when a solution with better performance is detected, and fails when no update of the current best occurs. Different Pattern Search implementations employ different strategies, e.g. attempting only one trial vector per step or exploring all the columns of \(\varDelta _k{\mathbf {P}}_k\). However, as explained in [48], Pattern Search implementations belong to the GPS framework only if the following hypotheses, namely the Strong Hypotheses, are verified.

Strong Hypotheses

Hypothesis 1

\({\mathbf {s}}_k\) is generated by the pattern \({\mathbf {P}}_k\) or, in other words, is a column vector of the matrix \(\varDelta _k{\mathbf {P}}_k\). The length is determined by the scalar \(\varDelta _k\).

Hypothesis 2

If there exists a column vector \({\mathbf {y}}\) of \(\left( \mathbf {BM}_k,-\mathbf {BM}_k\right)\) such that \(f\left( {\mathbf {x}}_k+{\mathbf {y}}\right) <f\left( {\mathbf {x}}_k\right)\), then the exploratory move must produce a trial step \({\mathbf {s}}_k\) such that \(f\left( {\mathbf {x}}_k+{\mathbf {s}}_k\right) <f\left( {\mathbf {x}}_k\right)\).

Hypothesis 3

The update of \(\varDelta _k\) must follow specific rules: in the case of a failed exploratory move, \(\varDelta _k\) has to decrease, whereas in the case of success \(\varDelta _k\) must either remain the same or increase.

The pseudocode of GPS is given in Algorithm 4.

Algorithm 4: Generalised Pattern Search (GPS) (pseudocode)

It is worth mentioning that GPS may include a Pattern Search implementation where the search directions are updated during its operation.

The Proposed Covariance Pattern Search

The proposed algorithm is a GPS that allocates the initial part of its computational budget to the analysis of the problem, in order to determine a problem-specific pattern \({\mathbf {P}}_k\). This section describes the implementation aspects of the proposed algorithm, namely the Covariance Pattern Search (CPS). This section, just like the implementation, is divided into two parts:

  1. Fitness Landscape Analysis;

  2. Algorithmic Search.

Fitness Landscape Analysis

A number of candidate solutions/points are sampled in the decision space D and their objective function values are calculated. The function values are compared with a threshold thr: the points whose objective function value is below thr are saved in a data structure, while the others are discarded. The purpose of this operation is to obtain a sample of points whose distribution describes the geometry of the problem.

To illustrate this fact let us consider the following four popular objective functions in two dimensions within \(\left[ -100, 100\right] ^2\), see e.g. [30].

$$\begin{aligned} \begin{array}{ll} \text {Sphere} &{} f\left( {\mathbf {x}}\right) =x_1^2+x_2^2 \\ \text {Ellipsoid} &{} f\left( {\mathbf {x}}\right) =50x_1^2+200 x_2^2\\ \text {Bent Cigar} &{} f\left( {\mathbf {x}}\right) =x_1^2+10^6x_2^2\\ \text {Rosenbrock} &{} f\left( {\mathbf {x}}\right) = 100\left( x_1^2-x_{2}\right) ^2+\left( x_1-1\right) ^2 \end{array} \end{aligned}$$

If we apply the procedure described above for these four problems after shifting and rotation, we obtain four sets of points that are distributed as shown in Fig. 5.

Fig. 5 Sampling of points within \(\left[ -100,100\right] ^2\) for shifted and rotated Sphere, Ellipsoid, Bent Cigar, and Rosenbrock functions below the threshold values \(10^3\), \(3\times 10^4\), \(10^6\), and \(5\times 10^3\), respectively

Let us consider the scenario where out of the points sampled in \(D \subset {\mathbb {R}}^n\), a set of m vectors/candidate solutions have an objective function value below the threshold thr:

$$\begin{aligned} \begin{array}{l} \mathbf {x^1}=\left( x^1_1,x^1_2,\ldots , x^1_n\right) \\ \mathbf {x^2}=\left( x^2_1,x^2_2,\ldots , x^2_n\right) \\ \ldots \\ \mathbf {x^m}=\left( x^m_1,x^m_2,\ldots , x^m_n\right) \\ \end{array} \end{aligned}$$

and are stored in the data structure \({\mathbf {V}}\).

These points can be interpreted as the samples of a multivariate statistical distribution characterised by a mean vector:

$$\begin{aligned} \mathbf {\mu }=\left( \mu _1,\mu _2,\ldots ,\mu _n\right) =\frac{1}{m}\left( \sum _{i=1}^m x^i_1, \sum _{i=1}^m x^i_2, \ldots , \sum _{i=1}^m x^i_n \right) \end{aligned}$$

and a covariance matrix:

$$\begin{aligned} {\mathbf {C}}=\left( \begin{array}{cccc} c_{1,1} &{} c_{1,2} &{}\ldots &{} c_{1,n} \\ c_{2,1}&{}c_{2,2} &{}\ldots &{} c_{2,n}\\ \ldots &{}\ldots &{}\ldots &{} \ldots \\ c_{n,1}&{}c_{n,2}&{}\ldots &{} c_{n,n} \end{array} \right) \end{aligned}$$

where:

$$\begin{aligned} c_{j,j}=\left( \frac{1}{m}\right) \sum _{i=1}^m\left( \left( x^i_{j}-\mu _j\right) \left( x^i_{j}-\mu _j\right) \right) \end{aligned}$$

and:

$$\begin{aligned} c_{j,l}= \left( \frac{1}{m}\right) \sum _{i=1}^m\left( \left( x^i_{j}-\mu _j\right) \left( x^i_{l}-\mu _l\right) \right) . \end{aligned}$$

Subsequently, the n eigenvectors of the matrix \({\mathbf {C}}\) are calculated by means of its spectral decomposition (eigendecomposition), see [38, 44]. These eigenvectors are the columns \({\mathbf {p}}^i\) of a matrix \({\mathbf {P}}\):

$$\begin{aligned} {\mathbf {P}}=\left( {\mathbf {p}}^1,{\mathbf {p}}^2,\ldots ,{\mathbf {p}}^n\right) . \end{aligned}$$

The directions of the eigenvectors are then used by the algorithms to perform the search. Algorithm 5 displays the pseudocode of the Fitness Landscape Analysis.

Algorithm 5: Fitness Landscape Analysis (pseudocode)
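A minimal Python sketch of this analysis is reported below. The sampler (uniform in the hyper-rectangle), the sample size, and the helper name landscape_analysis are illustrative assumptions; the eigenvectors are obtained here with numpy's symmetric eigendecomposition (np.linalg.eigh), which returns an orthonormal eigenvector matrix for the symmetric matrix \({\mathbf {C}}\).

```python
import numpy as np

def landscape_analysis(f, bounds, thr, n_samples=5000, rng=None):
    """Fitness landscape analysis sketch (cf. Algorithm 5): sample points
    uniformly in the hyper-rectangle `bounds`, keep those with f(x) < thr,
    and return the covariance matrix C and its eigenvector matrix P."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = np.asarray(bounds, dtype=float)      # bounds = (lower, upper), each of length n
    X = rng.uniform(lo, hi, size=(n_samples, lo.size))
    V = X[np.array([f(x) for x in X]) < thr]      # data structure V of retained points
    if V.shape[0] < lo.size + 1:
        raise ValueError("thr too low: not enough points to estimate C")
    C = np.cov(V, rowvar=False, bias=True)        # covariance matrix with 1/m normalisation
    eigvals, P = np.linalg.eigh(C)                # C symmetric -> orthonormal eigenvectors
    return C, P

# Example usage with an illustrative threshold on a two-dimensional problem:
# C, P = landscape_analysis(f, ([-100, -100], [100, 100]), thr=3e4)
```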

Algorithmic Search

If \({\mathbf {P}}\) is the matrix of the eigenvectors, we propose a GPS whose basis matrix is \({\mathbf {P}}\) and its pattern matrix \({\mathbf {P}}_k\) is thus:

$$\begin{aligned} {\mathbf {P}}_k=\mathbf {PC}_k=\left( \mathbf {PM}_k,\mathbf {-PM}_k,\mathbf {PL}_k\right) \end{aligned}$$
(11)

the trial step along the ith direction is:

$$\begin{aligned} {\mathbf {s}}_k=\varDelta _k \mathbf {Pc}_k^i \end{aligned}$$
(12)

and if \({\mathbf {x}}_k\) is the current best solution at the \(k\)th iteration, the corresponding trial solution \(\mathbf {x^t}_k\) is determined by:

$$\begin{aligned} \mathbf {x^t}_k={\mathbf {x}}_k+{\mathbf {s}}_k. \end{aligned}$$
(13)

To exemplify the content of the proposal, let us see the proposed implementations of gPS and HJPS. Since the main novel characteristic of the proposed idea is the use of the eigenvectors of a covariance matrix, we will refer to the implemented algorithms as Covariance Pattern Search, and these two implementations as greedy Covariance Pattern Search (gCPS) and Hooke-Jeeves Covariance Pattern Search (HJCPS).

In the case of gCPS in two dimensions (\(n=2\)), the basis matrix is the eigenvector matrix \({\mathbf {P}}\) while the generating matrix \({\mathbf {C}}_k\) is:

$$\begin{aligned} {\mathbf {C}}_k=\left( \begin{array}{ccccccccc} \frac{1}{2} &{} 0 &{} -1 &{} 0 &{} \frac{1}{2} &{} \frac{1}{2} &{}-1 &{}-1 &{} 0 \\ 0 &{} \frac{1}{2} &{} 0 &{} -1 &{} \frac{1}{2} &{} -1 &{}-1 &{} \frac{1}{2} &{} 0 \end{array} \right) \end{aligned}$$

and \(\varDelta _k=\rho\) (which gets halved every time the exploration fails). Thus, the total set of possible moves is determined by \(\mathbf {PC}_k\).

For example, the distribution in Fig. 6 obtained from a rotated ellipsoid function is associated with a covariance matrix:

$$\begin{aligned} {\mathbf {C}}=\left( \begin{array}{cc} 55.362 &{} 67.026\\ 67.026 &{} 109.40 \end{array} \right) \end{aligned}$$

whose eigenvector matrix would be:

$$\begin{aligned} {\mathbf {P}}=\left( \begin{array}{cc} -0.82881 &{} 0.55953\\ 0.55953 &{} 0.82881 \end{array}\right) \end{aligned}$$
Fig. 6 Distribution of points for a rotated ellipsoid associated with a covariance matrix \({\mathbf {C}}\) and eigenvector matrix \({\mathbf {P}}\)

One potential trial step could be:

$$\begin{aligned} {\mathbf {s}}_k= & {} \varDelta _k \mathbf {Pc}_k^6=\rho \left( \begin{array}{cc} -0.82881 &{} 0.55953\\ 0.55953 &{} 0.82881 \end{array}\right) \left( \begin{array}{c} \frac{1}{2} \\ -1 \end{array}\right) \\= & {} \rho \left( \begin{array}{c} -0.97393 \\ -0.54905 \end{array} \right) \end{aligned}$$

which can be re-written as:

$$\begin{aligned} {\mathbf {s}}_k=\rho {\mathbf {P}}\left( \frac{1}{2}{\mathbf {e}}^1-{\mathbf {e}}^2\right) =\rho \left( \frac{1}{2}{\mathbf {p}}^1-{\mathbf {p}}^2\right) . \end{aligned}$$

In other words, in the n dimensional case every trial vector is the linear combination of the eigenvectors:

$$\begin{aligned} {\mathbf {s}}_k=\lambda _1 {\mathbf {p}}^1 +\lambda _2 {\mathbf {p}}^2+\ldots +\lambda _n {\mathbf {p}}^n \end{aligned}$$

where the coefficients \(\lambda _i\) can take only three values: \(\forall i, \lambda _i\in \lbrace 0, -\rho , \frac{\rho }{2} \rbrace\).
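This worked example can be verified numerically; the snippet below re-computes \({\mathbf {s}}_k=\varDelta _k \mathbf {Pc}_k^6\) with \(\rho =1\) (the value of \(\rho\) only scales the step and is arbitrary here).

```python
import numpy as np

P = np.array([[-0.82881, 0.55953],
              [ 0.55953, 0.82881]])          # eigenvector matrix of Fig. 6
c6 = np.array([0.5, -1.0])                   # sixth column of the generating matrix C_k
rho = 1.0                                    # Delta_k = rho (unit step for illustration)
s_k = rho * P @ c6
print(s_k)                                   # ~ [-0.97393, -0.54905], as in the text
# Equivalently, s_k is a linear combination of the eigenvectors p^1 and p^2:
print(rho * (0.5 * P[:, 0] - P[:, 1]))       # same vector
```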

In the case of HJCPS in two dimensions (\(n=2\)), the basis matrix is the eigenvector matrix \({\mathbf {P}}\). The generating matrix \({\mathbf {C}}_k\) is slightly more complex due to the presence of the pattern move. Let us partition the matrix \({\mathbf {C}}_k\) as:

$$\begin{aligned} {\mathbf {C}}_k=\left( \mathbf {C^e}_k,\mathbf {C^p}_k\right) \end{aligned}$$

where \(\mathbf {C^{e}}_k\) refers to the exploratory move and \(\mathbf {C^{p}}_k\) refers to the pattern move. The first submatrix is analogous to gCPS, i.e.:

$$\begin{aligned} \mathbf {C^e}_k=\left( \begin{array}{ccccccccc} 1 &{} 0 &{} -1 &{} 0 &{} 1 &{} 1 &{}-1 &{}-1 &{}0\\ 0 &{} 1 &{} 0 &{} -1 &{} 1 &{} -1 &{}-1 &{} 1 &{}0 \end{array} \right) \end{aligned}$$

The pattern move attempts to search further along the direction of the successful exploratory move. With reference to Fig. 4, the successful exploratory move corresponds to the column:

$$\begin{aligned} {\mathbf {c}}_k^i=\left( \begin{array}{c} 1 \\ -1 \end{array}\right) . \end{aligned}$$

In this case, the pattern move adds a vector to the exploratory move as follows:

$$\begin{aligned} \alpha \left( {\mathbf {x}}-\mathbf {x^{old}}\right) =\alpha \left( \begin{array}{c} 1 \\ -1 \end{array}\right) . \end{aligned}$$

Thus, the total possible move is:

$$\begin{aligned} \left( \begin{array}{c} 1 \\ -1 \end{array}\right) +\alpha \left( \begin{array}{c} 1 \\ -1 \end{array}\right) =\left( \begin{array}{c} 1+\alpha \\ -1-\alpha \end{array}\right) \end{aligned}$$

Thus, the second submatrix of the generating matrix is:

$$\begin{aligned} \mathbf {C^p}_k=\left( \begin{array}{ccccccccc} 1+\alpha &{} 0 &{} -1-\alpha &{} 0 &{} 1+\alpha &{} 1+\alpha &{}-1-\alpha &{}-1-\alpha &{} 0\\ 0 &{} 1+\alpha &{} 0 &{} -1-\alpha &{} 1+\alpha &{} -1-\alpha &{}-1-\alpha &{} 1+\alpha &{} 0 \end{array} \right) . \end{aligned}$$

Additionally, for HJCPS, \(\varDelta _k=\rho\) and follows the halving logic. The other steps are analogous to what is shown for gCPS.

In the general n dimensional case, the trial step \({\mathbf {s}}_k\) can be represented as a linear combination of eigenvectors:

$$\begin{aligned} {\mathbf {s}}_k=\alpha '\left( \lambda _1 {\mathbf {p}}^1 +\lambda _2 {\mathbf {p}}^2+\ldots +\lambda _n {\mathbf {p}}^n\right) \end{aligned}$$

where \(\forall i, \lambda _i\in \lbrace 0, -\rho , \rho \rbrace\) and \(\alpha ' \in \lbrace 1, 1+\alpha \rbrace\).

Let us consider a given problem that has been analysed as shown in Algorithm 5 with the eigenvectors made available. The variation operators composing the exploratory moves for gPS and HJPS become:

gPS → gCPS

  \(\mathbf {x^{t}}=\mathbf {x}-\rho \cdot \mathbf {e}^i\)  →  \(\mathbf {x^{t}}=\mathbf {x}-\rho \cdot \mathbf {p}^i\)

  \(\mathbf {x^t}=\mathbf {x}+\frac{\rho }{2}\cdot \mathbf {e}^i\)  →  \(\mathbf {x^t}=\mathbf {x}+\frac{\rho }{2}\cdot \mathbf {p}^i\)

HJPS → HJCPS

  \(\mathbf {x^{t}}=\mathbf {x}+\rho \cdot \mathbf {e}^i\)  →  \(\mathbf {x^{t}}=\mathbf {x}+\rho \cdot \mathbf {p}^i\)

  \(\mathbf {x^{t}}=\mathbf {x}-\rho \cdot \mathbf {e}^i\)  →  \(\mathbf {x^{t}}=\mathbf {x}-\rho \cdot \mathbf {p}^i\)

  \(\mathbf {x^p}=\mathbf {x^{old}}+\alpha \left( \mathbf {x}-\mathbf {x^{old}}\right)\)  →  \(\mathbf {x^p}=\mathbf {x^{old}}+\alpha \left( \mathbf {x}-\mathbf {x^{old}}\right)\)

For the sake of clarity, the gPS in Algorithm 2 is then revisited with the eigenvector basis and the resulting gCPS is displayed in Algorithm 6.

Algorithm 6: greedy Covariance Pattern Search (gCPS) (pseudocode)
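Combining the hypothetical landscape_analysis and gps sketches introduced earlier, the whole gCPS reduces to a change of basis in the exploratory move, as in the sketch below; the split of the evaluation budget between the two phases and the bound handling used in the experiments are omitted.

```python
import numpy as np

def gcps(f, bounds, thr, x0, rho=20.0, rho_min=1e-8):
    """greedy Covariance Pattern Search sketch: phase 1 estimates C and its
    eigenvector matrix P (landscape_analysis above); phase 2 runs the greedy
    exploration of Algorithm 6 along the columns of P (gps above, basis=P)."""
    C, P = landscape_analysis(f, bounds, thr)             # fitness landscape analysis
    return gps(f, x0, rho=rho, rho_min=rho_min, basis=P)  # search along the p^i directions

# e.g. for a rotated ellipsoid in [-100, 100]^2 with an empirically chosen threshold:
# x_best, f_best = gcps(f, ([-100, -100], [100, 100]), thr=3e4, x0=np.zeros(2))
```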

Theoretical Justification of Covariance Pattern Search

This section provides a theoretical justification of the proposed approach. We structured this section into three parts addressing three subproblems:

  1. Whether the covariance matrix provides enough eigenvectors to generate a pattern \({\mathbf {P}}_k\).

  2. Whether the proposed method is guaranteed to converge and under what hypotheses the convergence occurs.

  3. Why the eigenvectors are chosen as the search directions of the pattern in this study.

To address the first question, we have to recall some basic linear algebra notions. Different square matrices (or different endomorphisms) of order n can have a different number of linearly independent eigenvectors, see [38]. Thus, in the general case we might have a matrix \({\mathbf {A}}\) of order n and only \(n-1\) linearly independent eigenvectors. In that case, the matrix \({\mathbf {P}}\) would be rectangular and there would not be enough directions to span the entire decision space.

In our case the covariance matrix \({\mathbf {C}}\) is guaranteed to be symmetric due to the commutativity of the product of numbers \(\forall j,l\):

$$\begin{aligned} c_{j,l}= \left( \frac{1}{m}\right) \sum _{i=1}^m\left( \left( x^i_{j}-\mu _j\right) \left( x^i_{l}-\mu _l\right) \right) =\left( \frac{1}{m}\right) \sum _{i=1}^m\left( \left( x^i_{l}-\mu _l\right) \left( x^i_{j}-\mu _j\right) \right) =c_{l,j}. \end{aligned}$$

Since the covariance matrix \({\mathbf {C}}\) is symmetric, as long as it is calculated from a sample of at least \(n+1\) points in general position (so that it has full rank), the following properties are verified (see [38] for the proofs).

Proposition 1

Properties of the covariance matrix \({\mathbf {C}}\).

  • The eigenvalues of \({\mathbf {C}}\) are all real.

  • Any two eigenvectors of \({\mathbf {C}}\) corresponding to two distinct eigenvalues are orthogonal.

  • The covariance matrix \({\mathbf {C}}\) is always diagonalisable, that is, there exists a non singular matrix \({\mathbf {P}}\) of order n such that:

    $$\begin{aligned} \mathbf {P^{-1}CP}={\mathbf {D}} \end{aligned}$$

    is diagonal.

  • The covariance matrix \({\mathbf {C}}\) can be diagonalised by means of an orthogonal matrix \({\mathbf {P}}\), that is \(\mathbf {P^{-1}=P^{T}}\), whose columns are the eigenvectors of \({\mathbf {C}}\).

Thus, in relation to the proposed method we can provide the following observation.

Observation 1

For an optimisation problem and an estimated covariance matrix \({\mathbf {C}}\) from Algorithm 5, a non-singular matrix of eigenvectors \({\mathbf {P}}\) and a pattern containing the directions of the eigenvectors \({\mathbf {P}}_k=\left( \mathbf {PM}_k,\mathbf {-PM}_k,\mathbf {PL}_k\right)\) can always be found.
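These properties are easy to check numerically on any estimated covariance matrix; the snippet below (with an arbitrary illustrative sample) verifies that the eigenvector matrix \({\mathbf {P}}\) returned by a symmetric eigendecomposition is orthogonal and diagonalises \({\mathbf {C}}\).

```python
import numpy as np

# Quick check of Proposition 1 on an estimated covariance matrix C:
# eigh returns real eigenvalues and an orthogonal eigenvector matrix P,
# so P^T P = I and P^{-1} C P = P^T C P is diagonal.
rng = np.random.default_rng(0)
V = rng.standard_normal((500, 4))                # any sample with at least n+1 points
C = np.cov(V, rowvar=False, bias=True)           # symmetric by construction
eigvals, P = np.linalg.eigh(C)
assert np.allclose(P.T @ P, np.eye(4))           # P is orthogonal (P^{-1} = P^T)
D = P.T @ C @ P
assert np.allclose(D, np.diag(np.diag(D)))       # C is diagonalised by P
```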

Regarding the second subproblem, the convergence of the method, we consider the study in [48] about the convergence of GPS and summarise some of the theoretical results in the following theorem.

Theorem 1

Let f be a continuously differentiable function on a compact neighborhood \(L\left( \mathbf {x_0}\right)\). The sequence \(\lbrace {\mathbf {x}}_k\rbrace\) produced by GPS in Algorithm 4 under the Strong Hypotheses converges to a null-gradient point, that is:

$$\begin{aligned} \lim _{k \rightarrow \infty } \nabla f\left( {\mathbf {x}}_k\right) ={\mathbf {o}} \end{aligned}$$

where \({\mathbf {o}}\) is the null vector.

Since from a theoretical standpoint, this article does not propose a completely new algorithm but a novel implementation of GPS (that is the use of the basis matrix \({\mathbf {B}}={\mathbf {P}}\)), the convergence of Covariance Pattern Search algorithms is subject to the same hypotheses discussed in [48] and summarised in Theorem 1.

Let us verify the strong hypotheses for gCPS.

  • Hypothesis 1: \({\mathbf {s}}_k\) is a linear combination of the eigenvectors, that is, a column of \(\varDelta _k{\mathbf {P}}_k\).

  • Hypothesis 2: from Algorithm 6, when a better solution is found there is an update in the value of \({\mathbf {x}}\). At the end of the exploratory moves the sum of all the updates contains the trial step \({\mathbf {s}}_k\).

  • Hypothesis 3: from Algorithm 6, \(\rho\) remains the same when there is an update and decreases when the exploratory move fails.

Similar considerations can be made about HJCPS. This leads to the following observation.

Observation 2

The proposed Covariance Pattern Search algorithms converge to a null-gradient point (possibly a local optimum) under the same conditions as GPS. If the hypotheses on the problem are not valid, then Covariance Pattern Search algorithms are heuristics.

From a practical standpoint, the guarantee of convergence in Theorem 1 is not in itself a guarantee that the algorithm is usable and successful at solving real-world problems. The first reason is that many real problems are not continuously differentiable, and often an analytical expression of the function is not available. The second reason is that a null-gradient point (not necessarily a local optimum) is guaranteed to be found only after a very large number of steps. On the other hand, the demand in real-world optimisation is to detect a high-performance solution in a short time.

In this sense, the convergence rate (convergence speed is the term used for heuristics and metaheuristics) is a very important characteristic of a search algorithm. The proposal to use eigenvectors of the covariance matrix is driven by this motivation which is then summarised in the following conjecture.

Conjecture 1

If a distribution of points describes the geometry of the problem, a Pattern Search performed along the directions of the eigenvectors (of the covariance matrix of this distribution) has higher convergence rate than the same Pattern Search performed along the directions of any other basis of \({\mathbb {R}}^n\).

Justification. To understand the rationale of the proposal we should consider the fundamental meaning of the covariance matrix of a multivariate distribution, see [14].

The diagonal elements of the matrix directly describe the geometry of the problem since they represent the extent of the distribution along each variable. A diagonal element much larger in value than the other diagonal elements means that the distribution is stretched along the corresponding design variable. In our case, the distribution approximates the basin of attraction [21] and the shape of the contour plot, and that is the geometry of the problem. With reference to Fig. 5, for the sphere the diagonal elements of the covariance matrix are very similar to each other, while for a (non-rotated) ellipsoid the diagonal elements would greatly differ from each other.

The off-diagonal elements represent the correlation between pairs of variables. A large value means a high correlation while zero means no correlation. In our case, since the points sample a sublevel set, the correlation is meant with respect to the objective function: zero means that the function can be decomposed over the variables, while a large value means that this decomposition is not possible. To intuitively visualise this fact, the covariance matrix associated with a sphere or a non-rotated ellipsoid would be diagonal, while that associated with rotated problems, such as the rotated bent cigar, would be full.

As shown above, Pattern Search algorithms generate the trial vector as a linear combination of vectors along the directions of the basis \({\mathbf {B}}\); hence they search the optimisation problem by decomposing it along the directions of \({\mathbf {B}}\). The vast majority of Pattern Search algorithms, e.g. [4, 23, 41], use \({\mathbf {B}}\) equal to the identity matrix \({\mathbf {I}}\), i.e. a search along the directions of the design variables. This strategy is efficient in solving problems whose associated covariance matrix is diagonal; however, it is far less efficient in solving problems whose associated covariance matrix is full, e.g. the rotated bent cigar.

The main idea behind this proposal is to choose those directions that diagonalise the covariance matrix. The proposed Covariance Pattern Search algorithms can be seen as methods that initially perform a change of coordinates and then search along the variables of the new reference system. The problem can then be decomposed along these new variables, resulting in efficient Pattern Search algorithms.

This concept is broadly used in other contexts, especially in Data Science, and is closely related to Principal Component Analysis [26]. To better understand the directions of the exploration, Fig. 7 displays the distributions of points and the eigenvector directions for the same problems displayed in Fig. 5.

Fig. 7 Sampling of points within \(\left[ -100, 100\right] ^2\) and corresponding eigenvector directions for shifted and rotated Sphere, Ellipsoid, Bent Cigar, and Rosenbrock functions below the threshold values \(10^3\), \(7\times 10^4\), \(10^6\), and \(9\times 10^3\), respectively

To better understand the functioning of the method, Fig. 8 illustrates the points visited by gPS and gCPS, respectively, to solve the rotated ellipsoid and rotated bent cigar problems. Both algorithms have been run until \(\rho < 10^{-75}\) or until a maximum budget of \(100000\times n\) function evaluations was reached. The points visited by gPS are marked in yellow while those visited by gCPS are marked in red. The final objective function errors are reported at the top of the figure.

The rotated ellipsoid in two dimensions is an easy problem for both algorithms. However, we may observe that gCPS (red markers) reaches the axis of the ellipsoid in one step and then descends along it towards the optimum, while gPS needs more steps to approach it. We found that gCPS reached its final value in 829 objective function calls (including the fitness landscape analysis, to which a budget of 500 function calls was allocated) while gPS required 1005 objective function calls.

For the bent cigar, the advantages of the proposed Covariance Pattern Search over the standard gPS approach are more evident. The directions provided by the design variables appear unsuitable to descend the gradient along the thin stripe characterising the geometry of this problem. As a result, gPS fails to detect the local minimum. The proposed gCPS moves along the main dimension of the stripe (i.e. the direction of one eigenvector) and efficiently detects a point that is very close to the theoretical optimum.

Fig. 8 Functioning of gPS (yellow markers) and gCPS (red markers). The result of the fitness landscape analysis and the directions of the eigenvectors are also displayed. The error of the algorithms is reported at the top of each figure

To further justify Conjecture 1, with reference to the problem in Fig. 8, we performed the following test. From the optimum, we performed ten steps of size \(\rho =0.01\) along the directions of the matrix \({\mathbf {B}}\) for gPS and gCPS. In other words, we performed ten steps along the directions of the variables and along the directions of the eigenvectors of the covariance matrix \({\mathbf {C}}\). At each step, we calculated and saved the objective function value. With the calculated objective function values we computed the numerical gradient of the objective function along these directions. Figure 9 illustrates the directional derivatives along the directions of gPS (continuous lines) and gCPS (dashed lines).
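A sketch of this test is given below; the helper name directional_profile and the forward-difference approximation of the directional derivative are our own illustrative choices.

```python
import numpy as np

def directional_profile(f, x_opt, B, n_steps=10, rho=0.01):
    """Sketch of the test in Fig. 9: walk n_steps of size rho away from the
    optimum along each column of B, record f, and return the finite-difference
    derivative of f along each direction (one row per direction)."""
    B = np.asarray(B, dtype=float)
    profiles = []
    for i in range(B.shape[1]):
        values = [f(x_opt + k * rho * B[:, i]) for k in range(n_steps + 1)]
        profiles.append(np.diff(values) / rho)    # numerical directional derivative
    return np.array(profiles)

# derivatives_gps  = directional_profile(f, x_opt, np.eye(n))   # variable directions
# derivatives_gcps = directional_profile(f, x_opt, P)           # eigenvector directions
```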

Fig. 9 Plot of the directional derivatives of gPS (continuous lines) and gCPS (dashed lines) along their respective search directions (columns of matrix \({\mathbf {B}}\)) on the bent cigar problem in two dimensions

The test shows that gCPS optimises the objective function along a high-derivative direction and a low-derivative direction. The fitness landscape analysis identifies a search direction along which small steps quickly lead to large improvements and another direction along which the problem is nearly “flat”. Unlike gCPS, the standard gPS directions display intermediate gradient features.

We repeated the test for all the problems listed in the "Numerical Results" section for 2, 10, 30, and 50 dimensions. Apart from the sphere, where all the lines of the gradient plot collapse into one line, the fitness landscape analysis of gCPS systematically detected a direction whose gradient was higher than that of any of the variable directions of gPS. In other words, the CPS mechanism appears to be able to detect some preferential directions along which a pattern search displays a high convergence rate.

To show an example of the gradient in higher dimensions, Fig. 10 illustrates the results of the test for the rotated ill-conditioned ellipsoid in ten variables; see the "Numerical Results" section for details on the function.

The results in Fig. 10 clearly show that gCPS performs its search along one direction where the derivative is higher than along any of the variable directions of gPS.

Fig. 10 Plot of the directional derivatives of gPS (continuous lines) and gCPS (dashed lines) along their respective search directions (columns of matrix \({\mathbf {B}}\)) on the ill-conditioned ellipsoid in ten dimensions

Numerical Results

Results in this section are divided into two subsections:

  • Validation of the Covariance Pattern Search.

  • Comparison against other algorithms.

To simulate the local search conditions, we have considered and adapted a sample of the functions from the CEC 2013 benchmark (focussing on unimodal problems), see [31]. We have used two versions of the ill-conditioned ellipsoid since comparing the two was relevant to demonstrate the performance of the local search. The ill-conditioned ellipsoid \(f_3\) is the one in [30] while the ellipsoid \(f_2\) has been introduced by us in this study. The condition numbers of both \(f_2\) and \(f_3\) worsen with the dimensionality of the problem; however, they worsen at different speeds, see Table 1. Each problem has been scaled to 10, 30, and 50 dimensions and has been studied in \(\left[ -100, 100\right] ^n\). Table 1 displays the functions and illustrates the shape of the corresponding basins of attraction. For each problem, a shift and a rotation have been applied: with reference to Table 1, the variable is:

$$\begin{aligned} {\mathbf {z}}={\mathbf {Q}}_p\left( {\mathbf {x}}-{\mathbf {o}}\right) \end{aligned}$$

where \({\mathbf {o}}\) is the shifting vector (the same used in [31]) and \({\mathbf {Q}}_p\) is a rotation matrix (a randomly generated orthogonal matrix) set for the \(p\)th problem.
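For reference, the snippet below shows one way to implement the transformation and to generate a random orthogonal matrix \({\mathbf {Q}}_p\); the paper does not specify the generator used, so the QR-based construction is an assumption.

```python
import numpy as np

def shift_and_rotate(f_base, o, Q):
    """Wrap a base benchmark function with the transformation z = Q (x - o)."""
    return lambda x: f_base(Q @ (np.asarray(x, dtype=float) - o))

def random_rotation(n, rng=None):
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix
    (one common construction; purely illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))       # fix column signs for a uniform distribution
```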

Table 1 Basins of attraction functions

The Pattern Search variants in this article have all been run with the initial radius \(\rho = 0.1 \times\) domain width \(= 20\). This parameter has been set by following the indication in [49] and then tuning it for our testbed. The thresholds thr for the problems in Table 1 are reported in Table 2. The threshold values have been set empirically by testing values in the codomain that ensured that some points were stored in the data structure \({\mathbf {V}}\) while others were discarded. See the "Limitations of Covariance Pattern Search and Future Developments" section for further considerations about this issue.

Table 2 Thresholds thr in 10, 30, and 50 dimensions

Although Pattern Search is a deterministic algorithm, its performance can depend on the initial point and, for Covariance Pattern Search, its performance depends also on the sampled points used to estimate the covariance matrix. Therefore, for each experiment, 51 independent runs have been performed and the associated mean value and ± standard deviation have been displayed.

Furthermore, to statistically investigate the question of whether the application of the proposed method results in performance gains, the Wilcoxon rank-sum test has been applied, see [17, 51]. In the tables in this section, a “+” indicates that gCPS significantly outperforms the competitor, a “-” indicates that the competitor significantly outperforms gCPS, and an “=” indicates that there is no significant difference in performance.
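A minimal sketch of this comparison is given below; the decision between “+” and “-” based on the mean error is our own choice and is not prescribed by the test itself.

```python
import numpy as np
from scipy.stats import ranksums

def wilcoxon_symbol(errors_gcps, errors_other, alpha=0.05):
    """Summarise a Wilcoxon rank-sum comparison over the 51 final errors:
    '+' if gCPS is significantly better, '-' if significantly worse, '=' otherwise."""
    _, p = ranksums(errors_gcps, errors_other)
    if p >= alpha:
        return "="
    return "+" if np.mean(errors_gcps) < np.mean(errors_other) else "-"
```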

All the algorithms in this section have been executed with a budget of \(10000 \times n\) function calls where n is the problem dimensionality. To guarantee a fair comparison, the budget of the proposed Covariance Pattern Search has been split into two parts: \(5000 \times n\) function calls have been used to build the covariance matrix \({\mathbf {C}}\), whilst \(5000 \times n\) function calls have been spent to execute the algorithm. Due to the nature of Pattern Search, i.e. deterministic local search, the bound handling has been performed by saturating the design variable to the bound. We preferred the saturation to the bound over the toroidal insertion or reflection [13] since the latter two mechanisms would be equivalent to the sampling of a point. This sampling would disrupt the gradient estimation logic of Pattern Search.

Validation of Covariance Pattern Search

To validate the functioning of Covariance Pattern Search we compared gPS and gCPS detailed in Algorithms 2 and 6, respectively. It must be remarked that the algorithms have the same structure except for the matrix \({\mathbf {B}}\).

Table 3 shows the numerical results of gPS vs gCPS.

Table 3 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1: greedy Pattern Search (gPS) vs greedy Covariance Pattern Search (gCPS) with gCPS reference for Wilcoxon Test (W)

Numerical results in Table 3 show that, apart from \(f_1\), the proposed gCPS consistently outperforms the standard gPS across the three dimensionality levels under consideration. The problem \(f_1\) is characterised by a central symmetry, meaning that the rotation is ineffective and any reference system would broadly perform in the same way. In this sense, gCPS “wastes” half of its budget to find a search direction whilst gPS uses its entire budget to search for the optimum. With regard to the other five problems, the use of the eigenvectors of \({\mathbf {C}}\) as search directions not only systematically improves upon gPS but appears to solve some problems for which gPS would fail, see \(f_4\) and \(f_5\).

Figures 11 and 12 illustrate the performance trends of gPS and gCPS for the ill-conditioned ellipsoid \(f_3\) and the discus \(f_5\) functions, respectively (see Table 1).

Fig. 11 Performance trend (logarithmic scale) of gPS vs gCPS for the ellipsoid \(f_3\) in 10D

Fig. 12 Performance trend (logarithmic scale) of gPS vs gCPS for the discus \(f_5\) in 30D

As shown in both Figs. 11 and 12, the performance trend of gCPS does not lead to any improvement for the first half of the budget. This is due to the computational cost that we have allocated to the fitness landscape analysis. Following the fitness landscape analysis, gCPS quickly improves upon the initial value and reaches a solution whose objective function value is orders of magnitude lower than that detected by the standard gPS.

To ensure that the results are not biased by specific rotation matrices, the experiments have been repeated on the problems in Table 1 with a modified experimental condition. For each run a new rotation matrix has been generated and both gPS and gCPS have been run on the rotated problem. An additional 51 independent runs have been executed under this new condition. Table 4 displays the results for this set of experiments with random matrices.

Table 4 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1 subject to a random rotation at each run: greedy Pattern Search (gPS) vs greedy Covariance Pattern Search (gCPS) with gCPS reference for Wilcoxon Test

Numerical results in Table 4 show that gCPS maintains the same performance irrespective of the rotation matrix. These results allow us to conjecture that the proposed mechanism of optimising along the direction of the eigenvectors is effective and that the generation of the covariance matrix is robust.

Comparison Against Other Algorithms

We have compared gCPS against the following three algorithms:

  • Probabilistic Global Search Lausanne (PGSL) [45];

  • Covariance Matrix Adaptation Evolution Strategy (CMAES) [20];

  • Whale Optimisation Algorithm (WOA) [34].

The motivation behind these three competitors is the following: (1) PGSL, like gCPS, is a modern Pattern Search implementation; (2) CMAES is a prevalent algorithm that, like gCPS, is based on theoretical considerations about multivariate distributions and the covariance matrix; (3) WOA is a population-based metaheuristic recently proposed in the literature.

Furthermore, it must be remarked that, as shown in [19], CMAES frameworks display excellent performance on unimodal problems and can be used as a high-performance local search, see [35]. In order to provide further experimental evidence of this fact, and to show that the comparison against CMAES is fair, we report in the Appendix the performance of two local search algorithms against the proposed gCPS. Also, WOA, as shown in [34], is an emerging metaheuristic that can solve unimodal problems more efficiently than a large number of the metaheuristics used for comparison therein.

We used the parameters suggested by the authors of the original papers/software: PGSL has been run with 20 intervals, 6 sub-intervals, and scale factors 0.96 and \(n^{-\frac{1}{n}}\), respectively (see [45]); CMAES has been run according to the implementation in [18] and the default parameters therein (initial \(\sigma\) set to one third of the domain); WOA has been run with 30 agents and for \(334 \times n\) iterations (to make the budget equal to that of gCPS). Numerical results with fixed and random rotation are displayed in Tables 5 and 6, respectively.

Table 5 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1: greedy Covariance Pattern Search (gCPS) with gCPS reference for Wilcoxon vs Probabilistic Global Search Lausanne (PGSL) [45], Covariance Matrix Adaptation Evolution Strategy (CMAES) [20], and Whale Optimisation Algorithm (WOA) [34]
Table 6 Average error avg ± standard deviation \(\sigma\) over 51 runs for the problems listed in Table 1 subject to random rotation at each run: greedy Covariance Pattern Search (gCPS) with gCPS reference for Wilcoxon vs Probabilistic Global Search Lausanne (PGSL) [45], Covariance Matrix Adaptation Evolution Strategy (CMAES) [20], and Whale Optimisation Algorithm (WOA) [34]

Numerical results show that the proposed gCPS outperforms PGSL and WOA consistently for all the problems under consideration. It must be observed that while gCPS is designed as a “precision tool” to handle unimodal problems with rotation, WOA is a global optimiser meant to detect a solution close to the optimum in multi-modal scenarios. In this sense, we expected gCPS to outperform WOA. The comparison between gCPS and PGSL is more interesting. PGSL contains a pattern search logic with the same search directions as PS (the directions of the variables) and various other elements integrated to improve its performance. We observe that, for the problems under examination, the simple exploratory logic of gCPS appears more effective than the more sophisticated logic of PGSL. This fact confirms that the selection of proper search directions is fundamental in GPS.

Finally, the comparison of gCPS vs CMAES has been included with the deliberate intent of emphasising the limitations and potential of the proposed approach. For about half of the problems gCPS is competitive with or outperforms CMAES, whilst for some other problems CMAES achieves results that are orders of magnitude better than those of gCPS. This happens especially in cases of steep gradients, such as the ellipsoid functions \(f_2\) and \(f_3\). The superiority of CMAES for these problems highlights an area for improvement of gCPS: unlike CMAES, which has an ingenious adaptive mechanism, CPS employs a naive search operation based on the same step size in all the directions. The latter is likely not suited for highly ill-conditioned problems. Nonetheless, the proposed logic can be exported to other schemes, and more sophisticated operators, e.g. an adaptive search radius, can improve upon the current performance.

An example of performance trends for all the algorithms involved in this study is illustrated in Fig. 13.

Fig. 13 Performance trend (logarithmic scale) for the Bent Cigar \(f_4\) in 10 dimensions

To further strengthen the statistical analysis of the presented results, we performed the Holm-Bonferroni procedure [22] for the five algorithms and thirty-six problems (6 objective functions \(\times 3\) levels of dimensionality \(\times 2\) variants of rotation) under consideration. The results of the Holm-Bonferroni procedure are presented in Table 7. The Holm-Bonferroni procedure consists of the following. A score \(R_j\) for \(j = 1,\dots ,N_A\) (where \(N_A\) is the number of algorithms under analysis, \(N_A = 5\) in this paper) has been assigned. The score has been assigned in the following way: for each problem, a score of 5 is assigned to the algorithm displaying the best performance, 4 is assigned to the second best, 3 to the third, and so on. For each algorithm, the scores obtained on each problem are summed up and averaged over the 36 test problems. With the calculated \(R_j\) values, gCPS has been taken as the reference algorithm: \(R_0\) indicates the rank of gCPS, and \(R_j\) for \(j = 1,\dots ,N_A-1\) the ranks of the remaining four algorithms. Let j be the index of an algorithm; the values \(z_j\) have been calculated as:

$$\begin{aligned} z_j = \frac{R_j - R_0}{\sqrt{\frac{N_A(N_A+1)}{6N_{TP}}}} \end{aligned}$$

where \(N_{TP}\) is the number of the 36 test problems. By means of the \(z_j\) values, the corresponding cumulative normal distribution values \(p_j\) have been calculated, see [15]:

$$\begin{aligned} p_j=\frac{2}{\sqrt{\pi }}\int _{\frac{-z_j}{\sqrt{2}}}^\infty e^{-t^2}dt. \end{aligned}$$

These \(p_j\) values have then been compared with the corresponding \(\delta /j\), where \(\delta\) is the significance level, set to 0.05 in this case. Table 7 displays the ranks, \(z_j\) values, \(p_j\) values, and corresponding \(\delta /j\) obtained. Moreover, it is indicated whether the null-hypothesis (that the two algorithms have indistinguishable performance) is “Rejected”, i.e. the algorithms have statistically different performance, or whether the test “Failed to Reject” it, i.e. it could not establish a difference in performance (one algorithm does not outperform the other).
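The procedure can be sketched as follows; the ordering of the competitors by increasing p-value and the dictionary-based interface are our own reading of the description above, not a canonical implementation.

```python
from math import erfc, sqrt

def holm_bonferroni(avg_ranks, ref="gCPS", n_problems=36, delta=0.05):
    """Sketch of the Holm-Bonferroni procedure described above.
    avg_ranks maps each algorithm name to its average rank R_j over the problems."""
    n_a = len(avg_ranks)                              # number of algorithms (5 here)
    r0 = avg_ranks[ref]                               # rank R_0 of the reference gCPS
    se = sqrt(n_a * (n_a + 1) / (6 * n_problems))
    z = {name: (r - r0) / se for name, r in avg_ranks.items() if name != ref}
    rows = []
    # order the competitors by increasing p-value (i.e. by increasing z_j)
    for j, (name, z_j) in enumerate(sorted(z.items(), key=lambda kv: kv[1]), start=1):
        p_j = erfc(-z_j / sqrt(2))                    # p_j as defined in the text
        verdict = "Rejected" if p_j < delta / j else "Failed to Reject"
        rows.append((name, z_j, p_j, delta / j, verdict))
    return rows
```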

Table 7 Holm-Bonferroni Procedure with gCPS as reference (Rank 4.1944e+00)

The results of the Holm-Bonferroni procedure in Table 7 show that gCPS and CMAES have better performance than the other algorithms and that gCPS significantly outperforms PGSL, gPS, and WOA for the 36 problems under consideration. On the other hand, the ranking values of gCPS and CMAES are very similar, with slightly better performance for gCPS. However, the Holm-Bonferroni procedure shows that, statistically, gCPS and CMAES have indistinguishable performance.

Limitations of Covariance Pattern Search and Future Developments

Although gCPS and the concept of Covariance Pattern Search appear to be extremely promising, we acknowledge that there are opportunities for improvement. Hence, this section highlights the limitations of the proposed approach and some ideas on how the research can be moved forward.

First, we wish to comment on a feature of gCPS rather than a strict limitation: gCPS works as a local search algorithm and thus may be inefficient on its own when addressing multimodal problems. This is of course a limitation of all local search algorithms and can be overcome by integrating gCPS within a global search framework, see [39, 40], or by simply using a resampling mechanism to periodically restart the search in an unexplored area of the decision space, see [6].

A second issue is that the performance of Covariance Pattern Search depends on the threshold parameter thr, see Algorithm 5. This parameter should allow the generation of a data set that describes the geometry of the problem. With reference to minimisation, a value that is too low would result in a small or empty data set, while a value that is too high would generate a data set that does not contain the geometrical features of the basin of attraction. Furthermore, this parameter is problem dependent, as it depends on the specific objective function values. This means that, to ensure high performance of gCPS, a tuning of this parameter is currently performed. Besides being inelegant, this procedure can also be computationally expensive in high-dimensional domains.

The third limitation that we identified (especially when we compared the performance of gCPS against CMAES) is in the exploratory radius (step size) and the way it is handled. The current naive implementation, which uses the same radius along all the exploratory directions, is inefficient in the case of high ill-conditioning. Although the preliminary analysis allows the detection of search directions with very different gradients, as shown in Figs. 9 and 10, this information is not fully exploited during the optimisation. Qualitatively, we may say that we would like small steps when the gradient is high and large steps when it is low. Further work will investigate the use of gradient-based and adaptive step sizes.

Finally, further work will be carried out to integrate constraint handling techniques, such as the use of external penalties, within the proposed gCPS.

Conclusion

This paper proposes an implementation of Generalised Pattern Search (GPS) in which search directions are given by the eigenvectors of the covariance matrix associated with the distribution of points whose objective function value falls below a prearranged threshold.

The theoretical analysis carried out in this study shows that the proposed algorithm, namely Covariance Pattern Search, is always applicable and converges to a null-gradient point under the same conditions as GPS. The conjecture that the eigenvectors enable fast convergence is confirmed by the experimental results, which show a clear superiority of Covariance Pattern Search with respect to its standard version. It should be observed that the proposed directions (eigenvectors) allow the identification of a high-gradient direction, which enables quick improvement upon an initial point.

Numerical results show that the specific implementation of Covariance Pattern Search proposed in this article outperforms, on a set of rotated problems, modern algorithms based on pattern search and swarm intelligence.