1 Introduction

Over the past few decades, global optimisation techniques for solving combinatorial problems have flourished. The ever-increasing complexity of engineering applications means that variable sets are growing larger, and the subsequent landscapes that need to be explored by these optimisation problems are becoming increasingly complicated.

Many metaheuristic algorithms were developed to solve optimisation problems by mimicking biological and physical analogies (Ser et al. 2019), including algorithms such as genetic algorithms (GAs) (Holland 1992), particle swarm optimisation (PSO) (Kennedy and Eberhart 1995) and the recent generalized version (GEPSO) (Sedighizadeh et al. 2021), and harmony search (HS) (Geem et al. 2002) and its latest modifications such as the Geem and Sim (2010), Shaqfa and Orbán (2019) and Jeong et al. (2020), as well as the recent whales optimisation algorithm (WOA) (Mirjalili and Lewis 2016), and pathfinder algorithm (PFA) (Yapici and Cetinkaya 2019), to mention but a few. The candidate problems usually range from continuous differentiable problems to discrete, noisy, and even loosely defined objectives, such as in engineering applications.

Sequential sampling was used intensively in the second half of the last century for estimating the distribution of unknown functions, and the influence of such methods is still highly echoed in current engineering metamodels (Jin et al. 2002). As the purpose of this paper is not to review all the optimisation algorithms that employed sequential sampling techniques, we advise the readers to refer to (de Mello and Bayraksan 2014) where they can find a comprehensive review for algorithms used the Monte Carlo sampling approach for solving optimisation problems. Holistically, metaheuristics can be seen as performance-driven sequential sampling processes (Markovian processes) (Yang et al. 2013), where the efficiency of such algorithms depends on the random walks that are used to explore and exploit the provided landscapes. Many random walks have been proposed in the literature, such as Brownian motion (obeys Gaussian distribution) and the Lévy flights (obeys Lévy distribution) for handling stochastic optimisation problems (see Yang et al. 2013; Yang 2009 for comparison).

Estimation of distribution algorithms (EDAs) is another rank of search methods that tries to obtain better solutions from the search domain by estimating the probability density function (PDF) over the whole landscape from the so-far sampled solutions (see Dai et al. 2019 for a detailed review). One relatively recent method that has been used by many engineering problems was proposed by Raphael and Smith (2003). In their method, they subdivide the search domain into a fixed number of sub-domains and depending on the fitness of the collected solutions the PDF of the domain evolves over time and the solutions are eventually focused around the most promising regions.

Generally, most current memetic algorithms are population-based and heavily dependent—as an initial step—on crude random sampling. This initial population step is crucial for the overall performance of the algorithms, affects all subsequent search phases (Ser et al. 2019), is independent of the objective function and is considered an unbiased step of metaheuristics. Said otherwise, the quality of the chosen solutions in the initial population depends only on the design-of-experiment (DOE) methods used (space-filling techniques). Such methods assure equal accessibility to each part of the landscape. A plethora of publications dealt with the influence of the initial sampling and the population size on the overall performance of metaheuristics (such as in Polkov and Bujok 2018). For the here proposed algorithm, we use at every iteration DOE methods for sampling the new generation, as this guarantees simplicity and minimum structural bias of the algorithm.

The algorithm that we propose in this paper builds on the gbest topology that was originally used in the PSO algorithm by Eberhart and Kennedy (1995). This simple and conservative topology prevents the algorithm from exploiting too much information from the candidates, which could drag the algorithm into a structural bias (see Kononova et al. 2015 for definition) or even a premature convergence in late stages. In this work, we demonstrate how this simple algorithm can perform surprisingly well for complex optimisation problems, sometimes outperforming the existing state-of-the-art metaheuristics.

We further postulate our motivation behind this algorithmic proposal where we will use it as a heuristic for solving physical packing problems. The chosen topology of this algorithm was tailored to analogise the exact mathematical solvers used in packing regular items (boxes) in containers. Those solvers use the domain-reduction techniques of the available packing domain as we add more items to the container; some areas become more occupied than others and exploring the search domain of such areas is not computationally efficient. However, this algorithm will be used for packing physical items that are highly irregular and nonconvex in containers. Such complex geometries make the use of traditional methods impractical. For a comprehensive review about the techniques of domain-reduction in mathematics and optimisation problems we refer the reader to Puranik and Sahinidis (2017) and Caprara and Locatelli (2010) and some examples such as Paulen et al. (2013) and for the packing optimisation problems literature see Martello and Toth (1990) and Carlier et al. (2007).

Fig. 7
figure 1

The topology of Pareto-like sequential sampling (PSS) algorithm

Our algorithm uses only the best solution from the previous generation. This best solution will be the center of the tightened search domain, which is referred to as the current promising region. The size of the current promising region will be a function of time–the more optimisation steps that have already been completed, the smaller the promising region will be. The boundaries of the prominent region are only updated if a better solution than any previously obtained solution has been found. We sample inside and outside this region using a standard DOE method; in this paper we use Monte-Carlo sampling, though any other DOE method could also be used such as the Latin hypercube sampling (LHS) method (see Owen 1992; Tang 1993; Ye 1998).

The chosen topology in this paper could be good for global exploration, as it maintains the population diversity, i.e., avoids premature convergence. Conversely, it could be problematic for exploiting local solutions where this process normally depends on the intensified information flow among the members of the population. Thus, this algorithm might be best hybridised in sequential or parallel schemes with other well-known algorithms. For more about topologies in metaheuristics, we refer the readers to Lynn et al. (2018) as they comprehensively review the common algorithmic topologies used in the literature. However, in this paper, we use the proposed algorithm as a standalone metaheuristic, with its simple topology, when benchmarking the algorithm against a range of problems and engineering applications.

The renowned no free lunch (NFL) theorems by Wolpert and Macready (1997) have proven the non-uniformity of the superiority of algorithms for all ranks of optimisation problems. In this spirit, we propose that our algorithm be used as template, which can be combined with various DOE methods and other optimisation algorithms.

This work has been organized such that Sect. 2 explains the analogy, the topology and the mathematical formulation of this algorithm. Section 3 provides a simple probabilistic analysis of the parameter settings, including a numerical illustration. In Sect. 4, we test and compare the performance of this simplified approach with the state-of-the-art algorithms. Finally, our conclusions are provided in Sect. 5, some future applications for this approach are given in Sect. 6 and links to the source code of the algorithm are provided in Sect. 7.

2 Pareto-like sequential sampling (PSS)

2.1 Analogy

As mentioned earlier, the main analogy of this model is to mimic the Pareto principle, which implies that a “virtual few” of a population account for the majority of the effects (Juran and Gryna 1988; Pareto 1897). The Pareto principal is popularly referred to as the 80/20 rule. This principle has been applied to a wide range of human relations and can be observed when a small percentage of contributors make the bulk impact on human-related matters. Historically, this 80/20 principle was first observed and described by the Italian professor Vilfredo Pareto from the University of Lausanne (UNIL), Switzerland (Pareto 1897).

We here employ the Pareto principle by more densely sampling the solutions’ design variables from a tightened search domain, i.e., the current prominent region, while sampling the remainder of the solutions from the overall search domain.

In order to better understand the collective behavior considered in our analogy, one out of many possible examples, is by letting us imagine a selected group of people were appointed to discuss a certain social problem in a local community to find a proper solution. First, the problem will be communicated to each one of them individually and each one of them will have an initial opinion about what the solution could look like. However, these appointees, as representatives for the society, must sit together and meet regularly in order to reach a consensus on the optimal solution. As the appointees regularly meet, every one of them must express her/his own opinion, perspective, and exchange ideas of different solutions. Every meeting the majority of these individuals will group behind certain revealing ideas that could be better to settle down the problem and try even to push it further by combining different solutions or sub-solutions from other individuals to reach an optimal solution. While the rest will always try to find better and optimal ideas that could convince the majority to shift to their sides. In this particular example, the appointees are the population of the solutions as they start randomly at one point (before the meetings and when they got informed) and then they refine their ideas every time they meet (number of iterations).

2.2 Topology

In this paper, we achieve the proposed analogy using mere performance-driven sequential sampling (see Fig. 1) to create better generations every iteration. We do this in a way that is compatible with any sampling method, i.e., classical DOE methods, such as the traditional Monte Carlo (MC) and Latin hypercube sampling (LHS), though in this paper we only use the MC sampling approach. Unlike traditional evolutionary algorithms (EAs), this algorithm does not use operations such as mutation or crossover to obtain better generations.

The process builds on the assumption that the current best solution vector, \({\varvec{x}}_\mathrm{best}\), lays in a sub-domain where the majority of the good features exist (components of \({\varvec{x}}\)). We refer to this sub-domain as the prominent region \(\varOmega ^{'}\). Hence, we sample the majority of the features, approximately \(\alpha \times 100\%\), from the surrounding neighborhood. This topology of keeping track of only \({\varvec{x}}_\mathrm{best}\) each iteration is similar to the traditional gbest topology proposed by Eberhart and Kennedy (1995).

The location and the size of the prominent domain changes every time the \({\varvec{x}}_\mathrm{best}\) changes. The size of the prominent domain (bandwidth), as will be described later, depends on the time remaining for the algorithm to make improvements and the predetermined size of the analogy \((1-\alpha )\). The size of the prominent region is set to a maximum of \((1-\alpha )\%\) of \(\varOmega \), centered about \({\varvec{x}}_\mathrm{best}\), where \(\alpha \) is the acceptance probability that allows us to sample the components of the solutions from the prominent neighbourhood. The closer the algorithm gets to the maximum number of iterations, i.e., the shorter the time left for the algorithm to make improvements, the smaller the size of the prominent region.

This algorithm explores and exploits the domain by giving special emphasis to the prominent region around the current best solution. At each iteration, the algorithm performs the following steps:

  • Determine the new best fitness \({\varvec{x}}_{~\mathrm{best}}^{~i}\),

  • Update the prominent region \({\varvec{\eta }}^{~i}\) if \(~{\varvec{x}}_{~\mathrm{best}}^{~i}\) is better than \({\varvec{x}}_{~\mathrm{best}}^{~i-1}\),

  • Sample from the search domain using classical DOE methods; sample more densely from the prominent regions \({\varvec{\eta }}^{~i}\) and less densely from the overall search domain.

2.3 Mathematical model

Let the real-valued optimization problem \(f({\varvec{x}})\) be defined on the search domain \(\varOmega \in \mathrm{I\!R}^{n}\) and for all sub-domains \(^{i}\varOmega ^{'} \subset \varOmega \). The corresponding boundaries are \({\varvec{\Gamma }}\) and \(^{i}{\varvec{\Gamma ^{'}}}\) for the main domain and the sub-domain, respectively.

$$\begin{aligned}&\text {optimize}~f({\varvec{x}}), ~{{\varvec{x}}} \in \varOmega , \varOmega \subset \mathrm{I\!R}^{n},~ \text {where} \\&{\varvec{x}} = \{x_1, \dots , x_n\},~ {\varvec{\Gamma }}^{-}\le {\varvec{x}} \le {\varvec{\Gamma }}^{+}.\nonumber \end{aligned}$$
(1)

More specifically, the lower and the upper bounds for the main domain and the sub-domain can be written as \([{\varvec{\Gamma }}^{-}, {\varvec{\Gamma }}^{+}\)] and \([ ^{i}{\varvec{\Gamma }}^{'-}, ^{i}{\varvec{\Gamma }}^{'+}]\), respectively. Additionally, let there be a solution vector \({\varvec{x}}_\mathrm{opt}\) that reclines in \(\varOmega ^{'}\) and minimizes f(.), where \(f({\varvec{x}}_\mathrm{opt}) \le f({\varvec{x}})~ \forall ~ {\varvec{x}} \in \varOmega \) is a global solution. For clarity, the following notations will be used in this work: i is the current time step (iteration), k is the index of the solution vector in the population matrix and j indicates the \(j^{th}\) design variable (feature \(x_j\)) of any solution vector \({\varvec{x}}\).

The process of random feature selection depends either on the algorithm sampling a decision variable \(x_{j}^{i} \in {\varvec{x}}_j\) from the prominent domain \(\varOmega ^{'}\) or improvising and sampling randomly from the entire region \(\varOmega \). This decision is dependent on the acceptance probability \(\alpha \).

Like any other metaheuristic algorithm, the first step is the initial population sampling. For a continuous landscape, the population can be composed by:

$$\begin{aligned} _{k}{\varvec{x}} = {\varvec{\Gamma }}^{-} +~ _{k}{\varvec{u}} \odot ({\varvec{\Gamma }}^{+} - {\varvec{\Gamma }}^{-}). \end{aligned}$$
(2)

In this equation, \(_{k}{\varvec{u}}\) is a vector of random coefficients sampled using one of the classical DOE methods. The \(\odot \) operator indicates an element-wise multiplication of vectors.

The matrix \(\mathbf {u}^{i}\) is sampled at each iteration i with the chosen DOE method. It contains the sampled random coefficients; its size is \(\beta \times n\) where \(\beta \) is the size of the population and n is the number of dimensions of the problem.

$$\begin{aligned}{}[\mathbf {u}^{i}] = \begin{bmatrix} _{1}u_{1}^{i} &{} _{1}u_{2}^{i} &{} \dots &{} _{1}u_{n}^{i} \\ _{2}u_{1}^{i} &{} _{2}u_{2}^{i} &{} \dots &{} _{2}u_{n}^{i} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ _{\beta }u_{1}^{i} &{} _{\beta }u_{2}^{i} &{} \dots &{} _{\beta }u_{n}^{i} \\ \end{bmatrix}. \end{aligned}$$
(3)

After evaluating the initial population and determining the best solution \({\varvec{x}}_{~best}^{~i}\), the algorithm estimates the domain-tightening coefficients to update the current prominent upper and lower bounds for each design feature as per the following expressions:

$$\begin{aligned}&^{i}{\varvec{\Gamma }}^{~'+} = {\varvec{x}}_{~best}^{~i} + {\varvec{\eta }}^{~i},~ ^{i}{\varvec{\Gamma }}^{~'+} \le ~ {\varvec{\Gamma }}^{+}, \end{aligned}$$
(4)
$$\begin{aligned}&^{i}{\varvec{\Gamma }}^{~'-} = {\varvec{x}}_{~best}^{~i} - {\varvec{\eta }}^{~i},~ ^{i}{\varvec{\Gamma }}^{~'-} \ge ~ {\varvec{\Gamma }}^{-}, \end{aligned}$$
(5)
$$\begin{aligned}&{\varvec{\eta }}^{~i} = \frac{(1-\alpha )\Big ( 1 - \frac{i}{\gamma }\Big )}{2}~ ({\varvec{\Gamma }}^{~+} -~ {\varvec{\Gamma }}^{~-}). \end{aligned}$$
(6)

The bandwidth (\({\varvec{\eta }}^{~i}\)) is updated every time \({\varvec{x}}^{~i}_{~best}\) is updated (self-adaptive mechanism). As can be seen in (6), we applied time-dependent bandwidth tightening using the term \(\big (1-\frac{i}{\gamma }\big )\). This controls the greediness of the algorithm with time: as time passes, the algorithm becomes greedier. The \((1-\alpha )\) term in (6) is used to maintain the \(\alpha / (1-\alpha )\) ratio (resembling the 80/20 analogy). To control the number of features (design variables) that will be sampled per solution from the prominent domain \(\varOmega ^{'}\), the acceptance probability \(\alpha \) has been used.

If the algorithm decides to draw from the prominent domain \(\varOmega ^{'}\), the set of Eqs. (4) to (6) together with (7) are used to generate a feature as follows:

$$\begin{aligned} _{k}{x}_j^i =~ ^{i}_{j}{\Gamma }^{'-} +~ _{k}{u}_j^i ~(^{i}_{j}{\Gamma }^{'+} -~ ^{i}_{j}{\Gamma }^{'-}). \end{aligned}$$
(7)

If the algorithm decides to draw a feature from the overall domain \(\varOmega \), then it will instead be evaluated using:

$$\begin{aligned} _{k}{x}_j^i =~ _{j}{\Gamma }^{-} +~ _{k}{u}_j^i ~(_{j}{\Gamma }^{+} -~ _{j}{\Gamma }^{-}). \end{aligned}$$
(8)

This recursive process runs until the stopping condition is satisfied, which we have set to the maximum number of evaluations \(\gamma \). Algorithm 1 explains the steps of the algorithm.

figure a

3 Parameters setting

The no free lunch (NFL) theorem directly promotes that no algorithm has a global parameters set that ensures an overall good performance for all the possible optimisation problems (Arcuri and Fraser 2013). In line with this, we provide a comprehensive theoretical and visual explanation of the parameter-setting (tuning) for the PSS algorithm. A survey about traditional tuning techniques can be found in Eiben et al. (1999). The PSS algorithm is based on three input parameters that need to be set by the user. Namely, the population size (\(\beta \)), the maximum number of iterations (\(\gamma \)) and the acceptance probability (\(\alpha \)). The taxonomy and terminology used in this section are taken from Eiben et al. (1999). The derived formulae here follow probabilistic rules that can be found in standard textbooks about the theory of probability (for example see Capiński and Zastawniak (2001)). In this section, we provide an analytical investigation of the interaction between the three parameters and illustrate their effects on finding the global optimum by means of a numerical example. Apart from this, the self-adaptive mechanism for the prominent domain will not be discussed here (see Sect. 2.3).

Fig. 8
figure 2

A virtual 1D optimization objective \(f({\varvec{x}})\) – discretized space

Let f(x) be a 1D discrete optimization problem with N finite possible solutions that we need to minimize. Figure 2 shows the distribution of x along the domain \(\varOmega \). Now, let \(\varOmega ^{'}\) be the trueFootnote 1 prominent domain around the optimum solution \(x_\mathrm{opt}\). The red-dashed line, shown in Fig. 2, represents the lowest valley of all local minima, wherein all the points below this line are better than any local optima above it. If any solution below this line is chosen, it will always lead to the global solution in the coming generations by using the gbest topology. Assuming that \(n^{'}\) is the number of all possible points below the line \(\in \varOmega ^{'}\) and N is the number of all possible solutions \(\in \varOmega \), the probability of finding a solution inside the true prominent domain in the first step (initial population sampling) is:

$$\begin{aligned} P(x \in \varOmega ^{'}) = \dfrac{n^{'}}{N}. \end{aligned}$$
(9)

We can expand (9) to determine the probability of drawing at least one solution that lays in \(\varOmega ^{'}\) (event \(A_0\)) from \(\beta \) drawn solutions:

$$\begin{aligned} P(A_0) = 1 - \bigg (1 - \dfrac{n^{'}}{N}\bigg )^{\beta }. \end{aligned}$$
(10)

Equation (10) explains the exponential relationship between the landscape size and the initial population size. The probability of event \(A_0\) is a problem-specific property that differs per objective and independent design variable. The complementary event to \(A_0\) will be called \(B_0\), wherein \(P(B_0) = 1 - P(A_0)\) and \(P(B_0)\) means none of the drawn \(\beta \) solutions initially is located in the true prominent region.

Fig. 9
figure 3

The population history for Case (a)—2D Schwefel function

After the population initialization, the algorithm performance phase begins. In this phase, the algorithm tests the acceptance-rejection paradigm against \(\alpha \) for each solution component (\(\forall x^i_j \in {\varvec{x}}^i\)). If a random number \(r^i\sim U(0,1) \le \alpha \), the algorithm will sample the design component from the prominent region, else it will be sampled from the overall region. The probability of exploring new solutions \(\in \varOmega ^{'}\) from the domain is defined in Eq. (11). This equation is independent of the time step, i, and it is always constant if the population size \(\beta \) is constant for a specific optimization problem and an independent design variable.

$$\begin{aligned} P^i\big (x\in \varOmega ^{'}|r^{i} > \alpha \big ) = \left[ 1 - \bigg (1 - \dfrac{n^{'}}{N}\bigg )^{\beta } \right] (1-\alpha ). \end{aligned}$$
(11)

Now if the algorithm were to intensificate (success against \(\alpha \)), the probability of getting a better solution within the current fictitious prominent region is shown in the following:

$$\begin{aligned} P^i\big (x^{i+1}_\mathrm{best} < x^{i}_\mathrm{best}|r^{i} \le \alpha \big ) = \left[ 1 - \bigg (1 - \dfrac{n_{t}^{i}}{N_{t}^{i}}\bigg )^{\beta } \right] \alpha . \end{aligned}$$
(12)

Notice that Eq. (12) is time-dependent. The terms \(n_t^{i}\) and \(N_t^{i}\) by definition depend on the sub-region (the current fictitious prominent region), and they are a function of time (notice the i superscription). This can be imagined as if the red-dotted line corresponds to the current best solution and the algorithm tries to find a better sub-optimal. Put differently, it implies—only in this algorithm—that the intensification property is time-dependent and it resembles a perturbation problem.

Fig. 10
figure 4

Domain tightening and convergence histories—2D Schwefel function

In real life, the algorithm is not aware of the true prominent domain. Indeed, the only parameters known to the algorithm are the domain and the objective function (as an evaluator). We assume that the best-known solution so far lays in the prominent domain and hope that the algorithm’s exploration will find a better one in the upcoming generations (if any).

If we have an optimization problem with n dimensions where \(n > 1\), we always assume independent design variables. For expanding the reflections made above for n design variables, the probability of obtaining an optimal solution can be calculated using the multiplication rule, wherein every design variable is treated as an independent event. This simple probabilistic analysis shows that the algorithm is dependent on the distribution of the objective function, though because the algorithmic parameters are interactive, they must be carefully set. From this analysis, we can see that with a bigger \(\beta \), we boost the probability of obtaining solutions that belong to the true prominent region, i.e., global optimum. With a higher \(\alpha \), the algorithm puts most of its effort into intensification, weakening the diversification. For multimodal and nonconvex problems, \(\alpha \) should be smaller than in convex and unimodal problems. Using more iterations will always increase the probability of exploring better solutions. Finally, it is worth mentioning that we used only MC sampling in the presented analysis. Unless mentioned otherwise, we used a population of 30 and an acceptance probability of 0.95.

Fig. 11
figure 5

a The change of \(P(A_0)\) with \(\beta \); b the diversification probability versus \(\alpha \) in \(\varOmega \); c the intensification probability versus \(\alpha \) in \(\varOmega ^{'}\)nD Schwefel function for \(n=1\), \(n=2\), \(n=4\), \(n=10\), and \(n=100\)

3.1 Numerical illustration

The example shown in Figs. 3 and 4 indicates the solution of the standard 2D Schwefel problem. The optimum solution \(f(\mathbf{x}_\mathrm{opt}) = 0.0\) is obtained when both \(x_1\) and \(x_2\) hit 420.9687 (the red-dotted lines in Fig. 4). A population of 30 members and a sum of 20 iterations were used to solve this problem. For this problem, we set \(\alpha = 0.95\) and 0.70. Figure 4 reveals how the algorithm adaptively changed the prominent search area \(\varOmega ^{'}\) (the gray fill-up) and dynamically tightened it to control the greediness of the algorithm with respect to the remaining time (for the two cases).

According to our definition of the true prominent region for Schwefel function \(\varOmega ^{'} \in [389.33, 452.16]~ \forall ~ x_j\) (determined graphically), we say that the algorithm has globally converged if the algorithm successfully found both \(x_1\) and \(x_2 \in \varOmega ^{'}\). We analyzed 30 consecutive runs for each case, and the converged results are expressed as: average ± standard deviation (success rate %). For Case (a), the results were 0.197345 ± 0.835687 (83.33%), and for Case (b), the results were 2.435370 ± 2.599124 (96.67%). These results highlight the trade-off balance between the diversification and the intensification processes explained in Eqs. (11) and (12). Side by side, Fig. 5a reveals the effect of choosing the population size (\(\beta \)) on the probability of finding better initial solutions (event \(A_0\)). Moreover, Fig. 5b, c explain the global diversification (\(\forall x_i \in \varOmega \)) and the intensification inside the true prominent domain (\(\forall x_i \in \varOmega ^{'}\)) for different dimensions of Schwefel function, respectively.

4 Benchmarks and comparisons

The presented PSS algorithm was used to solve a set of standard functions (Sect. 4.1), the CEC2017 composite functions (Sect. 4.3), and another selected set of engineering problems (Sect. 4.4). We benchmarked and compared the performance of the PSS algorithm with the obtained results from other state-of-the-art algorithms, the results of which were mostly retrieved directly from the source papers and not re-simulated herein. To obtain a meaningful comparison, we used the same number of evaluations per problem as were used by the authors of the other algorithms. This benchmarking approach follows the recommendations by Arcuri and Fraser (2013) and Liao et al. (2015), which were recently reemphasized by Ser et al. (2019) and used by Piotrowski and Napiorkowski (2018).

For the standard benchmarks, we chose state-of-the-art algorithms to compare with and avoided using the classical and outperformed algorithms following the suggestions by Ser et al. (2019) and Molina et al. (2018). The first algorithm we used in the benchmarking was the whale optimization algorithm (WOA) as this is one of the recent and heavily cited algorithms in the literature, which outperformed many other recent algorithmic proposals (see Mirjalili and Lewis 2016). The second one was the pathfinder algorithm (PFA), which has also been published recently and shown to outperform many other algorithms (see Yapici and Cetinkaya 2019).

For engineering benchmarks, we compared the obtained solution by directly adopting the best results obtained from different algorithms in the literature. We also solved a recent engineering case study with high dimensions that was solved by using the modified parameter-setting-free harmony search (MPSFHS) algorithm by Shaqfa and Orbán (2019) and we used the same algorithm to benchmark the new algorithm with regard to its scalability.

Table 1 Standard benchmark functions (refer to Jamil and Yang 2013 for details)

4.1 Standard benchmarks

Here, we compare our proposed algorithm with the WOA (Mirjalili and Lewis 2016) and the PFA Yapici and Cetinkaya (2019) using the standard benchmarks explained in Table 1. We carefully chose the benchmarks in this paper to cover a wide range of problems.

In Table 2, we compared the WOA with our proposed PSS method. The problems were solved by assuming \(\alpha = 0.95\). For the population size and the number of iterations, we chose the same values as in Mirjalili and Lewis (2016), i.e., a population size of 30 and a total of 500 iterations. In general, 30 dimensions were used for all the proposed problems (\(n=30\)) except for \(f_{13}\) and \(f_{15}\) (see Table 1).

Table 2 Comparison with WOA Mirjalili and Lewis (2016)
Table 3 Comparison with PFA Yapici and Cetinkaya (2019)
Table 4 Scalability with PFA Yapici and Cetinkaya (2019)
Table 5 Comparison with MPSFHS Shaqfa and Orbán (2019)

The reported results in Table 2 express the average ± the standard deviation for 25 consecutive runs per problem (as in Mirjalili and Lewis 2016). The boldfaced results indicate—for each problem—the algorithm that performed the best in the comparison. From Table 2, it can be seen that in easy unimodal and convex functions, the best algorithm was WOA. This behaviour was expected due to the weak intensification in the PSS algorithm that the crude random walk usually reveals. For harder problems, such as nonconvex and/or multimodal ones, though, the PSS algorithm was better able to allocate global optima–this finding holds notably for the Schwefel function \(f_{11}\).

In Table 3, we compare the PSS algorithm to the recently proposed PFA algorithm (Yapici and Cetinkaya 2019). As in their work, the problems in this table were simulated using a population size of 30 and 1000 iterations. For the PSS algorithm, \(\alpha \) was set to 0.95 as per the previous comparison. However, the number of dimensions used for \(f_{8}\) was 20, while for \(f_{7}\) \(n=6\), and for \(f_{14}\) and \(f_{16}\), n was set to 2 and 3, respectively, as shown in Table 1. The stated results are for 30 consecutive runs per function (as in Yapici and Cetinkaya 2019).

As can be seen in Table 3, \(f_2\) and \(f_5\) behaved the same as in Table 2 (the pronounced weak intensification for the proposed approach). On the other hand, the results of \(f_{8}\) and \(f_{11}\), see Table 3, were outperformed by the Pareto-like sampling. The results of \(f_{7}\), \(f_{14}\), and \(f_{16}\) are almost identical.

4.2 Testing scalability

With an increased dimensionality, the problem complexity increases, and accordingly, the search domain expands exponentially. To test the scalability of the current algorithm, we compared the behaviour of the proposed approach with the PFA algorithm for \(f_8\) and \(f_{11}\) functions. The simulations were run 30 times for each, with 10, 50, and 100 dimensions as illustrated in Table 4. As the results suggest, the proposed PSS algorithm outperformed the PFA and registered fewer deteriorations in performance with the increase of dimensions.

Table 6 Composite benchmark functions—adopted from CEC2017 Wu et al. (2016). The definitions of the base functions (\(F_N\)) are defined in CEC2017
Table 7 CEC2017 composite functions tested with \(n=2\) by using only 10 iterations (only for the PSS, PFA, WOA, and PSO and equivalent evaluations for the MPSFHS) for \(f_{c,1} \rightarrow f_{c,8}\) – reported the error values here \(\big (f_{c,i}({\varvec{x_{i}}}) - f_{c,i}({\varvec{x}}_\mathrm{opt})\big )\)
Table 8 CEC2017 composite functions tested with \(n=100\) by using only 100 iterations for both the PSS and PFA algorithms on \(f_{c,1} \rightarrow f_{c,10}\)—reported the error values here \(\big (f_{c,i}({\varvec{x_{i}}}) - f_{c,i}({\varvec{x}}_\mathrm{opt})\big )\)
Table 9 Composite functions comparison with PFA algorithm Yapici and Cetinkaya (2019)—reported here the error values \(\big (f_{c,i}({\varvec{x_{i}}}) - f_{c,i}({\varvec{x}}_\mathrm{opt})\big )\)

As this test implies that increased dimensionality requires more computational capacity, we ran simulations with the recently modified parameter-setting-free harmony search algorithm (MPSFHS) and the proposed PSS approach to see how increasing the iterations could enhance the results. To do this, \(f_{8}\) and \(f_{11}\) were solved for \(n=100\) and a total of 300, 000 evaluations (10, 000 iterations with a population size of 30 for PSS). Table 5 illustrates the results of 30 consecutive runs for each algorithm. The PSS approach outperformed the MPSFHS algorithm, and considerable enhancements were seen in the end results.

4.3 Composite benchmarks

In this section, we used the CEC2017 by Wu et al. (2016) and reviewed by Molina et al. (2018), competition’s composite functions to benchmark the behaviour of the proposed algorithm (PSS). The used functions are briefly described in Table 6. We first tested our algorithm against these functions under extreme cases where only a few iterations are allowed. In Table 7, we ran 30 consecutive tests with \(n = 2\), \(\alpha = 0.95\), \(\beta = 30\), and with allowing a total of 10 iterations per run for the composite functions \(f_{c,1} \rightarrow f_{c,8}\). We conducted a comparison under the same conditions for the PSS, PFA, WOA, and PSO algorithms. The MPSFHS was tuned to \(HMS = 30\) \(m=2\), \(HMCR_i = 0.75\), \(PAR_i = 0.15\), \(HMCR_\mathrm{max} = 0.99\), \(PAR_\mathrm{min} = 0.05\), and 300 iterations to have an equivalent number of evaluations as the PSS, PFA, WOA, and PSO (for more about the used parameter-settings refer to Shaqfa and Orbán (2019)).

The results shown in Table 7 suggest that the PSS algorithm can still be globally convergent and allocate global minima in most of the runs and it scales well with the available time (iterations). Indeed it scored the best results for most functions, though, the gap between the PSS and the PFA algorithms was small. The best runs are revealed in Fig. 6 showing two iteration-milestones \(i = 5\) and \(i = 10\). Notice that the reported results in Table 7 are expressed in terms of the error values and computed as (\(f_{c,i}({\varvec{x_{i}}}) - f_{c,i}({\varvec{x_\mathrm{opt}}})\)) and we revealed the minimum, maximum, mean, median, and the standard deviation of the error values as recommended by the CEC2017 report (refer to Wu et al. 2016).

In another extreme case we tested the functions \(f_{c,1} \rightarrow f_{c,10}\) but this time with \(n=100\) and by only employing 100 iterations. In this experiment, we only compared the PSS with the PFA algorithm. The shown results in Table 8 suggest that the gap between the PSS and the PFA widens. Indeed, the PSS outperformed the PFA in all the functions except \(f_{c,2}\) and \(f_{c,7}\). These results signify the capability of the PSS algorithm to scale with high-dimensional problems while exploiting very limited computational resources. This is quite important for complex surrogate models that require a significant computational capacity for each functional evaluation (Forrester and Keane 2009; Queipo et al. 2005).

Fig. 12
figure 6

The convergence of composite functions using \(\alpha = 0.95\), \(\beta = 30\), and \(\gamma = 10\)—illustrated the best runs for \(f_{c,1} \rightarrow f_{c,4}\) where \(n=2\).

Fig. 13
figure 7

Continued.

We also conducted simulations similar to the ones presented by Yapici and Cetinkaya (2019) and again we compared our results with the adopted ones directly from their paper. The tests were conducted with different dimensions for each function; we used \(n=10\), \(n=30\), and \(n=50\). 50 runs with 1000 iterations per each were used to run all the tests. \(\alpha \) and \(\beta \) were also set to be 0.95 and 30, respectively. The results of both algorithms, shown in Table 9, are in proximity to each other for most functions. However, in their paper they compared the performance of the PFA with other algorithms and the best performance was registered by the effective butterfly optimizer (EBO) (Kumar et al. 2017) and the enhanced version of the success-History based adaptive differential evolution LSHADE-cnEpSin algorithm (Kumar et al. 2017) and they outperformed, though the results are not shown here (refer to Yapici and Cetinkaya 2019), the PFA and the PSS as those algorithms (EBO and LSHADE-cnEpSin) were specifically designed to solve the CEC2017 problems.

Fig. 14
figure 8

The statistical distribution of the obtained optima for a set of composite functions, with 50 runs per each, computed at four iteration-milestones (\(i=50\), \(i=100\), \(i=500\), and \(i=1000\)) with a semi-log scale (on the y-axis). (insets), reveal the actual log-log scale convergence curves for the all 50 runs

Fig. 15
figure 9

The statistical distribution of convergence data of the composite function \(f_{c,4}\) for \(n=10\), \(n=30\), and \(n=50\) at four different iteration-milestones (\(i=50\), \(i=100\), \(i=500\), and \(i=1000\)). The right column shows the actual convergence curves in log-log scalef corresponding to \(n=10\), \(n=30\), and \(n=50\)

Fig. 16
figure 10

a The cantilever beam design problem; b the train gears design problem (a modified figure from Mirjalili 2015); c the three-bar truss design problem

Fig. 17
figure 11

Case study (1) retrieved from Shaqfa and Orbán (2019)—best reinforcement details by PSS

The CEC2017 report indeed recommends to measure the performance of the algorithms at several iteration-milestones to monitor the convergence behavior. However, the reported error values in terms of the mean and standard deviation does not reveal the real distribution of the data. For this purpose, we used the raincloud plots by Allen et al. (2019) to monitor the convergence at several iteration-milestones. In Fig. 7, we evinced a sample of the convergence data distribution at different milestones \(i = 50\), \(i = 100\), \(i = 500\), and \(i = 1000\) and for problem dimensions \(n = 30\) and \(n = 50\). As can be seen in the figure the results are widely-distributed at the beginning of the performance stage (notice data at \(i = 50\)); at later iterations relatively narrower distributions can be identified. It can be noticed as well that some distributions have two peaks and this could be related to the corresponding topology of the function as they contain many attractive local minima that could trick the algorithm into it (for example see \(f_{c,7}\) in Fig. 7 and the corresponding 2D surface in Fig. 6).

The raincloud figures help to visualize how frequent the algorithm could be trapped in local minima and how often it converges globally. This is more clear in lower dimensions, for instance, when \(n = 10\) as shown in Fig. 8 three different distributions can be noticed as each one of them represents a possible local minima. The consistency of the algorithm can be seen as it generates more iterations it approaches global answers without being staggered at local minima. Moreover, the dimensional discrepancy can be clearly distinguished on the same figure for all the runs.

4.4 Engineering benchmarks

4.4.1 Traditional engineering benchmarks

In this section, the PSS was tested and compared with some of the traditional constrained engineering design problems, including: (i) the cantilever beam design (Fig. 9a), (ii) the train gears design (Fig. 9b), and (iii) the three-bar truss design (Fig. 9c). The reader is referred to Yapici and Cetinkaya (2019) and Mirjalili (2015) for the formulations of these problems.

Table 10 Comparisons of solutions to engineering problems

The results of the three problems are shown in Table 10. For the first problem, we obtained an almost-identical solution to the PFA algorithm. However, Sharma and Sah recently obtained an even better optimum solution of 1.32545 with the novel hybrid algorithm m-MBOA (Sharma and Saha 2019). Another recently published article obtained an optimum solution of 1.330565414 by (Zhao et al. 2020)

Regarding the train gears design problem, the PSS algorithm reached the same optimal answer as many other algorithms using integer formulation, such as the moth-flame optimization algorithm (MOA) (Mirjalili 2015). Notwithstanding, Sharma and Saha (2019) claim to obtain the optimum answer of \(3.3610E-16\) by considering the problem landscape as continuous. By also considering the problem’s distribution to be continuous, we obtained a solution of \(2.2695E-21\) with \({\varvec{x}} =\) \(\{43.170946,\) 14.205443,  17.919979,  \(40.869247\}\). However, these solution must not be considered as a new valid optimum, as it disregards the nature of the problem by dealing with the number of teeth per gear as a continuously distributed design variable.

Finally, the third case study minimizes the weight of a three-bar truss structure. As can be seen in Table 10, the PSS obtained a slightly better solution than the one obtained by Mirjalili (2015). Sharma and Saha (2019) again detail a new optimal solution of 1.325454889710144, which lays outside of the feasibility domain of the problem and is therefore inconsequential for engineers. We believe that this solution is obtained by incorrectly enforcing the constraints, or it could be a simple typo. It is worth to mention that the recent water strider algorithm (WSA) by Kaveh and Eslamlou (2020) obtained a slightly better solution with a weight of 263.89584340 kN.

4.4.2 Design of reinforced concrete beams

In the fourth design case study, we recall the problem of designing a reinforced concrete beam (formulated in Shaqfa and Orbán 2019). In this example, the PSS algorithm solved a complex weight minimization case study that was presented and solved by Shaqfa and Orbán (2019) using the MPSFHS algorithm. Herein, we used an \(\alpha \) of 0.85 instead of 0.95 as previously, because this problem required more diversification. Table 10 shows the difference between the results obtained with both the MPSFHS and the PSS algorithms. For this discrete problem with 25 independent design variables, the best answer was obtained by the MPSFHS. However, the difference between the two is small and can be related to some details of the top reinforcement (see Fig. 10). This level of perturbation normally requires a small-stepped random walk algorithm to exploit small details. As outlined above, the PSS algorithm does not perform well with regard to finding the local optimum and should be, for this purpose, combined with other algorithms whenever necessary.

5 Conclusions

In this work, we propose a heuristic approach that uses a simple analogy with classical DOE methods. The presented approach samples solution features more densely in a detected prominent region than in the entire search domain. The design of this algorithm was kept as simple as possible, while avoiding structural bias and premature convergence. The algorithm requires three input parameters, namely the population size (\(\beta \)), the maximum number of iterations (\(\gamma \)), and the acceptance probability (\(\alpha \)). We provided a simple probabilistic derivation for investigating good parameter choices. The algorithm was benchmarked against state-of-the-art algorithms using a selected set of standard and engineering optimization problems. The algorithm behaved better in allocating global minima for nonconvex and multimodal problems than the state-of-the-art algorithms, while it does need enhancements on the intensification side to make better use of all the available solution vectors. Moreover, the PSS algorithm proved useful and outperformed state-of-the-art algorithmic proposals in the scalability problems and under extreme cases where only a few iterations are allowed. Even though we used it as a standalone metaheuristic in this paper when benchmarking the algorithm, the algorithm will likely be most useful hybridised in sequential or parallel schemes with other well-known algorithms, e.g., Lévy flights.

6 Future work

In future work, we plan to implement the lbest (see Eberhart and Kennedy 1995) analogy with the algorithm, where it can redefine the prominent domain region, \(\varOmega ^{'}\), by using, for instance, the best \((1-\alpha ) \times \beta \) candidates per iteration. This topology could be used to strengthen the intensification of the algorithm. Hybridization via other well-known algorithms, including but not limited to Lévy flights (Yang 2009; Yang et al. 2013) and spiral search (Mirjalili and Lewis 2016), could be another way to control the intensification step size in the PSS.

Another possible direction includes applying dynamic schemes for the number of population candidates to scale up or down with the global and local search needs. Liang and Juarez (2020) applied a dynamic approach for the population sizing in their algorithm self-adaptive virus optimization algorithm (SaVOA) and it seems as a promising technique to be applied with the PSS algorithm. Eventually, hybridizing the algorithm with sequential and parallel topologies could be interesting for multimodal optimization problems.

7 Reproducibility

To promote openness and transparency, the authors have provided the readers with the source code of the algorithm written in C++14, Python 3.7, and Octave (Matlab) programming languages. The source codes can be downloaded from the following online platforms:

  • Zenodo: 10.5281/zenodo.3630764

  • Github: git@github.com:eesd-epfl/pareto-optimizer.git