1 Introduction

Algorithms [1] comprise a problem-solving strategy. They are used to perform specific tasks or solve problems under limited resources, must be executed within a limited number of instructions or a limited time, and must satisfy five conditions: accurate input description, clarity and validity, correctness, finiteness, and output. Metaheuristic algorithms [2] are optimization algorithms that were conceptualized decades ago, when software and hardware were immature and it was challenging for a computer to perform numerous calculations simultaneously. Recently, owing to software and hardware improvements, the field of optimization has received increasing attention, and experts and scholars have proposed different metaheuristic algorithms for solving combinatorial optimization problems such as non-deterministic polynomial-time hard (NP-hard) problems [3]. A metaheuristic algorithm is an optimization algorithm that effectively explores the search space to find a solution as close as possible to the optimal solution.

Among optimization algorithms, swarm intelligence algorithms [4], which simulate biological social behavior, have been widely used in real-life applications such as data clustering [5, 6], image recognition [7, 8], data mining [9, 10], power systems [11, 12], image processing [13, 14], permutation flow shop scheduling problems (PFSPs) [15, 16], bioinformatics [17], DNA recombination [18], and robot control [19, 20]. Commonly used swarm intelligence algorithms include the genetic algorithm (GA) [21], particle swarm optimization (PSO) [22], ant colony optimization [23], artificial bee colony (ABC) [24], the firefly algorithm [25], cuckoo search [26], the grey wolf optimizer [27], moth-flame optimization [28], the crow search algorithm (CSA) [29], the whale optimization algorithm (WOA) [30], and butterfly optimization [31]. Among them, PSO is the most widely used because it can be applied to a wide variety of problems and is easy to implement. However, its most obvious disadvantage is premature convergence: when a particle reaches a better position in the search space and stops moving, the other particles approach its position; if that particle is near a local optimum, the swarm may become trapped in the local optimum, and the population diversity decreases rapidly. To improve algorithm diversity, scholars have proposed memetic algorithms [32,33,34,35], which simulate cultural evolution using a metaheuristic algorithm with a random global search. The main concept is to combine several metaheuristic and local search algorithms [36] to solve NP-hard optimization problems, reduce computation time, and obtain an approximate optimal solution. An algorithm that combines different algorithms can therefore provide a better solution.

This study proposes a memetic algorithm, called whale particle optimization (WPO), that combines the advantages of the WOA and PSO. WPO was evaluated and compared with existing algorithms on four optimization problems (function evaluation, image clustering, the PFSP, and data clustering) using the following four criteria: the final optimal value obtained for function optimization, the peak signal-to-noise ratio (PSNR) for image clustering, the standard deviation for the PFSP, and the accuracy rate for data clustering. The main contributions of this study are as follows.

  • First, we developed a new hybrid WPO algorithm that combines the advantages of the WOA and PSO algorithms.

  • Second, we introduced a hybrid operator that increases algorithm diversity by exchanging particles between the two systems, which helps the search avoid local optima and improves the global search capability. This hybrid operator is the highlight of this study and its main contribution to the literature.

  • Third, we demonstrated the effectiveness and performance of the algorithm through experiments and applied it to practical problems, showing that it offers clear advantages in terms of efficiency and performance.

The rest of this paper is organized as follows. Preliminaries and related work are introduced in Sect. 2. Section 3 describes the proposed algorithm and its application to the four optimization problems. Section 4 presents the results and analysis, and finally Sect. 5 concludes the paper and presents future research directions.

2 Background Knowledge

2.1 Metaheuristic Algorithms

Metaheuristic algorithms, most of which are developed by observing creatures and phenomena in nature, aim to determine an acceptable approximate optimal solution within an effective time limit. Recently, owing to hardware improvements, these algorithms have received increasing attention and offer the following advantages: good problem-solving performance that is not limited by the initial value setting, the ability to escape local optimal solutions, and a wide application range. The following sections introduce the metaheuristic algorithms employed in this study. These algorithms use particles as search agents, as proposed by their respective authors. Assuming there are n particles, the position of particle i at time t is

$$x_{i}^{t}=\left(x_{i,1}^{t},x_{i,2}^{t},\dots ,x_{i,j}^{t},\dots ,x_{i,d}^{t}\right),\quad i=1,2,3,\dots ,n$$
(1)

where \({x}_{i,j}^{t}\) represents the position of particle i in dimension j at time t.

2.1.1 PSO

PSO is an evolutionary computing technique developed by J. Kennedy and R. C. Eberhart [22] in 1995. Inspired by the foraging behavior of bird flocks, the algorithm uses each particle to simulate a bird with its own position and velocity at a given time t. The position of each particle corresponds to a candidate solution, and its fitness value is used to judge the quality of that solution. To determine its movement direction at each step, a particle refers to its current position and velocity, its own best visited position (\({P}_{best}\)), and the best position found by the group (\({P}_{Gbest}\)). In 1998, Shi and Eberhart [37] added an inertia weight to the velocity update to increase the search ability (diversity) of the particles; this formulation is now used as the standard PSO. Equations (2) and (3) show the update rules for the velocity and position of particle i at time t + 1, respectively:

$${v}_{i,j}^{t+1}=\omega {v}_{i,j}^{t}+{c}_{1}{r}_{1}\left({p}_{i,j}^{t}-{x}_{i,j}^{t}\right)+{c}_{2}{r}_{2}\left({p}_{Gbest,j}^{t}-{x}_{i,j}^{t}\right)$$
(2)
$${x}_{i,j}^{t+1}={x}_{i,j}^{t}+{v}_{i,j}^{t+1}$$
(3)

where \({v}_{i,j}^{t}\) and \({x}_{i,j}^{t}\) denote the velocity and position of particle i, respectively, in dimension j at time t; \(\omega\) denotes the inertia weight; \({p}_{i,j}^{t}\) denotes the best position particle i has visited; \({p}_{Gbest,j}^{t}\) denotes the best position found by all particles; \({c}_{1}\) denotes the particle (cognitive) experience weight; \({c}_{2}\) denotes the group (social) experience weight; and \({r}_{1}\) and \({r}_{2}\) are random numbers between 0 and 1.

The PSO algorithm operates as follows (a code sketch is given after the steps):

  1. Step 1:

    Initialize the particle position and velocity.

  2. Step 2:

    Calculate the fitness value of each particle.

  3. Step 3:

    Find the individual optimal solution for each particle and the optimal solution for all particles.

  4. Step 4:

    Update the velocity and position of each particle at the next time step.

  5. Step 5:

    Check if the end condition is satisfied. If not, return to Step 2 and repeat.
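
The velocity and position updates of Eqs. (2) and (3), together with the steps above, can be summarized in the following minimal Python sketch; the Sphere objective, bounds, and parameter values are illustrative assumptions rather than the exact settings used in this study.

```python
import numpy as np

def pso(fitness, d=10, n=10, iters=1000, w=0.7, c1=2.0, c2=2.0,
        x_min=-100.0, x_max=100.0):
    x = x_min + (x_max - x_min) * np.random.rand(n, d)   # Step 1: positions
    v = np.zeros((n, d))                                  # Step 1: velocities
    p_best = x.copy()                                     # personal best positions
    p_best_val = np.apply_along_axis(fitness, 1, x)       # Step 2: fitness values
    g_best = p_best[np.argmin(p_best_val)].copy()         # Step 3: global best
    for _ in range(iters):                                # Step 5: loop until done
        r1, r2 = np.random.rand(n, d), np.random.rand(n, d)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (2)
        x = np.clip(x + v, x_min, x_max)                              # Eq. (3), Step 4
        vals = np.apply_along_axis(fitness, 1, x)
        improved = vals < p_best_val
        p_best[improved], p_best_val[improved] = x[improved], vals[improved]
        g_best = p_best[np.argmin(p_best_val)].copy()
    return g_best, p_best_val.min()

# Example: minimize the Sphere function
best_x, best_val = pso(lambda z: np.sum(z ** 2))
print(best_val)
```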

2.1.2 WOA

Mirjalili and Lewis [30] proposed the WOA in 2016. It was inspired by the hunting behavior of humpback whales and models three behaviors: encircling prey, bubble-net attacking, and searching for prey.

The first behavior of encircling prey involves humpback whales determining the position of their prey and surrounding them. When the best whale is found, other whales attempt to approach its position. This phenomenon is expressed as follows:

$$D=\left|C\times {\omega }^{t}-{x}_{i}^{t}\right|$$
(4)
$${x}_{i}^{t+1}={\omega }^{t}-A\times D$$
(5)

where \({\omega }^{t}\) represents the best whale position at time t, and D denotes the distance between whale i and the best whale \(\omega\). A and C are coefficients that are updated at every iteration and calculated as

$$A=2a\times {r}_{1}-a$$
(6)
$$C=2\times {r}_{2}$$
(7)

where a is a number that decreases linearly from 2 to 0 with the number of iterations, and \({r}_{1}\) and \({r}_{2}\) are random numbers between 0 and 1.

The second behavior is bubble-net attacking. To mimic the bubble-net attacking behavior of humpback whales, the authors proposed two mathematical models. The first is the shrinking encircling mechanism, which is similar to encircling prey, except that A takes a random value between −a and a, where a decreases linearly from 2 to 0 with the number of iterations. The second is the spiral updating position model, which imitates the spiral motion of humpback whales: they create a spiral-shaped bubble around their prey, follow the bubble toward the ocean surface, and capture their prey. This spiral motion is expressed as

$$D=\left|{\omega }^{t}-{x}_{i}^{t}\right|$$
(8)
$${x}_{i}^{t+1}=\mathrm{D}\times {e}^{bl}\times \mathrm{cos}(2\pi l)+{\omega }^{t}$$
(9)

where b is the shape constant that defines the logarithmic spiral, and l is a random number between −1 and 1. Humpback whales use these two mechanisms to create spiral bubbles that surround their prey, and either mechanism is selected with equal probability (50%) to update the position:

$$x_{i}^{{t + 1}} = \,\left\{ {\begin{array}{*{20}l} {\omega ^{t} - A \times D} & {if\,rand_{p} \, < \,0.5} \\ {D \times e^{{bl}} \times \cos \left( {2\pi l} \right) + \omega ^{t} } & {else} \\ \end{array} } \right.$$
(10)

where \({rand}_{p}\) is a random number between 0 and 1.

Finally, humpback whales also search for prey based on the positions of other whales; such random search is the exploration mechanism commonly adopted by metaheuristic algorithms. The mathematical model is similar to that for encircling prey, except that a whale moves based on the position of a randomly chosen whale rather than the best whale. Whether random search is used depends on the value of A: when |A| > 1, random search is employed as follows:

$$D= \left|C\times {x}_{rand}^{t}-{x}_{i}^{t}\right|$$
(11)
$${x}_{i}^{t+1}={x}_{rand}^{t}-A\times D$$
(12)

where \({x}_{rand}^{t}\) is the position of a random whale at time t.

The WOA operates as follows (a code sketch is given after the steps):

  1. Step 1:

    Initialize the whale positions and optimal point.

  2. Step 2:

    Calculate the fitness value of each whale.

  3. Step 3:

    Update the position of each whale at the next time step.

  4. Step 4:

    Update the optimal point.

  5. Step 5:

    Check if the end condition is met. If not, return to Step 2 and repeat.
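
The position updates in Eqs. (4)–(12) and the steps above can be sketched in Python as follows; the objective, bounds, the shape constant b = 1, and the use of a scalar coefficient A are simplifying assumptions for illustration.

```python
import numpy as np

def woa(fitness, d=10, n=10, iters=1000, b=1.0, x_min=-100.0, x_max=100.0):
    x = x_min + (x_max - x_min) * np.random.rand(n, d)    # Step 1: whale positions
    vals = np.apply_along_axis(fitness, 1, x)             # Step 2: fitness values
    best = x[np.argmin(vals)].copy()                      # best whale (omega)
    best_val = vals.min()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                         # decreases linearly from 2 to 0
        for i in range(n):                                # Step 3: update positions
            A = 2 * a * np.random.rand() - a              # Eq. (6)
            C = 2 * np.random.rand(d)                     # Eq. (7)
            if np.random.rand() < 0.5:                    # Eq. (10), first branch
                if abs(A) < 1:                            # shrinking encircling, Eqs. (4)-(5)
                    D = np.abs(C * best - x[i])
                    x[i] = best - A * D
                else:                                     # random search, Eqs. (11)-(12)
                    rand_whale = x[np.random.randint(n)]
                    D = np.abs(C * rand_whale - x[i])
                    x[i] = rand_whale - A * D
            else:                                         # spiral update, Eqs. (8)-(9)
                D = np.abs(best - x[i])
                l = np.random.uniform(-1, 1)
                x[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            x[i] = np.clip(x[i], x_min, x_max)
        vals = np.apply_along_axis(fitness, 1, x)
        if vals.min() < best_val:                         # Step 4: update the optimal point
            best, best_val = x[np.argmin(vals)].copy(), vals.min()
    return best, best_val

# Example: minimize the Sphere function
print(woa(lambda z: np.sum(z ** 2)))
```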

2.2 Functional Optimization

Function optimization is a branch of applied mathematics that studies how to optimize a specific function under given conditions. The problem is defined as follows: given a function f: A \(\to {\mathbb{R}}\), find an element \({x}^{0}\in A\) such that \(f({x}^{0})\le f(x)\) for all x in A, where A is a subset of the Euclidean space \({\mathbb{R}}^{n}\), usually specified by constraint equations or inequalities that its elements must satisfy. The elements of A are called feasible solutions, and the function f is called the objective function. The goal is to find a feasible solution whose objective value is as close as possible to the optimum.

2.3 Image Clustering

Image clustering is an important component of computer vision [38] and a proven NP-hard problem. It is the process of dividing an image into multiple sub-regions according to the values of its pixels (Fig. 1). Image clustering simplifies and changes the image representation to facilitate understanding and analysis, and it is often used to accentuate objects and boundaries in images. Specifically, each pixel in an image is labeled such that pixels with the same label share certain visual properties. Well-known image clustering algorithms include Otsu [39] and k-means [34]. Practical applications include medical image processing [40], tumor localization, and face recognition.

Fig. 1
figure 1

Image clustering. a Before and b after

2.3.1 PSNR

After processing such as compression, the output image usually differs from the original image. To measure image quality after processing, the PSNR value is commonly used to determine whether the processing is satisfactory. The PSNR is calculated as follows:

$$\text{PSNR}=10\times \log_{10}\left(\frac{255^{2}}{\text{MSE}}\right)$$
(13)
$$\text{MSE}=\frac{\sum_{n=1}^{FrameSize}\left(I_{n}-P_{n}\right)^{2}}{FrameSize}$$
(14)

where \(I_{n}\) and \(P_{n}\) denote the n-th pixel values of the original and processed images, respectively, and FrameSize is the total number of pixels.
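
As a concrete illustration of Eqs. (13) and (14), the PSNR between an original and a processed 8-bit grayscale image can be computed as follows (a minimal sketch; the random test images are placeholders):

```python
import numpy as np

def psnr(original, processed):
    """Compute the PSNR (in dB) between two 8-bit images, per Eqs. (13)-(14)."""
    original = original.astype(np.float64)
    processed = processed.astype(np.float64)
    mse = np.mean((original - processed) ** 2)       # Eq. (14)
    if mse == 0:
        return float("inf")                          # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)         # Eq. (13)

# Example with two random 256x256 grayscale images (placeholders)
img_a = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
img_b = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
print(psnr(img_a, img_b))
```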

2.4 PFSPs

A PFSP is a scheduling problem wherein the jobs must be properly sequenced and processed on a set of machines. Its goal is to maintain the fluency of job processing with the least amount of idle and waiting time. Assume that there are n jobs and m machines, and each job must be processed on all m machines. The time \({p}_{x,y}\) is defined as the processing time of job x on machine y. The order in which the jobs pass through the m machines is \(\pi =\left\{{\pi }_{1},{\pi }_{2},\dots ,{\pi }_{n}\right\}\), where n is the total number of jobs. The completion time of job \({\pi }_{x}\) on machine y is defined as \(C({\pi }_{x},y)\), and the completion times for all n jobs on the m machines are computed recursively as follows:

$$C\left(\pi_{1},1\right)=p_{\pi_{1},1}$$
(15a)
$$C\left(\pi_{x},1\right)=C\left(\pi_{x-1},1\right)+p_{\pi_{x},1},\quad x=2,3,\dots ,n$$
(15b)
$$C\left(\pi_{1},y\right)=C\left(\pi_{1},y-1\right)+p_{\pi_{1},y},\quad y=2,3,\dots ,m$$
(15c)
$$C\left(\pi_{x},y\right)=\max\left\{C\left(\pi_{x-1},y\right),C\left(\pi_{x},y-1\right)\right\}+p_{\pi_{x},y},\quad x=2,3,\dots ,n;\; y=2,3,\dots ,m$$
(15d)

The makespan is the total time required to complete the final job by the final machine:

$$C_{max}\left(\pi \right)=C\left(\pi_{n},m\right)$$
(16)
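
The recurrence in Eqs. (15a)–(15d) and the makespan in Eq. (16) can be implemented directly; a minimal Python sketch follows (the processing-time matrix is a made-up example):

```python
import numpy as np

def makespan(p, order):
    """Makespan of a permutation flow shop schedule.
    p[x][y] is the processing time of job x on machine y; order is a job permutation."""
    m = p.shape[1]
    C = np.zeros((len(order), m))
    for i, job in enumerate(order):
        for y in range(m):
            prev_job = C[i - 1, y] if i > 0 else 0.0        # previous job on machine y
            prev_mach = C[i, y - 1] if y > 0 else 0.0       # same job on previous machine
            C[i, y] = max(prev_job, prev_mach) + p[job, y]  # Eqs. (15a)-(15d)
    return C[-1, -1]                                        # Eq. (16)

# Example: 3 jobs on 2 machines (illustrative processing times)
p = np.array([[3, 2], [1, 4], [2, 2]])
print(makespan(p, [0, 1, 2]))   # 11
```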

2.4.1 NEH Algorithm

The NEH algorithm [41] is a scheduling heuristic proposed by Nawaz, Enscore, and Ham, wherein the job with the largest total processing time across all machines has the highest priority. It operates as follows (a code sketch is given after the steps):

  1. Step 1:

    Sort the n jobs in descending order of their total processing time across all machines.

  2. Step 2:

    Take the first two jobs in this order and schedule them in whichever of the two possible orderings gives the smaller makespan.

  3. Step 3:

    Insert each remaining job, one at a time, into the position of the current partial schedule that yields the minimum makespan.
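
A compact sketch of the NEH heuristic, reusing the `makespan` function and the illustrative numpy processing-time matrix from the previous sketch (ties and efficiency refinements are ignored):

```python
def neh(p):
    """NEH heuristic; p[x][y] is the processing time of job x on machine y.
    Requires the makespan(p, order) function from the previous sketch."""
    n = p.shape[0]
    # Step 1: sort jobs by total processing time, descending
    jobs = sorted(range(n), key=lambda x: -p[x].sum())
    # Step 2: best order of the first two jobs
    seq = min([jobs[:2], jobs[1::-1]], key=lambda s: makespan(p, s))
    # Step 3: insert each remaining job at the position minimizing the makespan
    for job in jobs[2:]:
        candidates = [seq[:i] + [job] + seq[i:] for i in range(len(seq) + 1)]
        seq = min(candidates, key=lambda s: makespan(p, s))
    return seq, makespan(p, seq)

# Example with the same illustrative 3x2 processing-time matrix
print(neh(p))
```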

2.4.2 Local Search for Scheduling Problems

This study adopts the variable neighborhood search (VNS) algorithm [42] for local search, which uses two moves: pair-swap and insertion. As shown in Fig. 2, the search begins by randomly selecting two positions in the schedule. Pair-swap exchanges the jobs at the two selected positions, whereas insertion removes the job at the latter position and reinserts it immediately before the former.

Fig. 2
figure 2

VNS local search, pair-swap, and insertion
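
The two VNS moves can be sketched as follows on a job permutation (the positions are chosen at random and the sequence is illustrative):

```python
import random

def pair_swap(seq, i, j):
    """Exchange the jobs at positions i and j."""
    s = seq[:]
    s[i], s[j] = s[j], s[i]
    return s

def insertion(seq, i, j):
    """Remove the job at the latter position and reinsert it before the former."""
    i, j = min(i, j), max(i, j)
    s = seq[:]
    job = s.pop(j)
    s.insert(i, job)
    return s

seq = [0, 1, 2, 3, 4]
i, j = random.sample(range(len(seq)), 2)
print(pair_swap(seq, i, j), insertion(seq, i, j))
```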

2.4.3 Standard Deviation

The quality of permutation flow shop scheduling is evaluated using the makespan value. In the final table presentation, this study uses the average percentage relative deviation (\({\Delta }_{avg}\)), which is calculated as

$$\Delta_{avg}=\frac{\sum_{i=1}^{R}\left(\frac{\left(H_{i}-U_{i}\right)\times 100}{U_{i}}\right)}{R}$$
(17)

where R denotes the total number of problem executions, \({H}_{i}\) denotes the solution calculated by the algorithm, and \({U}_{i}\) denotes the currently known best solution.
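
For example, Eq. (17) can be computed as follows (the H and U values below are made-up numbers):

```python
def avg_relative_deviation(H, U):
    """Average percentage relative deviation over R runs, Eq. (17)."""
    R = len(H)
    return sum((h - u) * 100.0 / u for h, u in zip(H, U)) / R

# Example: three runs compared against the best-known solutions (illustrative values)
print(avg_relative_deviation(H=[1286, 1310, 1297], U=[1278, 1278, 1278]))
```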

2.5 Data Clustering

Data clustering is a type of unsupervised learning for identifying the cluster that each data point belongs to, where data points within the same cluster (intra-cluster) are similar and data points in different clusters (inter-cluster) are dissimilar. Suppose there are n data points; they can be expressed as follows:

$$\mathrm{O}=\{{O}_{1},{O}_{2},{O}_{3},\dots ,{O}_{n}\}$$
(18)

Each data point has d dimensions, i.e., d different attributes (features). The i-th data point can be expressed as

$$O_{i}=\left\{O_{i,1},O_{i,2},\dots ,O_{i,j},\dots ,O_{i,d}\right\}$$
(19)

where \({\mathrm{O}}_{i,j}\) represents the jth dimension of the ith data.

In clustering, the data points are divided into different clusters, and the set of clusters can be expressed as

$$C=\left\{C_{1},C_{2},\dots ,C_{j},\dots ,C_{k}\right\}$$
(20)

where k is the total number of clusters, and \({C}_{j}\) is the jth cluster. Clustering must satisfy several constraints: no cluster is empty, any two distinct clusters are disjoint, and the union of all clusters equals the complete dataset. These constraints are expressed as follows:

$$C_{i}\cap C_{j}=\emptyset \quad \forall\, i\ne j,\; i,j\in \left\{1,2,3,\dots ,k\right\}$$
(21)
$$\bigcup_{i=1}^{k}C_{i}=O$$
(22)
$$C_{i}\ne \emptyset \quad \forall\, i\in \left\{1,2,3,\dots ,k\right\}$$
(23)

The clustering objective is the sum of squared Euclidean distances from each data point to its cluster center; the smaller this value, the better the clustering result. It is expressed as

$$f\left(O,Z\right)=\sum_{i=1}^{n}\sum_{j=1}^{k}\omega_{ij}\left\|O_{i}-Z_{j}\right\|^{2}$$
(24)

where n is the total number of data points; k is the total number of clusters; \({\omega }_{ij}\) is a value between 0 and 1 representing the membership weight of data point \({O}_{i}\) with respect to center \({Z}_{j}\); and \(\Vert {O}_{i}-{Z}_{j}\Vert\) is the Euclidean distance from data point \({O}_{i}\) to center \({Z}_{j}\).
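
A sketch of the objective in Eq. (24) for the special case of hard (0/1) memberships, where each data point contributes only its squared distance to its assigned center (the data points, centers, and labels below are illustrative):

```python
import numpy as np

def clustering_objective(O, Z, labels):
    """Sum of squared Euclidean distances (Eq. 24) with hard memberships:
    omega_ij = 1 if data point i is assigned to cluster j, else 0."""
    return sum(np.sum((O[i] - Z[labels[i]]) ** 2) for i in range(len(O)))

# Example: 4 two-dimensional points and 2 cluster centers (illustrative)
O = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
Z = np.array([[0.5, 0.0], [5.5, 5.0]])
labels = [0, 0, 1, 1]
print(clustering_objective(O, Z, labels))   # 4 * 0.25 = 1.0
```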

2.5.1 Accuracy Rate

Before clustering, the field indicating the true cluster label is removed from the dataset. If a data point is assigned by the algorithm to the same cluster as its original label, the assignment is counted as correct. The total number of correctly clustered data points is denoted as fc, and the total amount of data is denoted as n. The accuracy is calculated as follows:

$$Accuracy=(fc/n)\times 100\%$$
(25)
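
A small sketch of Eq. (25); the label vectors are made-up, and the predicted cluster indices are assumed to be already mapped to the original class labels:

```python
def accuracy(predicted, true_labels):
    """Eq. (25): percentage of data points assigned to their original cluster."""
    fc = sum(p == t for p, t in zip(predicted, true_labels))
    return fc / len(true_labels) * 100.0

print(accuracy([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))   # 80.0
```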

3 Methodology

3.1 Particle Initialization

This study uses random generation as the initialization method, placing the particles randomly within the search space. To allow WPO to operate on them, all particles are initially encoded as

$${x}_{i,j}^{0}={x}_{min}+({x}_{max}-{x}_{min})\times {r}_{3}$$
(26)

where \({x}_{min}\) and \({x}_{max}\) are the lower and upper limits of the search space, and \({r}_{3}\) is a random number between 0 and 1.
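
Eq. (26) corresponds to standard uniform random initialization; a one-line sketch (the bounds and population sizes are illustrative):

```python
import numpy as np

def initialize(n, d, x_min, x_max):
    """Eq. (26): uniform random positions within [x_min, x_max]."""
    return x_min + (x_max - x_min) * np.random.rand(n, d)   # r3 ~ U(0, 1)

particles = initialize(n=10, d=10, x_min=-100.0, x_max=100.0)
```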

3.2 Simultaneous Execution

WPO combines WOA and PSO: WOA has good search capability, whereas PSO has fast convergence. After initialization, WPO evaluates the fitness of all particles. The particles are then evolved simultaneously in the two systems so that the search benefits from the characteristics of both WOA and PSO. The fitness value is recalculated after each iteration of particle movement before proceeding to the next step.

3.3 Particle Update

In each iteration, the particles move in a new direction. However, the fitness value obtained after moving is sometimes worse than that before moving. In this case, WPO resets the particle to its position before the move, as shown in Eq. (27). If the new fitness value is worse, the particle keeps \({x}_{i,j}^{t}\) as its position \({x}_{i,j}^{t+1}\) in the next iteration.

$${x}_{i,j}^{t+1}=\left\{\begin{array}{ll}{x}_{i,j}^{t}, & \text{if } f\left({x}_{i,j}^{t}\right)<f\left({x}_{i,j}^{t+1}\right)\\ {x}_{i,j}^{t+1}, & \text{if } f\left({x}_{i,j}^{t+1}\right)<f\left({x}_{i,j}^{t}\right)\end{array}\right.$$
(27)

where \(f\left({x}_{i,j}^{t}\right)\) and \(f({x}_{i,j}^{t+1})\) denote the fitness values of particle \({x}_{i,j}\) at times t and t + 1, respectively, i.e., the fitness values calculated before and after the particle moves.
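
Eq. (27) is a greedy keep-the-better rule; a sketch for a minimization problem (the Sphere fitness is an illustrative choice):

```python
import numpy as np

def greedy_update(x_old, x_new, fitness):
    """Eq. (27): keep the new position only if it has a better (lower) fitness."""
    return x_new if fitness(x_new) < fitness(x_old) else x_old

sphere = lambda x: np.sum(x ** 2)
x_t = np.array([1.0, 2.0])
x_t1 = np.array([0.5, 1.5])
print(greedy_update(x_t, x_t1, sphere))   # keeps [0.5, 1.5]
```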

3.4 Hybrid Operator

When the algorithm reaches specific iterations, the hybrid operator [43] is executed. The hybrid operator is based on the roulette-wheel selection method of the genetic algorithm (GA). In the WOA and PSO systems, particles are selected based on their fitness values and swapped between the two systems. This prevents the rapid convergence that drives the result into a local optimal solution and maintains the diversity of the two systems. In the example shown in Fig. 3, both the WOA and PSO populations contain six particles. When the hybrid operator is executed, selection probabilities are computed from the particles' fitness values. During selection, the WOA draws a random number of 0.72, resulting in the selection of the fifth particle A5, whereas PSO draws a random number of 0.48, resulting in the selection of the third particle P3. Therefore, after the hybrid operator is executed, particles A5 and P3 swap positions.

Fig. 3
figure 3

Hybrid operator
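
The selection-and-swap procedure can be sketched as follows; the use of inverse fitness as the roulette weight (suitable for minimization with non-negative fitness values) is our assumption, as the exact weighting is not fixed in this description:

```python
import numpy as np

def roulette_select(fitness_values):
    """Select one index with probability proportional to inverse fitness (minimization)."""
    weights = 1.0 / (np.asarray(fitness_values, dtype=float) + 1e-12)
    probs = weights / weights.sum()
    return np.random.choice(len(fitness_values), p=probs)

def hybrid_operator(woa_pop, woa_fit, pso_pop, pso_fit, n_swap=5):
    """Swap n_swap roulette-selected particles between the WOA and PSO populations."""
    for _ in range(n_swap):
        i = roulette_select(woa_fit)
        j = roulette_select(pso_fit)
        woa_pop[i], pso_pop[j] = pso_pop[j].copy(), woa_pop[i].copy()
        woa_fit[i], pso_fit[j] = pso_fit[j], woa_fit[i]

# Example: two populations of 4 two-dimensional particles (illustrative)
woa_pop, pso_pop = np.random.rand(4, 2), np.random.rand(4, 2)
woa_fit = np.sum(woa_pop ** 2, axis=1)
pso_fit = np.sum(pso_pop ** 2, axis=1)
hybrid_operator(woa_pop, woa_fit, pso_pop, pso_fit, n_swap=2)
```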

3.5 Summary of the Proposed Algorithm

The hybrid WPO algorithm proposed in this study combines WOA and PSO. WOA has excellent search capability: its three search modes effectively expand the search range and help avoid being trapped in local optima. PSO has the advantage of fast convergence, quickly approaching a near-optimal solution. WPO employs both: the search is executed in both systems, and after a specific number of iterations, the hybrid operator swaps particles between the two systems to increase particle diversity and avoid becoming trapped in local optimal solutions. This study thus combines the advantages of the two algorithms, so that the proposed algorithm converges quickly, searches effectively, and gains additional search power from the hybrid operator.

Figure 4 details the steps of using WPO. First, all particles are randomly generated and the two systems then execute simultaneously. After each movement iteration, the fitness values of the particles are recalculated, and their next positions are determined by comparing their fitness values with those before the movement. The hybrid operator executes at certain iterations to improve particle diversity. Finally, the optimal particle is output as the solution. The pseudocode of WPO is shown in Fig. 5.

Fig. 4
figure 4

WPO flowchart for solving optimization problems

Fig. 5
figure 5

Pseudocode of WPO
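
Putting the pieces together, the following condensed Python sketch reflects the flow of Figs. 4 and 5 as described above, with the parameter defaults of Sect. 4.2; it is a hedged reconstruction rather than the authors' exact implementation (in particular, the roulette weighting and boundary handling are our assumptions).

```python
import numpy as np

def wpo(fitness, d=10, n=10, iters=1000, swap_every=30, n_swap=5,
        x_min=-100.0, x_max=100.0, w=0.7, c1=2.0, c2=2.0, b=1.0):
    # Random initialization (Eq. 26): one population per system
    woa_x = x_min + (x_max - x_min) * np.random.rand(n, d)
    pso_x = x_min + (x_max - x_min) * np.random.rand(n, d)
    v = np.zeros((n, d))
    woa_f = np.apply_along_axis(fitness, 1, woa_x)
    pso_f = np.apply_along_axis(fitness, 1, pso_x)
    p_best, p_best_f = pso_x.copy(), pso_f.copy()
    best_f = min(woa_f.min(), pso_f.min())
    best = (woa_x[np.argmin(woa_f)] if woa_f.min() < pso_f.min()
            else pso_x[np.argmin(pso_f)]).copy()

    def roulette(fit):
        # probabilities from inverse fitness (assumes non-negative minimization)
        wgt = 1.0 / (fit + 1e-12)
        return np.random.choice(len(fit), p=wgt / wgt.sum())

    for t in range(iters):
        a = 2.0 - 2.0 * t / iters
        g_best = p_best[np.argmin(p_best_f)]
        for i in range(n):
            # WOA move (Eqs. 4-12), kept only if it improves (Eq. 27)
            A, C = 2 * a * np.random.rand() - a, 2 * np.random.rand(d)
            if np.random.rand() < 0.5:
                ref = best if abs(A) < 1 else woa_x[np.random.randint(n)]
                cand = ref - A * np.abs(C * ref - woa_x[i])
            else:
                l = np.random.uniform(-1, 1)
                cand = np.abs(best - woa_x[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            cand = np.clip(cand, x_min, x_max)
            if fitness(cand) < woa_f[i]:
                woa_x[i], woa_f[i] = cand, fitness(cand)
            # PSO move (Eqs. 2-3), kept only if it improves (Eq. 27)
            r1, r2 = np.random.rand(d), np.random.rand(d)
            v[i] = w * v[i] + c1 * r1 * (p_best[i] - pso_x[i]) + c2 * r2 * (g_best - pso_x[i])
            cand = np.clip(pso_x[i] + v[i], x_min, x_max)
            if fitness(cand) < pso_f[i]:
                pso_x[i], pso_f[i] = cand, fitness(cand)
                if pso_f[i] < p_best_f[i]:
                    p_best[i], p_best_f[i] = pso_x[i].copy(), pso_f[i]
        # Hybrid operator: every swap_every iterations, swap roulette-selected particles
        if (t + 1) % swap_every == 0:
            for _ in range(n_swap):
                wi, pi = roulette(woa_f), roulette(pso_f)
                woa_x[wi], pso_x[pi] = pso_x[pi].copy(), woa_x[wi].copy()
                woa_f[wi], pso_f[pi] = pso_f[pi], woa_f[wi]
        # Track the overall best particle
        if min(woa_f.min(), pso_f.min()) < best_f:
            best_f = min(woa_f.min(), pso_f.min())
            best = (woa_x[np.argmin(woa_f)] if woa_f.min() < pso_f.min()
                    else pso_x[np.argmin(pso_f)]).copy()
    return best, best_f

# Example: minimize the 10-dimensional Sphere function
print(wpo(lambda z: np.sum(z ** 2)))
```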

4 Experimental Results

4.1 Benchmarks

The function evaluation used ten benchmark functions, listed in Table 1, where d is the dimension of the function. This study used d = 10 for the evaluation, and the optimal value of Functions 1–10 is 0. For the image clustering tests, six 256 × 256-pixel images were used: Lena, Baboon, Air Plane, Pepper, Goldhill, and Sailboat (Table 2). For the permutation flow shop scheduling tests, the popular Taillard dataset [44] was used; the number of jobs ranged from 20 to 500 and the number of machines from 5 to 20 (Table 3). Data clustering employed eight real-world datasets to evaluate the clustering ability of the algorithm; the number of data points ranged from 150 to 6435, the number of dimensions from 3 to 36, and the number of clusters from 2 to 10. The details are listed in Table 4.

Table 1 Test functions
Table 2 Images used for the clustering tests
Table 3 Taillard dataset [44]
Table 4 UCI dataset

4.2 Parameter Settings

The WOA and PSO each used 10 particles to search, the number of search iterations was 1000, the hybrid operator was executed every 30 iterations to swap 5 particles, and the final data were averaged over 20 executions of each algorithm. Table 5 presents the parameter settings of the algorithm.

Table 5 Parameter settings of the algorithm

4.3 Function Evaluation Results

To evaluate the performance of the proposed algorithm, we compared it with a well-known hybrid algorithm, crow particle optimization (CPO) [34], as well as with PSO, WOA, and CSA. The results of the function evaluation, image clustering, permutation flow shop scheduling, and data clustering are discussed in the first, second, third, and fourth subsections, respectively.

Figures 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 show the convergence graphs of the function evaluations on 10 well-known functions. The figures show that WPO inherits the fast convergence of PSO and the search capability of WOA, allowing it to quickly determine an approximate optimal solution and escape local optima. WPO was compared with existing algorithms, specifically CPO, PSO, WOA, and CSA. In Table 6, results closer to the optimal value of each function indicate better algorithm performance; WPO performs better than the other algorithms for most functions, especially Sphere, Schwefel, Rastrigin, Cigar, Quartic, and Alpine. Table 7 presents the results of the Wilcoxon signed-rank test for WPO and the other algorithms [45]. In Table 6, the (+) symbol indicates that WPO performs significantly better than the compared algorithm, (−) indicates that it performs significantly worse, and (≈) indicates an insignificant performance difference.

Fig. 6
figure 6

Convergence graph of the sphere function

Fig. 7
figure 7

Convergence graph of the Rosenbrocks function

Fig. 8
figure 8

Convergence graph of the Ackley function

Fig. 9
figure 9

Convergence graph of the Griewanks function

Fig. 10
figure 10

Convergence graph of the Schwefel function

Fig. 11
figure 11

Convergence graph of the Rastrigin function

Fig. 12
figure 12

Convergence graph of the cigar function

Fig. 13
figure 13

Convergence graph of the step function

Fig. 14
figure 14

Convergence graph of the quartic function

Fig. 15
figure 15

Convergence graph of the alpine function

Table 6 Function evaluation results
Table 7 Wilcoxon signed-rank test results

To evaluate the convergence ability of the proposed algorithm, a goal of \({10}^{-3}\) was set based on the results, and the number of iterations and the time required by each algorithm to reach this goal were recorded. The results (Tables 8 and 9) cover the six selected functions that reached \({10}^{-3}\) within 2000 iterations; the reported values are averages over 20 runs. Table 8 lists the number of iterations required for each algorithm to reach the goal; evidently, WPO achieved the goal in the fewest iterations. Table 9 presents the time required to reach the goal in milliseconds (ms). From Tables 8 and 9, we conclude that although the proposed algorithm cannot always outperform WOA in terms of time, owing to the powerful search strategy of WOA, combining WOA with PSO improves the performance in terms of the number of iterations compared with the existing algorithms, which demonstrates that the proposed algorithm has a strong and fast convergence capability.

Table 8 Number of convergence iterations required for each function
Table 9 Convergence time (ms) of each function

4.4 Image Clustering Results

Six well-known images were used for the comparisons, and the results are presented using PSNR values, with higher values indicating lower distortion rates. Table 10 presents the results, wherein WPO outperforms other algorithms.

Table 10 Image clustering results

4.5 Permutation Flow Shop Scheduling Results

Here, the well-known Taillard dataset [44] was used for the evaluation, and the results are presented as standard deviations, with values closer to zero indicating better performance. Table 11 presents the results, wherein the proposed algorithm generally outperforms the other algorithms. For datasets TA001, TA091, and TA111, all algorithms obtained the same result, whereas for TA061, TA071, and TA081, the WOA performed better. WPO, which combines WOA and PSO, inherits the search capability of WOA, and with the addition of PSO, the overall particle diversity is increased and a better solution can be obtained.

Table 11 Comparison results of permutation flow shop scheduling

4.6 Data Clustering Results

This section presents the data clustering results of WPO and compares them with those of CPO [34], k-means, FCM [46], PSO, WOA, and CSA. The proposed algorithm was evaluated using eight well-known UCI datasets (Iris, Wine, Breast Cancer, Car Evaluation, Statlog, Yeast, Glass, and CMC). The results (Table 12) are presented as accuracy rates, with higher values indicating better performance. Except on the Glass dataset, where it performed worse than CSA, the proposed algorithm outperformed the other algorithms.

Table 12 Data clustering results

4.7 Comparison with the SOTA Algorithms

We also compared the performance of WPO with that of state-of-the-art (SOTA) algorithms, namely PSO-GSA [47], ARO [48], and COA [49]. In Table 13, results closer to the optimal value of each function indicate better algorithm performance; WPO performs better than the other algorithms for most functions, especially Sphere, Schwefel, Rastrigin, Cigar, Quartic, and Alpine. Table 14 presents the results of the Wilcoxon signed-rank test for WPO and the other algorithms [45]. In Table 13, the (+) symbol indicates that WPO performs significantly better than the compared algorithm, (−) indicates that it performs significantly worse, and (≈) indicates an insignificant performance difference.

Table 13 Function evaluation results
Table 14 Wilcoxon signed-rank test results

The results (Tables 15 and 16) cover the six selected functions that reached \({10}^{-3}\) within 2000 iterations; the reported values are averages over 20 runs. Table 15 lists the number of iterations required for each algorithm to reach the goal; evidently, WPO again achieved the goal in the fewest iterations. Table 16 presents the time required to reach the goal in milliseconds (ms). From Tables 15 and 16, we conclude that the proposed algorithm also exhibits strong and fast convergence compared with these SOTA algorithms.

Table 15 Number of convergence iterations required for each function
Table 16 Convergence time (ms) of each function

5 Conclusions and Future work

Owing to the rise of metaheuristic algorithms, researchers have proposed different algorithms for solving various optimization problems. However, each algorithm has its own disadvantages; for example, PSO converges rapidly but is easily trapped in local optima. Therefore, this paper proposed a hybrid algorithm, called WPO, that combines the powerful search capability of the WOA with the fast convergence of PSO.

WPO first initializes particles randomly and then executes WOA and PSO simultaneously. The next position of each particle is determined according to its fitness value after moving. Furthermore, after specific iterations, a hybrid operator is executed to increase particle diversity. Finally, the optimal particle is output as the solution. Four optimization problems were used to evaluate the proposed algorithm: function evaluation, image clustering, the PFSP, and data clustering. The performance was compared with that of existing algorithms, and WPO achieved better results than the other algorithms in most cases. In addition, it outperformed existing algorithms in terms of convergence capability.

In the future, we aim to further improve the performance, speed, and optimization capability of WPO. In particular, the algorithm could be applied in medical image processing to assist doctors. Finally, we expect that a complete system and time-saving, easy-to-use software will be developed based on the results of this study.