1 Introduction

Clustering, a common unsupervised learning technique [1,2,3,4], groups the samples of an unlabeled dataset according to their features, so that data objects within the same cluster are as similar as possible while objects in different clusters are as dissimilar as possible [5,6,7]. Clustering is widely used in biology [8], medicine [9], psychology [10], statistics [11], mathematics [12] and computer science [13]. Since the early 1950s, many clustering algorithms have been proposed. In this paper, considering the novelty and effectiveness of density-based methods, we focus on density-based spatial clustering of applications with noise (DBSCAN) and explore an adaptive method to tune its hyperparameters instead of setting them empirically.

1.1 Literature review

Among clustering algorithms, K-means [14], the most basic partition-based clustering algorithm, has the advantages of a simple principle, strong practicability, fast convergence and good interpretability. However, it has difficulty converging on non-convex datasets and often stops at a local optimum.

Different from K-means, DBSCAN [15, 16] is a popular density-based clustering algorithm. It forms clusters by finding high-density areas separated by low-density areas. Compared with clustering algorithms based on the distance between objects, DBSCAN is suitable for finding clusters of arbitrary shape in spatial databases and for connecting adjacent regions of comparable density. It can effectively handle abnormal data, especially in the clustering of spatial data [17]. Despite these advantages, DBSCAN still has shortcomings: for each dataset, it needs the most appropriate values of its parameters, MinPts and EPS, to achieve the best clustering effect. To some extent, this parameter-setting process limits the application of DBSCAN [18].

Over the years, to apply DBSCAN effectively, many researchers have improved DBSCAN [19] through meta-heuristic algorithms [20,21,22,23] that automatically search for and determine the EPS and MinPts parameters. For example, Lai et al. [24] proposed a multi-segment optimization algorithm; as a special variable-updating method, it has good optimization performance, obtains good DBSCAN accuracy, and quickly finds an appropriate EPS value. Ji’an et al. [25] proposed an adaptive DBSCAN for clustering problems that treats the target solution and its motion range as noise points, in which the DBSCAN \(\epsilon\)-neighborhood is affected by specific physical factors. Zhu et al. [26] applied the harmony search optimization algorithm to DBSCAN and obtained better clustering parameters and better clustering results. Hu et al. [27] proposed KR-DBSCAN, a density-based clustering algorithm based on reverse nearest neighbors and influence space. Li et al. [28] combined an improved DBSCAN based on bat optimization with the DP algorithm for clustering and obtained good results. However, these methods still suffer from low convergence accuracy, poor universality and slow convergence.

Meta-heuristic algorithms, such as the Grey Wolf Optimizer (GWO), the Dragonfly Algorithm (DA) and the Ant Lion Optimizer (ALO), have become popular in recent years. They offer high convergence accuracy and strong robustness and can be used to select the parameters of DBSCAN. However, common meta-heuristic algorithms easily fall into local optima. Therefore, we choose the Arithmetic Optimization Algorithm (AOA) as the optimizer. AOA is a new population-based meta-heuristic algorithm proposed by Abualigah et al. [29], which uses the four basic arithmetic operators of mathematics. AOA can handle not only low-dimensional problems [30] but also high-dimensional ones [31]. Its distribution mechanism enhances its global search ability, and its population-based design [32] also helps it achieve faster convergence.

However, the ability of the standard AOA to balance global and local search is still insufficient, and its optimization accuracy is limited. To better balance exploitation (local search) and exploration (global search) and to improve optimization accuracy, we propose additional search strategies. In particular, opposition-based learning (OBL) [33,34,35] is one of the most popular strategies for enhancing exploration: it increases the population diversity of the algorithm in the search space. In an optimization problem, evaluating a candidate solution and its opposite solution at the same time accelerates convergence to the global optimum.

In general, the clustering effect of DBSCAN is limited by how well its parameters are optimized. The optimization algorithms currently used for DBSCAN parameter tuning have low convergence accuracy and easily fall into local optima. Although the standard AOA explores the search space better than many other optimizers, it still suffers from insufficient convergence accuracy and global search ability.

1.2 The gap

To sum up, the demand for higher accuracy in the DBSCAN clustering algorithm keeps increasing. Meeting it requires more advanced machine learning methods that automatically optimize the parameters of DBSCAN and thereby improve the clustering accuracy.

1.3 The contribution

To improve the accuracy and convergence speed of automatic DBSCAN parameter selection, this paper proposes a new meta-heuristic improvement strategy, OBLAOA-DBSCAN, which combines the advantages of AOA and OBL with DBSCAN to dynamically adjust its two parameters. According to the experimental results, DBSCAN improved with OBLAOA performs well on a variety of public datasets. The contributions of this article are as follows:

(1) An OBLAOA-DBSCAN clustering algorithm is proposed, which realizes automatic parameter search and improves clustering accuracy and efficiency.

(2) By adding the OBL strategy, an OBLAOA optimizer is established, which effectively improves the exploration performance of AOA.

(3) The proposed OBLAOA-DBSCAN algorithm provides better clustering results than other clustering algorithms, including K-means, Spectral, Optics, DPC, and combinations of DBSCAN with other meta-heuristic optimization algorithms.

1.4 The structure of the paper

The remaining contents are organized as follows. Section 2 outlines the background of DBSCAN and AOA. Section 3 introduces OBLAOA and gives its principle and concrete operation. Section 4 illustrates the proposed OBLAOA-DBSCAN algorithm. Section 5 compares the proposed OBLAOA with the original AOA on the CEC2021 benchmark functions. Section 6 demonstrates the superiority of the proposed algorithm on 10 datasets by comparing it with several clustering algorithms. Section 7 concludes the paper.

2 Related work

2.1 The basic theory of DBSCAN

DBSCAN, an unsupervised learning method proposed in [36], handles the clustering problem efficiently based on density. DBSCAN can identify noise points efficiently and exactly, and it can distinguish clusters with arbitrary shapes.

In this clustering method, two parameters, epsilon (EPS) and MinPts, must be pre-set to appraise the density distribution of points. DBSCAN starts from a randomly chosen unvisited point and counts the points that fall within a radius EPS of that point.

If the number of such points is no less than MinPts, the current point and its nearby points form a cluster, and the starting point is marked as visited. All points in the cluster that are not yet marked as visited are then processed recursively in the same way to expand the cluster. Otherwise, the point is temporarily marked as a noise point. Once the cluster is fully expanded, that is, all points in the cluster are marked as visited, the same procedure is applied to the remaining unvisited points. The clustering process ends when every object is assigned to a cluster or marked as noise. The DBSCAN algorithm flow is presented in Algorithm 1.

Algorithm 1 The DBSCAN algorithm
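As a concrete illustration, the following is a minimal Python sketch of the procedure just described. It is not the authors' implementation; the names `dbscan`, `region_query`, `eps` and `min_pts` are ours, and the O(n²) neighbor search is kept deliberately simple.

```python
import numpy as np

def region_query(X, i, eps):
    """Indices of all points within distance eps of point i."""
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)            # -1 marks noise
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:   # not a core point: temporarily noise
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:                   # expand the cluster recursively
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                j_neighbors = region_query(X, j, eps)
                if len(j_neighbors) >= min_pts:
                    seeds.extend(j_neighbors)
            if labels[j] == -1:        # border point or former noise point
                labels[j] = cluster
        cluster += 1
    return labels
```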

DBSCAN suffers from the difficulty of determining these two parameters. Previous studies have shown that they can be found by statistical and classical methods combined with various data mining techniques, but these methods consume excessive time. Therefore, we introduce a meta-heuristic optimizer to find these parameters considerably more accurately and efficiently, achieving faster and more precise clustering.

2.2 The arithmetic optimization algorithm

The Arithmetic Optimization Algorithm (AOA) is a new meta-heuristic optimization algorithm [29] inspired by the four major arithmetic operators: Multiplication (M), Division (D), Subtraction (S) and Addition (A). The mathematical models of the exploration and exploitation phases are detailed as follows. Note that the choice between the exploration and exploitation stages is conditioned by the math optimizer accelerated (MOA) function, calculated by

$$\begin{aligned} \mathrm{MOA}\left( C_\mathrm{Iter }\right) =\delta +C_\mathrm{Iter} \times \left( \frac{\gamma -\delta }{M_\mathrm{Iter}}\right) , \end{aligned}$$
(1)

where \(M_\mathrm{Iter}\) is the maximum number of iterations and \(C_\mathrm{Iter}\) is the current iteration, which lies between 1 and \(M_\mathrm{Iter}\). \(\mathrm{MOA}(C_\mathrm{Iter})\) is the value of MOA at the current iteration. \(\gamma\) and \(\delta\) are set to 1 and 0.2, respectively. The math optimizer probability (MOP) at the current iteration is calculated by

$$\begin{aligned} \mathrm{MOP} (C_\mathrm{Iter} ) = 1 - \frac{{C_\mathrm{Iter}}^{\frac{1}{\alpha }}}{{M_\mathrm{Iter}}^{\frac{1}{\alpha }}}, \end{aligned}$$
(2)

where \(\alpha\) is a sensitive parameter that defines the exploitation accuracy over the iterations; it is set to 0.5.

\(r_1, r_2, r_3\) are random numbers. When \(\mathrm{MOA} < r_1\), the exploration phase is carried out by executing D or M. The position-updating equation in the exploration stage is as follows:

$$\begin{aligned} x_{i,j}(C_\mathrm{Iter}+1) = {\left\{ \begin{array}{ll} x^{\star }(C_\mathrm{Iter}) \div (\mathrm{MOP} + \epsilon ) \times ( (ub_j - lb_j ) \times \mu + lb_j), &{} r_2 < 0.5 \\ x^{\star }(C_\mathrm{Iter}) \times \mathrm{MOP} \times ((ub_j - lb_j) \times \mu + lb_j), &{} \text { otherwise}, \end{array}\right. } \end{aligned}$$
(3)

where \(x_{i,j}(C_{\text {Iter}}+1)\) denotes the jth dimension of the ith solution in the next iteration, and \(x^{\star }(C_{Iter})\) is the best solution obtained so far. \(\epsilon\) is a small number that avoids division by zero, and \(ub_j\) and \(lb_j\) are the upper and lower bounds of the jth position. \(\mu\) is a control parameter, set to 0.5.

When \(\mathrm{MOA} \ge r_1\), the exploitation phase is carried out by executing S or A. If \(r_3 < 0.5\), S is performed (first rule in Eq. 4); otherwise, A is performed in its place (second rule in Eq. 4). The position-updating equation in the exploitation stage is as follows:

$$\begin{aligned} x_{i,j}(C_{\text {Iter}} + 1 ) = {\left\{ \begin{array}{ll} x^{\star }(C_\mathrm{Iter}) - \mathrm{MOP} \times ((ub_j - lb_j) \times \mu + lb_j), &{} r_3 < 0.5 \\ x^{\star }(C_\mathrm{Iter}) + \mathrm{MOP} \times ((ub_j - lb_j) \times \mu + lb_j), &{} \text{ otherwise } . \end{array}\right. } \end{aligned}$$
(4)
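To make the update rules concrete, here is a brief Python sketch of one AOA position update following Eqs. (1)-(4). It is our own illustrative rendering, not a reference implementation; the parameter values (\(\gamma = 1\), \(\delta = 0.2\), \(\alpha = 0.5\), \(\mu = 0.5\)) come from the text, while the function name `aoa_update` and the final clipping to the bounds are our assumptions.

```python
import numpy as np

def aoa_update(x_best, lb, ub, c_iter, m_iter,
               gamma=1.0, delta=0.2, alpha=0.5, mu=0.5, eps=1e-12):
    """One AOA position update around the current best solution."""
    moa = delta + c_iter * (gamma - delta) / m_iter    # Eq. (1)
    mop = 1 - (c_iter / m_iter) ** (1 / alpha)         # Eq. (2)
    x_new = np.empty_like(x_best)
    for j in range(len(x_best)):
        r1, r2, r3 = np.random.rand(3)
        scale = (ub[j] - lb[j]) * mu + lb[j]
        if moa < r1:   # exploration: Division or Multiplication, Eq. (3)
            x_new[j] = (x_best[j] / (mop + eps) if r2 < 0.5
                        else x_best[j] * mop) * scale
        else:          # exploitation: Subtraction or Addition, Eq. (4)
            x_new[j] = (x_best[j] - mop * scale if r3 < 0.5
                        else x_best[j] + mop * scale)
    return np.clip(x_new, lb, ub)  # keeping solutions in bounds is our assumption
```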

2.3 The opposition-based learning

Opposition-based learning (OBL) considers candidate solutions together with their opposites. Depending on whether the estimate or the opposite estimate is closer to the solution, the search interval can be recursively halved until one of them is close enough to the existing solution. OBL decides whether the original solution x is replaced by the opposite solution \(\bar{x}\) by comparing their fitness values. For a solution \({x} \in [lb,ub]\), \(\bar{x}\) is calculated by the following equation:

$$\begin{aligned} \bar{x} = ub + lb - x. \end{aligned}$$
(5)

The above equation can be generalized to n dimensions via:

$$\begin{aligned} \bar{x}_{j}=ub_{j} +lb_{j}-x_{j}, j = 1,2,\cdots ,n. \end{aligned}$$
(6)

According to the result of this comparison, the better of the two solutions is kept.
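A minimal Python sketch of this step, assuming minimization; the helper names `opposite` and `obl_select` are hypothetical labels for Eqs. (5)-(6) and the comparison just described, and the sphere fitness in the example is only a stand-in.

```python
import numpy as np

def opposite(x, lb, ub):
    """Opposite solution, Eq. (6), applied element-wise."""
    return ub + lb - x

def obl_select(x, lb, ub, fitness):
    """Keep whichever of x and its opposite has the better (lower) fitness."""
    x_bar = opposite(x, lb, ub)
    return x if fitness(x) <= fitness(x_bar) else x_bar

# Example with a simple sphere fitness:
lb, ub = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
x = np.array([3.0, -4.0])
best = obl_select(x, lb, ub, fitness=lambda v: float(np.sum(v ** 2)))
```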

3 The proposed OBLAOA

OBL is committed to taking both candidate solutions and their opposite solutions into consideration, which gives a greater opportunity to reach the global optimum and faster convergence than executing S or A alone. It finds the solution opposite to the present one and then decides whether to adopt it by comparing their fitness values. For example, if \(f(x^{\star }(C_{\text {Iter}})) \le f(\bar{x}^{\star }(C_{\text {Iter}}))\), then \(x^{\star }(C_{\text {Iter}})\) is kept; otherwise, \(\bar{x}^{\star }(C_{\text {Iter}})\) is stored. The equation used in OBLAOA to obtain the opposite solution is

$$\begin{aligned} \bar{x}^{\star }(C_{\text {Iter}}) = ub + lb - x^{\star }(C_{\text {Iter}}), \end{aligned}$$
(7)

where \(x^{\star }(C_{\text {Iter}})\) denotes the position of the best solution in the current iteration and \(\bar{x}^{\star }(C_{\text {Iter}})\) denotes its opposite position.

Fig. 1 The flowchart of OBLAOA

The flowchart of the proposed OBLAOA is given in Fig. 1 and the pseudocode is recorded in Algorithm 2.
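Because Algorithm 2 appears only as an image in the original, the following self-contained Python sketch gives our reading of the OBLAOA loop: the AOA update of Sect. 2.2 applied to every solution, followed by an opposition check on the best solution (Eq. 7). The population size of 20 and 500 iterations follow Sect. 5.2; the sphere fitness and the clipping to the bounds are our assumptions.

```python
import numpy as np

def oblaoa(fitness, lb, ub, n_pop=20, m_iter=500,
           gamma=1.0, delta=0.2, alpha=0.5, mu=0.5, eps=1e-12):
    dim = len(lb)
    pop = lb + np.random.rand(n_pop, dim) * (ub - lb)  # random initial population
    best = min(pop, key=fitness).copy()
    for c_iter in range(1, m_iter + 1):
        moa = delta + c_iter * (gamma - delta) / m_iter  # Eq. (1)
        mop = 1 - (c_iter / m_iter) ** (1 / alpha)       # Eq. (2)
        for i in range(n_pop):
            for j in range(dim):
                r1, r2, r3 = np.random.rand(3)
                scale = (ub[j] - lb[j]) * mu + lb[j]
                if moa < r1:   # exploration: D or M, Eq. (3)
                    pop[i, j] = (best[j] / (mop + eps) if r2 < 0.5
                                 else best[j] * mop) * scale
                else:          # exploitation: S or A, Eq. (4)
                    pop[i, j] = best[j] + (-mop if r3 < 0.5 else mop) * scale
            pop[i] = np.clip(pop[i], lb, ub)
            if fitness(pop[i]) < fitness(best):
                best = pop[i].copy()
        best_bar = ub + lb - best                # opposite of the best, Eq. (7)
        if fitness(best_bar) < fitness(best):    # OBL check on the best solution
            best = best_bar
    return best

# Example: minimize the sphere function in 10 dimensions.
lb, ub = np.full(10, -100.0), np.full(10, 100.0)
x_star = oblaoa(lambda x: float(np.sum(x ** 2)), lb, ub)
```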

Algorithm 2 The pseudocode of OBLAOA

Fig. 2 The flowchart of the proposed OBLAOA-DBSCAN

4 The improved DBSCAN with OBLAOA

In this section, we apply OBLAOA to optimize the two parameters of DBSCAN (EPS and MinPts). The resulting method, OBLAOA-DBSCAN, further improves the performance of the clustering algorithm.

In detail, OBLAOA-DBSCAN determines the parameters EPS and MinPts automatically over an extensive search space via a meta-heuristic method. First, the normalized ranges of the two parameters (EPS and MinPts) are set as the upper bounds (\(ub_{j}\)) and lower bounds (\(lb_{j}\)) of the search space. Then, OBLAOA searches for suitable parameters within this effective search space.

To obtain the best clustering results, the fitness function in OBLAOA-DBSCAN is the sum of the average Euclidean distances within each cluster, where the distance between two objects is given by

$$\begin{aligned} \mathrm {D} \left( o_{i}, o_{l}\right) =\left( \sum _{j=1}^{m}\left( o_{i j}-o_{l j}\right) ^{r}\right) ^{\frac{1}{r}} \end{aligned}$$
(8)

where \(D (o_{i}, o_{l})\) is a distance function between object i and object l that produces different metrics depending on r (the Euclidean distance for \(r=2\)), and \(o_{i j}, o_{l j}\ (i, l=1, \ldots , n,\ j=1, \ldots ,m)\) represent the value of the jth attribute of object i and object l, respectively.

As the fitness value is updated continuously, the position of the best solution, which determines the values of the two parameters, changes, and the corresponding MinPts and EPS change with it. Once the fitness value no longer changes, the obtained parameters are applied to the DBSCAN algorithm for clustering.
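The following Python sketch shows one plausible reading of this fitness evaluation: a candidate solution is decoded into (EPS, MinPts) using the rounding rules of Sect. 6.3, DBSCAN is run (here scikit-learn's implementation stands in for the authors' version), and the partition is scored by the sum of the per-cluster average Euclidean distances to the cluster centroid, i.e. one interpretation of Eq. (8) with \(r=2\). The function name `clustering_fitness` and the centroid-based reading are our assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def clustering_fitness(candidate, X):
    """Decode a candidate into (EPS, MinPts) and score the DBSCAN partition."""
    eps = max(round(float(candidate[0]), 1), 0.1)  # EPS kept to one decimal, > 0
    min_pts = max(int(candidate[1]), 1)            # MinPts rounded down, >= 1
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    clusters = [c for c in np.unique(labels) if c != -1]  # ignore noise (-1)
    if not clusters:
        return float("inf")                        # no cluster found: worst fitness
    total = 0.0
    for c in clusters:
        pts = X[labels == c]
        centroid = pts.mean(axis=0)
        # average Euclidean distance of the cluster's points to its centroid
        total += float(np.linalg.norm(pts - centroid, axis=1).mean())
    return total
```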

When DBSCAN is used alone, problems such as low clustering accuracy and poorly identified noise points often appear because the parameters are set manually. By introducing OBL to enhance the exploration ability of AOA, OBLAOA provides effective parameter solutions for DBSCAN, thereby improving its clustering ability. The flowchart is shown in Fig. 2. The time complexity of OBLAOA-DBSCAN is \(O(N(1 + M \times n\log n + M \times n))\), where N is the number of candidate solutions, M is the number of iterations, and n is the dimension of the problem.

5 Numerical simulation

5.1 The benchmark functions

To evaluate the performance of the proposed OBLAOA optimizer, we conducted numerical simulation experiments with 8 test functions from CEC2021. The benchmark functions are presented in Table 1, where the column Range gives their constraint ranges.

Table 1 The CEC2021 benchmark functions

5.2 The setting of experimental parameters

The results of OBLAOA are saved and compared with five reference methods (i.e., AOA, IAOA, DAOA, ENGWO and WSSA) for each test case. The parameters of each algorithm are set as follows. The maximum number of iterations and the population size of all algorithms are set to 500 and 20, respectively, and the number of function evaluations is 30 [37]. In addition, the initial random populations of all algorithms are the same. All CEC2021 test functions are simulated in 10 and 20 dimensions.

Table 2 Results of the 10-dimensional CEC2021 test functions (\(F_1\)-\(F_8\))
Table 3 Results of the 20-dimensional CEC2021 test functions (\(F_1\)-\(F_8\))
Table 4 Results of three engineering problems
Fig. 3 Comparison of different algorithms on three engineering application problems

Fig. 4 Convergence curves of the 10-dimensional CEC2021 test functions F1-F8

Fig. 5 Convergence curves of the 20-dimensional CEC2021 test functions F1-F8

5.3 Analysis of the results

The results of the numerical simulation are recorded in Tables 2 and 3. To verify the effectiveness of OBLAOA, we compared its results with those of the standard AOA, IAOA, DAOA, ENGWO and WSSA. We select the average value (AVG), standard deviation (STD) and best value (BEST) as performance indicators and report them in all tables, with the better results shown in bold in Tables 2 and 3. In addition, the Wilcoxon rank-sum test was applied to all results, and all of its outcomes (h) were 1. The tables show that OBLAOA performs better than the standard AOA and other currently popular optimization algorithms (i.e., IAOA, DAOA, ENGWO and WSSA). In the high-dimensional tests, the average and best values of OBLAOA are better than those of the standard AOA and the popular algorithms for all functions. In the low-dimensional tests, the average and best values of OBLAOA are better than those of AOA for the F1, F2, F3, F5, F6, F7 and F8 functions. In some experiments, OBLAOA improves on AOA significantly. Taking the 10-dimensional F3 function as an example, the BEST index of OBLAOA is 106.76, which is 46.62\(\%\) lower than the standard AOA, 77.17\(\%\) lower than DAOA, 2.67\(\%\) lower than ENGWO and 80.91\(\%\) lower than WSSA. For F6, the BEST index is 1600, which is 44\(\%\) lower than the standard AOA, 21.95\(\%\) lower than DAOA, 24.88\(\%\) lower than ENGWO and 31.91\(\%\) lower than WSSA. For the F8 function, the BEST index is 2.99e+3, which is 59.15\(\%\) lower than the standard AOA, 59.14\(\%\) lower than DAOA, 54.83\(\%\) lower than IAOA, 63.49\(\%\) lower than ENGWO and 65.67\(\%\) lower than WSSA. To sum up, the proposed OBLAOA outperforms the standard AOA and other currently popular algorithms on complex functions.

To further demonstrate the optimization effect of OBLAOA, we selected three practical engineering problems for verification: welded beam design [38], compression spring design [39] and I-beam design [40]. The results are recorded in Table 4 and shown in Fig. 3. To verify the adequacy of the experimental results, we also carried out the Wilcoxon signed-rank test; its outcomes, expressed as h and recorded in Table 4, are all 1. Figure 3 shows that OBLAOA achieves a better optimization effect than the other algorithms and has the highest convergence accuracy on all problems. Specifically, on the compression spring design (CSD) problem, OBLAOA converges first, and its convergence is greatly improved compared with ENGWO and WSSA. In general, OBLAOA converges better when solving practical engineering problems. As can be seen from Table 4, OBLAOA also has great advantages over the standard AOA and the latest algorithms, obtaining the best value in all three engineering problems. Taking the welded beam design (WBD) problem as an example, our best value is 4.25, which is 34\(\%\) lower than the standard AOA, 40.3\(\%\) lower than DAOA and 56.72\(\%\) lower than IAOA. ENGWO and WSSA do not converge, which sets them far apart from OBLAOA. Figures 4 and 5 also show that OBLAOA converges earlier and faster, and its final fitness value is lower than those of the other algorithms.

6 Experiment and performance evaluation

This section is organized as follows. In Sect. 6.1, we describe the datasets used in the experiment. In Sect. 6.2, we introduce the evaluation indexes. In Sect. 6.3, we describe the parameter-setting process in detail. In Sect. 6.4, we use ten datasets to test different optimization algorithms. In Sect. 6.5, we compare the optimized OBLAOA-DBSCAN with five classical clustering algorithms.

6.1 The datasets

In this part, we use ten datasets to test the performance of OBLAOA-DBSCAN. The numbers of instances in the 10 datasets are 788, 399, 373, 150, 251, 300, 198, 1980, 341 and 846; their dimensions are 3, 3, 3, 5, 3, 2, 34, 3, 3 and 19; and their numbers of clusters are 7, 6, 2, 3, 3, 5, 2, 5, 9 and 4. Table 5 lists the ten datasets used as experimental data. We compare the real labels with the clustering labels and use the comparison result as the evaluation index of the algorithm; therefore, we use datasets with real labels.

Table 5 Datasets used in experiments

6.2 The error index

To measure the clustering results of the improved method, we use Accuracy, the Davies-Bouldin index (DBI), the Silhouette index (Sil), the Rand index (RI) [41, 42], Normalized Mutual Information (NMI), Homogeneity, Completeness, and V-measure [43]. Because the datasets have real labels, we use the accuracy index to show the performance of the proposed method.

Accuracy is the ratio of correctly clustered data to total data; the correctly clustered data is obtained by comparing the cluster labels K with the actual labels C. DBI measures the distances within clusters relative to the distances between clusters. A smaller DBI means smaller within-cluster distances and larger between-cluster distances. It is formulated as:

$$\begin{aligned} \mathrm {DBI}=\frac{1}{N}\sum _{i=1}^{N}\left( \max \limits _{j=1,\ldots ,N, j \ne i} \left( \frac{S_i+S_j}{d_{i j}}\right) \right) , \end{aligned}$$

where N is the number of clusters, \(d_{i j}\) is the distance between the centers of clusters i and j, and \(S_{i}\) and \(S_{j}\) are the mean distances of the points in clusters i and j to their respective centers.

The Silhouette value describes how similar a point is to its own cluster compared with other clusters. The larger this value, the higher the similarity between the target and its cluster and the lower the similarity with other clusters. The formula is as follows:

$$\begin{aligned} \mathrm {SIL}=\frac{1}{N} \sum _{i=1}^{N}\left( \frac{b\left( {i}\right) -a\left( {i}\right) }{\max \left\{ a\left( {i}\right) , b\left( {i}\right) \right\} }\right) , \end{aligned}$$

where a(i) is the average distance between point i and all other data points in the same cluster, and b(i) is the smallest average distance between point i and the points of any other cluster.

The Rand index compares the similarity of the results of two different clustering methods. The larger the value, the more consistent the clustering result is with the real situation. The formula is as follows:

$$\begin{aligned} \mathrm {RI}=\frac{x+y}{C_{n}^{2}}, \end{aligned}$$

where x is the number of sample pairs assigned the same label in both C and K, and y is the number of sample pairs assigned different labels in both C and K. \(C_{n}^{2}\) is the number of sample pairs in the dataset.

NMI measures the degree of agreement between two sets of labels and reflects the correlation between two sets of results. The greater the NMI, the greater the correlation between the categories. The formula is as follows:

$$\begin{aligned} \mathrm{Hl} = -\sum _{i=1}^{N}\left( \frac{\mathrm{Ml}}{N}\log _2\frac{\mathrm{Ml}}{N}\right) ,\quad \mathrm{Hr} = -\sum _{i=1}^{N}\left( \frac{\mathrm{Mr}}{N}\log _2\frac{\mathrm{Mr}}{N}\right) ,\quad \mathrm{Hlr} = -\sum _{i=1}^{N}\left( \frac{\mathrm{Ml}\cdot \mathrm{Mr}}{N}\log _2\frac{\mathrm{Ml}\cdot \mathrm{Mr}}{N}\right) , \end{aligned}$$

and

$$\begin{aligned} \mathrm{NMI} = \sqrt{\frac{\mathrm{Hl}+\mathrm{Hr}-\mathrm{Hlr}}{\mathrm{Hl}}\times \frac{\mathrm{Hl}+\mathrm{Hr}-\mathrm{Hlr}}{\mathrm{Hr}}}, \end{aligned}$$

where Ml represents the cluster distribution of a randomly selected object from the clustering result K, and Mr represents the cluster distribution of a randomly selected object from the actual labels C.

Homogeneity means that each cluster contains only members of a single class, and completeness means that all members of a given class are assigned to the same cluster. V-measure is the harmonic mean of homogeneity and completeness. The formulas are as follows:

$$\begin{aligned} \mathrm{homogeneity} = \frac{\mathrm{Hl}+\mathrm{Hr}-\mathrm{Hlr}}{\mathrm{Hl}},\quad \mathrm{completeness} = \frac{\mathrm{Hl}+\mathrm{Hr}-\mathrm{Hlr}}{\mathrm{Hr}}, \end{aligned}$$

and

$$\begin{aligned} \text {V-measure} = \frac{2 \times \mathrm{homogeneity} \times \mathrm{completeness}}{\mathrm{homogeneity}+\mathrm{completeness}}. \end{aligned}$$

The DBI index is usually less than 1, and the lower it is, the better the performance. The SIL and RI indexes are at most 1; the closer they are to 1, the better the clustering performance. The larger Accuracy, NMI, homogeneity, completeness and V-measure are, the closer the clustering results are to the real labels. Through these evaluation indexes, we can clearly compare the clustering performance of the new algorithm.
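For reference, most of these indexes are available off the shelf; the sketch below computes them with scikit-learn, and the accuracy index, which requires the best one-to-one matching between cluster and class labels, is implemented via the Hungarian algorithm from SciPy. The function names `clustering_accuracy` and `evaluate` are ours, and noise points labeled -1 by DBSCAN are treated here as one extra cluster.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn import metrics

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one matching of cluster labels."""
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((len(pred_ids), len(true_ids)))
    for i, p in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            cost[i, j] = -np.sum((y_pred == p) & (y_true == t))
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return -cost[rows, cols].sum() / len(y_true)

def evaluate(X, y_true, y_pred):
    # DBI and Sil need at least two distinct labels in y_pred.
    return {
        "Accuracy": clustering_accuracy(y_true, y_pred),
        "DBI": metrics.davies_bouldin_score(X, y_pred),
        "Sil": metrics.silhouette_score(X, y_pred),
        "RI": metrics.rand_score(y_true, y_pred),
        "NMI": metrics.normalized_mutual_info_score(y_true, y_pred),
        "Hom/Comp/V": metrics.homogeneity_completeness_v_measure(y_true, y_pred),
    }
```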

6.3 Experiment settings

Table 6 The range of parameters for the investigated datasets

DBSCAN [44] requires two parameters to be selected for clustering, and changing the values of EPS and MinPts yields different clustering results. We first run the algorithm with a large range for the two parameters, setting EPS to 0-20 and MinPts to 0-40, to find appropriate clustering results, and then adjust the ranges manually. These parameter ranges are shown in Table 6. By comparing the results, we obtain a more accurate range for each dataset, which is used in the following experiments. We round EPS to one decimal place and round MinPts down to an integer. We use the OBLAOA-DBSCAN algorithm to optimize these two parameters in the experiment. First, we compare the optimization algorithms: the results of OBLAOA are compared with the Arithmetic Optimization Algorithm (AOA), the Whale Optimization Algorithm (WOA) [45], the Salp Swarm Algorithm (SSA) [46], the Weighted Salp Swarm Algorithm (WSSA) [47], Exponential Neighborhood Grey Wolf Optimization (ENGWO) [48], the developed Arithmetic Optimization Algorithm (dAOA) [49] and the improved Arithmetic Optimization Algorithm (IAOA) [50]. Second, we compare OBLAOA-DBSCAN with five classical clustering algorithms, namely K-means [51], Spectral [52], OPTICS [53], clustering by fast search and find of density peaks (DPC) [54], and the original DBSCAN.

To compare the gaps among the algorithms conveniently and clearly, we set the test parameters as follows. The maximum number of iterations and the population size of all algorithms are set to 100 and 20, respectively. In addition, we run each algorithm 20 times and take the average result to eliminate experimental error. The experiments were run in MATLAB 2017b.

6.4 Experimental results of the optimization algorithm

In this part, we compare the improved optimization algorithm OBLAOA with seven other meta-heuristic optimization algorithms. We take the Euclidean distance as the fitness function and obtain the convergence curves of the fitness function. In Tables 7, 8, 9 and 10, we show the error indexes of the different algorithms, with the better indexes in bold. Figure 6 shows the convergence curves of six datasets; the convergence curves of the other datasets are given in Fig. 9 in the Appendix.

Fig. 6 Convergence curves with DBSCAN optimized by different meta-heuristic algorithms I

Table 7 The evaluation indexes of datasets in DBSCAN optimized by different meta-heuristic algorithms I
Table 8 The evaluation indexes of datasets in DBSCAN optimized by different meta-heuristic algorithms II
Table 9 The evaluation indexes of datasets in DBSCAN optimized by different meta-heuristic algorithms III
Table 10 The evaluation indexes of datasets in DBSCAN optimized by different meta-heuristic algorithms IV

The experiment shows that our OBLAOA algorithm is better than the original AOA, and that it is the best among the eight optimization algorithms when applied to the DBSCAN algorithm, as the convergence curves and error indexes illustrate. Our optimization algorithm achieves a better fitness value and convergence rate, as can be seen from the convergence curves in Fig. 6. On all the datasets, the fitness values of OBLAOA are better than those of the original AOA and the other optimization algorithms. Figure 6 shows that the convergence accuracy and rate of OBLAOA are better than those of AOA. On the Aggregation, Jain and Synthesis datasets, all algorithms converge more slowly as the function gradually converges, and AOA sometimes falls into a local optimum. However, because the OBL strategy strengthens the search capability, OBLAOA can still update the optimal solution.

OBLAOA also performs better than the other optimization algorithms according to the error indexes in Tables 7, 8, 9 and 10. On the Compound, Jain, Iris, Wpbc, Synthesis and Vehicle datasets, OBLAOA-DBSCAN is clearly better in accuracy: its DBI index is smaller than the others', and its SIL, RI, NMI, homogeneity, completeness and V-measure indexes are larger. Its accuracy is the best of the eight algorithms: 0.8538 on Compound, 0.7151 on Jain, 1 on Iris, 0.9346 on Wpbc, 0.9998 on Synthesis and 0.9656 on Vehicle. Although some of the indexes are tied, OBLAOA is better overall. On the four datasets Aggregation, Spiral, Pathbased and R15, the accuracy and evaluation indexes of the different algorithms are similar. Nevertheless, in general, OBLAOA analyzes clustering problems better than the original AOA and the other six meta-heuristic algorithms. Therefore, OBLAOA-DBSCAN has a good effect on the clustering of the datasets.

6.5 Experimental results of clustering algorithm

Table 11 The evaluation indexes of datasets in different clustering algorithms I
Table 12 The evaluation indexes of datasets in different clustering algorithms II
Fig. 7 The results with different clustering algorithms I

Fig. 8 The results with different clustering algorithms II

The specific clustering results on these datasets are shown in Figs. 7 and 8, which present the results of the K-means, Spectral, Optics, DPC and DBSCAN algorithms and of the best clustering optimization algorithm (OBLAOA-DBSCAN). Each colour in the figures represents one cluster. By comparing the graphs of each clustering, we can make a basic judgment about the clustering effect, as follows. OBLAOA-DBSCAN achieves better clustering results in Figs. 7 and 8: it clusters the data into better shapes and finds the actual number of clusters. The graphs of the datasets not shown here are given in Fig. 10 in the Appendix. In Tables 11 and 12, we show the error indexes of the different clustering algorithms, with the better indexes in bold. The entries marked with * are taken from articles [55] and [56].

In Fig. 7, compared with K-means, the result on the Aggregation dataset shows that our algorithm produces more reliable clusters: each cluster in the figure is clearly distinguished, while some K-means clusters are not. In Fig. 8, compared with Spectral, the result on the Synthesis dataset shows that our algorithm clusters the left side of the graph more accurately, and its clustering of a whole block of data is better than the Spectral algorithm's. From the graphs of Jain, Spiral and Pathbased in Figs. 7 and 8, OBLAOA-DBSCAN is more accurate than K-means and Spectral on circular datasets.

In Fig. 8, the graphs of the Pathbased and R15 datasets show that our algorithm clusters dense data more accurately than Optics. When dealing with discrete data points, the Optics algorithm marks them as noise points, whereas our algorithm handles these points more accurately, as the Synthesis dataset shows. From the Aggregation and Jain datasets in Fig. 7, it can be seen that the DPC algorithm marks boundary points as noise points. Therefore, by comparing the cluster graphs, we find that OBLAOA-DBSCAN clusters circular datasets better than Optics and DPC. In addition, OBLAOA-DBSCAN correctly identifies groups of data points in areas of lower local density, as well as edge data points, while the original DBSCAN fails to cluster these points accurately.

From Tables 11 and 12, the Accuracy, RI, Sil, NMI, homogeneity, completeness and V-measure indexes of OBLAOA-DBSCAN are significantly higher than those of the K-means, Spectral, Optics, DPC and DBSCAN algorithms, and its DBI index is lower than those of K-means and Spectral. Therefore, the improved OBLAOA-DBSCAN clusters the datasets more accurately than the original DBSCAN.

Compared with the indexes reported in other articles in Table 11, our algorithm has better NMI indexes than the K-means and original DBSCAN algorithms. On the Compound dataset, OBLAOA-DBSCAN's NMI index is 48.74\(\%\) higher than K-means's and 1.04\(\%\) higher than DBSCAN's. On the Iris dataset, it is 13.25\(\%\) higher than K-means's and 56.25\(\%\) higher than DBSCAN's. Compared with the indexes reported in other articles in Table 12, our algorithm has better NMI indexes than the K-means and DPC algorithms. On Aggregation, OBLAOA-DBSCAN's NMI index is 17.52\(\%\) higher than K-means's and 2.08\(\%\) higher than DPC's. On Jain, it is 72.16\(\%\) higher than K-means's and 2.74\(\%\) higher than DPC's. On Pathbased, it is 28.99\(\%\) higher than K-means's and 16.22\(\%\) higher than DPC's. On R15, it is 0.58\(\%\) higher than K-means's and 0.67\(\%\) higher than DPC's.

In Table 11, the DBI and RI indexes on the Spiral and Pathbased datasets are not the best, but the accuracy compared with the real labels is better. From the figures, we conclude that for circular datasets such as those in Figs. 7 and 8, our DBSCAN algorithm determines the shape of the clusters more accurately and obtains better results. In Table 11, the SIL index takes negative values on the circular Spiral dataset, but the clustering shapes are more consistent with the real labels. Through the above comparative analysis, OBLAOA-DBSCAN not only optimizes better than the other optimization algorithms but also performs better in clustering analysis than some classical clustering algorithms. In general, we conclude that OBLAOA-DBSCAN clusters the datasets very effectively.

7 Conclusion

In this paper, we have proposed a new clustering algorithm named OBLAOA-DBSCAN. We introduce OBL into the AOA algorithm to develop an OBLAOA optimizer that improves the global search ability and convergence accuracy of the standard AOA. We then use the improved OBLAOA to adjust the EPS and MinPts parameters of DBSCAN, improving its clustering effect, and thus obtain the hybrid clustering algorithm OBLAOA-DBSCAN. In our numerical simulations, we have demonstrated that the improved OBLAOA is more effective than the original AOA and other currently popular algorithms. We have also validated the effectiveness of the proposed OBLAOA-DBSCAN on many clustering tasks and found that it achieves accurate and reliable clustering results with lower computational costs.

Although OBLAOA-DBSCAN achieves significant improvement, some insufficiencies remain: the selection of the best parameters of the optimization algorithm, and the global search ability and clustering effect of the optimizer, need further improvement. In the future, we will apply OBLAOA-DBSCAN to clustering problems on more datasets. In addition, OBLAOA can be applied to other application problems that resemble clustering models, such as image classification and recognition, speech signal classification and electrical information classification, which calls for further research.