A Bi-directional Fuzzy C-Means Clustering Ensemble Algorithm Considering Local Information

The classic fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassifying border points. This study proposes a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, FCM is run multiple times with randomized initial cluster centers to create various membership matrices, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed. Running FCM multiple times therefore yields multiple horizontal ensembles. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have only a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.


Introduction
As one of the most commonly used data analysis methods in machine learning, data mining, and artificial intelligence, clustering divides data sets into clusters according to their features so that the sample points in the same cluster are highly similar, while those in different clusters are dissimilar [1]. Because it requires no knowledge of the class labels of the data, clustering is an unsupervised learning method; it is widely used in image processing [2], information security [3], market analysis [4], and other fields since it can discover the potential value of data. Clustering methods can be classified into five categories [5]: partition-based clustering, density-based clustering, grid-based clustering, hierarchical clustering, and model-based clustering. Among them, partition-based methods such as K-means [6] and FCM [7,8] are simple and efficient and are therefore broadly applied in engineering.
First proposed by Dunn in 1974, FCM [7] differs from the hard clustering of K-means: it introduces fuzzy membership and thus offers a more flexible way of clustering. The fuzzy membership function comes from fuzzy mathematics and uncertainty theory [9,10]. The optimal memberships and cluster centers are obtained by iterative calculation, and the sample points are finally assigned to clusters following the principle of maximal membership. Although FCM has been applied successfully in many sectors since its emergence, it still has some shortcomings, e.g., sensitivity to initial cluster centers, noise points, and boundary points, poor performance on unbalanced data sets, and a tendency to fall into local optima during iteration.
This work proposes a bi-directional FCM clustering ensemble technique that takes local information into account (LI_BIFCM) to address the drawbacks of FCM. The suggested technique considers not just the diversity of clusters but also the local information of sample points.
The following are the paper's significant innovations and contributions: 1. A vertical clustering ensemble is used to keep the algorithm stable. 2. Using multiple horizontal clustering ensembles, the technique effectively prevents border points from being misclassified. 3. LI_BIFCM increases clustering performance even further by employing horizontal and vertical ensembles.
The rest of this paper is laid out as follows: Sect. 2 examines and summarizes the related research. The core ideas of FCM are briefly described in Sect. 3. The LI_BIFCM method is discussed in detail in Sect. 4. The experimental evaluation of the suggested algorithm is discussed in Sect. 5. Section 6 concludes this paper and looks ahead to future research.

Related Works
Many FCM-derived algorithms have been presented and examined in past research work to address FCM's shortcomings.
To overcome its tendency to fall into local optima, Silva [11] proposed two improved FCM algorithms (named FCM-IDPSO and FCM2-IDPSO), which dynamically adjusted the parameters based on improved particle swarm optimization and provided a better balance between exploration and exploitation. Experimental results suggested that the proposed method delivered excellent clustering results at a faster speed. To automatically identify the cluster centers and initial location of FCM, Yao [12] presented an entropy-based fuzzy clustering method that determines the cluster centers by calculating the entropy of each sample point. A new method was also devised to estimate the initial membership functions of fuzzy sets, which yielded good predictive output values. Ding et al. [13] proposed an improved FCM algorithm that combined a genetic algorithm with the Gaussian kernel technique, which could overcome FCM's inability to determine the number of clusters and boost the clustering performance. Zou et al. [14] presented an initialization method for FCM, which extracted approximate cluster centers from samples based on grid and density and used the number of approximate cluster centers to initialize the number of cluster centers. Experiments indicated that this method could effectively improve the clustering performance and shorten the clustering time. To tackle FCM's sensitivity to initial cluster centers and noise points, Shi [15] proposed an improved FCM algorithm based on initial center optimization, density clustering, and grid clustering, and demonstrated its effectiveness on taxi trajectory data sets. Aiming to improve FCM's clustering performance on noisy data sets, an effective objective function with a cluster center learning method based on quadratic mean distance, entropy, and regularization terms was presented in [16].
The cluster centers were updated according to the new objective function, and the strengths of this method were validated by varied data sets. Given that most of the improved FCM neglected the dissimilarity in clusters, Qamar et al. [17] introduced a dissimilarity measure between clusters, designed an objective function that generated highly dissimilar clusters and verified better performance of the improved FCM through data sets experiments. Li et al. [18] proposed a double fuzzy C-means clustering model, which comprised two interconnected and interactive FCM algorithms, and redesigned a new objective function to enhance the intra-cluster compactness and inter-cluster separation and to improve the clustering accuracy. Since FCM is easily affected by the Euclidean distance, Wang et al. [19] presented a weighted FCM algorithm based on the weighted Euclidean distance, which included feature weights into the weighted Euclidean distance. Experiments suggested that the clustering performance could be enhanced by the improved FCM. Haldar et al. [20] devised an improved FCM algorithm based on the Mahalanobis distance, which improved the clustering quality compared with the traditional FCM algorithm. Considering the varying weights of sample points, Wu et al. [21] proposed an improved FCM algorithm combining adaptive weights with SA-PSO, which could avoid local optima and effectively improve the clustering performance. To solve the fuzzy boundary value and fuzzy integrodifferential equations problem, the reproducing kernel algorithm is adopted by Arqub [22,23]. In response to different sample weights and feature weights, Wu [24] further proposed an improved FCM algorithm, which introduced the adaptive data weight vector, the adaptive feature weights matrix, and constructed a new objective function. Experimental results demonstrated that the clustering performance of the algorithm was improved significantly. 
Although the above studies of FCM have made some progress, the performance of an individual or improved FCM clustering algorithm remains limited across different data sets, and its stability is poor.
With the deepening of the research and the differences in the actual application of data sets, the individual FCM clustering algorithm has certain limitations. Hence, the idea of integrating different clustering algorithms has emerged to upgrade the clustering results and meet the actual application requirements [25]. As an unsupervised learning technique based on ensemble, clustering ensemble combines multiple individual clustering results into a unified robust result and can achieve higher accuracy than the individual clustering method [26]. To overcome the failure of traditional FCM to cluster large data sets effectively, Li et al. [27] proposed an FCM-based ensemble clustering algorithm for large data sets, which could improve the clustering accuracy by clustering atoms on data sets. To improve the robustness of clustering, Su et al. [28] proposed a link-based pairwise matrix method for the clustering ensemble of FCM, which adopted a fuzzy graph to represent the relationship between component clusters and then obtained the final ensemble clustering results. Experimental results illustrated that the proposed method outperformed other methods. Su et al. [29] also presented a hierarchical fuzzy clustering ensemble approach, which employed FCM and hierarchical clustering method to generate base clusters and achieve consensus functions, and verified its advantages in clustering accuracy and time efficiency on large data sets. The fuzzy C-means clustering algorithm with improved random projection was proposed [30], which improved the efficiency of clustering through singular value decomposition of the concatenation of membership matrices. To unify the fuzzy clustering partitions, Wan et al. [31] presented an FCM-based fuzzy consensus clustering framework (FCC), which redesigned the objective function and translated FCC into a weighted and segmented FCM clustering, and verified its effectiveness theoretically and experimentally. 
At present, much attention is paid to the generation methods of cluster membership and design of consensus function in the field of clustering ensembles, whereas little consideration is given to local information of sample points in the process of the ensemble.
In conclusion, the quality of a single FCM cluster is limited, and effective border point allocation is impossible. As a result, more research is required in response to these flaws. In contrast, our research will focus on the membership category of cluster border points and increase the clustering effect by focusing on the idea of clustering ensemble. The original FCM and the proposed algorithm are described in depth in the next section.

Fuzzy C-Means Clustering Algorithm
As a classical partition clustering algorithm, FCM obtains the fuzzy membership of each sample point to all cluster centers by optimizing an objective function and uses these memberships to determine the category of each sample point. Given the data set $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, it is divided into $c$ clusters, and the center of each cluster is $c_j$ $(j = 1, 2, \ldots, c)$.
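The objective function of FCM is defined as follows:

$$J_e = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{e}\,\|x_i - c_j\|^{2}, \qquad \sum_{j=1}^{c} u_{ij} = 1 \tag{1}$$

where $u_{ij}$ is the fuzzy membership of $x_i$ in the cluster with center $c_j$, $e$ is the membership factor, and $\|x_i - c_j\|$ represents the Euclidean distance from $x_i$ to $c_j$. In the process of fuzzy clustering, the membership $u_{ij}$ and the cluster center $c_j$ are updated iteratively until clustering is completed, that is, when the change in membership between iterations $t$ and $t+1$ satisfies the following equation, or when the objective function $J_e$ reaches a local optimum (minimum):

$$\max_{i,j}\left|u_{ij}^{(t+1)} - u_{ij}^{(t)}\right| < \varepsilon \tag{2}$$

where $\varepsilon$ is an error threshold.
To minimize $J_e$, $u_{ij}$ and $c_j$ are updated as shown in the following equations:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\dfrac{\|x_i - c_j\|}{\|x_i - c_k\|}\right)^{\frac{2}{e-1}}} \tag{3}$$

$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{e}\,x_i}{\sum_{i=1}^{n} u_{ij}^{e}} \tag{4}$$

The clustering algorithm of FCM is as follows:
1. Set $c$, $e$, $\varepsilon$, and the maximum number of iterations $T$, and randomly initialize the cluster centers.
2. Update the memberships $u_{ij}$ by Eq. (3).
3. Update the cluster centers $c_j$ by Eq. (4).
4. Repeat steps 2 and 3 until Eq. (2) is satisfied or $T$ iterations are reached.
5. Assign each sample point to the cluster in which its membership is largest.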
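A minimal NumPy sketch of the standard FCM iteration may help make the update rules concrete. It assumes random initial centers drawn from the samples and stops when the largest membership change falls below a threshold; the function name `fcm` and its parameter names are illustrative and are not taken from the paper's Matlab implementation.

```python
import numpy as np

def fcm(X, c, e=2.0, eps=1e-5, max_iter=100, seed=None):
    """Plain fuzzy C-means sketch; returns (U, centers) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: initial cluster centers are c distinct samples chosen at random
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)
    U = np.zeros((n, c))
    for _ in range(max_iter):
        # Squared Euclidean distances ||x_i - c_j||^2, shape (n, c)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ij is proportional to ||x_i - c_j||^(-2/(e-1))
        inv = d2 ** (-1.0 / (e - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the samples weighted by u_ij^e
        W = U_new ** e
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Stop when the largest membership change is below the threshold
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, centers
```

Hard labels follow from the maximum membership principle as `U.argmax(axis=1)`. Because the initial centers are random, different seeds can give different partitions, which is the source of diversity that an ensemble of FCM runs relies on.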

The LI_BIFCM Algorithm
This section explains the framework of the LI_BIFCM algorithm and the essential steps of the proposed technique in depth. Figure 1 depicts the LI_BIFCM structure, which is separated into three parts: vertical ensemble, horizontal ensemble, and final clustering ensemble. First, the inputs to the system are the data set, the number of cluster centers, the number of ensemble runs m, and the multiple-KNN parameters p (a percentage, normally set to 2%) and s (which controls the number of executions of multiple KNN). Then, FCM is run on the data set m times, yielding distinct membership matrices due to the random initialization of the FCM cluster centers; these form the vertical ensemble. After each FCM run, multiple K-nearest neighbors are used to build the local membership matrices of the sample points and form a horizontal ensemble, so the m runs produce m horizontal ensemble clustering results. Finally, the vertical and horizontal ensembles are combined to generate the final clustering results.

Vertical Ensemble
A single FCM is used as the base cluster member in LI_BIFCM. Clustering results differ between runs because FCM randomly generates the initial cluster centers each time, and this feature is used to generate diverse and stable clustering results.
Suppose the data set has n sample points

Horizontal Ensemble
The multiple K-nearest neighbors approach is used to consider the local information of sample points to solve the problem of boundary point misclassification.
First, the K-nearest neighbors of each sample point are calculated, as shown in the following equation:

$$KNN_k(x_i) = \{x_j \in X \mid d(x_i, x_j) \le d(x_i, NN_k(x_i))\}$$

where $d(x_i, x_j)$ denotes the Euclidean distance between the points $x_i$ and $x_j$, $NN_k(x_i)$ represents the k-th nearest neighbor of $x_i$, and $k \in [1, n]$. The membership matrix is obtained after one FCM clustering, and from it the maximum membership matrix $C^{1,k}_{x_i}$ of the K-nearest neighbors of the sample points is calculated. Then, the maximum membership matrices $C^{1,k+s}_{x_i}$ of multiple K-nearest neighbors (k takes different values) are calculated, where s is used to control the number of executions of multiple K-nearest neighbors, and its range is 1 to n − 1.
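The idea of using several neighborhood sizes to revise the label of a border point can be illustrated with a short sketch. This is only one plausible reading, not the paper's exact formulas: it assumes that the base neighborhood size is k = p·n, that each neighborhood contributes the majority maximum-membership label of its members, and that the FCM label and the neighborhood labels are combined by a simple vote. The function name `horizontal_ensemble` is hypothetical.

```python
import numpy as np

def horizontal_ensemble(X, U, p=0.02, s=4):
    """Relabel points by voting over several K-nearest-neighbor sizes (sketch)."""
    n, c = U.shape
    own = U.argmax(axis=1)                        # maximum-membership labels of one FCM run
    # Pairwise Euclidean distances (O(n^2) memory, fine for small n)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # a point is not its own neighbor
    order = np.argsort(d, axis=1)                 # neighbors sorted by distance
    k0 = max(1, int(round(p * n)))                # assumption: base k is p * n
    votes = np.zeros((n, c), dtype=int)
    votes[np.arange(n), own] += 1                 # the FCM result itself gets one vote
    for k in range(k0, min(k0 + s, n - 1) + 1):   # neighborhood sizes k0, k0+1, ..., k0+s
        neigh_labels = own[order[:, :k]]          # labels of the k nearest neighbors, (n, k)
        counts = np.stack([(neigh_labels == j).sum(axis=1) for j in range(c)], axis=1)
        local = counts.argmax(axis=1)             # majority label inside the neighborhood
        votes[np.arange(n), local] += 1
    return votes.argmax(axis=1)
```

A point whose own label disagrees with most of its neighborhoods is moved to the neighbors' label, which is the behavior one would want for misassigned border points.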
A horizontal ensemble is carried out by combining the clustering result of one FCM and the membership matrix of multiple K-nearest neighbors to obtain its clustering result C 1,Horizontal

The LI_BIFCM Algorithm Flow
The following is the LI_BIFCM algorithm flow:
1. Set the number of cluster centers c, the fuzzy factor e, the stopping threshold ε, the maximum number of iterations T, and the cluster ensemble parameters m and s.
4. Calculate the maximum membership matrix $L_f$ of each clustering by Eq. (7), and carry out a vertical ensemble by Eq. (8) to obtain the vertical ensemble clustering results $C_{Vertical}$.
6. Calculate the maximum membership matrix $C^{1,k}$
LI_BIFCM's pseudo code is given as follows based on the above analysis.
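As a rough illustration of the vertical ensemble step, the sketch below combines the maximum-membership labels of several FCM runs. It rests on two assumptions that are not taken from the paper: cluster ids of each run are first matched to those of the first run with the Hungarian method (SciPy's `linear_sum_assignment`), and the runs are then combined by majority vote. The names `align_labels` and `vertical_ensemble` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(ref, labels, c):
    """Rename the cluster ids in `labels` so they best match `ref`."""
    overlap = np.zeros((c, c), dtype=int)
    np.add.at(overlap, (labels, ref), 1)           # overlap[a, b] = #points with labels=a, ref=b
    rows, cols = linear_sum_assignment(-overlap)   # maximize the total overlap
    mapping = np.empty(c, dtype=int)
    mapping[rows] = cols
    return mapping[labels]

def vertical_ensemble(memberships):
    """Majority vote over the maximum-membership labels of m FCM runs (sketch)."""
    c = memberships[0].shape[1]
    hard = [U.argmax(axis=1) for U in memberships]  # maximum-membership labels per run
    ref = hard[0]
    aligned = np.stack([ref] + [align_labels(ref, h, c) for h in hard[1:]])
    votes = np.stack([(aligned == j).sum(axis=0) for j in range(c)], axis=1)
    return votes.argmax(axis=1)
```

The alignment step is needed because two FCM runs can find the same partition while numbering the clusters differently; without it, a vote across runs would be meaningless.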

Time Complexity Analysis
The time complexity of LI_BIFCM is given when the preceding description is combined. Let n be the number of test sample sets, c be the number of cluster centers, and t be the number of FCM iterations. Our algorithm's time complexity is determined by the following two factors: (a) the time it takes to run FCM numerous times determines the vertical clustering ensemble. Executing

Experimental Settings
To evaluate the performance of the LI_BIFCM algorithm, twelve synthetic and real-world data sets are used in the experiment. They come from different fields and have different sizes, dimensions, and numbers of categories, as shown in Tables 1 and 2. These data sets are widely used in clustering tests, so they can simulate the clustering performance of LI_BIFCM in different application scenarios. The experimental environment is a PC with an Intel(R) Core(TM) i7-7500 CPU @ 2.70 GHz, 2.90 GHz, 12 GB RAM, Windows 10 64-bit OS, and the programming tool is Matlab 2015b.
Three classical clustering evaluation indexes are used in the experiment. The clustering accuracy (ACC) compares the clustering labels with the real labels of the sample points [32]. The adjusted Rand index (ARI) represents the degree of overlap between the clustering partition and the actual partition [33]. The adjusted mutual information (AMI) indicates the consistency between the clustering results and the real categories [34]. The three evaluation indexes are calculated as follows:

$$ACC = \frac{1}{n}\sum_{i=1}^{n}\delta(u_i, v_i)$$

where $n$ represents the total number of samples, $u_i$ and $v_i$ are the clustering label and the real label of the i-th sample, respectively, and $\delta(u_i, v_i)$ is a delta function that equals 1 when $u_i = v_i$ (once the cluster labels have been matched to the real labels) and 0 otherwise.
$$RI = \frac{a + b}{\binom{n}{2}}, \qquad ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]}$$

where $a$ is the number of sample pairs that are in the same cluster in both the clustering result and the real categories, $b$ is the number of sample pairs that are in different clusters in both, and $E[RI]$ is the expectation of the Rand index $RI$.
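For readers who want to reproduce ACC and ARI, the following sketch computes both from a contingency table. It assumes that ACC uses the best one-to-one matching between cluster labels and real labels (found here with SciPy's `linear_sum_assignment`) and that ARI is the usual pair-counting form with the permutation-model expectation; the function names are illustrative.

```python
import numpy as np
from math import comb
from scipy.optimize import linear_sum_assignment

def contingency(u, v):
    """Table of sample counts per (cluster label, real label) pair."""
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    table = np.zeros((ui.max() + 1, vi.max() + 1), dtype=int)
    np.add.at(table, (ui, vi), 1)
    return table

def clustering_accuracy(u, v):
    """ACC under the best one-to-one mapping of cluster labels to real labels."""
    table = contingency(u, v)
    rows, cols = linear_sum_assignment(-table)   # maximize matched samples
    return table[rows, cols].sum() / len(u)

def adjusted_rand_index(u, v):
    """ARI = (index - expected) / (max index - expected), counted over sample pairs."""
    table = contingency(u, v)
    n = int(table.sum())
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())       # pairs together in both
    sum_a = sum(comb(int(x), 2) for x in table.sum(axis=1))    # pairs together in u
    sum_b = sum(comb(int(x), 2) for x in table.sum(axis=0))    # pairs together in v
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    # Undefined (0/0) when both partitions put every sample in a single cluster
    return (sum_ij - expected) / (max_index - expected)
```

Both functions are invariant to how the clusters are numbered, so a clustering that recovers the real partition with permuted ids still scores 1.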
For AMI, $U$ and $V$ represent the real label vector and the clustering label vector, respectively, $R$ and $C$ represent the number of real clusters and the number of clusters found, respectively, $P(\cdot)$ stands for probability, and $H(\cdot)$ stands for information entropy. To eliminate the differences in scale between data dimensions, the data sets are normalized before the experiment, as shown in the following equation:

$$x'_{ij} = \frac{x_{ij} - \min x_j}{\max x_j - \min x_j}$$

where $x_{ij}$ represents the value of the sample point $x_i$ in the jth attribute column, and $\max x_j$ and $\min x_j$ are the maximum and minimum values of the jth attribute column, respectively.
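A column-wise min-max normalization of this kind could be written as below. The handling of constant columns (mapped to 0) is an assumption added to avoid division by zero and is not described in the text; the function name is illustrative.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every attribute column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0   # assumption: constant columns become all zeros
    return (X - col_min) / span
```

After this step every attribute contributes on the same scale to the Euclidean distances used by FCM and KNN.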

The Comparison with the Original FCM Algorithm on Synthetic Data Sets
To test the effectiveness of LI_BIFCM in processing boundary points, three two-dimensional synthetic data sets are chosen for experimental comparison with FCM. Each of these three data sets contains 5000 points in 15 clusters. The degree of contact between border points varies across the data sets, and misclassification is common. Figures 2, 3 and 4 depict the clustering results. Figure 2 shows that LI_BIFCM handles the boundary points of the S1 data set successfully, whereas FCM has two issues: one is an error in the cluster centers, and the other is an error in the boundary points. This is because a cluster center error in FCM sets off a chain reaction. Black boxes 1 and 2 are independent clusters, as illustrated in Fig. 2c, but because black box 1 has two cluster centers, the points in black box 2 can only be assigned to the adjacent cluster. Similarly, FCM splits a cluster into several clusters by mistake, whereas LI_BIFCM continues to perform well. Figure 5 summarizes the clustering index results from Figs. 2, 3 and 4, showing that LI_BIFCM beats FCM in all three indicators on the synthetic data sets. The gap arises because FCM is easily influenced by the initial cluster centers, falls into local optima, and is sensitive to border points. The proposed algorithm uses the vertical ensemble to determine the best cluster centers and the horizontal ensemble to assign boundary points efficiently, which increases clustering quality.

The Comparison with the Basic Algorithms and the Clustering Ensemble Algorithms on Real-World Data Sets
To test the LI_BIFCM algorithm's clustering performance further, seven representative algorithms are selected for comparison, which includes basic algorithms (K-means, DBSCAN, AP, and FCM), soft clustering ensemble algorithm (CSPA [35,36]), and hard clustering ensemble algorithms (HGPA [35,36], and MCLA [37]). K-means is a clustering technique based on partitions. It must determine the number of clusters to be created, choose initial cluster centers at random, and calculate the distance between each sample and each cluster center. The cluster centers are recalculated once each sample is assigned to the cluster that is closest to it. This step is repeated until the algorithm reaches a particular termination condition.
DBSCAN is a density-based clustering algorithm. The number of clusters does not need to be specified in advance, but the algorithm has two parameters: the neighborhood radius (Eps) and the core point threshold (MinPts). The clustering process starts from a selected core point and continuously expands to the density-reachable area to obtain a maximal area containing core points and boundary points, in which any two points are density-connected.
AP is a clustering algorithm that works by passing messages between sample points. It uses all samples as network nodes, iteratively calculating the information (responsibility and availability) of each network edge until several high-quality exemplars are obtained and the remaining points are assigned to the appropriate clusters. CSPA is a cluster-based similarity partitioning algorithm that starts by defining a new similarity between any two sample points, then calculates an n × n similarity matrix, and finally clusters the data using a pairwise similarity-based clustering algorithm.
Fig. 2 Clustering results of LI_BIFCM and FCM on S1 data set (panels a, b, c)
HGPA is a kind of hypergraph partition algorithm. It starts by creating a hypergraph with sample points at the vertices. Each cluster in each cluster member is a closed hyperedge that includes all of the cluster's vertices, and the clustering results are obtained using the hypergraph partition procedure. MCLA is a meta-clustering algorithm. It measures the similarity of two clusters using the Jaccard coefficient, then obtains meta-clusters using the METIS method, and lastly allocates samples to the most relevant meta-clusters.
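Two of the building blocks mentioned above are easy to illustrate: the pairwise similarity matrix that CSPA starts from and the Jaccard coefficient that MCLA uses to compare clusters. The sketch assumes that the similarity of two points is the fraction of base clusterings that place them in the same cluster; the graph and hypergraph partitioning steps of CSPA, HGPA, and MCLA are not shown, and the function names are illustrative.

```python
import numpy as np

def co_association(partitions):
    """n x n matrix: fraction of base clusterings that co-cluster each pair of points."""
    P = np.asarray(partitions)               # shape (m, n), one label row per base clustering
    same = P[:, :, None] == P[:, None, :]    # (m, n, n) boolean co-membership
    return same.mean(axis=0)

def jaccard(cluster_a, cluster_b):
    """Jaccard coefficient of two clusters given as collections of sample indices."""
    a, b = set(cluster_a), set(cluster_b)
    return len(a & b) / len(a | b)
```

The similarity matrix alone already needs memory quadratic in the number of samples, which is one reason pairwise ensemble methods become costly on larger data sets.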
These algorithms cover partition-based, density-based, and message-passing clustering as well as several ensemble methods, so they form a sensible basis for comparison. Each basic algorithm experiment is conducted ten times to get the average value of the clustering results, and each clustering ensemble algorithm is run until 10 ideal cluster member sets are produced. The clustering accuracy of the eight algorithms on nine test data sets is shown in Table 3, in which the best value among the eight algorithms is indicated in bold. Table 3 reveals that the average ACC of LI_BIFCM ranks first among the eight algorithms; LI_BIFCM achieves the highest clustering accuracy on six of the nine data sets and ranks in the top two on two others (Dermatology and Iris). Only on the Ionosphere data set does LI_BIFCM rank fourth.
To see whether there are statistically significant differences between LI_BIFCM and the other seven algorithms in Table 3, the Aligned Friedman test [38] is used with a confidence level of 0.95. Specifically, the average clustering accuracy of all algorithms on a data set is subtracted from the clustering accuracy of each algorithm on that data set, and the resulting aligned values of all algorithms on all data sets are ranked together. The resulting ranks of the eight algorithms on the nine data sets, ranging from 1 to 72, are shown in Table 4, and the numbers in bold indicate the best ACC rank. Table 4 shows that LI_BIFCM achieved the best result with an average rank of 16.8, MCLA ranked second with an average rank of 24.1, FCM ranked third with an average rank of 28.6, and the remaining algorithms ranked as follows: AP, K-means, CSPA, DBSCAN, and HGPA.
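The Friedman aligned rank test [38] is then applied, with the statistic calculated by the following equation:

$$T = \frac{(k-1)\left[\sum_{j=1}^{k}\hat{R}_{\cdot j}^{2} - \frac{kn^{2}}{4}(kn+1)^{2}\right]}{\frac{kn(kn+1)(2kn+1)}{6} - \frac{1}{k}\sum_{i=1}^{n}\hat{R}_{i\cdot}^{2}}$$

where $k$ is the number of algorithms, $n$ here denotes the number of data sets, $\hat{R}_{i\cdot}$ is the rank total of the i-th data set, and $\hat{R}_{\cdot j}$ is the rank total of the j-th algorithm. With eight algorithms and nine test data sets, the statistic T follows the chi-square distribution with seven degrees of freedom. Looking up the chi-square distribution table shows that the p value of $\chi^2(7)$ is 1.91E−04, which is substantially less than 0.05. Therefore, the null hypothesis can be rejected, and the results obtained by the algorithms on the nine data sets are considered to be significantly different.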
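The alignment, joint ranking, and chi-square lookup can be sketched with SciPy as follows. The statistic uses the commonly stated form of the Friedman aligned ranks test, which is assumed here to be the one used in [38]; `friedman_aligned_ranks` is an illustrative name, and the input is a table of scores with one row per data set and one column per algorithm (higher is better).

```python
import numpy as np
from scipy.stats import chi2, rankdata

def friedman_aligned_ranks(scores):
    """Return (T, p) for an (n data sets) x (k algorithms) score table."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    # Align: subtract the mean score of all algorithms on each data set
    aligned = scores - scores.mean(axis=1, keepdims=True)
    # Rank all k*n aligned values together; rank 1 is the best, ties get average ranks
    ranks = rankdata(-aligned.ravel()).reshape(n, k)
    Rj = ranks.sum(axis=0)   # rank totals per algorithm
    Ri = ranks.sum(axis=1)   # rank totals per data set
    num = (k - 1) * ((Rj ** 2).sum() - (k * n ** 2 / 4.0) * (k * n + 1) ** 2)
    den = k * n * (k * n + 1) * (2 * k * n + 1) / 6.0 - (Ri ** 2).sum() / k
    T = num / den
    return T, chi2.sf(T, k - 1)   # upper tail of chi-square with k - 1 degrees of freedom
```

As a consistency check on the lookup step, `chi2.sf(25.4161, 7)` evaluates to about 6.4E−04, which agrees with the p value reported for the ARI ranking.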
Similarly, Tables 5 and 6 show that the average ARI and AMI of LI_BIFCM rank first among the eight algorithms, and that LI_BIFCM achieves the highest ARI and AMI on at least seven data sets, indicated by bold numbers. Then, the Friedman aligned rank test [38] is applied to the ARI rankings in Table 7, giving T = 25.4161. The statistic T follows the chi-square distribution with seven degrees of freedom, and the p value of $\chi^2(7)$ is 6.40E−04, which is much less than 0.05. Therefore, the null hypothesis can be rejected, and the ARI value of the LI_BIFCM algorithm is statistically significantly better than those of the other seven algorithms in the ranking results. For the other clustering index, AMI, the ranking results are shown in Table 8; the p value of $\chi^2(7)$ is 4.25E−05, which is also much smaller than 0.05. The bold numbers in Tables 7 and 8 represent the best ARI and AMI rankings, respectively. Therefore, it is concluded that the proposed algorithm is superior to the other seven algorithms. In summary, LI_BIFCM achieved the best clustering results on most real-world data sets for the three reasons listed below. First, the numerous membership matrices in the vertical ensemble allow cluster centers to be located correctly and ensure rather stable clustering results. Second, the method of multiple K-nearest neighbors is employed in the horizontal ensemble to make full use of the local information of the sample points and find the optimal category for border points. Third, the attribution of sample points is clarified after the bi-directional clustering ensembles, and clustering performance is effectively improved.

The Number of Vertical Ensembles Parameters m
Each additional vertical ensemble increases the time cost of the proposed algorithm. The experimental results are shown in Fig. 6: as the number of vertical ensembles m increases from 5 to 10, 15, ..., and 30, no obvious fluctuation is observed in any of the clustering evaluation indexes.

The Number of Horizontal Ensembles Parameters s
In a horizontal clustering ensemble, the most critical parameter, s, governs the number of executions of multiple K-nearest neighbors. Therefore, to study the influence of s on the clustering performance, the following experiments are carried out. The parameter m of the vertical clustering ensemble of LI_BIFCM is fixed at five, and the average clustering index over 10 runs of the algorithm is used as the experimental result.
The s values are 4, 8, 12, 16, and 20. Figure 7 shows the experimental results: as the horizontal ensemble parameter s increases from 4 to 20, the three clustering indexes show no noteworthy changes. Together, the results in Figs. 6 and 7 indicate that LI_BIFCM is insensitive to the parameters of the vertical and horizontal ensembles. The reason is that FCM and KNN are reasonably stable, so increasing the number of vertical and horizontal ensembles only produces increasingly homogeneous clusterings and has little further effect on LI_BIFCM's performance.

Run Time Comparison of 4 Algorithms
The following experiment was done to measure the run time of LI_BIFCM. The CSPA, HGPA, and MCLA algorithms were chosen for comparison, since the single K-means, DBSCAN, AP, and FCM algorithms are inherently much faster than clustering ensemble methods. To acquire the final clustering results, the LI_BIFCM, CSPA, HGPA, and MCLA algorithms are used to integrate the cluster members produced by running the FCM algorithm five times. Each experiment was performed 10 times under the same conditions, with the average of the 10 running times used as the result. Table 9 shows the experimental results, with the minimal running time for each data set highlighted in bold. Table 9 shows that LI_BIFCM takes substantially less time to run than the other three algorithms on all data sets except Pima. This is because the CSPA method must calculate the similarity of all pairs of sample points, the HGPA algorithm must construct a hypergraph, and MCLA must compute the similarity of every pair of clusters. The execution times of HGPA and MCLA are nearly the same. Combined with the preceding experimental analysis, LI_BIFCM not only provides a better clustering effect than the other clustering ensemble algorithms but also has a shorter run time.

Conclusion
In this paper, to increase clustering performance, we introduced a new clustering ensemble framework that considers local information, the bi-directional FCM clustering ensemble algorithm (LI_BIFCM). To achieve the best clustering results, LI_BIFCM combines vertical and horizontal ensembles. In the vertical ensemble, the random initialization of the FCM cluster centers is exploited: FCM is run numerous times to acquire multiple cluster members, which ensures clustering stability. To avoid misclassifying boundary points, multiple K-nearest neighbors are used to execute a horizontal ensemble after each cluster member is acquired. A comprehensive experimental analysis indicates: (1) LI_BIFCM surpasses FCM in dealing with boundary points. (2) On most data sets, LI_BIFCM outperforms single clustering methods as well as some clustering ensemble algorithms. (3) The vertical and horizontal ensemble parameters have little effect on LI_BIFCM. (4) Compared with the clustering ensemble approaches, the suggested algorithm takes less time to run. Base clustering optimization and its application to real-world scenarios, such as customer analytics, will receive greater attention in the future.