Introduction

Clustering is a widely used data mining tool for analyzing unlabeled data and a significant branch of unsupervised machine learning; the distribution of sample data can be revealed by clustering [1, 2]. Clustering has been applied in many real-world applications, such as image processing [3,4,5], text organization [6], food detection [7], and bioinformatics [8].

The conventional fuzzy c-means (FCM) approach [9] is one of the most famous clustering approaches. It has attracted the attention of many researchers because of its advantages, such as low computational complexity, fast convergence, easy implementation, and low storage requirements, and it has been successfully utilized in various applications [10,11,12,13,14,15,16,17,18,19,20].

In recent years, various enhanced versions of fuzzy clustering have been developed to improve its performance. In [21], because conventional FCM is sensitive to noise, an improved fuzzy clustering framework with a new method of constraining the membership functions is designed. In [22], to address the disadvantage that conventional FCM treats every datum equally, even outliers, an enhanced fuzzy clustering approach (AFCM) taking different data importance into account is put forward. In [23], to improve the performance of fuzzy clustering, a novel data reconstruction method for determining the partition matrix is presented. In [24], to address the sensitivity of conventional FCM to the initial centroids, a new version of fuzzy clustering with a self-adaptive strategy is put forward. In [25], treating the clustering process as an underlying optimization problem and introducing a multi-phase learning approach, an improved fuzzy clustering framework is presented in which the clustering process is optimized by a genetic algorithm and simulated annealing. In [26], an enhanced fuzzy clustering approach that regulates the locations of the prototypes is developed, in which the clustering process is optimized by differential evolution (DE) in a supervised mode. In [27], to obtain better performance, a transformed-data-based fuzzy clustering framework is presented that projects the original data into a new data space in a nonlinear manner and conducts the clustering there. In [28], to handle the problem that the number of clusters cannot be identified automatically, a novel fuzzy clustering method mimicking the way humans observe the peak of a mountain is presented. In [29], by combining the advantages of kernel functions, intuitionistic fuzzy sets, and genetic algorithms, an efficient evolutionary kernel intuitionistic fuzzy clustering algorithm is proposed. In [30], to address the problem that conventional FCM does not account for the importance of different features, a novel fuzzy clustering framework named EWFCM with a feature-weight entropy term is put forward.

In most existing fuzzy clustering algorithms, the membership degrees of an individual with respect to different clusters depend only on the distances between the individual and the cluster centroids; the similarity between the individual and the data in each cluster is ignored, which degrades clustering performance. To illustrate this problem, consider a synthetic 2-D dataset with two clusters, shown in Fig. 1: in cluster 1, data are randomly generated in the square with centroid (0.2, 0.2) and side length 0.4; in cluster 2, data are randomly generated in the square with centroid (0.7, 1.1) and side length 1. When the conventional fuzzy clustering algorithm is applied, we obtain the result shown in Fig. 2, where many data in cluster 2 are mistakenly assigned to cluster 1. This problem exists not only in the conventional algorithm but in almost all the aforementioned extensions, because the membership degrees of an individual depend only on the distances to the cluster centroids, whereas the similarity between the individual and the data in each cluster is not taken into consideration: as Fig. 2 shows, the mistakenly assigned individuals are in fact more similar to more of the data in their true cluster than to the data in the cluster they were assigned to. Besides, outliers cannot be distinguished effectively by the conventional algorithm: for the 2-D dataset with an outlier shown in Fig. 3, the membership values of the outlier with respect to cluster 1 and cluster 2 are both about 0.5, which distorts the calculation of the cluster centroids. In this paper, to address these problems, we propose a novel fuzzy clustering algorithm.
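For reference, the synthetic dataset can be reproduced with a minimal NumPy sketch like the following; the sample size per cluster (100) and the random seed are assumptions, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def square_cluster(center, side, size):
    """Sample `size` points uniformly from an axis-aligned square."""
    low = np.asarray(center) - side / 2.0
    return low + side * rng.random((size, 2))

# Cluster 1: square with centroid (0.2, 0.2), side length 0.4;
# cluster 2: square with centroid (0.7, 1.1), side length 1.
X = np.vstack([square_cluster((0.2, 0.2), 0.4, 100),
               square_cluster((0.7, 1.1), 1.0, 100)])
y_true = np.repeat([0, 1], 100)
```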

Fig. 1

A dataset with cluster 1: data randomly generated in the square with centroid (0.2, 0.2), side length 0.4; cluster 2: data randomly generated in the square with centroid (0.7, 1.1), side length 1

Fig. 2

The clustering result by the conventional fuzzy clustering algorithm on the dataset shown in Fig. 1

Fig. 3

A 2-D dataset with outlier

Based on the existing research and the concept that data close to each other should be grouped together and assigned to the same cluster, a novel efficient fuzzy clustering approach is presented in this paper. The main contributions of this paper are as follows:

  1. An entirely new idea is put forward: the membership degree values of an individual with respect to different clusters should depend not only on the distances between the individual and the cluster centers, but also on the distances between the individual and several of its nearest neighbors in each cluster;

  2. A new scheme for detecting outliers is presented, and a method for identifying the different importance of different features is introduced;

  3. Considering the adaptive local neighborhood information of each datum with respect to different clusters, a novel efficient fuzzy clustering approach is presented.

Experiments on synthetic and real-world datasets are conducted, and the clustering results show that the proposed algorithm provides a new way to improve the performance of fuzzy clustering. The paper is organized as follows: the research background and challenges are introduced in Sect. "Introduction"; the conventional fuzzy clustering approach and classical extensions of fuzzy clustering are reviewed in Sect. "Fuzzy clustering approaches"; the proposed fuzzy clustering approach based on adaptive local neighborhood information is described in detail in Sect. "Adaptive local neighborhood information based efficient fuzzy clustering approach"; experiments and analysis are presented in Sect. "Experiments and results"; and conclusions are given in Sect. "Conclusion".

Fuzzy clustering approaches

Let \(X = \{ x_{k} \}_{k = 1}^{n}\) be the input dataset and c the number of clusters, \(1 \le k \le n\); let \(V = [v_{1} \; v_{2} \; \ldots \; v_{c} ]\) be the prototypes (cluster centroids), where \(v_{i}\) is the ith prototype (centroid), \(1 \le i \le c\); let n be the sample size; let U be the partition matrix, where \(u_{ik}\) is the degree of membership of the kth individual with respect to the ith cluster, \(u_{ik} \in [0,1]\), \(\sum\nolimits_{i = 1}^{c} {u_{ik} = 1}\); let \(\alpha\) be the fuzziness parameter; let \(f_{ij}\) be the importance of the jth feature with respect to the ith cluster, \(f_{ij} \in [0,1]\), \(\sum\nolimits_{j = 1}^{m} {f_{ij} = 1}\), \(1 \le j \le m\), where m is the number of features; \(w_{k}\) indicates the importance of \(x_{k}\), \(\prod\nolimits_{k = 1}^{n} {w_{k} = 1}\); p and \(\lambda\) are hyper-parameters; \(d(x_{k} ,v_{i} )\) indicates the Euclidean distance between individual \(x_{k}\) and \(v_{i}\), \(d(x_{k} ,v_{i} ) = \left\| {x_{k} - v_{i} } \right\|\).

The conventional FCM approach minimizes the objective function as follows:

$$ J_{FCM} (X,U,V) = \sum\limits_{i = 1}^{c} {\sum\limits_{k = 1}^{n} {u_{ik}^{\alpha } d^{2} (x_{k} ,v_{i} )} } $$
(1)

AFCM minimizes the objective function as follows:

$$ J_{AFCM} (X,U,V) = \sum\limits_{i = 1}^{c} {\sum\limits_{k = 1}^{n} {w_{k}^{p} u_{ik}^{\alpha } d^{2} (x_{k} ,v_{i} )} } $$
(2)

EWFCM minimizes the objective function as follows:

$$ J_{EWFCM} (X,U,V) = \sum\limits_{i = 1}^{c} {\sum\limits_{k = 1}^{n} {u_{ik}^{\alpha } \sum\limits_{j = 1}^{m} {f_{ij} (x_{kj} - v_{ij} )^{2} } } } +\lambda^{ - 1} \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{m} {f_{ij} } } \log f_{ij} $$
(3)

By minimizing Eqs. (1), (2) and (3), respectively, we can derive the updating formulas of different fuzzy clustering approaches in each iteration:

$$ u_{ik} (FCM) = \left[ {\sum\limits_{j = 1}^{c} {\left( {\frac{{\left\| {x_{k} - v_{i} } \right\|^{2} }}{{\left\| {x_{k} - v_{j} } \right\|^{2} }}} \right)}^{1/(\alpha - 1)} } \right]^{ - 1} $$
$$ v_{i} (FCM) = \frac{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } x_{k} } }}{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } } }} $$
$$ u_{ik} (AFCM) = \left[ {\sum\limits_{h = 1}^{c} {\left( {\frac{{\left\| {x_{k} - v_{i} } \right\|^{2} }}{{\left\| {x_{k} - v_{h} } \right\|^{2} }}} \right)}^{1/(\alpha - 1)} } \right]^{ - 1} $$
$$ w_{k} (AFCM) = \left[ {\frac{{\left[ {\prod\nolimits_{j = 1}^{n} {\left( {\sum\nolimits_{i = 1}^{c} {u_{ij} d_{ij}^{2} } } \right)} } \right]^{\frac{1}{n}} }}{{\sum\nolimits_{i = 1}^{c} {u_{ik} d_{ik}^{2} } }}} \right]^{\frac{1}{p}} $$
$$ v_{i} (AFCM) = \frac{{\sum\nolimits_{k = 1}^{n} {w_{k} u_{ik}^{\alpha } x_{k} } }}{{\sum\nolimits_{k = 1}^{n} {w_{k} u_{ik}^{\alpha } } }} $$
$$ u_{ik} (EWFCM) = \left[ {\sum\limits_{h = 1}^{c} {\left( {\frac{{\sum\nolimits_{j = 1}^{m} {f_{ij} } (x_{kj} - v_{ij} )^{2} }}{{\sum\nolimits_{j = 1}^{m} {f_{hj} } (x_{kj} - v_{hj} )^{2} }}} \right)}^{1/(\alpha - 1)} } \right]^{ - 1} $$
$$ f_{ij} (EWFCM) = \frac{{\exp \left( { - \lambda \sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } (x_{kj} - v_{ij} )^{2} } } \right)}}{{\sum\nolimits_{h = 1}^{m} {\exp \left( { - \lambda \sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } (x_{kh} - v_{ih} )^{2} } } \right)} }} $$
$$ v_{ij} (EWFCM) = \frac{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } x_{kj} } }}{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{\alpha } } }} $$
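As a concrete illustration of the alternating optimization behind these formulas, the following minimal NumPy sketch implements the FCM updates \(u_{ik}(FCM)\) and \(v_{i}(FCM)\); the initialization from randomly chosen data points and the stopping tolerance are implementation assumptions, not prescribed by the references:

```python
import numpy as np

def fcm(X, c, alpha=2.0, n_iter=100, tol=1e-6, seed=0):
    """Conventional FCM: alternate the u_ik and v_i updates derived from Eq. (1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, c, replace=False)]            # centroids from random data points
    for _ in range(n_iter):
        # d2[k, i] = ||x_k - v_i||^2 (small constant avoids division by zero)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (alpha - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # u_ik(FCM): rows sum to 1
        Ua = U ** alpha
        V_new = (Ua.T @ X) / Ua.sum(axis=0)[:, None]  # v_i(FCM)
        if np.abs(V_new - V).max() < tol:
            V = V_new
            break
        V = V_new
    return U, V
```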

Adaptive local neighborhood information based efficient fuzzy clustering approach

The objective function of the new fuzzy clustering approach

In the conventional fuzzy clustering approach, the membership degrees of an individual with respect to different clusters depend only on the distances between the individual and the cluster centroids; the local neighborhood information of each individual is ignored, so an individual and its nearest neighbors in the dataset may be partitioned into different clusters. Moreover, the conventional fuzzy clustering algorithm cannot discover outliers effectively, even when an outlier is very far from its nearest neighbors in every cluster. To improve the clustering performance and address the problems raised in the introduction, we put forward a novel fuzzy clustering algorithm based on adaptive local neighborhood information. We believe that data close to each other should be grouped together and assigned to the same cluster: the membership degrees of an individual should depend not only on the distances between the individual and the cluster centroids, but also on the distances between the individual and its nearest neighbors in each cluster. Therefore, in this paper, the local neighborhood information of each datum is taken into consideration.

To distinguish the different importance of different features, an improved clustering framework considering per-cluster feature importance was presented in [30]. In this paper, however, similarities between data points must be computed; to facilitate this computation, a single set of feature weights is used instead of per-cluster feature weights. When feature importance is considered, a few features with large weights may dominate the clustering process, which can lead to poor clustering performance. To prevent this, a feature-entropy regularization term is introduced into the novel objective function.

The objective function of the presented fuzzy clustering approach based on adaptive local neighborhood information (EFCM_ALN) is defined as:

$$ \begin{aligned} & J_{EFCM\_ALN} (X,U,V,F)\\ &\quad= \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} \sum\limits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} } } } \\ &\quad\quad +\delta \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} \sum\limits_{j = 1}^{m} {\sum\limits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } } } }\\&\quad + \eta \sum\limits_{j = 1}^{m} {f_{j} \ln f_{j} }\end{aligned} $$
(4)

\(J_{EFCM\_ALN}\) is subject to the constraints:

$$ \sum\limits_{j = 1}^{m} {f_{j} = 1} ;\sum\limits_{i = 1}^{c} {u_{ik} = 1} . $$
$$ f_{j} \in [0,1];u_{ik} \in [0,1];1 \le k \le n;1 \le i \le c. $$

In the presented objective function, the first term \(\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} } } }\) stands for the compactness of the fuzzy partition; its value depends on the sum of the distances between each datum and the cluster centroids. Here n is the sample size, c is the number of clusters, and m is the number of features; \(U = \left[ {\begin{array}{*{20}c} {u_{11} } & \cdots & {u_{1n} } \\ \vdots & \vdots & \vdots \\ {u_{c1} } & \cdots & {u_{cn} } \\ \end{array} } \right]\), where \(u_{ik}\) is the membership degree value of individual \(x_{k}\) with respect to the ith cluster; \(V = [V_{1} \cdots V_{i} \cdots V_{c} ]^{T}\), where \(V_{i} = [v_{i1} \cdots v_{ij} \cdots v_{im} ]\) is the ith cluster centroid; \(F = [f_{1} \cdots f_{j} \cdots f_{m} ]\), where \(f_{j}\) represents the importance of the jth feature, \(1 \le j \le m\). The second term is the adaptive neighborhood information of each datum with respect to different clusters, and the third term is the feature-entropy regularization. \(\delta\) and \(\eta\) are two hyper-parameters: \(\delta\) adjusts the relative effect of the adaptive neighborhood information term on the value of the objective function, balancing the stability and reliability of the clustering results (when \(\delta = 0\), the proposed algorithm reduces to the conventional fuzzy clustering algorithm with feature weights), and \(\eta\) is the feature-weight entropy regularization coefficient.

\(x_{p}^{k,i}\) denotes a datum satisfying \(x_{p} \in N_{P} (x_{k} )\) and \(x_{p} \in C(i)\); that is, \(x_{p}\) is assigned to the ith cluster and belongs to the P nearest neighbors of \(x_{k}\) within the ith cluster.

\(\sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } }\) stands for the sum of the distances between individual \(x_{k}\) and its P nearest neighbors in the ith cluster. Since data close to each other should be grouped together and assigned to the same cluster, the smaller this sum is, the larger the probability that \(x_{k}\) belongs to the ith cluster should be, and hence the larger \(u_{ik}\) should be; conversely, the larger this sum is, the smaller the probability and the smaller \(u_{ik}\) should be. Minimizing the second term of the novel objective function, \(\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } } } }\), achieves exactly this effect.
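To make the neighborhood term concrete, a hedged NumPy sketch of its computation is given below; the crisp assignment `labels` (e.g. the argmax of U), the weight vector `f`, and the brute-force neighbor search are assumptions of this sketch, not details fixed by the paper:

```python
import numpy as np

def neighbor_term(X, labels, f, P):
    """D[k, i] = sum_{j,p} f_j (x_kj - x_pj^{k,i})^2: the feature-weighted distance
    from x_k to its P nearest neighbors inside cluster i (2nd term of Eq. (4)).
    Assumes P is smaller than the size of every cluster."""
    n, c = X.shape[0], labels.max() + 1
    D = np.zeros((n, c))
    Xw = X * np.sqrt(f)                                  # fold f_j into the coordinates
    for i in range(c):
        members = np.where(labels == i)[0]
        d2 = ((Xw[:, None, :] - Xw[None, members, :]) ** 2).sum(-1)
        d2[members, np.arange(len(members))] = np.inf    # a point is not its own neighbor
        d2.sort(axis=1)                                  # ascending distances per row
        D[:, i] = d2[:, :P].sum(axis=1)                  # keep the P smallest
    return D
```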

The mean maximum nearest neighbor distance of the dataset is defined as:

$$ \frac{\sum\limits_{k = 1}^{n} \mathop {\max }\limits_{i} \left( \sum\limits_{j = 1}^{m} \sum\limits_{p = 1}^{P} f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} \right)}{Pn} $$

where \(\mathop {\max }\limits_{i} \left( \sum\nolimits_{j = 1}^{m} \sum\nolimits_{p = 1}^{P} f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} \right) / P\) represents the maximum value of the mean distance between datum \(x_{k}\) and its P nearest neighbors in the different clusters.

Based on the mean maximum nearest neighbor distance of the dataset, we present a new approach for detecting outliers. If the mean distances between individual \(x_{k}\) and its P nearest neighbors are larger than the mean maximum nearest neighbor distance of the dataset for every cluster, then \(x_{k}\) is regarded as an outlier. That is, for all \(i\), check whether:

$$ \frac{\sum\limits_{j = 1}^{m} \sum\limits_{p = 1}^{P} f_{j} (x_{kj} - x_{pj}^{k,i} )^{2}}{P} > \frac{\sum\limits_{k = 1}^{n} \mathop {\max }\limits_{i} \left( \sum\limits_{j = 1}^{m} \sum\limits_{p = 1}^{P} f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} \right)}{Pn} $$
(5)

If the inequality holds for every cluster \(i\), \(x_{k}\) is flagged as an outlier and is removed from the cluster-centroid update of the current iteration.
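A minimal sketch of this outlier test, reusing the `D` matrix from the `neighbor_term` sketch above, might look as follows:

```python
import numpy as np

def outlier_mask(D, P):
    """Eq. (5): x_k is an outlier if, for every cluster i, its mean P-nearest-neighbor
    distance D[k, i] / P exceeds the mean maximum nearest-neighbor distance."""
    n = D.shape[0]
    threshold = D.max(axis=1).sum() / (P * n)   # sum_k max_i D[k, i] / (P n)
    return (D / P > threshold).all(axis=1)      # True marks an outlier
```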

To obtain the optimal updating formulas of U, V, and F in each iteration, the Lagrangian is constructed:

$$ \begin{aligned}&\varphi_{EFCM\_ALN} (U,V,F)\\ &\quad= \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} \sum\limits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} } } } \\ &\qquad +\delta \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} \sum\limits_{j = 1}^{m} {\sum\limits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } } } }\\ &\qquad + \eta \sum\limits_{j = 1}^{m} {f_{j} \ln f_{j} } +\phi_{1} \left( {\sum\limits_{i = 1}^{c} {u_{ik} - 1} } \right)\\&\qquad+\phi_{2} \left( {\sum\limits_{j = 1}^{m} {f_{j} - 1} } \right) \end{aligned}$$
(6)

Setting \(\frac{{\partial \varphi_{EFCM\_ALN} (U,V,F)}}{{\partial u_{ik} }} = 0\), we obtain the optimal updating formula of U:

$$\begin{aligned} &\frac{{\partial \varphi_{EFCM\_ALN} (U,V,F)}}{{\partial u_{ik} }} = 2u_{ik} \sum\nolimits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} }\\ &\quad+ 2\delta u_{ik} \sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} + } } \phi_{1} = 0\end{aligned} $$
$$ u_{ik} = \frac{{ - \phi_{1} }}{{2\sum\nolimits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} } { + 2}\delta \sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } } }} $$

Because \(\sum\nolimits_{r = 1}^{c} {u_{rk} = 1}\)

$$\begin{aligned} - \phi_{1} \sum\limits_{r = 1}^{c} \bigg[ 2\sum\limits_{j = 1}^{m} f_{j} (x_{kj} - v_{rj} )^{2} + 2\delta \sum\limits_{j = 1}^{m} {\sum\limits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,r} )^{2} } } \bigg]^{ - 1} = 1\end{aligned} $$
$$ u_{ik} = \frac{{\left[ {\sum\nolimits_{j = 1}^{m} {f_{j} (x_{kj} - v_{ij} )^{2} } +\delta \sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,i} )^{2} } } } \right]^{ - 1} }}{{\sum\nolimits_{r = 1}^{c} {\left[ {\sum\nolimits_{j = 1}^{m} {f_{j} (x_{kj} - v_{rj} )^{2} } +\delta \sum\nolimits_{j = 1}^{m} {\sum\nolimits_{p = 1}^{P} {f_{j} (x_{kj} - x_{pj}^{k,r} )^{2} } } } \right]}^{ - 1} }} \\ $$
(7)

Setting \(\frac{{\partial \varphi_{EFCM\_ALN} (U,V,F)}}{{\partial f_{j} }} = 0\), we obtain the optimal updating formula of F:

$$\begin{aligned} &\frac{{\partial \varphi_{EFCM\_ALN} (U,V,F)}}{{\partial f_{j} }}\\ &\quad= \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} (x_{kj} - v_{ij} )^{2} } }\\&\qquad + \delta \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik}^{2} \sum\limits_{p = 1}^{P} {(x_{kj} - x_{pj}^{k,i} )^{2} } } }\\ &\qquad+\eta + \eta \ln f_{j} + \phi_{2} = 0\end{aligned} $$
$$ f_{j} = \exp \left( { - \frac{{\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} (x_{kj} - v_{ij} )^{2} } +} \delta \sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{p = 1}^{P} {(x_{kj} - x_{pj}^{k,i} )^{2} } } } }}{\eta }} \right) \times \exp \left( { - \frac{{\phi_{2} + \eta }}{\eta }} \right) $$

Because \(\sum\nolimits_{r = 1}^{m} {f_{r} = 1}\)

$$ \sum\limits_{r = 1}^{m} {\exp \left( { - \frac{{\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} (x_{kr} - v_{ir} )^{2} } +} \delta \sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{p = 1}^{P} {(x_{kr} - x_{pr}^{k,i} )^{2} } } } }}{\eta }} \right)} \times \exp \left( { - \frac{{\phi_{2} + \eta }}{\eta }} \right) = 1 $$
$$ \exp \left( { - \frac{{\phi_{2} + \eta }}{\eta }} \right) = \frac{1}{{\sum\nolimits_{r = 1}^{m} {\exp \left( { - \frac{{\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} (x_{kr} - v_{ir} )^{2} } +} \delta \sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{p = 1}^{P} {(x_{kr} - x_{pr}^{k,i} )^{2} } } } }}{\eta }} \right)} }} $$
$$ f_{j} = \frac{{\exp ( - \frac{{\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} (x_{kj} - v_{ij} )^{2} } +} \delta \sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{p = 1}^{P} {(x_{kj} - x_{pj}^{k,i} )^{2} } } } }}{\eta })}}{{\sum\nolimits_{r = 1}^{m} {\exp ( - \frac{{\sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} (x_{kr} - v_{ir} )^{2} } +} \delta \sum\nolimits_{k = 1}^{n} {\sum\nolimits_{i = 1}^{c} {u_{ik}^{2} \sum\nolimits_{p = 1}^{P} {(x_{kr} - x_{pr}^{k,i} )^{2} } } } }}{\eta })} }} $$
(8)

Setting \(\frac{{\partial \varphi_{EFCM\_ALN} (U,V,F)}}{{\partial v_{ij} }} = 0\), we obtain the optimal updating formula of V (the common factor \(f_{j}\) cancels):

$$ v_{ij} = \frac{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{2} x_{kj} } }}{{\sum\nolimits_{k = 1}^{n} {u_{ik}^{2} } }} $$
(9)
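Putting Eqs. (7)-(9) together, one iteration of EFCM_ALN can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the crisp neighbor assignment via the argmax of U, the brute-force neighbor search, the numerical-stability shift in the softmax of Eq. (8), and the small constant guarding division by zero are all assumptions.

```python
import numpy as np

def per_feature_neighbor_sums(X, labels, f, P):
    """N[k, i, j] = sum_p (x_kj - x_pj^{k,i})^2 over the P nearest neighbors of
    x_k in cluster i; neighbors are ranked by the feature-weighted distance."""
    n, m = X.shape
    c = labels.max() + 1
    N = np.zeros((n, c, m))
    Xw = X * np.sqrt(f)
    for i in range(c):
        members = np.where(labels == i)[0]
        dw = ((Xw[:, None, :] - Xw[None, members, :]) ** 2).sum(-1)
        dw[members, np.arange(len(members))] = np.inf       # exclude self
        nearest = np.argsort(dw, axis=1)[:, :P]             # indices into `members`
        for k in range(n):
            N[k, i] = ((X[k] - X[members[nearest[k]]]) ** 2).sum(axis=0)
    return N

def efcm_aln_step(X, U, V, f, delta, eta, P):
    """One EFCM_ALN iteration: Eq. (7) for U, Eq. (8) for f, Eq. (9) for V."""
    n = X.shape[0]
    labels = U.argmax(axis=1)                               # crisp assignment
    N = per_feature_neighbor_sums(X, labels, f, P)          # (n, c, m)
    D = (N * f).sum(-1)                                     # weighted neighbor term
    d2 = (((X[:, None, :] - V[None, :, :]) ** 2) * f).sum(-1)
    inv = 1.0 / (d2 + delta * D + 1e-12)
    U = inv / inv.sum(axis=1, keepdims=True)                # Eq. (7)

    U2 = U ** 2
    sq = (X[:, None, :] - V[None, :, :]) ** 2               # (n, c, m)
    cost = np.einsum('ki,kij->j', U2, sq + delta * N)       # per-feature cost
    f = np.exp(-(cost - cost.min()) / eta)                  # shifted for stability
    f /= f.sum()                                            # Eq. (8)

    thr = D.max(axis=1).sum() / (P * n)
    keep = ~((D / P) > thr).all(axis=1)                     # Eq. (5): drop outliers
    V = (U2[keep].T @ X[keep]) / U2[keep].sum(axis=0)[:, None]  # Eq. (9)
    return U, V, f
```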

The mechanism of EFCM_ALN

In this paper, we propose a novel fuzzy clustering approach based on adaptive local neighborhood information. Based on the concept that data close to each other should be grouped together and assigned to the same cluster, we put forward an entirely new idea: the membership degree values of an individual with respect to different clusters should depend not only on the distances between the individual and the cluster centers, but also on the distances between the individual and several of its nearest neighbors in each cluster. The conventional fuzzy clustering approach captures the global data distribution well: an individual belongs to different clusters with different membership degrees according to its distances to the cluster centroids. With adaptive local neighborhood information, the proposed approach considers both the global data distribution and the local data structure: it first adjusts the local data partition and gradually realizes the global partition adjustment through iteration. The relative effect of the adaptive neighborhood information term on the objective function is adjusted by the hyper-parameter \(\delta\); when \(\delta = 0\), the proposed approach reduces to the conventional fuzzy clustering approach without adaptive local neighborhood information.

Figure 4 shows the schematic of the clustering mechanism of EFCM_ALN. Assume a dataset with two clusters whose clustering result by the conventional clustering algorithm is shown in Fig. 4a: three data belonging to cluster 2 are mistakenly assigned to cluster 1. How does EFCM_ALN work in this situation? From Fig. 4b, if we consider the 3 nearest neighbors in the two clusters, the sum of distances between datum A and its 3 nearest neighbors in cluster 1 (the 3 data in the red polygon in Fig. 4b) is much smaller than the sum of distances between datum A and its 3 nearest neighbors in cluster 2 (the 3 data in the blue polygon in Fig. 4b). By the updating formula of U in Eq. (7), an individual is inclined to be grouped into a cluster when its distance to that cluster's center is smallest or when the sum of its distances to the P nearest neighbors in that cluster is smallest. So, under the effect of the second term in the objective function of EFCM_ALN, datum A is grouped into the correct cluster together with more similar data. From Fig. 4c, once datum A has been grouped correctly, the sum of distances between datum B and its 3 nearest neighbors in cluster 1 (the 3 data in the red polygon in Fig. 4c) is much smaller than that for cluster 2 (the 3 data in the blue polygon in Fig. 4c), so under the same effect datum B is grouped into the correct cluster as well. Gradually, under the effect of the second term in the objective function of EFCM_ALN, all misclustered data are grouped into the correct cluster together with more similar data.

Fig. 4

Schematic of clustering mechanism of EFCM_ALN: a clustering result by the conventional clustering algorithm for a dataset with two clusters; b considering 3 nearest neighbor data in two different clusters for data A; c considering 3 nearest neighbor data in two different clusters for data B; d considering 3 nearest neighbor data in two different clusters for data C

The overall procedures of EFCM_ALN

The overall procedure and flowchart of the novel fuzzy clustering approach are shown in Table 1 and Fig. 5, respectively. It is worth noting that 0–1 normalization of the data is necessary to prevent certain features from exerting undue influence on the distance calculation. The weighted Euclidean distance used in this paper to measure distances between individuals may no longer be valid for high-dimensional data, so when dealing with such data, a feature dimension reduction algorithm, such as principal component analysis (PCA) or an autoencoder neural network based method, is recommended as a preprocessing step.
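For reference, a minimal sketch of the 0–1 (min-max) normalization mentioned above:

```python
import numpy as np

def min_max_normalize(X):
    """0-1 normalization per feature, so that no single feature dominates
    the distance computation."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return (X - lo) / span
```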

Table 1 The procedures of the novel fuzzy clustering approach
Fig. 5

The flowchart of the novel fuzzy clustering approach

Experiments and results

To assess the clustering performance of the proposed fuzzy clustering algorithm, experiments were conducted in this section on synthetic and real-world datasets using state-of-the-art clustering algorithms.

In this paper, three performance metrics are utilized to measure the clustering performance of the different approaches: accuracy (AC), normalized mutual information (NMI), and Rand index (RI) [31]. AC is the rate of data assigned correctly; RI is the rate of pairs of data assigned correctly; NMI is the normalized mutual information (MI) score. \(AC \in [0,1]\), \(RI \in [0,1]\), \(NMI \in [0,1]\); the larger the values of AC, RI, and NMI, the better the clustering performance.
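As a hedged sketch, the three metrics can be computed as follows; NMI and RI are taken from scikit-learn, and AC uses the usual Hungarian matching between predicted clusters and true classes (the exact implementations used in the paper are not specified):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, rand_score

def clustering_accuracy(y_true, y_pred):
    """AC: fraction of correctly assigned data under the best one-to-one
    matching of predicted clusters to true classes (Hungarian method)."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    cm = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                         # contingency matrix
    rows, cols = linear_sum_assignment(-cm)   # maximize the matched counts
    return cm[rows, cols].sum() / len(y_true)

# nmi = normalized_mutual_info_score(y_true, y_pred)
# ri  = rand_score(y_true, y_pred)
```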

Let us now analyze qualitatively the effect of different values of the parameter \(P\) in the objective function of the proposed fuzzy clustering approach. From Fig. 4, three data belonging to cluster 2 are mistakenly assigned to cluster 1, and we expect to improve the clustering performance by introducing the neighborhood information term, with \(P \in [0,\min (Nc_{i} )]\), where \(Nc_{i}\) is the number of data assigned to the ith cluster during the clustering process. If \(P\) is set to less than 3, take datum A as an example: when we consider the 2 nearest neighbors in the two clusters, Fig. 6 shows that the distance between datum A and its 2 nearest neighbors in cluster 1 (the 2 data in the red polygon in Fig. 6) is almost equal to the distance between datum A and its 2 nearest neighbors in cluster 2 (the 2 data in the blue polygon in Fig. 6). We can therefore conclude that if \(P\) is set to a small value, the neighborhood information term has no obvious positive effect on the clustering performance. If \(P\) is set to a large value, however, the time complexity of the approach increases. \(P\) is thus an important parameter, and its choice is very similar to the choice of K in the classical K-nearest neighbor (KNN) algorithm. To select an appropriate value, we conducted extensive experiments comparing the clustering performance for different values of \(P\); we recommend that in practical applications \(P\) be larger than 3 and less than \(\left\lfloor {0.5\min (Nc_{i} )} \right\rfloor\), where ⌊ ⌋ denotes rounding down. In the follow-up experiments of this paper, \(P\) is set to \(\left\lfloor {0.25\min (Nc_{i} )} \right\rfloor\), so the neighborhood size is adaptively determined by the minimum cluster size in each iteration of the clustering process; a small helper implementing this rule is sketched below.
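A minimal sketch of the adaptive rule \(P = \left\lfloor 0.25\min (Nc_{i}) \right\rfloor\); the crisp labels are assumed to come from the current partition matrix:

```python
import numpy as np

def choose_P(labels, frac=0.25):
    """P = floor(frac * min_i Nc_i), re-evaluated from the current crisp
    assignment in each iteration; frac = 0.25 follows the setting above."""
    counts = np.bincount(labels)
    counts = counts[counts > 0]          # ignore empty clusters
    return max(1, int(np.floor(frac * counts.min())))
```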

Fig. 6

Schematic of EFCM_ALN considering 2 nearest neighbor data in two different clusters for the dataset in Fig. 4

Experimental results on the synthetic dataset

The synthetic dataset shown in Fig. 1 is utilized to evaluate the performance of the proposed approach. To better verify the effect of the adaptive neighborhood information presented in this paper, both the conventional fuzzy clustering algorithm and the proposed algorithm based on adaptive neighborhood information (with feature weights not considered) are applied; the clustering results for different values of the hyper-parameter \(\delta\) are shown in Fig. 7. From Fig. 7, when \(\delta = 0.08\), AC = 100%, whereas the conventional FCM algorithm achieves AC = 94%. We can therefore conclude that by adjusting the value of \(\delta\), the clustering performance can be improved, and the presented improved fuzzy clustering algorithm has significant advantages.

Fig. 7

The clustering results by the conventional fuzzy clustering algorithm and the proposed fuzzy clustering algorithm considering adaptive neighborhood information with different values of hyper-parameter \(\delta\)

Experimental results on public real-world datasets

Frequently used, publicly available real-world datasets for evaluating clustering algorithms are utilized: the Haberman, BreastTissue, Wdbc, and Wine datasets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). The Haberman dataset has 306 instances, 2 categories, and 3 attributes; the BreastTissue dataset has 106 instances, 6 categories, and 9 attributes; the Wdbc dataset has 569 instances, 2 categories, and 30 attributes; the Wine dataset has 178 instances, 3 categories, and 13 attributes.

The values of the hyper-parameters \(\delta\) and \(\eta\) have a great influence on the clustering performance, so we carried out experiments to study them: the NMI is compared as \(\delta\) varies from 0.001 to 0.01 with a step size of 0.002, from 0.01 to 0.1 with a step size of 0.01, and from 0.1 to 2 with a step size of 0.2, and as \(\eta\) varies from 0.1 to 1 with a step size of 0.1 and from 1 to 20 with a step size of 1. As a rule of thumb, the suggested range for \(\delta\) is \([0.001,0.8]\) and for \(\eta\) is \([0.5,20]\). In practice, \(\delta\) and \(\eta\) are tuned within these ranges until the clustering algorithm reaches a satisfactory performance; to obtain optimal performance, the hyper-parameter optimization should be guided by internal cluster validation [33] and cross-validation [34]. When \(\delta\) is set to a very small value, the effect of the adaptive local neighborhood information is reduced, and the clustering results of the presented approach and the conventional fuzzy clustering algorithm are essentially the same; as \(\delta\) increases, the influence of the adaptive local neighborhood information strengthens. When \(\eta\) is set to a very small value, the effect of the feature-entropy regularization term is reduced; as \(\eta\) increases, its influence strengthens and the differences between the feature weights decrease. A hedged sketch of such a grid search is given below.
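In this sketch, `fit` stands for any wrapper that runs EFCM_ALN with the given hyper-parameters and returns hard labels, and the silhouette score is used purely as an example of an internal validity index; neither is specified by the paper:

```python
import numpy as np
from itertools import product
from sklearn.metrics import silhouette_score

def tune_hyperparameters(X, fit):
    """Grid search over (delta, eta) following the ranges explored above,
    scored by an internal validity index (silhouette as an example)."""
    deltas = np.concatenate([np.arange(0.001, 0.01, 0.002),
                             np.arange(0.01, 0.1, 0.01),
                             np.arange(0.1, 2.0, 0.2)])
    etas = np.concatenate([np.arange(0.1, 1.0, 0.1),
                           np.arange(1.0, 21.0, 1.0)])
    best, best_score = None, -np.inf
    for delta, eta in product(deltas, etas):
        labels = fit(X, delta=delta, eta=eta)   # assumed EFCM_ALN wrapper
        s = silhouette_score(X, labels)
        if s > best_score:
            best, best_score = (delta, eta), s
    return best
```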

We compare the clustering results of the proposed fuzzy clustering approach (EFCM_ALN) with state-of-the-art clustering algorithms: the K-means clustering algorithm [32], AFCM [22], EWFCM [30], and the conventional fuzzy clustering algorithm (FCM) [9]. To reduce the effect of initialization, the results of the different clustering approaches are averaged over twenty independent runs; they are shown in Fig. 8 and Table 2.

Fig. 8

The comparison of the mean AC among different datasets by different algorithms

Table 2 The comparison of the mean AC, RI, NMI among different datasets by different algorithms

From Fig. 8 and Table 2, we can see that EFCM_ALN provides higher successful clustering rates than the other fuzzy clustering algorithms: the mean values of all three performance metrics, RI, AC, and NMI, are better. Compared with AFCM, EWFCM, K-means, and FCM, the mean AC of EFCM_ALN on the Haberman dataset is 0.7581, higher than 0.5062 for AFCM, 0.5163 for K-means, 0.6619 for EWFCM, and 0.5204 for FCM; on the BreastTissue dataset it is 0.5883, higher than 0.4921 for K-means, 0.5148 for AFCM, 0.5350 for EWFCM, and 0.5074 for FCM; on the Wdbc dataset it is 0.9384, higher than 0.9314 for AFCM, 0.9279 for K-means, 0.9297 for EWFCM, and 0.9277 for FCM; on the Wine dataset it is 0.9494, higher than 0.9484 for AFCM, 0.9417 for EWFCM, and 0.9146 for K-means. This indicates that the presented clustering method provides a new way to improve the performance of fuzzy clustering.

Conclusion

In this paper, based on the concept that data close to each other should be grouped together and assigned to the same cluster, a novel fuzzy clustering algorithm is put forward in which the adaptive local neighborhood information of each datum with respect to different clusters is taken into consideration; a new scheme for detecting outliers is presented, a method for identifying the different importance of different features is introduced, and a new objective function is constructed to formulate the framework. By minimizing the objective function, the optimal iterative formulas of the feature weights, the membership degrees, and the cluster centroids are derived. The presented approach provides a new way to enhance the performance of fuzzy clustering; comprehensive experiments and theoretical analysis on both synthetic and public real-world datasets demonstrate that the proposed algorithm can effectively enhance the clustering performance.

The weighted Euclidean distance used in this paper to calculate distances may no longer be valid for high-dimensional data, so when dealing with such data, feature dimension reduction algorithms, such as principal component analysis (PCA) or autoencoder neural network based dimension reduction, are recommended as a preprocessing step. Building on the adaptive local neighborhood information of each datum with respect to different clusters, an autoencoder neural network based deep fuzzy clustering approach is a direction for future research.