Introduction

As an unsupervised machine learning method, clustering has been widely used in pattern recognition, data mining, image processing, and other fields. It automatically groups objects according to the inherent characteristics of the data, so that similar objects are assigned to the same category. Scholars have developed many clustering methods based on different concepts: (1) prototype-based clustering [1]; (2) density-based clustering [2, 3]; (3) graph-based clustering [4, 5]; (4) other model-based clustering [6]. Among these, prototype-based clustering is perhaps the most popular and has been studied extensively. Hard c-means (HCM) [7] is one of the most widely used prototype-based clustering algorithms. HCM divides data into disjoint clusters, where each data element belongs to exactly one cluster. In real-world scenarios, however, data distributions exhibit uncertainties such as overlapping and crossing clusters. Therefore, hard clustering algorithms have serious limitations in practical applications.

To deal with the uncertainty in data, theories of uncertainty have been integrated into clustering algorithms. Fuzzy c-means (FCM) [8] introduces fuzzy set theory into HCM and adopts the concept of membership, with values in the continuous range [0, 1], to represent the belongingness of an object to multiple clusters. In contrast to HCM, FCM is a prototype-based soft clustering algorithm. Possibilistic c-means (PCM) [9] relaxes the FCM restriction that the memberships of an object to all clusters must sum to one, and behaves well on data with outliers. Rough set theory [10, 11], proposed by Pawlak, can deal with the uncertainty and inherent incompleteness in data by considering rough approximations in roughly granulated spaces. Exploiting this characteristic, rough clustering methods have been developed by incorporating rough set theory into clustering. Lingras and West [12] first proposed rough c-means (RCM), which introduces the lower approximation, upper approximation, and boundary region to represent the certain, possible, and uncertain belongingness of objects to clusters. Many variants of RCM [13, 14] have been proposed in recent years. Ubukata et al. proposed a novel rough clustering framework called objective function-based rough membership c-means clustering (RMCM2) [15]. The RMCM2 algorithm adopts the rough membership function, which considers the neighborhood information of the data, and designs an objective function derived from the rough membership c-means algorithm [16].

Traditional rough clustering methods usually cannot effectively explore the structure of the data when the amount of available data is limited or the data are disturbed by noise. Several clustering paradigms have been developed to address these problems, such as co-clustering [17], multitask learning [18], semi-supervised learning [19], and transfer learning [20, 21]. Transfer learning is perhaps the most promising, because it explicitly reuses knowledge learned from a related source domain. In the past decade or so, many unsupervised transfer clustering algorithms have been developed by combining clustering algorithms with transfer learning. According to the transfer method, transfer clustering algorithms can be roughly divided into four categories [22]: instance-based [23], feature-representation-based [24, 25], parameter-based [26,27,28,29], and relational-knowledge-based [30]. The earliest study is the self-taught clustering (STC) algorithm based on mutual information proposed by Dai et al. [24]. After that, Sun et al. proposed a transfer maximum entropy clustering algorithm based on maximum entropy clustering [29]. In [21, 26], transfer learning was applied to prototype-based fuzzy clustering. The recently proposed transfer learning possibilistic c-means (TLPCM) [28] works well in applications where data are limited or polluted by noise.

Furthermore, traditional rough clustering algorithms are sensitive to the initial cluster centers and prone to falling into local optima. The differential evolution (DE) algorithm is a nature-inspired optimization algorithm that can approach the global optimum of a given problem. DE has been widely studied and applied because of its simple implementation and fast convergence [31, 32]. Clustering algorithms based on differential evolution are less likely to become trapped in local optima and are therefore more robust.

Motivated by the problems mentioned above, a novel differential evolution-based transfer rough clustering algorithm (DE-TRC) is proposed in this paper. Using the knowledge-reuse mechanism of transfer learning, we design the objective function of an unsupervised transfer rough clustering algorithm, which combines transfer learning with rough clustering to improve performance on sparse or noise-polluted data. In addition, the differential evolution algorithm is introduced to optimize the objective function of the clustering algorithm, which improves the robustness of the algorithm. Comprehensive experiments comparing DE-TRC with state-of-the-art clustering algorithms on both synthetic and real-world datasets demonstrate its advantages.

The rest of this paper is organized as follows. In “Related works”, the work related to this paper, including rough clustering and the transfer mechanism, is introduced. The implementation details of the proposed model are described in “Proposed methods”. Experiments on synthetic and real-world datasets are reported in “Experiment analysis”. “Conclusion” concludes the paper.

Related works

In this section, we review rough set theory and the prototype-based transfer clustering (matching) mechanism, the two theoretical foundations of our proposed method.

Rough set theory and clustering

Rough set theory is a mathematical tool for dealing with uncertain problems from the perspective of granular computing [33]. A rough set approximates a vague concept by a pair of precise concepts, namely its lower and upper approximations.

Let \(U = \{ x_{1} ,x_{2} , \cdots ,x_{N} \}\) be a set of N objects and \(F = \{ X_{1} ,X_{2} , \cdots ,X_{C} \}\) be a family of C clusters on U. \(\mu_{ci}^{R}\) is the rough membership of xi to Xc with respect to the neighborhood relationship R and is computed by:

$$ \mu_{ci}^{R} = \frac{{\sum\nolimits_{t = 1}^{N} {R_{it} u_{ct} } }}{{\sum\nolimits_{t = 1}^{N} {R_{it} } }}, $$
(1)

where uci indicates whether xi belongs to Xc and is defined by:

$$ u_{ci} = \left\{ \begin{gathered} 1\;{\text{if}}\;x_{i} \; \in \;X_{c} \hfill \\ 0\;{\text{else}} \hfill \\ \end{gathered} \right.. $$
(2)

Rit represents the nearest-neighbor relation and is defined by the following condition:

$$ R_{it} = \left\{ \begin{gathered} 1{\text{ if }}x_{t} \;{\text{is among the }}k{\text{ nearest neighbors of}}\,x_{i} \hfill \\ 0{\text{ else}} \hfill \\ \end{gathered} \right.. $$
(3)

k can be determined according to the strategy studied in Ref. [15]. In order to describe the lower approximation, upper approximation, and boundary region of Xc, their corresponding memberships \(\underline {u}_{ci} ,\,\overline{u}_{ci}\), and \(\hat{u}_{ci}\) can be calculated by using the rough membership:

$$ \underline {u}_{ci} = \left\{ \begin{gathered} 1 \, (\mu_{ci}^{R} = 1) \hfill \\ 0 \, (otherwise) \hfill \\ \end{gathered} \right. $$
(4)
$$ \hat{u}_{ci} = \left\{ \begin{gathered} 1 \, (\mu_{ci}^{R} \in (0,1)) \hfill \\ 0 \, (otherwise) \hfill \\ \end{gathered} \right. $$
(5)
$$ \overline{u}_{ci} = \left\{ \begin{gathered} 1 \, (\mu_{ci}^{R} > 0) \hfill \\ 0 \, (otherwise) \hfill \\ \end{gathered} \right.. $$
(6)

Prototype-based transfer mechanism

Generally, transfer learning can utilize source-domain data to improve learning on target-domain data, provided that useful knowledge can be extracted from the source domain and transferred to the target domain [20]. For prototype-based clustering, the most critical information is the cluster centers. Therefore, a prototype-based transfer clustering mechanism is given as follows:

$$ \Delta (\tilde{V}_{S} ,V_{T} ) = \sum\limits_{c = 1}^{C} {\left\| {\tilde{v}_{c} - v_{c} } \right\|^{2} } , $$
(7)

where \(\tilde{v}_{c}\) denotes the cth cluster center in the source domain and vc is the cth cluster center in the target domain; the number of clusters C is the same in the source and target domains. This transfer clustering mechanism can be introduced into clustering objective functions to improve clustering performance.

In addition, to handle the case where the source and target domains have different numbers of clusters, a prototype-based transfer matching mechanism [21] has been proposed:

$$ \Delta (\tilde{V}_{S} ,V_{T} ) = \sum\limits_{k = 1}^{{C_{S} }} {\sum\limits_{c = 1}^{{C_{T} }} {\left( {r_{ck} } \right)^{m} \left\| {\tilde{v}_{k} - v_{c} } \right\|^{2} } } , $$
(8)

where \(\tilde{v}_{k}\) denotes the kth cluster center in the source domain and vc is the cth cluster center in the target domain. CS and CT are the numbers of clusters in the source domain and target domain, respectively. rck represents the similarity between the kth center in the source domain and the cth center in the target domain. m is the fuzzifier and is usually set to 2.
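As a concrete illustration, a short NumPy sketch of the matching term of Eq. (8) follows; the function name and array shapes are our own assumptions. Eq. (7) is recovered as the special case where the two domains share the same number of clusters and the matching is a fixed one-to-one correspondence.

```python
import numpy as np

def transfer_matching_term(V_src, V_tgt, r, m=2):
    """Prototype-based transfer matching term of Eq. (8).
    V_src: (C_S, d) source centers; V_tgt: (C_T, d) target centers;
    r: (C_T, C_S) similarities r_ck between target center c and source center k."""
    # d2[c, k] = ||v~_k - v_c||^2
    d2 = ((V_src[None, :, :] - V_tgt[:, None, :]) ** 2).sum(-1)
    return ((r ** m) * d2).sum()

# Eq. (7) corresponds to C_S == C_T with an identity matching:
# transfer_matching_term(V_src, V_tgt, np.eye(len(V_src)), m=2)
```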

Proposed methods

To effectively explore the structure of the data and avoid the influence of the initial cluster centers, a differential evolution-based transfer rough clustering algorithm is proposed to improve clustering performance when the data in the target domain are insufficient or disturbed by noise. As shown in Fig. 1, the learned knowledge, i.e., the cluster centers, is usually available from the source domain. This knowledge is then applied to the target domain through the transfer learning strategy to help the clustering task be performed effectively. Furthermore, the objective function of the transfer rough clustering algorithm is optimized using the differential evolution algorithm to enhance the robustness of the algorithm.

Fig. 1 Overall framework of the DE-TRC

Transfer rough clustering algorithm (TRC)

We first propose a transfer rough clustering algorithm (TRC). The objective function of TRC fully utilizes the target-domain data XT and the cluster centers \(\tilde{V}_{S} = \{ \tilde{v}_{k} \} (k = 1,2,...,C_{S} )\) of the source domain as auxiliary knowledge. The cluster centers \(\tilde{V}_{S}\) of the source domain are obtained by the classical RMCM2 algorithm. The objective function is designed as follows:

$$ \begin{aligned} \min \; J_{TRC} & = \left( {1 - \lambda } \right)\sum\limits_{c = 1}^{{C_{T} }} {\sum\limits_{i = 1}^{{N_{T} }} {\mu_{ci}^{R} \left\| {x_{i} - v_{c} } \right\|^{2} } } + \lambda \sum\limits_{k = 1}^{{C_{S} }} {\sum\limits_{c = 1}^{{C_{T} }} {\left( {r_{ck} } \right)^{m} \left\| {\tilde{v}_{k} - v_{c} } \right\|^{2} } } \\ & {\text{s.t.}}\;\sum\limits_{c = 1}^{{C_{T} }} {\mu_{ci}^{R} } = 1,\quad \sum\limits_{k = 1}^{{C_{S} }} {r_{ck} } = 1, \end{aligned} $$
(9)

where xi is the ith sample in the target domain and NT is the number of samples in the target domain. \(\mu_{ci}^{R}\) is the rough membership function, which represents the degree to which the ith sample belongs to the cth cluster. \(\left\| {x_{i} - v_{c} } \right\|\) is the Euclidean distance between the ith sample and the cth cluster center in the target domain. \(\lambda\) is a trade-off parameter.

In Eq. (9), the first term measures the internal compactness of the target-domain data, and the second term measures the similarity between the cluster centers in the target domain and those in the source domain. Minimizing this objective function using Lagrange multipliers yields the following update expressions for rck and vc:

$$ r_{ck} = \frac{1}{{\sum\limits_{l = 1}^{{C_{S} }} {\left( {\frac{{\left\| {\tilde{v}_{k} - v_{c} } \right\|}}{{\left\| {\tilde{v}_{l} - v_{c} } \right\|}}} \right)^{{\frac{2}{m - 1}}} } }} $$
(10)
$$ v_{c} = \frac{{\left( {1 - \lambda } \right)\sum\nolimits_{i = 1}^{{N_{T} }} {\mu_{ci}^{R} x_{i} } + \lambda \sum\nolimits_{k = 1}^{{C_{S} }} {\left( {r_{ck} } \right)^{m} \tilde{v}_{k} } }}{{\left( {1 - \lambda } \right)\sum\nolimits_{i = 1}^{{N_{T} }} {\mu_{ci}^{R} } + \lambda \sum\nolimits_{k = 1}^{{C_{S} }} {\left( {r_{ck} } \right)^{m} } }}. $$
(11)

The rough membership \(\mu_{ci}^{R}\) depends on the nearest-neighbor matrix Rit and the membership uci, as shown in Eqs. (1) and (3). The update rule for uci is given as follows [15]:

$$ u_{ci} = \left\{ \begin{gathered} 1{ (}c = \mathop {argmin}\limits_{{1 \le l \le C_{T} }} \sum\limits_{k = 1}^{{N_{T} }} {\frac{{R_{ki} }}{{\sum\nolimits_{t = 1}^{{N_{T} }} {R_{kt} } }}\left\| {x_{k} - v_{l} } \right\|^{2} } {)} \hfill \\ 0 \, \left( {otherwise} \right) \hfill \\ \end{gathered} \right.. $$
(12)

The pseudocode of the TRC algorithm is given in Algorithm 1.

Algorithm 1 Pseudocode of the TRC algorithm
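To complement Algorithm 1, the following is a minimal NumPy sketch of one TRC iteration under our own naming conventions, alternating the updates of Eqs. (10)–(12); the small eps guards against division by zero in Eq. (10).

```python
import numpy as np

def trc_iteration(X, V, V_src, R, lam=0.9, m=2, eps=1e-10):
    """One TRC iteration: crisp memberships u (Eq. 12), rough memberships
    mu (Eq. 1), center similarities r (Eq. 10), new centers (Eq. 11).
    X: (N_T, d) target data; V: (C_T, d) target centers;
    V_src: (C_S, d) source centers; R: (N_T, N_T) kNN relation (Eq. 3)."""
    C_T = V.shape[0]
    # Row-normalized neighborhood weights W[k, i] = R[k, i] / sum_t R[k, t].
    W = R / R.sum(axis=1, keepdims=True)
    # d2[i, c] = ||x_i - v_c||^2.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    # Eq. (12): assign x_i to the cluster with the smallest
    # neighborhood-weighted cost sum_k W[k, i] * ||x_k - v_l||^2.
    labels = (W.T @ d2).argmin(axis=1)
    u = np.eye(C_T)[labels].T                      # (C_T, N_T) crisp memberships
    mu = (u @ R.T) / R.sum(axis=1)                 # Eq. (1)
    # Eq. (10): FCM-like similarity between source and target centers.
    ds = np.sqrt(((V_src[None, :, :] - V[:, None, :]) ** 2).sum(-1)) + eps  # (C_T, C_S)
    w = ds ** (-2.0 / (m - 1))
    r = w / w.sum(axis=1, keepdims=True)
    # Eq. (11): centers trade off target data against source centers.
    num = (1 - lam) * (mu @ X) + lam * ((r ** m) @ V_src)
    den = (1 - lam) * mu.sum(axis=1, keepdims=True) + lam * (r ** m).sum(axis=1, keepdims=True)
    return num / den, u, r
```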

Differential evolution-based transfer rough clustering algorithm (DE-TRC)

Although the proposed TRC utilizes source data to improve clustering performance, it may still be influenced by the initial cluster centers and is prone to falling into local optima. To solve these problems, the differential evolution (DE) algorithm is introduced and the differential evolution-based transfer rough clustering algorithm is proposed. This method has three important parts: population initialization, evolutionary operators, and fitness function computation. The fitness function describes the quality of an individual in the population; therefore, the objective function designed in Eq. (9) is adopted as the fitness function in our method.

Population initialization

The population contains the candidate solutions to the clustering problem. In DE-TRC, a solution is a set of cluster centers. We therefore use random population initialization to generate the population, where D is the dimension of the problem and NP is the population size, which is usually specified by the user. As shown in Fig. 2, a chromosome is represented as a six-dimensional vector of real numbers encoding three cluster centers (a code sketch of this encoding follows Fig. 2).

Fig. 2 The diagram of a chromosome
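As a hypothetical illustration of this encoding (the bounds and helper names are ours), each chromosome flattens C cluster centers of dimension d into a D = C·d real vector:

```python
import numpy as np

# Hypothetical illustration of the encoding in Fig. 2: three 2-D cluster
# centers flattened into one six-dimensional real-valued chromosome.
def init_population(NP, C, d, low, high, rng=np.random.default_rng(0)):
    """Random population of NP chromosomes, each of dimension D = C * d."""
    return rng.uniform(low, high, size=(NP, C * d))

def decode(chromosome, C, d):
    """Recover the C cluster centers encoded in a flat chromosome."""
    return chromosome.reshape(C, d)

pop = init_population(NP=100, C=3, d=2, low=0.0, high=10.0)
V = decode(pop[0], C=3, d=2)   # (3, 2): three candidate 2-D cluster centers
```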

Evolution optimization process

After initialization, evolutionary operations including mutation, crossover, and selection [34] are performed to evolve the population. To generate an offspring \(p_{i,t + 1}\) of chromosome \(p_{i,t}\) in the tth generation, three different chromosomes are randomly selected from the population to produce a mutant vector using the following equation:

$$ o_{i,t} = p_{j,t} + F \cdot (p_{m,t} - p_{n,t} ), $$
(13)

where F denotes a scaling factor (which typically lies in the interval [0.4, 1]), and \(p_{j,t} ,\;p_{m,t}\), and \(p_{n,t}\) are three mutually distinct chromosomes.

Then, a binomial crossover operator is applied on \(o_{i,t}\) and \(p_{i,t}\). In particular, the offspring \(w_{i,t}\) is generated by:

$$ w_{i,t}^{j} = \left\{ \begin{gathered} o_{i,t}^{j} \;{\text{if}}\;{\text{rand}}(0,1) \le CR \hfill \\ p_{i,t}^{j} {\text{ otherwise}} \hfill \\ \end{gathered} \right., $$
(14)

where CR denotes the crossover control parameter and \(w_{i,t}^{j}\) represents the jth gene of the ith chromosome in the tth generation.

Finally, to compare the offspring \(w_{i,t}\) with the chromosome \(p_{i,t}\), their fitness values are computed by Eq. (9). Only the better one is passed to the next generation:

$$ p_{i,t + 1} = \left\{ \begin{gathered} w_{i,t} \, if \, Fitness\;(w_{i,t} ) \le Fitness\;(p_{i,t} ) \hfill \\ p_{i,t} {\text{ otherwise}} \hfill \\ \end{gathered} \right.. $$
(15)

The above operators are repeated until the stopping criterion is met. The details of the DE-TRC algorithm are described in Algorithm 2, and a minimal code sketch of one generation follows it.

Algorithm 2 Pseudocode of the DE-TRC algorithm
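The sketch below implements one DE generation per Eqs. (13)–(15) under our naming conventions. Forcing at least one gene from the mutant vector is a common DE convention that Eq. (14) leaves implicit, and the fitness function is the objective of Eq. (9).

```python
import numpy as np

def de_generation(pop, fitness, F=0.5, CR=0.3, rng=np.random.default_rng(0)):
    """One DE generation over a (NP, D) population using Eqs. (13)-(15).
    `fitness` maps a chromosome to the objective of Eq. (9) (lower is better)."""
    NP, D = pop.shape
    new_pop = pop.copy()
    for i in range(NP):
        # Mutation (Eq. 13): three distinct chromosomes, all different from i.
        j, m, n = rng.choice([x for x in range(NP) if x != i], size=3, replace=False)
        mutant = pop[j] + F * (pop[m] - pop[n])
        # Binomial crossover (Eq. 14); keep at least one mutant gene.
        mask = rng.random(D) <= CR
        mask[rng.integers(D)] = True
        trial = np.where(mask, mutant, pop[i])
        # Selection (Eq. 15): the better of trial and parent survives.
        if fitness(trial) <= fitness(pop[i]):
            new_pop[i] = trial
    return new_pop
```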

In this work, a setting of NP = 100 is sufficient for reliable convergence behavior. If NP is too large, the computational complexity is high; if NP is too small, the population diversity may be insufficient to escape local minima [45, 47]. As for F, F = 0.5 is usually a good initial choice [31, 43] and proved feasible on all datasets in this manuscript. The crossover constant CR is a real number from the interval [0, 1]. CR should not be too large; otherwise the perturbations become excessive and the convergence speed decreases. Following the suggestions in [45, 46], we set CR = 0.3 in this manuscript. Usually, the algorithm is stopped after exceeding a maximum number of iterations [44]. To achieve global convergence, we set the maximum number of iterations to 500 according to [47].

Time complexity analysis

We assume that the size of the target-domain dataset is \(N_{T}\) and that the numbers of clusters in the target and source domains are \(C_{T}\) and \(C_{S}\), respectively. The population size is NP and the number of generations is T. In each generation, computing the fitness function consumes more time than the other operations. For each individual, the time complexity of calculating the fitness \(J_{TRC}\) is \(O\left( {C_{T} N_{T}^{2} + C_{T} C_{S} } \right)\). The time complexity of the fitness calculations over the whole population is therefore \(O\left( {NP\left( {C_{T} N_{T}^{2} + C_{T} C_{S} } \right)} \right)\), and the total time complexity of the DE-TRC algorithm is \(O\left( {T \cdot NP \cdot C_{T} \left( {N_{T}^{2} + C_{S} } \right)} \right)\).

Experiment analysis

In this section, we first verify the benefit of the transfer learning mechanism and the differential evolution algorithm on synthetic and real-world datasets. Then, we conduct various experiments to evaluate the clustering performance of TRC and DE-TRC against relevant clustering and transfer clustering methods, including FCM [8], RCM [12], FCDE [35], GARCM [36], E-TFCM [21], and STC [24]. The parameters of all algorithms are presented in Table 1. We use three popular external measures as evaluation criteria: accuracy (ACC) [38], normalized mutual information (NMI) [39], and Rand index (RI) [40]. All three lie between 0 and 1, and higher values indicate better clustering performance. In the experiments, we run each algorithm 10 times on each dataset and report the mean and standard deviation.
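As one possible implementation of these criteria (our choice of libraries, not necessarily the one used in the experiments), NMI and RI are available in scikit-learn, and ACC can be computed by Hungarian matching between predicted and true labels:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one matching (Hungarian algorithm)
    between predicted cluster labels and ground-truth class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels_p, labels_t = np.unique(y_pred), np.unique(y_true)
    # cost[i, j] = -(number of samples with predicted label i and true label j)
    cost = np.zeros((labels_p.size, labels_t.size))
    for i, p in enumerate(labels_p):
        for j, t in enumerate(labels_t):
            cost[i, j] = -np.sum((y_pred == p) & (y_true == t))
    row, col = linear_sum_assignment(cost)
    return -cost[row, col].sum() / y_true.size

# NMI and RI come directly from scikit-learn:
# nmi = normalized_mutual_info_score(y_true, y_pred)
# ri  = rand_score(y_true, y_pred)
```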

Table 1 Algorithms and their parameter settings

All experiments in this paper are performed using MATLAB R2018b on a 64-bit Windows 10 operating system with an Intel(R) Core i5-10400F CPU and 16 GB of RAM.

Synthetic datasets

Six synthetic two-dimensional datasets are used to evaluate the performance of the proposed algorithm. In the real world, available data may be insufficient in some special or emerging fields [20]. To simulate this scenario, we create T1 and T2, which have the same distribution as the corresponding source-domain data but fewer samples in each cluster. Noise and outliers are inevitable in the process of data acquisition, and these disturbances limit the performance of common clustering algorithms. To verify the noise robustness of the proposed algorithm, we create T3 and T4 with additive and multiplicative noise, respectively. It is known that the performance of classical clustering algorithms is limited on unbalanced data [42]; we therefore create an unbalanced dataset T5, in which the number of samples varies greatly across categories. Furthermore, existing transfer clustering algorithms share a common limitation: the numbers of clusters in the source and target domains must be the same. In many practical applications, this assumption does not hold; therefore, T6 is constructed to verify the performance of clustering algorithms in this situation.

S1 and T1 are generated as the source-domain and target-domain data of Synthetic dataset 1, respectively; they follow uniform distributions with the same maximum and minimum range values and are derived from Ref. [37]. The parameters used to generate the other five synthetic datasets are listed in Table 2. All source and target data are shown in Figs. 3 and 4.

Table 2 Parameters used to generate the synthetic datasets
Fig. 3 Synthetic datasets used for performance evaluation: source domain data a S1. b S2. c S3. d S4. e S5. f S6

Fig. 4 Synthetic datasets used for performance evaluation: target domain data a T1. b T2. c T3. d T4. e T5. f T6

Real-world datasets

To further validate the proposed algorithm in real-world scenarios, we conduct a series of experiments on UCI datasets [41]. The characteristics of the five classical datasets used in this paper are shown in Table 3. To verify the clustering performance of the DE-TRC algorithm on limited real-world data, each dataset is divided into source-domain data and target-domain data: 10% of the full data serves as the target data and the rest as the source data.
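One way to realize this split is sketched below; whether the original split was stratified by class or purely random is our assumption, and X and y stand for a UCI dataset's feature matrix and labels.

```python
from sklearn.model_selection import train_test_split

# Hypothetical realization of the 90%/10% source/target split described above;
# stratifying by class label is our assumption, not stated in the paper.
X_source, X_target, y_source, y_target = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)
```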

Table 3 Characteristics of the real-world datasets

Parameter analysis

In the proposed method, the parameter \(\lambda\) weights the proportion of the two terms in the objective function and may greatly affect the performance of the algorithm. When \(\lambda = 0\), the DE-TRC algorithm degenerates into the RMCM2 algorithm. When \(\lambda = 1\), the clustering results are completely determined by the transfer term, that is, the final cluster centers depend entirely on the knowledge of the source domain. In this section, T2, T5, T6, Iris, Haberman, and Wine are selected for the parameter sensitivity analysis. \(\lambda\) is varied from 0 to 1 in steps of 0.05, and its impact on the performance of the algorithm is presented in Figs. 5 and 6.

Fig. 5 Changing trends of DE-TRC against the parameter \(\lambda\) on synthetic datasets. a T2. b T5. c T6

Fig. 6 Changing trends of DE-TRC against the parameter \(\lambda\) on real-world datasets. a Iris. b Haberman. c Wine

It can be seen from Figs. 5 and 6 that the clustering performance improves on almost all datasets as \(\lambda\) increases. In Figs. 5(b), 6(a), and 6(c), when \(\lambda\) is higher than 0.8, DE-TRC obtains satisfactory performance in terms of ACC, NMI, and RI. In Figs. 5(a) and 6(b), the clustering performance of DE-TRC improves when \(\lambda\) is higher than 0.9 and 0.6, respectively. However, ACC decreases to 0.6129 when \(\lambda = 0.95\) in Fig. 6(b). Therefore, \(\lambda\) is set to 0.9 in the following experiments.

Experiments to verify the superiority of transfer mechanism and evolution algorithm

To verify the benefit of the transfer learning mechanism and the differential evolution algorithm, we first compare three algorithms: RMCM2, TRC, and DE-TRC. Figures 7 and 8 show the scatter plots of the clustering results obtained by the RMCM2 and DE-TRC algorithms, respectively. Samples in different categories are represented by different colors and markers, and the cluster prototypes are marked by black pentagrams. As seen in Figs. 7(c) and 8(c), the prototype obtained by the RMCM2 algorithm is far from the center of cluster 2 due to the influence of noise, whereas the DE-TRC algorithm obtains the ideal cluster prototypes. Furthermore, as shown in Figs. 7(e) and 8(e), the prototype of cluster 3 obtained by RMCM2 is biased toward other clusters, while DE-TRC finds more desirable cluster prototypes. To quantitatively verify the importance of the transfer learning mechanism and evolutionary optimization in the proposed DE-TRC, RI is selected as the performance index, and the results of the original RMCM2, TRC, and DE-TRC algorithms on the synthetic and real-world datasets are listed in Tables 4 and 5, respectively. In these tables, the best algorithm is highlighted in bold.

Fig. 7 Clustering results on synthetic datasets T1–T6 by RMCM2. a T1. b T2. c T3. d T4. e T5. f T6

Fig. 8 Clustering results on synthetic datasets T1–T6 by DE-TRC. a T1. b T2. c T3. d T4. e T5. f T6

Table 4 The performance index RI obtained on the synthetic datasets by RMCM2, TRC, and DE-TRC
Table 5 The performance index RI obtained on the real-world datasets by RMCM2, TRC, and DE-TRC

As shown in Tables 4 and 5, the clustering results of TRC are clearly superior to those of the original RMCM2 algorithm on most datasets. However, although TRC obtains a higher mean RI than RMCM2 on all datasets, its large standard deviation indicates that its performance is somewhat unstable. DE-TRC obtains a mean RI of 1.0000 with a standard deviation of 0.0000 on datasets T1, T2, and T5. By introducing the evolutionary optimization algorithm, DE-TRC not only achieves better clustering results but also converges more reliably.

Experiments of related comparative algorithm

To further demonstrate the superiority and stability of the proposed algorithm, we conduct comparative experiments with six state-of-the-art algorithms on all datasets. The comparative results on the synthetic and real-world datasets are shown in Tables 6 and 7, respectively. For the STC algorithm, the numbers of clusters in the source- and target-domain data must be equal; thus, it is not applicable to T6.

Table 6 Performance comparisons of involved algorithms on synthetic datasets
Table 7 Performance comparisons of involved algorithms on real-world datasets

Based on the results presented in Tables 6 and 7, we make the following observations. For T1, T2, and T5, by utilizing the knowledge from the source domain, DE-TRC obtains entirely correct clustering results. Datasets T3 and T4 are polluted by interference data or noise; in general, E-TFCM and DE-TRC obtain better results than the other algorithms because of the noise robustness of transfer methods. In addition, benefiting from the evolutionary optimization method, DE-TRC exhibits the most stable clustering performance among all compared algorithms.

Time efficiency analysis

The computational times of DE-TRC and the other comparison algorithms on the five real-world datasets are listed in Table 8. The algorithms can be divided into three categories.

  • The classical clustering algorithms: FCM, RCM, and RMCM2.

  • The transfer clustering algorithms: E-TFCM, STC.

  • The clustering algorithms based on evolutionary optimization: FCDE, GARCM, DE-TRC.

Table 8 Computation time of all algorithms (s)

Clustering algorithms based on evolutionary optimization must compute the fitness function of every individual in each generation and therefore require more computational time. However, they obtain better and more stable clustering results than the classical and transfer clustering algorithms, as shown in Tables 6 and 7.

Conclusion

Based on the traditional rough clustering algorithm, we design an unsupervised transfer rough clustering framework based on the differential evolution algorithm (DE-TRC). First, the transfer learning mechanism helps DE-TRC work well on data that are limited or polluted by noise. Second, the differential evolution algorithm is used to optimize the objective function of the transfer rough clustering algorithm, which effectively solves the problem of sensitivity to the initial cluster centers. Experimental results show that the proposed DE-TRC algorithm provides better clustering results than traditional rough clustering algorithms and other state-of-the-art clustering algorithms on both synthetic and real-world datasets. However, the efficiency of the algorithm can be further improved. Future study will focus on more effective evolutionary strategies to improve the overall efficiency of the algorithm.