Algorithms for known number of clusters
Fuzzy c-means clustering
The FCM algorithm (Bezdek 1981) is a widely used technique that applies the principles of fuzzy sets to evolve a partition matrix U(X) while minimizing the measure
$$ J_{m} = \sum^{n}_{j=1} \sum^{K}_{k=1} u^{m}_{k,j} D^{2}(z_{k},x_{j}),\quad 1\leq m\leq \infty $$
(1)
where n is the number of data objects, K is the number of clusters, \(u_{k,j}\) is the membership of the jth point in the kth cluster, and m denotes the fuzzy exponent. \(D(z_{k}, x_{j})\) denotes the distance of point \(x_{j}\) from the kth cluster centre \(z_{k}\). The FCM algorithm starts with K random initial cluster centres, and then at every iteration it computes the fuzzy membership of each data point using the following equation:
$$ u_{k,i} = \frac{\left(\frac{1}{D(z_{k},x_{i})}\right)^{\frac{1}{m-1}}} {\sum^{K}_{j=1} \left(\frac{1}{D(z_{j},x_{i})}\right)^{\frac{1}{m-1}}}, \quad {\rm for}\quad 1\leq k \leq K,\quad 1\leq i\leq n $$
(2)
The cluster centres are recomputed using the following equation:
$$ z_{k} = \frac{\sum_{i=1}^{n} u_{k,i}^{m} x_{i}}{\sum_{i=1}^{n} u_{k,i}^{m}}\quad 1\leq k \leq K $$
(3)
The algorithm terminates when there is no further change in the cluster centres. Finally, each data point is assigned to the cluster to which it has maximum membership.
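The alternating updates of Eqs. 2 and 3 can be sketched as follows (a minimal NumPy sketch, not the authors' implementation; the stopping tolerance and the default m = 2 are illustrative choices):

```python
import numpy as np

def fcm(X, K, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means: alternate the membership update (Eq. 2)
    and the centre update (Eq. 3) until the centres stop moving."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = X[rng.choice(n, K, replace=False)].copy()   # K random initial centres
    for _ in range(max_iter):
        D = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2)  # K x n distances
        D = np.fmax(D, 1e-12)                       # guard against zero distance
        inv = (1.0 / D) ** (1.0 / (m - 1.0))        # Eq. 2, numerator terms
        U = inv / inv.sum(axis=0, keepdims=True)    # memberships; columns sum to 1
        Um = U ** m
        Z_new = (Um @ X) / Um.sum(axis=1, keepdims=True)  # Eq. 3
        if np.linalg.norm(Z_new - Z) < tol:
            Z = Z_new
            break
        Z = Z_new
    return U, Z
```

The final hard assignment is then simply the argmax of each membership column.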
Fuzzy c-medoids clustering
The FCMdd (Krishnapuram et al. 1999) algorithm extends the FCM (Bezdek 1981) algorithm by replacing cluster means with cluster medoids. A medoid is defined as follows: let \(V = \left\{v_{1}, v_{2}, \ldots, v_{z} \right\}\) be a set of z objects. The medoid of V is the object \(O \in V\) for which the sum of distances from O to the other objects in V is minimum. The aim of the FCMdd algorithm is to cluster the dataset \(X = \left\{ x_{1}, x_{2}, \ldots, x_{n} \right\}\) into K partitions so that \(J_{m}\) (Eq. 1) is minimized. The FCMdd algorithm also iteratively estimates the partition matrix U(X), followed by computation of new cluster medoids. It starts with K random initial medoids, and then at every iteration it finds the fuzzy membership of each object to every cluster using Eq. 2. Based on the membership values, the cluster medoids are recomputed as follows:
$$ p_{k} = {\rm argmin}_{1 \leq j \leq n} \sum^{n}_{i=1} u^{m}_{k,i} D(x_{j},x_{i}),\quad 1 \leq k \leq K $$
(4)
and
$$ z_{k} = x_{p_{k}},\quad 1 \leq k \leq K $$
(5)
The algorithm terminates when there is no significant improvement in the \(J_{m}\) value. Finally, the assignment of each data point is performed in a manner identical to that of FCM.
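The medoid update of Eqs. 4 and 5 can be sketched as follows (a hedged NumPy sketch; the function name is illustrative):

```python
import numpy as np

def update_medoids(X, U, m=2.0):
    """Recompute p_k = argmin_j sum_i u_{k,i}^m D(x_j, x_i) (Eq. 4)
    and set z_k = x_{p_k} (Eq. 5)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances (symmetric)
    cost = (U ** m) @ D            # cost[k, j] = sum_i u_{k,i}^m D(x_i, x_j)
    p = cost.argmin(axis=1)        # best medoid index for each cluster
    return X[p], p
```

Because medoids are restricted to actual data points, only the pairwise distance matrix is needed, which is why FCMdd is applicable to relational (non-vector) data as well.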
Differential evolution-based fuzzy c-medoids clustering
In DEFCMdd (Maulik et al. 2010; Maulik and Saha 2009) clustering, the medoids of the clusters are encoded in the vector. To initialize a vector, K medoids are randomly selected from the n data points. The fitness of a vector indicates the degree of goodness of the solution and is defined by \(J_{m}\). The objective is, therefore, to minimize the \(J_{m}\) index for achieving proper clustering. Subsequently, the medoids encoded in a vector are updated using Eqs. 3 and 5. The processes of mutation and crossover follow Eqs. 6 and 9, respectively.
$$ \vartheta_{k}(t+1) = \vartheta_{m}(t) + F(\vartheta_{r}(t) -\vartheta_{j}(t)) $$
(6)
Here \(\vartheta_{m}(t)\), \(\vartheta_{r}(t)\) and \(\vartheta_{j}(t)\) are vectors randomly taken from the current population (indicated by the time stamp t), each with d dimensions, used to form the mutant vector \(\vartheta_{k}(t+1)\). F is the scaling factor, usually \(\in\) [0,1]. If an index value of \(\vartheta_{k}(t+1)\) lies beyond the permissible range \(\left\{1,\ldots, n \right\}\), it is rescaled using one of the following two operations:
$$ \vartheta_{k}(t+1) - n $$
(7)
and
$$ \vartheta_{k}(t+1) + n $$
(8)
To increase the diversity of the perturbed parameter vectors, crossover is introduced.
$$ U_{jk}(t+1) = \left\{ \begin{array}{l} \vartheta_{jk}(t+1) \\ \quad{\rm if}\,{\rm rand}_{j}(0,1) \leq {\rm CR}\quad{\rm or}\quad j={\rm rand}(k)\\ \vartheta_{jk}(t)\\ \quad{\rm if}\,{\rm rand}_{j}(0,1) > {\rm CR}\quad{\rm and}\quad j \not= {\rm rand}(k)\\ \end{array}\right. $$
(9)
In Eq. (9), \({\rm rand}_{j}(0,1)\) is the jth evaluation of a uniform random number generator with outcome \(\in\) [0, 1]. CR is the crossover rate \(\in\) [0, 1], which has to be set by the user. rand(k) is a randomly chosen index \(\in\left\{1, 2,\ldots, d\right\},\) which ensures that \(U_{k}(t+1)\) gets at least one parameter from \(\vartheta_{k}(t+1)\). To build the population for the next generation, the trial vector \(U_{k}(t+1)\) is compared with the target vector \(\vartheta_{k}(t)\) using the greedy criterion: if \(U_{k}(t+1)\) yields a better fitness value than \(\vartheta_{k}(t)\), then \(U_{k}(t+1)\) replaces \(\vartheta_{k}(t)\) in the next generation; otherwise, the old vector \(\vartheta_{k}(t)\) is retained. The algorithm is terminated after a fixed number of generations. The algorithm is outlined in Fig. 1.
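The mutation, wrap-around, and crossover steps of Eqs. 6-9 can be sketched on index-encoded vectors as follows (a hedged sketch; rounding the mutant back to integer indices is an assumption not spelled out in the text):

```python
import numpy as np

def de_trial(pop, k, n, F=0.8, CR=0.5, rng=None):
    """Build the trial vector U_k(t+1) from the population via Eqs. 6-9.

    pop : P x d array of medoid indices in {1, ..., n} (index-encoded vectors)
    """
    rng = np.random.default_rng() if rng is None else rng
    P, d = pop.shape
    m_, r, j = rng.choice([i for i in range(P) if i != k], 3, replace=False)
    mutant = pop[m_] + F * (pop[r] - pop[j])           # Eq. 6
    mutant = np.rint(mutant).astype(int)               # keep indices integral (assumption)
    mutant = np.where(mutant > n, mutant - n, mutant)  # Eq. 7: wrap overflow
    mutant = np.where(mutant < 1, mutant + n, mutant)  # Eq. 8: wrap underflow
    jrand = rng.integers(d)                            # forced crossover position rand(k)
    cross = (rng.random(d) <= CR) | (np.arange(d) == jrand)
    return np.where(cross, mutant, pop[k])             # Eq. 9
```

The greedy selection step then keeps whichever of the trial and target vectors has the better \(J_{m}\) value.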
Genetic algorithm-based fuzzy c-medoids clustering
The GA-based fuzzy c-medoids (GAFCMdd) clustering algorithm (Maulik et al. 2010; Maulik and Saha 2009; Maulik and Bandyopadhyay 2000) uses the same encoding policy as DEFCMdd to represent the vectors. The fitness of each chromosome is computed using Eq. 1. Subsequently, the medoids encoded in a chromosome are also updated using Eqs. 3 and 5. Conventional proportional selection is applied to the population. Standard single-point crossover is applied stochastically with probability \(\mu_{c}\), and each chromosome undergoes mutation with a fixed probability \(\mu_{m}\). The termination condition is the same as for the other algorithms. The elitism model of GAs is used, where the best chromosome seen up to the current generation is stored in a location within the population. The best chromosome of the last generation provides the solution to the clustering problem. Figure 2 demonstrates the GAFCMdd algorithm.
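The single-point crossover and mutation operators used by GAFCMdd can be sketched as follows (a hedged sketch on index-encoded chromosomes; gene-wise replacement mutation is an assumption, as the text does not specify the mutation operator):

```python
import numpy as np

def crossover_mutate(parent1, parent2, n, mu_c=0.8, mu_m=0.01, rng=None):
    """Single-point crossover (prob. mu_c) and per-gene mutation (prob. mu_m)
    on index-encoded chromosomes with genes in {1, ..., n}."""
    rng = np.random.default_rng() if rng is None else rng
    c1, c2 = parent1.copy(), parent2.copy()
    if rng.random() < mu_c:
        cut = rng.integers(1, len(c1))                     # crossover point
        c1[cut:], c2[cut:] = parent2[cut:], parent1[cut:]  # swap tails
    for c in (c1, c2):
        mask = rng.random(len(c)) < mu_m
        c[mask] = rng.integers(1, n + 1, mask.sum())       # replace with a random index
    return c1, c2
```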
Algorithms for unknown number of clusters
Automatic differential evolution-based fuzzy clustering
Automatic differential evolution-based fuzzy clustering (Maulik and Saha 2010) has been developed on the framework of differential evolution (DE). The technique uses a mask, containing 0's and 1's, along with the initial population of DE. A value of 1 in a mask cell indicates that the medoid encoded at the corresponding position of the vector is valid; otherwise it is not. The fitness of each vector is computed using the XB index (Xie and Beni 1991). Let \(\left\{z_{1}, z_{2}, \ldots, z_{K} \right\}\) be the set of K cluster medoids encoded in a vector. The XB index is defined as a function of the ratio of the total variation σ to the minimum separation sep of the clusters. Here σ and sep can be written as
$$ \sigma(U,Z;X) = \sum_{k=1}^{K}\sum_{i=1}^{n} u_{k,i}^{2}\,D^{2}(z_{k},x_{i}), $$
(10)
and
$$ {\rm sep} (Z) = \min_{i \neq j} {\parallel z_{i} -z_{j} \parallel^{2}}, $$
(11)
where ∥·∥ is the Euclidean norm and \(D(z_{k}, x_{i})\), as mentioned earlier, is the distance between the pattern \(x_{i}\) and the cluster medoid \(z_{k}\). The XB index is then defined as
$$ XB (U,Z;X) = \frac{\sigma(U,Z;X)}{n \times {\rm sep} (Z)} $$
(12)
Note that when the partitioning is compact and good, the value of σ should be low while sep should be high, yielding a low value of the XB index. The objective is, therefore, to minimize the XB index for achieving proper clustering. The processes of mutation, crossover and selection are the same as in DE, and the algorithm terminates after a fixed number of generations.
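Eqs. 10-12 reduce to a few array operations (a minimal NumPy sketch; the function name is illustrative):

```python
import numpy as np

def xb_index(X, Z, U):
    """Xie-Beni index (Eq. 12): sigma(U,Z;X) / (n * sep(Z))."""
    n = X.shape[0]
    D2 = ((X[None, :, :] - Z[:, None, :]) ** 2).sum(axis=2)  # squared distances, K x n
    sigma = ((U ** 2) * D2).sum()                            # Eq. 10
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)  # squared medoid separations
    np.fill_diagonal(d2, np.inf)                             # ignore i == j
    sep = d2.min()                                           # Eq. 11
    return sigma / (n * sep)                                 # Eq. 12
```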
Variable length genetic algorithm-based fuzzy clustering
A variable string length GA (VGA)-based clustering technique has been developed by Maulik and Bandyopadhyay (2003) on the framework of genetic algorithm (GA), where real-valued encoding of cluster centres is used. In the context of this article, however, index-based encoding of cluster medoids is used instead. The algorithm automatically evolves the number of clusters as well as the partitioning, and minimizes the XB cluster validity index. Since the number of clusters is variable, the string lengths of different chromosomes in the same population are allowed to vary. The selection, crossover and mutation operations are performed in each generation. Elitism is also incorporated to keep track of the best chromosome obtained so far.
Cluster validity indices
Minkowski Score (Jardine and Sibson 1971) and Kappa Index (Cohen 1960), which are applicable when the true clustering is known, together with Silhouette Index (S(C)) (Rousseeuw 1987), are used for evaluating the performance of the clustering algorithms.
Minkowski Score
The performances of the clustering algorithms are evaluated in terms of the Minkowski Score (MS) (Jardine and Sibson 1971). This is a measure of the quality of a solution given the true clustering. Let T be the “true” solution and S the solution we wish to measure. Denote by \(n_{11}\) the number of pairs of elements that are in the same cluster in both S and T, by \(n_{01}\) the number of pairs that are in the same cluster only in S, and by \(n_{10}\) the number of pairs that are in the same cluster only in T. The Minkowski Score (MS) is then defined as:
$$ {\rm MS} = \sqrt{\frac{n_{01} + n_{10}}{n_{11} + n_{10}}} $$
(13)
For MS, the optimum score is 0, with lower scores being “better”.
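Counting the pair statistics directly gives Eq. 13 (a minimal sketch):

```python
from itertools import combinations

def minkowski_score(true_labels, pred_labels):
    """Minkowski Score (Eq. 13) from pairwise co-membership counts."""
    n11 = n01 = n10 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_t = true_labels[i] == true_labels[j]
        same_s = pred_labels[i] == pred_labels[j]
        if same_t and same_s:
            n11 += 1          # same cluster in both S and T
        elif same_s:
            n01 += 1          # same cluster only in S
        elif same_t:
            n10 += 1          # same cluster only in T
    return ((n01 + n10) / (n11 + n10)) ** 0.5
```

Note that the score is invariant to the actual cluster labels, since only pairwise co-membership is counted.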
Kappa index
The kappa index was developed by Cohen (1960) and is used by the medical community as a measure of classification accuracy. The score is derived from a contingency table called the confusion matrix, where each element \(C_{ij}\) is the number of records pertaining to cluster i that have been automatically classified into cluster j. The diagonal elements therefore correspond to the records that have been correctly classified. Overall accuracy (% correct) and the kappa index are computed by Eqs. 14 and 15, respectively.
$$ \% \ {\rm correct} = \frac{\sum_{k}C_{kk}}{n}\times 100 $$
(14)
$$ {\rm kappa} = \frac{n \sum_{k}C_{kk} - \sum_{k} C_{k+} C_{+k}}{n^{2} - \sum_{k} C_{k+} C_{+k}} $$
(15)
where \(C_{k+} = \sum_{j} C_{kj}\), \(C_{+k} = \sum_{i} C_{ik}\), and n is the number of data points. Kappa values range from 0 to 1; a higher value (close to 1) indicates better accuracy.
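Eqs. 14 and 15 reduce to a few array operations on the confusion matrix (a minimal NumPy sketch):

```python
import numpy as np

def percent_correct(C):
    """Overall accuracy from the confusion matrix (Eq. 14)."""
    C = np.asarray(C, dtype=float)
    return np.trace(C) / C.sum() * 100.0

def kappa_index(C):
    """Cohen's kappa from the confusion matrix (Eq. 15)."""
    C = np.asarray(C, dtype=float)
    n = C.sum()
    chance = (C.sum(axis=1) * C.sum(axis=0)).sum()  # sum_k C_{k+} C_{+k}
    return (n * np.trace(C) - chance) / (n ** 2 - chance)
```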
Silhouette index
Silhouette index (Rousseeuw 1987) reflects the compactness and separation of the clusters. Given a set of n samples \(S = \left\{s_{1}, s_{2},\ldots, s_{n} \right\}\) and a clustering of the samples \(C = \left\{C_{1}, C_{2}, \ldots,C_{K}\right\},\) the silhouette width \(S(s_{i})\) for each sample \(s_{i}\) belonging to cluster \(C_{j}\) denotes a confidence measure of its belongingness, and it is defined as follows:
$$ S(s_{i})=\frac{b(s_{i}) - a(s_{i})}{{\rm max}\left\{a(s_{i}), b(s_{i})\right\}} $$
(16)
Here \(a(s_{i})\) denotes the average distance of the sample \(s_{i}\) from the other samples of the cluster to which it is assigned, and \(b(s_{i})\) represents the minimum of the average distances of \(s_{i}\) from the samples of the clusters \(C_{l}\), \(l = 1, 2,\ldots, K\), \(l \neq j\). The value of \(S(s_{i})\) lies between −1 and 1. A large value of \(S(s_{i})\) (approaching 1) indicates that the sample \(s_{i}\) is well clustered. The overall silhouette index S(C) of a clustering C is defined as
$$ S(C)=\frac{1}{n}\sum_{i=1}^{n} S(s_{i}) $$
(17)
A greater value of S(C) (approaching 1) indicates that most of the samples are correctly clustered, which in turn reflects a better clustering solution.
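Eqs. 16 and 17 can be sketched as follows (a minimal NumPy sketch; it assumes every cluster has at least two samples, so that \(a(s_{i})\) is defined):

```python
import numpy as np

def silhouette(X, labels):
    """Overall silhouette index S(C) of a clustering (Eqs. 16-17)."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False
        a = D[i, own].mean()                          # a(s_i): within own cluster
        b = min(D[i, labels == c].mean()              # b(s_i): nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))            # Eq. 16
    return float(np.mean(scores))                     # Eq. 17
```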