Abstract
Clustering validation is one of the most important and challenging parts of clustering analysis, as there is no ground truth knowledge to compare the results with. To date, the evaluation methods for clustering algorithms have been used for determining the optimal number of clusters in the data, assessing the quality of clustering results through various validity criteria, comparing results with other clustering schemes, etc. In practice, it is also often important to build a model on a large amount of training data and then apply the model repeatedly to smaller amounts of new data. This amounts to assigning new data points to existing clusters constructed on the training set. However, very little practical guidance is available for measuring the strength of the constructed model in predicting cluster labels for new samples. In this study, we propose an extension of the cross-validation procedure to evaluate the quality of a clustering model in predicting cluster membership for new data points. The performance score is measured in terms of the root mean squared error, based on the information from multiple labels of the training and testing samples. Principal component analysis (PCA) followed by the k-means clustering algorithm was used to evaluate the proposed method. The clustering model was tested on three benchmark multi-label datasets and showed promising results, with an overall RMSE of less than 0.075 and a MAPE of less than 12.5% on all three datasets.
Introduction
Overview of Unsupervised Learning
Unsupervised learning aims to find the underlying structure or the distribution of data. It is an important area in the domain of machine learning, where labels for the data examples are not necessarily required for model building. The main tasks in unsupervised learning include cluster analysis [40, 42], building self-organizing maps (SOM) [21], representation learning [2], and density estimation [31]. Cluster analysis, the main focus of this study, is a central task for grouping heterogeneous data points into a number of more homogeneous subgroups based on distance, or on naturally occurring trends, patterns, and relationships in the data. The formation of a homogeneous or heterogeneous grouping (or clustering) structure from a complex dataset requires a measure of ‘closeness’ or ‘similarity’. In clustering, the definition of similarity is highly dependent on the distance function applied to the data objects. The choice of similarity measure can be based on the type of the variables used to cluster objects (continuous, discrete, binary), the type of measurements (nominal, ordinal, ratio, interval), and subject matter knowledge. The most commonly used distance measure in clustering algorithms is the Euclidean distance [9]. Other measures include Minkowski’s distance [6], cosine distance [29], S-distance [5], etc.
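To make the role of the distance function concrete, the three most common measures named above can be sketched as follows (a minimal illustration in NumPy; the function names are our own):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance; the default choice in most clustering algorithms
    return np.sqrt(np.sum((a - b) ** 2))

def minkowski(a, b, p=3):
    # Generalizes Euclidean (p = 2) and Manhattan (p = 1) distances
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def cosine_distance(a, b):
    # 1 - cosine similarity: depends on the angle between vectors, not length
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(euclidean(a, b))         # sqrt(2) ≈ 1.414
print(minkowski(a, b, p=2))    # equals the Euclidean distance when p = 2
print(cosine_distance(a, b))   # 1.0: the vectors are orthogonal
```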
The clustering problem has a clear goal of finding distinct groups or ‘clusters’ within the dataset. However, the notion of a ‘cluster’ has never been precisely defined, which has contributed to the rise of different clustering algorithms [11]. The existence of different types of clustering algorithms makes it difficult to select the best algorithm for a particular task. Independent of the type of algorithm used, Kleinberg [17] proposes three properties that an ideal clustering algorithm should have to be considered good: scale invariance, consistency and richness. Scale invariance means that the clustering algorithm does not change its results when all distances between points are scaled by a constant factor. A clustering process is considered consistent when the clustering results do not change if the distances within clusters decrease and/or the distances between clusters increase. The richness criterion means that the clustering function must be flexible enough to potentially produce any arbitrary partition of the input dataset. According to Kleinberg’s impossibility theorem [17], no clustering algorithm can satisfy all three requirements simultaneously. This helps explain why it has been so difficult to develop a unified framework for the validation of clustering methods and to reason about it at a technical level.
Multi-Label Data
Much research in machine learning deals with the analysis of single-label data, where training instances are associated with a single label λ from a set of disjoint labels L. However, training samples in several application domains are often associated with a set of labels Y ⊆ L. Such datasets are called multi-label data. Multi-label datasets are popular in various domains, such as protein function classification, medical diagnosis, emotion recognition, text classification, etc. For instance, a medical patient may be affected by more than one chronic disease: diabetes, hypertension, and fatty liver. We can cluster the patients into distinct groups, each with specific characteristics, and then the burden of these unwanted outcomes (diabetes, hypertension, fatty liver, etc.) can be identified to provide tailored interventions in each cluster. One common approach to supervised learning on multi-label data is to decompose the multi-label problem into binary classification problems [34, 35]. In unsupervised learning, we can instead use the label information of the multi-label data to evaluate the clustering algorithm. In this study, we used the features for forming clusters and the class labels for performance evaluation.
Cluster Validation
Cluster validation is one of the most important and challenging parts of cluster analysis, which involves the objective and quantitative assessment of clustering results [42]. One of the problems in cluster validation is that there is no clear notion of what exactly the ‘prediction error’ is. Because of that, clusters are sometimes validated by ad hoc methods based on the application area. Due to the absence of ground truth and the nature of the problem, cluster validation has not been well developed [33]. As a result, evaluating the performance of a clustering algorithm is not an easy task. Commonly, the evaluation process depends on the algorithm used to obtain the clustering results, which has led to the development of multiple evaluation techniques. Various methods have been suggested in the literature for cluster validation, including external validation, internal validation, relative criteria and stability-based approaches.
External Clustering Validity Methods An external validation index uses prior knowledge, such as externally provided class labels, to evaluate the results of cluster analysis. External clustering validity approaches, such as the Rand index [26] and normalized mutual information [38], measure the quality of clustering results by comparing the generated cluster labels with a pre-existing clustering (reference labels) structure, i.e. a ground truth solution. If the result is in some way similar to the reference, the final output is regarded as a “good” clustering. External validation is straightforward when the closeness between two clusterings is well defined. However, it has a basic caveat: the reference result is not given in most real-world applications. Therefore, external evaluation is generally used for synthetic data and for tuning clustering algorithms [27].
Internal Cluster Validity Methods These are used to assess the goodness of a clustering structure without reference to external information, using only the data themselves. Internal clustering validity methods measure the quality of a clustering based solely on information intrinsic to the data; as a result, they have great practical value, and numerous criteria have been proposed in the literature, such as silhouette analysis [28], the Calinski–Harabasz index [3] and the Davies–Bouldin index [7]. The internal criteria are the most commonly used evaluation methods and are typically designed to relate within-cluster scattering (compactness) to between-cluster separation. Measures grouped under this category have been designed for the validation of convex-shaped clusters (such as globular clusters) and fail when applied to validate non-convex clusters [22].
The Relative Approach is performed by comparing two sets of clusters (usually built with similar algorithms but with different parameter settings) to determine which one is better. It is generally used for determining the optimal number of clusters.
Clustering Stability Approach The clustering stability measure is a slightly different approach used to assess the similarity of clustering solutions obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. The intuitive idea behind the stability approach is that if we repeatedly sample data points from the population and apply the candidate clustering algorithm, then a good algorithm should produce clusterings that do not vary much from one sample to another [25]. In other words, the algorithm is stable with respect to input randomization. There are several studies that validate clusters by stability criteria [1, 36, 39]. In general, the existing validation criteria are useful for tasks such as determining the correct number of clusters in the dataset, verifying whether the clusters obtained are meaningful or are just an artefact produced by the algorithms, justifying why we choose some algorithms instead of others, or assessing the quality of clustering solutions. However, the literature still lacks methods to measure the ability of a clustering algorithm to predict cluster memberships for new data points.
The Focus of This Paper
The primary aim of this paper is to measure the performance of a clustering model in predicting cluster labels for new data points, given that the model has already been constructed from the training data. For example, suppose we have three existing clusters, C1, C2, and C3, and a new data point D1. The clustering model should assign D1 to one of the clusters, say C2. In this case, we want to know ‘how good is the model on new data?’, i.e., to what extent the model has correctly assigned D1 to C2.
The cluster validation idea presented in this study differs from the existing methods in that it focuses on measuring the prediction strength of a clustering algorithm using the cross-validation procedure. The k-fold cross-validation method is used to simulate the situation where we have built a clustering model on previously available data and then want to assign new data points to the previously built clusters. The prediction strength concept presented here can, similarly to cluster stability, be used for assessing the performance of a clustering method. Clustering stability results are mostly obtained based on perturbations introduced to the input data, such as subsampling or the addition of noise. Unlike in other studies, the prediction strength of an algorithm introduced here is measured by incorporating information from several labels of multi-label data. Namely, the probability of occurrence of the labels in the training and testing data is calculated for each cluster. If the label probabilities in the training and testing data are similar, the clustering can be considered a good one. Thus, this study assumes that the clusters are already formed from the training data, and the aim is to measure how well the clustering model predicts the corresponding cluster labels for the test data based on their membership in the clusters obtained from the training data.
This approach is motivated by medical applications in which we would like to assess the probability of various health problems in different patient groups. For example, the labels for the chronic disease dataset are diabetes, hypertension, and fatty liver, as indicated in Sect. “Cluster Validation”. Once the clusters are formed, the probabilities of the occurrence of these labels, i.e. diabetes, hypertension, and fatty liver, are estimated in each cluster and compared between the training set and the test set. The aim is to measure how well we can predict the probabilities of these three outcomes in new patients (i.e. in the test data) based on their membership in the training clusters. In this paper, the k-fold cross-validation procedure is used to simulate such a scenario.
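As a concrete illustration of this comparison, the per-cluster label probabilities can be estimated as simple frequencies. The following sketch assumes the labels are given as a binary indicator matrix; the function and variable names are our own, not from the paper:

```python
import numpy as np

def label_probabilities(cluster_ids, Y, n_clusters):
    # Estimate P(label_m | cluster_i) as the fraction of samples assigned
    # to cluster i that carry label m. Y is a binary (n_samples, q)
    # multi-label indicator matrix with q labels.
    probs = np.zeros((n_clusters, Y.shape[1]))
    for i in range(n_clusters):
        members = Y[cluster_ids == i]
        if len(members) > 0:
            probs[i] = members.mean(axis=0)
    return probs

# Toy example: 4 patients, 2 clusters, 3 labels
# (e.g. hypertension, diabetes, fatty liver)
cluster_ids = np.array([0, 0, 1, 1])
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
print(label_probabilities(cluster_ids, Y, 2))
# cluster 0 -> [1.0, 0.5, 0.5], cluster 1 -> [0.0, 0.5, 1.0]
```

The same estimation is performed twice per cluster, once on the training members and once on the test points assigned to the cluster, and the two probability vectors are then compared.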
k-fold cross-validation (CV) is one of the most commonly used model evaluation procedures in supervised learning. Unfortunately, it is challenging to apply CV to unsupervised learning, for example to clustering validation. In this study, the k-fold CV procedure is adapted, using labels from a multi-label dataset, to be applicable to unsupervised learning (i.e. clustering) for evaluating the performance of clustering algorithms. Following the k-fold cross-validation approach, the input data is randomly divided into k parts, of which k − 1 parts are used to construct the model and the remaining part is used as an evaluation set. Then, the prediction strength is used as a statistic for clustering stability. Thus, we propose the use of the k-fold cross-validation procedure for evaluating the prediction strength of a clustering model using the information acquired from multiple labels.
The contributions of this study are: (1) a new cluster validity index is proposed that uses the information from multiple labels to evaluate the quality of clustering algorithms; (2) the study validates the proposal through cross-validation analysis of several challenging multi-label datasets; (3) the root mean squared error (RMSE), the most frequently used measure of the differences between values in regression problems, is adapted for use as a cluster validity index; (4) the study shows that the proposed method can be used to measure the ability of a clustering algorithm to predict cluster membership for new data.
Proposed Method
Given a particular clustering result, one can predict cluster membership for new data based on a clustering model built on training data. This is not equally easy for all types of clustering algorithms. For example, it is hard for density-based clustering algorithms (e.g. DBSCAN) to predict a cluster for new data points, because the new data points may change the underlying clustering structure. For centroid-based clustering algorithms (e.g. k-means clustering), however, predicting a cluster for new data points is relatively easy, since it only requires finding the minimum distance of a new data point from all cluster centres and then updating the centre of that cluster. Hence, k-means clustering is employed to test the proposed method in this paper. Recently, several techniques have been proposed to improve the standard k-means algorithm for high-dimensional datasets, such as entropy regularized power k-means [4], sparse k-means [41] and others [24]. The proposed k-fold CV for unsupervised learning can also be applied to these modified versions of the k-means algorithm.
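For centroid-based algorithms, the assignment step described above reduces to a nearest-centre lookup, which can be sketched as follows (a minimal illustration; the names are our own):

```python
import numpy as np

def assign_to_clusters(centres, new_points):
    # Compute the Euclidean distance from every new point to every
    # cluster centre, then pick the index of the nearest centre.
    d = np.linalg.norm(new_points[:, None, :] - centres[None, :, :], axis=2)
    return d.argmin(axis=1)

centres = np.array([[0.0, 0.0], [10.0, 10.0]])
new_points = np.array([[1.0, 1.0], [9.0, 11.0]])
print(assign_to_clusters(centres, new_points))  # [0 1]
```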
Assigning new data points to existing clusters constructed from the training data is an important practical application. However, very little practical guidance is available for measuring the strength of the constructed model in predicting the cluster membership of a new data point. Prediction strength is a global measure forcing all clusters to be stable, as it uses the minimum value of cluster similarity over all clusters [14]. In this paper, we propose a k-fold cross-validation procedure followed by the root mean squared error (RMSE) or the mean absolute percentage error (MAPE) to evaluate the prediction strength of a clustering algorithm. RMSE and MAPE are among the most commonly used error measurements in statistics. In prediction tasks, RMSE indicates the absolute fit of the model to the data, i.e., it measures how close the observed data points are to the model’s predicted values. MAPE is the average magnitude of the difference between predicted and actual values in percentage terms, without considering their direction; since absolute percentage errors are used, positive and negative errors do not cancel each other out. In clustering validation, these two metrics can be used to measure the average distance between the data points and their cluster centres [12, 13, 30]. The smaller the RMSE/MAPE, the better the prediction results.
At each iteration of the k-fold CV procedure, one fold is used as the test set and the remaining folds as the training set. The training set is presented to a clustering method, giving a partition as a result (the training partition). Then, new data points are assigned to the clusters in the training partition based on the minimum distance from all the cluster centres. The CV method allows calculating a quality measure expressing the difference between the probability of occurrence of the outcomes (i.e. labels) in the training data and in the test data assigned to the same cluster. Once the clusters are formed using the training part of the data, the probability of occurrence of the labels in the training set and in the testing set is assessed and analyzed in each cluster. This amounts to estimating the probability that an outcome will occur given that a sample belongs to a certain cluster, written mathematically as P(outcome | cluster). For instance, in the chronic disease dataset, one can estimate the probability of the risk of having hypertension in each of the generated clusters. Below, we describe the k-fold cross-validation procedure used to calculate a quality measure for a clustering model.
Let:
L = {λ_{i}: i = 1,…, q}: the set of all labels in a multi-label dataset.
\(q=|L|\): the number of labels in the multi-label dataset.
k: the number of folds in the crossvalidation procedure,
C: the number of clusters generated by the clustering algorithm.
Because we calculate the label probabilities separately for each cluster i in each of the cross-validation folds j, we denote these probabilities without the cluster or fold index, in order not to clutter the equations:
\({y}_{m}\), m = 1,…, q: the probability that a sample from the training dataset assigned to cluster i has the mth label.
\({\widehat{y}}_{m}\), m = 1,…, q: the probability that a sample from the testing dataset assigned to cluster i has the mth label.

1. Shuffle the original dataset randomly.

2. Split the original dataset into k parts (folds); k = 10 for ten-fold cross-validation.

3. For each fold j = 1,…, k:

a) Take fold j as the test dataset (each fold, in turn, is used as the test dataset).

b) Take the remaining folds together as the training dataset.

c) Apply dimensionality reduction (if needed).

d) Apply normalization to the dataset (if needed).

e) Generate clusters on the training dataset.

f) Assign data points from the test dataset (selected in step ‘a’) to the corresponding clusters obtained in step ‘e’.

g) For each cluster i = 1, …, C found in step ‘e’:

a. Compute the probabilities \({y}_{m}\), m = 1,…, q, of the occurrence of the labels in cluster i based on the samples in the training dataset.

b. Compute the probabilities \({\widehat{y}}_{m}\), m = 1,…, q, of the occurrence of the labels in cluster i using the assignment of the points from the test dataset to the clusters, which was obtained in step ‘f’.

c. Compute the root mean squared error (RMSE_{ij}) between the probabilities calculated in steps ‘a.’ and ‘b.’. Record the score/error as a quality measure for cluster i obtained in fold j.

4. When the loop in step 3 finishes (so that every fold has served as the test set), take the average of the recorded scores over the k folds for each cluster and/or over all the clusters (Eq. (3)).
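The steps above can be sketched in Python, using PCA for step ‘c’ and k-means for step ‘e’ as in the experiments reported later. This is a minimal sketch, not the authors’ implementation; all function and variable names are our own:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

def label_freq(Y_part, mask):
    # P(label | cluster): fraction of the cluster's samples carrying each label
    q = Y_part.shape[1]
    return Y_part[mask].mean(axis=0) if mask.any() else np.zeros(q)

def cv_cluster_rmse(X, Y, n_clusters=3, n_folds=10, seed=0):
    """X: feature matrix; Y: binary (n_samples, q) multi-label indicator
    matrix used only for validation. Returns an (n_folds, n_clusters)
    array of the per-cluster, per-fold RMSE scores."""
    scores = np.zeros((n_folds, n_clusters))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)  # steps 1-2
    for j, (tr, te) in enumerate(kf.split(X)):                     # step 3
        pca = PCA(n_components=2).fit(X[tr])                       # step c
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit(pca.transform(X[tr]))   # step e
        test_assign = km.predict(pca.transform(X[te]))             # step f
        for i in range(n_clusters):                                # step g
            y = label_freq(Y[tr], km.labels_ == i)                 # step g.a
            y_hat = label_freq(Y[te], test_assign == i)            # step g.b
            scores[j, i] = np.sqrt(np.mean((y - y_hat) ** 2))      # step g.c
    return scores
```

With well-separated clusters and labels that follow the cluster structure, the per-fold scores should be close to zero; averaging them as in step 4 yields the overall index.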
In the context of this study, RMSE and MAPE are proposed to measure the prediction strength of clustering techniques. RMSE represents the standard deviation of the difference between the probabilities of occurrence of the labels in the training data and in the test data within clusters. Intuitively, the RMSE in this study can be understood as the Euclidean distance between the vector of observed label probability scores in the training data and the vector of estimated label probability scores in the test data for a given cluster, averaged over the total number of labels in the data (Eq. 1). Similarly, MAPE measures the size of the error between the probability scores of the training set and those of the test set in percentage terms (Eq. 2). RMSE and MAPE are evaluation measures that can be used together to diagnose the variation in the errors of a clustering algorithm. For cluster i and cross-validation fold j, these two measures are calculated as follows:
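Following this textual description of Eqs. (1) and (2), the two measures for a single cluster can be sketched as below; the small `eps` term is our own safeguard against labels that never occur in the training part of a cluster:

```python
import numpy as np

def rmse(y, y_hat):
    # Per-cluster RMSE between training (y) and test (y_hat) label
    # probability vectors of length q, as described for Eq. (1).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat, eps=1e-12):
    # Percentage counterpart, as described for Eq. (2); eps avoids
    # division by zero (an assumption of this sketch, not the paper).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / (y + eps)))

y, y_hat = [0.5, 0.2, 0.8], [0.4, 0.25, 0.8]
print(round(rmse(y, y_hat), 4))  # 0.0645
print(round(mape(y, y_hat), 1))  # 15.0
```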
The resulting score obtained through RMSE with k-fold cross-validation across all clusters, based on the probability score information from multiple labels, named CVIM for short, can be used as a cluster validity index (i.e. a stability index). The better the value of the cluster validity index, the more stable the outputs of the clustering algorithm. High cluster stability is achieved when the memberships of the clusters are not affected by small changes in the dataset. The RMSE of the clustering algorithm obtained using k-fold cross-validation can be computed as shown in Eq. (3): let \({\text{RMSE}}_{ij}\) be the RMSE for the ith cluster obtained in the jth fold (Eq. (1)). The average RMSE for the ith cluster over the k folds, with C clusters in each fold, denoted by \({\text{ARMSE}}_{i}\), can be computed as:
Finally, the RMSE-based cluster validity index across all clusters is found using Eq. (4). MAPE is calculated in a similar fashion to the RMSE. The architecture of the proposed method for calculating RMSE and MAPE for each cluster in ten folds of cross-validation is presented in Fig. 1 for an algorithm generating C = 3 clusters. In the final stage, the RMSE/MAPE of each cluster is averaged over the ten corresponding clusters, one from each fold of cross-validation.
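In code, the averaging described by Eqs. (3) and (4) amounts to a mean over the k folds for each cluster, followed by a mean over the C clusters. A minimal sketch with toy RMSE values (the variable names are our own):

```python
import numpy as np

# scores[j, i] = RMSE for cluster i in fold j; toy values for
# k = 2 folds and C = 3 clusters (real runs use k = 10).
scores = np.array([[0.05, 0.07, 0.06],
                   [0.04, 0.08, 0.05]])

armse = scores.mean(axis=0)  # ARMSE_i: average over the folds (Eq. (3))
cvim = armse.mean()          # CVIM: overall index across clusters (Eq. (4))
print(armse)                 # [0.045 0.075 0.055]
print(round(cvim, 4))        # 0.0583
```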
Experiments
In this paper, three public multi-label datasets were used to test the proposed method: the chronic disease dataset [43], and the emotions [37] and Yeast [10] datasets. The chronic disease dataset contains a collection of physical examination records for 110,300 patients, with 62 features and 3 class labels. All the input features were used for forming clusters. The class labels (non-clustering variables), which include hypertension, diabetes and fatty liver, were not used for defining clusters but only for cluster validation. Each record in the data may be associated with more than one of the class labels. As a result, the probability of occurrence of hypertension, diabetes or fatty liver in the patients of the test data can be estimated in the corresponding clusters. The chronic disease dataset is available online at https://pinfish.cs.usm.edu/dnn/. The Yeast dataset is formed by microarray expression data and phylogenetic profiles for 2417 genes. The dataset consists of 103 features and 14 labels, and each gene is associated with a set of functional labels. The emotions dataset contains examples of songs labelled according to people’s emotions. The emotions and Yeast datasets were taken from the Mulan library at https://mulan.sourceforge.net/datasetsmlc.html.
Multi-label datasets, and current data in general, tend to be more complex than conventional data and need dimensionality reduction. All three multi-label datasets used in this experiment have a large number of features and labels/outcomes. Taking this problem into account, we applied a dimensionality reduction process to convert each dataset into a two-dimensional space. The purpose of reducing data to a lower-dimensional representation is to visualize and interpret the samples, so that such a visualization can be used to obtain insights from the data, e.g. to detect clusters and identify outliers. Moreover, a clustering process requires data reduction to obtain efficient processing time and to avoid the curse of dimensionality. For example, the k-means clustering algorithm often does not work well for high-dimensional data [23]. Different techniques have been proposed in the literature for handling high-dimensional features in clustering [16, 19]. In this study, principal component analysis (PCA) [32], one of the most commonly used techniques, was applied for dimensionality reduction to convert each dataset into a two-dimensional representation. The emotions and Yeast datasets have large variations within the range of feature values, which can affect the quality of the computed clusters. Therefore, after PCA, we applied a normalization technique [8] to the emotions and Yeast datasets to ensure that good-quality clusters are generated. Then, k-means clustering [15] was applied to the reduced datasets. All the experiments were implemented in the Python programming language.
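The resulting pipeline can be sketched as below. We substitute a synthetic dataset for the paper’s datasets and min-max scaling for its normalization technique [8], so those two specifics are assumptions for illustration only:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a high-dimensional dataset
X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)

X2d = PCA(n_components=2).fit_transform(X)  # reduce to two dimensions
X2d = MinMaxScaler().fit_transform(X2d)     # normalize (emotions/Yeast only)
clusters = KMeans(n_clusters=3, n_init=10,
                  random_state=0).fit_predict(X2d)  # cluster the 2-D data
```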
Results and Discussions
With the help of the Calinski–Harabasz index, three clusters for the emotions dataset, four clusters for the chronic disease dataset and five clusters for the Yeast dataset were identified using the k-means clustering algorithm. A two-dimensional (2D) representation of the clustering results for each dataset is shown in Fig. 2. The colours of the points represent the cluster memberships of the samples. For each dataset, the probabilities of the occurrence of each target variable in each cluster were calculated in both the training and testing parts of the data during the cross-validation procedure. We first evaluated the quality of the clusters using existing internal validity criteria. Silhouette analysis is one of the most popular and effective internal measures; it evaluates the appropriateness of the assignment of a data object to a cluster by measuring both intra-cluster cohesion and inter-cluster separation. Silhouette scores within the ranges of 51 to 70% and 71 to 100%, respectively, indicate that a reasonable and a strong intra-cluster cohesion and inter-cluster separation have been found [20]. The silhouette score can take values in the interval [− 1, 1]. Negative silhouette values represent wrong data placements, while positive silhouette values represent better data assignments. Therefore, we want the scores to be as high as possible, close to 1, to have good clusters. In our experiments, the silhouette score has shown good results: the silhouette scores for the clusters found on the emotions, chronic disease and Yeast datasets were 0.76, 0.82 and 0.69, respectively, indicating that the obtained clusterings were good ones.
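Both internal measures used in this section are available in scikit-learn. A short sketch on synthetic data (the scores quoted above come from the paper’s datasets, not from this example):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Three well-separated synthetic clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)        # in [-1, 1]; higher is better
ch = calinski_harabasz_score(X, labels)  # used above to pick cluster counts
print(round(sil, 2), round(ch))
```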
As the main objective of this study is to evaluate the prediction performance of the clustering algorithm through a ten-fold cross-validation procedure, the prediction performance in terms of RMSE and MAPE is presented for each cluster and across all clusters (i.e. the CVIM value), as shown in Table 1. The results represent the strength of the clustering algorithm in predicting cluster labels for the test data. The RMSE and MAPE scores obtained for each cluster of each dataset represent the prediction errors.
Figures 3 and 4 show the RMSE and MAPE of the k-means clustering algorithm applied to each dataset, respectively. The smallest per-cluster RMSE is found on the chronic disease dataset, while the highest is found on the emotions dataset. The same holds for the total RMSE across all the clusters (i.e. the CVIM score) on each dataset. Generally, an RMSE close to zero indicates high similarity between the training and testing probabilities. Similarly, low MAPE values indicate good predictions of the occurrence of labels in each cluster across all datasets. The smaller the MAPE, the better the forecast; more specifically, Lewis’s [18] interpretation of MAPE is that a value of less than 10% indicates a highly accurate forecast, 11 to 20% a good forecast, 21 to 50% a reasonable forecast, and 51% or more an inaccurate forecast. Accordingly, a highly accurate forecast is found on the chronic disease dataset, and the results on the emotions and Yeast datasets show a good prediction.
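Lewis’s scale can be encoded directly. The quoted bands leave the 10–11% and similar edges unspecified, so this sketch treats them as contiguous thresholds, which is a small simplification:

```python
def lewis_interpretation(mape_pct):
    # Lewis's [18] interpretation of a MAPE value given in percent.
    # Band edges between the quoted ranges are an assumption here.
    if mape_pct < 10:
        return "highly accurate forecast"
    elif mape_pct <= 20:
        return "good forecast"
    elif mape_pct <= 50:
        return "reasonable forecast"
    return "inaccurate forecast"

print(lewis_interpretation(7.5))   # "highly accurate forecast"
print(lewis_interpretation(12.5))  # "good forecast"
```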
Conclusions
Evaluating the quality of clustering algorithms is an important and challenging part of the clustering task. In this study, the k-fold cross-validation procedure was adapted to the task of evaluating the quality of clustering algorithms, that is, measuring the ability of these algorithms to predict cluster membership for new data. A new clustering validity index was proposed to measure the effectiveness of a clustering algorithm through the use of root mean squared error (RMSE) and mean absolute percentage error (MAPE) values. The index was developed using the probability information obtained from the several labels of multi-label data. This measure is useful for evaluating clusterings that are used for estimating the probability of the occurrence of the labels. For example, patients can be grouped into several clusters, and the occurrence of diseases can be studied separately in each group. The results presented in the paper show that the proposed method works well for evaluating the quality of clusters obtained using the k-means algorithm. Combining the proposed method with other clustering algorithms, for example density-based ones, requires solving additional problems, such as finding an effective way of assigning new data points to previously discovered clusters. Therefore, combining the proposed method with such clustering algorithms is left as future work.
References
 1.
Ben-David S, Von Luxburg U. Relating clustering stability to properties of cluster boundaries. In: 21st Annual Conference on Learning Theory, COLT 2008. 2008.
 2.
Bengio Y, et al. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798–828. https://doi.org/10.1109/TPAMI.2013.50.
 3.
Caliñski T, Harabasz J. A dendrite method for cluster analysis. Commun Stat. 1974. https://doi.org/10.1080/03610927408827101.
 4.
Chakraborty S, et al. Entropy regularized power k-means clustering. In: 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Palermo, Italy; 2020. http://arxiv.org/abs/2001.03452.
 5.
Chakraborty S, Das S. k-Means clustering with a new divergence-based distance metric: convergence and performance analysis. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.09.025.
 6.
Cordeiro De Amorim R, Mirkin B. Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recogn. 2012;45:1061. https://doi.org/10.1016/j.patcog.2011.08.012.
 7.
Davies DL, Bouldin DW. A cluster separation measure. IEEE Trans Pattern Anal Mach Intell. 1979. https://doi.org/10.1109/TPAMI.1979.4766909.
 8.
Do JH, Choi DK. Normalization of microarray data: single-labeled and dual-labeled arrays. Mol Cells. 2006;22(3):254–61.
 9.
Dokmanic I, et al. Euclidean distance matrices: essential theory, algorithms, and applications. IEEE Signal Process Mag. 2015. https://doi.org/10.1109/MSP.2015.2398954.
 10.
Elisseeff A, Weston J. A kernel method for multi-labelled classification. In: Advances in neural information processing systems. Cambridge: The MIT Press; 2002. https://doi.org/10.7551/mitpress/1120.003.0092.
 11.
EstivillCastro V. Why so many clustering algorithms. ACM SIGKDD Explor Newsl. 2002. https://doi.org/10.1145/568574.568575.
 12.
Goran Petrović ŽĆ. Comparison of clustering methods for failure data analysis: a real life application. In: Proceedings of the XV international scientific conference on industrial systems (IS’11). pp. 297–300; 2011.
 13.
Hassani M, Seidl T. Using internal evaluation measures to validate the quality of diverse stream clustering algorithms. Vietnam J Comput Sci. 2017. https://doi.org/10.1007/s4059501600869.
 14.
Hennig C, et al. Handbook of cluster analysis. 2015. https://doi.org/10.1201/b19706.
 15.
Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett. 2010;31(8):651–66. https://doi.org/10.1016/j.patrec.2009.09.011.
 16.
Jin J, Wang W. Influential features PCA for high dimensional clustering. Ann Stat. 2016. https://doi.org/10.1214/15AOS1423.
 17.
Kleinberg J. An impossibility theorem for clustering. In: Advances in neural information processing systems (NIPS).pp. 446–453. MIT Press, Cambridge;2002.
 18.
Lewis CD. Industrial and business forecasting methods: a practical guide to exponential smoothing and curve fitting. Oxford: Butterworth Scientific; 1982. https://doi.org/10.1002/for.3980010202.
 19.
Li W, et al. Application of tSNE to human genetic data. J Bioinf Comput Biol. 2017;15(04):1750017. https://doi.org/10.1142/S0219720017500172.
 20.
Lv Y, et al. An efficient and scalable densitybased clustering algorithm for datasets with complex structures. Neurocomputing. 2016. https://doi.org/10.1016/j.neucom.2015.05.109.
 21.
Miljkovic D. Brief review of selforganizing maps. In: 2017 40th International convention on information and communication technology, electronics and microelectronics, MIPRO 2017—Proceedings; 2017. https://doi.org/10.23919/MIPRO.2017.7973581.
 22.
Moulavi D et al. Densitybased clustering validation. In: Proceedings of the 2014 SIAM international conference on data mining. pp. 839–847 Society for Industrial and Applied Mathematics, Philadelphia, PA; 2014. https://doi.org/10.1137/1.9781611973440.96.
 23.
Napoleon D, Pavalakodi S. A new method for dimensionality reduction using K means clustering algorithm for high dimensional data set. Int J Comput Appl. 2011;13(7):41–6. https://doi.org/10.5120/17892471.
 24.
Olukanmi P, et al. Rethinking kmeans clustering in the age of massive datasets: a constanttime approach. Neural Comput Appl. 2019. https://doi.org/10.1007/s00521019046730.
 25.
Rakhlin A, Caponnetto A. Stability of Kmeans clustering. In: Advances in neural information processing systems; 2007. https://doi.org/10.1007/9783540729273_4.
 26.
Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971. https://doi.org/10.1080/01621459.1971.10482356.
 27.
Rendón E, et al. Internal versus external cluster validation indexes. Int J Comput Commun. 2011;5(1):27–34.
 28.
Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–655. https://doi.org/10.1016/03770427(87)901257.
 29.
Sahu L, Mohan BR. An improved Kmeans algorithm using modified cosine distance measure for document clustering using Mahout with Hadoop. In: 9th International conference on industrial and information systems, ICIIS 2014; 2015. https://doi.org/10.1109/ICIINFS.2014.7036661.
 30.
Sidhu RS, et al. A subtractive clustering based approach for early prediction of fault proneness in software modules. World Acad Sci. Eng Technol. 2010;. https://doi.org/10.5281/zenodo.1331265.
 31.
Silverman BW. Density estimation: for statistics and data analysis. 2018. https://doi.org/10.1201/9781315140919.
 32.
Syms C. Principal components analysis. In: Encyclopedia of ecology. Amsterdam: Elsevier; 2018. https://doi.org/10.1016/B9780124095489.111522.
 33.
Tan PN et al. Chap 8: Cluster analysis: basic concepts and algorithms. Introduction to data mining. 2005. https://doi.org/10.1016/00224405(81)900078.
 34.
Tarekegn A, et al. Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches. JMIR medical informatics. 2020;8:e16678. http://www.ncbi.nlm.nih.gov/pubmed/32442149.
 35.
Tarekegn A et al. Detection of frailty using genetic programming. Presented at the (2020). https://doi.org/10.1007/9783030440947_15.
 36.
Tibshirani R, Walther G. Cluster validation by prediction strength. J Comput Graph Stat. 2005. https://doi.org/10.1198/106186005X59243.
 37.
Trohidis K et al. Multilabel classification of music into emotions. In: ISMIR 2008—9th international conference on music information retrieval. 2008.
 38.
Vinh NX et al. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res. 2010;11(95):2837−2854.
 39.
Wang J. Consistent selection of the number of clusters via crossvalidation. Biometrika. 2010. https://doi.org/10.1093/biomet/asq061.
 40.
Wilks DS. Cluster analysis. Int Geophys. 2011;100:603–616. https://doi.org/10.1016/B9780123850225.000154.
 41.
Witten DM, Tibshirani R. A framework for feature selection in clustering. J Am Stat Assoc. 2010. https://doi.org/10.1198/jasa.2010.tm09415.
 42.
Xu R, WunschII D. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645–78. https://doi.org/10.1109/TNN.2005.845141.
 43.
Zhang X, et al. A novel deep neural network model for multilabel chronic disease prediction. Front Genet. 2019. https://doi.org/10.3389/fgene.2019.00351.
Acknowledgements
The author would like to thank the reviewers of this paper for their supportive comments.
Funding
No funding was received for this study.
Ethics declarations
Conflict of interest
The author declares no competing interests.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Tarekegn, A.N., Michalak, K. & Giacobini, M. Cross-Validation Approach to Evaluate Clustering Algorithms: An Experimental Study Using Multi-Label Datasets. SN COMPUT. SCI. 1, 263 (2020). https://doi.org/10.1007/s42979-020-00283-z
Keywords
 Clustering validation
 Clustering analysis
 Cross-validation
 Multi-label data