Hybrid Cluster Validation Techniques

  • Satish Gajawada
  • Durga Toshniwal
Part of the Advances in Intelligent Systems and Computing book series (volume 167)


Clustering methods divide a dataset into groups of similar objects called clusters. Objects in the same cluster are similar, and objects in different clusters are dissimilar. Evaluating clustering results is known as cluster validation, which can be of different types. Internal cluster validation indices measure the quality of clusters based on intrinsic properties of the data. External cluster validation is based on external information about the data. The advantage of internal validation is that no external information is required. However, a small amount of external information can help an unsupervised clustering technique that uses internal cluster validation to find the optimal clustering solution achieve better results. The advantage of a supervised clustering technique that uses external validation is that it yields clusters conforming to the class distribution. However, using the intrinsic information present in the data can prevent a supervised learning technique that uses external validation from overfitting. In this paper we propose various hybrid cluster validation indices that combine internal and external cluster validation indices. The advantage of hybrid indices is that validation uses both the intrinsic information in the data and the available external information. In this work we focus on hybrid cluster validation indices for semi-supervised clustering.
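The abstract does not give the formulas of the proposed hybrid indices, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the internal index is the mean silhouette width, the external index is purity computed on the few objects whose class is known, and the two are combined by a weighted sum. The function names, the `weight` parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from math import dist


def silhouette(points, labels):
    """Internal index: mean silhouette width, in [-1, 1]."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def purity(labels, known_classes):
    """External index on the labelled subset only.

    known_classes maps object index -> class for the objects whose
    class is known (the "small amount of external information").
    """
    per_cluster = {}
    for i, cls in known_classes.items():
        per_cluster.setdefault(labels[i], Counter())[cls] += 1
    majority = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority / len(known_classes)


def hybrid_index(points, labels, known_classes, weight=0.5):
    """Assumed combination: weighted sum of external and internal parts."""
    internal = (silhouette(points, labels) + 1) / 2  # rescale to [0, 1]
    external = purity(labels, known_classes)
    return weight * external + (1 - weight) * internal


# Toy data: two well-separated groups, with classes known for four objects.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
known = {0: "a", 1: "a", 3: "b", 4: "b"}
good = [0, 0, 0, 1, 1, 1]  # clustering that matches the two groups
bad = [0, 1, 0, 1, 0, 1]   # clustering that mixes the two groups

print(hybrid_index(points, good, known), hybrid_index(points, bad, known))
```

In this sketch the clustering that matches the groups scores higher on both components, so it also scores higher on the combined index. Setting `weight` to 0 or 1 recovers purely internal or purely external validation, respectively.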


Semi-supervised clustering, hybrid cluster validation indices, cluster validation





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, India
