Hybrid Cluster Validation Techniques
Clustering methods divide a dataset into groups of similar objects called clusters. Objects in the same cluster are similar to one another, and objects in different clusters are dissimilar. The evaluation of clustering results is known as cluster validation, and it takes several forms. Internal cluster validation indices measure the quality of clusters using only the intrinsic properties of the data. External cluster validation indices rely on external information about the data, such as class labels. The advantage of internal validation is that no external information is required. However, even a small amount of external information can help an unsupervised clustering technique that uses internal validation to find a better clustering solution. The advantage of a supervised clustering technique that uses external validation is that the resulting clusters conform to the class distribution. However, using the intrinsic information present in the data can keep such a technique from overfitting. In this paper we propose various hybrid cluster validation indices that combine internal and external cluster validation indices. The advantage of hybrid indices is that validation draws on both the intrinsic information in the data and the available external information. In this work we focus on hybrid cluster validation indices for semi-supervised clustering.
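To make the idea of a hybrid index concrete, the sketch below combines one internal index with one external index. It is an illustration only, not the indices proposed in the paper, whose definitions are not given in this abstract. The choice of the silhouette coefficient as the internal index, purity as the external index, the weighted-sum combination, and the names `hybrid_index`, `alpha`, and `known_classes` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def silhouette(points, clusters):
    """Mean silhouette coefficient (internal index), in [-1, 1]."""
    members = defaultdict(list)
    for i, c in enumerate(clusters):
        members[c].append(i)
    if len(members) < 2:
        return 0.0  # undefined for a single cluster; treated as neutral here
    total = 0.0
    for i, c in enumerate(clusters):
        own = [j for j in members[c] if j != i]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in members.items()
            if other != c
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def purity(clusters, known_classes):
    """Purity (external index) over the labeled objects only, in (0, 1]."""
    per_cluster = defaultdict(Counter)
    for i, cls in known_classes.items():
        per_cluster[clusters[i]][cls] += 1
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / len(known_classes)


def hybrid_index(points, clusters, known_classes, alpha=0.5):
    """Hypothetical hybrid score: weighted sum of both indices, in [0, 1].

    alpha is an assumed weight: 1.0 uses only the internal index,
    0.0 uses only the external index.
    """
    internal = (silhouette(points, clusters) + 1) / 2  # rescale to [0, 1]
    external = purity(clusters, known_classes)
    return alpha * internal + (1 - alpha) * external


# Semi-supervised setting: six objects, class labels known for only four.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
known_classes = {0: "a", 1: "a", 3: "b", 5: "b"}  # object index -> class
compact = [0, 0, 0, 1, 1, 1]  # follows the two visible groups
mixed = [0, 1, 0, 1, 0, 1]    # cuts across both groups

print(hybrid_index(points, compact, known_classes))
print(hybrid_index(points, mixed, known_classes))
```

In this sketch a candidate clustering is penalized both when its clusters are poorly separated and when they disagree with the few available labels, which matches the motivation stated above. Other pairings of indices or other ways of combining them would fit the same pattern.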
Keywords: Semi-supervised clustering, hybrid cluster validation indices, cluster validation