Abstract
Kmeans-type clustering aims at partitioning a data set into clusters such that the objects within a cluster are compact and the objects in different clusters are well separated. However, most kmeans-type clustering algorithms rely only on intra-cluster compactness while overlooking inter-cluster separation. In this chapter, a series of new clustering algorithms is proposed that extends existing kmeans-type algorithms by integrating both intra-cluster compactness and inter-cluster separation. First, a set of new objective functions for clustering is developed. Based on these objective functions, the corresponding updating rules for the algorithms are then derived analytically. The properties and performance of these algorithms are investigated on several synthetic and real-life data sets. Experimental studies demonstrate that our proposed algorithms outperform state-of-the-art kmeans-type clustering algorithms with respect to four metrics: Accuracy, Rand Index, F-score, and normalized mutual information (NMI).
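The chapter's exact objective functions are not reproduced on this page, but the idea the abstract describes can be sketched as an objective that rewards intra-cluster compactness while penalizing clusterings whose centers sit close together. The following is a minimal illustration under that assumption; the weighting parameter `alpha` and the use of the global centroid as a separation reference are illustrative choices, not the chapter's formulation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def objective(points, labels, centers, alpha=0.5):
    """Compactness minus alpha * separation (lower is better).

    Compactness: sum of squared distances from each object to its
    cluster center. Separation: sum of squared distances from each
    cluster center to the global centroid, a common surrogate for
    between-cluster scatter. Subtracting the separation term means a
    clustering with well-spread centers scores better than one with
    equally compact but overlapping clusters.
    """
    g = centroid(points)
    compact = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    separate = sum(sq_dist(c, g) for c in centers)
    return compact - alpha * separate
```

On a toy data set with two well-separated groups, a clustering that respects the groups yields a lower objective value than one that mixes them, which is the behavior the combined criterion is meant to capture.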
Acknowledgements
The authors are very grateful to the editors and anonymous referees for their helpful comments. This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61562027 and the Social Science Planning Project of Jiangxi Province under Grant No. 15XW12, in part by the Shenzhen Strategic Emerging Industries Program under Grant No. ZDSY20120613125016389, the Shenzhen Science and Technology Program under Grants No. JCYJ20140417172417128 and No. JSGG20141017150830428, the National Commonweal Technology R&D Program of AQSIQ China under Grant No. 201310087, and the National Key Technology R&D Program of MOST China under Grant No. 2014BAL05B06.
© 2016 Springer International Publishing Switzerland
Cite this chapter
Huang, X., Ye, Y., Zhang, H. (2016). Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24209-5
Online ISBN: 978-3-319-24211-8
eBook Packages: Engineering (R0)