
Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation

Chapter in: Unsupervised Learning Algorithms

Abstract

Kmeans-type clustering aims at partitioning a data set into clusters such that the objects within a cluster are compact and the objects in different clusters are well separated. However, most kmeans-type clustering algorithms rely only on intra-cluster compactness and overlook inter-cluster separation. In this chapter, a series of new clustering algorithms is proposed that extends existing kmeans-type algorithms by integrating both intra-cluster compactness and inter-cluster separation. First, a set of new objective functions for clustering is developed. Based on these objective functions, the corresponding updating rules for the algorithms are then derived analytically. The properties and performance of these algorithms are investigated on several synthetic and real-life data sets. Experimental studies demonstrate that the proposed algorithms outperform state-of-the-art kmeans-type clustering algorithms with respect to four metrics: Accuracy, Rand Index, F-score, and normalized mutual information (NMI).
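To make the idea concrete, the sketch below shows one illustrative way an objective can combine both terms: standard within-cluster squared error minus a weighted term rewarding distance of each cluster center from the global centroid. This is a minimal, hypothetical variant for illustration only, not the chapter's actual algorithms; the function name, the parameter `eta`, and the specific update rule are assumptions. Setting the derivative of the extended objective to zero gives the closed-form center update used in the loop.

```python
import numpy as np

def kmeans_with_separation(X, k, eta=0.3, n_iter=50, init=None, seed=0):
    """Illustrative k-means variant balancing intra-cluster compactness
    against inter-cluster separation. It (locally) minimizes
        sum_i ||x_i - c_{l(i)}||^2  -  eta * sum_k n_k ||c_k - g||^2,
    where g is the global centroid and 0 <= eta < 1.
    eta = 0 recovers standard k-means."""
    rng = np.random.default_rng(seed)
    g = X.mean(axis=0)                       # global centroid
    centers = X[rng.choice(len(X), k, replace=False)] if init is None else init.copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest center (compactness term only).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step, derived by setting the gradient of the extended
        # objective to zero:  c_k = (m_k - eta * g) / (1 - eta),
        # where m_k is the cluster mean. This pushes each center away
        # from the global centroid, increasing inter-cluster separation.
        new_centers = centers.copy()
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                new_centers[j] = (pts.mean(axis=0) - eta * g) / (1.0 - eta)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

On two well-separated blobs, the recovered centers lie slightly outside the cluster means, away from the global centroid, while the assignments match plain k-means; larger `eta` exaggerates this displacement.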



Acknowledgements

The authors are very grateful to the editors and anonymous referees for their helpful comments. This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61562027 and the Social Science Planning Project of Jiangxi Province under Grant No. 15XW12; in part by the Shenzhen Strategic Emerging Industries Program under Grant No. ZDSY20120613125016389 and the Shenzhen Science and Technology Program under Grants No. JCYJ20140417172417128 and No. JSGG20141017150830428; by the National Commonweal Technology R&D Program of AQSIQ China under Grant No. 201310087; and by the National Key Technology R&D Program of MOST China under Grant No. 2014BAL05B06.

Author information

Correspondence to Yunming Ye.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Huang, X., Ye, Y., Zhang, H. (2016). Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_13


  • DOI: https://doi.org/10.1007/978-3-319-24211-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24209-5

  • Online ISBN: 978-3-319-24211-8

  • eBook Packages: Engineering
