
An Efficient K-Medoids Clustering Algorithm for Large Scale Data

  • Xiaochun Wang
  • Xiali Wang
  • Don Mitchell Wilkes
Chapter

Abstract

K-medoids clustering is a popular partition-based clustering technique for identifying typical patterns in a data set. Recently, much work has been devoted to finding fast algorithms for it. In this paper, we propose an efficient K-medoids clustering algorithm that preserves clustering performance by following the notion of a simple and fast K-medoids algorithm while improving computational efficiency. The proposed algorithm does not require pre-calculating the distance matrix and is therefore applicable to large-scale datasets. When a simple pruning rule is used, it can achieve near-linear time performance. The complexity of the proposed algorithm is analyzed and found to be lower than that of state-of-the-art K-medoids algorithms. We test our algorithm on real data sets with millions of examples, and the experimental results show that the proposed algorithm outperforms state-of-the-art K-medoids clustering algorithms.
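The "simple and fast" K-medoids approach that the abstract builds on alternates an assignment step (each point joins its nearest medoid) with a medoid-update step (each cluster picks the member with the smallest total distance to the other members). The sketch below illustrates that alternating scheme while computing distances on the fly instead of storing a full n × n distance matrix, which is the property the abstract emphasizes. It is not the authors' proposed algorithm, whose pruning rule the abstract does not specify; the function name `kmedoids_alternate`, the random initialization, and the Euclidean metric are all assumptions made for illustration.

```python
import numpy as np


def kmedoids_alternate(X, k, init=None, max_iter=100, seed=0):
    """Illustrative alternating K-medoids without an n x n distance matrix.

    Hypothetical helper, not the chapter's method. Returns
    (medoid_indices, labels, total_cost).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        # Assumption: random initial medoids (the original simple-and-fast
        # method uses a seeding that relies on the full distance matrix).
        rng = np.random.default_rng(seed)
        medoids = rng.choice(n, size=k, replace=False)
    else:
        medoids = np.asarray(init, dtype=int)

    def assign(meds):
        # Only an n x k block of Euclidean distances to the current medoids.
        d = np.linalg.norm(X[:, None, :] - X[meds][None, :, :], axis=2)
        return d.argmin(axis=1), d.min(axis=1)

    for _ in range(max_iter):
        labels, _ = assign(medoids)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if its cluster became empty
            P = X[members]
            # Within-cluster distances: quadratic in the cluster size, not in n.
            within = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
            new_medoids[j] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # no medoid moved, so the iteration has converged
        medoids = new_medoids

    # Final assignment so labels and cost always match the returned medoids.
    labels, dmin = assign(medoids)
    return medoids, labels, float(dmin.sum())
```

For example, with the six points `[[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` and `init=[0, 1]` (both seeds in the first group), the second medoid migrates to index 3 after one update, giving medoids `[0, 3]`, labels `[0, 0, 0, 1, 1, 1]`, and a total cost of 4.0. The update step still costs time quadratic in each cluster's size, which is the kind of work a pruning rule would plausibly aim to reduce.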

Keywords

Clustering · Partition-based clustering · K-means algorithm · K-medoids algorithm · INCK algorithm

Copyright information

© Xi'an Jiaotong University Press 2020

Authors and Affiliations

  • Xiaochun Wang (1)
  • Xiali Wang (2)
  • Don Mitchell Wilkes (3)
  1. School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
  2. School of Information Engineering, Chang’an University, Xi’an, China
  3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, USA
