Abstract
Cluster analysis has emerged naturally from the vast amount of data generated in daily life, and it can uncover hidden feature information that supports further analysis. The k-means algorithm is one of the most widely used data clustering methods in real-world applications thanks to its simplicity. However, k-means is sensitive to noise and outliers, because a small number of such points can substantially shift the mean of a cluster. The k-medoids algorithm reduces this sensitivity by choosing as each new center the data point that minimizes the sum of dissimilarities within its cluster. Nevertheless, k-medoids algorithms are limited by their heavy computation and cannot handle data efficiently. To this end, we present a novel k-medoids algorithm, called ball k-medoids, which is motivated by the theory of ball clusters, the relationships between clusters, and cluster partitioning. It assigns samples to their nearest medoids efficiently by significantly reducing the number of sample-medoid distance calculations. Moreover, a threshold inferred by the rollback method reduces the computation of medoid-medoid distances and further accelerates clustering. Experiments demonstrate that ball k-medoids is more efficient than other k-medoids algorithms and more accurate than k-means.
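The contrast between a mean and a medoid described above can be illustrated with a minimal sketch. It shows plain alternating k-medoids (assign, then re-pick each medoid), not the ball k-medoids acceleration, whose details are not given in the abstract. The function names `medoid_index` and `k_medoids`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def medoid_index(points):
    """Index of the point with the smallest summed Euclidean distance to the others."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return int(dist.sum(axis=1).argmin())

def k_medoids(points, medoid_ids, max_iter=100):
    """Plain alternating k-medoids (hypothetical helper, not the paper's method)."""
    medoid_ids = np.asarray(medoid_ids)
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest medoid.
        d = np.linalg.norm(points[:, None, :] - points[medoid_ids][None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Update step: the new medoid is the member minimizing the in-cluster distance sum.
        new_ids = medoid_ids.copy()
        for j in range(len(medoid_ids)):
            members = np.flatnonzero(labels == j)
            if members.size:  # keep the old medoid if a cluster becomes empty
                new_ids[j] = members[medoid_index(points[members])]
        if np.array_equal(new_ids, medoid_ids):
            break
        medoid_ids = new_ids
    return medoid_ids, labels

# One cluster with a single outlier: the mean is dragged away, the medoid is not.
cluster = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0], [100.0, 100.0]])
mean_center = cluster.mean(axis=0)               # (21.2, 21.2), far from the four nearby points
medoid_center = cluster[medoid_index(cluster)]   # (2.0, 2.0), an actual data point

# Two well-separated toy groups, deliberately started from two medoids in the same group.
data = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])
final_ids, final_labels = k_medoids(data, [0, 1])
```

Every assignment step in this sketch computes all sample-medoid distances, which is the cost the abstract says ball k-medoids reduces.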
Acknowledgement
The authors sincerely thank the handling associate editor and all anonymous reviewers for their valuable comments. This work is supported by the National Natural Science Foundation of China (No. 62076042, No. 61572086), the Key Research and Development Project of Sichuan Province (No. 2020YFG0307, No. 2018TJPT0012), the Key Research and Development Project of Chengdu (No. 2019-YF05–02028-GX), the Innovation Team of Quantum Security Communication of Sichuan Province (No. 17TD0009), the Sichuan Science and Technology Program under Grants 2018RZ0072 and 20ZDYF0660, the Foundation of Chengdu University of Information Technology under Grant J201707, and the National Key R&D Program of China under Grant No. 2017YFB0802300.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, Q., Zhang, S., Zhang, J., Huang, Y., Yao, B., Tang, H. (2021). Ball K-Medoids: Faster and Exacter. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Advances in Artificial Intelligence and Security. ICAIS 2021. Communications in Computer and Information Science, vol 1422. Springer, Cham. https://doi.org/10.1007/978-3-030-78615-1_16
DOI: https://doi.org/10.1007/978-3-030-78615-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78614-4
Online ISBN: 978-3-030-78615-1
eBook Packages: Computer Science, Computer Science (R0)