Performance Analysis of Uncertain K-means Clustering Algorithm Using Different Distance Metrics

Aggarwal, Swati; Agarwal, Nitika; Jain, Monal

doi:10.1007/978-981-13-1132-1_19


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 798)


Abstract

Real-world data are generally inconsistent. The uncertain k-means (UK-means) clustering algorithm, a modification of k-means, handles uncertain objects whose positions are represented by probability density functions (pdfs). Various techniques have been developed to enhance the performance of the UK-means clustering algorithm, but they all center on two major factors: choosing the initial cluster centers and determining the number of clusters. This paper proposes that the measure of “closeness” is also a critical factor in deciding the quality of clusters. The authors study the performance of the UK-means clustering algorithm with four different distance functions on Haberman’s survival dataset. The analysis is based on the Davies–Bouldin index and purity values.
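
To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each uncertain object is approximated by a finite list of samples drawn from its pdf, so the expected distance to a cluster center becomes a sample average. The three distance functions shown (Euclidean, Manhattan, Chebyshev) are illustrative assumptions, since the abstract does not name the four functions it compares. All function names are hypothetical, and the Davies–Bouldin index is computed here on the objects' mean positions.

```python
import math
import random
from collections import Counter

# Illustrative distance functions (assumed; the abstract does not list its four).
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def expected_distance(samples, center, dist):
    """Sample-average estimate of E[dist(X, center)] for one uncertain object."""
    return sum(dist(s, center) for s in samples) / len(samples)

def uk_means(objects, k, dist, iters=20, seed=0):
    """objects: one list of pdf samples per uncertain object.
    Returns (labels, centers)."""
    rng = random.Random(seed)
    means = [mean_point(obj) for obj in objects]
    centers = rng.sample(means, k)  # naive random initialization
    labels = None
    for _ in range(iters):
        # Assign each object to the center with the smallest expected distance.
        new_labels = [
            min(range(k), key=lambda j: expected_distance(obj, centers[j], dist))
            for obj in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Simplification: centers are updated as the mean of member objects'
        # mean positions for every metric, as in ordinary k-means.
        for j in range(k):
            members = [m for m, l in zip(means, labels) if l == j]
            if members:
                centers[j] = mean_point(members)
    return labels, centers

def purity(labels, classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    by_cluster = {}
    for l, c in zip(labels, classes):
        by_cluster.setdefault(l, Counter())[c] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)

def davies_bouldin(points, labels, dist):
    """Davies-Bouldin index (lower is better).
    Needs at least two non-empty clusters with distinct centroids."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    cents = {l: mean_point(ps) for l, ps in clusters.items()}
    scatter = {
        l: sum(dist(p, cents[l]) for p in ps) / len(ps)
        for l, ps in clusters.items()
    }
    ids = list(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in ids if j != i)
        for i in ids
    ) / len(ids)
```

Calling `uk_means(objects, k, dist)` once per distance function and comparing the resulting `purity` and `davies_bouldin` values mirrors the kind of comparison the abstract describes, under the sampling assumption stated above.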



Author information

Authors and Affiliations

Department of Computer Engineering, Netaji Subhas Institute of Technology, Delhi, India
Swati Aggarwal, Nitika Agarwal & Monal Jain


Corresponding author

Correspondence to Swati Aggarwal.

Editor information

Editors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
Nishchal K. Verma
Department of Aerospace Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
A. K. Ghosh


About this paper

Cite this paper

Aggarwal, S., Agarwal, N., Jain, M. (2019). Performance Analysis of Uncertain K-means Clustering Algorithm Using Different Distance Metrics. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume I. Advances in Intelligent Systems and Computing, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-13-1132-1_19


DOI: https://doi.org/10.1007/978-981-13-1132-1_19
Published: 01 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1131-4
Online ISBN: 978-981-13-1132-1
eBook Packages: Intelligent Technologies and Robotics (R0)
