Abstract
Real-world data is rarely free of inconsistency. The uncertain k-means (UK-means) clustering algorithm, a modification of k-means, handles uncertain objects whose positions are represented by probability density functions (pdfs). Various techniques have been developed to enhance the performance of the UK-means clustering algorithm, but they all center on two major factors: choosing initial cluster centers and determining the number of clusters. This paper proposes that the measure of “closeness” is also a critical factor in determining the quality of clusters. The authors study the performance of the UK-means clustering algorithm with four different distance functions on Haberman’s survival dataset. The analysis is based on the Davies–Bouldin index and purity values.
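As a rough illustration of the idea, the sketch below runs a UK-means style loop with a pluggable distance function and scores the result by purity. It is a minimal sketch under stated assumptions, not the authors' implementation: each uncertain object is assumed to be given as a list of points sampled from its pdf, the expected distance is approximated by averaging over those samples, and Euclidean and Manhattan distance stand in for the four distance functions, which the abstract does not name. All function names are hypothetical, and the Davies–Bouldin index is omitted for brevity.

```python
import random
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def expected_distance(samples, center, dist):
    """Approximate E[dist(X, center)] by averaging over pdf samples (assumption)."""
    return sum(dist(s, center) for s in samples) / len(samples)

def uk_means(objects, k, dist, iters=20, seed=0):
    """objects: one list of sampled points per uncertain object."""
    rng = random.Random(seed)
    # Initial centers: expected positions of k randomly chosen objects.
    centers = [mean_point(o) for o in rng.sample(objects, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by expected distance.
        new_labels = [
            min(range(k), key=lambda j: expected_distance(o, centers[j], dist))
            for o in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: mean of the members' expected positions. This is the
        # usual choice for squared Euclidean distance and only a heuristic
        # for other distance functions.
        for j in range(k):
            members = [mean_point(o) for o, l in zip(objects, labels) if l == j]
            if members:
                centers[j] = mean_point(members)
    return labels, centers

def purity(labels, classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    total = 0
    for j in set(labels):
        members = [c for l, c in zip(labels, classes) if l == j]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)

def make_object(rng, cx, cy, n=10, spread=0.5):
    """Hypothetical uncertain object: n Gaussian samples around (cx, cy)."""
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(1)
    objects = [make_object(rng, 0, 0) for _ in range(5)]
    objects += [make_object(rng, 10, 10) for _ in range(5)]
    classes = [0] * 5 + [1] * 5
    for name, dist in [("euclidean", euclidean), ("manhattan", manhattan)]:
        labels, _ = uk_means(objects, k=2, dist=dist)
        print(name, "purity:", purity(labels, classes))
```

Passing the distance function as a parameter keeps the clustering loop unchanged while the notion of “closeness” varies, which is the comparison the abstract describes.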
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aggarwal, S., Agarwal, N., Jain, M. (2019). Performance Analysis of Uncertain K-means Clustering Algorithm Using Different Distance Metrics. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume I. Advances in Intelligent Systems and Computing, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-13-1132-1_19
DOI: https://doi.org/10.1007/978-981-13-1132-1_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1131-4
Online ISBN: 978-981-13-1132-1
eBook Packages: Intelligent Technologies and Robotics (R0)