
Performance Analysis of Uncertain K-means Clustering Algorithm Using Different Distance Metrics

  • Conference paper
Computational Intelligence: Theories, Applications and Future Directions - Volume I

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 798)

Abstract

Real-world data are often uncertain or inconsistent. The uncertain k-means (UK-means) clustering algorithm, a modification of k-means, handles uncertain objects whose positions are represented by probability density functions (pdfs). Various techniques have been developed to improve the performance of the UK-means clustering algorithm, but they all center on two major factors: choosing the initial cluster centers and determining the number of clusters. This paper proposes that the measure of "closeness", that is, the distance function, is also a critical factor in determining cluster quality. The authors study the performance of the UK-means clustering algorithm with four different distance functions on Haberman's survival dataset. The analysis is based on the Davies–Bouldin index and purity values.
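The abstract describes UK-means and its evaluation only at a high level. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It makes the following assumptions:

- Each uncertain object's pdf is approximated by a finite set of sample points, so the expected distance E[d(X, c)] is estimated as a sample average.
- Cluster centers are updated as the mean of all member samples. This is suitable for squared Euclidean distance but only a heuristic for other metrics.
- The three metrics shown (Euclidean, Manhattan, Chebyshev) are illustrative, because the abstract does not name the four functions the paper compares.
- The Davies–Bouldin scatter term uses expected distances.

Purity assigns each cluster its most frequent true class and divides the number of correctly grouped objects by the total. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Illustrative metrics (assumption: the abstract does not name the four it compares).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def expected_distance(samples, center, dist):
    """Sample-average estimate of E[dist(X, center)] for an uncertain object X."""
    return sum(dist(s, center) for s in samples) / len(samples)

def mean_point(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def uk_means(objects, k, dist, iters=100, seed=0):
    """objects: list of uncertain objects, each a list of points sampled from its pdf."""
    rng = random.Random(seed)
    # Start from the sample means of k randomly chosen objects.
    centers = [mean_point(o) for o in rng.sample(objects, k)]
    labels = None
    for _ in range(iters):
        # Assign each object to the center with the smallest expected distance.
        new_labels = [min(range(k), key=lambda j: expected_distance(o, centers[j], dist))
                      for o in objects]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            # Heuristic update (assumption): mean of all samples of the member objects.
            members = [s for o, lab in zip(objects, labels) if lab == j for s in o]
            if members:  # keep the previous center if the cluster is empty
                centers[j] = mean_point(members)
    return labels, centers

def purity(labels, classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    groups = {}
    for lab, cls in zip(labels, classes):
        groups.setdefault(lab, []).append(cls)
    return sum(Counter(g).most_common(1)[0][1] for g in groups.values()) / len(labels)

def davies_bouldin(objects, labels, centers, dist):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [o for o, lab in zip(objects, labels) if lab == j]
        s = sum(expected_distance(o, centers[j], dist) for o in members)
        scatter.append(s / len(members) if members else 0.0)
    return sum(
        max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

if __name__ == "__main__":
    rng = random.Random(42)
    objects, classes = [], []
    for cls, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 10.0)]):
        for _ in range(20):
            # Each object: 30 samples from a Gaussian pdf around a jittered location.
            mx, my = rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)
            objects.append([(rng.gauss(mx, 0.5), rng.gauss(my, 0.5)) for _ in range(30)])
            classes.append(cls)
    for name, d in [("euclidean", euclidean), ("manhattan", manhattan), ("chebyshev", chebyshev)]:
        labels, centers = uk_means(objects, 2, d)
        print(name, "purity", round(purity(labels, classes), 3),
              "DB", round(davies_bouldin(objects, labels, centers, d), 3))
```

The synthetic data here are invented for illustration only. Swapping the `dist` argument is the only change needed to compare metrics. Higher purity and a lower Davies–Bouldin value indicate a better clustering.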



Author information


Correspondence to Swati Aggarwal.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Aggarwal, S., Agarwal, N., Jain, M. (2019). Performance Analysis of Uncertain K-means Clustering Algorithm Using Different Distance Metrics. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume I. Advances in Intelligent Systems and Computing, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-13-1132-1_19
