Accuracy of Clustering Prediction of PAM and K-Modes Algorithms

  • Conference paper
Advances in Information and Communication Networks (FICC 2018)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 886))

Abstract

Grouping (or clustering) data points with similar characteristics is important when working with data that arises in everyday life. Data scientists cluster numerical data based on the notion of distance, usually computed with the Euclidean measure. However, many datasets consist of categorical values, which require alternative methods for grouping the data. Clustering of categorical data therefore employs methods that rely on similarity between values rather than on distance. This work studies the ability of different clustering algorithms and several definitions of similarity to organize categorical data into groups.
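To make the contrast between distance and similarity concrete, the following is a minimal sketch in Python. It assumes the simple matching (overlap) measure as the similarity definition; the abstract does not say which similarity definitions the paper evaluates, so this choice, the function names, and the sample records are all hypothetical and purely illustrative.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two numerical points of equal length."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def overlap_similarity(a, b):
    """Fraction of attributes on which two categorical records agree.

    Assumption: this is the simple matching (overlap) measure, used here
    only as an example of a similarity definition for categorical data.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Numerical data: differences between values are meaningful.
point_a = (1.0, 2.0)
point_b = (4.0, 6.0)
numeric_distance = euclidean_distance(point_a, point_b)

# Categorical data (hypothetical records): only equality is meaningful,
# so the records are compared attribute by attribute.
record_a = ("tcp", "http", "SF")
record_b = ("tcp", "ftp", "SF")
categorical_similarity = overlap_similarity(record_a, record_b)
```

Subtracting "http" from "ftp" has no meaning, which is why the categorical comparison counts matching attributes instead of measuring a geometric distance.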

Author information

Corresponding author

Correspondence to Marc-Gregory Dixon.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dixon, MG., Genov, S., Hnatyshin, V., Thayasivam, U. (2019). Accuracy of Clustering Prediction of PAM and K-Modes Algorithms. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 886. Springer, Cham. https://doi.org/10.1007/978-3-030-03402-3_22
