Accuracy of Clustering Prediction of PAM and K-Modes Algorithms

  • Marc-Gregory Dixon
  • Stanimir Genov
  • Vasil Hnatyshin
  • Umashanger Thayasivam
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 886)


Grouping (or clustering) data points with similar characteristics is important when working with the data that frequently appears in everyday life. Data scientists cluster numerical data based on the notion of distance, usually computed using the Euclidean measure. However, many datasets consist of categorical values, which require alternative methods for grouping the data. For this reason, clustering of categorical data relies on similarity between values rather than on distance. This work studies the ability of different clustering algorithms and several definitions of similarity to organize categorical data into groups.
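To make the contrast between distance and similarity concrete, here is a minimal sketch in Python. It assumes the simple overlap (matching) measure, chosen only for illustration; the abstract does not say which similarity definitions the paper evaluates. The function names and the sample records are hypothetical.

```python
from math import sqrt


def euclidean_distance(x, y):
    """Distance between two numerical records of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def overlap_similarity(x, y):
    """Fraction of attributes on which two categorical records agree.

    Assumption: the simple overlap measure, used here only as an example
    of a similarity function for categorical data.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# Hypothetical categorical records: (protocol, service, flag)
r1 = ("tcp", "http", "SF")
r2 = ("tcp", "ftp", "SF")
r3 = ("udp", "dns", "REJ")

sim_12 = overlap_similarity(r1, r2)  # records agree on 2 of 3 attributes
sim_13 = overlap_similarity(r1, r3)  # records agree on no attribute
```

Subtracting values such as "tcp" and "udp" has no meaning, so the Euclidean formula does not apply to these records; counting matching attributes gives a usable notion of closeness instead.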


Clustering, Partitioning around medoids, K-modes, Similarity functions



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marc-Gregory Dixon (1)
  • Stanimir Genov (2)
  • Vasil Hnatyshin (2)
  • Umashanger Thayasivam (1)
  1. Department of Mathematics, Rowan University, Glassboro, USA
  2. Department of Computer Science, Rowan University, Glassboro, USA
