Abstract
Grouping (or clustering) data points with similar characteristics is important when working with data that arises in everyday life. Numerical data are typically clustered by distance, most often the Euclidean distance. However, many datasets consist of categorical values, which call for alternative grouping methods. For this reason, clustering of categorical data relies on similarity between values rather than on distance. This work studies how well different clustering algorithms, combined with several definitions of similarity, organize categorical data into groups.
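To make the distance-versus-similarity contrast above concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes each record is a tuple of category labels and shows two similarity definitions from the categorical-similarity literature: overlap, which is the fraction of matching attributes, and inverse occurrence frequency (IOF), which gives lower similarity to mismatches on frequent values. It also shows a basic k-modes loop that uses the simple-matching dissimilarity, meaning the count of mismatched attributes. The toy dataset, the function names, and the initialization (the first k records as starting modes) are hypothetical choices for illustration. PAM (k-medoids) would instead choose actual records as medoids and could use 1 minus any of these similarities as its dissimilarity.

```python
from collections import Counter
from math import log
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # one categorical record, e.g. ("red", "small", "round")


def overlap_similarity(x: Record, y: Record) -> float:
    """Fraction of attributes on which x and y take the same value."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def iof_similarity(x: Record, y: Record, data: Sequence[Record]) -> float:
    """Inverse Occurrence Frequency: a match scores 1; a mismatch scores
    1 / (1 + log f(x_k) * log f(y_k)), so mismatches on frequent values score lower.
    Assumes x and y are rows of `data` (every frequency is at least 1)."""
    total = 0.0
    for k in range(len(x)):
        if x[k] == y[k]:
            total += 1.0
        else:
            freq = Counter(row[k] for row in data)  # value counts for attribute k
            total += 1.0 / (1.0 + log(freq[x[k]]) * log(freq[y[k]]))
    return total / len(x)  # equal weight 1/d per attribute


def mismatches(x: Record, y: Record) -> int:
    """Simple-matching dissimilarity used by k-modes."""
    return sum(a != b for a, b in zip(x, y))


def k_modes(data: Sequence[Record], k: int, max_iter: int = 100) -> Tuple[List[int], List[Record]]:
    """Basic k-modes: assign records to the nearest mode, then recompute each
    mode attribute-by-attribute as the most frequent value among its members."""
    modes = [tuple(row) for row in data[:k]]  # hypothetical init: first k records
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: mismatches(row, modes[j])) for row in data]
        if new_labels == labels:  # no reassignment, so the clustering has converged
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels, modes


# Hypothetical toy dataset with two visually obvious groups.
DATA: List[Record] = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("red", "medium", "round"),
    ("blue", "large", "oval"),
]

if __name__ == "__main__":
    labels, modes = k_modes(DATA, 2)
    print(labels)  # [0, 1, 0, 1, 0, 1]
    print(modes)
    print(overlap_similarity(DATA[0], DATA[2]), iof_similarity(DATA[0], DATA[2], DATA))
```

On this toy data the red records and the blue records end up in separate clusters. Records 0 and 2 differ only in shape, so their overlap similarity is 2/3. IOF scores them slightly higher because "round" and "oval" each occur only twice.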
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Dixon, MG., Genov, S., Hnatyshin, V., Thayasivam, U. (2019). Accuracy of Clustering Prediction of PAM and K-Modes Algorithms. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 886. Springer, Cham. https://doi.org/10.1007/978-3-030-03402-3_22
DOI: https://doi.org/10.1007/978-3-030-03402-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03401-6
Online ISBN: 978-3-030-03402-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)