Accuracy of Clustering Prediction of PAM and K-Modes Algorithms

Dixon, Marc-Gregory; Genov, Stanimir; Hnatyshin, Vasil; Thayasivam, Umashanger

doi:10.1007/978-3-030-03402-3_22

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 886)

Included in the following conference series:

Future of Information and Communication Conference

Abstract

Grouping (or clustering) data points with similar characteristics is important when working with data that appears frequently in everyday life. Data scientists cluster numerical data based on the notion of distance, usually computed using the Euclidean measure. However, many datasets consist of categorical values, which require alternative methods for grouping the data. Clustering of categorical data therefore employs methods that rely on similarity between values rather than on distance. This work studies the ability of different clustering algorithms and several definitions of similarity to organize categorical data into groups.
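To illustrate what "similarity between values rather than distance" can mean, here is a minimal sketch of the overlap (simple matching) measure, one common way to compare categorical records. The abstract does not say which similarity definitions the paper evaluates, so the choice of measure, the function name `overlap_similarity`, and the sample records are all assumptions made for illustration.

```python
from typing import Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Return the fraction of attributes on which two categorical records agree.

    This is the overlap (simple matching) measure, used here only as an
    illustrative assumption; it is not taken from the paper itself.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    # Count positions where the two records hold the same category.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical records with three categorical attributes each.
r1 = ("tcp", "http", "SF")
r2 = ("tcp", "smtp", "SF")
r3 = ("udp", "dns", "REJ")

print(overlap_similarity(r1, r2))  # two of three attributes match
print(overlap_similarity(r1, r3))  # no attributes match
```

Unlike Euclidean distance, this measure needs no numeric encoding of the categories: it only asks whether two values are equal. A clustering algorithm that accepts an arbitrary dissimilarity can use `1 - overlap_similarity(a, b)` in place of a distance.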

Author information

Authors and Affiliations

Department of Mathematics, Rowan University, Glassboro, NJ, 08062, USA
Marc-Gregory Dixon & Umashanger Thayasivam
Department of Computer Science, Rowan University, Glassboro, NJ, 08062, USA
Stanimir Genov & Vasil Hnatyshin

Corresponding author

Correspondence to Marc-Gregory Dixon.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, London, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

About this paper

Cite this paper

Dixon, MG., Genov, S., Hnatyshin, V., Thayasivam, U. (2019). Accuracy of Clustering Prediction of PAM and K-Modes Algorithms. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 886. Springer, Cham. https://doi.org/10.1007/978-3-030-03402-3_22

DOI: https://doi.org/10.1007/978-3-030-03402-3_22
Published: 06 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03401-6
Online ISBN: 978-3-030-03402-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
