Methods for Clustering Categorical and Mixed Data: An Overview and New Algorithms

Miyamoto, Sadaaki; Huynh, Van-Nam; Fujiwara, Shuhei

doi:10.1007/978-3-319-75429-1_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10758)

Included in the following conference series: International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making

Abstract

Methods of clustering for categorical and mixed data are considered. Dissimilarities for this purpose are reviewed, and different classes of algorithms, corresponding to different classes of similarities, are discussed. Details of several algorithms are then given, including agglomerative hierarchical clustering, K-means and related methods such as K-medoids and K-modes, and methods of network clustering. How combinations of existing ideas lead to new algorithms is also discussed.
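
As a rough illustration of one method named in the abstract, the following is a minimal sketch of K-modes for purely categorical records. The chapter's own definitions are not visible in this preview, so the sketch assumes the simple matching dissimilarity (the count of attributes on which two records differ) and attribute-wise modes as cluster centers. The function names and the toy data are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def mode_of(records):
    """Attribute-wise most frequent category of a non-empty list of records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, initial_modes, max_iter=20):
    """Alternate nearest-mode assignment and mode update until labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the closest mode (first one on ties).
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mode; an empty cluster keeps its old mode.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = mode_of(members)
    return labels, modes


# Hypothetical toy data: two groups of three-attribute categorical records.
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[3]])
```

Taking the initial modes from the data keeps the sketch deterministic. A seeding strategy in the spirit of k-means++ could be substituted, but that would be a further assumption beyond what the abstract states.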

Acknowledgment

This paper is based upon work supported in part by the Air Force Office of Scientific Research/Asian Office of Aerospace Research and Development (AFOSR/AOARD) under award number FA2386-17-1-4046.

Author information

Authors and Affiliations

University of Tsukuba, Tsukuba, Japan
Sadaaki Miyamoto & Shuhei Fujiwara
Japan Advanced Institute of Science and Technology, Nomi, Japan
Van-Nam Huynh

Corresponding author

Correspondence to Sadaaki Miyamoto.

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Japan
Van-Nam Huynh
Osaka University, Osaka, Japan
Masahiro Inuiguchi
Hanoi National University of Education, Hanoi, Vietnam
Dang Hung Tran
Université de Technologie de Compiègne, Compiègne, France
Thierry Denoeux

Cite this paper

Miyamoto, S., Huynh, VN., Fujiwara, S. (2018). Methods for Clustering Categorical and Mixed Data: An Overview and New Algorithms. In: Huynh, VN., Inuiguchi, M., Tran, D., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2018. Lecture Notes in Computer Science, vol 10758. Springer, Cham. https://doi.org/10.1007/978-3-319-75429-1_7

DOI: https://doi.org/10.1007/978-3-319-75429-1_7
Published: 04 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75428-4
Online ISBN: 978-3-319-75429-1
eBook Packages: Computer Science, Computer Science (R0)