Methods for Clustering Categorical and Mixed Data: An Overview and New Algorithms

  • Conference paper
Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM 2018)

Abstract

This paper considers methods for clustering categorical and mixed data. Dissimilarity measures suited to this purpose are reviewed, and different classes of algorithms are discussed according to the classes of similarities they rely on. Several algorithms are then described in detail, including agglomerative hierarchical clustering, K-means and related methods such as K-medoids and K-modes, and methods of network clustering. Finally, the paper discusses how combining existing ideas leads to new algorithms.
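
As an illustration of the K-modes idea named in the abstract, the following minimal Python sketch assumes the simple matching dissimilarity (the number of attributes on which two records differ) and uses the per-attribute mode as the cluster representative. The function names, the toy data, and the choice of initial modes are hypothetical and are not taken from the paper; the algorithms described in the paper may differ.

```python
from collections import Counter


def simple_matching(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def column_modes(records):
    """Most frequent category per attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, initial_modes, max_iter=100):
    """Alternate nearest-mode assignment and mode update until labels are stable."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the closest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mode from the records assigned to it.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)
print(modes)
```

Replacing the mode update with a search for the member record that minimises the total dissimilarity to the other members would give a K-medoids-style variant under the same dissimilarity.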

Acknowledgment

This paper is based upon work supported in part by the Air Force Office of Scientific Research/Asian Office of Aerospace Research and Development (AFOSR/AOARD) under award number FA2386-17-1-4046.

Author information

Corresponding author

Correspondence to Sadaaki Miyamoto.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Miyamoto, S., Huynh, VN., Fujiwara, S. (2018). Methods for Clustering Categorical and Mixed Data: An Overview and New Algorithms. In: Huynh, VN., Inuiguchi, M., Tran, D., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2018. Lecture Notes in Computer Science, vol 10758. Springer, Cham. https://doi.org/10.1007/978-3-319-75429-1_7

  • DOI: https://doi.org/10.1007/978-3-319-75429-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75428-4

  • Online ISBN: 978-3-319-75429-1

  • eBook Packages: Computer Science, Computer Science (R0)
