Abstract

Grouping data points, or instances, into clusters is a fundamental task in data science. Clustering methods generally belong to the area of unsupervised learning because the data sets to which they are applied are unlabeled; that is, no information is available about the true cluster to which a data point belongs. Instead, the clustering is inferred from distance- and similarity-based relations between the data points. The aim of clustering methods is to assign a set of data points, which can correspond to a wide variety of objects such as texts, vectors, or networks, to groups that we call clusters. Many different approaches can be used to define clustering methods, and analyzing the validity of the resulting clusters can be intricate. In this chapter, we focus on clustering methods based on similarity and distance measures.
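To illustrate how a grouping can be inferred from pairwise distances alone, the following is a minimal sketch and is not taken from the chapter. It assumes Euclidean distance and a hypothetical helper, `threshold_clusters`, which places two points in the same cluster whenever they are linked by a chain of points whose consecutive distances do not exceed a chosen cutoff. The sample points and the cutoff value are invented for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def threshold_clusters(points, cutoff):
    """Label points so that chains of distances <= cutoff share a label.

    Hypothetical helper for illustration: it finds the connected
    components of the graph that joins any two points whose distance
    is at most `cutoff`. No true labels are used at any step.
    """
    n = len(points)
    labels = [-1] * n          # -1 marks a point not yet assigned
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and euclidean(points[i], points[j]) <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Invented sample data: two visually separated groups in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.7), (5.0, 5.0), (5.4, 4.8)]
labels = threshold_clusters(points, cutoff=1.0)
print(labels)  # [0, 0, 0, 1, 1]
```

The result depends entirely on the chosen distance measure and cutoff. With a very small cutoff every point becomes its own cluster, and with a very large one all points merge into a single cluster, which is one reason why assessing cluster validity is not straightforward.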

Author information

Corresponding author

Correspondence to Frank Emmert-Streib.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Emmert-Streib, F., Moutari, S., Dehmer, M. (2023). Clustering. In: Elements of Data Science, Machine Learning, and Artificial Intelligence Using R. Springer, Cham. https://doi.org/10.1007/978-3-031-13339-8_7
