Clustering

Emmert-Streib, Frank; Moutari, Salissou; Dehmer, Matthias

doi:10.1007/978-3-031-13339-8_7

Abstract

Grouping data points, or instances, into clusters is a fundamental task in data science. Clustering methods generally belong to unsupervised learning because the data sets they are applied to are unlabeled; that is, no information is available about the true cluster to which a data point belongs. Instead, the clustering is inferred from distance- and similarity-based relations among the data points. The data points can correspond to a wide variety of objects, for example, texts, vectors, or networks. Many different approaches can be used to define clustering methods, and analyzing the validity of the resulting clusters can be quite intricate. In this chapter, we focus on clustering methods based on similarity and distance measures.
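
As a minimal sketch of how a grouping can be inferred from distances alone, the following Python snippet implements a basic k-means procedure with the Euclidean distance. The choice of k-means, the function names `euclidean` and `k_means`, the initialization from the first k points, and the toy data are all assumptions made for illustration; the abstract does not state which algorithms the chapter covers, and the book itself works in R.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Group points into k clusters by nearest-center assignment.

    Assumption for illustration: the first k points serve as initial centers.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not change
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical toy data: two visually separated groups in the plane.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centers = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied to the procedure; the two groups emerge only from the distances between the data points and the cluster centers.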

Author information

Authors and Affiliations

Frank Emmert-Streib: Tampere University, Tampere, Finland
Salissou Moutari: Queen’s University Belfast, Belfast, UK
Matthias Dehmer: Swiss Distance University of Applied Science, Brig, Switzerland

Corresponding author

Correspondence to Frank Emmert-Streib.

About this chapter

Cite this chapter

Emmert-Streib, F., Moutari, S., Dehmer, M. (2023). Clustering. In: Elements of Data Science, Machine Learning, and Artificial Intelligence Using R. Springer, Cham. https://doi.org/10.1007/978-3-031-13339-8_7

DOI: https://doi.org/10.1007/978-3-031-13339-8_7
Published: 04 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13338-1
Online ISBN: 978-3-031-13339-8
eBook Packages: Intelligent Technologies and Robotics (R0)
