Abstract
In recent years, much attention in the field of taxonomic methods has been paid to their stability, i.e., to the question of how far the structure discovered by a given method is actually present in the data. The stability criterion examines whether the groups produced by applying a clustering method to a set of objects are real (the structure is stable) or appeared by accident. It is most often used to select the number of groups (k) into which a data set should be clustered. The aim of this article is to compare classical indexes and stability measures in terms of the number of groups each indicates as correct.
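The stability idea described above can be illustrated with a minimal Python sketch. It is not one of the measures compared in the chapter and does not reproduce the clv, clValid or fpc implementations. It assumes a plain k-means with farthest-point initialisation, pairs of random 80% subsamples, and the Jaccard coefficient over co-clustered pairs as the agreement score. All function names, the synthetic data and the parameter values are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=50):
    """Lloyd's algorithm with farthest-point initialisation; returns labels."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from one random point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))

    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        if new == labels:  # assignments no longer change
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def jaccard(labels_a, labels_b):
    """Jaccard agreement of two partitions, counted over pairs of objects."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def stability(points, k, n_pairs=20, frac=0.8, seed=0):
    """Mean agreement between clusterings of two random subsamples."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        # Compare the two partitions only on objects present in both subsamples.
        shared = sorted(set(idx_a) & set(idx_b))
        scores.append(jaccard([lab_a[i] for i in shared],
                              [lab_b[i] for i in shared]))
    return sum(scores) / len(scores)


# Synthetic data (an assumption): three well-separated groups of 30 points.
data_rng = random.Random(42)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1))
          for cx, cy in true_centers for _ in range(30)]

scores = {k: stability(points, k) for k in range(2, 7)}
for k, s in scores.items():
    print(k, round(s, 3))
```

On this synthetic example, k = 3 reproduces the same partition in every pair of subsamples, so its score equals 1. The scores for the other values of k can then be judged against that benchmark.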
Notes
1. Packages clv, clValid and fpc were also selected because the methods implemented there have been the subject of the author’s research for a long time (e.g., Rozmus 2017).
2. This measure is implemented by the function cls.stab.sim.ind in the clv package in R.
3. This measure is implemented by the function cls.stab.opt.assign in the clv package in R.
4. This measure can be found in the clValid package in R.
5. This measure can be found in the fpc package in R, which includes two functions for measuring stability: clusterboot and nselectboot. Only the nselectboot function was used in the experiments.
6. The indexes were calculated using functions from the clusterSim and clusterCrit packages.
References
Ben-Hur A, Guyon I (2003) Detecting stable clusters using principal component analysis. Methods Mol Biol 224:159–182
Borys T (ed) (2005) Wskaźniki zrównoważonego rozwoju. Wydawnictwo Ekonomia i Środowisko, Warszawa-Białystok
Borys T (2014) Wybrane problemy metodologii pomiaru nowego paradygmatu rozwoju—polskie doświadczenia. Optimum. Studia Ekonomiczne 3(69):3–21
Brock G, Pihur V, Datta S, Datta S (2008) clValid: an R package for cluster validation. J Stat Softw 25(4)
Fang Y, Wang J (2012) Selection of the number of clusters via the bootstrap method. Comput Stat Data Anal 56:468–477
Hennig C (2007) Cluster-wise assessment of cluster stability. Comput Stat Data Anal 52:258–271
Lord E, Willems M, Lapointe FJ, Makarenkov V (2017) Using the stability of objects to determine the number of clusters in datasets. Inf Sci 393:29–46
Lorek E (2011) Ekonomia zrównoważonego rozwoju w badaniach polskich i niemieckich. In: Kos B (ed) Transformacja gospodarki—poziom krajowy i międzynarodowy. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach 90:103–112
Marino V, Presti LL (2019) Stay in touch! New insights into end-user attitudes towards engagement platforms. J Consum Mark 36:772–783
Rozmus D (2017) Using R packages for comparison of cluster stability. Acta Universitatis Lodziensis Folia Oeconomica 330(4):77–86
Shamir O, Tishby N (2008) Cluster stability for finite samples. Adv Neural Inf Process Syst 20:1297–1304
Suzuki R, Shimodaira H (2006) Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22(12):1540–1542
Volkovich Z, Barzily Z, Toledano-Kitai D, Avros R (2010) The Hotelling’s metric as a cluster stability index. Comput Model New Technol 14(4):65–72
Wang J (2010) Consistent selection of the number of clusters via cross-validation. Biometrika 97:893–904
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rozmus, D. (2021). Determining the Number of Groups in Cluster Analysis Using Classical Indexes and Stability Measures—Comparison of Results. In: Jajuga, K., Najman, K., Walesiak, M. (eds) Data Analysis and Classification. SKAD 2020. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-75190-6_2
DOI: https://doi.org/10.1007/978-3-030-75190-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75189-0
Online ISBN: 978-3-030-75190-6
eBook Packages: Mathematics and Statistics (R0)