Abstract
DESPOTA is a method for finding the best partition among those hosted in a dendrogram. The algorithm visits the nodes from the root of the tree toward the leaves. At each node, it uses a permutation test to assess the null hypothesis that the two descending branches sustain a single cluster of units. At the end of the procedure, a partition of the data into clusters is returned. This paper focuses on the interpretation of the test statistic through a data-driven approach, using a real dataset to show the details of the test statistic and the algorithm in action. The working principle of DESPOTA is illustrated in the light of the Lance–Williams recurrence formula, which embeds all types of agglomeration methods.
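As a reminder of what the Lance–Williams recurrence looks like in practice, the following minimal sketch updates the dissimilarity between a cluster k and the merger of clusters i and j. The coefficient values are the standard textbook ones for four common linkages; the function names and argument layout are assumptions made here for illustration and are not taken from the chapter, and the sketch does not reproduce the DESPOTA test statistic.

```python
def lance_williams_coefficients(method, n_i, n_j, n_k):
    """Return (alpha_i, alpha_j, beta, gamma) for a few standard linkages.

    n_i, n_j, n_k are the sizes of clusters i, j and k.
    Function name and signature are illustrative, not from the chapter.
    """
    n_ij = n_i + n_j
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":
        return n_i / n_ij, n_j / n_ij, 0.0, 0.0
    if method == "ward":
        # Ward's coefficients apply to squared Euclidean dissimilarities.
        n = n_ij + n_k
        return (n_i + n_k) / n, (n_j + n_k) / n, -n_k / n, 0.0
    raise ValueError(f"unknown method: {method!r}")


def lance_williams_update(d_ki, d_kj, d_ij, method, n_i=1, n_j=1, n_k=1):
    """Dissimilarity between cluster k and the merged cluster (i, j):

    d(k, ij) = a_i*d(k,i) + a_j*d(k,j) + beta*d(i,j) + gamma*|d(k,i) - d(k,j)|
    """
    a_i, a_j, beta, gamma = lance_williams_coefficients(method, n_i, n_j, n_k)
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# With d(k,i) = 3, d(k,j) = 5 and d(i,j) = 2, single linkage returns the
# smaller of the two distances and complete linkage the larger one.
print(lance_williams_update(3.0, 5.0, 2.0, "single"))    # 3.0
print(lance_williams_update(3.0, 5.0, 2.0, "complete"))  # 5.0
```

Changing only the four coefficients switches the agglomeration method, which is why a procedure expressed through this recurrence can be stated once for all the linkages it covers.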
References
Bruzzese, D., Vistocco, D.: DESPOTA: DEndrogram slicing through a permutation test approach. J. Classif. 32(2), 285–304 (2015)
Cormack, R.M.: A review of classification. J. R. Stat. Soc. Ser. A (General) 134(3), 321–367 (1971)
Everitt, B., Landau, M., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)
Gandy, A.: Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. J. Am. Stat. Assoc. 104(488), 1504–1511 (2009)
Good, P.I.: Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer, New York (1994)
Gordon, A.D.: Classification, 2nd edn. Chapman & Hall/CRC Press (1999)
Gurrutxaga, I., Albisua, I., Arbelaitz, O., Martín, J.I., Muguerza, J., Pérez, J.M., Perona, I.: SEP/COP: an efficient method to find the best partition in hierarchical clustering based on a new cluster validity index. Pattern Recogn. 43(10), 3364–3373 (2010)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)
Lance, G.N., Williams, W.T.: A generalised sorting strategy for computer classifications. Nature 212, 218 (1966)
Lance, G.N., Williams, W.T.: A general theory of classificatory sorting strategies. 1. Hierarchical systems. Comput. J. 9(4), 373–380 (1967)
Lichman, M.: UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml (2013)
Milligan, G.W.: A Monte Carlo study of thirty internal criterion measures for cluster analysis. Psychometrika 46(2), 187–199 (1981)
Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2), 159–179 (1985)
Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Applications and Software. Wiley, Chichester, UK (2010)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2015)
Romano, J.P., Wolf, M.: Control of generalized error rates in multiple testing. Ann. Stat. 35(4), 1378–1408 (2007)
Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. B 63(2), 411–423 (2001)
Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Passaretti, D., Vistocco, D. (2018). DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram. In: Perna, C., Pratesi, M., Ruiz-Gazen, A. (eds) Studies in Theoretical and Applied Statistics. SIS 2016. Springer Proceedings in Mathematics & Statistics, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-73906-9_8
DOI: https://doi.org/10.1007/978-3-319-73906-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73905-2
Online ISBN: 978-3-319-73906-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)