DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram

Passaretti, Davide; Vistocco, Domenico

doi:10.1007/978-3-319-73906-9_8

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 227)

Included in the following conference series:

Convegno della Società Italiana di Statistica

Abstract

DESPOTA is a method for finding the best partition among those contained in a dendrogram. The algorithm visits the nodes from the root of the tree toward the leaves. At each node, it uses a permutation test to assess the null hypothesis that the two descending branches form a single cluster of units. At the end of the procedure, it returns a partition of the data into clusters. This paper focuses on the interpretation of the test statistic using a data-driven approach, and uses a real dataset to show the details of the test statistic and the algorithm in action. The working principle of DESPOTA is presented in light of the Lance–Williams recurrence formula, which covers all types of agglomeration methods.
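
The root-to-leaves visit described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `merge`, `detect_partition` and `rejects_single_cluster` are hypothetical, and the permutation test is replaced by a hand-written stub that simply says at which nodes the null hypothesis of a single cluster is rejected. Only the traversal logic is shown: a node whose test rejects the null is split into its two branches, while any other node is returned as a cluster.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A dendrogram node: a single unit (leaf) or the merge of two branches."""
    leaves: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def merge(left: Node, right: Node) -> Node:
    """Join two branches into their parent node."""
    return Node(leaves=left.leaves + right.leaves, left=left, right=right)


def detect_partition(
    root: Node,
    rejects_single_cluster: Callable[[Node], bool],
) -> List[List[str]]:
    """Visit nodes from the root toward the leaves.

    `rejects_single_cluster(node)` stands in for the node-level test
    (assumption: it returns True when the hypothesis that the two
    branches form one cluster is rejected). Rejected nodes are split;
    all other nodes are kept whole as clusters.
    """
    clusters: List[List[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.is_leaf and rejects_single_cluster(node):
            # Push the right branch first so the left one is visited first.
            stack.append(node.right)
            stack.append(node.left)
        else:
            clusters.append(node.leaves)
    return clusters


# Toy dendrogram over five units: ((a, b), ((c, d), e)).
a, b, c, d, e = (Node([name]) for name in "abcde")
ab = merge(a, b)
cd = merge(c, d)
cde = merge(cd, e)
root = merge(ab, cde)

# Stub decision rule (purely illustrative): the test is assumed to
# reject the null only at the root and at the node joining (c, d) with e.
rejected = {tuple(root.leaves), tuple(cde.leaves)}
partition = detect_partition(root, lambda n: tuple(n.leaves) in rejected)
print(partition)
```

In a real application the stub would be replaced by a function that computes a test statistic at the node and compares it with its permutation distribution at a chosen significance level. Because each branch is tested separately, the clusters returned need not all be cut at the same height of the tree.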

Author information

Authors and Affiliations

Dip.to di Economia e Giurisprudenza – Università degli Studi di Cassino e del Lazio Meridionale, Via S. Angelo S.N. – Località Folcara, Cassino (FR), Italy
Davide Passaretti & Domenico Vistocco

Corresponding author

Correspondence to Domenico Vistocco.

Editor information

Editors and Affiliations

Dipartimento di Scienze Economiche e Statistiche, Università degli Studi di Salerno, Fisciano, Salerno, Italy
Cira Perna
Dipartimento di Economia e Management, Università degli Studi di Pisa, Pisa, Italy
Monica Pratesi
Toulouse School of Economics, University of Toulouse, Toulouse Cedex 6, France
Anne Ruiz-Gazen

About this paper

Cite this paper

Passaretti, D., Vistocco, D. (2018). DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram. In: Perna, C., Pratesi, M., Ruiz-Gazen, A. (eds) Studies in Theoretical and Applied Statistics. SIS 2016. Springer Proceedings in Mathematics & Statistics, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-73906-9_8

DOI: https://doi.org/10.1007/978-3-319-73906-9_8
Published: 02 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73905-2
Online ISBN: 978-3-319-73906-9
eBook Packages: Mathematics and Statistics (R0)
