DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram

  • Conference paper
Studies in Theoretical and Applied Statistics (SIS 2016)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 227)

Abstract

DESPOTA is a method for finding the best partition among those hosted in a dendrogram. The algorithm visits nodes from the root of the tree toward the leaves. At each node, it uses a permutation test to assess the null hypothesis that the two descending branches sustain only one cluster of units. At the end of the procedure, it returns a partition of the data into clusters. This paper interprets the test statistic through a data-driven approach, using a real dataset to show the details of the statistic and the algorithm in action. The working principle of DESPOTA is presented in light of the Lance–Williams recurrence formula, which embeds all types of agglomeration methods.
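
The two ingredients named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Node` class, `top_down_partition`, and the `rejects_single_cluster` callback are hypothetical names, and the callback merely stands in for the permutation test, whose statistic the abstract does not specify. The stopping rule (a node becomes a cluster as soon as the null hypothesis is not rejected) is likewise an assumption drawn from the description above. The `lance_williams` function is the standard Lance–Williams update, d(k, i∪j) = α_i d(k,i) + α_j d(k,j) + β d(i,j) + γ |d(k,i) − d(k,j)|.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical dendrogram node: a leaf has no children, an internal node has two."""
    units: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def top_down_partition(
    root: Node,
    rejects_single_cluster: Callable[[Node], bool],
) -> List[List[int]]:
    """Visit nodes from the root toward the leaves and return a partition.

    rejects_single_cluster(node) is a stand-in for the permutation test:
    it returns True when the null hypothesis "the two descending branches
    sustain only one cluster" is rejected at that node.
    """
    clusters: List[List[int]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf or not rejects_single_cluster(node):
            # Null not rejected (or nothing left to split): keep as one cluster.
            clusters.append(sorted(node.units))
        else:
            # Null rejected: test each descending branch in turn.
            # The right child is pushed first so the left one is visited first.
            stack.append(node.right)
            stack.append(node.left)
    return clusters


def lance_williams(d_ki: float, d_kj: float, d_ij: float,
                   alpha_i: float, alpha_j: float,
                   beta: float, gamma: float) -> float:
    """Dissimilarity between cluster k and the merger of clusters i and j."""
    return (alpha_i * d_ki + alpha_j * d_kj
            + beta * d_ij + gamma * abs(d_ki - d_kj))
```

With α_i = α_j = 1/2, β = 0 and γ = −1/2 the update reduces to single linkage (the minimum of the two dissimilarities), and switching to γ = +1/2 gives complete linkage (the maximum). Any test that returns a reject or do-not-reject decision per node can be passed as `rejects_single_cluster`.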

Corresponding author

Correspondence to Domenico Vistocco.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Passaretti, D., Vistocco, D. (2018). DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram. In: Perna, C., Pratesi, M., Ruiz-Gazen, A. (eds) Studies in Theoretical and Applied Statistics. SIS 2016. Springer Proceedings in Mathematics & Statistics, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-73906-9_8
