DESPOTA: DEndrogram Slicing through a PermutatiOn Test Approach

Bruzzese, Dario; Vistocco, Domenico

doi:10.1007/s00357-015-9179-x

Published: 08 July 2015

Volume 32, pages 285–304, (2015)

Journal of Classification

Dario Bruzzese¹ &
Domenico Vistocco²

Abstract

Hierarchical clustering is one of the most widespread analytical approaches to classification problems, mainly because of the visual power of its associated graphical representation, the dendrogram. Even so, choosing an appropriate number of clusters remains the main difficulty for the final user. We introduce DESPOTA (DEndrogram Slicing through a PermutatiOn Test Approach), a novel approach that exploits permutation tests to automatically detect a partition among those embedded in a dendrogram. Unlike the traditional approach, DESPOTA also includes in its search space partitions that do not correspond to horizontal cuts of the dendrogram. Applications to both real and synthetic datasets show the effectiveness of our proposal.
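
The difference between horizontal cuts and the wider set of partitions embedded in a dendrogram can be illustrated with a small sketch. It is not the authors' code and does not implement the permutation test. It only enumerates the search space on a toy tree. The nested-tuple representation, the merge heights, and all function names are assumptions made for this example.

```python
from itertools import product

# Toy dendrogram as nested tuples (merge_height, left, right); leaves are labels.
# The shape and the heights are invented for illustration only.
TOY = (5.0, (1.0, "a", "b"), (2.0, "c", "d"))


def leaves(node):
    """Return the leaf labels below a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def merge_heights(node):
    """Return the merge heights of all internal nodes."""
    if isinstance(node, str):
        return []
    height, left, right = node
    return [height] + merge_heights(left) + merge_heights(right)


def embedded_partitions(node):
    """All partitions whose clusters are whole subtrees of the dendrogram."""
    if isinstance(node, str):
        return [[[node]]]
    _, left, right = node
    partitions = [[leaves(node)]]  # keep this node as a single cluster
    # ...or split it and combine every partition of the two branches
    for p_left, p_right in product(embedded_partitions(left),
                                   embedded_partitions(right)):
        partitions.append(p_left + p_right)
    return partitions


def horizontal_cut(node, threshold):
    """Partition obtained by cutting every branch at the same height."""
    if isinstance(node, str) or node[0] <= threshold:
        return [leaves(node)]
    return horizontal_cut(node[1], threshold) + horizontal_cut(node[2], threshold)


def horizontal_partitions(node):
    """One partition per distinct merge height, plus the all-singletons cut."""
    thresholds = [float("-inf")] + sorted(set(merge_heights(node)))
    return [horizontal_cut(node, t) for t in thresholds]


def canonical(partition):
    """Order-independent form of a partition, used for comparison."""
    return frozenset(frozenset(cluster) for cluster in partition)


embedded = embedded_partitions(TOY)
horizontal = horizontal_partitions(TOY)
horizontal_keys = {canonical(p) for p in horizontal}
non_horizontal = [p for p in embedded if canonical(p) not in horizontal_keys]

print(len(embedded), len(horizontal))
print(non_horizontal)
```

On this toy tree the enumeration finds five embedded partitions against four horizontal cuts. The extra partition keeps c and d together while separating a from b, which no single cutting height can produce because a and b merge lower than c and d.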

References

BANFIELD, J.D., and RAFTERY, A.E. (1993), “Model Based Gaussian and Non Gaussian Clustering”, Biometrics, 49, 803–821.
CALIŃSKI, T., and HARABASZ, J. (1974), “A Dendrite Method for Cluster Analysis”, Communications in Statistics, 3, 1–27.
CHARRAD, M., GHAZZALI, N., BOITEAU, V., HUBERT, M., and NIKNAFS, A. (2013), An Examination of Indices for Determining the Number of Clusters: NbClust Package, R Package Version 1.3.
DUDA, R.O., and HART, P.E. (1973), Pattern Classification and Scene Analysis, New York: Wiley.
EVERITT, B., LANDAU, M., and LEESE, M. (2001), Cluster Analysis (4th ed.), London: Arnold.
GOOD, P.I. (1994), Permutations Tests for Testing Hypotheses, New York: Springer-Verlag.
GURRUTXAGA, I., ALBISUA, I., ARBELAITZ, O., MARTÍN, J.I., MUGUERZA, J., PÉREZ, J.M., and PERONA, I. (2010), “SEP/COP: An Efficient Method to Find the Best Partition in Hierarchical Clustering Based on a New Cluster Validity Index”, Pattern Recognition, 43(10), 3364–3373.
HOCHBERG, Y. (1988), “A Sharper Bonferroni Procedure for Multiple Tests of Significance”, Biometrika, 75, 800–802.
HOLM, S. (1979), “A Simple Sequentially Rejective Multiple Testing Procedure”, Scandinavian Journal of Statistics, 6, 65–70.
HORTON, P., and NAKAI, K. (1996), “A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins”, Proceedings of the International Conference on Intelligent Systems for Molecular Biology, 4, 109–115.
HUBERT, L.J., and LEVIN, J.R. (1976), “A General Statistical Framework for Assessing Categorical Clustering in Free Recall”, Psychological Bulletin, 83, 1072–1080.
HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2, 193–218.
JOHNSON, R.A., and WICHERN, D.W. (1982), Applied Multivariate Statistical Analysis, Upper Saddle River, NJ: Prentice Hall.
KIM, M., and RAMAKRISHNA, R.S. (2005), “New Indices for Cluster Validity Assessment”, Pattern Recognition Letters, 26(15), 2353–2363.
KUIPER, F.K., and FISHER, L. (1975), “A Monte Carlo Comparison of Six Clustering Procedures”, Biometrics, 31, 777–783.
LAGO-FERNÁNDEZ, L.F., and CORBACHO, F. (2010), “Normality-Based Validation for Crisp Clustering”, Pattern Recognition, 43, 782–795.
LIU, Y., HAYES, D.N., NOBEL, A., and MARRON, J.S. (2008), “Statistical Significance of Clustering for High-Dimension, Low-Sample Size Data”, Journal of the American Statistical Association, 103(483), 1281–1293.
MAECHLER, M., ROUSSEEUW, P., STRUYF, A., HUBERT, M., and HORNIK, K. (2011), Cluster: Cluster Analysis Basics and Extensions, R Package Version 1.14.1.
MILLIGAN, G.W. (1981), “A Monte Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis”, Psychometrika, 46(2), 187–199.
MILLIGAN, G.W., and COOPER, M.C. (1985), “An Examination of Procedures for Determining the Number of Clusters in a Dataset”, Psychometrika, 50(2), 159–179.
PARK, P.J., MANJOURIDES, J., BONETTI, M., and PAGANO, M. (2009), “A Permutation Test for Determining Significance of Clusters with Applications to Spatial and Gene Expression Data”, Computational Statistics and Data Analysis, 53(12), 4290–4300.
PESARIN, F., and SALMASO, L. (2010), Permutation Tests for Complex Data. Theory, Applications and Software, Chichester: John Wiley and Sons.
QIU, W.L., and JOE, H. (2006), “Separation Index and Partial Membership for Clustering”, Computational Statistics and Data Analysis, 50, 585–603.
QIU, W.L., and JOE, H. (2006), “Generation of Random Clusters with Specified Degree of Separation”, Journal of Classification, 23(2), 315–334.
QIU, W.L., and JOE, H. (2009), ClusterGeneration: Random Cluster Generation (with Specified Degree of Separation), R Package Version 1.2.7.
R DEVELOPMENT CORE TEAM (2010), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org.
ROMANO, J.P., SHAIKH, A.M., and WOLF, M. (2008), “Formalized Data Snooping Based on Generalized Error Rates”, Econometric Theory, 24, 404–447.
SUZUKI, R., and SHIMODAIRA, H. (2011), pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling, R Package Version 1.2-2, http://CRAN.R-project.org/package=pvclust.
SHIMODAIRA, H. (2004), “Approximately Unbiased Tests of Regions Using Multistep-Multiscale Bootstrap Resampling”, Annals of Statistics, 32, 2616–2641.
STEINLEY, D. (2004), “Properties of the Hubert-Arabie Adjusted Rand Index”, Psychological Methods, 9(3), 386–396.
TIBSHIRANI, R., WALTHER, G., and HASTIE, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic”, Journal of the Royal Statistical Society B, 63(2), 411–423.
WARRENS, M.J. (2008), “On the Equivalence of Cohen’s Kappa and the Hubert-Arabie Adjusted Rand Index”, Journal of Classification, 25, 177–183.
WICKHAM, H. (2009), ggplot2: Elegant Graphics for Data Analysis, New York: Springer.
WISHART, D. (1969), “An Algorithm for Hierarchical Classification”, Biometrics, 25, 165–170.
WU, K.-L., YANG, M.-S., and HSIEH, J.-N. (2009), “Robust Cluster Validity Indexes”, Pattern Recognition, 42(11), 2541–2550.

Author information

Authors and Affiliations

Department of Public Health, University of Naples “Federico II”, Via S. Pansini, 5, I-80131, Naples, Italy
Dario Bruzzese
Department of Economics and Law, University of Cassino, Via S. Angelo, Località Folcara, I-03043, Cassino, Frosinone, Italy
Domenico Vistocco

Corresponding author

Correspondence to Dario Bruzzese.

Additional information

The authors wish to thank Professor Ibai Gurrutxaga and his colleagues for kindly providing the data and the R code used in their paper: this allowed us to make a worthwhile comparison of the two methods. The authors are also grateful to Professor Jaromir Antoch for helpful comments on a previous draft of the paper and the three anonymous referees for their valuable suggestions which helped us to improve the final version of this paper.

All computation and graphics were done in the R language (R Development Core Team 2010) using the basic packages and the additional cluster (Maechler et al. 2011), ggplot2 (Wickham 2009) and NbClust (Charrad et al. 2013) packages.

About this article

Cite this article

Bruzzese, D., Vistocco, D. DESPOTA: DEndrogram Slicing through a PermutatiOn Test Approach. J Classif 32, 285–304 (2015). https://doi.org/10.1007/s00357-015-9179-x

Published: 08 July 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s00357-015-9179-x
