Journal of Classification, Volume 12, Issue 2, pp 265–282

Comparison tests for dendrograms: A comparative evaluation

  • François-Joseph Lapointe
  • Pierre Legendre

Abstract

Classifications are generally pictured in the form of hierarchical trees, also called dendrograms. A dendrogram is the graphical representation of an ultrametric (=cophenetic) matrix, so dendrograms can be compared to one another by comparing their cophenetic matrices. Three methods used in testing the correlation between matrices corresponding to dendrograms are evaluated. The three permutational procedures make use of different aspects of the information to compare dendrograms: the Mantel procedure permutes label positions only; the binary tree methods randomize the topology as well; the double-permutation procedure is based on all the information included in a dendrogram, that is, topology, label positions, and cluster heights. Theoretical and empirical investigations of these methods are carried out to evaluate their relative performance. Simulations show that the Mantel test is too conservative when applied to the comparison of dendrograms; the methods of binary tree comparisons do slightly better; only the double-permutation test provides unbiased type I error.
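As an illustration of the simplest of the three procedures, the sketch below runs a label-permuting Mantel test on two small cophenetic matrices. It is not the authors' implementation: the function names (`upper_triangle`, `pearson`, `mantel_test`), the example matrices `A` and `B`, and the number of permutations are all assumptions made for this example. It covers only the label-permutation step; the topology and cluster-height randomizations used by the binary tree and double-permutation tests are not shown.

```python
import random
from math import sqrt

def upper_triangle(m, order):
    """Values above the diagonal of m after relabelling objects by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_test(a, b, n_perm=999, seed=0):
    """One-tailed Mantel test: permute the labels of b, keep a fixed."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    x = upper_triangle(a, identity)
    r_obs = pearson(x, upper_triangle(b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    order = identity[:]
    for _ in range(n_perm):
        rng.shuffle(order)  # rows and columns of b are relabelled together
        if pearson(x, upper_triangle(b, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (n_perm + 1)

# Hypothetical cophenetic (ultrametric) matrices for five objects.
A = [[0, 1, 3, 3, 4],
     [1, 0, 3, 3, 4],
     [3, 3, 0, 2, 4],
     [3, 3, 2, 0, 4],
     [4, 4, 4, 4, 0]]
B = [[0, 1, 2.5, 5, 5],
     [1, 0, 2.5, 5, 5],
     [2.5, 2.5, 0, 5, 5],
     [5, 5, 5, 0, 1.5],
     [5, 5, 5, 1.5, 0]]

r, p = mantel_test(A, B)
print("matrix correlation:", r, "permutational p-value:", p)
```

Because only the labels are shuffled, every permuted matrix keeps the topology and cluster heights of the original dendrogram. This is the restriction that the abstract contrasts with the binary tree and double-permutation procedures.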

Key words

Binary tree; Dendrogram; Classification; Permutation test; Ultrametric tree

Résumé

The trees used to illustrate groupings are generally represented in the form of hierarchical classifications, or dendrograms. A dendrogram is the graphical representation of the information contained in the ultrametric (=cophenetic) matrix corresponding to the classification. Dendrograms can therefore be compared to one another by comparing their corresponding ultrametric matrices. We compare three methods for assessing the statistical significance of the correlation coefficient measured between two ultrametric matrices. These three permutation tests take different aspects into account when comparing dendrograms: the Mantel test permutes the leaves of the tree; the binary tree methods permute the leaves and the topology; and the double-permutation procedure permutes the leaves, the topology, and the fusion levels of the dendrograms being compared. The relative efficiency of the three methods is evaluated empirically and theoretically. Our results suggest that the double-permutation test should be preferred for comparing dendrograms: the Mantel test proves too conservative, while the binary tree methods are not always adequate.

Copyright information

© Springer-Verlag 1995

Authors and Affiliations

  • François-Joseph Lapointe (1)
  • Pierre Legendre (1)
  1. Département de Sciences biologiques, Université de Montréal, Montréal, Canada
