Journal of Classification, Volume 8, Issue 2, pp 177–200

The generation of random ultrametric matrices representing dendrograms

  • François-Joseph Lapointe
  • Pierre Legendre

Abstract

Many methods and algorithms for generating random trees of various kinds have been proposed in the literature. No procedure exists, however, for generating dendrograms with randomized fusion levels. Randomized dendrograms can be obtained by randomizing the associated cophenetic matrix. Two algorithms are described. The first generates completely random dendrograms, i.e., trees with a random topology, random fusion level values, and random assignment of the labels. The second uses a double-permutation procedure to randomize a given dendrogram; it permutes the fixed fusion levels of that dendrogram instead of drawing random fusion level values. A proof is presented that the double-permutation procedure is a Uniform Random Generation Algorithm sensu Furnas (1984), and a complete example is given.
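The two procedures summarized above can be illustrated with a minimal Python sketch. It rests on an assumption that the abstract does not state: the n − 1 fusion levels are placed between consecutive positions of an ordered list of n objects, and the cophenetic value of any pair is the largest level lying between them, which always yields an ultrametric matrix. The function names (`ultrametric_from_chain`, `relabel`, `random_dendrogram_matrix`, `double_permutation`, `is_ultrametric`) are hypothetical, and the sketch makes no claim to reproduce the uniformity property proved in the paper.

```python
import random


def ultrametric_from_chain(levels):
    """Build an n x n ultrametric matrix from n-1 fusion levels.

    Assumption: levels[k] sits between positions k and k+1, and
    u[i][j] is the maximum level found between positions i and j.
    """
    n = len(levels) + 1
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        running_max = 0.0
        for j in range(i + 1, n):
            running_max = max(running_max, levels[j - 1])
            u[i][j] = u[j][i] = running_max
    return u


def relabel(matrix, order):
    """Reorder rows and columns of a square matrix according to `order`."""
    n = len(matrix)
    return [[matrix[order[i]][order[j]] for j in range(n)] for i in range(n)]


def random_dendrogram_matrix(n, rng):
    """First kind of procedure: random levels, topology, and labels."""
    levels = [rng.random() for _ in range(n - 1)]  # random fusion levels
    order = list(range(n))
    rng.shuffle(order)                             # random label assignment
    return relabel(ultrametric_from_chain(levels), order)


def double_permutation(fusion_levels, rng):
    """Second kind of procedure: keep the given levels, permute twice."""
    levels = list(fusion_levels)
    rng.shuffle(levels)                            # permutation 1: fusion levels
    order = list(range(len(levels) + 1))
    rng.shuffle(order)                             # permutation 2: labels
    return relabel(ultrametric_from_chain(levels), order)


def is_ultrametric(m, tol=1e-12):
    """Check u[i][j] <= max(u[i][k], u[k][j]) for every triple."""
    n = len(m)
    return all(
        m[i][j] <= max(m[i][k], m[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )
```

For example, `double_permutation([1.0, 2.0, 3.0, 4.0], random.Random(0))` returns a 5 × 5 symmetric matrix whose off-diagonal entries are drawn only from the four supplied fusion levels, so the randomized dendrogram keeps the fusion levels of the original.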

Keywords

Random dendrograms; Random matrices; Uniform sampling; Tree algorithm; Monte Carlo studies; Clustering methodology

Résumé

The literature contains several methods and algorithms for generating random trees of all kinds. However, no procedure exists for generating dendrograms with random fusion levels. Such dendrograms can be obtained from the associated cophenetic matrices. We describe two algorithms for doing so. The first generates completely random dendrograms, that is, trees with a random topology, random fusion levels, and randomly labeled leaves. The second algorithm uses a double-permutation procedure to randomize a given dendrogram; in this case the actual fusion levels are permuted instead of random levels being generated. We present a proof that the double-permutation procedure is a Uniform Random Generation Algorithm sensu Furnas (1984). A complete example is also provided.

References

  1. COLLESS, D. H. (1980), “Congruence Between Morphometric and Allozyme Data for Menidia Species: a Reappraisal,” Systematic Zoology, 29, 288–299.
  2. DE SOETE, G. (1984), “Ultrametric Tree Representations of Incomplete Dissimilarity Data,” Journal of Classification, 1, 235–242.
  3. FAITH, D. P., and BELBIN, L. (1986), “Comparison of Classifications Using Measures Intermediate Between Metric Dissimilarity and Consensus Similarity,” Journal of Classification, 3, 257–280.
  4. FELSENSTEIN, J. (1978), “The Number of Evolutionary Trees,” Systematic Zoology, 27, 27–33.
  5. FELSENSTEIN, J. (1985), “Confidence Limits on Phylogenies: An Approach Using the Bootstrap,” Evolution, 39, 783–791.
  6. FRANK, O., and SVENSSON, K. (1981), “On Probability Distributions of Single-Linkage Dendrograms,” Journal of Statistics and Computer Simulation, 12, 121–131.
  7. FURNAS, G. W. (1984), “The Generation of Random, Binary Unordered Trees,” Journal of Classification, 1, 187–233.
  8. GÖBEL, F. (1980), “On a 1-1 Correspondence Between Rooted Trees and Natural Numbers,” Journal of Combinatorial Theory, Series B, 29, 141–143.
  9. GOWER, J. C., and LEGENDRE, P. (1986), “Metric and Euclidean Properties of Dissimilarity Coefficients,” Journal of Classification, 3, 5–48.
  10. GUENOCHE, A. (1983), “Random Spanning Trees,” Journal of Algorithms, 4, 214–220.
  11. HAJDU, L. J. (1981), “Graphical Comparison of Resemblance Measures in Phytosociology,” Vegetatio, 48, 47–59.
  12. HARDING, E. F. (1971), “The Probabilities of Rooted Tree-Shapes Generated by Random Bifurcation,” Advances in Applied Probability, 3, 44–77.
  13. HARTIGAN, J. A. (1967), “Representation of Similarity Matrices by Trees,” Journal of the American Statistical Association, 62, 1140–1158.
  14. KNOTT, G. D. (1977), “A Numbering System for Binary Trees,” Communication of the Association for Computing Machinery, 20(2).
  15. LAPOINTE, F.-J., and LEGENDRE, P. (1990), “A Statistical Framework to Test the Consensus of Two Nested Classifications,” Systematic Zoology, 39, 1–14.
  16. LAPOINTE, F.-J., and LEGENDRE, P. (submitted), “A Statistical Framework to Test the Consensus among Additive Trees (Cladograms).”
  17. MICKEVICH, M. F. (1978), “Taxonomic Congruence,” Systematic Zoology, 27, 143–158.
  18. MURTAGH, F. (1983), “A Probability Theory of Hierarchic Clustering Using Random Dendrograms,” Journal of Statistics and Computer Simulation, 18, 145–157.
  19. MURTAGH, F. (1984), “Counting Dendrograms: A Survey,” Discrete Applied Mathematics, 7, 191–199.
  20. NEMEC, A. F. L., and BRINKHURST, R. O. (1988), “Using the Bootstrap to Assess Statistical Significance in the Cluster Analysis of Species Abundance Data,” Canadian Journal of Fisheries and Aquatic Sciences, 45, 965–970.
  21. NIJENHUIS, A., and WILF, H. S. (Eds.) (1978), Combinatorial Algorithms for Computers and Calculators, Second Edition, New York: Academic Press.
  22. ODEN, N. L., and SHAO, K. T. (1984), “An Algorithm to Equiprobably Generate All Directed Trees With k Labeled Terminal Nodes and Unlabeled Interior Nodes,” Bulletin of Mathematical Biology, 46, 379–387.
  23. PAGE, R. D. M. (1988), “Quantitative Cladistic Biogeography: Constructing and Comparing Area Cladograms,” Systematic Zoology, 37, 254–270.
  24. PAGE, R. D. M. (1990), “Temporal Congruence and Cladistic Analysis of Biogeography and Cospeciation,” Systematic Zoology, 39, 205–226.
  25. PHIPPS, J. B. (1975), “The Numbers of Classifications,” Canadian Journal of Botany, 54, 686–688.
  26. PROSKUROWSKI, A. (1980), “On the Generation of Binary Trees,” Journal of the Association for Computing Machinery, 27, 1–2.
  27. QUIROZ, A. J. (1989), “Fast Random Generation of Binary, t-ary and Other Types of Trees,” Journal of Classification, 6, 223–231.
  28. ROHLF, F. J. (1983), “Numbering Binary Trees With Labeled Terminal Vertices,” Bulletin of Mathematical Biology, 45, 33–40.
  29. ROSEN, D. E. (1978), “Vicariant Patterns and Historical Explanation in Biogeography,” Systematic Zoology, 27, 159–188.
  30. ROTEM, D., and VAROL, Y. L. (1978), “Generation of Binary Trees from Ballot Sequences,” Journal of the Association for Computing Machinery, 25, 396–404.
  31. SHAO, K., and ROHLF, F. J. (1983), “Sampling Distributions of Consensus Indices when all Bifurcating Trees are Equally Likely,” in Numerical Taxonomy, ed. J. Felsenstein, NATO Advanced Studies Institute, Ser. G. (Ecological Sciences) 1, Berlin: Springer Verlag, 132–137.
  32. SHAO, K., and SOKAL, R. R. (1986), “Significance Tests of Consensus Indices,” Systematic Zoology, 35, 582–590.
  33. SIBSON, R. (1972), “Order Invariant Methods for Data Analysis,” Journal of the Royal Statistical Society, B 34, 311–349.
  34. SNEATH, P. H. A., and SOKAL, R. R. (1973), Numerical Taxonomy, San Francisco: Freeman.
  35. SOKAL, R. R., and ROHLF, F. J. (1962), “The Comparison of Dendrograms by Objective Methods,” Taxon, 11, 33–40.
  36. SOLOMON, M., and FINKEL, R. A. (1980), “A Note on Enumerating Binary Trees,” Journal of the Association for Computing Machinery, 27, 3–5.

Copyright information

© Springer-Verlag 1991

Authors and Affiliations

  • François-Joseph Lapointe (1)
  • Pierre Legendre (1)
  1. Département de Sciences biologiques, Université de Montréal, Montréal, Canada
