Distributions of topological tree metrics between a species tree and a gene tree

  • Jing Xi
  • Jin Xie
  • Ruriko Yoshida


To conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method for analyzing discordance between gene trees, it is key to choose a distance between trees that is both “biologically meaningful” and “statistically well-distributed”. Thus, in this paper, we study the distributions of three tree distance metrics between two trees: the edge difference, the path difference, and the precise K interval cospeciation distance. First, we focus on the distributions of the three tree distances between two random unrooted trees with \(n\) leaves (\(n \ge 4\)). Then we focus on the distributions of the three tree distances between a fixed rooted species tree with \(n\) leaves and a random gene tree with \(n\) leaves generated under the coalescent process with the given species tree. We present some theoretical results as well as a simulation study on these distributions.
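As a rough illustration of one of the metrics named above, the following Python sketch computes a path difference between two unrooted trees on the same leaf set. The abstract does not define the metric, so the sketch assumes a commonly used definition: the Euclidean norm of the difference between the two trees' vectors of leaf-to-leaf path lengths, counted in edges. The function names `leaf_path_lengths` and `path_difference` and the edge-list input format are hypothetical choices made for this example only.

```python
from collections import deque
from math import sqrt


def leaf_path_lengths(edges):
    """Return {(leaf_a, leaf_b): number of edges on the path} for a tree.

    `edges` is a list of (node, node) pairs; leaves are the degree-1 nodes.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(n for n, nb in adj.items() if len(nb) == 1)

    dist = {}
    for src in leaves:
        # Breadth-first search gives edge counts from src to every node.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = seen[dst]
    return dist


def path_difference(edges1, edges2):
    """Assumed definition: Euclidean norm of the difference of the
    leaf-to-leaf path-length vectors of the two trees."""
    d1 = leaf_path_lengths(edges1)
    d2 = leaf_path_lengths(edges2)
    if d1.keys() != d2.keys():
        raise ValueError("trees must share the same leaf set")
    return sqrt(sum((d1[p] - d2[p]) ** 2 for p in d1))


# Two quartet trees on leaves a, b, c, d; "u" and "v" are internal nodes.
t1 = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # ab | cd
t2 = [("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")]  # ac | bd
print(path_difference(t1, t2))  # 2.0
```

For these two quartets, four of the six leaf pairs change their path length by one edge, so the distance is \(\sqrt{4} = 2\). Sampling many random trees and tabulating this value would give an empirical version of the kind of distribution the paper studies, under the same assumed definition.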


Keywords: Coalescent, Phylogenetics, Tree metrics, Tree topologies



The authors would like to thank the referees for very useful comments to improve the manuscript.



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2016

Authors and Affiliations

  1. Department of Mathematics, North Carolina State University, Raleigh, USA
  2. Statistics Department, University of Kentucky, Lexington, USA
  3. Statistics Department, University of Kentucky, Lexington, USA
