A Fitness Distance Correlation Measure for Evolutionary Trees

  • Hyun Jung Park
  • Tiffani L. Williams
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5462)

Abstract

Phylogenetics is concerned with inferring the genealogical relationships among a group of organisms (or taxa), and these relationships are usually expressed as an evolutionary tree. However, inferring the phylogenetic tree is not a trivial task since it is impossible to know the true evolutionary history for a set of organisms. As a result, most phylogenetic analyses rely on effective heuristics for obtaining accurate trees. These heuristics use the tree score as a basis for establishing an accurate depiction of evolutionary relationships. Relatively little work has been done to analyze the relationship between improving tree scores (fitness) and topological accuracy (distance). In this paper, we present a new fitness-distance correlation coefficient called rfd to quantify this relationship for evolutionary trees. By applying this measure to three biological datasets consisting of 44, 60, and 174 taxa, our results show that improvements in fitness are strongly correlated (rfd > 0.8) with topological accuracy relative to the best overall tree. Moreover, we investigated the use of the rfd coefficient when the best overall tree is not available and found similar results. Thus, our results show that rfd is a robust measure with several potential applications, such as the development of stopping criteria for phylogenetic search.
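
The abstract does not give the formula for rfd, so the following is only a minimal sketch of the general idea, assuming a plain Pearson correlation between tree scores and topological distances to a reference tree, as in the classic fitness-distance correlation from the genetic-algorithms literature. The function name `fitness_distance_correlation`, the sample scores, and the sample distances are all hypothetical and are not taken from the paper; the distances are assumed to come from some tree-to-tree metric such as the Robinson-Foulds distance.

```python
from math import sqrt


def fitness_distance_correlation(fitness, distance):
    """Pearson correlation between fitness values and distances.

    Assumption: this is the generic fitness-distance correlation,
    not necessarily the exact rfd definition used in the paper.
    """
    if len(fitness) != len(distance) or len(fitness) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(fitness)
    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitness, distance))
    var_f = sum((f - mean_f) ** 2 for f in fitness)
    var_d = sum((d - mean_d) ** 2 for d in distance)
    if var_f == 0 or var_d == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_f * var_d)


# Hypothetical search trajectory: parsimony-style scores (lower is better)
# and distances from each tree to the best tree found overall.
scores = [1050, 1042, 1038, 1031, 1030]
dist_to_best = [40, 31, 22, 8, 0]

r = fitness_distance_correlation(scores, dist_to_best)
print(f"fitness-distance correlation: {r:.3f}")
```

With scores where lower is better, a value near +1 in this sketch means that trees with better scores also tend to be topologically closer to the reference tree, which is the kind of relationship the abstract reports.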

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hyun Jung Park (1)
  • Tiffani L. Williams (2)
  1. Department of Computer Science, Rice University, USA
  2. Department of Computer Science and Engineering, Texas A&M University, USA
