A Fitness Distance Correlation Measure for Evolutionary Trees
Phylogenetics is concerned with inferring the genealogical relationships among a group of organisms (or taxa), and these relationships are usually expressed as an evolutionary tree. However, inferring a phylogenetic tree is not a trivial task, since the true evolutionary history of a set of organisms cannot be known. As a result, most phylogenetic analyses rely on heuristics to obtain accurate trees. These heuristics use the tree score as the basis for judging how accurately a tree depicts evolutionary relationships. Relatively little work has been done to analyze the relationship between improving tree scores (fitness) and topological accuracy (distance). In this paper, we present a new fitness-distance correlation coefficient, called rfd, to quantify this relationship for evolutionary trees. Applying this measure to three biological datasets consisting of 44, 60, and 174 taxa, we show that improvements in fitness are strongly correlated (rfd > 0.8) with topological accuracy relative to the best-tree-overall. Moreover, we investigated the use of the rfd coefficient when the best overall tree is not available and found similar results. Thus, our results show that rfd is a robust measure with several potential applications, such as the development of stopping criteria for phylogenetic search.
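The abstract does not spell out how rfd is computed. As a rough illustration only, the sketch below assumes it takes the classical fitness-distance correlation form, that is, the Pearson correlation between each tree's score and its topological distance to a reference tree. The function name, the input lists, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def fitness_distance_correlation(fitness, distance):
    """Pearson correlation between tree scores and distances to a reference tree.

    Assumption: rfd has the classical form cov(f, d) / (sd_f * sd_d).
    The paper's exact definition is not given in this excerpt.
    """
    n = len(fitness)
    if n != len(distance):
        raise ValueError("fitness and distance must have the same length")
    if n < 2:
        raise ValueError("at least two trees are required")

    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n

    # Population covariance and standard deviations (divide by n).
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitness, distance)) / n
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitness) / n)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in distance) / n)

    if sd_f == 0 or sd_d == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return cov / (sd_f * sd_d)


# Hypothetical values: scores of trees visited during a search, and each
# tree's topological distance to the best tree found overall.
scores = [1250.0, 1238.0, 1231.0, 1227.0, 1225.0]
distances = [40, 28, 17, 9, 0]
r = fitness_distance_correlation(scores, distances)
```

The sign of the coefficient depends on the scoring convention: when lower scores are better (as in parsimony), scores that fall as the distance to the reference tree shrinks give a positive value.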