Rearrangements in Phylogenetic Inference: Compare, Model, or Encode?
- Bernard M. E. Moret, Laboratory for Computational Biology and Bioinformatics, EPFL
- Yu Lin, Laboratory for Computational Biology and Bioinformatics, EPFL
- Jijun Tang, Department of Computer Science and Engineering, University of South Carolina
We survey phylogenetic inference from rearrangement data, as viewed through the lens of the work of our group in this area, in tribute to David Sankoff, pioneer and mentor.
Genomic rearrangements were first used for phylogenetic analysis in the late 1920s, but it was not until the 1990s that this approach was revived, with the advent of genome sequencing. G. Watterson et al. proposed to measure the inversion distance between two genomes, J. Palmer et al. studied the evolution of mitochondrial and chloroplast genomes, and D. Sankoff and W. Day published the first algorithmic paper on phylogenetic inference from rearrangement data, giving rise to a fertile field of mathematical, algorithmic, and biological research.
Distance measures for sequence data are simple to define, but those based on rearrangements proved to be complex mathematical objects. The first approaches for phylogenetic inference from rearrangement data, due to D. Sankoff, used model-free distances, such as synteny (colocation on a chromosome) or breakpoints (disrupted adjacencies). The development of algorithms for distance and median computations led to modeling approaches based on biological mechanisms. However, the multiplicity of such mechanisms poses serious challenges. A unifying framework, proposed by S. Yancopoulos et al. and popularized by D. Sankoff, has supported major advances, such as precise distance corrections and efficient algorithms for median estimation, thereby enabling phylogenetic inference using both distance and maximum-parsimony methods.
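To make the notion of a breakpoint (a disrupted adjacency) concrete, the following is a minimal sketch rather than any published implementation. It assumes two signed linear genomes on the same genes, labeled 1..n, framed by hypothetical caps 0 and n+1; the function names are illustrative only.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a signed linear genome.

    Assumes genes are labeled 1..n with a sign for orientation. The genome is
    framed with caps 0 and n+1 (an assumption of this sketch). Each adjacency
    (a, b) is stored in a canonical form so that the same pair read on the
    opposite strand, (-b, -a), maps to the same key.
    """
    n = len(genome)
    framed = [0] + list(genome) + [n + 1]
    return {min((a, b), (-b, -a)) for a, b in zip(framed, framed[1:])}


def breakpoint_distance(g1, g2):
    """Count the adjacencies of g1 that are disrupted (absent) in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


identity = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
print(breakpoint_distance(identity, inverted))
```

A single inversion disrupts the two adjacencies at its endpoints and preserves those inside the inverted segment, which is why the count is model-free: it records disruption without reference to the mechanism that caused it.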
Likelihood-based methods outperform distance and maximum-parsimony methods, but using such methods with rearrangements has proved problematic. Thus we have returned to an approach we first proposed 12 years ago: encoding the genome structure into sequences and using likelihood methods on these sequences. With a suitable bias in the ground probabilities, we attain levels of performance comparable to the best sequence-based methods. Unsurprisingly, the idea of injecting such a bias was first proposed by D. Sankoff.
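One way to picture encoding genome structure into sequences is a binary presence/absence matrix over adjacencies. The sketch below is an illustration under stated assumptions, not necessarily the encoding used in the chapter: genomes are signed linear gene orders without caps, each adjacency seen in any genome becomes one column, and the names `canonical`, `adjacency_set`, and `encode_genomes` are hypothetical. The bias in the ground probabilities mentioned above belongs to the likelihood model and is not represented here.

```python
def canonical(a, b):
    """Strand-independent key for the adjacency (a, b)."""
    return min((a, b), (-b, -a))


def adjacency_set(genome):
    """Adjacencies of a signed linear genome (no caps, by assumption)."""
    return {canonical(a, b) for a, b in zip(genome, genome[1:])}


def encode_genomes(genomes):
    """Encode each genome as a 0/1 string over all observed adjacencies.

    genomes: dict mapping a genome name to a list of signed integers.
    Returns (columns, sequences): the sorted adjacency list that defines the
    column order, and a dict mapping each name to its binary sequence.
    """
    sets = {name: adjacency_set(g) for name, g in genomes.items()}
    columns = sorted(set().union(*sets.values()))
    sequences = {
        name: "".join("1" if col in s else "0" for col in columns)
        for name, s in sets.items()
    }
    return columns, sequences


columns, sequences = encode_genomes({"A": [1, 2, 3], "B": [1, -2, 3]})
for name, seq in sequences.items():
    print(name, seq)
```

The resulting strings all have the same length, so they can be handed to software that expects aligned binary characters.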
- Book Title: Models and Algorithms for Genome Evolution
- Book Part: Part II
- Pages: 147-171
- Series Title: Computational Biology
- Springer London
- Copyright Holder: Springer-Verlag London
- Editor Affiliations
- 5. Department of Mathematics, Simon Fraser University
- 6. Computer Science and Operations Research, University of Montreal
- 7. Biometry and Evolutionary Biology, INRIA Rhône-Alpes, University of Lyon
- Author Affiliations
- 8. Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Station 14, 1015, Lausanne, Switzerland
- 9. Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29208, USA