GASTS: Parsimony Scoring under Rearrangements

  • Andrew Wei Xu
  • Bernard M. E. Moret
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6833)

Abstract

The accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, rearrangement analysis leads to NP-hard problems, so current approaches, such as the MGR tool, are limited to small collections of genomes and to low-resolution data of a few hundred syntenic blocks.

We describe the first algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. Our main contribution is GASTS, an algorithm for scoring a fixed phylogenetic tree: given a tree and a collection of genomes, one for each leaf of the tree, each genome given by an ordered list of syntenic blocks, GASTS infers genomes for the internal nodes of the tree so as to minimize the sum, taken over all tree edges, of the pairwise genomic distances between tree nodes. We present the results of extensive testing on both simulated and real data showing that our algorithm runs several orders of magnitude faster than existing approaches and scales up linearly instead of exponentially with the size of the genomes involved; on the small instances that current approaches can complete in a day, our algorithm also returns much better scores. In simulations, our tree scores stay within 0.5% of the model value for trees up to 100 taxa and genomes of up to 10,000 syntenic blocks. GASTS enables us to attack heretofore unapproachable problems, such as accurate ancestral reconstruction of large genomes and phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of vertebrate genomes with over 2,000 syntenic blocks.
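The scoring objective described above (the sum, over all tree edges, of the pairwise distances between the genomes at the two endpoints) can be illustrated with a minimal Python sketch. This is not the GASTS algorithm: it only evaluates the score of a tree whose internal genomes are already assigned, and it uses a simplified breakpoint distance on linear, single-chromosome signed gene orders as a stand-in for whatever genomic distance is actually used. The names `adjacencies`, `breakpoint_distance`, and `tree_score` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a genome is a signed permutation of syntenic blocks,
# e.g. [1, -3, -2, 4], treated as a single linear chromosome.
Genome = List[int]


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    """Return the adjacencies of a linear genome, independent of reading direction."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read left to right is the same adjacency as (-b, -a)
        # read right to left; keep one canonical representative.
        adj.add(max((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1: Genome, g2: Genome) -> int:
    """Simplified breakpoint distance: adjacencies of g1 absent from g2.

    Telomeres are ignored; this is a stand-in distance, not the one
    used by GASTS.
    """
    return len(adjacencies(g1) - adjacencies(g2))


def tree_score(
    edges: List[Tuple[str, str]],
    genomes: Dict[str, Genome],
    distance: Callable[[Genome, Genome], int] = breakpoint_distance,
) -> int:
    """Sum the pairwise distances over all tree edges."""
    return sum(distance(genomes[u], genomes[v]) for u, v in edges)


# A star tree with three leaves and one internal node "M".
leaves = {
    "A": [1, 2, 3, 4],
    "B": [1, -3, -2, 4],
    "C": [1, 2, -3, 4],
}
edges = [("M", "A"), ("M", "B"), ("M", "C")]

# Compare two candidate assignments for the internal node.
score_with_a = tree_score(edges, {**leaves, "M": leaves["A"]})
score_with_b = tree_score(edges, {**leaves, "M": leaves["B"]})
print(score_with_a, score_with_b)  # prints "4 5"
```

In this toy instance, assigning genome A to the internal node gives a lower score than assigning genome B, so A is the better of the two candidates. A scoring algorithm has to search for the best such assignment over all internal nodes at once, which is where the computational difficulty lies.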

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Andrew Wei Xu (1)
  • Bernard M. E. Moret (1)
  1. Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Lausanne, Switzerland
