Linear Programming for Phylogenetic Reconstruction Based on Gene Rearrangements

  • Jijun Tang
  • Bernard M. E. Moret
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The last of these, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method to compute a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality (in the spirit of GESTALT). Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result to provide a tight lower bound. Since this bound is very close to the optimal tree score, it can be used directly as a selection criterion, thereby enabling us to bypass entirely the expensive scoring procedure. We have implemented this method within our GRAPPA software and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5% up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining. Our new method enables us to extend the range of applicability of the direct optimization method to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.
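The idea of bounding a tree score with triangle-inequality constraints can be illustrated on a toy instance. The sketch below is not the GRAPPA implementation, and the abstract does not specify the exact constraint set; it assumes the simplest reading, in which each edge of a fixed tree gets a nonnegative length variable and, for every pair of leaves, the edge lengths along the connecting path must sum to at least the pairwise distance (a consequence of repeated application of the triangle inequality). Any labeling of the internal nodes yields edge lengths that satisfy these constraints, so the minimum total length of the relaxed (non-integer) program cannot exceed the true tree score. All names (`leaf_paths`, `solve_square`, `lp_lower_bound`), the quartet tree, and the distance values are hypothetical, and the brute-force vertex enumeration is a stand-in for a real LP solver that is only practical for very small trees.

```python
from fractions import Fraction
from itertools import combinations


def leaf_paths(edges, leaves):
    """Map each leaf pair to the set of edge indices on the tree path between them."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))

    def path(src, dst):
        # Depth-first walk; in a tree the first path found is the only one.
        stack = [(src, None, [])]
        while stack:
            node, parent, used = stack.pop()
            if node == dst:
                return set(used)
            for nxt, idx in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, used + [idx]))
        raise ValueError("tree is not connected")

    return {(a, b): path(a, b) for a, b in combinations(leaves, 2)}


def solve_square(rows, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(rows)
    m = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def lp_lower_bound(edges, leaves, dist):
    """Minimize total edge length subject to path-sum >= leaf distance, lengths >= 0.

    The optimum of this relaxed program lies at a vertex of the feasible region,
    so we enumerate every choice of n tight constraints (toy sizes only).
    """
    n = len(edges)
    rows, rhs = [], []
    for pair, on_path in leaf_paths(edges, leaves).items():
        rows.append([1 if i in on_path else 0 for i in range(n)])
        rhs.append(dist[pair])
    for i in range(n):  # nonnegativity: x_i >= 0
        rows.append([1 if j == i else 0 for j in range(n)])
        rhs.append(0)

    best = None
    for subset in combinations(range(len(rows)), n):
        x = solve_square([rows[k] for k in subset], [rhs[k] for k in subset])
        if x is None:
            continue
        feasible = all(
            sum(a * xi for a, xi in zip(rows[k], x)) >= rhs[k]
            for k in range(len(rows))
        )
        if feasible and (best is None or sum(x) < best):
            best = sum(x)
    return best


# Hypothetical quartet ((A,B),(C,D)) with internal nodes x and y.
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
leaves = ["A", "B", "C", "D"]
# Made-up pairwise rearrangement distances between the four leaf genomes.
dist = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 5,
        ("B", "C"): 3, ("B", "D"): 4, ("C", "D"): 2}

bound = lp_lower_bound(edges, leaves, dist)
print(bound)  # prints 6
```

In this toy instance the optimal edge lengths are fractional (for example 3/2 on the edge to A) even though the bound itself is an integer, which is the sense in which the relaxed program is easier than its integer counterpart. A production setting would hand the same constraint matrix to a dedicated LP solver rather than enumerate vertices.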

Keywords

Gene Order; False Negative Rate; Integer Linear Program; Internal Node; Gene Rearrangement
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Blanchette, M., Bourque, G., Sankoff, D.: Breakpoint phylogenies. In: Miyano, S., Takagi, T. (eds.) Genome Informatics 1997, pp. 25–34. Univ. Academy Press, Tokyo (1997)
  2. Berkelaar, M., Eikland, K., Notebaert, P.: lp_solve. Available at http://www.geocities.com/lpsolve/
  3. Bourque, G., Pevzner, P.: Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Research 12, 26–36 (2002)
  4. Caprara, A.: Formulations and hardness of multiple sorting by reversals. In: Proc. 3rd Ann. Int’l. Conf. Comput. Mol. Biol. (RECOMB 1999), pp. 84–93. ACM Press, New York (1999)
  5. Caprara, A.: On the practical solution of the reversal median problem. In: Gascuel, O., Moret, B.M.E. (eds.) WABI 2001. LNCS, vol. 2149, pp. 238–251. Springer, Heidelberg (2001)
  6. Cosner, M.E., Jansen, R.K., Palmer, J.D., Downie, S.R.: The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families. Curr. Genet. 31, 419–429 (1997)
  7. Cosner, M.E., Jansen, R.K., Moret, B.M.E., Raubeson, L.A., Wang, L., Warnow, T., Wyman, S.K.: A new fast heuristic for computing the breakpoint phylogeny and experimental phylogenetic analyses of real and synthetic data. In: Proc. 8th Int’l. Conf. on Intelligent Systems for Mol. Biol. (ISMB 2000), pp. 104–115 (2000)
  8. Downie, S.R., Palmer, J.D.: Use of chloroplast DNA rearrangements in reconstructing plant phylogeny. In: Soltis, P., Soltis, D., Doyle, J.J. (eds.) Plant Molecular Systematics, pp. 14–35. Chapman and Hall, Boca Raton (1992)
  9. Earnest-DeYoung, J., Lerat, E., Moret, B.M.E.: Reversing gene erosion: reconstructing ancestral bacterial genomes from gene-content and gene-order data. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 1–13. Springer, Heidelberg (2004)
  10. El-Mabrouk, N.: Genome rearrangement by reversals and insertions/deletions of contiguous segments. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 222–234. Springer, Heidelberg (2000)
  11. Eppstein, D.: Finding the k shortest paths. SIAM J. on Computing 28(2), 652–673 (1998)
  12. Lancia, G., Ravi, R.: GESTALT: GEnomic STeiner ALignmenTs. In: Crochemore, M., Paterson, M. (eds.) CPM 1999. LNCS, vol. 1645, pp. 101–114. Springer, Heidelberg (1999)
  13. Moret, B.M.E., Tang, J., Wang, L.-S., Warnow, T.: Steps toward accurate reconstructions of phylogenies from gene-order data. J. Comput. Syst. Sci. 65(3), 508–525 (2002)
  14. Moret, B.M.E., Tang, J., Warnow, T.: Reconstructing phylogenies from gene-content and gene-order data. In: Gascuel, O. (ed.) Mathematics of Evolution and Phylogeny, pp. 321–352. Oxford University Press, Oxford (2005)
  15. Moret, B.M.E., Wyman, S.K., Bader, D.A., Warnow, T., Yan, M.: A new implementation and detailed study of breakpoint analysis. In: Proc. 6th Pacific Symp. on Biocomputing (PSB 2001), pp. 583–594. World Scientific Pub., Singapore (2001)
  16. Palmer, J.D.: Chloroplast and mitochondrial genome evolution in land plants. In: Herrmann, R. (ed.) Cell Organelles, pp. 99–133. Springer, Heidelberg (1992)
  17. Pe’er, I., Shamir, R.: The median problems for breakpoints are NP-complete. Elec. Colloq. on Comput. Complexity 71 (1998)
  18. Raubeson, L.A., Jansen, R.K.: Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255, 1697–1699 (1992)
  19. Robinson, D.R., Foulds, L.R.: Comparison of phylogenetic trees. Mathematical Biosciences 53, 131–147 (1981)
  20. Saitou, N., Nei, M.: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)
  21. Sankoff, D., Blanchette, M.: Multiple genome rearrangement and breakpoint phylogeny. J. Comput. Biol. 5, 555–570 (1998)
  22. Sankoff, D., Nadeau, J. (eds.): Comparative Genomics. Kluwer Academic Pubs., Dordrecht (2000)
  23. Siepel, A.C.: Exact algorithms for the reversal median problem. Master’s thesis, U. New Mexico, Albuquerque, NM (2001). Available at http://www.cs.unm.edu/~acs/thesis.html
  24. Siepel, A.C., Moret, B.M.E.: Finding an optimal inversion median: experimental results. In: Gascuel, O., Moret, B.M.E. (eds.) WABI 2001. LNCS, vol. 2149, pp. 189–203. Springer, Heidelberg (2001)
  25. Swenson, K.M., Marron, M., Earnest-DeYoung, J.V., Moret, B.M.E.: Approximating the true evolutionary distance between two genomes. In: Proc. 7th Workshop on Alg. Engineering & Experiments (ALENEX 2005), Vancouver. SIAM Press, Philadelphia (2005)
  26. Swofford, D.L., Olson, G., Waddell, P., Hillis, D.M.: Phylogenetic inference. In: Hillis, D.M., Moritz, C., Mable, B. (eds.) Molecular Systematics, ch. 11, 2nd edn. Sinauer Associates (1996)
  27. Tang, J., Moret, B.M.E.: Scaling up accurate phylogenetic reconstruction from gene-order data. In: Proc. 11th Int’l Conf. on Intelligent Systems for Mol. Biol. (ISMB’03). Bioinformatics, vol. 19, pp. i305–i312. Oxford U. Press, Oxford (2003)
  28. Tang, J., Moret, B.M.E., Cui, L., de Pamphilis, C.W.: Phylogenetic reconstruction from arbitrary gene-order data. In: Proc. 4th IEEE Symp. on Bioinformatics and Bioengineering (BIBE 2004), pp. 592–599. IEEE Press, Piscataway (2004)
  29. Wang, L.-S., Jansen, R.K., Moret, B.M.E., Raubeson, L.A., Warnow, T.: Fast phylogenetic methods for genome rearrangement evolution: An empirical study. In: Proc. 7th Pacific Symp. on Biocomputing (PSB 2002), pp. 524–535. World Scientific Pub., Singapore (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jijun Tang (1)
  • Bernard M. E. Moret (2)
  1. Dept. of Computer Science & Engineering, U. of South Carolina, Columbia, USA
  2. Dept. of Computer Science, U. of New Mexico, Albuquerque, USA
