Bootstrapping Phylogenies Inferred from Rearrangement Data

  • Yu Lin
  • Vaibhav Rajan
  • Bernard M. E. Moret
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6833)

Abstract

Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled, whereas in the case of rearrangements the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well-established probabilistic models.
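To make the resampling requirement concrete, the following is a minimal sketch of the classic sequence-based bootstrap, not of the method proposed in this paper. It assumes a toy setting that the abstract does not describe: four aligned sequences, a simple p-distance, and a quartet rule that picks the split with the smallest sum of within-pair distances. All function names are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Draw one replicate: sample column indices with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


def p_distance(seq_a, seq_b):
    """Fraction of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def quartet_split(alignment):
    """Pick the split ab|cd with the smallest d(a,b) + d(c,d) (assumed toy rule)."""
    a, b, c, d = sorted(alignment)
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(split):
        (p, q), (r, s) = split
        return (p_distance(alignment[p], alignment[q])
                + p_distance(alignment[r], alignment[s]))

    return min(candidates, key=score)


def bootstrap_support(alignment, n_replicates=100, seed=0):
    """Share of replicates that recover the split inferred from the full data."""
    rng = random.Random(seed)
    reference = quartet_split(alignment)
    hits = sum(
        quartet_split(bootstrap_replicate(alignment, rng)) == reference
        for _ in range(n_replicates)
    )
    return reference, hits / n_replicates
```

The sketch works only because the alignment offers many columns to resample. A genome treated as a single rearrangement character gives `bootstrap_replicate` nothing to draw from, which is the obstacle the abstract points to.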

We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap and of our corresponding new approach over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches.

Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
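As an illustration of how sensitivity and specificity relate to support values, here is a minimal sketch that computes receiver-operating points from a list of inferred branches. It assumes a simulation setting in which each branch can be labeled as present or absent in the known true tree; the abstract does not spell out this protocol, and the function name, data, and thresholds are hypothetical.

```python
def roc_points(branches, thresholds):
    """Compute (threshold, sensitivity, specificity) for each threshold.

    branches: list of (support, is_true) pairs, where is_true says whether
    the branch belongs to the true tree (an assumed labeling).
    A branch is accepted when its support is at least the threshold.
    """
    positives = sum(1 for _, is_true in branches if is_true)
    negatives = len(branches) - positives
    points = []
    for t in thresholds:
        # True branches that are accepted at this threshold.
        tp = sum(1 for s, is_true in branches if is_true and s >= t)
        # False branches that are rejected at this threshold.
        tn = sum(1 for s, is_true in branches if not is_true and s < t)
        sensitivity = tp / positives if positives else float("nan")
        specificity = tn / negatives if negatives else float("nan")
        points.append((t, sensitivity, specificity))
    return points


# Made-up example data, for illustration only.
example = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.30, False)]
for threshold, sens, spec in roc_points(example, [0.5, 0.75]):
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Two assessment methods whose points nearly coincide across thresholds would, under this reading, offer comparable trade-offs between accepting true branches and rejecting false ones.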

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yu Lin (1)
  • Vaibhav Rajan (1)
  • Bernard M. E. Moret (1)
  1. Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Lausanne, Switzerland