MURPAR: A Fast Heuristic for Inferring Parsimonious Phylogenetic Networks from Multiple Gene Trees

  • Hyun Jung Park
  • Luay Nakhleh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)

Abstract

Phylogenetic networks provide a graphical representation of evolutionary histories that involve non-treelike evolutionary events, such as horizontal gene transfer (HGT). One approach for inferring phylogenetic networks is based on reconciling gene trees, assuming all incongruence among the gene trees is due to HGT. Several mathematical results and algorithms, both exact and heuristic, have been introduced to construct and analyze phylogenetic networks. Here, we address the computational problem of inferring phylogenetic networks with the minimum number of reticulations from a collection of gene trees. Because this problem is known to be NP-hard even for a pair of gene trees, the general problem for a collection of trees is very hard. In this paper, we present an efficient heuristic, MURPAR, for inferring a phylogenetic network from a collection of gene trees by using pairwise reconciliations of trees in the collection. Because MURPAR builds on existing efficient and accurate methods for pairwise gene tree reconciliation, it inherits their efficiency and accuracy. Further, the method includes a formulation for combining pairwise reconciliations that is naturally amenable to an efficient integer linear programming (ILP) solution. On synthetic and biological data, we show that MURPAR produces more accurate results than other methods and is at least as fast. We believe that our method is especially important for rapidly obtaining estimates of genome-scale evolutionary histories that can be further refined by more detailed and compute-intensive methods.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hyun Jung Park (1)
  • Luay Nakhleh (1)
  1. Dept. of Computer Science, Rice University, Houston, USA