Heuristic Algorithms for the Protein Model Assignment Problem

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2013)

Abstract

Assigning an optimal combination of empirical amino acid substitution models (e.g., WAG, LG, MTART) to partitioned multi-gene datasets when branch lengths across partitions are linked is suspected to be an NP-hard problem. Given p partitions and the approximately 20 empirical protein models that are available, one needs to compute the log likelihood score of 20^p possible model-to-partition assignments to obtain the optimal assignment.
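
To make the size of this search space concrete, the following minimal Python sketch counts the model-to-partition assignments for a given number of partitions and enumerates them for a tiny case. The function names and the three-model subset are illustrative assumptions, not part of the paper or of RAxML.

```python
from itertools import product

# Illustrative subset of the roughly 20 empirical protein models.
MODELS = ["WAG", "LG", "MTART"]


def num_assignments(num_partitions, num_models=20):
    """Number of model-to-partition assignments: num_models ** num_partitions."""
    return num_models ** num_partitions


def all_assignments(models, num_partitions):
    """Lazily enumerate every assignment as a tuple with one model per partition."""
    return product(models, repeat=num_partitions)


if __name__ == "__main__":
    # The count grows exponentially in the number of partitions.
    for p in (1, 2, 5, 10, 50):
        print(p, num_assignments(p))

    # Exhaustive enumeration is only feasible for very small inputs.
    for assignment in all_assignments(MODELS, 2):
        print(assignment)
```

Each enumerated assignment would still require a full log likelihood evaluation with linked branch lengths, which is why exhaustive search quickly becomes impractical.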

First, we show that protein model assignment (PMA) matters for empirical datasets: different (optimal versus suboptimal) PMAs can yield distinct final tree topologies when tree searches are conducted using RAxML.

In addition, we introduce and test several heuristics for finding near-optimal PMAs and present generally applicable techniques for reducing the execution times of these heuristics. We show that our heuristics can find PMAs with better log likelihood scores on a fixed, reasonable tree topology than the naïve approach to the PMA, which ignores the fact that branch lengths are linked across partitions. By re-analyzing a large empirical dataset, we show that phylogenies inferred under a PMA calculated by our heuristics have a different topology than trees inferred under a naïvely calculated PMA; these differences also induce distinct biological conclusions. The heuristics have been implemented and are available in a proof-of-concept version of RAxML.
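
The abstract does not describe the heuristics themselves, so the following Python sketch only illustrates the general idea of refining a naïve assignment under a joint score. Everything in it is an assumption made for illustration: the toy `joint_score` is a hypothetical stand-in for a log likelihood with linked branch lengths, and the hill-climbing loop is a generic local search, not the authors' method.

```python
import random

# Illustrative subset of the available empirical protein models.
MODELS = ["WAG", "LG", "MTART"]


def make_toy_problem(num_partitions, seed=0):
    """Build a hypothetical joint score (NOT a real likelihood) and a naive start."""
    rng = random.Random(seed)
    # Per-partition fit of each model when partitions are scored separately.
    fit = [{m: rng.uniform(-10.0, 0.0) for m in MODELS}
           for _ in range(num_partitions)]
    # Made-up per-model quantity used to couple the partitions.
    scale = {m: rng.uniform(0.5, 2.0) for m in MODELS}

    def joint_score(assignment):
        separate = sum(fit[i][m] for i, m in enumerate(assignment))
        mean = sum(scale[m] for m in assignment) / len(assignment)
        # The penalty ties partitions together, mimicking linked parameters.
        coupling = sum((scale[m] - mean) ** 2 for m in assignment)
        return separate - 5.0 * coupling

    # Naive assignment: best model per partition, ignoring the coupling.
    naive = [max(MODELS, key=lambda m: fit[i][m])
             for i in range(num_partitions)]
    return joint_score, naive


def hill_climb(score, start):
    """Change one partition's model at a time while the joint score improves."""
    current, best = list(start), score(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for m in MODELS:
                if m == current[i]:
                    continue
                candidate = current[:i] + [m] + current[i + 1:]
                s = score(candidate)
                if s > best:
                    current, best, improved = candidate, s, True
    return current, best


joint_score, naive = make_toy_problem(num_partitions=8)
refined, refined_score = hill_climb(joint_score, naive)
print("naive  :", naive, joint_score(naive))
print("refined:", refined, refined_score)
```

Because the search starts from the naïve assignment and accepts only strict improvements, the refined score can never be worse than the naïve one under the same joint score, although it may remain a local optimum.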

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hauser, J. et al. (2013). Heuristic Algorithms for the Protein Model Assignment Problem. In: Cai, Z., Eulenstein, O., Janies, D., Schwartz, D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science, vol 7875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38036-5_16

  • DOI: https://doi.org/10.1007/978-3-642-38036-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38035-8

  • Online ISBN: 978-3-642-38036-5

  • eBook Packages: Computer Science (R0)
