Heuristic Algorithms for the Protein Model Assignment Problem

  • Jörg Hauser
  • Kassian Kobert
  • Fernando Izquierdo-Carrasco
  • Karen Meusemann
  • Bernhard Misof
  • Michael Gertz
  • Alexandros Stamatakis
Conference paper

DOI: 10.1007/978-3-642-38036-5_16

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7875)
Cite this paper as:
Hauser J. et al. (2013) Heuristic Algorithms for the Protein Model Assignment Problem. In: Cai Z., Eulenstein O., Janies D., Schwartz D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science, vol 7875. Springer, Berlin, Heidelberg

Abstract

Assigning an optimal combination of empirical amino acid substitution models (e.g., WAG, LG, MTART) to partitioned multi-gene datasets when branch lengths across partitions are linked is suspected to be an NP-hard problem. Given p partitions and the approximately 20 empirical protein models that are available, one needs to compute the log likelihood scores of all 20^p possible model-to-partition assignments to obtain the optimal assignment.
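To give a sense of how quickly this search space grows, the following minimal Python sketch evaluates 20^p for a few partition counts. The function name `num_assignments` and the chosen values of p are illustrative assumptions, not taken from the paper; the constant 20 is the approximate model count quoted in the abstract.

```python
# Approximate number of empirical protein models, as quoted in the abstract.
NUM_MODELS = 20


def num_assignments(num_partitions: int, num_models: int = NUM_MODELS) -> int:
    """Number of distinct model-to-partition assignments.

    Each of the p partitions can independently receive any of the
    available models, which gives num_models ** p combinations.
    """
    return num_models ** num_partitions


if __name__ == "__main__":
    # Illustrative partition counts (hypothetical, not from the paper).
    for p in (1, 2, 5, 10, 50):
        print(f"p = {p:>2}: {num_assignments(p)} assignments")
```

Python integers have arbitrary precision, so the count stays exact even for large p, where scoring every assignment is clearly infeasible.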

Initially, we show that protein model assignment (PMA) matters for empirical datasets in the sense that different (optimal versus suboptimal) PMAs can yield distinct final tree topologies when tree searches are conducted using RAxML.

In addition, we introduce and test several heuristics for finding near-optimal PMAs and present generally applicable techniques for reducing the execution times of these heuristics. We show that our heuristics can find PMAs with better log likelihood scores on a fixed, reasonable tree topology than the naïve approach to the PMA, which ignores the fact that branch lengths are linked across partitions. By re-analyzing a large empirical dataset, we show that phylogenies inferred under a PMA calculated by our heuristics have a different topology than trees inferred under a naïvely calculated PMA; these differences also induce distinct biological conclusions. The heuristics have been implemented and are available in a proof-of-concept version of RAxML.
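The difference between the naïve approach and a joint search can be illustrated with a toy sketch. This is not one of the heuristics from the paper and it does not call RAxML: the per-partition scores, the coupling penalty, and all function names below are made-up assumptions, chosen only to mimic the situation in which the score of one partition depends on the models assigned to the others.

```python
from itertools import product

MODELS = ("WAG", "LG")  # two models only, to keep the toy search space small

# Hypothetical per-partition scores (made-up numbers, not real log likelihoods).
PARTITION_SCORE = {
    (0, "WAG"): -10.0, (0, "LG"): -11.0,
    (1, "WAG"): -12.0, (1, "LG"): -11.5,
}


def joint_score(assignment):
    """Toy joint score of a full assignment (one model per partition).

    The coupling penalty is an invented stand-in for the effect of
    linked branch lengths; it makes partitions interact.
    """
    base = sum(PARTITION_SCORE[(i, m)] for i, m in enumerate(assignment))
    coupling = -2.0 if len(set(assignment)) > 1 else 0.0
    return base + coupling


def naive_assignment(num_partitions):
    """Pick the best model for each partition independently."""
    return tuple(
        max(MODELS, key=lambda m: PARTITION_SCORE[(i, m)])
        for i in range(num_partitions)
    )


def exhaustive_assignment(num_partitions):
    """Score every possible assignment jointly and keep the best one."""
    return max(product(MODELS, repeat=num_partitions), key=joint_score)


if __name__ == "__main__":
    naive = naive_assignment(2)
    best = exhaustive_assignment(2)
    print("naive:     ", naive, joint_score(naive))
    print("exhaustive:", best, joint_score(best))
```

In this toy setting the independent choice is worse under the joint score than the exhaustively found assignment. Exhaustive search is only feasible here because the example has four candidate assignments; that gap is what heuristics for the PMA problem aim to close at realistic scale.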

Keywords

phylogenetic inference, maximum likelihood, model assignment, protein data

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jörg Hauser (1)
  • Kassian Kobert (1)
  • Fernando Izquierdo-Carrasco (1)
  • Karen Meusemann (3)
  • Bernhard Misof (3)
  • Michael Gertz (2)
  • Alexandros Stamatakis (1)

  1. Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  2. Institute of Computer Science, Heidelberg University, Heidelberg, Germany
  3. Zentrum für molekulare Biodiversitätsforschung, Zoologisches Forschungsmuseum Alexander Koenig, Bonn, Germany
