Inference of Functional Divergence Among Proteins When the Evolutionary Process is Non-stationary

Journal of Molecular Evolution

Abstract

Functional shifts during protein evolution are expected to yield shifts in substitution rate, and statistical methods can test for this at both codon and amino acid levels. Although methods based on models of sequence evolution serve as powerful tools for studying evolutionary processes, violating underlying assumptions can lead to false biological conclusions. It is not unusual for functional shifts to be accompanied by changes in other aspects of the evolutionary process, such as codon or amino acid frequencies. However, models used to test for functional divergence assume these frequencies remain constant over time. We employed simulation to investigate the impact of non-stationary evolution on functional divergence inference. We investigated three likelihood ratio tests based on codon models and found varying degrees of sensitivity. Joint effects of shifts in frequencies and selection pressures can be large, leading to false signals for positive selection. Amino acid-based tests (FunDi and Bivar) were also compromised when several aspects of the substitution process were not adequately modeled. We applied the same tests to a core genome “scan” for functional divergence between light-adapted ecotypes of the cyanobacteria Prochlorococcus, and carried out gene-specific simulations for ten genes. Results of those simulations illustrated how the inference of functional divergence at the genomic level can be seriously impacted by model misspecification. Although computationally costly, simulations motivated by data in hand are warranted when several aspects of the substitution process are either misspecified or not included in the models upon which the statistical tests were built.
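As a rough illustration of the two ways a likelihood ratio test (LRT) can be calibrated, the Python sketch below computes an LRT statistic from two log-likelihoods, then derives a p-value either from a chi-square distribution with one degree of freedom or from a set of statistics simulated under the null model. The function names, the choice of one degree of freedom, and all numeric values are assumptions made for illustration; they are not taken from the paper, whose tests may use other reference distributions.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    The floor guards against small negative values caused by
    incomplete optimisation of the alternative model.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square distribution with 1 d.f.

    For one degree of freedom the survival function has the closed
    form erfc(sqrt(x / 2)), so no external library is required.
    """
    return math.erfc(math.sqrt(stat / 2.0))


def simulated_pvalue(observed, null_stats):
    """Empirical p-value from LRT statistics simulated under the null.

    Uses the (k + 1) / (n + 1) estimator, where k is the number of
    simulated statistics at least as large as the observed one.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a null and an alternative model.
    stat = lrt_statistic(-1000.0, -998.0)
    print("LRT statistic:", stat)
    print("chi-square (1 d.f.) p-value:", chi2_df1_pvalue(stat))
    # Hypothetical statistics from data simulated under the null model.
    print("simulated p-value:", simulated_pvalue(stat, [1.0, 2.0, 5.0, 0.5]))
```

When the fitted models omit features of the real process, such as shifting codon or amino acid frequencies, the chi-square approximation may be unreliable, so comparing the observed statistic with a null distribution simulated under conditions resembling the data is one way to check the result.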

Acknowledgments

This research was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant awarded to JPB. The research utilized computer hardware funded by a grant from the Canadian Foundation for Innovation to JPB. JPB also acknowledges the support of the Centre for Genomics and Evolutionary Bioinformatics (CGEB), which is funded by the Tula Foundation. We thank Olga Zhaxybayeva for providing amino acid alignments for the Prochlorococcus genomic data. We thank Katherine A. Dunn for helpful discussions, and for valuable guidance and advice on the development of Perl programs and the automation of analyses of both simulated data and real genome-scale datasets. We thank Joseph R. Mingrone for helpful discussions, and for many important contributions to the computational work carried out as part of this study. We also thank three anonymous referees for their comments, and for several suggestions that substantially improved this paper.

Author information

Corresponding author

Correspondence to Rachael A. Bay.

Electronic supplementary material

Supplementary material 1 (DOCX 250 kb)

About this article

Cite this article

Bay, R.A., Bielawski, J.P. Inference of Functional Divergence Among Proteins When the Evolutionary Process is Non-stationary. J Mol Evol 76, 205–215 (2013). https://doi.org/10.1007/s00239-013-9549-0

  • DOI: https://doi.org/10.1007/s00239-013-9549-0
