Sequences for multiple protein-coding genes are now commonly available from several, often closely related, species. These data sets offer intriguing opportunities to test hypotheses regarding whether different types of genes evolve under different selective pressures. Although maximum likelihood (ML) models of codon substitution that are suitable for such analyses have been developed, little is known about the statistical properties of these tests. We use a previously developed fixed-sites model and computer simulations to examine the accuracy and power of the likelihood ratio test (LRT) in comparing the nonsynonymous-to-synonymous substitution rate ratio (ω = dN/dS) between two genes. Our results show that the LRT applied to fixed-sites models may be inaccurate in some cases when significance thresholds are set using a χ² approximation. Instead, we use a parametric bootstrap to describe the distribution of the LRT statistic for fixed-sites models and examine the power of the test as a function of sampling variables and properties of the genes under study. We find that the power of the test is high (>80%) even when few taxa are sampled (e.g., six species), provided that sequences are sufficiently diverged, and that the test is largely unaffected by the tree topology used to simulate data. Our simulations show that fixed-sites models are suitable for comparing substitution parameters among genes evolving under even strong evolutionary constraint (ω ≈ 0.05), although relative rate differences of 25% or less may be difficult to detect.
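The general logic of calibrating an LRT by parametric bootstrap, rather than by a χ² approximation, can be illustrated with a minimal sketch. The example below is not the fixed-sites codon model used in the study (which would require a codon-substitution likelihood such as that implemented in PAML); it substitutes a toy binomial model in which each of two "genes" has some number of changed sites out of a total, and the null hypothesis is that both share one proportion. All function names and input values are hypothetical and chosen only for illustration.

```python
import math
import random


def _xlogy(x, y):
    """Return x * log(y), treating 0 * log(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def lrt_stat(k1, n1, k2, n2):
    """LRT statistic 2*(l1 - l0) for 'separate proportions' vs 'one shared proportion'.

    Also returns the pooled estimate p0 fitted under the null.
    """
    p1, p2 = k1 / n1, k2 / n2          # alternative: one proportion per gene
    p0 = (k1 + k2) / (n1 + n2)         # null: a single shared proportion
    l1 = binom_loglik(k1, n1, p1) + binom_loglik(k2, n2, p2)
    l0 = binom_loglik(k1, n1, p0) + binom_loglik(k2, n2, p0)
    return 2.0 * (l1 - l0), p0


def bootstrap_pvalue(k1, n1, k2, n2, n_rep=1000, seed=0):
    """Parametric bootstrap p-value for the LRT statistic.

    Data sets are simulated under the fitted null model (shared p0), the
    statistic is recomputed for each, and the p-value is the fraction of
    simulated statistics at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed, p0 = lrt_stat(k1, n1, k2, n2)
    exceed = 0
    for _ in range(n_rep):
        s1 = sum(rng.random() < p0 for _ in range(n1))
        s2 = sum(rng.random() < p0 for _ in range(n2))
        stat, _ = lrt_stat(s1, n1, s2, n2)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_rep + 1)


if __name__ == "__main__":
    stat, p = bootstrap_pvalue(5, 200, 60, 200, n_rep=200)
    print(f"LRT statistic = {stat:.2f}, bootstrap p-value = {p:.4f}")
```

In this sketch the null distribution of the statistic is estimated directly from data simulated under the fitted null model, so no assumption is made that it follows a χ² distribution. In a real analysis the simulation step would generate codon alignments under the null fixed-sites model, and each replicate would be refitted by maximum likelihood.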
Keywords: Likelihood ratio test; Nonsynonymous/synonymous rate ratio
We thank Z. Yang, W. Swanson, and J. Felsenstein for helpful discussions. This work was supported by an NSF IGERT predoctoral fellowship and NSF dissertation improvement grant (DEB-0105176) to J.E.A. and by a grant from the NIH (GM54185) to P.C.P.