Sequences for multiple protein-coding genes are now commonly available from several, often closely related species. These data sets offer intriguing opportunities to test hypotheses regarding whether different types of genes evolve under different selective pressures. Although maximum likelihood (ML) models of codon substitution that are suitable for such analyses have been developed, little is known about the statistical properties of these tests. We use a previously developed fixed-sites model and computer simulations to examine the accuracy and power of the likelihood ratio test (LRT) in comparing the nonsynonymous-to-synonymous substitution rate ratio (ω = dN/dS) between two genes. Our results show that the LRT applied to fixed-sites models may be inaccurate in some cases when setting significance thresholds using a χ2 approximation. Instead, we use a parametric bootstrap to describe the distribution of the LRT statistic for fixed-sites models and examine the power of the test as a function of sampling variables and properties of the genes under study. We find that the power of the test is high (>80%) even when sampling few taxa (e.g., six species) if sequences are sufficiently diverged and the test is largely unaffected by the tree topology used to simulate data. Our simulations show fixed-sites models are suitable for comparing substitution parameters among genes evolving under even strong evolutionary constraint (ω ≈ 0.05), although relative rate differences of 25% or less may be difficult to detect.
Likelihood ratio test Nonsynonymous/synonymous rate ratio