New Methods for Detecting Lineage-Specific Selection
So far, most methods for identifying sequences under selection based on comparative sequence data have either assumed selectional pressures are the same across all branches of a phylogeny, or have focused on changes in specific lineages of interest. Here, we introduce a more general method that detects sequences that have either come under selection, or begun to drift, on any lineage. The method is based on a phylogenetic hidden Markov model (phylo-HMM), and does not require element boundaries to be determined a priori, making it particularly useful for identifying noncoding sequences. Insertions and deletions (indels) are incorporated into the phylo-HMM by a simple strategy that uses a separately reconstructed “indel history.” To evaluate the statistical significance of predictions, we introduce a novel method for computing P-values based on prior and posterior distributions of the number of substitutions that have occurred in the evolution of predicted elements. We derive efficient dynamic-programming algorithms for obtaining these distributions, given a model of neutral evolution. Our methods have been implemented as computer programs called DLESS (Detection of LinEage-Specific Selection) and phyloP (phylogenetic P-values). We discuss results obtained with these programs on both real and simulated data sets.
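The P-value idea in the abstract can be illustrated with a minimal sketch under strong simplifying assumptions. It is not the phyloP implementation. It assumes a Jukes-Cantor-style neutral model in which every base has the same total substitution rate, so the number of substitutions on a branch of length t (in expected substitutions per site) is Poisson with mean t. The prior distribution over the total count is built up by repeated convolution across branches and sites. A plain observed count stands in for the posterior quantity the paper uses. All function names and parameters here are hypothetical.

```python
import math


def poisson_pmf(mean, max_count):
    """Return P(N = k) for k = 0..max_count under Poisson(mean)."""
    pmf = [math.exp(-mean)]
    for k in range(1, max_count + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def convolve(p, q, max_count):
    """Distribution of the sum of two independent counts, truncated at max_count."""
    out = [0.0] * (max_count + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j > max_count:
                break  # j only grows, so later terms also exceed the cap
            out[i + j] += p_i * q_j
    return out


def prior_substitution_counts(branch_lengths, n_sites, max_count):
    """Prior distribution of the total number of substitutions in an element.

    Assumption: equal exit rates (Jukes-Cantor-like), so each branch
    contributes an independent Poisson count with mean equal to its length.
    """
    # Per-site distribution: combine branches one at a time.
    per_site = [1.0] + [0.0] * max_count
    for t in branch_lengths:
        per_site = convolve(per_site, poisson_pmf(t, max_count), max_count)
    # Element distribution: combine independent sites one at a time.
    total = [1.0] + [0.0] * max_count
    for _ in range(n_sites):
        total = convolve(total, per_site, max_count)
    return total


def conservation_p_value(prior, observed_count):
    """Lower-tail probability P(N <= observed_count) under the neutral prior."""
    return sum(prior[: observed_count + 1])


# Hypothetical example: three branches, a 10-site element, 2 substitutions.
prior = prior_substitution_counts([0.1, 0.2, 0.3], n_sites=10, max_count=40)
p_value = conservation_p_value(prior, observed_count=2)
```

In this example the total count is Poisson with mean 6, so the P-value should be close to 0.062. Truncating at `max_count` discards only probability mass above the cap, so lower-tail values below it are unaffected. A general substitution model does not give Poisson counts, which is why the paper derives dedicated dynamic-programming algorithms.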
Keywords: Posterior Distribution, False Positive Rate, Neutral Model, Full Paper, Neutral Evolution