Detecting Signatures of Selection from DNA Sequences Using Datamonkey

  • Art F.Y. Poon
  • Simon D.W. Frost
  • Sergei L. Kosakovsky Pond
Part of the Methods in Molecular Biology book series (MIMB, volume 537)


Natural selection is a fundamental process affecting all evolving populations. In the simplest case, positive selection increases the frequency of alleles that confer a fitness advantage relative to the rest of the population, or increases the population's genetic diversity, and negative selection removes alleles that are deleterious. Codon-based models of molecular evolution can infer signatures of selection from alignments of homologous sequences by estimating the relative rates of synonymous (dS) and non-synonymous (dN) substitutions. Datamonkey provides a user-friendly web interface to a wide collection of state-of-the-art statistical techniques for estimating dS and dN and identifying codons and lineages under selection, even in the presence of recombinant sequences.
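To make the dS and dN distinction concrete, the following minimal Python sketch classifies a codon change as synonymous or non-synonymous under the standard genetic code and gives the conventional reading of the ratio dN/dS. It is an illustration only, not the estimation procedure used by Datamonkey or HyPhy: the function names `classify_substitution` and `interpret_omega` are hypothetical, and the simple threshold at 1 stands in for the statistical tests that real analyses rely on.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change by whether it alters the encoded amino acid."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


def interpret_omega(dn: float, ds: float) -> str:
    """Conventional reading of omega = dN/dS (illustrative threshold only)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if omega > 1:
        return "positive"   # excess of non-synonymous change
    if omega < 1:
        return "negative"   # non-synonymous change is being removed
    return "neutral"


if __name__ == "__main__":
    print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
    print(classify_substitution("GAA", "GTA"))  # Glu -> Val
    print(interpret_omega(dn=0.2, ds=0.1))
```

In practice the rates are estimated jointly with a phylogeny and a substitution model, and a site or lineage is reported as selected only when a statistical test supports it, so the point estimate alone should not be read as evidence of selection.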

Key words

Positive selection; adaptive evolution; dN and dS estimation; HyPhy; phylogenetic analysis; maximum likelihood inference; parallel algorithms; web service



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Art F.Y. Poon (1)
  • Simon D.W. Frost (1)
  • Sergei L. Kosakovsky Pond (1)
  1. Antiviral Research Center, Department of Pathology, University of California San Diego, La Jolla, USA
