Genome-Wide Detection of Selection and Other Evolutionary Forces

  • Zhuofei Xu
  • Rui Zhou
Part of the Methods in Molecular Biology book series (MIMB, volume 1231)


Pathogenic microbes evolve rapidly to escape the host immune system and antibiotics. Genetic variation arises frequently in microbial populations during the long-term pathogen–host evolutionary arms race, and individual mutations that increase fitness can be preferentially fixed. Many recent comparative genomics studies have highlighted the importance of selective forces in the molecular evolution of bacterial pathogens. The public availability of large-scale next-generation sequencing data, together with state-of-the-art statistical methods of molecular evolution, enables us to scan genome-wide alignments for evidence of positive Darwinian selection, recombination, and other evolutionary forces operating on coding regions. In this chapter, we describe an integrative analysis pipeline and its application to tracking distinctive evolutionary trajectories in the genome of an animal pathogen. Evolutionary analysis of the protein-coding portion of the genome can reveal a wide spectrum of genetic variations that potentially play roles in the adaptive evolution of bacteria.

Key words

Sequence alignment; Positive selection; Intragenic homologous recombination; Adaptive evolution; Bacteria



This work was supported by the National Basic Research Program of China (973 Program, 2012CB518802).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
