
Computational Methods in Microbial Population Genomics

  • Xavier Didelot
Chapter
Part of the Population Genomics book series (POGE)

Abstract

Whole genome sequencing is frequently applied to hundreds of samples within a single microbial population study. The resulting datasets are large and need to be analysed using computationally efficient methods, the development of which is an active research field. Here we review the current state of the art in computational methods used in microbial population genomics. This includes software for assembly and alignment of core genomic regions, which is usually a prerequisite for analysing the ancestry of the genomes via phylogenetic or non-phylogenetic methods. We also review additional techniques aimed at combining genomic data with temporal, geographical or other types of metadata, as well as pan-genome methods of analysis that go beyond the core genome.

Keywords

Alignment; Assembly; Computation methods; Microbial population genomics; Pan-genome analysis; Phylodynamics; Phylogenetics; Phylogeography; Recombination

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Infectious Disease Epidemiology, Imperial College London, London, UK
