Reconstructing the Ancestral Relationships Between Bacterial Pathogen Genomes

  • Caitlin Collins
  • Xavier Didelot
Part of the Methods in Molecular Biology book series (MIMB, volume 1535)


Following recent developments in DNA sequencing technology, it is now possible to sequence hundreds of whole genomes from bacterial isolates at relatively low cost. Analyzing this growing wealth of genomic data in terms of ancestral relationships can reveal many interesting aspects of the evolution, ecology, and epidemiology of bacterial pathogens. However, reconstructing the ancestry of a sample of bacteria remains challenging, especially for the majority of species where recombination is frequent. Here, we review and describe the computational techniques currently available to infer ancestral relationships, including phylogenetic methods that either ignore or account for the effect of recombination, as well as model-based and model-free phylogeny-independent approaches.

Key words

Pathogen genomics, Population structure, Bacterial recombination, Phylogenetics, Ancestral inference, Comparative genomics



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Infectious Disease Epidemiology, Imperial College London, London, UK
