Comparative Genomics

Reference work entry


The relative ease of sequencing bacterial genomes has made thousands of sequenced bacterial genomes available in public databases. The same technology now allows the entire genome sequence to be used as an identifier for an organism. Many methods attempt to use genome sequences to classify bacteria, and the method of choice, as always, depends on the question asked and the particular need. For example, 16S rRNA can define a bacterial species and relate species, genera, and higher orders into groups consistent with their known biological properties. However, distinguishing between strains of the same species requires additional information. The advantage of having the whole-genome sequence is that roughly 1,000 times as much information is available, and this information can be used for rapid classification of strains based on DNA sequence. This chapter reviews many commonly used methods, describes the pitfalls that arise when they are used inappropriately, and indicates which questions are best addressed by particular methods. After a brief introduction to the classical methods of taxonomy, a description of the bacterial genomes currently available is given, and then whole-genome-based methods are explored using three different data sets.


Gamma Proteobacteria; Small Genome; Average Nucleotide Identity; Mycoplasma genitalium; European Molecular Biology Laboratory



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Systems Biology, Center for Biological Sequence Analysis, Kemitorvet, The Technical University of Denmark, Lyngby, Denmark
