Comparative Genomics for Prokaryotes

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1704)

Abstract

Bacteria and archaea, collectively known as prokaryotes, generally have genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. Prokaryotic genes also lack the intron-exon structure of eukaryotic genes, with occasional exceptions. Together, these two facts mean that data for prokaryotic genomes are abundant and that these genomes are easier to study than the more complex eukaryotic ones. In this chapter, we provide an overview of genome comparison tools that have been developed primarily (sometimes exclusively) for prokaryotic genomes. We cover methods that use only the DNA sequences, methods that use only the gene content, and methods that use both data types.

Key words

Prokaryotic genome; Pangenome analysis; Whole genome alignment

Notes

Acknowledgments

This work was supported in part by a CNPq researcher fellowship (J.C.S. and N.F.A.); by CAPES grant 3385/2013 (BIGA project) (J.C.S. and N.F.A.); by Fundect-MS grants TO141/2016 and TO007/2015 (N.F.A.); and by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract no. HHSN272201400027C (A.R.W.).

Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  1. Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, Brazil
  2. School of Computing, Federal University of Mato Grosso do Sul, Campo Grande, Brazil
  3. Biocomplexity Institute, Virginia Tech, Blacksburg, USA
