Comparative Genomics, pp 55-78
Comparative Genomics for Prokaryotes
Abstract
Bacteria and archaea, collectively known as prokaryotes, generally have genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. In prokaryotes, gene architecture lacks the intron-exon structure of eukaryotic genes (with occasional exceptions). These two facts mean that data for prokaryotic genomes are abundant and that these genomes are easier to study than the more complex eukaryotic genomes. In this chapter, we provide an overview of genome comparison tools that have been developed primarily (sometimes exclusively) for prokaryotic genomes. We cover methods that use only the DNA sequences, methods that use only the gene content, and methods that use both data types.
Key words
Prokaryotic genome, Pangenome analysis, Whole genome alignment
Notes
Acknowledgments
This work was supported in part by a CNPq researcher fellowship (J.C.S. and N.F.A.); by CAPES grant 3385/2013 (BIGA project) (J.C.S. and N.F.A.); by Fundect-MS grants TO141/2016 and TO007/2015 (N.F.A.); and by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract no. HHSN272201400027C (A.R.W.).