Prokaryotic Genome Annotation

Part of the Methods in Molecular Biology book series (MIMB, volume 2349)

Abstract

In the last decade, the high throughput and relatively low cost of short-read sequencing technologies have revolutionized prokaryotic genomics. This has led to an exponential increase in the number of available bacterial and archaeal genome sequences, as well as a corresponding increase in the number of genome assembly and annotation tools. Together, these hardware and software technologies have given scientists unprecedented options to study their chosen microbial systems without the need for large teams of bioinformaticians or supercomputing facilities. While these analysis tools largely fall into only a few categories, each may have different requirements, caveats, and file formats, and some may be rarely updated or even abandoned. So, despite the apparent ease of sequencing and analyzing a prokaryotic genome, it is no wonder that the budding genomicist may quickly become overwhelmed. Here, we aim to provide the reader with an overview of genome annotation and its most important considerations, as well as an easy-to-follow protocol for getting started with annotating a prokaryotic genome.

Key words

  • Genome annotation
  • Prokaryote sequencing
  • Gene prediction
  • Structural annotation
  • Functional annotation

Chapter length: 22 pages

Figs. 1–6

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and supported by the Genome Sciences Program of the Office of Biological and Environmental Research under the LLNL Biofuels SFA, FWP SCW1039.

Author information

Corresponding author

Correspondence to Jeffrey A. Kimbrel.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Kimbrel, J.A., Jeffrey, B.M., Ward, C.S. (2022). Prokaryotic Genome Annotation. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 2349. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1585-0_10

  • DOI: https://doi.org/10.1007/978-1-0716-1585-0_10

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1584-3

  • Online ISBN: 978-1-0716-1585-0

  • eBook Packages: Springer Protocols