Gene Calling and Bacterial Genome Annotation with BG7
New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key issue in bacterial research. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions.
In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Key words: Bacterial genomics, Genome annotation, Gene calling, Next-generation sequencing, Cloud computing, Metagenomics, Functional annotation, Gene prediction, Biographika, Massive parallel sequencing
This work has been partially funded by the CDTI project NEXTMICRO (grant IDI-20120242). A.A. and E.K. are funded by the INTERCROSSING (Grant agreement no.: 289974) ITN European project.
Competing Interests: Era7 offers a bacterial annotation service based on BG7, but the BG7 code is available on GitHub, https://github.com/bg7/, under the AGPLv3 license. All authors work in the research group Oh no sequences! within the company Era7 Bioinformatics.