Gene Calling and Bacterial Genome Annotation with BG7

  • Raquel Tobes
  • Pablo Pareja-Tobes
  • Marina Manrique
  • Eduardo Pareja-Tobes
  • Evdokim Kovach
  • Alexey Alekhin
  • Eduardo Pareja
Part of the Methods in Molecular Biology book series (MIMB, volume 1231)

Abstract

New massive sequencing technologies are yielding many bacterial genome sequences from diverse taxa, but refined annotation of these genomes is crucial for turning them into scientific findings and new knowledge. Bacterial genome annotation has therefore emerged as a key task in bacterial research. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of sequencing errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements most directly related to biological functions.

In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool tolerates sequence errors and retains its annotation capabilities on highly fragmented genomes and on mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, and its computing infrastructure is based entirely on cloud computing (Amazon Web Services).

Key words

Bacterial genomics; Genome annotation; Gene calling; Next-generation sequencing; Cloud computing; Metagenomics; Functional annotation; Gene prediction; Biographika; Massive parallel sequencing

Notes

Acknowledgements

This work has been partially funded by the CDTI project NEXTMICRO (grant IDI-20120242). A.A. and E.K. are funded by the European ITN project INTERCROSSING (grant agreement no. 289974).

Competing Interests

Era7 offers a bacterial annotation service based on BG7, but the BG7 code is available on GitHub, https://github.com/bg7/, under the AGPLv3 license. All authors work in the research group Oh no sequences! at the company Era7 Bioinformatics.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Raquel Tobes (1)
  • Pablo Pareja-Tobes (1)
  • Marina Manrique (1)
  • Eduardo Pareja-Tobes (1)
  • Evdokim Kovach (1)
  • Alexey Alekhin (1)
  • Eduardo Pareja (1)

  1. Oh no Sequences! Research Group, Era7 Bioinformatics, Granada, Spain