Bacteriophages pp 231-238

Phage Genome Annotation Using the RAST Pipeline

Part of the Methods in Molecular Biology book series (MIMB, volume 1681)


Phages are complex biomolecular machines that must survive in a bacterial world. Phage genomes show many adaptations to this lifestyle, such as shorter genes, a reduced capacity for redundant DNA sequences, and the inclusion of tRNAs. In addition, phages are not free-living: they require a host for replication and survival. These unique adaptations pose challenges for the bioinformatic analysis of phage genomes. In particular, ORF calling, genome annotation, noncoding RNA (ncRNA) identification, and the identification of transposons and insertions are all complicated in phage genome analysis. We provide a road map through the phage genome annotation pipeline and discuss the challenges of, and solutions for, phage genome annotation as we have implemented them in the Rapid Annotations using Subsystems Technology (RAST) pipeline.
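The ORF-calling difficulty mentioned above can be illustrated with a deliberately naive six-frame scan. This is a minimal sketch, not the gene caller used in RAST: the function names `find_orfs` and `reverse_complement`, the ATG-only start rule, and the 30-codon default cutoff are illustrative assumptions. It shows why a fixed minimum-length threshold is risky for phages, whose genes tend to be short: a genuine short ORF is reported only when the cutoff is lowered.

```python
# Naive six-frame ORF scan (illustrative sketch only; not RAST's gene caller).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: ATG-only starts; real genes may also start with GTG/TTG

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for ATG..stop ORFs with >= min_codons codons.

    Codon counts include the stop codon. Coordinates are 0-based, half-open,
    and always expressed on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first in-frame ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            # map reverse-strand coordinates back to the forward strand
                            orfs.append((n - end, n - start, strand))
                    start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

For example, `find_orfs("ATG" + "AAA" * 3 + "TAA", min_codons=5)` reports one forward-strand ORF spanning positions 0 to 15, while the default cutoff reports none. Production gene callers typically score candidate ORFs with statistical models of coding sequence and start sites rather than relying on a length threshold alone.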

Key words

Phage, Genome annotation, RAST, Functional annotation, Gene predictions



This work was supported by grants from the National Science Foundation MCB-1330800 and DUE-1323809 to RAE. BED was supported by the Netherlands Organization for Scientific Research (NWO) Vidi grant 864.14.004.



Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  1. Computational Sciences Research Center, San Diego State University, San Diego, USA
  2. Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
  3. Argonne National Laboratory, Argonne, USA
  4. Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands
  5. Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  6. Departments of Biology and Computer Science, San Diego State University, San Diego, USA
