
Viral Metagenome Annotation Pipeline

  • Living reference work entry
Encyclopedia of Metagenomics

Introduction

Viruses are the most abundant and diverse biological entities on Earth, yet only a small fraction of the viral genome sequence space has been decoded. Analyses of environmental viral communities suggest that only about 1% of existing viral diversity has been explored. For more than a century, cultivation has remained the gold standard for virus discovery and characterization. One major limitation of this approach is that the hosts of most viral species (predominantly microbes) are either unknown or cannot be grown in culture. Viral metagenomics (VM) circumvents this limitation by sequencing viral genetic material isolated directly from the environment. A typical viral metagenomics workflow is depicted in Fig. 1. Viral metagenomic methods have evolved significantly since their beginnings. Initially, they involved viral particle purification and enrichment from environmental samples, shearing of isolated nucleic acids followed by an optional cDNA...



Author information


Corresponding author

Correspondence to Hernan Lorenzi.


Copyright information

© 2013 Springer Science+Business Media New York

About this entry

Cite this entry

Lorenzi, H. (2013). Viral Metagenome Annotation Pipeline. In: Nelson, K. (eds) Encyclopedia of Metagenomics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6418-1_693-4


  • DOI: https://doi.org/10.1007/978-1-4614-6418-1_693-4


  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4614-6418-1

  • eBook Packages: Springer Reference Biomedicine and Life Sciences; Reference Module Biomedical and Life Sciences
