Introduction
Viruses are the most abundant and diverse biological entities on Earth, yet only a small fraction of the viral genome sequence space has been explored. Analyses of environmental viral communities suggest that only about 1% of existing viral diversity has been characterized. For more than a century, virus cultivation has remained the gold standard for virus discovery and characterization. A major limitation of this approach is that the hosts of most viral species (predominantly microbes) are either unknown or cannot be grown in culture. Viral metagenomics (VM) circumvents this limitation by sequencing viral genetic material isolated directly from the environment. A typical viral metagenomics workflow is depicted in Fig. 1. Viral metagenomic methods have evolved significantly since their inception. Initially, they involved purification and enrichment of viral particles from environmental samples and shearing of the isolated nucleic acids, followed by an optional cDNA...
© 2013 Springer Science+Business Media New York
About this entry
Cite this entry
Lorenzi, H. (2013). Viral Metagenome Annotation Pipeline. In: Nelson, K. (eds) Encyclopedia of Metagenomics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6418-1_693-4
DOI: https://doi.org/10.1007/978-1-4614-6418-1_693-4
Publisher Name: Springer, New York, NY
Online ISBN: 978-1-4614-6418-1
eBook Packages: Springer Reference Biomedicine and Life Sciences; Reference Module Biomedical and Life Sciences