Abstract
This chapter reviews de novo metagenome assembly and the assembly algorithms and tools currently available. It presents the challenges and opportunities of metagenomic assembly and summarizes a typical metagenomic assembly workflow. Finally, it discusses approaches for reducing large datasets and for identifying the functions of assembled genes.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Howe, A., Yang, F., Zhang, Q. (2017). Metagenome Assembly and Functional Annotation. In: Charles, T., Liles, M., Sessitsch, A. (eds) Functional Metagenomics: Tools and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-61510-3_9
DOI: https://doi.org/10.1007/978-3-319-61510-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61508-0
Online ISBN: 978-3-319-61510-3
eBook Packages: Biomedical and Life Sciences (R0)