Abstract
Metagenomics is the study of microbiomes using DNA sequencing technologies. Basic computational tasks are to determine the taxonomic composition of a sample (who is out there?) and its functional composition (what can they do?), and to correlate changes in composition with changes in external parameters (how do they compare?). One approach to these tasks is to first align all sequences against a protein reference database such as NCBI-nr and then perform taxonomic and functional binning of all sequences based on their alignments. The resulting classifications can then be interactively analyzed and compared. Here we illustrate how to pursue this approach using the DIAMOND+MEGAN pipeline on two publicly available datasets, one containing short-read samples and the other containing long-read samples.
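Taxonomic binning "based on their alignments" is, in MEGAN, built on the lowest common ancestor (LCA) idea: a read is placed on the lowest taxon that covers all of its significant hits. The Python sketch below illustrates only that idea, under stated assumptions. The toy taxonomy, the `(taxon, bit score)` hit format, and the `min_score` and `top_percent` thresholds are hypothetical and chosen for illustration; this is not MEGAN's implementation, which works on DIAMOND alignments against NCBI-nr and the full NCBI taxonomy.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); the root maps to None.
# A real analysis would use the full NCBI taxonomy instead.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Bacillus subtilis": "Firmicutes",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lca(taxa: List[str]) -> str:
    """Lowest common ancestor of a non-empty list of taxa."""
    result = "root"
    # Walk all lineages in parallel; stop at the first level where they disagree.
    for nodes in zip(*(lineage(t) for t in taxa)):
        if any(n != nodes[0] for n in nodes):
            break
        result = nodes[0]
    return result


def bin_read(hits: List[Tuple[str, float]],
             min_score: float = 50.0,
             top_percent: float = 10.0) -> Optional[str]:
    """Assign one read to the LCA of the taxa of its near-best hits.

    hits: (taxon, bit score) pairs for the read's alignments (assumed format).
    Returns None if no hit reaches min_score (the read stays unassigned).
    """
    good = [(t, s) for t, s in hits if s >= min_score]
    if not good:
        return None
    best = max(s for _, s in good)
    # Keep only hits whose score lies within top_percent of the best score.
    cutoff = best * (1.0 - top_percent / 100.0)
    return lca([t for t, s in good if s >= cutoff])


# Illustrative (made-up) alignment results for four reads.
reads = {
    "read1": [("Escherichia coli", 200.0), ("Salmonella enterica", 190.0)],
    "read2": [("Bacillus subtilis", 150.0)],
    "read3": [("Escherichia coli", 120.0), ("Bacillus subtilis", 115.0)],
    "read4": [("Escherichia coli", 30.0)],
}

# Count reads per taxon; the None key collects unassigned reads.
counts = Counter(bin_read(h) for h in reads.values())
print(counts)
```

Here read1 lands on Proteobacteria, read2 on Bacillus subtilis, read3 (whose hits span two phyla) is pushed up to Bacteria, and read4 stays unassigned because its only hit falls below `min_score`. Pushing ambiguous reads to a higher rank trades specificity for fewer false species-level assignments, while the top-percent filter stops clearly weaker hits from dragging a read upward.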
Anupam Gautam and Wenhuan Zeng contributed equally.
Acknowledgements
The authors acknowledge hardware support by the High Performance and Cloud Computing Group at the Zentrum für Datenverarbeitung of the University of Tübingen, the state of Baden-Württemberg through bwHPC, and the German Research Foundation (DFG) through grant no. INST 37/935-1 FUGG. We would also like to acknowledge Marius Eisele for helping us with the long-read datasets.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gautam, A., Zeng, W., Huson, D.H. (2023). DIAMOND + MEGAN Microbiome Analysis. In: Mitra, S. (eds) Metagenomic Data Analysis. Methods in Molecular Biology, vol 2649. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3072-3_6
DOI: https://doi.org/10.1007/978-1-0716-3072-3_6
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3071-6
Online ISBN: 978-1-0716-3072-3
eBook Packages: Springer Protocols