Abstract
Rapid progress in next-generation sequencing (NGS) technology has produced an avalanche of sequence data. This volume of data poses new challenges for existing bioinformatics tools in both data handling and downstream analysis, and the complexity of metagenomic datasets makes their analysis harder still for available bioinformatics pipelines. Here we survey bioinformatics tools available online for the analysis of whole-genome shotgun (WGS) metagenome datasets and compare their analysis pipelines. Over the last decade, more than a dozen such online tools and servers have been developed and made publicly accessible. IMG/M and MG-RAST are two of the most popular, judged by the number of citations they had received in peer-reviewed scientific journals up to December 2016. This chapter discusses and compares 11 online bioinformatics tools, detailing their sequence data handling, annotation pipelines, sequence clustering methods, user-friendly features, and feasibility as data repositories.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Pal, R.R., More, R.P., Purohit, H.J. (2018). Bioinformatics Tools for Shotgun Metagenomic Data Analysis. In: Purohit, H., Kalia, V., More, R. (eds) Soft Computing for Biological Systems. Springer, Singapore. https://doi.org/10.1007/978-981-10-7455-4_6
DOI: https://doi.org/10.1007/978-981-10-7455-4_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7454-7
Online ISBN: 978-981-10-7455-4
eBook Packages: Biomedical and Life Sciences (R0)