Abstract
Microbiome samples are accumulating rapidly, representing microbial communities from every niche (biome) of our bodies as well as the environment. The fast-growing number of samples, together with the diversity of the sources from which they are collected, provides an unprecedented opportunity to better understand microbial evolution and ecology. Although these data reflect profound biological patterns and regulatory principles, understanding them depends heavily on data integration and big-data mining, including data-driven microbiome marker identification, non-linear relationship mining, dynamic pattern discovery, and regulatory principle discovery.
In this chapter, we first introduce key terminology in microbiome research and then describe microbiome big data. We next focus on microbiome databases and mainstream microbiome data mining techniques. We present several applications that showcase the power of microbiome big-data integration and mining for knowledge discovery and clinical use. Finally, we summarize the current status of microbiome big-data analysis, point out several bottlenecks, and outline prospects for this research area.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Ning, K. (2022). Microbiome and Big-Data Mining. In: Chen, M., Hofestädt, R. (eds) Integrative Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-16-6795-4_10
DOI: https://doi.org/10.1007/978-981-16-6795-4_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6794-7
Online ISBN: 978-981-16-6795-4
eBook Packages: Biomedical and Life Sciences (R0)