Microbiome and Big-Data Mining

  • Chapter
Integrative Bioinformatics

Abstract

Microbiome samples are accumulating rapidly and represent microbial communities from every niche (biome) of the human body as well as the environment. The fast-growing number of samples, together with the diversity of the sources from which they are collected, offers an unprecedented opportunity to better understand microbial evolution and ecology. Although these data carry profound biological patterns and regulation principles, understanding them depends heavily on data integration and big-data mining, including data-driven microbiome marker identification, non-linear relationship mining, dynamic pattern discovery, and regulation principle discovery.

In this chapter, we first introduce several terms used in microbiome research and then introduce microbiome big data. We then focus on microbiome databases and mainstream microbiome data mining techniques. We present several microbiome applications that showcase the power of big-data integration and mining for knowledge discovery and clinical applications. Finally, we summarize the current status of microbiome big-data analysis, point out several bottlenecks, and outline prospects for this research area.

Author information

Corresponding author

Correspondence to Kang Ning.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Ning, K. (2022). Microbiome and Big-Data Mining. In: Chen, M., Hofestädt, R. (eds) Integrative Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-16-6795-4_10
