Metagenomics Databases for Bacteria

  • Protocol
Metagenomic Data Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2649))

Abstract

Rapid advances in sequencing technologies have turned metagenomics into a widely used tool for microbe-related studies, especially in clinical medicine and ecology. Accordingly, the toolkit for metagenomics data analysis is growing stronger, providing multiple approaches for solving various biological questions and for understanding the composition and function of the microbiome. As part of this toolkit, metagenomics databases play a central role in creating and maintaining processed data, such as taxonomic classifications, gene function annotations, sequence alignments, and inferred phylogenetic trees. The availability of a large quantity of high-quality bacterial genomic sequences contributes significantly to the construction and updating of metagenomics databases, which constitute the core resource for metagenomics data analysis at various scales. This chapter presents the key concepts, technical options, and challenges for metagenomics projects, as well as the curation processes and versatile functions of four representative bacterial metagenomics databases: Greengenes, SILVA, the Ribosomal Database Project (RDP), and the Genome Taxonomy Database (GTDB).

Author information

Corresponding author

Correspondence to Dapeng Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Wang, D. (2023). Metagenomics Databases for Bacteria. In: Mitra, S. (eds) Metagenomic Data Analysis. Methods in Molecular Biology, vol 2649. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3072-3_3

  • DOI: https://doi.org/10.1007/978-1-0716-3072-3_3

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3071-6

  • Online ISBN: 978-1-0716-3072-3

  • eBook Packages: Springer Protocols
