Abstract
Metagenome-assembled genomes (MAGs) are microbial genomes reconstructed from metagenome data. In the last few years, many thousands of MAGs have been reported in the literature for a variety of environments and host-associated microbiota, including those of humans. MAGs have helped us better understand microbial populations and their interactions with the environments where they live. Moreover, most MAGs belong to novel species, thereby helping to reduce the so-called microbial dark matter. However, questions about the biological reality of MAGs have not, in general, been properly addressed. In this review, I define the notions of hypothetical MAGs and conserved hypothetical MAGs. These notions should help in understanding the biological reality of MAGs and their worldwide occurrence, and should support efforts to improve MAG recovery processes.
References
Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, Pollard KS, Sakharova E, Parks DH, Hugenholtz P, Segata N, Kyrpides NC, Finn RD (2021) A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39(1):105–114. https://doi.org/10.1038/s41587-020-0603-3
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, Tringe SG, Ivanova NN, Copeland A, Clum A, Becraft ED, Malmstrom RR, Birren B, Podar M, Bork P, Weinstock GM, Garrity GM, Dodsworth JA, Yooseph S, Sutton G, Glockner FO, Gilbert JA, Nelson WC, Hallam SJ, Jungbluth SP, Ettema TJG, Tighe S, Konstantinidis KT, Liu WT, Baker BJ, Rattei T, Eisen JA, Hedlund B, McMahon KD, Fierer N, Knight R, Finn R, Cochrane G, Karsch-Mizrachi I, Tyson GW, Rinke C, Genome Standards Consortium, Lapidus A, Meyer F, Yilmaz P, Parks DH, Eren AM, Schriml L, Banfield JF, Hugenholtz P, Woyke T (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35(8):725–731. https://doi.org/10.1038/nbt.3893
Braga LPP, Pereira RV, Martins LF, Moura LMS, Sanchez FB, Patane JSL, da Silva AM, Setubal JC (2021) Genome-resolved metagenome and metatranscriptome analyses of thermophilic composting reveal key bacterial players and their metabolic interactions. BMC Genomics 22(1):652. https://doi.org/10.1186/s12864-021-07957-9
Campanaro S, Treu L, Rodriguez RL, Kovalovszki A, Ziels RM, Maus I, Zhu X, Kougias PG, Basile A, Luo G, Schluter A, Konstantinidis KT, Angelidaki I (2020) New insights from the biogas microbiome by comprehensive genome-resolved metagenomics of nearly 1600 species originating from multiple anaerobic digesters. Biotechnol Biofuels 13:25. https://doi.org/10.1186/s13068-020-01679-y
Chen LX, Anantharaman K, Shaiber A, Eren AM, Banfield JF (2020) Accurate and complete genomes from metagenomes. Genome Res 30(3):315–333. https://doi.org/10.1101/gr.258640.119
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223):496–512. https://doi.org/10.1126/science.7542800
Garg SG, Kapust N, Lin W, Knopp M, Tria FDK, Nelson-Sathi S, Gould SB, Fan L, Zhu R, Zhang C, Martin WF (2021) Anomalous phylogenetic behavior of ribosomal proteins in metagenome-assembled Asgard Archaea. Genome Biol Evol 13(1). https://doi.org/10.1093/gbe/evaa238
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5(2):R12. https://doi.org/10.1186/gb-2004-5-2-r12
Lagier JC, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, Levasseur A, Rolain JM, Fournier PE, Raoult D (2018) Culturing the human microbiota and culturomics. Nat Rev Microbiol 16:540–550. https://doi.org/10.1038/s41579-018-0041-0
Lloyd KG, Steen AD, Ladau J, Yin J, Crosby L (2018) Phylogenetically novel uncultured microbial cells dominate earth microbiomes. mSystems 3(5). https://doi.org/10.1128/mSystems.00055-18
Lui LM, Nielsen TN, Arkin AP (2021) A method for achieving complete microbial genomes and improving bins from metagenomics data. PLoS Comput Biol 17(5):e1008972. https://doi.org/10.1371/journal.pcbi.1008972
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057):376–380. https://doi.org/10.1038/nature03959
Meier-Kolthoff JP, Auch AF, Klenk HP, Goker M (2013) Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60. https://doi.org/10.1186/1471-2105-14-60
Meziti A, Rodriguez RL, Hatt JK, Pena-Gonzalez A, Levy K, Konstantinidis KT (2021) The reliability of metagenome-assembled genomes (MAGs) in representing natural populations: insights from comparing MAGs against isolate genomes derived from the same fecal sample. Appl Environ Microbiol 87(6). https://doi.org/10.1128/AEM.02593-20
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi JC, Lee J, Kandimalla M, Chen IA, Kyrpides NC, Reddy TBK (2021) Genomes OnLine Database (GOLD) vol 8: overview and updates. Nucleic Acids Res 49(D1):D723–D733. https://doi.org/10.1093/nar/gkaa983
Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, Wu D, Paez-Espino D, Chen IM, Huntemann M, Palaniappan K, Ladau J, Mukherjee S, Reddy TBK, Nielsen T, Kirton E, Faria JP, Edirisinghe JN, Henry CS, Jungbluth SP, Chivian D, Dehal P, Wood-Charlson EM, Arkin AP, Tringe SG, Visel A, IMG/M Data Consortium, Woyke T, Mouncey NJ, Ivanova NN, Kyrpides NC, Eloe-Fadrosh EA (2021) A genomic catalog of Earth’s microbiomes. Nat Biotechnol 39(4):499–509. https://doi.org/10.1038/s41587-020-0718-6
Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, Hugenholtz P (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36(10):996–1004. https://doi.org/10.1038/nbt.4229
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25(7):1043–1055. https://doi.org/10.1101/gr.186072.114
Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, Beghini F, Manghi P, Tett A, Ghensi P, Collado MC, Rice BL, DuLong C, Morgan XC, Golden CD, Quince C, Huttenhower C, Segata N (2019) Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176(3):649–662.e20. https://doi.org/10.1016/j.cell.2019.01.001
Perez-Cobas AE, Gomez-Valero L, Buchrieser C (2020) Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses. Microb Genom 6(8). https://doi.org/10.1099/mgen.0.000409
Quince C, Nurk S, Raguideau S, James R, Soyer OS, Summers JK, Limasset A, Eren AM, Chikhi R, Darling AE (2021) STRONG: metagenomics strain resolution on assembly graphs. Genome Biol 22(1):214. https://doi.org/10.1186/s13059-021-02419-7
Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu WT, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T (2013) Insights into the phylogeny and coding potential of microbial dark matter. Nature 499(7459):431–437. https://doi.org/10.1038/nature12352
Sangwan N, Xia F, Gilbert JA (2016) Recovering complete and draft population genomes from metagenome datasets. Microbiome 4:8. https://doi.org/10.1186/s40168-016-0154-5
Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I (2019) GenBank. Nucleic Acids Res 47(D1):D94–D99. https://doi.org/10.1093/nar/gky989
Segata N (2018) On the road to strain-resolved comparative metagenomics. mSystems 3(2). https://doi.org/10.1128/mSystems.00190-17
Setubal JC, Stadler PF (2018) Gene phylogenies and orthologous groups. Methods Mol Biol 1704:1–28. https://doi.org/10.1007/978-1-4939-7463-4_1
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41. https://doi.org/10.1186/1471-2105-4-41
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278(5338):631–637. https://doi.org/10.1126/science.278.5338.631
Tully BJ, Graham ED, Heidelberg JF (2018) The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci Data 5:170203. https://doi.org/10.1038/sdata.2017.203
Uritskiy GV, DiRuggiero J, Taylor J (2018) MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6(1):158. https://doi.org/10.1186/s40168-018-0541-1
Funding
The author was funded in part by a CNPq Senior Researcher Fellowship.
Ethics declarations
Conflict of interest
The author declares no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Genome completeness: The completeness of MAGs and draft isolate genomes can be estimated by determining the fraction of certain marker genes present in the genome for the particular prokaryotic clade to which the MAG or isolate belongs. These marker genes are assumed to be present, typically in a single copy, in all members of the clade.
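- Genome contamination: For a given isolate genome or MAG sequence, the percentage of the sequence that is estimated to belong to a different species. In practice, it is usually estimated from marker genes expected in a single copy that are found more than once in the sequence.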
- Genome: The set of all DNA molecules in a cell.
- Genome alignment: A particular case of DNA sequence alignment. A pairwise alignment algorithm seeks to establish a correspondence between positions in one sequence and positions in the other, so as to maximize the number of matching positions. When two sequences have 95% identity, matches were found at 95% of the positions participating in the alignment. Because prokaryotic genomes usually have more than a million base pairs, and in some cases more than ten million, aligning them requires special programs, different from those employed to align shorter sequences. One popular program for aligning genomes is MUMmer (Kurtz et al. 2004).
- Homology and orthology: Two DNA sequences (in particular, two gene sequences) are homologous if they share a common ancestor. Homology is therefore a biological concept, and a relationship either holds or does not; it is not a matter of degree. In practice, one has to resort to sequence similarity to infer homology. This has led to widespread misleading statements in the literature, where it is easy to find expressions such as “sequences X and Y have 55% homology”; what the authors of such statements mean is that sequences X and Y, when aligned, display 55% sequence identity. When a homology relationship can be inferred between two DNA sequences in the absence of the complicating factor of duplications, that is, when the sequences diverged through speciation rather than gene duplication, the term orthology can be used. The expression “ortholog MAGs” is not standard; it is used here in the spirit of the analogy, proposed in the main text, between the annotation of protein-coding genes and MAG similarity relationships.
- Reads: The output of a DNA sequencing machine. The length of a read can vary from 50 bp to thousands of kbp, depending on the sequencing technology.
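The marker-gene reasoning behind the genome completeness and genome contamination entries can be illustrated with a minimal Python sketch. It is a simplification, not the procedure of any particular tool: it assumes every marker is expected exactly once per genome, weighs all markers equally (CheckM, for instance, additionally groups co-located markers into sets), and treats each extra copy of a marker as evidence of contamination. The function name, the marker set and the detected hits below are hypothetical.

```python
from collections import Counter


def completeness_contamination(marker_set, hits):
    """Estimate completeness and contamination (in percent) of a genome bin.

    marker_set: marker genes assumed to be single-copy in every member of the clade.
    hits: marker gene names detected in the bin, one entry per copy found.
    Simplification (assumption): all markers weigh the same.
    """
    markers = set(marker_set)
    if not markers:
        raise ValueError("marker set must not be empty")
    # Count copies of each marker; names outside the marker set are ignored.
    counts = Counter(h for h in hits if h in markers)
    found = sum(1 for m in markers if counts[m] > 0)
    # Every copy beyond the first suggests sequence from another genome.
    extra_copies = sum(c - 1 for c in counts.values() if c > 1)
    completeness = 100.0 * found / len(markers)
    contamination = 100.0 * extra_copies / len(markers)
    return completeness, contamination


if __name__ == "__main__":
    markers = ["rpsB", "rpsC", "rplB", "rplC", "gyrB"]  # hypothetical marker set
    hits = ["rpsB", "rpsC", "rplB", "rplB", "gyrB"]     # rplC missing, rplB duplicated
    print(completeness_contamination(markers, hits))    # (80.0, 20.0)
```

In the example, four of the five markers are found (80% completeness) and one marker appears twice (20% contamination). Note that this marker-based contamination figure is only a proxy for the fraction of sequence belonging to a different species.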
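The percent-identity statement in the genome alignment entry can be made concrete with a short Python sketch that scores two rows of an already computed pairwise alignment. It does not perform the alignment itself (for whole genomes that is the job of specialized programs such as MUMmer), and it assumes gaps are written as `-`. Tools differ on whether gap columns count toward the denominator, so the hypothetical `count_gap_columns` flag exposes that choice.

```python
def percent_identity(aln_a: str, aln_b: str, count_gap_columns: bool = True) -> float:
    """Percent identity between two rows of a pairwise alignment ('-' marks a gap).

    count_gap_columns: if True, columns with a gap count as non-matching positions;
    if False, only gap-free columns are scored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        gap = x == "-" or y == "-"
        if gap and not count_gap_columns:
            continue  # skip gap columns entirely
        columns += 1
        if not gap and x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no columns to score")
    return 100.0 * matches / columns


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))                          # 90.0
    print(percent_identity("ACGT-ACGT", "ACGTTACGA", count_gap_columns=False))  # 87.5
```

In the first example, 9 of 10 aligned positions match; in the second, the gap column is excluded, leaving 7 matches in 8 scored columns.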
About this article
Cite this article
Setubal, J.C. Metagenome-assembled genomes: concepts, analogies, and challenges. Biophys Rev 13, 905–909 (2021). https://doi.org/10.1007/s12551-021-00865-y
DOI: https://doi.org/10.1007/s12551-021-00865-y