Abstract
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. Identifying the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. The present study introduces Metaxa (http://microbiology.se/software/metaxa/), an automated software tool that extracts full-length and partial SSU sequences from larger sequence datasets and assigns them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects SSU sequences and scores them for origin with very low proportions of false positives and false negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Acknowledgments
The Frontiers in Biodiversity Research Centre of Excellence (University of Tartu) and the Platform in Ecotoxicology—From Gene to Ocean (University of Gothenburg) are gratefully acknowledged for their support.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Bengtsson, J., Eriksson, K.M., Hartmann, M. et al. Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek 100, 471–475 (2011). https://doi.org/10.1007/s10482-011-9598-6