Antonie van Leeuwenhoek, 100:471

Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets

  • Johan Bengtsson (corresponding author)
  • K. Martin Eriksson
  • Martin Hartmann
  • Zheng Wang
  • Belle Damodara Shenoy
  • Gwen-Aëlle Grelet
  • Kessy Abarenkov
  • Anna Petri
  • Magnus Alm Rosenblad
  • R. Henrik Nilsson
Short Communication


Abstract

The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. Identifying the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. The present study introduces Metaxa (, an automated software tool that extracts full-length and partial SSU sequences from larger sequence datasets and assigns each of them an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects SSU sequences and scores them for origin with very low proportions of false positives and false negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
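The abstract describes the task at a high level: score each candidate SSU sequence against the five possible origins, assign the best-supported one, and benchmark the calls by their false-positive and false-negative proportions. The sketch below illustrates only that decision and bookkeeping step. It is not Metaxa's implementation: the per-origin scores are assumed to come from some upstream scorer (for example, profile HMM or BLAST searches), and the names `assign_origin`, `error_proportions`, `min_score`, `min_margin` and the threshold values are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# The five origins named in the abstract.
ORIGINS = ("archaea", "bacteria", "eukaryota", "mitochondria", "chloroplast")


def assign_origin(
    scores: Dict[str, float],
    min_score: float = 20.0,   # hypothetical detection threshold
    min_margin: float = 5.0,   # hypothetical lead required over the runner-up
) -> Optional[str]:
    """Return the best-scoring origin, or None if no origin is called.

    `scores` maps origin name -> score from an assumed upstream scorer
    (higher is better); origins missing from the dict count as unscored.
    """
    ranked = sorted(
        ((scores.get(o, float("-inf")), o) for o in ORIGINS), reverse=True
    )
    (best_score, best), (second_score, _) = ranked[0], ranked[1]
    if best_score < min_score:
        return None  # too weak to be called an SSU sequence at all
    if best_score - second_score < min_margin:
        return None  # detected, but two origins are too close to separate
    return best


def error_proportions(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]],
) -> Tuple[float, float]:
    """Return (false-positive proportion, false-negative proportion).

    Each pair is (true_origin, predicted_origin); true_origin is None for
    non-SSU decoys. Illustrative definitions, not necessarily the paper's:
    a false positive is any call that is wrong (decoy called, or wrong
    origin), counted over all calls made; a false negative is a true SSU
    sequence left uncalled, counted over all true SSU sequences.
    """
    pairs = list(pairs)
    called = [(t, p) for t, p in pairs if p is not None]
    real = [(t, p) for t, p in pairs if t is not None]
    false_pos = sum(1 for t, p in called if t != p)
    false_neg = sum(1 for _, p in real if p is None)
    fp_rate = false_pos / len(called) if called else 0.0
    fn_rate = false_neg / len(real) if real else 0.0
    return fp_rate, fn_rate


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the paper.
    example = {"bacteria": 95.0, "chloroplast": 60.0, "archaea": 40.0}
    print(assign_origin(example))  # bacteria
```

Requiring a margin over the runner-up, rather than simply taking the maximum, is one simple way to leave a sequence unassigned when two origins score similarly, as can happen for bacterial and chloroplast SSU genes, since chloroplasts descend from cyanobacteria.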


Keywords: Metagenomics; Microbial communities; rRNA libraries; Phylogenetic assignment



Acknowledgments

The Frontiers in Biodiversity Research Centre of Excellence (University of Tartu) and the Platform in Ecotoxicology—From Gene to Ocean (University of Gothenburg) are gratefully acknowledged for their support.

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material
Supplementary material 1 (ZIP 13957 kb)
Supplementary material 2 (ZIP 1122 kb)
Supplementary material 3 (ZIP 279 kb)
Supplementary material 4 (ZIP 4 kb)
Supplementary material 5 (PDF 47 kb)

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Johan Bengtsson (1, 2; corresponding author)
  • K. Martin Eriksson (1)
  • Martin Hartmann (3)
  • Zheng Wang (4)
  • Belle Damodara Shenoy (5)
  • Gwen-Aëlle Grelet (6)
  • Kessy Abarenkov (7)
  • Anna Petri (1)
  • Magnus Alm Rosenblad (2)
  • R. Henrik Nilsson (1, 7)
  1. Department of Plant and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
  2. Department of Cell and Molecular Biology, University of Gothenburg, Göteborg, Sweden
  3. Department of Microbiology and Immunology, Life Sciences Centre, University of British Columbia, Vancouver, BC, Canada
  4. Department of Ecology and Evolutionary Biology, Yale University, New Haven, USA
  5. Microbial Type Culture Collection and Gene Bank, Institute of Microbial Technology (CSIR-IMTECH), Chandigarh, India
  6. Landcare Research, Lincoln, New Zealand
  7. Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
