Journal of Biosciences, Volume 36, Issue 4, pp 709–717

Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets

  • Monzoorul Haque Mohammed
  • Sudha Chadaram
  • Dinakar Komanduri
  • Tarini Shankar Ghosh
  • Sharmila S Mande

Abstract

Physical partitioning techniques are routinely employed during sample preparation to segregate the prokaryotic and eukaryotic fractions of metagenomic samples. Despite these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported contaminating eukaryotic sequences in their data sets. These sequences originate not only from the genomes of micro-eukaryotic species but also from the genomes of (higher) eukaryotic host cells; the latter scenario usually occurs in host-associated metagenomes. Identifying and removing contaminating sequences is important, because they not only affect estimates of microbial diversity but also reduce the accuracy of several downstream analyses. The computational techniques currently used to identify contaminating eukaryotic sequences are alignment based, and are therefore slow, inefficient, and demanding of substantial computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that, on a desktop with modest hardware specifications, Eu-Detect rapidly segregates DNA sequence fragments of prokaryotic and eukaryotic origin with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.

Keywords

Alignment-free; feature vector space; metagenomics; micro-eukaryotes; oligonucleotide composition
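The abstract and keywords describe an alignment-free approach built on oligonucleotide composition: each DNA fragment is mapped to a vector of short-word (k-mer) frequencies, and fragments are compared in that feature vector space rather than aligned against reference genomes. The sketch below illustrates only this general idea and is not the Eu-Detect algorithm. The tetranucleotide default (k = 4), the Euclidean distance, and the nearest-centroid decision rule are assumptions chosen for simplicity, and the names `kmer_profile`, `euclidean`, `centroid`, and `classify` are hypothetical.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Map a DNA fragment to normalized frequencies of all 4**k k-mers.

    k-mers are ordered lexicographically (AAAA, AAAC, ...). Windows that
    contain characters other than A, C, G or T (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)  # no valid k-mer: return an all-zero vector
    return [c / total for c in counts]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(profiles: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def classify(fragment: str, centroids: dict[str, list[float]], k: int = 4) -> str:
    """Hypothetical nearest-centroid rule: return the label of the closest centroid."""
    profile = kmer_profile(fragment, k)
    return min(centroids, key=lambda label: euclidean(profile, centroids[label]))


if __name__ == "__main__":
    # Toy reference sets with made-up sequences (assumption); real use would
    # need reference fragments of known prokaryotic and eukaryotic origin.
    refs = {
        "class_A": ["GCGCGCGCGCGC", "GGCCGGCCGGCC"],
        "class_B": ["ATATATATATAT", "AATTAATTAATT"],
    }
    centroids = {label: centroid([kmer_profile(s, k=2) for s in seqs])
                 for label, seqs in refs.items()}
    print(classify("CGCGCGCG", centroids, k=2))
```

Running the script prints the label of the nearest toy centroid. Nearest-centroid assignment is used here only because it is the simplest decision rule in a composition vector space; the toy sequences merely exercise the code and say nothing about real prokaryotic or eukaryotic composition.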

Supplementary material

12038_2011_9105_MOESM1_ESM.pdf: Supplementary Material (PDF, 9.29 MB)


Copyright information

© Indian Academy of Sciences 2011

Authors and Affiliations

  • Monzoorul Haque Mohammed (1)
  • Sudha Chadaram (1)
  • Dinakar Komanduri (1)
  • Tarini Shankar Ghosh (1)
  • Sharmila S Mande (1)
  1. Bio-Sciences R&D Division, TCS Innovation Labs, Tata Consultancy Services Limited, Hyderabad, India
