Higher Classification Accuracy of Short Metagenomic Reads by Discriminative Spaced k-mers

Part of the Lecture Notes in Computer Science book series (LNBI, volume 9289)

Abstract

The growing number of metagenomic studies in medicine and environmental sciences is creating new computational demands for the analysis of these very large datasets. We have recently proposed a time-efficient algorithm called Clark that can accurately classify metagenomic sequences against a set of reference genomes. The competitive advantage of Clark depends on the use of discriminative contiguous k-mers. In default mode, Clark’s speed is currently unmatched and its precision is comparable to the state of the art; however, its sensitivity still does not match that of the most sensitive (but slowest) metagenomic classifier. In this paper, we introduce an algorithmic improvement that allows Clark’s classification sensitivity to match that of the best metagenomic classifier, without a significant loss of speed or precision compared to the original version. Finally, on real metagenomes, Clark can assign with high accuracy a much higher proportion of short reads than its closest competitor. The improved version of Clark, based on discriminative spaced k-mers, is freely available at http://clark.cs.ucr.edu/Spaced/.
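
To make the idea of discriminative spaced k-mers concrete, the following is a minimal Python sketch and not Clark's actual implementation. It assumes a spaced k-mer is defined by a binary mask ('1' = position used, '0' = position ignored), that a k-mer is "discriminative" when it occurs in exactly one reference target, and that a read is assigned to the target with the most discriminative hits. The function names, the mask "1101", and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def spaced_kmers(seq, mask):
    """Yield the spaced k-mers of seq under mask ('1' = keep, '0' = ignore)."""
    keep = [i for i, c in enumerate(mask) if c == "1"]
    for start in range(len(seq) - len(mask) + 1):
        yield "".join(seq[start + i] for i in keep)


def build_discriminative_index(targets, mask):
    """Map each spaced k-mer that occurs in exactly one target to that target.

    `targets` is a dict {target_name: reference_sequence} (toy assumption).
    """
    owners = defaultdict(set)
    for name, genome in targets.items():
        for kmer in spaced_kmers(genome, mask):
            owners[kmer].add(name)
    # Spaced k-mers shared by two or more targets are discarded.
    return {k: next(iter(names)) for k, names in owners.items() if len(names) == 1}


def classify(read, index, mask):
    """Return the target with the most discriminative hits, or None if no hit."""
    hits = Counter(index[k] for k in spaced_kmers(read, mask) if k in index)
    return hits.most_common(1)[0][0] if hits else None
```

With the hypothetical targets `{"A": "ACGTACGTAA", "B": "TTGCATGCAT"}` and mask `"1101"`, the read `"ACCTAC"` differs from target A only at a position the mask ignores in its first window, so that window still yields a hit for A, whereas a contiguous 4-mer would not match. This is the intuition behind the sensitivity gain; a real classifier would also handle reverse complements, much longer masks, and ties between targets.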

Keywords

  • Metagenomics
  • Microbiome
  • Classification
  • Discriminative spaced k-mers
  • Short metagenomic reads

Notes

  1. http://hannonlab.cshl.edu/fastx_toolkit/index.html

  2. https://github.com/ucdavis-bioinformatics/scythe

  3. https://github.com/ucdavis-bioinformatics/sickle

Acknowledgments

This work was supported in part by the U.S. National Science Foundation [IIS-1302134]. We are thankful to the anonymous reviewers for their constructive feedback.

Author information

Correspondence to Stefano Lonardi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ounit, R., Lonardi, S. (2015). Higher Classification Accuracy of Short Metagenomic Reads by Discriminative Spaced k-mers. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_21

  • DOI: https://doi.org/10.1007/978-3-662-48221-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48220-9

  • Online ISBN: 978-3-662-48221-6

  • eBook Packages: Computer Science (R0)