Higher Classification Accuracy of Short Metagenomic Reads by Discriminative Spaced k-mers
- Cite this paper as:
- Ounit R., Lonardi S. (2015) Higher Classification Accuracy of Short Metagenomic Reads by Discriminative Spaced k-mers. In: Pop M., Touzet H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg
The growing number of metagenomic studies in medicine and environmental sciences is creating new computational demands in the analysis of these very large datasets. We have recently proposed a time-efficient algorithm called Clark that can accurately classify metagenomic sequences against a set of reference genomes. The competitive advantage of Clark depends on the use of discriminative contiguous k-mers. In default mode, Clark's speed is currently unmatched and its precision is comparable to the state of the art; however, its sensitivity still does not match that of the most sensitive (but slowest) metagenomic classifier. In this paper, we introduce an algorithmic improvement that allows Clark's classification sensitivity to match that of the best metagenomic classifier, without a significant loss of speed or precision compared to the original version. Finally, on real metagenomes, Clark can assign with high accuracy a much higher proportion of short reads than its closest competitor. The improved version of Clark, based on discriminative spaced k-mers, is freely available at http://clark.cs.ucr.edu/Spaced/.
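To make the idea of discriminative spaced k-mers concrete, the following is a minimal Python sketch, not Clark's actual implementation. It assumes a binary mask string in which `1` marks a position that must match and `0` marks a position that is ignored, and it treats a spaced k-mer as discriminative when it occurs in exactly one reference. The mask `1101`, the toy sequences, and the function names `spaced_kmers`, `build_discriminative_index`, and `classify` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def spaced_kmers(seq, mask):
    """Yield one spaced k-mer per window, keeping only bases where mask is '1'."""
    keep = [i for i, c in enumerate(mask) if c == "1"]
    span = len(mask)
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        yield "".join(window[i] for i in keep)


def build_discriminative_index(references, mask):
    """Map each spaced k-mer that occurs in exactly one reference to that reference."""
    owners = defaultdict(set)
    for target, genome in references.items():
        for kmer in spaced_kmers(genome, mask):
            owners[kmer].add(target)
    # Drop every spaced k-mer shared by two or more references.
    return {kmer: next(iter(t)) for kmer, t in owners.items() if len(t) == 1}


def classify(read, index, mask):
    """Assign the read to the reference with the most discriminative hits, or None."""
    hits = defaultdict(int)
    for kmer in spaced_kmers(read, mask):
        target = index.get(kmer)
        if target is not None:
            hits[target] += 1
    if not hits:
        return None
    return max(hits, key=hits.get)


# Toy data (assumed for illustration only).
references = {"A": "ACGTACGTAC", "B": "TTGGCCAATT"}
mask = "1101"
index = build_discriminative_index(references, mask)

print(classify("ACGTAC", index, mask))  # exact substring of reference A
print(classify("ACATAC", index, mask))  # one mismatch, falling on an ignored position
```

In this toy example the second read shares no contiguous 4-mer with reference A, yet one of its windows still matches because the mismatch falls on a position the mask ignores. This tolerance to mismatches is the intuition behind the gain in sensitivity described above.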