Theoretical and Practical Analyses in Metagenomic Sequence Classification

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1062))

Abstract

Metagenomics is the study of genomic sequences in a heterogeneous microbial sample taken, e.g., from soil, water, or the human microbiome and skin. One of the primary objectives of metagenomic studies is to assign a taxonomic identity to each read sequenced from a sample and then to estimate the abundance of the known clades. With ever-increasing metagenomic datasets from high-throughput sequencing technologies now readily available, several fast and accurate methods have been developed that work with reasonable computing requirements. Here we provide an overview of the state-of-the-art methods for the classification of metagenomic sequences, highlighting in particular theoretical factors that seem to correlate well with practical factors and could therefore be useful in the choice or development of a new method in experimental contexts. In particular, we emphasize that the information derived from the known genomes and eventually used in the learning and classification processes may create several experimental issues, mostly related to the amount of information used in the processes and to its uniqueness, significance, and redundancy, and that some of these issues are intrinsic to both current alignment-based approaches and compositional ones. This entails the need to develop efficient alignment-free methods that overcome such problems by combining the learning and classification processes in a single framework.
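
As a rough illustration of the two objectives named above (assigning a taxon to each read, then estimating abundance), the following minimal sketch uses a toy compositional scheme based on k-mers that occur in exactly one reference genome. It is not the method of any tool surveyed in the paper; the function names, the two toy genomes, and the choice k = 4 are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer found in exactly one reference to that taxon.

    k-mers shared by several taxa are redundant for classification
    and are dropped (a toy notion of 'uniqueness').
    """
    owners = defaultdict(set)
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            owners[km].add(taxon)
    return {km: next(iter(t)) for km, t in owners.items() if len(t) == 1}


def classify(read, index, k):
    """Assign a read to the taxon with most unique k-mer hits, else None."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    ranked = votes.most_common(2)
    if not ranked:
        return None  # no informative k-mer in this read
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between taxa: leave unclassified
    return ranked[0][0]


def abundance(reads, index, k):
    """Relative abundance of each taxon among the classified reads."""
    labels = [classify(r, index, k) for r in reads]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


# Hypothetical toy references; both end with the shared segment GGGCCC.
REFERENCES = {
    "taxonA": "ACGTACGTGGGCCC",
    "taxonB": "TTGACCATGGGCCC",
}
K = 4
INDEX = build_index(REFERENCES, K)
READS = ["ACGTACG", "GACCATG", "TTGACC", "GGGCCC"]
PROFILE = abundance(READS, INDEX, K)
```

In this toy setting the read drawn entirely from the shared segment stays unclassified, which hints at why the uniqueness and redundancy of the reference information matter in practice.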

H.A. is supported and M.E. and D.V. are partially supported by the ERASMUS+ KA107 project no. 2018-1-IT02-KA107-047786.

Correspondence to Davide Verzotto.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Amraoui, H., Elloumi, M., Marcelloni, F., Mhamdi, F., Verzotto, D. (2019). Theoretical and Practical Analyses in Metagenomic Sequence Classification. In: Anderst-Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2019. Communications in Computer and Information Science, vol 1062. Springer, Cham. https://doi.org/10.1007/978-3-030-27684-3_5

  • DOI: https://doi.org/10.1007/978-3-030-27684-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27683-6

  • Online ISBN: 978-3-030-27684-3

  • eBook Packages: Computer Science, Computer Science (R0)