MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification

Data Mining for Systems Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 1807)

Abstract

Metagenomics is the study of microbial community diversity, especially of uncultured microorganisms, through shotgun sequencing of environmental samples. As sequencer throughput and data volumes increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrated in Vervier et al. (2016) that compositional approaches based on nucleotide motifs achieve comparable accuracy with faster analysis times. In this work, we describe how to use MetaVW, a scalable machine learning implementation for binning short sequencing reads based on their k-mer profiles. We provide a step-by-step guideline on how we trained the classification models and how the approach easily generalizes to user-defined reference genomes and specific applications. We also give additional details on how the algorithm's parameters affect performance.
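To make the idea of classifying reads "based on their k-mer profiles" concrete, here is a minimal sketch under stated assumptions. It is not MetaVW's implementation. The function names (`kmer_profile`, `profile_vector`, `predict_genome`), the default k = 4, the choice to skip k-mers containing non-ACGT characters, and the dense 4^k vector are all hypothetical. The last function shows a generic one-vs-all linear decision rule with made-up weights, standing in for whatever trained models MetaVW actually produces.

```python
from collections import Counter
from itertools import product


def kmer_profile(read: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a read (hypothetical helper).

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def profile_vector(counts: Counter, k: int = 4) -> list:
    """Map k-mer counts onto a fixed ordering of all 4**k k-mers (dense vector)."""
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def predict_genome(vector: list, weights: dict) -> str:
    """Return the label with the highest linear score w . x (one-vs-all rule).

    `weights` maps each candidate label to a weight vector of the same length
    as `vector`; in practice these would come from a trained model.
    """
    return max(
        weights,
        key=lambda label: sum(w * x for w, x in zip(weights[label], vector)),
    )
```

For example, `kmer_profile("ACGTACGT")` yields five overlapping 4-mers, with `ACGT` counted twice. A dense 4^k vector is only practical for small k; a tool designed for large-scale data would more plausibly rely on sparse or hashed features.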

Notes

  1. http://projects.cbio.mines-paristech.fr/largescalemetagenomics/large-scale-metagenomics-1.0.tar.gz.

References

  1. Handelsman J (2004) Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev 68(4):669–685

  2. Quince C et al (2017) Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 35(9):833–844

  3. Vervier K et al (2016) Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32(7):1023–1032

  4. Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46

  5. Simner PJ et al (2018) Understanding the promises and hurdles of metagenomic next-generation sequencing as a diagnostic tool for infectious diseases. Clin Infect Dis 66(5):778–788

  6. Sonnenburg S et al (2006) Large scale learning with string kernels. J Mach Learn Res 7:1531–1565

  7. Gammerman A, Vovk V (2007) Hedging predictions in machine learning. Comput J 50(2):151–163

  8. Parks D et al (2011) Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics 12:328–344

Acknowledgments

This work was supported by the European Research Council (SMAC-ERC-280032 to J-P.V.).

Author information

Corresponding author

Correspondence to Jean-Philippe Vert.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Vervier, K., Mahé, P., Vert, JP. (2018). MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_2

  • DOI: https://doi.org/10.1007/978-1-4939-8561-6_2

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8560-9

  • Online ISBN: 978-1-4939-8561-6

  • eBook Packages: Springer Protocols
