Unbiased Taxonomic Annotation of Metagenomic Samples

Conference paper, Bioinformatics Research and Applications (ISBRA 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10330)

Abstract

The classification of reads from a metagenomic sample against a reference taxonomy is usually done by first mapping the reads to the reference sequences and then assigning each read to the node, under the lowest common ancestor of its candidate sequences in the reference taxonomy, with the least classification error. However, this taxonomic annotation can be biased in two ways: by an imbalanced taxonomy, and by the presence of multiple nodes in the taxonomy that share the least classification error for a given read. In this paper, we show that the Rand index is a better indicator of classification error than the commonly used area under the ROC curve and F-measure, for both balanced and imbalanced reference taxonomies. We also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time.
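The per-read step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the taxonomy is a hypothetical child-to-parent dictionary, candidate sequences are identified with reference leaves, and the Rand index of a node is assumed to be (TP + TN) / (TP + FP + FN + TN) computed over the reference leaves, with the leaves under the node as predicted positives and the read's candidate leaves as actual positives. All function names (`best_nodes`, `rand_index`, and so on) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def children_of(parent):
    """Invert a child -> parent mapping into node -> list of children."""
    kids = defaultdict(list)
    for child, par in parent.items():
        kids[par].append(child)
    return kids


def path_to_root(node, parent):
    """Nodes from `node` up to the root, `node` first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(nodes, parent):
    """Lowest common ancestor of a non-empty collection of nodes."""
    nodes = list(nodes)
    common = set(path_to_root(nodes[0], parent))
    for other in nodes[1:]:
        common &= set(path_to_root(other, parent))
    # Walking up from any node, the first common ancestor reached is the lowest.
    return next(a for a in path_to_root(nodes[0], parent) if a in common)


def leaves_below(node, kids):
    """Reference leaves in the subtree rooted at `node`."""
    if not kids.get(node):
        return {node}
    leaves = set()
    for child in kids[node]:
        leaves |= leaves_below(child, kids)
    return leaves


def rand_index(node_leaves, candidates, all_leaves):
    """Assumed binary Rand index (TP + TN) / (TP + FP + FN + TN) over the leaves.

    Predicted positives: leaves under the node. Actual positives: the read's
    candidate leaves (assumed to be a subset of `all_leaves`).
    """
    tp = len(node_leaves & candidates)
    fp = len(node_leaves - candidates)
    fn = len(candidates - node_leaves)
    tn = len(all_leaves) - tp - fp - fn
    return Fraction(tp + tn, len(all_leaves))


def best_nodes(candidates, parent, all_leaves):
    """Every node under the LCA of `candidates` that maximizes the Rand index."""
    kids = children_of(parent)
    scores, stack = {}, [lca(candidates, parent)]
    while stack:  # iterative walk over the LCA's subtree, LCA included
        node = stack.pop()
        scores[node] = rand_index(leaves_below(node, kids), candidates, all_leaves)
        stack.extend(kids.get(node, []))
    best = max(scores.values())
    return {node for node, score in scores.items() if score == best}


# Hypothetical toy taxonomy: root -> {A, B}, A -> {a1, a2}, B -> {b1, b2, b3}.
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A",
          "b1": "B", "b2": "B", "b3": "B"}
all_leaves = {"a1", "a2", "b1", "b2", "b3"}
print(sorted(best_nodes({"a1", "a2"}, parent, all_leaves)))  # ['A']
print(sorted(best_nodes({"b1", "b2"}, parent, all_leaves)))  # ['B', 'b1', 'b2']
```

On this toy taxonomy, the read whose candidates are b1 and b2 ends up with three nodes tied at a Rand index of 4/5, which is the kind of ambiguity identified above as the second source of bias. Exact fractions are used so that ties are detected without floating-point rounding; under this assumed definition, maximizing the Rand index is the same as minimizing FP + FN.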

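The sample-wide set cover step can be sketched with the classic greedy heuristic. This sketch assumes (our reading, not a detail stated in the abstract) that the universe is the set of reads and that each taxonomy node covers the reads for which it is among the least-error nodes; the input `best_by_read` and the function `greedy_set_cover` are hypothetical. Greedy set cover repeatedly picks the set covering the most still-uncovered elements, and its cost is within a factor H(d) ≤ 1 + ln d of optimal, where d is the size of the largest set.

```python
from collections import defaultdict


def greedy_set_cover(best_by_read):
    """Greedy set cover over reads (hypothetical helper, not the authors' code).

    `best_by_read` maps each read to the set of nodes tied for the least
    classification error; each node "covers" the reads it is optimal for.
    Returns a read -> chosen node assignment.
    """
    if any(not nodes for nodes in best_by_read.values()):
        raise ValueError("every read needs at least one candidate node")
    covers = defaultdict(set)
    for read, nodes in best_by_read.items():
        for node in nodes:
            covers[node].add(read)
    uncovered = set(best_by_read)
    assignment = {}
    while uncovered:
        # Node covering the most still-uncovered reads; scanning in sorted
        # order makes max() break ties deterministically (smallest name wins).
        node = max(sorted(covers), key=lambda n: len(covers[n] & uncovered))
        newly = covers[node] & uncovered
        for read in newly:
            assignment[read] = node
        uncovered -= newly
    return assignment


# Two reads with tied least-error nodes (hypothetical data).
best_by_read = {"r1": {"a1", "b1"}, "r2": {"B", "b1", "b2"}}
print(sorted(greedy_set_cover(best_by_read).items()))  # [('r1', 'b1'), ('r2', 'b1')]
```

Here the shared node b1 resolves both reads' ties at once. This straightforward version recomputes intersections on every round, so it is not linear time; keeping nodes in buckets indexed by their number of uncovered reads brings greedy set cover down to time linear in the total size of the sets.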
Acknowledgements

Partially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, project DPI2015-67082-P (MINECO/FEDER).

Author information

Corresponding author: Gabriel Valiente.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fosso, B., Pesole, G., Rosselló, F., Valiente, G. (2017). Unbiased Taxonomic Annotation of Metagenomic Samples. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_15

  • DOI: https://doi.org/10.1007/978-3-319-59575-7_15
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-59574-0
  • Online ISBN: 978-3-319-59575-7
  • eBook Packages: Computer Science, Computer Science (R0)
