Host Trait Prediction of Metagenomic Data for Topology-Based Visualization

  • Conference paper
Distributed Computing and Internet Technology (ICDCIT 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8956)

Abstract

Microbiome and metagenomic research continues to grow, as do the size and complexity of the collected data. The microbiome is also understood to have a complex relationship with the environment or host it inhabits, as in gastrointestinal disease. The goal of this study is to accurately predict a host’s trait using only metagenomic data, by training a statistical model on available metagenome sequencing data. We compare a traditional Support Vector Regression approach with a new non-parametric method developed here, called PKEM, which combines dimensionality reduction with Kernel Density Estimation. The results are visualized using methods from Topological Data Analysis. Such representations help reveal how the data is organized and can lead to new insights. We apply this visualization-of-prediction technique to cat, dog, and human microbiomes obtained from fecal samples. In the first two, the host trait is irritable bowel syndrome; in the last, it is Kwashiorkor, a form of severe malnutrition.
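The abstract names the ingredients of PKEM (dimensionality reduction followed by Kernel Density Estimation) but does not give its details. The Python sketch below is therefore only a generic illustration of that combination, not the authors' method. It assumes a binary host trait, PCA as the reduction step, and an isotropic Gaussian kernel with a fixed bandwidth. All function names, parameters, and the synthetic data generator are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, axes):
    """Project samples onto the principal axes."""
    return (X - mean) @ axes.T


def gaussian_kde_logpdf(train, query, bandwidth):
    """Log density of an isotropic Gaussian KDE built on `train`, evaluated at `query`."""
    d = train.shape[1]
    # Squared distance between every query point and every training point.
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    log_kernel = -0.5 * sq / bandwidth**2
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - np.log(len(train))
    # Log-sum-exp over the training points for numerical stability.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm


def predict_trait(X_train, y_train, X_test, n_components=2, bandwidth=1.0):
    """Assign each test sample the trait label with the highest KDE score.

    Assumption: one KDE per trait label in the reduced space, weighted by the
    label's frequency in the training set.
    """
    mean, axes = pca_fit(X_train, n_components)
    Z_train = pca_transform(X_train, mean, axes)
    Z_test = pca_transform(X_test, mean, axes)
    labels = np.unique(y_train)
    scores = []
    for label in labels:
        members = Z_train[y_train == label]
        log_prior = np.log(len(members) / len(Z_train))
        scores.append(gaussian_kde_logpdf(members, Z_test, bandwidth) + log_prior)
    return labels[np.argmax(np.vstack(scores), axis=0)]


def make_synthetic(rng, n_per_class=40, n_features=20, shift=3.0):
    """Hypothetical stand-in for abundance profiles: two groups that differ in five features."""
    healthy = rng.normal(0.0, 1.0, (n_per_class, n_features))
    affected = rng.normal(0.0, 1.0, (n_per_class, n_features))
    affected[:, :5] += shift
    X = np.vstack([healthy, affected])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    order = rng.permutation(len(y))
    return X[order], y[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_synthetic(rng)
    predicted = predict_trait(X[:60], y[:60], X[60:])
    print("held-out accuracy:", (predicted == y[60:]).mean())
```

In practice the bandwidth and the number of retained components would need to be chosen by cross-validation, and real metagenomic profiles would replace the synthetic generator.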

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Parida, L., Haiminen, N., Haws, D., Suchodolski, J. (2015). Host Trait Prediction of Metagenomic Data for Topology-Based Visualization. In: Natarajan, R., Barua, G., Patra, M.R. (eds) Distributed Computing and Internet Technology. ICDCIT 2015. Lecture Notes in Computer Science, vol 8956. Springer, Cham. https://doi.org/10.1007/978-3-319-14977-6_8

  • DOI: https://doi.org/10.1007/978-3-319-14977-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14976-9

  • Online ISBN: 978-3-319-14977-6

  • eBook Packages: Computer Science (R0)
