Machine Learning Using Neural Networks for Metabolomic Pathway Analyses

  • Protocol
Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 2553)

Abstract

Elucidating the mechanisms of metabolic pathways helps us understand the cascade of enzyme-catalyzed reactions that convert substances into final products. This has implications for predicting how newly synthesized compounds will affect a person’s metabolism and, hence, for developing novel treatments to improve health. The study of metabolomic pathways, together with protein engineering, may also aid in the extraction, at scale, of natural products to be used as drugs and drug precursors. Several approaches have been used to correlate protein annotations with metabolic pathways in order to derive pathways directly related to specific organisms. These range from association rule-mining techniques to machine learning methods such as decision trees, naïve Bayes, logistic regression, and ensemble methods.

In this chapter, we review the use of machine learning for metabolic pathway analyses, with a step-by-step focus on using deep learning to predict the association of compounds (metabolites) with their respective metabolomic pathway classes. This prediction could help explain interactions of small molecules in organisms. Inspired by the work of Baranwal et al. (2019), we demonstrate how to build and train a deep learning neural network model to perform multi-label prediction. We consider two different types of fingerprints as features (inputs to the model). The output of the model is the set of metabolic pathway classes (from the KEGG dataset) in which the input molecule participates. We walk through the various steps of this process, including data collection, feature engineering, model selection, training, and evaluation. This model-building and evaluation process may be readily transferred to other domains of interest. All the source code used in this chapter is publicly available at https://github.com/jp-um/machine_learning_for_metabolomic_pathway_analyses.
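As a rough illustration of the multi-label setup described above, the sketch below trains a small feed-forward network that maps a binary fingerprint vector to an independent probability for each pathway class. It is not the chapter's implementation: plain NumPy stands in for a deep learning framework, random bit vectors stand in for real molecular fingerprints, the labels come from a made-up linear rule rather than KEGG, and all sizes and names (`N_BITS`, `N_CLASSES`, `train`, `predict_classes`) are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
N_MOLECULES, N_BITS, N_CLASSES, N_HIDDEN = 200, 64, 4, 16

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): random bit vectors play the role of
# fingerprints, and a random linear rule assigns each "molecule" to zero
# or more classes, giving a multi-hot label matrix.
X = rng.integers(0, 2, size=(N_MOLECULES, N_BITS)).astype(float)
true_w = rng.normal(size=(N_BITS, N_CLASSES))
Y = (((X - 0.5) @ true_w) > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Return hidden activations and per-class probabilities."""
    W1, b1, W2, b2 = params
    H = np.maximum(0.0, X @ W1 + b1)   # ReLU hidden layer
    P = sigmoid(H @ W2 + b2)           # one independent sigmoid per class
    return H, P


def bce_loss(P, Y, eps=1e-9):
    """Binary cross-entropy, summed over classes, averaged over samples."""
    P = np.clip(P, eps, 1.0 - eps)
    per_sample = -(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P)).sum(axis=1)
    return float(per_sample.mean())


def train(X, Y, lr=0.1, epochs=200):
    """Full-batch gradient descent on the binary cross-entropy loss."""
    n = X.shape[0]
    params = [
        rng.normal(scale=0.1, size=(X.shape[1], N_HIDDEN)),  # W1
        np.zeros(N_HIDDEN),                                  # b1
        rng.normal(scale=0.1, size=(N_HIDDEN, Y.shape[1])),  # W2
        np.zeros(Y.shape[1]),                                # b2
    ]
    losses = []
    for _ in range(epochs):
        H, P = forward(params, X)
        losses.append(bce_loss(P, Y))
        dZ2 = (P - Y) / n                      # gradient w.r.t. output logits
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ params[2].T) * (H > 0)    # backpropagate through ReLU
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        for p, g in zip(params, (dW1, db1, dW2, db2)):
            p -= lr * g                        # in-place parameter update
    losses.append(bce_loss(forward(params, X)[1], Y))
    return params, losses


def predict_classes(params, X, threshold=0.5):
    """Multi-hot prediction: a class is assigned when its probability >= threshold."""
    return (forward(params, X)[1] >= threshold).astype(int)


params, losses = train(X, Y)
pred = predict_classes(params, X)
print("training loss: %.3f -> %.3f" % (losses[0], losses[-1]))
print("classes predicted for molecule 0:", np.flatnonzero(pred[0]).tolist())
```

The design point is the output layer: each class gets its own sigmoid and its own binary cross-entropy term, so a molecule can be assigned to several classes at once, or to none. A softmax output would instead force exactly one class per molecule. The sketch reports training loss only; a real evaluation would use held-out molecules.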

References

  1. Nicholson JK, Lindon JC (2008) Systems biology: metabonomics. Nature 455:1054

  2. Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol 48:155

  3. Holmes E, Wilson ID, Nicholson JK (2008) Metabolic phenotyping in health and disease. Cell 134:714

  4. Vermeersch KA, Styczynski MP (2013) Applications of metabolomics in cancer research. J Carcinog 12:9

  5. Kraj A, Drabik A, Silberring (2010) Nowe podejście w oznaczaniu i identyfikacji mikroorganizmów (Polish). Wydawnictwa Uniwersytetu Warszawskiego, Warszawa, pp 1-4–15-18

  6. Bu Q, Huang YN, Yan GY, Cen XB, Zhao YL (2012) Metabolomics: a revolution for novel cancer marker identification. Comb Chem High Throughput Screen 15:266

  7. Spratlin JL, Serkova NJ, Eckhardt SG (2009) Clinical applications of metabolomics in oncology: a review. Clin Cancer Res 15:431

  8. Gika HG, Theodoridis GA, Plumb RS, Wilson ID (2014) Current practice of liquid chromatography-mass spectrometry in metabolomics and metabonomics. J Pharm Biomed Anal 87:12

  9. Blekherman G, Laubenbacher R, Cortes DF, Mendes P, Torti FM et al (2011) Bioinformatics tools for cancer metabolomics. Metabolomics 7:329

  10. Ellis DI, Dunn WB, Griffin JL, Allwood JW, Goodacre R (2007) Metabolic fingerprinting as a diagnostic tool. Pharmacogenomics 8:1243

  11. Drexler DM, Reily MD, Shipkova PA (2011) Metabolomics guides rational development of a simplified cell culture medium for drug screening against Trypanosoma brucei. Anal Bioanal Chem 399:2645

  12. Schuhmacher R, Krska R, Weckwerth W, Goodacre R (2013) Metabolomics and metabolite profiling. Anal Bioanal Chem 405:5003

  13. McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133

  14. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436

  15. Cambiaghi A, Ferrario M, Masseroli M (2016) Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration. Brief Bioinform 18(3):498–510

  16. Smith R, Ventura D, Prince JT (2013) LC-MS alignment in theory and practice: a comprehensive algorithmic review. Brief Bioinform 16(1):104–117

  17. Alonso A, Marsal S, Julia A (2015) Analytical methods in untargeted metabolomics: state of the art in 2015. Front Bioeng Biotechnol 3:23

  18. Nguyen DH, Nguyen CH, Mamitsuka H (2018) Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches. Brief Bioinform 20(6):2028–2043

  19. Puchades-Carrasco L, Palomino-Schatzlein M, Perez-Rambla C et al (2015) Bioinformatics tools for the analysis of NMR metabolomics studies focused on the identification of clinically relevant biomarkers. Brief Bioinform 17(3):541–552

  20. Baranwal M, Magner A, Elvati P et al (2020) A deep learning architecture for metabolomic pathway prediction. Bioinformatics 36(8):2547–2553

  21. Pomyen Y, Wanichthanarak K, Poungsombat P et al (2020) Deep metabolome: applications of deep learning in metabolomics. Comput Struct Biotechnol J 18:2818–2825

  22. Chollet F (2017) Deep learning with Python. Manning Publications Co.

  23. Chollet F, Allaire JJ (2018) Deep learning with R. Manning Publications Co.

  24. Kim P (2017) MATLAB deep learning: with machine learning, neural networks and artificial intelligence. Apress

  25. Abadi M, Barham P, Chen J et al (2016) TensorFlow: a system for large-scale machine learning. Proc 12th USENIX Symposium on Operating Systems Design and Implementation

  26. Chollet F. Keras. https://keras.io. Accessed 6th Jan 2022

  27. Paszke A, Gross S, Massa F et al (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inform Proc Syst 32:8024–8035

  28. Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  29. KEGG Pathway Database. Available at: https://www.genome.jp/kegg/pathway.html. Accessed 6th Jan 2022

  30. Good AC, Oprea TI (2008) Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection? J Comput-aided Mol Des 22:169–178

  31. RDKit: Open-source cheminformatics, available at https://www.rdkit.org. Accessed 6th Jan 2022

  32. Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31–36

  33. Butina D (1999) Unsupervised data base clustering based on Daylight’s fingerprint and Tanimoto similarity: a fast and automated way to cluster small and large data sets. J Chem Inf Comput Sci 39(4):747–750

  34. Ballester PJ, Richards WG (2007) Ultrafast shape recognition for similarity search in molecular databases. Proc R Soc A Math Phys Eng Sci 463:1307–1321

  35. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742–754

  36. Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42:1273–1280

  37. Szymanski P, Kajdanowicz T (2017) A scikit-based Python environment for performing multi-label classification. arXiv:1702.01460

  38. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. arXiv:1412.6980

Author information

Corresponding author

Correspondence to Rosalin Bonetta Valentino.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Bonetta Valentino, R., Ebejer, JP., Valentino, G. (2023). Machine Learning Using Neural Networks for Metabolomic Pathway Analyses. In: Selvarajoo, K. (eds) Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology. Methods in Molecular Biology, vol 2553. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2617-7_17

  • DOI: https://doi.org/10.1007/978-1-0716-2617-7_17

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2616-0

  • Online ISBN: 978-1-0716-2617-7

  • eBook Packages: Springer Protocols
