Abstract
Elucidating the biological pathways that lead to specialized metabolites remains a complex task, yet it is a mandatory step toward bioproduction in heterologous hosts. Many steps have already been identified using conventional approaches, enlarging the space of known chemical reactions. In recent years, the identification of missing steps has been fueled by the generation of genomic and transcriptomic data for nonmodel species. Analysis of gene expression profiles has revealed that, in many cases, genes encoding enzymes involved in the same biosynthetic pathway are coexpressed across tissue types and environmental conditions. Hence, coexpression studies, whether based on differential gene expression, gene coexpression networks, or unsupervised clustering, have helped decipher missing steps and complete our knowledge of biosynthetic pathways. Already identified biosynthetic steps can be used as baits to capture the remaining unknown ones. The present protocol shows how supervised machine learning, in the form of artificial neural networks (ANNs), can efficiently classify genes as related to specialized metabolism or not according to their expression levels. Using Catharanthus roseus as an example, we show that an ANN trained on a minimal set of bait genes yields many true positives (correctly predicted genes) while keeping false positives, which may include candidate genes, low.
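The classification idea summarized above can be illustrated with a minimal sketch. It is not the chapter's own code: the expression values are synthetic, and the network size, learning rate, and every name (`make_toy_data`, `TinyANN`, `train`) are assumptions chosen only to keep the example self-contained. Bait genes are labeled 1 and all other genes 0, and a one-hidden-layer network is trained on their expression profiles.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def make_toy_data(rng, n_bait=20, n_other=60):
    """Synthetic centered expression profiles (hypothetical, not real C. roseus data)."""
    pattern = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0]  # baits are high in the first three samples
    X, y = [], []
    for _ in range(n_bait):
        X.append([p + rng.gauss(0, 0.5) for p in pattern])
        y.append(1)
    for _ in range(n_other):
        X.append([rng.gauss(0, 0.5) for _ in pattern])
        y.append(0)
    return X, y


class TinyANN:
    """One tanh hidden layer and a sigmoid output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        d_out = p - y  # gradient of the cross-entropy loss at the output pre-activation
        for j, hj in enumerate(h):
            d_hid = d_out * self.W2[j] * (1.0 - hj * hj)  # backpropagate through tanh
            self.W2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out


def train(model, X, y, rng, epochs=100, lr=0.05):
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit genes in a new random order each epoch
        for k in order:
            model.train_step(X[k], y[k], lr)


rng = random.Random(0)
X_train, y_train = make_toy_data(rng)
model = TinyANN(n_in=6, n_hidden=4, rng=rng)
train(model, X_train, y_train, rng)

# Score previously unseen genes and count true and false positives at a 0.5 cutoff.
X_new, y_new = make_toy_data(rng, n_bait=5, n_other=15)
scores = [model.forward(x)[1] for x in X_new]
predicted = [int(s > 0.5) for s in scores]
tp = sum(1 for p, t in zip(predicted, y_new) if p == 1 and t == 1)
fp = sum(1 for p, t in zip(predicted, y_new) if p == 1 and t == 0)
print(f"true positives: {tp}, false positives: {fp}")
```

On real data, genes scored as positive without being known baits would be the false positives worth inspecting as candidates, which is why a low but nonzero false-positive count can be useful in this setting.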
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Dugé de Bernonville, T., Amor Stander, E., Dugé de Bernonville, G., Besseau, S., Courdavault, V. (2022). Predicting Monoterpene Indole Alkaloid-Related Genes from Expression Data with Artificial Neural Networks. In: Courdavault, V., Besseau, S. (eds) Catharanthus roseus. Methods in Molecular Biology, vol 2505. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2349-7_10
DOI: https://doi.org/10.1007/978-1-0716-2349-7_10
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2348-0
Online ISBN: 978-1-0716-2349-7
eBook Packages: Springer Protocols