Abstract
There is considerable interest in transforming peptides into small molecules, because peptide-based molecules often have poorer bioavailability and lower metabolic stability. We built machine learning (ML) models to investigate whether ML can identify the ‘bioactive’ features of peptides and use those features to accurately discriminate between binding and non-binding small molecules. The ghrelin receptor (GR), which is implicated in various diseases, served as a case study to test whether ML models derived from a peptide library can predict small molecule binders. Models based on three algorithms (random forest, support vector machine, and extreme gradient boosting) were built from a carefully curated dataset of peptide/peptidomimetic and small molecule GR ligands. Models trained on a dataset composed exclusively of peptides/peptidomimetics provided limited predictive power for small molecules, whereas models trained on a diverse dataset containing both peptides/peptidomimetics and small molecules performed exceptionally well in terms of accuracy and false rates. On an external validation set of new small molecules that we had synthesized previously, the diversified models accurately differentiated binding from non-binding small molecules. The structural features that contribute most to binding activity were extracted and are remarkably consistent with crystallography and mutagenesis studies.
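The abstract reports model quality in terms of accuracy and false rates. As a minimal sketch of how such figures can be computed for a binary binder/non-binder classifier, the Python snippet below derives accuracy, false positive rate, and false negative rate from true and predicted labels. The function name `binary_metrics` and the example labels are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy and false rates, with 1 = binder and 0 = non-binder."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    return {
        # Fraction of all compounds classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of true non-binders predicted as binders.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Fraction of true binders predicted as non-binders.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical labels for eight compounds (illustration only, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a virtual screening setting the two false rates carry different costs: a false positive spends synthesis and assay effort on a non-binder, while a false negative discards a potential ligand, so reporting both alongside accuracy is more informative than accuracy alone.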
Acknowledgements
We thank Google Colaboratory (Colab) for providing computational resources. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Thunder Bay Regional Health Research Institute, and Lakehead University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, W., Hopkins, A.M., Yan, P. et al. Can machine learning ‘transform’ peptides/peptidomimetics into small molecules? A case study with ghrelin receptor ligands. Mol Divers 27, 2239–2255 (2023). https://doi.org/10.1007/s11030-022-10555-w
DOI: https://doi.org/10.1007/s11030-022-10555-w