Abstract
The poly(ADP-ribose) polymerase-1 (PARP-1) enzyme is an important target in the treatment of breast cancer. Current treatment options include the drugs Olaparib, Niraparib, Rucaparib, and Talazoparib; however, these drugs can cause severe side effects, including hematological toxicity and cardiotoxicity. Although in silico models for predicting PARP-1 activity have been developed, they suffer from low specificity, a narrow applicability domain, and a lack of interpretability. To address these issues, a comprehensive machine learning (ML)-based quantitative structure–activity relationship (QSAR) approach for the informed prediction of PARP-1 activity is presented. Data balancing with the Synthetic Minority Oversampling Technique (SMOTE) yielded robust and predictive classification models based on the K-nearest neighbor algorithm (accuracy 0.86, sensitivity 0.88, specificity 0.80). Regression models were built on structurally congeneric datasets, with the models for the phthalazinone class and fused cyclic compounds giving the best performance. In accordance with the Organisation for Economic Co-operation and Development (OECD) guidelines, a mechanistic interpretation is proposed using SHapley Additive exPlanations (SHAP) to identify the topological features that are important for differentiating PARP-1 actives from inactives. Moreover, an analysis of the PARP-1 dataset revealed the prevalence of activity cliffs, which may negatively affect the models’ predictive performance. Finally, a set of chemical transformation rules was extracted using matched molecular pair analysis (MMPA); these rules provide mechanistic insights and can guide medicinal chemists in the design of novel PARP-1 inhibitors.
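As a rough illustration of the data-balancing and evaluation steps named in the abstract, the sketch below oversamples a minority class by SMOTE-style interpolation, classifies with a K-nearest neighbor vote, and computes accuracy, sensitivity, and specificity. It is a toy in plain Python, not the authors' workflow: the two-dimensional descriptor values, the choice of inactives as the minority class, and the function names `smote`, `knn_predict`, and `classification_metrics` are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote (1 = active, 0 = inactive) among the k nearest points."""
    nearest = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity), actives being positive.
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true actives recovered
    specificity = tn / (tn + fp)  # fraction of true inactives recovered
    return accuracy, sensitivity, specificity


# Hypothetical descriptor vectors; inactives are assumed to be the minority.
actives = [(2.0, 2.1), (2.2, 1.9), (1.8, 2.3), (2.5, 2.0), (2.1, 2.4), (1.9, 1.8)]
inactives = [(0.0, 0.2), (0.3, 0.1), (0.1, 0.4)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(inactives, n_new=len(actives) - len(inactives))
balanced_X = actives + inactives + synthetic
balanced_y = [1] * len(actives) + [0] * (len(inactives) + len(synthetic))

test_X = [(2.0, 2.0), (0.2, 0.2), (1.9, 2.2), (0.1, 0.1)]
test_y = [1, 0, 1, 0]
predictions = [knn_predict(balanced_X, balanced_y, x) for x in test_X]
print(classification_metrics(test_y, predictions))
```

In practice, oversampling should be applied to the training split only, so that synthetic points never leak into the data used to report sensitivity and specificity.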
Acknowledgements
VAD, AG, and USN M acknowledge the funding from the Ministry of Electronics and Information Technology (MeitY), Govt. of India, New Delhi (project reference number No(4)12/2021-ITEA). BUH and KDS acknowledge NIPER Guwahati for financial support and travel grants.
Author information
Contributions
VAD (PI) conceptualized the work, wrote the article, and supervised the work done by AG, BUH, and KDS. AG collected the dataset for and developed the regression models, wrote Python scripts, performed MMPA and model interpretation, wrote the manuscript, and participated in offline/online discussions. BUH and KDS collected the data for and developed the classification models, built and tested KNIME workflows, wrote the manuscript, and participated in offline/online discussions. USN is the coordinator of the MeitY-sponsored project and participated in offline/online discussions. AG and BUH contributed equally to this work. All authors have read and agreed with the contents of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gomatam, A., Hirlekar, B.U., Singh, K.D. et al. Improved QSAR models for PARP-1 inhibition using data balancing, interpretable machine learning, and matched molecular pair analysis. Mol Divers (2024). https://doi.org/10.1007/s11030-024-10809-9
DOI: https://doi.org/10.1007/s11030-024-10809-9