Abstract
Predicting compound chemical stability is important because unstable compounds can lead to either false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H₂O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF is assigned an expected stable probability ($p_s$) and an unstable probability ($p_{uns}$). A total of 13,340 ACFs, together with their $p_s$ and $p_{uns}$ values, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs are derived from its structure connection table using the same protocol used to derive ACFs from the training data. The Bayesian classifier then assigns $p_s$ and $p_{uns}$ values to the compound's ACFs with a structural pattern recognition algorithm implemented in-house. Compound instability is calculated with Bayes' theorem from the $p_s$ and $p_{uns}$ values of the compound's ACFs. The classifier achieved an AUC of 84 % and a tenfold cross-validation accuracy of 76.5 %. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the stability of organic compounds.
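The scoring workflow summarized above (fragment extraction, per-fragment $p_s$/$p_{uns}$, Bayesian combination, rule override) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. A "fragment" here is simplified to a centre atom plus its sorted first-shell neighbours read from a toy connection table; the published ACF protocol is likely richer. $p_s$ and $p_{uns}$ are taken to be Laplace-smoothed estimates of P(fragment present | stable) and P(fragment present | unstable). Only fragments present in the query compound contribute evidence (a presence-only naive Bayes variant), and unseen fragments are ignored. The rule-based module is modelled as a hypothetical set of fragment keys that force an "unstable" call. All names (`atom_center_fragments`, `NaiveBayesStability`, `predict`, `roc_auc`) are illustrative and are not taken from the ChemStable implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Toy connection table (assumption): atom index -> element symbol, plus undirected bonds.
Molecule = Tuple[Dict[int, str], List[Tuple[int, int]]]


def atom_center_fragments(mol: Molecule) -> Set[str]:
    """Simplified ACF key: centre atom plus its sorted first-shell neighbours."""
    atoms, bonds = mol
    neighbours: Dict[int, List[str]] = {i: [] for i in atoms}
    for a, b in bonds:
        neighbours[a].append(atoms[b])
        neighbours[b].append(atoms[a])
    return {f"{atoms[i]}({','.join(sorted(neighbours[i]))})" for i in atoms}


class NaiveBayesStability:
    """Presence-only naive Bayes over fragment keys (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace pseudo-count (assumed smoothing scheme)
        self.counts = {"stable": Counter(), "unstable": Counter()}
        self.n = Counter()  # number of training compounds per class
        self.vocab: Set[str] = set()

    def fit(self, mols: List[Molecule], labels: List[str]) -> "NaiveBayesStability":
        for mol, label in zip(mols, labels):
            self.n[label] += 1
            self.counts[label].update(atom_center_fragments(mol))
        self.vocab = set(self.counts["stable"]) | set(self.counts["unstable"])
        return self

    def fragment_probs(self, frag: str) -> Tuple[float, float]:
        """(p_s, p_uns): smoothed P(fragment present | stable / unstable)."""
        a = self.alpha
        p_s = (self.counts["stable"][frag] + a) / (self.n["stable"] + 2 * a)
        p_uns = (self.counts["unstable"][frag] + a) / (self.n["unstable"] + 2 * a)
        return p_s, p_uns

    def p_unstable(self, mol: Molecule) -> float:
        """Posterior P(unstable | fragments) via Bayes' theorem in log-odds form."""
        log_odds = math.log(self.n["unstable"]) - math.log(self.n["stable"])  # prior odds
        for frag in atom_center_fragments(mol):
            if frag in self.vocab:  # unseen fragments carry no evidence here
                p_s, p_uns = self.fragment_probs(frag)
                log_odds += math.log(p_uns) - math.log(p_s)
        return 1.0 / (1.0 + math.exp(-log_odds))


def predict(model: NaiveBayesStability, mol: Molecule,
            unstable_rules: Set[str], threshold: float = 0.5) -> str:
    """Rule-embedded call: any matching rule fragment overrides the Bayesian score."""
    if atom_center_fragments(mol) & unstable_rules:
        return "unstable"
    return "unstable" if model.p_unstable(mol) >= threshold else "stable"


def roc_auc(scores: List[float], labels: List[str]) -> float:
    """AUC = P(random unstable compound outscores random stable one); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == "unstable"]
    neg = [s for s, y in zip(scores, labels) if y == "stable"]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage on three toy skeletons (hydrogens and bond orders omitted).
ester = ({0: "C", 1: "C", 2: "O", 3: "O", 4: "C"}, [(0, 1), (1, 2), (1, 3), (3, 4)])
propane = ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)])
ether = ({0: "C", 1: "C", 2: "O", 3: "C", 4: "C"}, [(0, 1), (1, 2), (2, 3), (3, 4)])
train, labels = [ester, propane, ether], ["unstable", "stable", "stable"]

model = NaiveBayesStability().fit(train, labels)
print(predict(model, ester, set()))           # unstable
print(predict(model, propane, set()))         # stable
print(predict(model, propane, {"C(C,C)"}))    # unstable (hypothetical rule fires)
print(roc_auc([model.p_unstable(m) for m in train], labels))  # 1.0
```

Summing log-likelihood ratios is equivalent to multiplying the ratios in Bayes' theorem but avoids numerical underflow for compounds with many fragments. The rule check runs before the Bayesian score, so a matching rule can only turn a "stable" call into "unstable", which is one simple way to trade false negatives for false positives. The `roc_auc` helper reads an AUC of 84 % as an 84 % chance that a randomly chosen unstable compound receives a higher instability score than a randomly chosen stable one.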
Acknowledgments
This work was supported by a grant from the National High Technology Research and Development Program of China (863 Program) (No. 2012AA020307), the Guangdong Recruitment Program of Creative Research Groups, the National Natural Science Foundation of China (No. 81173470), and the Special Funding Program for the National Supercomputer Center in Guangzhou (2012Y2-00048/2013Y2-00045, 201200000037). The authors would like to thank Mr. Heming Xu of Columbia University for his comments and corrections, which have improved the manuscript.
Additional information
Zhihong Liu and Minghao Zheng have contributed equally to this work.
About this article
Cite this article
Liu, Z., Zheng, M., Yan, X. et al. ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J Comput Aided Mol Des 28, 941–950 (2014). https://doi.org/10.1007/s10822-014-9778-3
DOI: https://doi.org/10.1007/s10822-014-9778-3