ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability

Journal of Computer-Aided Molecular Design

Abstract

Predicting the chemical stability of compounds is important because unstable compounds can lead to either false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF was assigned an expected stable probability (p_s) and an unstable probability (p_uns). The 13,340 ACFs, together with their p_s and p_uns data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs are derived from its structure connection table with the same protocol used to derive ACFs from the training data. The Bayesian classifier then assigns p_s and p_uns values to the compound’s ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated with Bayes’ theorem from the p_s and p_uns values of the compound’s ACFs. The classifier achieved an AUC value of 84 % and a tenfold cross-validation accuracy of 76.5 %. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the stabilities of organic compounds.
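As a rough illustration of the Bayesian step described in the abstract, the sketch below combines per-fragment probabilities into a compound-level instability estimate. It is not the authors' implementation. It assumes that p_s and p_uns can be read as the likelihoods of seeing a fragment in stable and unstable compounds, that fragments are independent (the naïve assumption), and that fragments missing from the knowledge base contribute no evidence. The function name, the fragment labels, the prior, and all numeric values are hypothetical, and the rule-based module is not modeled.

```python
import math

def instability_probability(fragments, knowledge_base, prior_unstable=0.5):
    """Estimate P(unstable | fragments) with a naive Bayes combination.

    fragments      : iterable of fragment identifiers (hypothetical labels)
    knowledge_base : dict mapping fragment -> (p_s, p_uns); interpreting these
                     as class-conditional likelihoods is an assumption
    prior_unstable : assumed prior probability of instability
    """
    eps = 1e-9  # floor to avoid log(0); an illustrative choice
    log_s = math.log(1.0 - prior_unstable)
    log_uns = math.log(prior_unstable)
    for frag in fragments:
        if frag not in knowledge_base:
            continue  # unseen fragment: no evidence in this sketch
        p_s, p_uns = knowledge_base[frag]
        log_s += math.log(max(p_s, eps))
        log_uns += math.log(max(p_uns, eps))
    # Normalise in a numerically stable way (subtract the larger log score).
    m = max(log_s, log_uns)
    w_s = math.exp(log_s - m)
    w_uns = math.exp(log_uns - m)
    return w_uns / (w_s + w_uns)

# Hypothetical knowledge base with made-up values.
kb = {"ACF_A": (0.2, 0.8), "ACF_B": (0.5, 0.5)}
score = instability_probability(["ACF_A", "ACF_B"], kb)
```

Working in log space keeps the product of many small probabilities from underflowing when a compound has dozens of fragments.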



Acknowledgments

This work was supported by a grant from the National High Technology Research and Development Program of China (863 Program) (No. 2012AA020307), the Guangdong Recruitment Program of Creative Research Groups, the National Natural Science Foundation of China (No. 81173470), and the Special Funding Program for the National Supercomputer Center in Guangzhou (2012Y2-00048/2013Y2-00045, 201200000037). The authors would like to thank Mr. Heming Xu of Columbia University for his comments and corrections, which have improved the manuscript.

Author information

Corresponding author

Correspondence to Jun Xu.

Additional information

Zhihong Liu and Minghao Zheng have contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 197 kb)

About this article

Cite this article

Liu, Z., Zheng, M., Yan, X. et al. ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J Comput Aided Mol Des 28, 941–950 (2014). https://doi.org/10.1007/s10822-014-9778-3

  • DOI: https://doi.org/10.1007/s10822-014-9778-3
