Before potential therapeutic small molecules against infectious microbes can advance in translational research, they must display a relative lack of mammalian cell cytotoxicity. The Vero cell cytotoxicity (CC50) assay is a common initial measure of this property. We explored the development of naïve Bayesian models that can enhance the probability of identifying non-cytotoxic compounds.
Vero cell cytotoxicity assays were identified in PubChem, reformatted, and curated to create a training set with 8741 unique small molecules. These data were used to develop Bayesian classifiers, which were assessed with internal cross-validation, external tests with a set of 193 compounds from our laboratory, and independent validation with an additional diverse set of 1609 unique compounds from PubChem.
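To make the classifier idea concrete, the following is a minimal sketch of a Bernoulli naïve Bayes model over binary structural features, assuming each compound has already been reduced to a set of integer feature identifiers (standing in for fingerprint bits). The function names, the smoothing parameter `alpha`, and the toy data are all hypothetical; this is not the authors' implementation, which the abstract does not describe in detail.

```python
import math
from collections import Counter


def train_naive_bayes(samples, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    samples: list of (feature_id_set, label) pairs, label 1 = non-cytotoxic,
    label 0 = cytotoxic (labels and features here are illustrative only).
    alpha: additive (Laplace) smoothing constant, an assumed default.
    """
    class_counts = Counter(label for _, label in samples)
    if class_counts[0] == 0 or class_counts[1] == 0:
        raise ValueError("training data must contain both classes")
    feature_counts = {0: Counter(), 1: Counter()}
    vocabulary = set()
    for features, label in samples:
        feature_counts[label].update(features)
        vocabulary |= set(features)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "vocabulary": vocabulary,
        "alpha": alpha,
    }


def log_odds(model, features):
    """Log-odds that a compound belongs to class 1 rather than class 0."""
    n1 = model["class_counts"][1]
    n0 = model["class_counts"][0]
    alpha = model["alpha"]
    score = math.log(n1 / n0)  # class prior term
    for f in model["vocabulary"]:
        # Smoothed probability that feature f is present in each class.
        p1 = (model["feature_counts"][1][f] + alpha) / (n1 + 2 * alpha)
        p0 = (model["feature_counts"][0][f] + alpha) / (n0 + 2 * alpha)
        if f in features:
            score += math.log(p1) - math.log(p0)
        else:
            score += math.log(1 - p1) - math.log(1 - p0)
    return score


# Toy training set: hypothetical feature IDs, not real fingerprint bits.
toy_training = [
    ({1, 2, 3}, 1), ({1, 2, 4}, 1), ({1, 3}, 1),
    ({5, 6}, 0), ({5, 7}, 0), ({2, 5, 6}, 0),
]
toy_model = train_naive_bayes(toy_training)
```

A positive value from `log_odds(toy_model, features)` would favor the non-cytotoxic class and a negative value the cytotoxic class. In practice, the feature sets would come from a fingerprinting toolkit, and performance would be checked by cross-validation and on held-out compounds, as the text describes.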
Evaluation with independent, external test and validation sets indicated that cytotoxicity Bayesian models constructed with the ECFP_6 descriptor were more accurate than those that used FCFP_6 fingerprints. The best cytotoxicity Bayesian model displayed predictive power in external evaluations, according to conventional and chance-corrected statistics, as well as enrichment factors.
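As an illustration of the kinds of statistics mentioned above, the sketch below computes positive and negative predictive values, an enrichment factor, and two chance-corrected measures from a binary confusion matrix. The choice of Cohen's kappa and the Matthews correlation coefficient as the chance-corrected statistics, the definition of the enrichment factor as hit rate divided by the base rate of positives, and the example counts are all assumptions made here; the abstract does not specify them, and the numbers are not results from the study.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Summary statistics for a binary classifier from confusion-matrix counts."""
    n = tp + fp + tn + fn
    ppv = tp / (tp + fp)          # positive predictive value (hit rate)
    npv = tn / (tn + fn)          # negative predictive value (filtering rate)
    base_rate = (tp + fn) / n     # fraction of true positives in the whole set
    enrichment = ppv / base_rate  # assumed definition: hit rate over base rate
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "ppv": ppv,
        "npv": npv,
        "enrichment": enrichment,
        "accuracy": accuracy,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
example = evaluate(tp=40, fp=10, tn=30, fn=20)
```

An enrichment factor above 1 means that compounds predicted to be positive are more likely to be true positives than compounds drawn at random from the same set.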
The results from external tests demonstrate that our novel cytotoxicity Bayesian model displays sufficient predictive power to help guide translational research. To assist the chemical tool and drug discovery communities, our curated training set is being distributed as part of the Supplementary Material.
- ADME/Tox: Absorption, metabolism, distribution, excretion and toxicity
- AID: Assay Identification number on PubChem BioAssay
- ECFP_6: Extended class fingerprints of maximum diameter 6
- FCFP_6: Molecular function class fingerprints of maximum diameter 6
- NPV: Negative predictive value (filtering rate)
- PPV: Positive predictive value (hit rate)
- QSAR: Quantitative Structure-Activity Relationships
- SMILES: Simplified molecular-input line-entry system
- Vero CC50: Vero cell (African green monkey kidney cell) 50% cytotoxicity value
Conflicts of Interest
S.E. is the Founder and CEO of Collaborations Pharmaceuticals Inc.
Cite this article
Perryman, A.L., Patel, J.S., Russo, R. et al. Naïve Bayesian Models for Vero Cell Cytotoxicity. Pharm Res 35, 170 (2018). https://doi.org/10.1007/s11095-018-2439-9
- Bayesian model
- machine learning
- predicting mammalian cytotoxicity
- translational research
- Vero cell CC50