Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays

  • Rajarshi Guha
  • Stephan C. Schürer


Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example, from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols that take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we achieved correct classification rates between 70% and 85% on the prediction set, though accuracy dropped significantly when the models were applied to in vivo data. In this context, we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated with, and potentially predicted by, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
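The abstract does not say which protocol was used to handle the class imbalance. As an illustration only, the sketch below shows one common approach that may differ from the authors' method: drawing several balanced training subsets by undersampling the majority class, and reporting accuracy per class rather than overall. The function names `balanced_subsets` and `per_class_accuracy` and the 95:5 class split are hypothetical and chosen for this example.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Yield index lists in which every class is sampled down to the rarest class size.

    Illustrative assumption: undersampling is one common way to handle
    imbalance; the source does not state that this is the protocol used.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    minority_size = min(len(members) for members in by_class.values())
    for _ in range(n_subsets):
        subset = []
        for members in by_class.values():
            # Sample without replacement so each class contributes equally.
            subset.extend(rng.sample(members, minority_size))
        yield sorted(subset)


def per_class_accuracy(y_true, y_pred):
    """Return the fraction of correct predictions for each class separately."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical data: 95 non-toxic (0) and 5 toxic (1) compounds.
labels = [0] * 95 + [1] * 5
subsets = list(balanced_subsets(labels, n_subsets=3))

# A trivial model that always predicts "non-toxic" is 95% correct overall
# yet identifies none of the toxic compounds.
always_nontoxic = [0] * len(labels)
print(per_class_accuracy(labels, always_nontoxic))
```

Each balanced subset could be used to train one member of an ensemble. The per-class breakdown shows why a single percentage-correct figure can be misleading when one class dominates.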


Domain applicability, HTS assay, QSAR, Cell proliferation, Animal toxicity, Jurkat cell line



RG would like to acknowledge funding from NIH Grant No. P20 HG003894-01. SCS acknowledges support from the National Institutes of Health Molecular Library Screening Center Network (Grant No. U54 MH074404-01, Prof. Hugh Rosen, Principal Investigator).



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. School of Informatics, Indiana University, Bloomington, USA
  2. Department of Scientific Computing, The Scripps Research Institute, Jupiter, USA
