Abstract
We are now seeing the benefit of investments made over the last decade in high-throughput screening (HTS), which have resulted in large structure-activity datasets entering public and open databases such as ChEMBL and PubChem. The growth of academic HTS centers and the increasing shift of early-stage drug discovery to academia suggest a great need for informatics tools and methods to mine such data and learn from it. Collaborative Drug Discovery, Inc. (CDD) has developed a number of tools for storing, mining, securely and selectively sharing, and learning from such HTS data. We present a new web-based data mining and visualization module, built directly into the CDD Vault platform for high-throughput drug discovery data, that uses a novel technology stack following modern reactive design principles. We also describe CDD Models, a component of the CDD Vault platform that enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous data. Our system is built on top of the Collaborative Drug Discovery Vault Activity and Registration data repository ecosystem, which allows users to manipulate and visualize thousands of molecules in real time. This can be performed in any browser on any platform. In this chapter we present examples of its use with public datasets in CDD Vault. Such approaches can complement other cheminformatics tools, whether open source or commercial, for data mining and modeling of HTS data.
Acknowledgments
We acknowledge that the Bayesian model software within CDD was developed with support from Award Number 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery” from the NIH NCATS. The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant #49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”). The work was partially supported by a grant from the European Community’s Seventh Framework Program (grant 260872, MM4TB Consortium) to S.E. S.E. gratefully acknowledges Biovia (formerly Accelrys) for providing Discovery Studio, and Dr. Alexander Perryman and Dr. Joel Freundlich for their feedback and collaboration on CDD Models. We sincerely acknowledge our many colleagues, collaborators, and advocates who have contributed to the development of CDD over the years.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Ekins, S. et al. (2018). Data Mining and Computational Modeling of High-Throughput Screening Datasets. In: Damoiseaux, R., Hasson, S. (eds) Reporter Gene Assays. Methods in Molecular Biology, vol 1755. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7724-6_14
DOI: https://doi.org/10.1007/978-1-4939-7724-6_14
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7722-2
Online ISBN: 978-1-4939-7724-6
eBook Packages: Springer Protocols