Cheminformatic Analysis of High-Throughput Compound Screens

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1056)

Abstract

This article gives an overview of basic computational methods that are commonly used for analyzing small-molecule screening data in the chemical genomics field. First, we introduce cheminformatic concepts for analyzing drug-like small-molecule structures and their properties. Second, we introduce compound selection approaches for assembling screening libraries using compound property and diversity analyses. Finally, we discuss methods for interpreting screening hits by analyzing compound structures and induced phenotypes using similarity search and clustering approaches. These are critical steps for optimizing screening hits and for relating structure to bioactivity and phenotype.
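As a rough illustration of the similarity search step mentioned above, the following Python sketch ranks library compounds against a query by Tanimoto coefficient. It is a minimal sketch under stated assumptions, not the method used in the chapter: fingerprints are modeled as plain sets of integer feature IDs, and the compound names, feature values, and the 0.5 cutoff are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is a set of integer feature IDs (toy representation).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union


def similarity_search(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    cutoff: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= cutoff, best match first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy library; real fingerprints would come from a cheminformatics toolkit.
library = {
    "cmp_A": frozenset({1, 2, 3, 4}),
    "cmp_B": frozenset({1, 2, 3, 8}),
    "cmp_C": frozenset({10, 11, 12}),
}
query = frozenset({1, 2, 3, 4})
hits = similarity_search(query, library, cutoff=0.5)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

The same pairwise scores could also feed a clustering step, for example by converting each score s to a distance 1 - s; the choice of cutoff depends on the fingerprint type and is left open here.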

Acknowledgments

We thank the community software development projects listed in Table 1. We also acknowledge support from the core facilities at the Institute for Integrative Genome Biology (IIGB) at UC Riverside. The authors' cheminformatics tools (ChemMine Tools and ChemmineR) were developed with support from the National Science Foundation [grant numbers: ABI-0957099, 2010–0520325 and IGERT-0504249].

Copyright information

© 2014 Springer Science+Business Media, New York

About this protocol

Cite this protocol

Backman, T.W.H., Girke, T. (2014). Cheminformatic Analysis of High-Throughput Compound Screens. In: Hicks, G., Robert, S. (eds) Plant Chemical Genomics. Methods in Molecular Biology, vol 1056. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-592-7_15

  • DOI: https://doi.org/10.1007/978-1-62703-592-7_15

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-591-0

  • Online ISBN: 978-1-62703-592-7

  • eBook Packages: Springer Protocols
