Compound Collection Preparation for Virtual Screening

  • Protocol
Bioinformatics and Drug Discovery

Part of the book series: Methods in Molecular Biology (MIMB, volume 910)

Abstract

Virtual screening is an established technique that has successfully been deployed in the identification of novel biologically active molecules. Whether for ligand-based or for structure-based virtual screening, a chemical collection needs to be properly processed prior to in silico evaluation. Here we describe our step-by-step procedure for handling large collections of compounds prior to virtual screening.

Acknowledgments

This work was supported, in part, by NIH grants 1R21GM095952-01 and 5U54MH084690-03. We thank Drs. Jeremy Yang and Oleg Ursu for useful discussions.

Corresponding author

Correspondence to Tudor I. Oprea.

Copyright information

© 2012 Springer Science+Business Media New York

About this protocol

Cite this protocol

Bologa, C.G., Oprea, T.I. (2012). Compound Collection Preparation for Virtual Screening. In: Larson, R. (ed.) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 910. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-965-5_7

  • DOI: https://doi.org/10.1007/978-1-61779-965-5_7

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-964-8

  • Online ISBN: 978-1-61779-965-5

  • eBook Packages: Springer Protocols
