Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds

Journal of Computer-Aided Molecular Design

Abstract

Intuitive visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. So far, such maps have been dedicated to specific compound collections: either limited series of compounds with known activities, or large, even exhaustive enumerations of molecules without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and whether the claim of “universality” can be quantitatively justified with respect to all the structure–activity information available so far or, more realistically, an exploitable but significant fraction thereof. The “universal” CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was performed against a maximum of exploitable structure–activity information, covering all Homo sapiens proteins of the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even for in vivo properties, independent of the training data. They also withstood chemogenomics-related challenges: cumulated responsibility vectors obtained by mapping target-specific ligand collections were shown to represent validated target descriptors, complying with the currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well-documented answer to the key question “What is a good CS map?”

Abbreviations

(Q)SPR/SAR: (Quantitative) structure–property/structure–activity relationships

CS: Chemical space

GTM: Generative topographic map

HTS: High throughput screening

Acknowledgments

The Laboratory of Chemoinformatics wishes to thank the High Performance Computing centers of the University of Strasbourg, France, and the Babes-Bolyai University of Cluj, Romania, for the computing power and assistance they supplied. Many thanks to Prof. Jürgen Bajorath for providing the clean and coherent ChEMBL compound subset. K. Klimenko, B. Viira and T. Gimadiev are acknowledged for their help with the preparation of the antiviral, antimalarial and transporter datasets. PS and AV thank the Russian Scientific Foundation (Agreement No 14-43-00024 of October 1, 2014) for support.

Corresponding author

Correspondence to Dragos Horvath.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (ZIP 15042 kb)

Appendix: ChEMBL data curation protocol

This Appendix describes the data curation protocol for Challenges 3 and 4, as shown in Fig. 14 and explained in further detail below.

Fig. 14 Workflow of data mining and curation for external Homo sapiens target-related validation. For mined targets, compounds were labeled as active or inactive according to their reported activity type (inhibition %, Ki, IC50, EC50, “potency”) and a chosen activity threshold (AT). Targets that have only the potency activity type associated entered only Challenge 4; the others entered both challenges.

First, the complete list of biological targets of Homo sapiens was retrieved from ChEMBL. For each target labeled as “single protein” (2474 proteins), the associated ligand activity data were retrieved using a script available as Supporting Information. The activity data were then analyzed and binned as follows:

  1. Compounds associated with an inhibition percentage equal to or less than 50 % were labeled as inactives.

  2. Compounds with an associated dose-dependent activity value lower than an imposed activity threshold were labeled as actives.

  3. Compounds with an associated dose-dependent activity value equal to or higher than ten times the activity threshold were labeled as inactives.

Finally, compounds with multiple entries leading to contradictory tentative activity class assignments were ignored. Compounds to which neither the active nor the inactive label could be assigned were also ignored.
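The three labeling rules and the conflict handling above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script distributed as Supporting Information: the function names, the lowercase activity-type strings and the choice to express dose-dependent values in nM are hypothetical, and entries that yield no tentative label are assumed to be simply skipped rather than counted as contradictions.

```python
from typing import Iterable, Optional, Tuple

# Assumption: dose-dependent activity types, compared case-insensitively.
DOSE_TYPES = {"ki", "ic50", "ec50", "potency"}


def tentative_label(activity_type: str, value: float,
                    threshold_nm: float) -> Optional[str]:
    """Return 'active', 'inactive' or None for a single activity entry."""
    kind = activity_type.lower()
    if kind == "inhibition":
        # Rule 1: inhibition percentage of 50 % or less means inactive.
        return "inactive" if value <= 50.0 else None
    if kind in DOSE_TYPES:
        # Rule 2: value below the activity threshold means active.
        if value < threshold_nm:
            return "active"
        # Rule 3: value at or above ten times the threshold means inactive.
        if value >= 10.0 * threshold_nm:
            return "inactive"
    # Anything else (including the zone between AT and 10 x AT) is unlabeled.
    return None


def compound_label(entries: Iterable[Tuple[str, float]],
                   threshold_nm: float) -> Optional[str]:
    """Combine all entries of one compound; None means 'ignore the compound'."""
    labels = {tentative_label(kind, value, threshold_nm)
              for kind, value in entries}
    labels.discard(None)  # assumption: unlabeled entries are not conflicts
    # Exactly one distinct label survives only when the entries agree.
    return labels.pop() if len(labels) == 1 else None
```

For example, with a threshold of 100 nM, a compound reported with IC50 = 50 nM and Ki = 2000 nM receives contradictory tentative labels and is dropped, whereas a compound with IC50 = 50 nM and Ki = 20 nM is kept as active.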

In steps 2 and 3, several activity thresholds were tested: 1000, 500, 100 and 50 nM. A threshold was used for a given target if:

  • more than 100 compounds could be successfully classified for the given target, and

  • out of these, at least 20 compounds are active, and

  • inactives represent more than 50 % of the set.

If the above rules could not be satisfied, the target was discarded. If more than one threshold value satisfied these conditions, the resulting dataset with the proportion of active compounds closest to 25 % was kept.
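The threshold selection described above can be illustrated with the following minimal sketch. It assumes that, for each tested threshold, the list of labels of the successfully classified compounds is already available; the function names and the dictionary layout are hypothetical. How ties between equally good thresholds were resolved is not stated in the text, so the sketch simply keeps the first candidate encountered.

```python
from typing import Dict, Optional, Sequence


def threshold_is_usable(labels: Sequence[str]) -> bool:
    """Check the three acceptance rules for one target and one threshold."""
    n_total = len(labels)
    n_active = sum(1 for label in labels if label == "active")
    n_inactive = n_total - n_active
    return (
        n_total > 100                     # more than 100 classified compounds
        and n_active >= 20                # at least 20 actives
        and n_inactive > 0.5 * n_total    # inactives are more than 50 %
    )


def pick_threshold(
    labels_by_threshold: Dict[float, Sequence[str]]
) -> Optional[float]:
    """Return the usable threshold whose active fraction is closest to 25 %.

    None means that no threshold satisfied the rules (target discarded).
    """
    usable = {
        at: labels
        for at, labels in labels_by_threshold.items()
        if threshold_is_usable(labels)
    }
    if not usable:
        return None

    def distance_to_quarter(at: float) -> float:
        labels = usable[at]
        fraction = sum(1 for label in labels if label == "active") / len(labels)
        return abs(fraction - 0.25)

    # min() keeps the first candidate on ties (an assumption of this sketch).
    return min(usable, key=distance_to_quarter)
```

With 30 actives and 90 inactives at 100 nM versus 50 actives and 70 inactives at 1000 nM, both thresholds are usable, and the 100 nM dataset is retained because its active fraction is exactly 25 %.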

If the activity value used in the previous steps was the Ki, IC50 or EC50, the target entered both Challenge 3 and Challenge 4. If the activity type was potency, the associated target entered Challenge 4 only; potency data are the results of high-throughput screening (HTS) campaigns.

Supporting Information features two tar archives, Challenge 3.tar.gz and Challenge 4.tar.gz, containing files named <ChEMBLTarget_ID>_S<subset number>.class which report, for the named target, a list of ligand ChEMBL IDs next to their attributed class (1-inactive/2-active). The archives also contain the files target.chid_AT_name, which list in three columns the validated target ChEMBL IDs, the associated activity thresholds AT (in nM) and the name of the target as given in the ChEMBL database.
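A reader wishing to load these class files might proceed along the lines of the sketch below. The exact column separator is not specified above, so whitespace-separated columns are assumed here; the function names and the example identifiers are hypothetical, and lines that do not carry a class code of 1 or 2 are skipped.

```python
import re
from typing import Dict, Iterable, Tuple

# Class codes as documented for the archives: 1 = inactive, 2 = active.
CLASS_NAMES = {"1": "inactive", "2": "active"}

# Matches <ChEMBLTarget_ID>_S<subset number>.class
FILENAME_RE = re.compile(r"^(?P<target>.+)_S(?P<subset>\d+)\.class$")


def parse_class_filename(name: str) -> Tuple[str, int]:
    """Split a file name into (target ChEMBL ID, subset number)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected class file name: {name!r}")
    return match.group("target"), int(match.group("subset"))


def parse_class_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map each ligand ChEMBL ID to 'inactive' or 'active'.

    Assumption: two whitespace-separated columns, ligand ID then class code.
    """
    labels: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[1] not in CLASS_NAMES:
            continue  # skip blank or unrecognized lines
        labels[fields[0]] = CLASS_NAMES[fields[1]]
    return labels
```

In practice, `parse_class_lines` can be fed an open file handle from an extracted archive, since iterating over a text file yields its lines.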

Cite this article

Sidorov, P., Gaspar, H., Marcou, G. et al. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 29, 1087–1108 (2015). https://doi.org/10.1007/s10822-015-9882-z
