Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds

Sidorov, Pavel; Gaspar, Helena; Marcou, Gilles; Varnek, Alexandre; Horvath, Dragos

doi:10.1007/s10822-015-9882-z

Published: 12 November 2015

Journal of Computer-Aided Molecular Design, Volume 29, pages 1087–1108 (2015)

Abstract

Intuitive visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. So far, such maps have been dedicated to specific compound collections: either limited series with known activities, or large, even exhaustive enumerations of molecules without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and the claim of “universality” quantitatively justified, with respect to all the structure–activity information available so far or, more realistically, an exploitable but significant fraction thereof. The “universal” CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was performed with respect to a maximum of exploitable structure–activity information, covering all Homo sapiens proteins of the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even for in vivo properties, independent of the training data. They also withstood chemogenomics-related challenges, as cumulated responsibility vectors obtained by mapping target-specific ligand collections were shown to represent validated target descriptors, complying with the currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well-documented answer to the key question “What is a good CS map?”

Abbreviations

(Q)SPR/SAR: (Quantitative) structure–property/structure–activity relationships
CS: Chemical space
GTM: Generative topographic map
HTS: High throughput screening

Acknowledgments

The Laboratory of Chemoinformatics wishes to thank the High Performance Computing centers of the University of Strasbourg, France, and the Babes-Bolyai University of Cluj, Romania, for the computing power and assistance provided. Many thanks to Prof. Jürgen Bajorath for providing the clean and coherent ChEMBL compound subset. K. Klimenko, B. Viira and T. Gimadiev are acknowledged for their help with the preparation of the antiviral, antimalarial and transporter datasets. PS and AV thank the Russian Scientific Foundation (Agreement No 14-43-00024 of October 1, 2014) for support.

Author information

Authors and Affiliations

Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
Pavel Sidorov, Helena Gaspar, Gilles Marcou, Alexandre Varnek & Dragos Horvath
Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
Pavel Sidorov & Alexandre Varnek

Corresponding author

Correspondence to Dragos Horvath.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (ZIP 15042 kb)

Appendix: ChEMBL data curation protocol

This Appendix describes the data curation protocol for Challenges 3 and 4, as shown in Fig. 14 and explained in further detail below.

First, the complete list of biological targets of Homo sapiens was retrieved from ChEMBL. For each target labeled as “single protein” (2474 proteins), the associated ligand activity data were retrieved using a script available as Supporting Information. Activity data were analyzed and binned as follows:

1. Compounds associated with an inhibition percentage equal to or less than 50 % were labeled as inactives.
2. Compounds with an associated dose-dependent activity value lower than the imposed activity threshold were labeled as actives.
3. Compounds with an associated dose-dependent activity value equal to or higher than ten times the activity threshold were labeled as inactives.

Finally, compounds with multiple entries leading to contradictory tentative activity class assignments were ignored. Compounds that could not be labeled as either active or inactive were also ignored.
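The labeling rules above can be illustrated with a minimal Python sketch. It is not the script distributed as Supporting Information: the function names, the measurement kinds "inhibition_percent" and "dose_nm", and the choice to disregard entries that yield no label when checking for contradictions are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# Class codes follow the 1-inactive/2-active convention of the .class files.
INACTIVE, ACTIVE = 1, 2


def label_measurement(kind: str, value: float, threshold_nm: float) -> Optional[int]:
    """Tentative class for one measurement, or None if no rule applies.

    `kind` is a hypothetical tag: "inhibition_percent" for single-point
    inhibition data, "dose_nm" for dose-dependent values expressed in nM.
    """
    if kind == "inhibition_percent":
        # Rule 1: inhibition of 50 % or less means inactive.
        return INACTIVE if value <= 50.0 else None
    if kind == "dose_nm":
        if value < threshold_nm:
            return ACTIVE  # Rule 2: below the activity threshold.
        if value >= 10.0 * threshold_nm:
            return INACTIVE  # Rule 3: at least ten times the threshold.
    return None  # Gray zone or unknown kind: no label.


def label_compound(
    measurements: Iterable[Tuple[str, float]], threshold_nm: float
) -> Optional[int]:
    """Class of a compound, or None if it is unlabeled or contradictory.

    Assumption: entries that yield no label do not count as contradictions.
    """
    labels = {label_measurement(k, v, threshold_nm) for k, v in measurements}
    labels.discard(None)
    return labels.pop() if len(labels) == 1 else None
```

With a 100 nM threshold, a compound measured at 50 nM would be labeled active, one measured at 1000 nM inactive, and one measured at 500 nM would remain unlabeled and be ignored.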

In steps 2 and 3, several activity thresholds were tested: 1000, 500, 100 and 50 nM. A threshold was used for a given target if:

more than 100 compounds could be successfully classified for the given target, and
out of these, at least 20 compounds are active, and
inactives represent more than 50 % of the set.

If the above rules could not be satisfied, the target was discarded. If more than one threshold value satisfied these conditions, the resulting dataset with the proportion of active compounds closest to 25 % was kept.
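The threshold selection rule can be sketched as follows. This is an illustrative reading of the criteria above, not the authors' code: the helper names and the input layout (a dictionary mapping each tested threshold to the list of class codes it produced) are assumptions, and the text does not say how ties between equally good thresholds are resolved.

```python
from typing import Dict, List, Optional

INACTIVE, ACTIVE = 1, 2
THRESHOLDS_NM = (1000, 500, 100, 50)  # thresholds tested in the protocol


def is_usable(classes: List[int]) -> bool:
    """Check the three acceptance criteria for one labeled set."""
    n_total = len(classes)
    n_active = sum(1 for c in classes if c == ACTIVE)
    n_inactive = n_total - n_active
    return n_total > 100 and n_active >= 20 and n_inactive > 0.5 * n_total


def active_fraction(classes: List[int]) -> float:
    """Proportion of actives in a non-empty labeled set."""
    return sum(1 for c in classes if c == ACTIVE) / len(classes)


def pick_threshold(classes_by_threshold: Dict[int, List[int]]) -> Optional[int]:
    """Return the usable threshold whose active fraction is closest to 25 %.

    Returns None when no threshold is usable, i.e. the target is discarded.
    Ties are resolved by dictionary order, which is an arbitrary choice here.
    """
    usable = {t: c for t, c in classes_by_threshold.items() if is_usable(c)}
    if not usable:
        return None
    return min(usable, key=lambda t: abs(active_fraction(usable[t]) - 0.25))
```

For example, if the 100 nM threshold yields 30 actives among 120 classified compounds (25 %) and the 1000 nM threshold yields 50 actives among 120 (about 42 %), both sets are usable and the 100 nM set is kept.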

If the activity value used in the previous steps was a K_i, IC50 or EC50, the target entered both Challenge 3 and Challenge 4. If the activity type was potency, the associated target entered Challenge 4 only; such data are the results of high-throughput screening (HTS) campaigns.

The Supporting Information features two tar archives, Challenge 3.tar.gz and Challenge 4.tar.gz, containing files named <ChEMBLTarget_ID>_S<subset number>.class which report, for the named target, a list of ligand ChEMBL IDs next to their attributed class (1-inactive/2-active). The archives also contain the files target.chid_AT_name, which list in three columns the validated target ChEMBL IDs, the associated activity thresholds AT (in nM) and the name of the target as given in the ChEMBL database.
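A reader wishing to load these files might start from a sketch like the one below. The exact file layout is not specified beyond the description above, so the sketch assumes whitespace-separated columns with no header line; the function names and the example file name are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Pattern for names of the form <ChEMBLTarget_ID>_S<subset number>.class
_CLASS_NAME = re.compile(r"^(?P<target>.+)_S(?P<subset>\d+)\.class$")


def parse_class_filename(name: str) -> Tuple[str, int]:
    """Split a .class file name into (target ChEMBL ID, subset number)."""
    match = _CLASS_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group("target"), int(match.group("subset"))


def parse_class_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Map ligand ChEMBL IDs to class codes (1 = inactive, 2 = active).

    Assumes the ID is the first column and the class the second;
    blank or single-column lines are skipped.
    """
    classes: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        ligand_id, code = fields[0], int(fields[1])
        if code not in (1, 2):
            raise ValueError(f"unexpected class code {code} for {ligand_id}")
        classes[ligand_id] = code
    return classes
```

In practice, `parse_class_lines` would be fed an open file handle obtained after extracting one of the archives, for instance with Python's standard `tarfile` module.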

About this article

Cite this article

Sidorov, P., Gaspar, H., Marcou, G. et al. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 29, 1087–1108 (2015). https://doi.org/10.1007/s10822-015-9882-z

Received: 05 October 2015
Accepted: 06 November 2015
Published: 12 November 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10822-015-9882-z
