Abstract
Intuitive, visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. So far, such maps have been dedicated to specific compound collections: either limited series of compounds with known activities, or large, even exhaustive enumerations of molecules without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and whether its claim of “universality” can be quantitatively justified with respect to all the structure–activity information available so far or, more realistically, to an exploitable but significant fraction thereof. The “universal” CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative, neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was performed against a maximum of exploitable structure–activity information, covering all Homo sapiens proteins in the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even for in vivo properties independent of the training data.
They also passed chemogenomics-related challenges: cumulated responsibility vectors, obtained by mapping target-specific ligand collections, were shown to serve as valid target descriptors, consistent with the currently accepted target classification in biology. Therefore, in our opinion, these maps represent a robust and well-documented answer to the key question “What is a good CS map?”
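The cumulated responsibility vectors mentioned above can be sketched in NumPy as follows. This minimal example assumes an already trained GTM summarized by its node images `Y` (one row per latent grid node, in descriptor space) and the inverse variance `beta`, as in the standard GTM formulation; map growth, training and descriptor calculation are not shown, and the function names and the rescaling to unit total are illustrative choices, not the authors' implementation.

```python
import numpy as np


def responsibilities(X, Y, beta):
    """Responsibility matrix R (n_compounds x K) of a trained GTM.

    X    : (n, D) descriptor vectors of the compounds to project
    Y    : (K, D) images of the K latent grid nodes in descriptor space
    beta : inverse variance of the isotropic Gaussians centred on the rows of Y
    """
    # Squared Euclidean distance between every compound and every node image
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # Log-likelihood up to an additive constant; equal node priors cancel out
    log_p = -0.5 * beta * d2
    # Subtract the row maximum before exponentiating (numerically stable softmax)
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def cumulated_responsibility(X, Y, beta):
    """Sum of the responsibility vectors of one ligand set, rescaled to unit total."""
    r = responsibilities(X, Y, beta).sum(axis=0)
    return r / r.sum()


def cosine(u, v):
    """Cosine similarity between two target descriptors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Comparing two targets then reduces to comparing their cumulated vectors, for instance with the cosine similarity above; the rescaling to unit total is an assumption made here so that ligand sets of different sizes remain comparable, and any other vector similarity could be substituted.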
Abbreviations
- (Q)SPR/SAR: (Quantitative) structure–property/structure–activity relationships
- CS: Chemical space
- GTM: Generative topographic map
- HTS: High-throughput screening
Acknowledgments
The Laboratory of Chemoinformatics thanks the High Performance Computing centers of the University of Strasbourg, France, and of the Babes-Bolyai University of Cluj, Romania, for providing computing power and assistance. Many thanks to Prof. Jürgen Bajorath for providing the clean and coherent ChEMBL compound subset. K. Klimenko, B. Viira and T. Gimadiev are acknowledged for their help with preparing the antiviral, antimalarial and transporter datasets. PS and AV thank the Russian Scientific Foundation (Agreement No 14-43-00024 of October 1, 2014) for support.
Appendix: ChEMBL data curation protocol
This Appendix describes the data curation protocol for Challenges 3 and 4, summarized in Fig. 14 and detailed below.
First, the complete list of Homo sapiens biological targets was retrieved from ChEMBL. For each target labeled as “single protein” (2474 proteins), the associated ligand activity data were downloaded using a script provided as Supporting Information. The activity data were then analyzed and binned as follows:
1. Compounds with an inhibition percentage equal to or less than 50 % are labeled as inactives.
2. Compounds with a dose-dependent activity value lower than an imposed activity threshold are labeled as actives.
3. Compounds with a dose-dependent activity value equal to or higher than ten times the activity threshold are labeled as inactives.
Finally, compounds with multiple entries leading to contradictory tentative class assignments were ignored, as were compounds for which neither label could be set (e.g., compounds whose dose-dependent values fall between the threshold and ten times the threshold).
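The labeling rules and the removal of contradictory compounds can be sketched as below. This is a minimal illustration, not the authors' script: the record layout (compound ID, record kind, value) and the kinds `"inhibition"` (percent) and `"dose"` (nM) are hypothetical simplifications of ChEMBL activity records, and treating gray-zone entries as uninformative (rather than as a contradiction) is an assumption.

```python
from collections import defaultdict

ACTIVE, INACTIVE = 2, 1  # class codes also used in the distributed .class files


def label_entry(kind, value, threshold_nm):
    """Tentative class of one activity record, or None if no rule applies."""
    if kind == "inhibition":              # step 1: percent inhibition
        return INACTIVE if value <= 50 else None
    if kind == "dose":                    # dose-dependent value, in nM
        if value < threshold_nm:          # step 2
            return ACTIVE
        if value >= 10 * threshold_nm:    # step 3
            return INACTIVE
    return None                           # gray zone or other record kind


def label_compounds(records, threshold_nm):
    """Map compound IDs to a single class; drop contradictory or unlabeled ones.

    records: iterable of (compound_id, kind, value) tuples.
    """
    tentative = defaultdict(set)
    for compound_id, kind, value in records:
        cls = label_entry(kind, value, threshold_nm)
        if cls is not None:
            tentative[compound_id].add(cls)
    # Never-labeled compounds are absent; contradictory ones carry two classes
    return {cid: next(iter(classes))
            for cid, classes in tentative.items() if len(classes) == 1}
```

Calling `label_compounds` once per tested threshold yields the candidate datasets that the acceptance rules are then applied to.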
In steps 2 and 3, several activity thresholds were tested: 1000, 500, 100 and 50 nM. A threshold was retained for a given target if:
- more than 100 compounds could be successfully classified for the given target,
- at least 20 of these compounds are active, and
- inactives represent more than 50 % of the set.
If no threshold satisfied these rules, the target was discarded. If more than one threshold did, the dataset whose proportion of active compounds was closest to 25 % was kept.
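The acceptance rules and the 25 % tie-break can be expressed as a small selection function. In this sketch, the input layout (one `{compound_id: class}` dictionary per tested threshold, with 2 = active and 1 = inactive) and the function names are assumptions; how exact ties between thresholds were broken is not stated in the protocol, so here the first threshold in input order wins.

```python
ACTIVE = 2  # class code for actives (1 = inactive)


def is_usable(labels):
    """Check the three acceptance rules on a {compound_id: class} dataset."""
    n = len(labels)
    n_active = sum(1 for c in labels.values() if c == ACTIVE)
    n_inactive = n - n_active
    return n > 100 and n_active >= 20 and n_inactive > 0.5 * n


def select_threshold(labels_by_threshold):
    """Pick the usable dataset whose active fraction is closest to 25 %.

    labels_by_threshold: {threshold_nm: {compound_id: class}} for one target.
    Returns (threshold_nm, labels), or None if the target must be discarded.
    """
    usable = {t: lab for t, lab in labels_by_threshold.items() if is_usable(lab)}
    if not usable:
        return None

    def distance_to_quarter(t):
        lab = usable[t]
        n_active = sum(1 for c in lab.values() if c == ACTIVE)
        return abs(n_active / len(lab) - 0.25)

    # Ties (not addressed in the protocol) go to the first threshold in input order
    best = min(usable, key=distance_to_quarter)
    return best, usable[best]
```

For example, a target with 40 actives out of 120 at 500 nM and 25 actives out of 125 at 100 nM would keep the 100 nM dataset, since 20 % is closer to 25 % than 33 % is.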
If the activity value used in the previous steps was Ki, IC50 or EC50, the target entered both Challenge 3 and Challenge 4. If the activity type was potency, the target entered Challenge 4 only; potency data stem from high-throughput screening (HTS) campaigns.
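The routing rule amounts to a lookup on the activity type. The type strings below are assumed to match ChEMBL's standard activity-type labels, and the function name is hypothetical.

```python
# Activity types assumed to correspond to ChEMBL standard activity-type labels
DOSE_RESPONSE_TYPES = {"Ki", "IC50", "EC50"}  # dose-response data
HTS_TYPES = {"Potency"}                        # high-throughput screening data


def challenges_for(activity_type):
    """Return the set of Challenges (3 and/or 4) a validated target enters."""
    if activity_type in DOSE_RESPONSE_TYPES:
        return {3, 4}
    if activity_type in HTS_TYPES:
        return {4}
    return set()  # other activity types are not covered by the protocol
```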
The Supporting Information includes two tar archives, Challenge 3.tar.gz and Challenge 4.tar.gz, containing files named <ChEMBLTarget_ID>_S<subset number>.class. Each such file lists, for the named target, ligand ChEMBL IDs next to their assigned class (1 = inactive, 2 = active). The archives also contain the files target.chid_AT_name, which list in three columns the validated target ChEMBL IDs, the associated activity thresholds AT (in nM) and the target names as given in the ChEMBL database.
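A minimal reader for these files might look as follows. The exact column separators are not specified above, so whitespace separation is an assumption, as is the handling of target names containing spaces (everything after the second column is treated as the name); the function names and the sample IDs used to exercise it are hypothetical.

```python
import re


def read_class_lines(lines):
    """Parse the lines of a <ChEMBLTarget_ID>_S<subset number>.class file.

    Assumes one whitespace-separated 'ligand_ChEMBL_ID class' pair per line,
    with class 1 = inactive and 2 = active.
    """
    labels = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            labels[fields[0]] = int(fields[1])
    return labels


def read_target_lines(lines):
    """Parse target.chid_AT_name: target ChEMBL ID, threshold AT (nM), target name."""
    table = {}
    for line in lines:
        if line.strip():
            chid, at, name = line.split(maxsplit=2)  # the name may contain spaces
            table[chid] = (float(at), name.strip())
    return table


def parse_class_filename(filename):
    """Split '<ChEMBLTarget_ID>_S<subset number>.class' into (target_id, subset)."""
    m = re.fullmatch(r"(.+)_S(\d+)\.class", filename)
    return (m.group(1), int(m.group(2))) if m else None
```

Because file objects iterate over their lines, the readers can be fed directly from an open file, e.g. `read_class_lines(fh)` inside a `with open(path) as fh:` block.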
Cite this article
Sidorov, P., Gaspar, H., Marcou, G. et al. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 29, 1087–1108 (2015). https://doi.org/10.1007/s10822-015-9882-z
DOI: https://doi.org/10.1007/s10822-015-9882-z