Journal of Computer-Aided Molecular Design, Volume 33, Issue 3, pp 331–343

Multi-task generative topographic mapping in virtual screening

  • Arkadii Lin
  • Dragos Horvath
  • Gilles Marcou
  • Bernd Beck
  • Alexandre Varnek


Abstract

The previously reported procedure for generating "universal" Generative Topographic Maps (GTMs) of drug-like chemical space is, in practice, a multi-task learning process in which both operational GTM parameters (e.g., map grid size) and hyperparameters (notably, the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit and select "universal" GTM manifolds. After selection (a one-time task that optimizes the compromise, in terms of neighborhood-behavior compliance, over a large pool of various biological targets), the manifolds are ready to provide "fit-free" predictive models for any further use. Given any structure–activity set, irrespective of whether the associated target served at the map-fitting stage, generating (or "coloring") a property landscape enables prediction of that property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), simulated here with external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape-coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner are intrinsically able to serve as excellent support for many other VS procedures, from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
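The "fit-free" landscape-coloring idea lends itself to a short numerical sketch. The snippet below is illustrative only, not the authors' implementation: it replaces a fitted GTM manifold with a fixed 10×10 node grid and Gaussian responsibilities over synthetic 2D projections (the grid size, kernel width, data, and labels are all assumptions). It shows the key point of the abstract: a labeled set "colors" the map once, after which any query molecule is scored with no further fitted parameters.

```python
import numpy as np

# Stand-in for a fitted GTM: a fixed 10x10 grid of map nodes in [0, 1]^2.
nodes = np.stack(np.meshgrid(np.linspace(0, 1, 10),
                             np.linspace(0, 1, 10)), -1).reshape(-1, 2)

def responsibilities(points, width=0.05):
    """Gaussian responsibilities of each 2D point over the node grid.

    In a real GTM these come from the trained manifold; here a simple
    Gaussian kernel (illustrative width) plays that role.
    """
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * width))
    return w / w.sum(axis=1, keepdims=True)

# "Coloring": a labeled structure-activity set deposits class-weighted
# responsibility mass on each node (a one-time operation per property).
rng = np.random.default_rng(0)
train_xy = rng.random((200, 2))                 # stand-in 2D projections
train_y = (train_xy[:, 0] > 0.5).astype(float)  # stand-in activity labels
R = responsibilities(train_xy)
active_mass = R.T @ train_y                     # per-node active mass
total_mass = R.sum(axis=0)                      # per-node total mass
landscape = active_mass / np.maximum(total_mass, 1e-12)

# Zero-parameter prediction: project the query, average the landscape.
query = np.array([[0.9, 0.4], [0.1, 0.6]])
score = responsibilities(query) @ landscape
print(score.round(2))  # the first query (active side of the map) scores higher
```

Because the landscape is a convex combination of training labels and each query score is a convex combination of landscape values, scores stay in [0, 1] and can be read directly as activity estimates, which is what makes the coloring step reusable across targets without refitting.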


Keywords: Generative topographic mapping · Multi-task learning · Ligand-based virtual screening · Big data · Universal maps · ChEMBL · DUD · Neural networks



Abbreviations

GTM: Generative topographic mapping
Universal GTM: Universal generative topographic mapping
GA: Genetic algorithm
DUD: Directory of Useful Decoys
NN: Neural network
RF: Random forest


Author contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.


Funding

The project leading to this article has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant agreement No. 676434, "Big Data in Chemistry" ("BIGCHEM").

Supplementary material

Supplementary material 1 (DOCX, 31 KB)
Supplementary material 2 (ZIP, 2577 KB)


References

  1. Bishop CM, Svensén M, Williams CK (1998) GTM: the generative topographic mapping. Neural Comput 10(1):215–234
  2. Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480
  3. Lin A, Horvath D, Afonina V, Marcou G, Jean-Louis R, Varnek A (2018) Mapping of the available chemical space versus the chemical universe of lead-like compounds. ChemMedChem 13:540–554
  4. Kireeva N, Baskin I, Gaspar H, Horvath D, Marcou G, Varnek A (2012) Generative topographic mapping (GTM): universal tool for data visualization, structure–activity modeling and dataset comparison. Mol Inform 31(3–4):301–312
  5. Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A (2015) GTM-based QSAR models and their applicability domains. Mol Inform 34(6–7):348–356
  6. Muegge I, Oloff S (2006) Advances in virtual screening. Drug Discov Today 3(4):405–411
  7. Lavecchia A (2015) Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 20(3):318–331
  8. Hristozov D, Oprea TI, Gasteiger J (2007) Ligand-based virtual screening by novelty detection with self-organizing maps. J Chem Inf Model 47(6):2044–2062
  9. Kaiser D, Terfloth L, Kopp S, Schulz J, de Laet R, Chiba P, Ecker GF, Gasteiger J (2007) Self-organizing maps for identification of new inhibitors of P-glycoprotein. J Med Chem 50(7):1698–1702
  10. Schneider G, Nettekoven M (2003) Ligand-based combinatorial design of selective purinergic receptor (A2A) antagonists using self-organizing maps. J Comb Chem 5(3):233–237
  11. Sidorov P, Gaspar H, Marcou G, Varnek A, Horvath D (2015) Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 29(12):1087–1108
  12. Rosenbaum L, Dörr A, Bauer MR, Boeckler FM, Zell A (2013) Inferring multi-target QSAR models with taxonomy-based multi-task learning. J Cheminform 5(1):33
  13. Varnek A, Gaudin C, Marcou G, Baskin I, Pandey AK, Tetko IV (2009) Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue–air partition coefficients. J Chem Inf Model 49(1):133–144
  14. Xu Y, Ma J, Liaw A, Sheridan RP, Svetnik V (2017) Demystifying multitask deep neural networks for quantitative structure–activity relationships. J Chem Inf Model 57(10):2490–2504
  15. Brown JB, Okuno Y, Marcou G, Varnek A, Horvath D (2014) Computational chemogenomics: is it more than inductive transfer? J Comput Aided Mol Des 28(6):597–618
  16. Heikamp K, Bajorath J (2013) Prediction of compounds with closely related activity profiles using weighted support vector machine linear combinations. J Chem Inf Model 53(4):791–801
  17. Medina-Franco JL, Giulianotti MA, Welmaker GS, Houghten RA (2013) Shifting from the single to the multitarget paradigm in drug discovery. Drug Discov Today 18(9–10):495–501
  18. Bieler M, Heilker R, Koeppen H, Schneider G (2011) Assay related target similarity (ARTS): chemogenomics approach for quantitative comparison of biological targets. J Chem Inf Model 51(8):1897–1905
  19. Jacob L, Hoffmann B, Stoven V, Vert J-P (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinform 9(1):363
  20. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B (2011) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(D1):D1100–D1107
  21. Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49(23):6789–6801
  22. Ruggiu F, Marcou G, Varnek A, Horvath D (2010) ISIDA property-labelled fragment descriptors. Mol Inform 29(12):855–868
  23. Ruggiu F, Marcou G, Solov'ev V, Horvath D, Varnek A (2017) ISIDA Fragmentor 2017: user manual
  24. Horvath D, Brown J, Marcou G, Varnek A (2014) An evolutionary optimizer of libsvm models. Challenges 5(2):450–472
  25. Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A (2015) Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. J Chem Inf Model 55(1):84–94
  26. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
  27. Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW (1990) The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Trans Neural Netw 1(4):296–298
  28. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533
  29. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  30. Dahl GE, Sainath TN, Hinton GE (2013) Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, Vancouver, pp 8609–8613
  31. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
  32. Horvath D, Koch C, Schneider G, Marcou G, Varnek A (2011) Local neighborhood behavior in a combinatorial library context. J Comput Aided Mol Des 25(3):237–252
  33. Papadatos G, Cooper AWJ, Kadirkamanathan V, Macdonald SJF, McLay IM, Pickett SD, Pritchard JM, Willett P, Gillet VJ (2009) Analysis of neighborhood behavior in lead optimization and array design. J Chem Inf Model 49(2):195–208

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg, France
  2. Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
