Multi-task generative topographic mapping in virtual screening
The previously reported procedure for generating “universal” Generative Topographic Maps (GTMs) of drug-like chemical space is, in practice, a multi-task learning process. Both operational GTM parameters (e.g., map grid size) and hyperparameters (most importantly, the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit and select “universal” GTM manifolds. Selection is a one-time task that optimizes the compromise in neighborhood behavior compliance over a large pool of diverse biological targets. Once selected, the manifolds can provide “fit-free” predictive models for any further use. Given any structure–activity set, irrespective of whether the associated target was used at the map-fitting stage, generating (“coloring”) a property landscape makes it possible to predict the property for any external molecule, with zero additional fittable parameters. Previous work has demonstrated the excellent behavior of such models in stringent three-fold cross-validation assessments of their predictive power. The present work explores their behavior in Virtual Screening (VS), simulated here using external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the fairly robust results of the universal GTM manifolds in this challenge, it was shown that the descriptor spaces selected by the evolutionary multi-task learner are intrinsically an excellent basis for many other VS procedures, ranging from parameter-free similarity searching, through local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
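The idea that a colored landscape predicts new molecules “with zero additional fittable parameters” can be illustrated with a small sketch. It assumes that GTM responsibilities (the posterior probability of each molecule over the map nodes, each row summing to 1) have already been computed by projecting molecules onto a fitted manifold; the manifold fitting itself is not shown. The function names `color_landscape`, `predict` and `roc_auc` and the toy data are hypothetical. The class landscape below is a plain responsibility-weighted class frequency per node, without the class-balancing corrections that some GTM implementations apply. The ROC AUC at the end is one common way to score a VS ranking of ligands against decoys, shown only as an example metric.

```python
import numpy as np

def color_landscape(resp, labels, n_classes=2, eps=1e-9):
    """Per-node class probabilities from a "coloring" set (no fitting involved).

    resp:   (n_mols, n_nodes) responsibilities on an already fitted manifold (assumed given).
    labels: (n_mols,) integer class labels, e.g. 1 = active, 0 = inactive.
    """
    onehot = np.eye(n_classes)[labels]            # (n_mols, n_classes)
    cum = resp.T @ onehot                          # cumulated responsibility per node and class
    totals = cum.sum(axis=1, keepdims=True)        # (n_nodes, 1)
    # Nodes that no coloring molecule reaches get a uniform class distribution.
    return np.where(totals > eps, cum / np.maximum(totals, eps), 1.0 / n_classes)

def predict(resp_new, landscape):
    """Class probabilities for new molecules: responsibility-weighted node probabilities."""
    return resp_new @ landscape                    # (n_new, n_classes)

def roc_auc(scores, is_active):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly; ties count 0.5."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(is_active, dtype=bool)
    diff = s[y][:, None] - s[~y][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy example: 4 map nodes, 4 coloring molecules (2 actives, 2 inactives).
resp_train = np.array([[0.9, 0.1, 0.0, 0.0],
                       [0.8, 0.2, 0.0, 0.0],
                       [0.0, 0.1, 0.9, 0.0],
                       [0.0, 0.2, 0.8, 0.0]])
labels = np.array([1, 1, 0, 0])
landscape = color_landscape(resp_train, labels)

# Three external molecules (e.g. a ligand, a decoy, and a second ligand).
resp_new = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.5, 0.5, 0.0, 0.0]])
p_active = predict(resp_new, landscape)[:, 1]
auc = roc_auc(p_active, [True, False, True])
```

Because `color_landscape` only accumulates responsibilities, swapping in a different structure–activity set simply re-colors the same manifold; nothing about the map itself is re-optimized, which is what makes the resulting models “fit-free”.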
Keywords: Generative topographic mapping; Multi-task learning; Ligand-based virtual screening; Big data; Universal maps; ChEMBL; DUD; Neural networks
GTM: Generative topographic mapping
Universal generative topographic mapping
DUD: Directory of Useful Decoys
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
The project leading to this article has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant agreement No 676434, “Big Data in Chemistry” (“BIGCHEM”, http://bigchem.eu).