Conformal prediction of biological activity of chemical compounds
This paper applies Conformal Predictors to a chemoinformatics problem: predicting the biological activity of chemical compounds. It addresses several challenges specific to this domain: a large number of compounds (training examples), a high-dimensional feature space, sparseness, and strong class imbalance. A variant of conformal predictors called the Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures extracted from underlying algorithms and for different kernels. A number of performance measures are used to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex data set. This approach allowed us to identify the compounds most likely to be active for a given biological target and to present them in ranked order.
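The class-conditional (Mondrian) calibration described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that some underlying classifier has already produced a real-valued decision value for each compound (positive meaning "more likely active"), and it uses the hypothetical non-conformity score `-label * decision_value`, which is only one of many possible measures. The toy numbers and all function names are invented for illustration.

```python
from collections import defaultdict

def nonconformity(decision_value, label):
    # Assumed measure: a compound is "strange" for a label in {+1, -1}
    # when the decision value points away from that label.
    return -label * decision_value

def fit_calibration(decision_values, labels):
    # Mondrian taxonomy: keep calibration scores separately for each class,
    # so that validity holds within each class despite class imbalance.
    by_class = defaultdict(list)
    for d, y in zip(decision_values, labels):
        by_class[y].append(nonconformity(d, y))
    return dict(by_class)

def p_values(calibration, decision_value):
    # For each hypothesised label, compare the test score only against
    # calibration scores of the same class; the "+1" counts the test object.
    result = {}
    for y, scores in calibration.items():
        alpha = nonconformity(decision_value, y)
        n_at_least = sum(1 for s in scores if s >= alpha)
        result[y] = (n_at_least + 1) / (len(scores) + 1)
    return result

def prediction_set(pvals, significance):
    # Keep every label whose p-value exceeds the chosen significance level.
    return {y for y, p in pvals.items() if p > significance}

def rank_by_active_p_value(calibration, decision_values):
    # Hypothetical ranking rule: order test compounds by the p-value
    # of the "active" (+1) hypothesis, highest first.
    scored = [(p_values(calibration, d)[1], i)
              for i, d in enumerate(decision_values)]
    return [i for _, i in sorted(scored, key=lambda t: -t[0])]

# Toy calibration set: 4 actives (+1) and 6 inactives (-1).
cal_decisions = [1.2, 0.8, 0.3, -0.1, -1.5, -1.0, -0.7, -0.2, 0.1, -0.9]
cal_labels = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
calibration = fit_calibration(cal_decisions, cal_labels)

test_decisions = [0.5, -1.2, 0.9]
for d in test_decisions:
    pv = p_values(calibration, d)
    print(d, pv, prediction_set(pv, significance=0.2))
print(rank_by_active_p_value(calibration, test_decisions))
```

Because each p-value is computed only from calibration examples of the same class, the minority (active) class is calibrated on its own 4 examples rather than being swamped by the 6 inactives. This is the property that makes the Mondrian variant attractive for imbalanced data.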
Keywords: Conformal prediction, Confidence estimation, Chemoinformatics, Non-conformity measure
Mathematics Subject Classification (2010): 68T05
This project (ExCAPE) has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement no. 671555. We are grateful to the Ministry of Education, Youth and Sports (Czech Republic), which supports the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070”, for help in conducting the experiments. This work was also supported by EPSRC grant EP/K033344/1 (“Mining the Network Behaviour of Bots”) and by the Technology Integrated Health Management (TIHM) project awarded to the School of Mathematics and Information Security at Royal Holloway as part of an initiative by NHS England supported by InnovateUK. We are indebted to Lars Carlsson of AstraZeneca for providing the data and for useful discussions. We also thank Zhiyuan Luo and Vladimir Vovk for many valuable comments and discussions.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.