Learned Feature Generation for Molecules

  • Patrick Winter
  • Christian Borgelt
  • Michael R. Berthold
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11191)


When classifying molecules for virtual screening, the molecular structure must first be converted into meaningful features before a classifier can be trained. The most common methods perform this feature generation with a static algorithm designed from domain knowledge. We propose an approach in which the conversion is learned by a convolutional neural network that finds features useful for the task at hand from the available data. Preliminary results indicate that our current approach already produces features that perform about as well as those of common methods. Since the approach does not yet use any chemical properties, future versions may improve on these results.
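To make the idea of convolutional feature generation concrete, the following is a minimal sketch and not the architecture used in the paper. It rests on several assumptions that the abstract does not confirm: molecules are given as SMILES strings, each character is one-hot encoded over a small hypothetical vocabulary (`VOCAB`), and the filters are randomly initialised stand-ins for weights that training would otherwise learn. Each filter slides over the sequence, and its maximum rectified response becomes one entry of the feature vector.

```python
import random

# Hypothetical character vocabulary; the source does not specify an input encoding.
VOCAB = "CNOcno()=#123[]H+-@"


def one_hot(smiles):
    """Encode each character as a one-hot vector; unknown characters become all zeros."""
    index = {ch: i for i, ch in enumerate(VOCAB)}
    rows = []
    for ch in smiles:
        row = [0.0] * len(VOCAB)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def make_filters(n_filters, width, seed=0):
    """Randomly initialised filters standing in for weights that training would learn."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(len(VOCAB))]
             for _ in range(width)]
            for _ in range(n_filters)]


def conv_features(smiles, filters):
    """Slide each filter over the sequence, apply ReLU, and keep the maximum response."""
    x = one_hot(smiles)
    width = len(filters[0])
    # Pad short inputs with zero rows so every filter fits at least once.
    while len(x) < width:
        x.append([0.0] * len(VOCAB))
    features = []
    for f in filters:
        best = 0.0  # ReLU floor: responses below zero are clipped to zero
        for start in range(len(x) - width + 1):
            response = sum(f[k][j] * x[start + k][j]
                           for k in range(width)
                           for j in range(len(VOCAB)))
            best = max(best, response)
        features.append(best)
    return features


filters = make_filters(n_filters=8, width=3)
aspirin = conv_features("CC(=O)Oc1ccccc1C(=O)O", filters)
ethanol = conv_features("CCO", filters)
```

Each molecule is mapped to a fixed-length vector (here eight values) regardless of the length of its SMILES string, which is what allows a downstream classifier to consume it. In a learned setting the filter weights would be fitted jointly with that classifier instead of being drawn at random.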


Keywords: Convolutional neural networks · Feature generation · Molecular features · Virtual screening



This work was partially funded by the Konstanz Research School Chemical Biology and KNIME AG.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Patrick Winter (1, 2)
  • Christian Borgelt (1, 3)
  • Michael R. Berthold (1, 2, 4)

  1. Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
  2. Konstanz Research School Chemical Biology (KoRS-CB), Konstanz, Germany
  3. Department of Computer Science, Otto-von-Guericke University, Magdeburg, Germany
  4. KNIME AG, Zurich, Switzerland
