Artificial Intelligence in Biological Activity Prediction

  • João Correia
  • Tiago Resende
  • Delora Baptista
  • Miguel Rocha
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1005)


Artificial intelligence has become an indispensable resource in chemoinformatics. Numerous machine learning algorithms for activity prediction have recently emerged and are now a key approach to mining chemical information from large compound datasets. These approaches make it possible to automate compound discovery and to find biologically active molecules with important properties. Here, we review some of the main machine learning studies on the prediction of the biological activity of compounds, with a particular focus on sweetness prediction. We also discuss some of the most widely used compound featurization techniques and the major databases of chemical compounds relevant to these tasks.
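The pipeline described in the abstract, where each compound is turned into a numeric feature vector that a learning algorithm then uses to predict activity, can be illustrated with a small dependency-free sketch. It rests on stated assumptions and is not a method taken from this paper: `smiles_fingerprint` is a hypothetical toy featurizer that hashes SMILES substrings into a fixed-size set of "on" bits (a crude stand-in for real fingerprints), and `predict_activity` is a hypothetical k-nearest-neighbour classifier that ranks training compounds by Tanimoto similarity. Labels (1 = active, 0 = inactive) would normally come from a bioactivity database; none are supplied here.

```python
import hashlib


def smiles_fingerprint(smiles: str, n_bits: int = 256, max_len: int = 3) -> set[int]:
    """Toy hashed fingerprint (hypothetical): map every SMILES substring of
    length 1..max_len to one of n_bits positions and return the set of 'on' bits."""
    bits: set[int] = set()
    for length in range(1, max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            # MD5 gives a hash that is stable across runs, unlike the built-in
            # hash(), which is salted per process for strings.
            digest = hashlib.md5(fragment.encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def predict_activity(query: str, training: list[tuple[str, int]], k: int = 3) -> int:
    """Hypothetical k-nearest-neighbour classifier: majority vote of the k
    training compounds most similar to the query (1 = active, 0 = inactive)."""
    query_fp = smiles_fingerprint(query)
    scored = sorted(
        ((tanimoto(query_fp, smiles_fingerprint(smiles)), label) for smiles, label in training),
        reverse=True,  # most similar first; ties are broken in favour of label 1
    )
    neighbours = scored[:k]
    active_votes = sum(label for _, label in neighbours)
    # Strict majority of the neighbours actually used (k may exceed the data size).
    return 1 if 2 * active_votes > len(neighbours) else 0
```

For example, `predict_activity("CCO", [("CCO", 1), ("c1ccccc1", 0)], k=1)` returns 1, because the query's nearest neighbour is the identical training entry (the labels in this call are made up for illustration). Established featurizers such as extended-connectivity (Morgan) fingerprints, available in toolkits like RDKit, hash circular atom environments rather than raw text substrings; the sketch only mimics the general idea of a hashed bit vector combined with a similarity-based learner.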


Keywords: Machine learning; Deep learning; Biological activity prediction; Sweetness prediction; Compound featurization



This study was supported by the European Commission through the project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (Reference 814408), and by the Portuguese Foundation for Science and Technology (FCT) under the strategic funding of the UID/BIO/04469/2019 unit and the BioTecNorte operation (NORTE-01-0145-FEDER-000004), funded by the European Regional Development Fund under the scope of Norte2020.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • João Correia (1), corresponding author
  • Tiago Resende (1)
  • Delora Baptista (1)
  • Miguel Rocha (1)

  1. CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal