Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling

  • Giuseppina Gini
  • Francesco Zanoli
Part of the Methods in Pharmacology and Toxicology book series (MIPT)


About 28 million chemical structures are registered today, while experimental toxicity data are available for only a few hundred thousand of them. Defining properties and effects for all the available chemicals is a huge task because of the cost of experimentation and legislative restrictions. Prediction is therefore the only available solution, but it poses many challenges in terms of accuracy and interpretability. Predictive toxicology systems use statistics as well as methods based on machine learning (ML). While ML has been widely used in the pharmaceutical domain, its use in ecotoxicology is more limited. After reviewing the experience with quantitative structure-activity relationships (QSARs) for modeling CMR (carcinogenic, mutagenic, reproductive) toxicity and PBT (persistent, bioaccumulative, and toxic) chemicals, we look at the technological advancements in ML. Recently, the investigation of the neural basis of many cognitive functions has provided the tools to create new systems that can think, solve problems, find patterns, and recognize images and texts; these new methods are named deep learning (DL). We modified the most successful DL architecture, implemented Toxception as a tool to generate QSAR models, and tested it in a real case, on a dataset of about 20,000 molecules tested for mutagenicity with the Ames test. The results obtained challenge the current state of the art. In addition, Toxception does not use any chemistry knowledge besides the 2D structures derived from SMILES. We conclude by examining the advantages, open challenges, and drawbacks of building QSARs with DL.

Key words

Machine learning, Neural networks, Deep learning, Mutagenicity, Ames test



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. DEIB, Politecnico di Milano, Milan, Italy
