MutagenPred-GCNNs: A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Identifying mutagenic compounds is an important task in the early stages of drug discovery. Mutagenicity prediction models that can interpret the relationships between toxicological endpoints and compound structures are especially valuable. In this research, we used an advanced graph convolutional neural network (GCNN) architecture to learn molecular representations and developed predictive models based on these representations. The predictive model based on features extracted by GCNNs can not only predict the mutagenicity of compounds but also identify structural alerts in them. In fivefold cross-validation and external validation, the highest areas under the curve were 0.8782 and 0.8382, respectively; the highest accuracies (Q) were 80.98% and 76.63%; the highest sensitivities were 83.27% and 78.92%; and the highest specificities were 78.83% and 76.32%. Our model also identified several toxicophores, such as aromatic nitro groups, three-membered heterocycles, quinones, and nitrogen and sulfur mustards. These results indicate that GCNNs can learn the features of mutagens effectively. In summary, we developed a mutagenicity classification model with high predictive performance and interpretability, based on a data-driven molecular representation learned by GCNNs.
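
For readers unfamiliar with the metrics reported in the abstract, the sketch below shows how accuracy (Q), sensitivity, and specificity are conventionally computed from binary predictions. It is an illustration only, not the authors' evaluation code: the function name `classification_metrics` and the toy labels are hypothetical, and it assumes mutagens are encoded as 1 and non-mutagens as 0, with both classes present.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy (Q), sensitivity, and specificity for binary labels.

    Assumes 1 = mutagen, 0 = non-mutagen, and that both classes occur in
    y_true (otherwise a denominator below would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # mutagens predicted as mutagens
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-mutagens predicted as non-mutagens
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-mutagens predicted as mutagens
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # mutagens predicted as non-mutagens
    return {
        "Q": (tp + tn) / len(pairs),      # fraction of all compounds classified correctly
        "sensitivity": tp / (tp + fn),    # fraction of mutagens correctly identified
        "specificity": tn / (tn + fp),    # fraction of non-mutagens correctly identified
    }


# Toy labels invented for illustration; they are not data from the article.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a fivefold cross-validation setting, a function like this would typically be applied to the held-out fold of each split; how the reported "highest" values were selected is not described in the abstract.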

Author information

Corresponding author

Correspondence to Hongsheng Liu.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Information (332 KB)

About this article

Cite this article

Li, S., Zhang, L., Feng, H. et al. MutagenPred-GCNNs: A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints. Interdiscip Sci Comput Life Sci 13, 25–33 (2021). https://doi.org/10.1007/s12539-020-00407-2

  • DOI: https://doi.org/10.1007/s12539-020-00407-2
