Abstract
An important task in the early stages of drug discovery is the identification of mutagenic compounds. Mutagenicity prediction models that can interpret the relationships between toxicological endpoints and compound structures are especially valuable. In this research, we used an advanced graph convolutional neural network (GCNN) architecture to learn molecular representations and developed predictive models based on these representations. The predictive model based on features extracted by GCNNs can not only predict the mutagenicity of compounds but also identify the structural alerts in compounds. In fivefold cross-validation and external validation, the highest areas under the curve (AUC) were 0.8782 and 0.8382, respectively; the highest accuracies (Q) were 80.98% and 76.63%, respectively; the highest sensitivities were 83.27% and 78.92%, respectively; and the highest specificities were 78.83% and 76.32%, respectively. Our model also identified several toxicophores, such as aromatic nitro groups, three-membered heterocycles, quinones, and nitrogen and sulfur mustards. These results indicate that GCNNs can learn the features of mutagens effectively. In summary, we developed a mutagenicity classification model with high predictive performance and interpretability, based on a data-driven molecular representation trained through GCNNs.
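The four metrics reported in the abstract (AUC, accuracy Q, sensitivity, and specificity) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions introduced only for illustration, with 1 denoting a mutagen and 0 a non-mutagen.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (1 = mutagen, 0 = non-mutagen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. Assumes at least one positive and one negative label.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy (Q), sensitivity, specificity, and AUC from predicted scores.

    The threshold of 0.5 is an illustrative assumption, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of mutagens correctly flagged
        "specificity": tn / (tn + fp),  # fraction of non-mutagens correctly cleared
        "auc": roc_auc(y_true, y_score),
    }


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_metrics = classification_metrics(toy_labels, toy_scores)
```

In a fivefold cross-validation setting, a function like `classification_metrics` would be applied to the held-out fold of each split, and the per-fold values would then be summarized.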
Cite this article
Li, S., Zhang, L., Feng, H. et al. MutagenPred-GCNNs: A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints. Interdiscip Sci Comput Life Sci 13, 25–33 (2021). https://doi.org/10.1007/s12539-020-00407-2
DOI: https://doi.org/10.1007/s12539-020-00407-2