Abstract
We are exposed to various chemical compounds present in the environment, cosmetics, and drugs almost every day. Mutagenicity is an important property that plays a significant role in establishing a chemical compound's safety. Exposure to and handling of mutagenic chemicals in the environment pose a high health risk; therefore, the identification and screening of these chemicals are essential. Given time constraints and the pressure to avoid the use of laboratory animals, a shift to alternative methods that enable rapid and cost-effective detection without undue over-conservatism seems critical. In this regard, computational detection and identification of mutagens in drugs, pesticides, dyes, reagents, wastewater, cosmetics, and other environmental samples is vital. Over the last two decades, there have been numerous efforts to develop prediction models for mutagenicity, and machine learning methods have so far demonstrated noteworthy performance and reliability. However, the accuracy of such prediction models has always been one of the major concerns for researchers working in this area. In this study, mutagenicity prediction models were developed using a deep neural network (DNN), support vector machine, k-nearest neighbor, and random forest. The classifiers were trained on 3039 compounds and validated on 1014 compounds, each encoded as a vector of 1597 molecular features. The DNN-based model yielded the highest prediction accuracies, 92.95% and 83.81% on the training and test data, respectively. The areas under the receiver operating characteristic (ROC) curve and the precision-recall curve were 0.894 and 0.838, respectively. The DNN-based classifier not only fit the data better than the traditional machine learning algorithms, viz., support vector machine, k-nearest neighbor, and random forest (with and without feature reduction), but also yielded better performance metrics.
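The abstract reports accuracy together with the areas under the ROC and precision-recall curves. The following minimal sketch, in plain Python with no external libraries, shows one way these three metrics can be computed from true labels and predicted scores. The function names, the 1 = mutagen / 0 = non-mutagen label convention, the 0.5 threshold and the toy data are hypothetical; in particular, the step-wise average precision used here is one common estimator of the precision-recall area, and the excerpt does not state which estimator the authors used.

```python
# Minimal sketch: the three evaluation metrics named in the abstract,
# written in plain Python. Labels use the hypothetical convention
# 1 = mutagen, 0 = non-mutagen; scores are predicted P(mutagen).


def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    mutagen is scored above a random non-mutagen (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each true mutagen. Tied scores keep
    their input order."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data, not from the study.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold
    print(round(accuracy(y_true, y_pred), 3))           # 0.667
    print(round(roc_auc(y_true, scores), 3))            # 0.778
    print(round(average_precision(y_true, scores), 3))  # 0.806
```

Accuracy depends on the chosen decision threshold, whereas both curve areas are computed from the ranking of scores alone, which is why threshold-free metrics are usually reported alongside accuracy.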
In the current work, we propose a DNN-based model to predict the mutagenicity of compounds.
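The excerpt does not describe the network architecture. As a rough illustration only, the following pure-Python sketch shows the forward pass of a generic fully connected (feedforward) binary classifier that maps a 1597-dimensional descriptor vector to a mutagenicity probability. The ReLU hidden activations, sigmoid output, 0.5 decision threshold, hidden-layer sizes, random initialization and all function names are assumptions for illustration, not the authors' reported design; training is omitted.

```python
# Minimal sketch of a feedforward (fully connected) binary classifier's
# forward pass. Architecture, activations, initialization and threshold
# are assumptions for illustration; training is omitted.
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def build_network(layer_sizes, seed=0):
    """Hypothetical initialization: uniform(-0.1, 0.1) weights, zero biases."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network


def predict_proba(network, features):
    """ReLU hidden layers, single sigmoid output read as P(mutagen)."""
    activations = features
    last = len(network) - 1
    for i, (weights, biases) in enumerate(network):
        activations = dense(activations, weights, biases, sigmoid if i == last else relu)
    return activations[0]


if __name__ == "__main__":
    # 1597 inputs match the descriptor count in the abstract; the hidden
    # layer sizes (64, 16) are hypothetical.
    net = build_network([1597, 64, 16, 1])
    rng = random.Random(1)
    compound = [rng.random() for _ in range(1597)]  # stand-in descriptor vector
    p = predict_proba(net, compound)
    print(f"P(mutagen) = {p:.3f}, predicted class = {int(p >= 0.5)}")
```

Fitting the weights would additionally require a loss such as binary cross-entropy and gradient-based optimization, which in practice is usually delegated to a deep learning framework rather than written by hand.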
Availability of data and material
Not applicable.
Author information
Contributions
RK, FUK, and AS designed the study and prepared the draft of the manuscript. MHS and IBAA performed the literature review and aided in revising the manuscript. MAK, GMA, BSA, and MSU edited the whole manuscript and improved the draft. All authors read and approved the final submitted version of the manuscript.
Ethics declarations
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
The authors declare no conflict of interest.
Additional information
Responsible Editor: Ludek Blaha
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
ESM 1 (DOCX, 1122 kB)
About this article
Cite this article
Kumar, R., Khan, F.U., Sharma, A. et al. A deep neural network–based approach for prediction of mutagenicity of compounds. Environ Sci Pollut Res 28, 47641–47650 (2021). https://doi.org/10.1007/s11356-021-14028-9
DOI: https://doi.org/10.1007/s11356-021-14028-9