Abstract
The ongoing COVID-19 pandemic has caused a dramatic loss of human life, and there is an urgent need for safe and effective drugs against coronavirus infection. Anti-coronavirus peptides (ACovPs) can inhibit coronavirus infection. With high efficiency, low toxicity, and broad-spectrum inhibitory effects on coronaviruses, they are promising candidates for development into a new type of anti-coronavirus drug. ACovPs have traditionally been identified by experiment, which is slow and expensive. As experimental data on ACovPs accumulate, computational prediction offers a cheaper and faster way to find candidate anti-coronavirus peptides. In this study, we combined several state-of-the-art machine learning methods into an ensemble of nine classification models for the prediction of ACovPs. The models build on deep neural network pre-training, and the performance of the resulting ensemble model, ACP-Dnnel, was evaluated on three benchmark datasets and one independent dataset. We followed Chou's 5-step rules: (1) we constructed the benchmark datasets data1, data2, and data3 for training and testing, and introduced the independent validation dataset ACVP-M; (2) we analyzed the sequence composition features of the peptides in the benchmark datasets; (3) we constructed the ACP-Dnnel model, in which a deep convolutional neural network (DCNN) merged with a bi-directional long short-term memory (BiLSTM) network serves as the base model and is pre-trained to extract the features embedded in the benchmark datasets, after which nine classification algorithms are combined into an ensemble that makes the final prediction by voting; (4) we applied tenfold cross-validation during training and evaluated the final model performance; (5) we built a user-friendly web server, publicly accessible at http://150.158.148.228:5000/. The highest accuracy (ACC) of ACP-Dnnel reaches 97%, and its Matthews correlation coefficient (MCC) exceeds 0.9. Across the three datasets, its average accuracy is 96.0%. On the latest independent validation dataset, ACP-Dnnel improved MCC, SP, and ACC by 6.2%, 7.5%, and 6.3%, respectively. These results suggest that ACP-Dnnel can support the laboratory identification of ACovPs and speed up the discovery and development of anti-coronavirus peptide drugs.
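The voting step and the reported metrics can be illustrated with a minimal sketch. It is not the ACP-Dnnel implementation: it assumes a simple hard majority vote over binary labels from nine classifiers (the abstract says only that the classifiers vote together), and it reads SP as specificity. The function names and the toy data are hypothetical; ACC and MCC follow their standard confusion-matrix definitions.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if more than half of the binary votes are 1, else 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def ensemble_predict(per_model: Sequence[Sequence[int]]) -> List[int]:
    """Hard-vote across classifiers; per_model holds one label list per classifier."""
    # zip(*per_model) regroups the labels so each tuple holds all votes for one peptide.
    return [majority_vote(votes) for votes in zip(*per_model)]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute ACC, SP (assumed to mean specificity), and MCC."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Toy labels: 1 = ACovP, 0 = non-ACovP (hypothetical data, not from the paper).
    y_true = [1, 1, 1, 0, 0, 0]
    # Nine hypothetical classifiers: five of one kind, four of another.
    per_model = [[1, 1, 0, 0, 0, 1]] * 5 + [[1, 1, 1, 0, 0, 0]] * 4
    y_pred = ensemble_predict(per_model)
    print(y_pred, evaluate(y_true, y_pred))
```

With nine voters a binary vote can never tie, which is one practical reason to use an odd number of classifiers. A weighted or probability-averaged (soft) vote would be an equally plausible reading of the abstract.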
Data availability statement
All datasets used in this paper can be freely downloaded from http://150.158.148.228:5000/Download.
Acknowledgements
This project was supported by the Provincial Health Commission Science and Technology Foundation of Guizhou (No. gzwkj2023-590) and the Guizhou Medical University National Natural Science Foundation Cultivation Project (No. 21NSFCP40). We also thank all the authors involved in the project.
Funding
This work was supported by the Provincial Health Commission Science and Technology Foundation of Guizhou (No. gzwkj2023-590), the Guizhou Medical University National Natural Science Foundation Cultivation Project (No. 21NSFCP40), and the National Natural Science Foundation of China (No. 62071099 and No. 32160668).
Author information
Contributions
M-YL conceived and designed the study and wrote the paper. M-YL, TW, and Y-XZ configured the experimental environment. M-YL and TW developed the web server. H-ML interpreted the biological significance of anti-coronavirus peptides and provided guidance. Y-WZ and Z-RH participated in project discussions and provided constructive suggestions. C-CX and JH provided ideas for algorithm optimization; JH participated in all research work and provided guidance. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Institutional review board statement
Not applicable.
Informed consent statement
Not applicable.
Additional information
Handling editor: F. Albericio.
About this article
Cite this article
Liu, M., Liu, H., Wu, T. et al. ACP-Dnnel: anti-coronavirus peptides’ prediction based on deep neural network ensemble learning. Amino Acids 55, 1121–1136 (2023). https://doi.org/10.1007/s00726-023-03300-6
DOI: https://doi.org/10.1007/s00726-023-03300-6