Abstract
Antimicrobial peptides (AMPs) are ubiquitous, exhibit a broad range of antimicrobial activities, and hold great promise for combating multidrug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF classifier-based model performed best in both the internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and the external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted known AMPs and non-AMPs that were kept aside as an additional external validation set. The performance assessment revealed three features, namely ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may be further useful for the design and prediction of novel potential AMPs.
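The four validation metrics quoted in the abstract follow from the counts of a binary confusion matrix. The sketch below illustrates their standard definitions and is not the authors' evaluation code. The function name `binary_metrics`, the label encoding (1 = AMP, 0 = non-AMP), and the toy label vectors are all assumptions made for this example. The toy data are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, sensitivity and specificity.

    Assumed encoding (hypothetical, for illustration): 1 = AMP, 0 = non-AMP.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AMPs predicted as AMPs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-AMPs predicted as non-AMPs
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-AMPs predicted as AMPs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AMPs predicted as non-AMPs

    def ratio(numerator: int, denominator: int) -> float:
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy is useful when the classes are unequal in size, as with the 2638 AMPs and 3700 non-AMPs here, because accuracy alone can hide weak performance on the smaller class.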
Acknowledgements
The authors are thankful to NIPER Kolkata and NIPER Mohali for providing the resources and support. The author Mushtaq A. Wani is thankful to the Department of Pharmaceuticals and the Ministry of Chemicals and Fertilizers for providing a Ph.D. fellowship.
Funding
None.
Cite this article
Wani, M.A., Garg, P. & Roy, K.K. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 59, 2397–2408 (2021). https://doi.org/10.1007/s11517-021-02443-6
DOI: https://doi.org/10.1007/s11517-021-02443-6