Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Antimicrobial peptides (AMPs) are ubiquitous, show a broad range of antimicrobial activities, and hold great promise for combating multidrug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF classifier-based model performed best in both the internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and the external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted the known AMPs and non-AMPs that were kept aside as an additional external validation set. The performance assessment revealed three features, viz. ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may also be useful in the design and prediction of novel potential AMPs.
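The four validation metrics reported in the abstract all follow from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code. It assumes AMPs are labelled 1 and non-AMPs 0; the helper name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, and specificity for 1 = AMP, 0 = non-AMP."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AMPs predicted as AMPs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-AMPs predicted as non-AMPs
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-AMPs predicted as AMPs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AMPs predicted as non-AMPs

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity separately matters here because the data set is moderately imbalanced (2638 AMPs versus 3700 non-AMPs), so accuracy alone could hide weak performance on one class.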

Acknowledgements

The authors are thankful to NIPER Kolkata and NIPER Mohali for providing the resources and support. The author Mushtaq A. Wani is thankful to the Department of Pharmaceuticals and the Ministry of Chemicals and Fertilizers for providing a Ph.D. fellowship.

Funding

None.

Author information

Corresponding author

Correspondence to Kuldeep K. Roy.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 32 kb)

About this article

Cite this article

Wani, M.A., Garg, P. & Roy, K.K. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 59, 2397–2408 (2021). https://doi.org/10.1007/s11517-021-02443-6

  • DOI: https://doi.org/10.1007/s11517-021-02443-6
