Abstract
Antiviral peptides (AVPs) open new possibilities as effective antiviral therapeutics in the current scenario of evolving drug-resistant viruses. The sequence and structure-activity relationships of AVPs remain largely unknown. AVPs and antimicrobial peptides (AMPs) share several common features, but because they target different life forms (living organisms and viruses), exploring their differential sequence features may facilitate the design of specific AVPs. The current work developed accurate prediction models for discriminating (a) AVPs from AMPs, (b) Coronaviridae AVPs from AVPs specific to other virus families and (c) highly active AVPs (HAA) from lowly active AVPs (LAA). Further, explainable machine learning methods (model-agnostic global interpretation methods) were used to explore and interpret the physicochemical spaces of AVPs, Coronaviridae AVPs and highly active AVPs. To further understand the association of the physicochemical space distribution with pIC50 values, regression models were developed and analyzed using accumulated local effects and interaction strength analysis. An independent-sample t-test was used to identify the significant compositional differences between the shorter HAA and longer HAA groups. In comparison with AMPs, AVPs prefer a lower charge/length ratio and basic residues. Coronaviridae family-specific AVPs have lower propensities for basic amino acids and charge, and a preference for aspartic acid. Further, basic residues are more prevalent in lowly active AVPs than in highly active AVPs. Sequence order effects, captured as average amino acid pair distances, proved more informative for deciphering the sequences of AVPs.
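The sequence descriptors named in the abstract (charge/length ratio, basic-residue content and average amino acid pair distance) can be illustrated with a minimal Python sketch. This is not the author's feature-extraction code. The function names, the integer charge model (+1 for K/R, -1 for D/E, histidine treated as neutral) and the pair-distance definition (mean separation over all ordered occurrences of two residues) are assumptions made here for illustration, and may differ from the definitions used in the article.

```python
POSITIVE = set("KR")  # assumption: +1 each at neutral pH
NEGATIVE = set("DE")  # assumption: -1 each at neutral pH
BASIC = set("KRH")    # assumption: histidine counted as a basic residue


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject empty input."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("peptide sequence must not be empty")
    return seq


def charge_length_ratio(seq: str) -> float:
    """Approximate net charge divided by peptide length."""
    seq = _clean(seq)
    charge = sum(r in POSITIVE for r in seq) - sum(r in NEGATIVE for r in seq)
    return charge / len(seq)


def basic_fraction(seq: str) -> float:
    """Fraction of residues that are basic (K, R or H)."""
    seq = _clean(seq)
    return sum(r in BASIC for r in seq) / len(seq)


def average_pair_distance(seq: str, first: str, second: str):
    """Mean separation j - i over all positions i < j where
    seq[i] == first and seq[j] == second (hypothetical definition).
    Returns None when the ordered pair never occurs."""
    seq = _clean(seq)
    gaps = [
        j - i
        for i, a in enumerate(seq) if a == first
        for j in range(i + 1, len(seq)) if seq[j] == second
    ]
    return sum(gaps) / len(gaps) if gaps else None
```

Such per-peptide values could then be compared between groups (for example AVPs versus AMPs) or passed to a classifier; the made-up sequence `"KRHDA"` gives a charge/length ratio of 0.2 and a basic fraction of 0.6 under the assumptions above.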
Ethics declarations
Conflict of interest
The author has no conflict of interest.
About this article
Cite this article
Nath, A. Physicochemical and sequence determinants of antiviral peptides. BIOLOGIA FUTURA 74, 489–506 (2023). https://doi.org/10.1007/s42977-023-00188-x
DOI: https://doi.org/10.1007/s42977-023-00188-x