StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors

Journal of Computer-Aided Molecular Design

Abstract

Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase remains a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we developed a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which relies on a single-feature-based approach, we first constructed a pool of baseline models by combining a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Second, we integrated these baseline models into a final meta-model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV. StackHCV is expected to be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
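The two-stage stacking idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the StackHCV implementation. As simplifying assumptions, it uses synthetic 16-bit fingerprints in two hypothetical views (`fp_a`, `fp_b`), a single baseline algorithm (k-nearest neighbor with Tanimoto similarity) instead of five, and logistic regression as the meta-learner, which the abstract does not specify. All function names are illustrative.

```python
import math
import random

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints (tuples of 0/1)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0

def knn_proba(train_X, train_y, query, k=3):
    """Baseline model: fraction of actives among the k most similar compounds."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: tanimoto(pair[0], query), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def out_of_fold(X, y, n_folds=5, k=3):
    """Out-of-fold baseline probabilities: each compound is scored by a
    model that never saw its label, which limits leakage into stage two."""
    preds = [0.0] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        tx = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        for i in range(fold, len(X), n_folds):
            preds[i] = knn_proba(tx, ty, X[i], k)
    return preds

def sigmoid_score(w, b, f):
    """Meta-model output: logistic function of the weighted baseline scores."""
    return 1 / (1 + math.exp(-(sum(wi * fi for wi, fi in zip(w, f)) + b)))

def fit_logistic(features, y, lr=0.5, epochs=500):
    """Meta-learner (assumed): logistic regression via batch gradient descent."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for f, t in zip(features, y):
            err = sigmoid_score(w, b, f) - t
            gw = [g + err * fj for g, fj in zip(gw, f)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_toy(n=60, n_bits=16, seed=0):
    """Synthetic data: actives favor the first half of the bits,
    inactives the second half. Purely illustrative, not real compounds."""
    rng = random.Random(seed)
    views, labels = {"fp_a": [], "fp_b": []}, []
    for i in range(n):
        label = i % 2
        for name in views:
            views[name].append(tuple(
                int(rng.random() < (0.8 if (j < n_bits // 2) == bool(label) else 0.2))
                for j in range(n_bits)))
        labels.append(label)
    return views, labels

# Stage 1: one baseline model per fingerprint view yields one meta-feature each.
views, y = make_toy()
meta_X = list(zip(*(out_of_fold(X, y) for X in views.values())))

# Stage 2: the meta-learner combines the baseline probabilities.
w, b = fit_logistic(meta_X, y)
train_acc = sum((sigmoid_score(w, b, f) >= 0.5) == bool(t)
                for f, t in zip(meta_X, y)) / len(y)
print(f"accuracy on out-of-fold meta-features: {train_acc:.2f}")
```

Extending the sketch toward the design in the abstract would mean adding one out-of-fold column per fingerprint and algorithm pair, so the meta-learner sees the whole pool of baseline predictions rather than two columns.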

Data availability

All the data are available at http://camt.pythonanywhere.com/StackHCV.

Acknowledgements

This work was fully supported by the College of Arts, Media and Technology, Chiang Mai University, and partially supported by Chiang Mai University, the Chiang Mai University High Performance Computer Service and Mahidol University. The GOLD license was supported by the Kasetsart University Research and Development Institute, KURDI (Grant No. FF (KU) 11.64).

Author information

Corresponding authors

Correspondence to Phasit Charoenkwan or Watshara Shoombuatong.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 531.3 kb)

About this article

Cite this article

Malik, A.A., Chotpatiwetchkul, W., Phanus-umporn, C. et al. StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors. J Comput Aided Mol Des 35, 1037–1053 (2021). https://doi.org/10.1007/s10822-021-00418-1
