Abstract
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we develop a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which relies on a single-feature-based approach, we first constructed a pool of various baseline models by employing a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Secondly, we integrated these baseline models to develop the final meta-based model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV. It is expected that StackHCV could be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
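The stacking strategy summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary active/inactive labels, uses two hypothetical toy fingerprints (fpA, fpB) in place of real molecular fingerprints, uses a single baseline algorithm (k-nearest neighbor on Tanimoto similarity) rather than all five, and assumes a logistic regression meta-model; every function name and parameter value below is illustrative.

```python
import math
import random

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_proba(train_fps, train_y, query, k=3):
    """Fraction of actives among the k most similar training compounds."""
    ranked = sorted(range(len(train_fps)),
                    key=lambda i: tanimoto(train_fps[i], query), reverse=True)
    top = ranked[:k]
    return sum(train_y[i] for i in top) / len(top)

def out_of_fold(fps, y, n_folds=5, k=3):
    """Cross-validated baseline scores: each compound is scored by a
    k-NN model that did not see it, which avoids leaking labels upward."""
    preds = [0.0] * len(fps)
    for fold in range(n_folds):
        train = [i for i in range(len(fps)) if i % n_folds != fold]
        tr_fps = [fps[i] for i in train]
        tr_y = [y[i] for i in train]
        for i in range(fold, len(fps), n_folds):
            preds[i] = knn_proba(tr_fps, tr_y, fps[i], k)
    return preds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression (assumed meta-model)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def toy_fingerprint(rng, active, offset):
    """Hypothetical 32-bit fingerprint: 8 class-specific bits are on with
    probability 0.8, all other bits with probability 0.1."""
    start = offset if active else offset + 8
    hot = range(start, start + 8)
    return {bit for bit in range(32)
            if rng.random() < (0.8 if bit in hot else 0.1)}

rng = random.Random(0)
y = [i % 2 for i in range(40)]  # synthetic labels: 1 = active, 0 = inactive
views = {
    "fpA": [toy_fingerprint(rng, bool(label), 0) for label in y],
    "fpB": [toy_fingerprint(rng, bool(label), 16) for label in y],
}

# Level 0: one baseline model per fingerprint view, scored out of fold.
meta_X = [list(row) for row in zip(*(out_of_fold(f, y) for f in views.values()))]

# Level 1: the meta-model learns how to weight the baseline outputs.
weights, bias = fit_logistic(meta_X, y)
meta_proba = [sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
              for row in meta_X]

# Accuracy on the meta-training data is optimistic; a held-out set is needed
# for an honest estimate.
accuracy = sum((p >= 0.5) == bool(label) for p, label in zip(meta_proba, y)) / len(y)
print(f"meta-model weights={weights}, bias={bias:.3f}, accuracy={accuracy:.2f}")
```

The key design choice is that the meta-model is trained on out-of-fold baseline scores rather than on scores from models fitted to the same compounds, so it learns how much to trust each fingerprint view without memorizing the training labels.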
Data availability
All the data are available at http://camt.pythonanywhere.com/StackHCV.
Acknowledgements
This work was fully supported by the College of Arts, Media and Technology, Chiang Mai University, and partially supported by Chiang Mai University, the Chiang Mai University High Performance Computer Service and Mahidol University. The GOLD license was supported by the Kasetsart University Research and Development Institute, KURDI (Grant No. FF (KU) 11.64).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Malik, A.A., Chotpatiwetchkul, W., Phanus-umporn, C. et al. StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors. J Comput Aided Mol Des 35, 1037–1053 (2021). https://doi.org/10.1007/s10822-021-00418-1
DOI: https://doi.org/10.1007/s10822-021-00418-1