Background

Hepatitis C virus (HCV) infection is a global health problem that causes several life-threatening chronic liver diseases, and HCV is a leading cause of chronic liver disease worldwide (Jia et al. 2020). In many countries, HCV is a primary risk factor for liver failure and liver transplantation, making it a growing public health problem (El-Kassem et al. 2019; González-Grande et al. 2016). The World Health Organization (WHO) estimated that seventy-one million people were infected with HCV in 2015, representing one percent of the world’s population. The infection is widely distributed across the world, with an incidence of 0.5 to 6.5 percent in the general population (Kucherenko et al. 2016; World Health Organization 2018).

HCV is an enveloped virus of the family Flaviviridae with a positive-sense single-stranded ribonucleic acid (RNA) genome encoding a polyprotein. Both host and viral proteases cleave this polyprotein into the structural and non-structural (NS) proteins. The NS3/4A region is a trypsin-like protein complex that plays a very important role in viral replication and helps the virus attenuate and evade the innate immune defense of the host cell. Extensive studies indicate that blocking NS3/4A function would successfully prevent HCV replication (Shi et al. 2015). The HCV serine protease NS3/4A arises from the association of two distinct proteins, NS3 and NS4A. NS3 is a bifunctional protein containing an N-terminal serine protease domain and a C-terminal helicase domain. Cleavage of the HCV polyprotein at the NS3/4A, NS4A/4B, NS4B/5A, and NS5A/5B junctions is catalyzed by the N-terminal serine protease. However, NS3 alone is not sufficient for cleavage at these junctions; NS4A is essential for efficient cleavage. As a cofactor, the central portion of NS4A is required to give NS3 a properly folded complex and to stimulate the catalytic process, and abnormalities in NS4A impair the ability of the NS3 protease to cleave at these junctions. NS4A also stabilizes NS3 against cellular proteases by protecting it from degradation while stimulating its proteolytic activity (Subedi 2019). There are seven recognized HCV genotypes (GTs) and sixty-seven verified subtypes (Chahine et al. 2017; Petruzziello et al. 2016). The global distribution of the HCV genotypes differs across geographic locations. HCV GT1 is the most common worldwide and has a broad geographic range, accounting for the largest share (46%) of HCV infections. HCV GT3 is the next most common, found in South Asia, Australia, and some European nations, with thirty percent of infections worldwide. HCV GT2 and GT4 account for nine to thirteen percent of infections and have a narrower geographic distribution. HCV GT2 is more frequent in Asia and West Africa, whereas HCV GT4 infections are frequent in Central and Eastern Sub-Saharan Africa, North Africa, and the Middle East. HCV GTs 5, 6, and 7 are the most geographically limited, with GT5 found in South Africa and GT6 common in eastern and south-eastern Asia, whereas GT7 has been reported in a small percentage of people in the Democratic Republic of the Congo (Coppola et al. 2019; Rabaan et al. 2019; Smith et al. 2014).

Although vaccines exist for some hepatitis viruses, none is available for HCV (El-Kassem et al. 2019). Until recent years, conventional interferon (IFN)-based therapeutic regimens in combination with ribavirin were widely recognized as the standard of care in antiviral therapy (Liu et al. 2018). However, these regimens may have several side effects such as thyroid dysfunction, neurological problems, digestive problems, and other adverse reactions. More recently, therapy has depended primarily on direct-acting antiviral agents, for which the HCV NS protease is seen as the main target of antiviral inhibitor development (El-Kassem et al. 2019). According to Liu et al., “in 2011, telaprevir and boceprevir were successively approved as the first direct-acting antiviral agents (DAAs) used as the HCV NS3/4A protease inhibitors, which initiated a breakthrough in the treatment of HCV” (Liu et al. 2018; Poordad et al. 2011). DAAs have significantly improved tolerability and effectiveness compared with the traditional regimen for severe HCV infection (Bidell et al. 2016). As with antibiotics, the emergence of resistance mechanisms also encourages the discovery of new compounds or the modification of existing ones (El-Kassem et al. 2019).

The quantitative structure–activity relationship (QSAR) approach is very helpful for the estimation of biological responses, particularly in drug development. This approach is built on the hypothesis that differences in the biological activities of molecules are strongly linked with variations in their physicochemical features (molecular descriptors) (Arthur et al. 2020; Bhadoriya et al. 2015; Veerasamy et al. 2011). Virtual screening (VS) uses computational tools and techniques to search for undiscovered organic molecules that are similar in structure. VS has emerged in drug development as a computational strategy to evaluate different databases of chemical compounds for unique hits with improved characteristics, which can then be tested experimentally. Like other computational techniques, VS does not aim to replace in vitro and in vivo assays, but instead to speed up the development process, reduce the number of candidates to be tested experimentally, and justify their selection. Such techniques are typically applied to obtain hits that are more likely to yield good clinical candidates (Arthur et al. 2020; Neves et al. 2018; Vyas et al. 2008).

Traditional drug development relied heavily on guesswork and is costly in terms of capital, time, and resources. With the introduction of computational drug design strategies, however, the drug design and development process can be carried out successfully while saving substantial capital and resources (Arthur et al. 2016). Compared with the random screening of existing chemical libraries, the ligand-based strategy has proven successful (Roy et al. 2012). It provides a theoretical tool that can be used to predict the activity of known and proposed drug molecules. Ligand-based and 3D-QSAR approaches for the discovery of unique and effective NS5B inhibitors were also explored by Therese et al. (2014). In the present research, computational methods were applied to derive a reliable QSAR model, to use the information provided by the model to propose novel molecules with high potency as NS3/4A protease inhibitors, and to investigate through molecular docking the binding energy of the designed molecules in comparison with approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir).

Methods

Dataset

The molecules used in this study were 63 N-methyl-6-(N-methylmethylsulfonamido)-5-(4-oxo-3,4-dihydroquinazolin-6-yl)benzofuran-3-carboxamide derivatives retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) as HCV inhibitors, with PubChem AID 1344392, deposited on 8 September 2018 by ChEMBL (External ID: CHEMBL3888610). The activities were obtained as IC50 (µM) and transformed to pIC50 (pIC50 = − log IC50) (Tropsha 2010).
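The transformation from IC50 to pIC50 can be illustrated with a minimal sketch. The function name is hypothetical, and rescaling the micromolar value to mol/L before taking the logarithm is an assumption: it is the common convention, but the text above does not state which concentration unit was used inside the logarithm.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Return pIC50 = -log10(IC50) for an IC50 given in micromolar.

    Assumption (not stated in the source): the IC50 is rescaled to mol/L
    before the logarithm is taken.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_um * 1e-6  # micromolar -> mol/L
    return -math.log10(ic50_molar)


if __name__ == "__main__":
    # Illustrative concentrations only, not values from the dataset.
    for ic50 in (10.0, 1.0, 0.01):
        print(f"IC50 = {ic50} uM -> pIC50 = {pic50_from_ic50_um(ic50):.2f}")
```

A lower IC50 gives a higher pIC50, so more potent compounds sit at the top of the pIC50 scale.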

Computed descriptors

The descriptors were computed by first optimizing the dataset molecules with density functional theory (DFT) using the B3LYP functional and the 6-31G** basis set in Spartan 14 software (Shao et al. 2006). The optimized structures were then transferred to another software package (PaDEL-Descriptor), which computed the structural properties (molecular descriptors) for each molecule (Yap 2011).

Dataset division

In the current analysis, the dataset was split into two parts: 70% of the compounds formed the training set used to construct the model, and the remaining 30% formed the test set, which was not used during model construction but was used to determine the model's predictive ability (Tropsha 2010).
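A 70/30 division of 63 compounds can be sketched as follows. The selection rule is an assumption: the text does not say how compounds were assigned to each set, so a seeded random shuffle is used here purely for illustration, and the function name and seed are hypothetical.

```python
import random


def split_dataset(compound_ids, train_fraction=0.70, seed=0):
    """Split compound identifiers into training and test sets.

    Assumption: a seeded random shuffle; the source does not specify the
    selection algorithm that was actually used.
    """
    ids = list(compound_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return sorted(ids[:n_train]), sorted(ids[n_train:])


if __name__ == "__main__":
    train_ids, test_ids = split_dataset(range(1, 64))
    print(len(train_ids), "training compounds;", len(test_ids), "test compounds")
```

For 63 compounds this yields 44 training and 19 test compounds, the same set sizes reported with the model.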

Model generation

The regression analysis was carried out in Materials Studio software, and Genetic Function Approximation (GFA) was incorporated in the process to identify the best QSAR models. In regression analysis, the dependent variable Y (pIC50) is modeled as a function of the predictor variables X (descriptors) (Veerasamy et al. 2011). GFA is a technique that generates statistical models of data using an evolutionary process. Combining regression analysis with the GFA algorithm yields models that are comparable with, or better than, those of conventional approaches and provides additional information that is not given by other methods. Unlike most other methods, GFA offers the user multiple models (Rogers 1997).

Assessment of the generated model

The established model was assessed by the following numerical measures: cross-validated correlation coefficient (\(q_{{{\text{CV}}}}^{2}\)), external explained variance (\(r_{{{\text{pred}}}}^{2}\)), random R2 (\(cR_{p}^{2}\)), variance inflation factor (VIF), and mean effect (MF), which are defined as follows:

$$q_{{{\text{CV}}}}^{2} = 1 - \frac{{\sum \left( {y_{{\exp}} - y_{{{\text{est}}}} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{{\exp}} - \overline{y}} \right)^{2} }}$$

where \(y_{{\exp}}\), \(y_{{{\text{est}}}}\) and \(\overline{y}\) represent the experimental, estimated, and average experimental biological response, respectively.
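The cross-validated coefficient can be computed by a leave-one-out loop, sketched below under stated assumptions: the model is an ordinary least-squares fit with an intercept, each compound is predicted by a model refitted without it, and the denominator uses the mean of all experimental values as in the equation above. The function name is hypothetical, and this is not the Materials Studio implementation.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out cross-validated q^2 for a linear model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend the intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # leave compound i out of the fit
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2  # squared prediction error for i
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    # Toy data for illustration only.
    X_demo = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    y_demo = [1.0, 3.2, 4.9, 7.1, 9.0]
    print(round(q2_loo(X_demo, y_demo), 4))
```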

The external explained variance (\(r_{{{\text{pred}}}}^{2}\)) was computed using the equation:

$$r_{{{\text{pred}}}}^{2} = 1 - \frac{{\sum \left( {y_{{{\exp}\left( {{\text{Test}}} \right)}} - y_{{{\text{est}}\left( {{\text{Test}}} \right)}} } \right)^{2} }}{{\sum \left( {y_{{{\exp}\left( {{\text{Test}}} \right)}} - \overline{y}_{{{\text{Training}}}} } \right)^{2} }}$$

\(y_{{{\exp}\left( {{\text{Test}}} \right)}}\) and \(y_{{{\text{est}}\left( {{\text{Test}}} \right)}}\) represent experimental and estimated activity data for the test set molecules, and \(\overline{y}_{{{\text{Training}}}}\) represents the average experimental biological response of the training set.
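As a minimal sketch of this equation, the function below takes the experimental and estimated test-set activities together with the training-set mean; the function and argument names are hypothetical.

```python
import numpy as np


def r2_pred(y_test_exp, y_test_est, y_train_mean):
    """External explained variance of the test set, referenced to the training mean."""
    y_exp = np.asarray(y_test_exp, dtype=float)
    y_est = np.asarray(y_test_est, dtype=float)
    numerator = np.sum((y_exp - y_est) ** 2)           # prediction error on the test set
    denominator = np.sum((y_exp - y_train_mean) ** 2)  # spread around the training mean
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Toy values for illustration only.
    print(r2_pred([5.0, 6.0, 7.0], [5.1, 5.8, 7.2], y_train_mean=6.0))
```

A model that only ever predicts the training mean scores 0, and perfect predictions score 1.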

The random R2 values (\(cR_{p}^{2}\)) of the model were estimated from the equation:

$$cR_{p}^{2} = R \times \sqrt {R^{2} - \overline{R}_{r}^{2} }$$

where R, \(R^{2}\) and \(\overline{R}_{r}^{2}\) represent the correlation coefficient, the coefficient of determination, and the mean coefficient of determination of the randomized models, respectively.
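The equation can be evaluated with a few lines of code. In this sketch the function name is hypothetical, R is taken as the positive square root of \(R^{2}\), and the numbers in the usage example are illustrative rather than taken from the y-randomization table.

```python
import math


def c_r2_p(r2: float, mean_r2_random: float) -> float:
    """cR_p^2 = R * sqrt(R^2 - mean randomized R^2), with R = sqrt(R^2)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    if mean_r2_random > r2:
        raise ValueError("randomized models fit better than the real model")
    return math.sqrt(r2) * math.sqrt(r2 - mean_r2_random)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(round(c_r2_p(0.64, 0.15), 4))
```

The statistic approaches \(R^{2}\) when the randomized models explain almost nothing and falls toward zero when they fit nearly as well as the real model.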

The variance inflation factor (VIF) of each descriptor in the model was estimated by the equation:

$${\text{VIF}} = \frac{1}{{1 - R^{2} }}$$

where \(R^{2}\) is the multiple correlation coefficient obtained by regressing one descriptor in the model on the other molecular descriptors (Beheshti et al. 2016).
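A minimal sketch of the VIF calculation is given below. It assumes each descriptor is regressed on all the others by ordinary least squares with an intercept; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of the descriptor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        # R^2 of descriptor j explained by the remaining descriptors
        r2 = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


if __name__ == "__main__":
    # Two uncorrelated toy descriptors give VIF = 1 for both columns.
    X_demo = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    print(variance_inflation_factors(X_demo))
```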

Each descriptor's mean effect (MF) value was used to determine the descriptor's relative impact on the model. The MF was determined by the formula:

$${\text{MF}} = \frac{{\beta_{j} \mathop \sum \nolimits_{i = 1}^{i = n} d_{ij} }}{{\mathop \sum \nolimits_{j}^{m} \beta_{j} \mathop \sum \nolimits_{i}^{n} d_{ij} }}$$

where \(\beta_{j}\), \(d_{ij}\), m, and n represent the coefficient of descriptor j in the model, the value of descriptor j for compound i in the training dataset, the number of descriptors in the model, and the number of compounds in the training dataset, respectively (Arthur et al. 2020; Oluwaseye et al. 2020).
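The mean effect formula amounts to a normalized, coefficient-weighted column sum, as the following sketch shows. The function name and the toy inputs are hypothetical and are not values from the study.

```python
import numpy as np


def mean_effects(coefficients, descriptors):
    """Mean effect of each descriptor.

    coefficients: length-m sequence of regression coefficients (beta_j).
    descriptors:  n x m matrix of training-set descriptor values (d_ij).
    """
    beta = np.asarray(coefficients, dtype=float)
    d = np.asarray(descriptors, dtype=float)
    contributions = beta * d.sum(axis=0)  # beta_j * sum_i d_ij
    return contributions / contributions.sum()


if __name__ == "__main__":
    # Toy example: two descriptors, two compounds.
    print(mean_effects([2.0, -1.0], [[1.0, 1.0], [2.0, 1.0]]))
```

By construction the mean effects sum to 1, and the sign of each value follows the sign of the descriptor's contribution relative to the total.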

Applicability domain (AD)

A Williams plot was used to assess the AD of the established QSAR model. The leverage (\(h_{i}\)) of a molecule and the warning leverage (h*) are evaluated using the equations below:

$$\begin{aligned} & h_{i} = x_{i} \left( {X^{T} X} \right)^{ - 1} x_{i}^{T} \\ & h^{*} = \frac{{3\left( {q + 1} \right)}}{n} \\ \end{aligned}$$

where \(x_{i}\) is the descriptor row vector of the query compound, X is the descriptor matrix of the training dataset, n is the number of training compounds, and q is the number of descriptors in the model (Arthur et al. 2020; Eriksson et al. 2003; Li et al. 2011). The standardized residual (SDR) used in the model AD is estimated by the equation:

$${\text{SDR}} = \frac{{\overline{Y} - Y}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \frac{{\left( {\overline{Y} - Y} \right)^{2} }}{n}} }}$$

where Y is the observed response value for either set (training or validation), \(\overline{Y}\) is the activity value predicted by the model, and n is the number of compounds in the dataset. The region of reliable prediction for a given molecule is usually bounded by 0 < \(h_{i}\) < \(h^{*}\) and − 3 < SDR < 3. Consequently, any molecule with SDR less than − 3 or greater than + 3 is labeled an outlier in the response space, and any molecule with leverage higher than \(h^{*}\) is labeled an influential molecule that is structurally distant from most of the compounds used during model construction.
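The quantities behind a Williams plot can be sketched as follows. The function names are hypothetical, and whether a column of ones is included in the descriptor matrix is left to the caller because the text does not specify it; the pseudo-inverse is used only as a numerically safe stand-in for the matrix inverse.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h_i = x_i (X^T X)^-1 x_i^T for every row of X_query."""
    Xt = np.asarray(X_train, dtype=float)
    Xq = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def warning_leverage(n_train, n_descriptors):
    """Threshold h* = 3(q + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


def standardized_residuals(y_observed, y_predicted):
    """Residuals (predicted minus observed) divided by their root mean square."""
    residual = np.asarray(y_predicted, dtype=float) - np.asarray(y_observed, dtype=float)
    return residual / np.sqrt(np.mean(residual ** 2))


def classify(h, sdr, h_star, sdr_limit=3.0):
    """Label each compound as 'outlier', 'influential', or 'inside AD'."""
    labels = []
    for hi, si in zip(h, sdr):
        if abs(si) > sdr_limit:
            labels.append("outlier")
        elif hi > h_star:
            labels.append("influential")
        else:
            labels.append("inside AD")
    return labels
```

Plotting the standardized residuals against the leverages, with horizontal lines at ± 3 and a vertical line at h*, gives the Williams plot.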

Docking studies

Ligand structure preparation

ChemBio Ultra 12.0 was used to draw the 2D ligand structures (Evans 2014; Li et al. 2004). The density functional theory (DFT) technique in Spartan 14 was used to minimize the energy of each ligand in the dataset, and the minimized structures were imported into PyRx in PDB file format (Huey et al. 2012).

Protein structure preparation

The structure of HCV NS3/4a protease was retrieved from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) under PDB ID 4A92. The co-crystallized macrocyclic protease inhibitor bound in the HCV NS3/4a protease crystal structure was removed, hydrogen atoms were added, lower-occupancy residue conformations were removed, and incomplete side chains were rebuilt using Discovery Studio (Danishuddin et al. 2010). The prepared structure was then saved in PDB format for use in PyRx (Huey et al. 2012).

Docking procedure and evaluation

A rectangular grid measuring 65.5217 × 72.7141 × 80.3011 Å, centered on 5.2017, 15.6939, 30.8304, was built across the ligand binding site of HCV NS3/4a protease using AutoDock Tools. The grid center was fixed on the ligand, and grid energy calculations were performed. The AutoDock docking computation used default settings, and 10 docked conformations were produced for each molecule. To test the validity and reliability of the docking computations, the bound ligand was removed from the complex and submitted for a one-ligand docking run. This reproduced top-scoring conformations of 4 falling within root mean square deviation (RMSD) values of 0.71–0.74 Å from the bound X-ray conformation for HCV NS3/4a protease, suggesting that the procedure is sufficiently valid for docking studies of other molecules. The results were then examined in Discovery Studio for detailed inspection of the binding interactions between the molecules and the amino acid residues at the active site (Trott and Olson 2010).
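The RMSD used in this kind of re-docking check can be sketched as below. The sketch assumes that both poses list the same atoms in the same order and already share the receptor's coordinate frame, so no superposition step is applied; the function name and the toy coordinates are hypothetical.

```python
import numpy as np


def pose_rmsd(coords_reference, coords_docked):
    """Root mean square deviation (in the input units) between two poses.

    Both inputs are N x 3 arrays with atoms in matching order.
    """
    ref = np.asarray(coords_reference, dtype=float)
    docked = np.asarray(coords_docked, dtype=float)
    if ref.shape != docked.shape:
        raise ValueError("poses must contain the same number of atoms")
    squared_distances = np.sum((ref - docked) ** 2, axis=1)
    return float(np.sqrt(squared_distances.mean()))


if __name__ == "__main__":
    # Toy three-atom pose shifted by 1 unit along x.
    crystal = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
    docked = [[1.0, 0.0, 0.0], [2.5, 0.0, 0.0], [2.5, 1.5, 0.0]]
    print(pose_rmsd(crystal, docked))
```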

Results

A QSAR method for investigating the structure–activity relationship of 63 HCV NS3/4a protease inhibitors was implemented in the present research, and the QSAR model is presented as:

$$\begin{aligned} {\text{pIC}}_{50} & = - 50.5082\left( { \pm 22.1927} \right) - 0.0021\left( { \pm 0.0004} \right)\,{\text{ATSC5i}} \\ & \quad + 47.1967\left( { \pm 8.8627} \right)\,{\text{SpMin3\_Bhs}} + 23.7391\left( { \pm 7.2362} \right)\,{\text{SpMax3\_Bhs}} \\ & \quad + 13.2688\left( { \pm 1.8530} \right)\,{\text{MDEN33}} - 23.4365\left( { \pm 3.7989} \right)\,{\text{piPC3}} \\ n_{{{\text{train}}}} & = 44,\quad r_{{{\text{train}}}}^{2} = 0.7704,\quad K = 5,\quad F = 25.4979,\quad q_{{{\text{LOO}}}}^{2} = 0.6914,\quad {\text{RMSE}}_{{{\text{train}}}} = 0.3880, \\ n_{{{\text{test}}}} & = 19,\quad r_{{{\text{test}}}}^{2} = 0.7047,\quad {\text{RMSE}}_{{{\text{test}}}} = 0.3392,\quad {\text{Outliers}} > 3.0 = 3,\quad {\text{Influential}}\;{\text{molecules}} > h^{*} = 4. \\ \end{aligned}$$

where \(n_{{{\text{train}}}}\) and \(n_{{{\text{test}}}}\) are the numbers of compounds in the training and validation datasets, respectively, \(r_{{{\text{train}}}}^{2}\) and \(r_{{{\text{test}}}}^{2}\) are the coefficients of determination for internal and external validation, respectively, \(q_{{{\text{LOO}}}}^{2}\) is the squared leave-one-out cross-validation coefficient, F is the Fisher F statistic, \({\text{RMSE}}_{{{\text{train}}}}\) and \({\text{RMSE}}_{{{\text{test}}}}\) are the root mean square errors for the training and test sets, respectively, and K is the number of predictor parameters (descriptors) in the model.
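The reported equation can be applied to a set of descriptor values with the short sketch below, which also checks the F statistic against the reported fit statistics. The coefficients are copied from the model above; the function names are hypothetical, and the F formula is the standard relation for ordinary least squares, assumed here to be what the software reports.

```python
# Coefficients copied from the reported QSAR model.
INTERCEPT = -50.5082
COEFFICIENTS = {
    "ATSC5i": -0.0021,
    "SpMin3_Bhs": 47.1967,
    "SpMax3_Bhs": 23.7391,
    "MDEN33": 13.2688,
    "piPC3": -23.4365,
}


def predict_pic50(descriptors: dict) -> float:
    """Estimate pIC50 from a dict holding the five model descriptors."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS
    )


def f_statistic(r2: float, n: int, k: int) -> float:
    """Standard OLS relation F = (r^2 / k) / ((1 - r^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


if __name__ == "__main__":
    # r^2 = 0.7704, n = 44 and K = 5 are the values reported with the model.
    print(round(f_statistic(0.7704, 44, 5), 2))
```

With the reported values the relation gives an F of about 25.5, in line with the reported F = 25.4979 up to rounding of \(r_{{{\text{train}}}}^{2}\).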

Additional file 1: Table S1 displays the chemical structure, PubChem SID and CID, experimental IC50, and estimated pIC50 of all the compounds used for this analysis. Table 1 lists the model validation statistics. The model's correlation matrix and VIF are given in Table 2. Table 3 presents 10 iterations of the y-randomization test. The descriptions of the descriptors used in the model and the computed mean effect (MF) of each descriptor are given in Table 4. Table 5 presents the template molecule, designed molecules, and approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir) with their estimated pIC50 and leverages, while Table 6 presents the docking results of the template molecule, the designed molecule with the highest activity, and the approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir). In Fig. 1 the model applicability domain (AD) is represented using a Williams plot. Figure 2 presents a plot of the model-estimated against experimental anti-hepatitis C activity values for both the training and test sets. Figure 3 shows the plot of SDR against estimated pIC50 values for the whole dataset. The structure of the template molecule, which is compound 33 in Additional file 1: Table S1, is presented in Fig. 4. The three-dimensional and two-dimensional interactions of the template molecule (molecule 1, see Table 6), the newly designed molecule (molecule 7, see Table 6), Telaprevir (molecule 14, see Table 6), Simeprevir (molecule 15, see Table 6) and Voxilaprevir (molecule 16, see Table 6) with the binding pocket of the 3D structure of HCV NS3/4a protease/helicase are presented in Figs. 5, 6, 7, 8 and 9, respectively.

Table 1 QSAR models validation parameters and scores
Table 2 Pearson's correlation, Variance Inflation Factor (VIF) of descriptors used in the model
Table 3 Y-randomization test scores
Table 4 A description of the descriptor used in the model and the MF
Table 5 Template molecule, designed molecules, Telaprevir, Simeprevir, and Voxilaprevir with their estimated pIC50 and leverages
Table 6 Docking results of Template molecule, designed molecule with the highest activity, first, second, and third generation approved direct-acting antiviral agents
Fig. 1 The model Applicable Domain plot (Williams plot)
Fig. 2 Plot of the model estimated against experimental anti-hepatitis C activity values
Fig. 3 The plot of standardized residual against estimated pIC50 values for the entire data set
Fig. 4 The structure of the template molecule see Additional file 1: Table S1, C 33 (2-(4-fluorophenyl)-5-(3-(1-(4-fluorophenyl)ethyl)-4-oxo-3,4-dihydro-2H-pyrido[2,3-e][1,3]oxazin-6-yl)-N-methyl-6-(N-methylmethylsulfonamido)benzofuran-3-carboxamide)
Fig. 5 The 3D and 2D interaction of the Template molecule (Molecule 1, see Table 6) with the binding pocket of 3D structures of HCV NS3/4a protease/helicase (PDB ID: 4A92)
Fig. 6 The 3D and 2D interaction of the designed Molecule (Molecule 7, see Table 6) with better activity to Telaprevir, Simeprevir, and Voxilaprevir (approved direct-acting antiviral agents) with the binding pocket of 3D structures of HCV NS3/4a protease/helicase (PDB ID: 4A92)
Fig. 7 The 3D and 2D interaction of the Telaprevir (first generation approved direct-acting antiviral agent) (Molecule 14, see Table 6) with the binding pocket of 3D structures of HCV NS3/4a protease/helicase (PDB ID: 4A92)
Fig. 8 The 3D and 2D interaction of the Simeprevir (second generation approved direct-acting antiviral agent) (Molecule 15, see Table 6) with the binding pocket of 3D structures of HCV NS3/4a protease/helicase (PDB ID: 4A92)
Fig. 9 The 3D and 2D interaction of the Voxilaprevir (third generation approved direct-acting antiviral agent) (Molecule 16, see Table 6) with the binding pocket of 3D structures of HCV NS3/4a protease/helicase (PDB ID: 4A92)

Discussion

The developed model explains seventy-seven percent (77%) and predicts seventy percent (70%) of the variance in the anti-hepatitis-C virus activity of the considered molecules against HCV NS3/4a protease. The model statistics described in Table 1 meet the OECD criteria for validating a QSAR model (Roy et al. 2012; Veerasamy et al. 2011). The R2 and Q2 for the model's internal evaluation are 0.7704 and 0.6914, respectively. This implies that the model fits the training set well and is internally predictive, as the model predicted approximately 70% of the variance and thus met the minimum condition of 50% (Veerasamy et al. 2011). The error statistics, such as SEE and RMSE, are also documented in Table 2 and were found to support model robustness.

Qin et al. reported a QSAR study of the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors using multiple linear regression (MLR) and support vector machine (SVM); the R2 values for internal and external evaluation were 0.75 and 0.72, respectively (Qin et al. 2017), which are close to the R2 values of 0.77 and 0.70 obtained for internal and external evaluation in this article.

From Table 2, all the descriptors have VIF scores below 5, which means the model is statistically acceptable and the descriptors can be considered fairly orthogonal (Eriksson et al. 2003). The mean effect (MF) value offers significant details on the impact of the model’s molecular descriptors: the size and sign of each descriptor's MF show the strength and direction of its influence on the activities of the study compounds. The MF values are in the decreasing order piPC3 > SpMax3_Bhs > SpMin3_Bhs > ATSC5i > MDEN33 (see Table 4). SpMax3_Bhs, SpMin3_Bhs, and MDEN33 contribute positively, while piPC3 and ATSC5i contribute negatively to the activity of the anti-hepatitis-C virus compounds. The y-randomization test shows that the value of the model's random R2 (cR2p = 0.7025) is substantially higher than the threshold value of 0.50, meaning the model is not the product of chance correlation (Arthur et al. 2016).

The square area in Fig. 1 represents the model AD, where the boundary h* (0.41) is the warning leverage of the model and SDR is the standardized residual. The outcome shows that 89% of the molecules considered were inside the AD of the model, while 5% were outliers (compounds 16, 18, and 32 in Additional file 1: Table S1, identified in Fig. 1 by SDR beyond ± 3.0) and 6% were influential molecules (compounds 2, 6, 21, and 39 in Additional file 1: Table S1, identified in Fig. 1 by leverage > h*). In summary, the suggested model showed good predictive potential and can therefore be used as a tool for optimizing the activity of the compounds considered.
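These figures can be checked from the counts reported with the model (44 training compounds, 5 descriptors, 63 compounds in total, 3 outliers, and 4 influential molecules). The only assumption in the check is that the outlier and influential sets do not overlap, which the compound numbers listed for the two groups support:

$$\begin{aligned} h^{*} & = \frac{3\left( {q + 1} \right)}{n} = \frac{3\left( {5 + 1} \right)}{44} \approx 0.41 \\ {\text{Outliers}} & = \frac{3}{63} \approx 4.8\% ,\quad {\text{Influential}} = \frac{4}{63} \approx 6.3\% ,\quad {\text{Inside AD}} = \frac{63 - 3 - 4}{63} \approx 88.9\% \\ \end{aligned}$$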

Figure 2 indicates a meaningful correlation between the model's experimental and estimated activity values, and in Fig. 3 the model residuals are scattered on both sides of the SDR = 0 line. These results show that the model has high predictive potential both internally and externally and is free of systematic bias. Consequently, it can be used to predict the activity of molecules lacking experimental activity data, as long as the molecule is inside the AD of the model.

Explanation of descriptors utilized in the established QSAR model

The first predictor parameter in the model is a two-dimensional autocorrelation descriptor (ATSC5i), defined as the centered Broto-Moreau autocorrelation of lag 5 weighted by first ionization potential, which describes how the first ionization potential is distributed along the topological structure of the molecule (Gramatica et al. 2000). Its presence in the model links the first ionization potential of pairs of atoms separated by five bonds (lag 5) in the researched molecules with their anti-hepatitis-C virus activity on HCV NS3/4a protease. The developed model shows that an increase in ATSC5i negatively influences the activity of the compounds.

SpMin3_Bhs and SpMax3_Bhs both have a positive influence on the anti-hepatitis-C virus activity of the analogs on HCV NS3/4a protease. Both are Burden modified eigenvalue descriptors. SpMin3_Bhs is the smallest absolute eigenvalue of the Burden modified matrix - n 3 / weighted by relative I-state. Fluorine and fluorine-containing substituents, e.g., C6H5F, lower its value and are therefore unfavorable to the activity. SpMax3_Bhs is defined as the largest absolute eigenvalue of the Burden modified matrix - n 3 / weighted by relative I-state. These descriptors are derived from a modified connectivity matrix (the Burden matrix), in which the diagonal elements are replaced by the relative intrinsic state (I-state) of the atoms in the molecule and the off-diagonal elements represent pairs of bonded atoms. The descriptors encode information about the underlying molecular structure and are usually employed for similarity/diversity searching (Todeschini and Consonni 2008).

MDEN33 is another descriptor found in the model and is described as the molecular distance edge between all tertiary nitrogens. These descriptors are symbolized as MDEXst, in which X represents the element, s stands for the type of the first atom, and t stands for the type of the second atom. The type of the atom (primary, secondary, or tertiary) is derived from the number of non-hydrogen atoms connected to the specified atom (Todeschini and Consonni 2008). MDEN33 is positively related to the activity of the researched molecules, which indicates that introducing an N-containing moiety raises the activity values of the researched molecules. A molecule with an extra nitrogen atom in its structure had a high value of MDEN33 and hence better activity, e.g., molecules 7, 8, 11, and 12 in Table 5.

The last descriptor in the model is piPC3, a two-dimensional descriptor defined as the conventional bond order ID number of order 3 (ln(1 + x)); it was found to have a negative impact on the activity of the researched molecules when increased (Todeschini and Consonni 2008).

New molecule proposal and estimation of activity

Based on the built QSAR model and the evaluated results, compound 33 in Additional file 1: Table S1, shown in Fig. 4, was used as a template for designing novel molecules. It was selected from Fig. 1 as a compound with high activity and a low standardized residual that lies inside the AD of the established model. The established QSAR model was used to estimate the activity of the template molecule, the newly designed molecules, and the approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir). The results show that all designed derivatives and DAAs have higher pIC50 values than the template except molecules 3, 4, and 10 (see Table 5). Molecule 7 in Table 5 has the highest estimated activity, higher even than those of the DAAs. The structures of the template, the newly designed molecules, and the approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir), together with their estimated activities and leverages, are presented in Table 5. The leverages were all lower than the leverage threshold (h* = 0.41), which implies that all the designed molecules, as well as the approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir), were within the model’s applicability domain.

Molecular docking results and analysis

Among all the molecules in Table 5, including the approved direct-acting antiviral agents, molecule 7 has the highest predicted pIC50 value (17.3373) and was therefore subjected to a molecular docking study. Telaprevir, Simeprevir, and Voxilaprevir (approved direct-acting antiviral agents) and the template molecule, with predicted pIC50 values of 13.5142, 15.1774, 16.9516, and 8.3572, respectively, were subjected to the same docking analysis for comparison. The results of the docking analysis, namely the binding energy (kcal/mol), interacting amino acids, types of interaction, and bond lengths (Å) for the template molecule (1 in Table 5), the designed molecule (7 in Table 5), and the approved direct-acting antiviral agents (Telaprevir, Simeprevir, and Voxilaprevir), are reported in Table 6. Figures 5, 6, 7, 8, and 9 show the three-dimensional and two-dimensional interactions of the template molecule (molecule 1 in Table 5), the designed molecule (molecule 7 in Table 5), and the approved direct-acting antiviral agents (molecules 14, 15, and 16 in Table 5), respectively, with the binding pocket of the 3D structure of HCV NS3/4a protease/helicase (PDB ID: 4A92).

As shown in Fig. 6, the designed molecule is well placed in the active pocket of the receptor, as reflected by its having the highest predicted activity and the lowest binding energy (17.3373 and − 10.7) compared with molecule 1 (8.3572 and − 7.5), molecule 14 (13.5142 and − 9.5), molecule 15 (15.1774 and − 10.0), and molecule 16 (16.9516 and − 10.5).

Figure 5 shows that THR298, SER229, SER297, GLU291, SER294, TRP50, ALA497, PRO230, and HIS293 are the amino acid residues of the target receptor involved in the interaction with the template molecule (molecule 1 in Table 6). Figure 6 shows that THR416, THR416, SER294, THR295, GLY484, TYR391, HIS293, ASP454, SER457, THR295, SER483, ARG393, and VAL456 are the residues involved in the interaction with the designed molecule (molecule 7 in Table 5), while Fig. 7 shows that ARG481, MET485, GLY484, THR295, THR295, HIS369, VAL490, VAL490, PRO523, VAL456, and VAL432 are the residues involved in the interaction with the approved direct-acting antiviral agent Telaprevir (molecule 14 in Table 5). Figure 8 shows that ALA413, GLN434, ASN556, SER489, GLU493, ASP454, THR295, THR433, VAL456, and VAL490 are the residues involved in the interaction with molecule 15 in Table 5, and Fig. 9 shows that GLU493, TRP501, TYR502, GLY253, THR269, ASP412, ASP412, ALA413, PRO558, and ALA497 are the residues involved in the interaction with molecule 16 in Table 5. The docking results presented in Table 6 show that the amino acid THR is involved in the interaction with all the docked molecules, which suggests the importance of this amino acid in the inhibition of HCV NS3/4a protease/helicase. The binding of telaprevir to HCV NS3/4a protease/helicase involves the formation of a covalent bond between the serine nucleophile of the HCV protease catalytic triad and the ketoamide group of telaprevir, giving a stable, covalent, and reversible complex with the serine protease, whereas the designed molecule accounts for both covalent and non-covalent interactions of the inhibitor with HCV NS3/4a protease/helicase. Molecule 7 also shows more interactions with the target receptor than the template molecule and the approved direct-acting antiviral agents, which suggests that more interactions lead to better inhibition.

Limitations of the present study

The main limitation of the present study is the non-availability of reliable experimental datasets on hepatitis C virus.

Conclusion

The validated QSAR model offered a rationale for describing the anti-hepatitis-C virus activities of the researched molecules. The model is statistically reliable (\(r^{2} = 0.7704\) and \(r_{{{\text{pred}}}}^{2} = 0.7047\)) and meets the conditions of a satisfactory QSAR model suggested by various groups. Several molecules with improved anti-hepatitis-C virus activity compared with the template molecule from the dataset (compound 33) have been proposed for further investigation. The binding energy (− 10.7 kcal/mol) of the newly designed molecule with the highest activity, docked into the binding pocket of the 3D structure of HCV NS3/4a protease/helicase (PDB ID: 4A92), was found to be better than that of compound 33 (− 7.5) in the dataset as well as those of the approved direct-acting antiviral agents Telaprevir, Simeprevir, and Voxilaprevir (− 9.5, − 10.0, and − 10.5, respectively). Hence, a novel molecule was identified that shows high potency as an HCV NS3/4a protease inhibitor.