Introduction

The novel coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, which emerged in December 2019 in Wuhan, China, is causing significant morbidity and mortality worldwide. Cases of COVID-19 are being reported in nearly every country around the globe. As of September 9, 2020, the number of confirmed cases reached 27,486,960 globally, with 894,983 deaths (https://covid19.who.int/). In India, the number of confirmed cases reached 4,462,965, and the death toll reached 75,091 (https://www.worldometers.info/coronavirus/country/india/). Coronaviruses have been implicated in other epidemics in recent decades, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). Compared to these, however, the transmission rate of COVID-19 is much higher, with one infected individual spreading the infection to two to three healthy individuals on average.

Coronaviruses (CoVs), members of the Coronaviridae family, are among the largest known single-stranded RNA viruses [1]. CoVs contain the biggest genomes among all known RNA viruses, 26 to 32 kb in length [2]. The coronavirus genome encodes four major structural proteins: the spike (S) protein, the nucleocapsid (N) protein, the membrane (M) protein, and the envelope (E) protein, all of which are indispensable for the assembly of a complete viral particle [3, 4]. A significant portion of the coronavirus genome encodes polypeptides necessary for viral replication and gene expression. An approximately 306-amino acid polypeptide called main protease (Mpro) has a highly conserved sequence and is a crucial enzyme necessary for coronavirus replication [5]. Because their protein structure is known, main proteases are the primary targets for designing antiviral drugs to combat coronavirus infections [6, 7]. Towards this effort, numerous inhibitors have been designed to block different stages of viral entry, attachment, and replication in host cells. These compounds are then tested in cell-based systems [8, 9]. Currently, no specific antiviral treatment is approved for CoV-associated pathologies. The majority of therapies rely mostly on the control of symptoms and supportive treatment [10]. A few therapeutic agents under development are ribavirin, interferon (IFN)-α, and mycophenolic acid. Reports cited the effectiveness of anti-HIV drugs such as ritonavir and lopinavir, either alone or in combination with oseltamivir, remdesivir, and chloroquine [11]. Among these, ritonavir, remdesivir, and chloroquine showed efficacy at the cellular level. However, further experimental support and validation are needed to verify safety and efficacy. Common phytocompounds and plant medicines have also been used for decades in the fight against common flu-like conditions and fever.
Ashwagandha (Withania somnifera) is an Indian Ayurvedic plant used for herbal therapies in traditional medicine. Ashwagandha is considered to improve the immune system and to have a range of prophylactic and medicinal actions [12]. Withaferin-A is a bioactive withanolide from Ashwagandha that reportedly possesses inhibitory activity against HPV and influenza viruses [12, 13]. Based on these observations, the ability of withaferin-A and its derivatives to inhibit the SARS-CoV-2 main protease (Mpro) was explored.

Taken together, the docking results reported earlier [14,15,16,17,18,19,20,21,22,23] suggest that naturally abundant phytocompounds like hesperidin, baicalin, myricitrin, calceolarioside B, methyl rosmarinate, rutin, diosmin, apiin, diacetylcurcumin, withaferin-A, zingiberene, and limonene might be worthy of clinical trials [24]. However, there is no report on the potential of important fragments and small molecule derivatives of these phytocompounds as agents against SARS-CoV-2 main protease (Mpro).

Several commercially available FDA-approved antiviral drugs, such as lopinavir, ritonavir, and remdesivir, were previously predicted to bind to the main protease of SARS-CoV. SARS-CoV-2 3CLpro or Mpro also shows a projected affinity and a strong efficacy value of Kd > 100 nM with these drugs [25, 26]. This prediction suggests that viral proteinase-targeting drugs could effectively influence the viral replication process. It was backed by molecular docking studies of HIV proteinase inhibitors against CoV protease [27], which showed that lopinavir, atazanavir, and ritonavir may inhibit the CoV proteinase. In case studies with inhibitor medicines such as hydroxychloroquine and remdesivir, atazanavir and ritonavir were tested similarly to lopinavir [28]. However, there is so far no significant evidence of whether these drugs act against COVID-19 as efficiently as predicted. Here we report molecular interaction studies of both FDA-approved synthetic inhibitors and phytocompounds with the main protease Mpro. In doing so, we aimed to screen out the best phytocompound in comparison to synthetic drugs. In addition, the ADMET profiles of all the compounds were taken into account, and a chemoinformatics approach was applied to find small molecule fragments and derivatives of the best docked phytocompounds. Furthermore, quantitative structure-activity relationship (QSAR) analysis was performed to predict the IC50 values of the novel derivatives. CLC-Pred and pharmacophore models, along with molecular dynamics simulations, further supplemented these results in order to ascertain the best derivative compound from the initial set of phytocompounds.

Materials and methods

Retrieval of ligands

The rationale for the selection of 12 phytocompounds and 12 FDA-approved synthetic drugs was based on the reports depicted in Table S-1A (supplementary material). The 12 FDA-approved drugs are reported to have a significant inhibitory effect upon CoV main proteases and are thus considered as reference drugs in the study (Table S-1B, supplementary material). All of the selected molecules were retrieved from the PubChem domain (URL https://pubchem.ncbi.nlm.nih.gov/) [29]. The respective PubChem IDs are also listed in Table S-1A. A flowchart representing the pipeline adopted in the study is depicted in Fig. 1.

Fig. 1 Flowchart of pipeline adopted in the study to identify small molecule inhibitors of Mpro

ADMET screening of the phytocompounds

A significant step in drug development is the estimation of essential pharmacological properties of candidate small molecules in silico and in vivo. In silico approaches (referring to virtual screening) are preferable to in vivo experiments, which are costly and time-consuming. A graph modeling–based tool, pkCSM (predicting small molecule pharmacokinetic properties using graph-based signatures), was used for studying ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. In this analysis, the respective canonical SMILES of the compounds were used to calculate the ADMET properties.

Molecular interaction studies of Mpro with naturally derived phytocompounds and FDA-approved drugs

Molecular interaction studies using docking predict potential interactions between a drug and its target through energy minimization and binding energy calculations. The site of interaction between a small molecule (ligand) and its protein receptor (which may be an enzyme) is a possible site of inhibition [30]. The molecular docking studies were carried out in AutoDock 4.2.1 [31]. Ligands for the docking analysis were retrieved from the PubChem database in 3D SDF format, then converted to and stored in Mol2 format using Open Babel 2.2.3 [32].

The atomic-resolution 3D coordinates of the target protein molecule, CoV main protease (Mpro) (PDB id 2gz9), were downloaded from the Protein Data Bank (RCSB-PDB) and processed using the AutoDock tool. Ligand preparation and molecular docking were done according to the methods of Ghosh and co-workers [33]. The rationale of this study was based on the ability of phytocompounds and FDA-approved synthetic drugs to bind with the target protein. It was then necessary to evaluate the binding free energy (∆G) in order to establish a comparison that might screen for the best phytocompound against CoV main protease (Mpro).

Generation of small molecule derivatives of the best docked phytocompounds

The 3 best docked phytocompounds (withaferin-A, hesperidin, and baicalin) in the sdf format were uploaded individually to the deep neural networking–based LigDream tool (https://www.playmolecule.org/LigDream/) to generate derivative molecules along with their canonical smiles from each phytocompound. DNNs are used to design effective routes of chemical synthesis where reversed reactions formally decompose the molecule (retrosynthesis) [34]. This process results in a series of reactions which can then be performed in the forward direction in the laboratory to synthesize the target fragments.

ADME analysis of the derived small molecules

The SMILES strings of 100 derivatives from each phytocompound were analyzed in the SwissADME web tool (http://www.swissadme.ch/) to quantify the physicochemical descriptors and estimate the ADME parameters, drug-like nature, pharmacokinetic properties, and medicinal chemistry friendliness of multiple small molecules, which are prerequisites for successful drug discovery. In this study, the ADME profiles of all 300 derivatives were analyzed to screen out small molecule derivatives on the basis of their synthetic accessibility score and the drug likeliness parameters (Lipinski/Ghose/Veber/Egan/Muegge). Only those molecules that showed a synthetic accessibility score in the range of 1–4 and no violation of the drug likeliness parameters were considered for further studies. A final set of 30 molecules was obtained from the 300 derivatives generated from the three preferred phytocompounds.
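The screening criterion described above can be sketched as a simple filter. This is a minimal illustration, not the procedure actually run in SwissADME: the record layout, the field names, and the three example candidates are hypothetical, and only the two thresholds (synthetic accessibility between 1 and 4, zero violations of the five drug likeliness filters) come from the text.

```python
from dataclasses import dataclass

# The five drug likeliness filters named in the text.
RULES = ("Lipinski", "Ghose", "Veber", "Egan", "Muegge")


@dataclass
class Derivative:
    name: str                       # hypothetical label for a generated derivative
    synthetic_accessibility: float  # SwissADME synthetic accessibility score
    violations: dict                # rule name -> number of violations


def passes_screen(d, sa_min=1.0, sa_max=4.0):
    """Return True if the SA score is in range and no filter is violated."""
    in_range = sa_min <= d.synthetic_accessibility <= sa_max
    # A missing rule counts as a violation (default 1) so that incomplete
    # records are rejected rather than silently accepted.
    no_violations = all(d.violations.get(rule, 1) == 0 for rule in RULES)
    return in_range and no_violations


def screen(derivatives):
    """Keep only the derivatives that satisfy both criteria."""
    return [d for d in derivatives if passes_screen(d)]


# Made-up records for illustration only (not values from the study).
clean = {rule: 0 for rule in RULES}
candidates = [
    Derivative("example-1", 3.2, dict(clean)),
    Derivative("example-2", 5.1, dict(clean)),               # SA score too high
    Derivative("example-3", 2.8, {**clean, "Ghose": 1}),     # one Ghose violation
]

for d in screen(candidates):
    print(d.name)
```

Treating a missing filter result as a violation is a conservative design choice made for this sketch; the study itself does not say how incomplete records were handled.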

Molecular docking of Mpro with phytocompound derivatives

The second set of docking was performed to evaluate the binding free energy (∆G) for the selected phytocompound derivatives. A similar docking strategy was adopted as described in the previous section.

Quantitative structure-activity relationship (QSAR) analysis of the screened derivative molecules

Structure-activity relationships (SAR) based upon machine learning and statistical methods are extensively used in many areas of drug development, ranging from primary screening to lead optimization [35]. In this study, the phytocompounds were subjected to QSAR analysis applying a multiple linear regression model using EasyQSAR 1.0. A QSAR equation can be given as:

$$ \mathrm{Biological\ Activity} = \mathrm{Constant} + \left(C_1 P_1\right) + \left(C_2 P_2\right) + \left(C_3 P_3\right) + \dots + \left(C_n P_n\right) $$

where the parameters P1 through Pn are computed for each molecule in the series and the coefficients C1 through Cn are calculated by fitting variations in the parameters to the biological activity. P1, P2, P3, and so on are the descriptor variables of the QSAR equation [35]. For this purpose, molecular descriptors were obtained using the tool Marvin 20.15, 2020 (http://www.chemaxon.com), and the respective experimental IC50 values of the training dataset were collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/). The training dataset included 18 reported inhibitor molecules, and 6 ligand descriptors, viz., molecular weight, LOGP, refractivity, polar surface area-2D, polarizability, and molecular surface area-3D, were used to build the QSAR model. Thereafter, IC50 values for a test dataset of 10 molecules, comprising the 7 best docked derivative molecules and their 3 parent phytocompounds, were predicted from the QSAR model. The model was developed from a specific chemical class of SARS-CoV inhibitors and not upon the values of the molecular descriptors, so the influence of the number of training set compounds on the applicability domain of the QSAR model is considered trivial. However, owing to the small training set of compounds, the domain of applicability will naturally be localized.
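Once the constant and coefficients have been fitted, applying the QSAR equation to a new molecule is a weighted sum over its descriptors. The sketch below only illustrates that evaluation step. The six descriptor names follow the text, but the constant, the coefficients, and the descriptor values are placeholders chosen for illustration; they are not the fitted values from EasyQSAR.

```python
# Descriptor names used in the study, in a fixed order.
DESCRIPTORS = (
    "molecular_weight",
    "logp",
    "refractivity",
    "polar_surface_area_2d",
    "polarizability",
    "molecular_surface_area_3d",
)


def predict_activity(constant, coefficients, descriptors):
    """Evaluate Activity = Constant + sum(C_i * P_i) for one molecule.

    `coefficients` and `descriptors` are dicts keyed by descriptor name.
    """
    missing = [name for name in coefficients if name not in descriptors]
    if missing:
        raise ValueError(f"missing descriptor values: {missing}")
    return constant + sum(coefficients[name] * descriptors[name]
                          for name in coefficients)


# Placeholder model and molecule (hypothetical numbers, for illustration only).
example_constant = 1.0
example_coefficients = {name: 0.0 for name in DESCRIPTORS}
example_coefficients["logp"] = 0.5
example_coefficients["polarizability"] = -0.02

example_molecule = {name: 1.0 for name in DESCRIPTORS}
example_molecule["logp"] = 2.0
example_molecule["polarizability"] = 40.0

print(predict_activity(example_constant, example_coefficients, example_molecule))
```

Keying both inputs by descriptor name, rather than relying on list position, guards against accidentally pairing a coefficient with the wrong descriptor.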

In silico prediction of cytotoxicity for tumor and non-tumor cell lines

In silico cytotoxicity prediction was carried out for tumor and non-tumor cell lines using CLC-Pred (Cell Line Cytotoxicity Predictor). CLC-Pred predicts cytotoxicity against cell lines from structure-cytotoxicity relationships built by Prediction of Activity Spectra for Substances (PASS) on special training sets and validated with a leave-one-out cross-validation procedure. The in silico prediction results reportedly agree with the results of in vivo experiments in about 96% of cases [36]. The web service, http://www.way2drug.com/Cell-line/, was used to predict in silico the cytotoxic effects of chemical compounds in non-transformed and cancer cell lines based on their structural formula. CLC-Pred provides a likelihood of cytotoxicity for a chemical compound, which helps to decide whether the substance should be included in experimental screening. The interpretation was done using the default parameters as described in CLC-Pred’s protocol. In this protocol, “Pa” indicates the probability of activity, and “Pi” indicates the probability of inactivity. Thus, a prediction with Pa > Pi in our screening indicated that the probability of activity is higher than the probability of inactivity.
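The Pa > Pi criterion can be expressed as a short filter over a table of predictions. The following is a minimal sketch under stated assumptions: the cell line labels and the probability values are invented for illustration, and the ranking by Pa minus Pi is a convenience added here, not part of the CLC-Pred protocol.

```python
def is_predicted_active(pa, pi):
    """Apply the screening criterion: activity more probable than inactivity."""
    return pa > pi


def rank_predictions(predictions):
    """Return (label, pa, pi) tuples with Pa > Pi, largest margin first."""
    kept = [row for row in predictions if is_predicted_active(row[1], row[2])]
    return sorted(kept, key=lambda row: row[1] - row[2], reverse=True)


# Hypothetical (cell line, Pa, Pi) rows; not results from the study.
example_predictions = [
    ("cell line A", 0.40, 0.05),
    ("cell line B", 0.10, 0.30),
    ("cell line C", 0.25, 0.10),
]

for label, pa, pi in rank_predictions(example_predictions):
    print(f"{label}: Pa={pa:.3f}, Pi={pi:.3f}")
```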

Pharmacophore modeling

“Pharmacophores” are mostly considered to be molecular fragments or functional groups of a chemical compound. A pharmacophore may be generated by a structure-based or a ligand-based method [37]. Here we used ligand-based pharmacophore modeling to establish the pharmacophore models of the phytocompound derivatives. The PDB format file of the target receptor molecule main protease (Mpro) and the two best ligands, or pharmacophore feature molecules, in mol2 format were analyzed in the ZINCPharmer pharmacophore tool (zincpharmer.csb.pitt.edu). ZINCPharmer utilizes the Pharmer open-source pharmacophore search to screen a large database of fixed conformers for matches to pharmacophore features such as hydrophobic interactions, hydrogen bond donors/acceptors, positive/negative ions, and other features [38].

MD simulation study

Molecular dynamics (MD) simulation allows the structural dynamics of the receptor SARS-CoV-2 main protease (Mpro) upon binding with small ligand (proposed drug) molecules to be investigated at the atomistic level. MD simulation studies were carried out in order to determine the backbone conformation of SARS-CoV-2 main protease bound to withaferin-A derivative molecule 61. To set up the simulation, systems were first built in the system builder for Mpro with and without the withaferin-A derivative molecule 61 ligand. The MD simulation study was carried out in Desmond version 2020-1. The Desmond system builder was used to set up the initial parameters of an orthorhombic box of 10 × 10 × 10 Å. The target protein Mpro and the target-ligand complex were neutralized with NaCl by adding 0.15 M Na+ ions. The prepared systems were relaxed using the default Desmond relaxation protocol [39]. An MD simulation run of 50 ns was set up at constant temperature and constant pressure (NPT) for the final production run. The NPT ensemble was set up using the Nosé-Hoover chain coupling scheme at a temperature of 300 K, with a relaxation time of 1 ps, for the final production run and throughout the dynamics. A RESPA integrator was used to calculate the bonding interactions with a time step of 2 fs. All other parameters followed the settings described by Shaw and co-workers [39]. After the final production run, the simulation trajectories of SARS-CoV-2 main protease complexed with withaferin-A derivative molecule 61 were analyzed for RMSD, RMSF, and ligand RMSF.
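For readers unfamiliar with the two trajectory metrics named above, the sketch below shows how RMSD and RMSF are defined for a set of atoms. It is a minimal illustration and not the Desmond analysis code: it assumes the frames have already been superimposed on the reference structure, and the tiny coordinate sets are invented for demonstration.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes `frame` is already aligned to `reference`; no fitting is done here.
    """
    if not reference or len(reference) != len(frame):
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, frame):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))


def rmsf(frames):
    """Per-atom root-mean-square fluctuation about the mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy coordinates in Å (illustrative only).
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(ref, moved))
print(rmsf([ref, moved]))
```

In a protein trajectory the same calculation is typically restricted to the Cα atoms, with RMSD reported per frame over time and RMSF reported per residue.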

Results and discussions

The safety and efficacy of drug candidates are necessary prerequisites for regulatory approval, and an initial ADMET profiling of the chosen reference drugs and phytocompounds provides the ADME properties that indicate the drug likeliness of the selected phytocompounds. A comparison of the ADMET profiles of the phytocompounds with those of the FDA-approved synthetic drugs is displayed in Table 1. The water solubility of a compound (logS) reflects the solubility of the molecule in water at 25 °C. The phytocompounds diacetylcurcumin (−6.225), withaferin-A (−5.063), and zingiberene (−5.967) displayed highest solubilities (log mol/L) over the reference set of drugs considered in this study. The intestinal absorptions of most of the phytocompounds (Table 1) are significantly lower than those of the synthetic drugs. The phytocompounds diacetylcurcumin, withaferin-A, zingiberene, and limonene show the highest intestinal absorption among the tested phytocompounds and are comparable to the reference set of drugs. The steady-state volume of distribution (VDss) is the theoretical volume in which the total dose of a drug would need to be uniformly distributed to give the same concentration as in blood plasma. VDss is considered low if below −0.15 and high if above 0.45 on a logarithmic scale (log L/kg). The tested phytocompounds (Table 1) show VDss values comparable to those of the synthetic drugs. Also, all the phytocompounds gave negative results for AMES toxicity, which indicated that the compounds are safe to be carried forward for further analysis.

Table 1 ADMET profiles of synthetic drugs and phytocompounds selected for the study

Molecular docking of phytocompounds and reference drugs

Molecular docking was employed to examine the interaction of Mpro with our chosen reference drugs (bictegravir, tegobuvir, baricitinib, remdesivir, nelfinavir, hydroxychloroquine, prulifloxacin, mefloquine, favipiravir, dexamethasone, chloroquine, methylprednisolone) and selected phytocompounds (hesperidin, baicalin, myricitrin, calceolarioside B, methyl rosmarinate, rutin, diosmin, apiin, diacetylcurcumin, withaferin-A, zingiberene, limonene). The results from molecular docking of the synthetic drugs and phytocompounds with the SARS-CoV2 target protein main protease (Mpro) revealed that the docking scores of most of the phytocompounds are comparable to those of the synthetic drugs (Table 2).

Table 2 Comparative chart depicting the binding energies and inhibitory concentrations of synthetic drugs and phytocompounds

However, withaferin-A showed the best docking score among all the phytocompounds and synthetic drugs that were screened (Table 2). Compounds with low binding energy and low inhibitory concentration are better suited to act as drug compounds owing to their better binding conformations. Thus, withaferin-A, hesperidin, and baicalin, with binding energies of −9.22 kcal/mol, −6.87 kcal/mol, and −6.68 kcal/mol, respectively, were selected for further analysis. The synthetic drug methylprednisolone, with a binding energy of −8.02 kcal/mol, was selected as a reference. The ligand molecules docked with the protease are shown in Fig. 2 (a), (c), (e), and (g), while Fig. 2 (b), (d), (f), and (h) highlight all the amino acids involved in ligand-receptor binding. Sharma and Deep previously reported withaferin as a potential phytocompound against Mpro, with a dock score of −8.9 kcal/mol [40]. Our result is consistent with that finding and yields a more favorable dock score than previously reported [40].
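The link between binding energy and inhibitory concentration can be made explicit with the standard thermodynamic relation Ki = exp(ΔG / RT). The sketch below applies it to the four binding energies quoted above. It assumes a temperature of 298.15 K and that the inhibitory concentrations in Table 2 follow this relation; the values it prints are illustrative estimates and are not copied from Table 2.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15          # assumed temperature


def inhibition_constant(delta_g_kcal_per_mol, temperature=TEMPERATURE_K):
    """Estimate Ki (in mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature))


# Binding energies quoted in the text (kcal/mol).
binding_energies = {
    "withaferin-A": -9.22,
    "methylprednisolone": -8.02,
    "hesperidin": -6.87,
    "baicalin": -6.68,
}

for name, delta_g in binding_energies.items():
    ki_micromolar = inhibition_constant(delta_g) * 1e6
    print(f"{name}: dG = {delta_g} kcal/mol, estimated Ki = {ki_micromolar:.3f} uM")
```

Because the relation is exponential, each additional 1.36 kcal/mol or so of binding energy at this temperature corresponds to roughly a tenfold lower estimated Ki, which is why a difference of about 1 kcal/mol between two dock scores is meaningful.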

Fig. 2 Molecular dock pose of a withaferin-A and Mpro complex, b withaferin-A-Mpro binding in 2D, c hesperidin and Mpro complex, d hesperidin-Mpro binding in 2D, e baicalin and Mpro complex, f baicalin-Mpro binding in 2D, g methylprednisolone and Mpro complex, h methylprednisolone-Mpro binding in 2D

The phytocompound withaferin-A binds strongly to the SARS-CoV2 Mpro, and the polar and nonpolar amino acid residues involved are GLN192, THR190, PRO168, LEU167, ALA191, GLN189, ARG188, GLU166, HIS164, MET49, MET165, HIS41, GLY143, CYS145, LEU27, THR25, and THR26, while CYS145, HIS163, ASN142, GLU166, HIS41, THR45, LEU141, PHE140, HIS172, SER144, HIS164, THR25, VAL42, CYS44, GLU47, ALA46, MET49, ASP187, ARG188, GLN189, and MET165 are involved in hesperidin::Mpro binding. Baicalin binding to SARS-CoV2 Mpro involves ARG88, GLN83, GLU178, TYR37, LYS102, TYR101, ASP33, THR98, PRO99, LYS100, and PHE103, while the reference drug methylprednisolone binds to the catalytic core of SARS-CoV2 Mpro through THR190, GLN192, GLU166, LEU141, HIS163, ASN142, GLY143, GLN189, ARG188, MET49, MET165, HIS41, HIS164, HIS172, CYS145, SER144, and PHE140. Sharma and Deep, however, previously reported that withaferin-A interacts with residues Thr 24, Thr 25, Cys 44, Ser 46, Met 49, His 41, Leu 141, His 164, Phe 140, Asn 142, and Glu 166 of the SARS-CoV2 Mpro.

Table 3 shows that withaferin-A, hesperidin, and baicalin, the phytocompounds with the best dock scores, involve amino acids in the catalytic core comparable to those involved in binding of the reference drug methylprednisolone. These 3 phytocompounds were therefore used to generate small molecule fragment derivatives.

Table 3 Binding energy and active site residues present in the catalytic core of Mpro(2gz9)-ligand complex

Molecular docking and QSAR analysis were performed to screen out the best potential hits against SARS-CoV2 main protease (Mpro) from the phytocompounds, taking FDA-approved synthetic drugs as reference. Prashanth and co-workers, 2020 [41] recently reported tenufolin as a highly effective phytocompound against SARS-CoV2, with a dock score of −8.8 kcal/mol. Our initial molecular docking studies identified withaferin-A as the best inhibitor of Mpro, with a dock score (−9.22 kcal/mol) that surpassed that of every synthetic drug taken as reference. Reports from Kumar and co-workers, 2020 [22, 42], revealed that withaferin-A and withanone could also bind to and stably interact with the catalytic site of TMPRSS2, which further supports the strong potential of withaferin compounds against SARS-CoV2. We also considered the phytocompounds hesperidin (−6.87 kcal/mol) and baicalin (−6.68 kcal/mol) for further analysis, owing to binding energies that were on average significantly better than those of the other chosen compounds.

Generation of small molecule derivatives

The deep learning and neural networking–based LigDream tool (https://www.playmolecule.org/LigDream/) within the PlayMolecule software generated 100 small molecule derivatives for each of the three phytocompounds. All the derivatives were subjected to absorption, distribution, metabolism, and excretion analysis using the SwissADME software. Only those compounds were chosen which had no violations of Lipinski’s rule of 5 and had a synthetic accessibility score between 1 and 4. This screening step resulted in 8 derivative compounds from withaferin-A, 1 from hesperidin, and 21 from baicalin, giving a final set of 30 molecules out of the 300 derivatives generated from the 3 phytocompounds. All 30 derivatives thus obtained were again docked with the SARS-CoV2 main protease target. Tables S-2(A), S-2(B), and S-2(C) (supplementary material) show the best derivatives screened on the basis of ADME profiling from the 100 derivatives generated from each phytocompound, along with their binding energies and inhibitory concentrations obtained by molecular docking with Mpro.

The best small molecule fragment derivatives obtained from the 3 phytocompounds were re-docked with the main protease of SARS-CoV2 to identify the best small molecule inhibitors of Mpro on the basis of the dock scores generated. Two derivatives of withaferin-A, 1 derivative of hesperidin, and 4 derivatives of baicalin were considered the best hits for QSAR analysis.

QSAR analysis

Dataset selection and descriptor calculation

In the present work, the dataset selection and descriptor calculation for the QSAR analysis involved the collection of a set of 18 molecules (Table S-3, supplementary material) reported as inhibitors of the main protease of the SARS-CoV2 virus from the ChEMBL database. The dataset comprises varied classes of compounds, where the experimental activity of each compound is expressed as an IC50 (nM) value. For model development, we converted the IC50 values to pIC50 values (pIC50 = −log IC50). All the compounds were drawn using MarvinSketch, followed by cleaning of the molecules. The descriptors were computed using MarvinSketch.
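The IC50 to pIC50 conversion can be sketched as follows. One assumption is made explicit here: the IC50 value in nM is first converted to mol/L before taking the base-10 logarithm, which is the common convention; the text gives only pIC50 = −log IC50 and does not state the unit used inside the logarithm. The function names are illustrative.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50, assuming the logarithm is taken in mol/L."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50):
    """Inverse conversion: pIC50 back to an IC50 in nM."""
    return 10 ** (-pic50) * 1e9


# Illustrative values only: 1000 nM (1 uM) corresponds to a pIC50 of about 6.
print(round(pic50_from_nm(1000.0), 3))
print(round(ic50_nm_from_pic50(6.0), 3))
```

Working on the pIC50 scale makes larger values mean higher potency and compresses activities that span several orders of magnitude, which suits a linear regression model.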

Analysis and prediction of IC50 of phytocompounds

The QSAR analysis (Fig. 3) with the descriptors molecular weight (MW), LOGP, refractivity, polar surface area (PSA), polarizability, and molecular surface area (MSA) yielded Rsq = 67.25%, adjusted Rsq = 49.39%, F statistic = 3.76, and critical F = 2.70. Among all the descriptors, polarizability showed a negative correlation of −0.02 with the activity. It is also important to mention that LOGP contributed most to the activity, with a percentage contribution of 44%. The predicted IC50 values of the 10 test molecules, comprising the 7 best docked derivative molecules and their 3 parent phytocompounds, are given in Table 4.
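The reported regression statistics can be cross-checked against one another using the textbook formulas for adjusted R² and the overall F statistic of a multiple linear regression. The sketch below assumes n = 18 training molecules and p = 6 descriptors, as stated in the methods; whether EasyQSAR computes these quantities in exactly this way is an assumption.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / dof


def f_statistic(r_squared, n_samples, n_predictors):
    """Overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    dof = n_samples - n_predictors - 1
    return (r_squared / n_predictors) / ((1.0 - r_squared) / dof)


R_SQUARED = 0.6725  # Rsq reported in the text
N_SAMPLES = 18      # training molecules
N_PREDICTORS = 6    # descriptors

print(f"adjusted R^2 = {adjusted_r_squared(R_SQUARED, N_SAMPLES, N_PREDICTORS):.4f}")
print(f"F statistic  = {f_statistic(R_SQUARED, N_SAMPLES, N_PREDICTORS):.2f}")
```

Under these assumptions the formulas reproduce the reported adjusted Rsq and F statistic to the stated precision, and the F statistic exceeds the reported critical F of 2.70. The gap between Rsq and adjusted Rsq reflects the penalty for fitting six descriptors to only eighteen molecules.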

Fig. 3 QSAR activity plot and governing equation for the derivative molecules and phytocompounds

Table 4 Predicted IC50 and corresponding values of descriptors obtained through QSAR analysis

Molecular docking analysis of withaferin-A derivatives with Mpro

The molecular docking studies of SARS-CoV2 main protease (Mpro) with withaferin-A derivative molecules 61 and 64 (Fig. 4a–d) having the lowest IC50 values as obtained by the QSAR study generated significant binding free energies of − 7.84 kcal/mol and − 7.94 kcal/mol, respectively (Table 5). The docking study reveals that the withaferin-A mol 61 forms 5 polar H-bonds with the Mpro where amino acids involved are GLU166, THR190, CYS145, MET165, and GLN152 (Table 5), and the withaferin-A mol 64 forms 4 polar H-bonds with Mpro involving amino acids THR190, GLN192, CYS145, and GLU166 (Table 5), while Table 3 shows that the parent molecule withaferin-A forms only 2 polar H-bonds with Mpro with amino acids GLN192 and THR190 involved. Thus, the withaferin-A derivatives are predicted to exhibit better binding with the target protease of SARS-CoV2 over the parent withaferin-A molecule.

Fig. 4 Molecular dock pose of a withaferin-A mol 61 and Mpro complex, b withaferin-A mol 61::Mpro binding in 2D, c withaferin-A mol 64 and Mpro complex, d withaferin-A mol 64::Mpro binding in 2D, e pharmacophore model of withaferin-A derivative molecule 61 showing pharmacophore interaction at the binding site of main protease (Mpro), f pharmacophore model of withaferin-A derivative molecule 64 showing pharmacophore interaction at the binding site of the main protease (Mpro)

Table 5 Binding energy and active site residues present in the catalytic core of Mpro(2gz9)-withaferin-A derivative molecules 61 and 64

Our study identified the 2 withaferin derivatives as the top hits among all the derivatives owing to their relatively low predicted IC50 values. Furthermore, a comparative analysis of the ligand-receptor interactions of the derivatives and the parent withaferin-A molecule with Mpro revealed 4 to 5 polar hydrogen bonds for the derivatives in the catalytic region of Mpro, while only 2 polar H-bonds were involved in the interaction of the parent withaferin-A molecule with Mpro.

In silico cytotoxicity prediction and analyses

Withaferin-A fragment derivative molecules 61 and 64, which showed the lowest predicted IC50 values, were selected for prediction of their biological activity spectrum by PASS, which gives the probability of activity and inactivity against tumor and non-tumor cells out of a maximum probability score of 1. In the PASS filter, significant anticarcinogenic activity against osteosarcoma was displayed by both withaferin-A derivative molecules 61 and 64, with activity coefficients of 0.414 and 0.412, respectively (Table 6). In addition, these derivatives are predicted to have a significant role in inhibiting the growth of major carcinoma cell lines such as colon adenocarcinoma, gastric carcinoma, stomach carcinoma, prostate carcinoma, ovarian adenocarcinoma, thyroid gland undifferentiated (anaplastic) carcinoma, T-lymphoid carcinoma, and osteosarcoma. These results indicate a strong potential for anticarcinogenic activity. Table 7 also shows that the fragment derivatives maintained the growth of embryonic lung fibroblasts (0.254, 0.226), embryonic kidney fibroblasts (0.208), umbilical vein endothelial cells (0.163, 0.413), lymphocytes (0.092, 0.085), and fibroblasts (0.135). This bioactivity analysis supports a role for withaferin derivative molecules 61 and 64 in maintaining human health against tumor generation and inflammation.

Table 6 PASS prediction coefficient with tumor cell lines based on the best phytocompound derivatives with lowest IC50 obtained after QSAR analysis
Table 7 PASS prediction coefficient with non-tumor cell lines based on the best phytocompound derivatives with lowest IC50 obtained after QSAR analysis

Generation of pharmacophore models

Pharmacophore modeling based on the principle of Lipinski’s rule of five gave a more accurate picture of the interaction of the withaferin-A derivative ligands with the binding site of the SARS-CoV2 main protease, helping to predict drug likeliness. The pharmacophore model of derivative molecule 61 showed four hydrogen bond acceptors and three hydrophobic interactions, which are a major parameter of drug likeliness (Fig. 4e). Moreover, the Pharmer webserver identified 6 new hits for derivative molecule 61 from its hydrophobic, hydrogen bond donor/acceptor, positive/negative ion, and other pharmacophore features; these are listed in Table S-4 (supplementary material). The pharmacophore model of derivative molecule 64 showed 4 hydrogen bond acceptors and 4 hydrophobic interactions (Fig. 4f). This model did not generate any further hit compound. Withaferin derivative molecule 61 is therefore considered more potent than derivative mol 64 and was carried forward for further investigation.

ADME toxicity prediction of withaferin-A fragment derivatives

In order to be an effective drug, a potent molecule must reach its target in the body in adequate concentration and remain there in a bioactive form long enough to cause the predicted biological events. Early in the drug discovery process, assessment of absorption, distribution, metabolism, and excretion (ADME) occurs at a stage when candidate compounds are abundant but access to physical samples is restricted. In the SwissADME tool, withaferin-A derivative molecule 61 showed a significant lipophilicity of 1.47 (Fig. S-5A, supplementary material). Typically, lipophilicity (XLOGP3) falls in a range between −0.7 and +5.0. The molecular weight of the molecule was 373.51 g/mol, where the acceptable range is between 150 and 500 g/mol. The molecule’s polarity, measured as topological polar surface area (TPSA), was found to be 89.13 Å², where the acceptable range is typically between 20 and 130 Å². The solubility was estimated at log S = −2.95, where a value higher than 6.0 is not recommended. Taken together, these data indicate that the molecule is soluble in water and moderately polar with some bond flexibility (measured at 8.0, where a value greater than 9.0 indicates too many rotatable bonds). Given these analyses, the compound is predicted to be orally bioavailable.

Other pharmacokinetic parameters were also assessed. Withaferin-A derivative molecule 61 showed high absorption in the GI tract (white oval shape, Fig. S-5B, supplementary material) and high blood-brain barrier (BBB) permeability (yellow oval shape, Fig. S-5B, supplementary material). Most drug candidates fail on these parameters, but both were satisfied by withaferin-A derivative molecule 61.

One of the major protective physiological mechanisms in humans and other higher eukaryotes is the efflux of drug molecules from the central nervous system (CNS), which poses a major concern for drug delivery systems. In this case, the derivative molecule is predicted to be retained within the CNS, with no sign of efflux (no blue spot within the yellow oval, Fig. S-5B, supplementary material). The molecule also displayed a reasonable skin permeation of log Kp = −7.53 cm/s. Hence, the molecule can be considered a potentially safe and effective fragment derivative of withaferin-A.

Molecular dynamics and simulation study

For MD simulation, systems were built for both the SARS-CoV2 main protease (Mpro) alone and the Mpro-withaferin-A derivative molecule 61 bound complex, and both were analyzed for 50 ns. The RMSD plots generated from the MD production runs displayed the conformational change of the target protein, SARS-CoV2 Mpro (Fig. 5a), whereas Mpro bound to withaferin-A derivative molecule 61 was more stable than SARS-CoV2 Mpro alone (Fig. 5b). For Mpro alone, initial changes in RMSD from 1.2 to 3.4 Å were observed over the first 18–20 ns, and large conformational changes were later observed at 65–78 ns before the final stabilized conformation was reached (Fig. 5a). By contrast, the Mpro-withaferin-A derivative molecule 61 bound complex showed much better stability in the Cα-backbone conformation. The ligand conformation, aligned on the Cα backbone of the main protease, displayed a conformational difference of 1.6 Å until 60 ns owing to its diffuse pattern of movement. Later, a stabilized RMSD was observed from 60 to 100 ns owing to ligand accommodation in the binding cavity of Mpro (Fig. 5b). The final conformational variation of the bound complex from the beginning of the simulation to its end at 50 ns was 0.8 Å (Fig. 5b). This suggests that the ligand-bound state of the SARS-CoV2 main protease gained much more stability during the simulation than the unbound state. Moreover, the RMSF plots supported the evidence from the RMSD plots: the positional fluctuations of amino acids were clearly greater in the unbound state than in the withaferin-A derivative bound state (Fig. 5c and d). The diffusion of the ligand at the initial stage of the simulation indicated its entry into the SARS-CoV2 main protease, after which it stabilized owing to deep entry into the binding cavity and significant binding.

In addition, the ligand RMSF plot (Fig. 5e) showed the interaction position of withaferin-A derivative molecule 61 within SARS-CoV2 Mpro during the simulation. The protein-ligand complex is first aligned on the protein backbone, and then the ligand RMSF is measured on the ligand heavy atoms. The ligand RMSF reveals that atoms 1 to 12 show elevated fluctuations, whereas the fluctuation curve between the 12th and 13th atoms shows a sharp ramp, which suggests that at this point the ligand changes entropically and thereby fits the catalytic core of SARS-CoV-2 Mpro (Fig. 5e).
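For readers unfamiliar with the two quantities discussed above, the following minimal Python sketch shows how RMSD and RMSF are commonly defined. It is not the analysis pipeline used in this study: it assumes the frames have already been superimposed on the protein backbone, it takes RMSF about each atom's mean position (one common convention), and the function names and toy coordinates are purely illustrative.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes both frames are already aligned; no superposition is done here.
    """
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    total = sum(
        (a - b) ** 2
        for p, q in zip(frame, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(frame))


def rmsf(trajectory):
    """Per-atom RMSF about each atom's mean position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over the trajectory.
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        # Sum of squared deviations from that mean position.
        sq = sum(
            (f[i][k] - mean[k]) ** 2
            for f in trajectory
            for k in range(3)
        )
        result.append(math.sqrt(sq / n_frames))
    return result


# Toy two-atom, three-frame trajectory (made-up coordinates in angstroms).
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
rmsd_vs_first = [rmsd(frame, toy[0]) for frame in toy]  # one value per frame
per_atom_rmsf = rmsf(toy)                               # one value per atom
```

RMSD yields one value per frame and tracks overall drift from a reference structure, whereas RMSF yields one value per atom or residue and shows which parts of the structure are most mobile.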

Fig. 5

a SARS-CoV2 Mpro displaying the Cα backbone structural conformation, showing a large change from the beginning to the end of the simulation; b aligned Mpro::withaferin-A mol 61 bound complex displaying the Cα backbone and withaferin-A mol 61 conformation, showing smaller changes over time during the simulation. RMSF plots of c Mpro displaying large fluctuations at particular amino acids on the Cα backbone and d the more stable complex in the withaferin-A derivative mol 61 bound state; e ligand RMSF plot displaying the atomic positions throughout the simulation

Our docking results, QSAR studies, and MD simulations are consistent with the previously published literature. Our study shows that the binding energies of the ligand withaferin-A and its two best derivatives, mol 61 and mol 64, are considerably high (Table 5) and that their respective inhibition constant (Ki) values are low (Table S-2A, supplementary material). In the QSAR study, the predicted activity (IC50) was lowest for withaferin-A derivatives mol 61 and mol 64. Hence, we conclude that the predicted molecules have a high affinity for the target and that the ligands are expected to show activity even at low concentrations. Results of data mining–based predictions of some hit fragments from natural compounds were reported by Ghosh et al. (2020) [43], but our study is the first to report the effectiveness of small molecule fragment derivatives of withaferin-A against SARS-CoV2 Mpro by integrating DNN and machine learning–based tools to screen for derivatives against SARS-CoV2 that still retain lead-like properties. The predictions in this report may open new possibilities for the use of small molecule inhibitor drugs to combat COVID-19.
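The link between a strong binding energy and a low inhibition constant follows from the standard thermodynamic relation ΔG = RT ln Ki. The Python sketch below illustrates that relation only; whether the Ki values in this study were derived in exactly this way is an assumption, and the temperature (298.15 K), the function name, and the two binding energies are illustrative placeholders rather than results from Table 5.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ki_from_binding_energy(delta_g_kcal, temperature_k=298.15):
    """Estimate Ki (in mol/L) from a binding free energy in kcal/mol.

    Uses delta_G = R * T * ln(Ki), i.e. Ki = exp(delta_G / (R * T)).
    The default temperature is an assumption, not a value from the study.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Hypothetical binding energies (kcal/mol), for illustration only.
weaker = ki_from_binding_energy(-8.0)
stronger = ki_from_binding_energy(-9.0)
print(f"Ki at -8.0 kcal/mol: {weaker * 1e6:.2f} micromolar")
print(f"Ki at -9.0 kcal/mol: {stronger * 1e6:.2f} micromolar")
```

Because Ki depends exponentially on the binding energy, each additional kcal/mol of favorable binding lowers the estimated Ki roughly fivefold at room temperature.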

Conclusion

The COVID-19 outbreak, caused by the highly pathogenic SARS-CoV-2 coronavirus, has posed a major threat to public health and needs urgent intervention. In the current global crisis, successful diagnostics and novel treatments need to be produced at a reasonable price and with limited or no side effects. Over the last 30 years, structural bioinformatics and cheminformatics have emerged as effective drug discovery techniques. In this regard, 3D target protein structures have played important roles in the design and development of novel or alternative drugs. Medicinal plants are considered a significant source of treatments for various diseases. In the current study, the antiviral potential of some phytoconstituents and their small molecule derivatives was studied. The results of molecular docking, QSAR analysis, and MD simulations suggest that withaferin-A and its associated fragment derivatives may act as inhibitors of the Mpro protease of SARS-CoV-2. Withaferin-A, a bioactive withanolide from Ashwagandha, has been shown to possess inhibitory activity against HPV and a wide range of influenza viruses. Based on previous reports as well as the results presented here, we propose withaferin-A derivatives as efficient lead compounds for potential drugs to combat COVID-19. The six hit compounds generated from the ZINC database by the pharmacophore model of withaferin-A derivative molecule 61 might be screened for anti-CoV activity. Further experimental work on all of the compounds predicted in this study needs to be carried out in order to verify their drug-likeness in greater depth. The in silico strategy adopted here, integrating DNN and machine learning–based tools, might be used to explore the potential applications of several other medicinal phytocompounds, and also of available drugs, against COVID-19. Finally, a note of caution: before any outcome of an in silico study is used, rigorous in vitro and in vivo research is obligatory.