Introduction

Over the past decades, humans have faced major challenges from viral infections such as HIV, influenza, and herpes. The recent outbreak of SARS-CoV-2 has become a serious concern worldwide [1]. Its morbidity and mortality, largely related to respiratory illness, make SARS-CoV-2 a major and recurrent global public health concern. SARS-CoV-2 (family Coronaviridae) is an enveloped virus with a positive-sense, single-stranded RNA genome of 30 kb that encodes more than 20 proteins. These proteins can be grouped into structural and non-structural proteins. Among the many reported drug targets, the 3C-like protease (3CLpro), also called the main protease (Mpro), is considered important [2] because it cleaves the polyproteins pp1a and pp1ab to create functional proteins. An RNA-dependent RNA polymerase, a helicase, a single-stranded RNA-binding protein, an exoribonuclease, an endoribonuclease, and a 2′-O-ribose methyltransferase are among the 11 proteins released by 3CLpro cleavage [3]. 3CLpro is therefore required for coronavirus replication, and because it is not found in host cells, it is a suitable target for antiviral medicines [4]. Moreover, the 3C-like protease is considered a key enzyme for the survival and growth of the virus [5].

According to the WHO, more than 50 COVID-19 vaccine candidates are currently in trials. However, they can pose some safety risks, and their reported efficacies vary: 95% for the mRNA vaccine BNT162b2 (Pfizer), 70.4% for ChAdOx1 nCoV-19/AZD1222 (AstraZeneca), 78% for Sinovac, 94.1% for mRNA-1273 (Moderna), and 81% for Covaxin (Bharat Biotech) [6]. In India, two vaccines, Covishield and Covaxin, have been approved for the national immunization program, with a reported efficacy of 81%. Despite all this, treating SARS-CoV-2 infection remains challenging because of emerging mutations and unexplained complications in many patients. Moreover, because of continuing mutations, SARS-CoV-2 is developing new strains that are more dangerous than earlier ones. New drugs are therefore needed to meet future challenges.

Thus, in the current pandemic, novel antiviral drugs are much needed to provide successful treatment. Because synthesizing new drugs is very challenging, we screened drugs available in the DrugBank database against the SARS-CoV-2 enzyme 3CLpro to generate novel, testable hypotheses for systematic drug repurposing [7].

We anticipate that the results of this research may be helpful in the discovery of novel drug candidates against SARS-CoV-2.

Material and methods

Sequence alignment and basic local alignment search tool (BLAST)

Sequence alignment was performed to determine whether an inhibitor dataset for the 3C-like protease of SARS-CoV-1 is suitable for screening inhibitors against the 3C-like protease of SARS-CoV-2. The alignment was carried out with the BLASTp tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) using the 3C-like protease sequences of SARS-CoV-1 and SARS-CoV-2. Both protein sequences were downloaded in FASTA format from the Protein Data Bank (PDB-ID 2GZ7 for SARS-CoV-1 and PDB-ID 6W63 for SARS-CoV-2) and submitted to BLAST.
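As a minimal illustration of how a percent identity of the kind reported by BLASTp is obtained, the sketch below counts identical positions over the aligned length. The two short sequences are hypothetical placeholders, not the 2GZ7 or 6W63 sequences, and the function name is illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, '-' marking gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the residues agree and it is not a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-residue fragments that differ at one position.
seq_a = "SGFRKMAFPS"
seq_b = "SGFRKMAFPT"
identity = percent_identity(seq_a, seq_b)
```

Here nine of the ten aligned positions agree, so the function returns 90.0.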

Predictive modeling by deep learning

In this study, a deep learning algorithm was used to develop a predictive model for screening novel compounds against COVID-19. The model was built with a deep learning online server (http://deepscreening.xielab.net) [8]. The CHEMBL3927 dataset, which contains IC50 values for inhibition of the SARS coronavirus 3C-like protease, was used to build the model. No datasets against 3CLpro of SARS-CoV-2 were available, so we used CHEMBL3927, a set of inhibitors of SARS-CoV-1 3CLpro. The dataset was preprocessed for molecular vectorization with the PubChem fingerprint, which comprises 881 fingerprint bits generated using PaDEL software [9]. A deep recurrent neural network (RNN) was used to construct a regression model from the PubChem fingerprints. Various models were developed by manually optimizing hyperparameters such as learning rate, number of epochs, batch size, number of neurons, and number of hidden layers. The ReLU activation function (y = max(0, x)) was used for the hidden layers, and the sigmoid function was used for the output layer.
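The two activation functions named above can be sketched in a few lines. This is only an illustration of the functions themselves, not of the server's implementation; the function names and input values are chosen here for demonstration.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs through and returns 0 otherwise."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real input to a value between 0 and 1."""
    # Two branches so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


hidden = [relu(v) for v in (-2.0, 0.0, 3.5)]  # hypothetical pre-activations
output = sigmoid(0.0)
```

With these inputs, `hidden` is `[0.0, 0.0, 3.5]` and `output` is `0.5`.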

Model evaluation

The deep learning models were validated using several statistical metrics. Because the models were built as regression models, we used R squared (R2), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) to evaluate model performance.

$$ {\text{MSE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2} $$
$$ {\text{RMSE}} = \sqrt{{\text{MSE}}} = \sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}} $$
$$ {\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} \left| y_{i} - \hat{y}_{i} \right| $$
$$ R^{2} = 1 - \frac{\sum\limits_{i = 1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum\limits_{i = 1}^{N} \left( y_{i} - \overline{y} \right)^{2}} $$

where \(y_{i}\) = observed value, \(\hat{y}_{i}\) = predicted value, \(\overline{y}\) = mean of the observed values, and \(N\) = number of samples.
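The four metrics can be computed directly from paired observed and predicted values. The following is a minimal sketch using only the standard library; the data are hypothetical and the function name is illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, MAE and R^2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical values, for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
metrics = regression_metrics(observed, predicted)
```

For these values every residual has magnitude 0.5, which gives MSE = 0.25, RMSE = 0.5, MAE = 0.5, and R2 = 0.8.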

Protein preparation

The crystal structure of 3CLpro (PDB-ID 6W63), bound to X77, a potent non-covalent inhibitor of SARS-CoV-2 [10], was obtained from the Protein Data Bank (https://www.rcsb.org). Using PyMOL software, all water molecules, ions, and ligands were removed from the protein structure, and hydrogen atoms were then added to the protein using MGL Tools [11]. The reference molecule N-(4-tert-butylphenyl)-N-[(1R)-2-(cyclohexylamino)-2-oxo-1-(pyridin-3-yl)ethyl]-1H-imidazole-4-carboxamide (X77) (PubChem ID 145998279) was downloaded from the PubChem server.

Molecular docking and visualization

Molecular docking was performed with AutoDock Vina through the open-source PyRx software (GUI version 0.8) [11] to obtain a population of possible orientations and binding energies of the compounds at the active site of the protein. Docking was first carried out with the reference molecule to verify the docking procedure, using a grid box centered at X = −23.05, Y = 13.32, and Z = −29.93 with dimensions of 25 × 25 × 25 Å against 6W63. Virtual screening of the ligand molecules against the protein was then carried out, and the docking results were extracted. For further study, the best conformations of the compounds with a lower binding energy than the reference molecule were selected. Finally, LigPlot+ v.1.4.5 was used to examine the molecular interactions in the protein–ligand complexes, including hydrogen bonds and bond lengths.
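To make the search space concrete, the sketch below converts the reported box center and dimensions into the coordinates of the box corners and checks whether a point falls inside. The helper names are illustrative and are not part of PyRx or AutoDock Vina.

```python
def grid_box_bounds(center, size):
    """Return (min_corner, max_corner) of an axis-aligned box from its center and edge lengths."""
    min_corner = tuple(c - s / 2.0 for c, s in zip(center, size))
    max_corner = tuple(c + s / 2.0 for c, s in zip(center, size))
    return min_corner, max_corner


def inside_box(point, lower, upper):
    """True if a 3D point lies within the box (boundaries included)."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, lower, upper))


# Center and dimensions as reported in the text (in Å).
center = (-23.05, 13.32, -29.93)
size = (25.0, 25.0, 25.0)
lower, upper = grid_box_bounds(center, size)
center_is_inside = inside_box(center, lower, upper)
```

Each edge extends 12.5 Å on either side of the center, so the box spans roughly −35.55 to −10.55 Å in X, 0.82 to 25.82 Å in Y, and −42.43 to −17.43 Å in Z.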

Molecular dynamics simulations

After the screening, the complexes obtained from molecular docking were subjected to MD simulations using the GROMACS 5.0.7 package [12]. Topologies for the protein and protein–ligand complexes were produced using the CHARMM36 force field [13]. After the topology files were created, all complexes and the free protein were solvated in a water model and neutralized by adding ions. The structures were then relaxed by energy minimization using the steepest descent algorithm and the Verlet cut-off scheme, run for 50,000 cycles at 10 kJ/mol. Equilibration of the protein–ligand complexes was performed under NVT (constant volume) and NPT (constant pressure) conditions for a 100 ps trajectory period. After equilibration, the production simulations were run at 300 K and 1 atm with a 2 fs time step for 100 ns. The resulting trajectory files were used to examine the deviation of each protein and complex in order to determine the stability of the system in a water environment. To analyze the protein–ligand complexes, root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), hydrogen bonds, solvent accessible surface area (SASA), and principal component analysis (PCA) were used. We also calculated the interaction energy between protein and ligand to estimate the strength of their association. Furthermore, the Molecular Mechanics Poisson–Boltzmann Surface Area (MM-PBSA) method, using the g_mmpbsa package with GROMACS 5.0.7, was used to calculate the total binding free energy, the solvation free energy (polar + non-polar solvation energies), and the potential energy (electrostatic + Van der Waals interactions) of each protein–ligand complex over the last 30 ns [14].
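As a worked example of the time scales involved, the sketch below converts the stated durations into numbers of integration steps. The 2 fs time step is given in the text for the production run; applying the same step to the equilibration phase is an assumption made here for illustration.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def n_steps(duration_fs: int, timestep_fs: int) -> int:
    """Number of integration steps needed to cover a duration at a fixed time step."""
    if duration_fs % timestep_fs != 0:
        raise ValueError("duration is not a whole multiple of the time step")
    return duration_fs // timestep_fs


timestep_fs = 2  # 2 fs, as stated for the production run

production_steps = n_steps(100 * FS_PER_NS, timestep_fs)     # 100 ns production run
equilibration_steps = n_steps(100 * FS_PER_PS, timestep_fs)  # 100 ps; 2 fs step is assumed
mmpbsa_window_steps = n_steps(30 * FS_PER_NS, timestep_fs)   # last 30 ns used for MM-PBSA
```

Under these assumptions the production run corresponds to 50,000,000 steps, a 100 ps equilibration phase to 50,000 steps, and the 30 ns MM-PBSA window to 15,000,000 steps.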

Functional group analysis

Functional group frequencies of all compounds were calculated in R (version 3.4.3) using the “ChemmineR” library [15]. Nine functional groups, namely ester (RCOOR), carbonyl (RCOR), nitrile (RCN), primary amine (RNH2), carboxyl (RCOOH), hydroxyl (ROH), ether (ROR), secondary amine (R2NH), and tertiary amine (R3N), together with aromatic groups and rings, were analyzed in the hit compounds and compared with the reference compound.

Results

BLAST results

With 100% query coverage, BLAST showed that 3CLpro of SARS-CoV-1 has 96.08% identity to 3CLpro of SARS-CoV-2 (Fig. S1). This level of identity suggests sufficient functional similarity between the two enzymes. Therefore, the dataset of SARS-CoV-1 3CLpro inhibitors can be used to prescreen a broad dataset of drugs for repurposing against 3CLpro of SARS-CoV-2.

Predictive modeling and virtual screening

Because of the high similarity between the 3CLpro enzymes of SARS-CoV-1 and SARS-CoV-2 and the lack of a dataset against SARS-CoV-2, we used the SARS-CoV-1 dataset for deep learning prescreening of the large DrugBank library. We created ten models with different, manually optimized hyperparameters and evaluated them using the statistical parameters (Table S1). The best model (number 4) had a learning rate of 0.01, 80 epochs, a batch size of 16, three hidden layers with 2000, 700, and 200 neurons, the ReLU activation function, a dropout of 0, and a sigmoid output function. Compared with the other models, it displayed a reasonable range of statistical parameters, with a loss of 0.26, R2 of 0.72, MSE of 0.26, RMSE of 0.51, and MAE of 0.41 (Fig. S2). This model was then used for virtual screening of the DrugBank compounds. The screening yielded 500 compounds, which were then subjected to molecular docking.
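The reported settings of the best model can be collected in a small configuration, and the reported MSE and RMSE can be checked against each other, since RMSE is the square root of MSE. The dictionary keys below are illustrative names, not the server's parameter names.

```python
import math

# Hyperparameters of model 4 as reported in the text; key names are illustrative.
best_model = {
    "learning_rate": 0.01,
    "epochs": 80,
    "batch_size": 16,
    "hidden_layers": [2000, 700, 200],
    "hidden_activation": "relu",
    "dropout": 0.0,
    "output_activation": "sigmoid",
}

# Performance values as reported in the text.
reported = {"MSE": 0.26, "RMSE": 0.51, "MAE": 0.41, "R2": 0.72}

# RMSE is the square root of MSE, so the two values should agree after rounding.
rmse_from_mse = round(math.sqrt(reported["MSE"]), 2)
consistent = rmse_from_mse == reported["RMSE"]
```

The square root of 0.26 rounds to 0.51, so the two reported values are mutually consistent.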

Molecular docking and visualization

To verify the docking protocol, the reference molecule (X77) was re-docked with the protein. The re-docked X77 superimposed closely on the co-crystallized reference molecule (Fig. 1), with an RMSD of 0.62. X77 had a binding energy of −8.4 kcal mol−1 and interacted with His163, Gly143, Glu166, and Cys145. Ten compounds had a lower binding energy than the reference compound (Fig. 2, Table 1). {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid formed hydrogen bonds with Gln192, Thr190, Arg188, and Met165 (−10.4 kcal mol−1). Ergotamine interacted with Gln166 and Gln189 (−9.7 kcal mol−1). PF-03882845 formed a hydrogen bond with Phe140 (−9.7 kcal mol−1). Bromocriptine interacted with Thr190 (−9.4 kcal mol−1). 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea formed a hydrogen bond with His164 (−9.4 kcal mol−1). Omipalisib interacted with Tyr54, Glu166, and Thr24 (−9.3 kcal mol−1). (1s)-1-(1h-indol-3-ylmethyl)-2-(2-pyridin-4-yl-[1,7]naphtyridin-5-yloxy)-ehylamine formed hydrogen bonds with Gln166, Phe140, and His41 (−9.3 kcal mol−1). 2-(2f-Benzothiazolyl)-5-Styryl-3-(4f-Phthalhydrazidyl)Tetrazolium Chloride formed hydrogen bonds with Ser144 and Cys145 (−9.3 kcal mol−1). MK-7622 formed hydrogen bonds with Glu166, Ser144, His163, and Cys145 (−9.3 kcal mol−1). SGX-523 interacted with Phe140 (−9.3 kcal mol−1).
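The selection rule used here (keep compounds whose binding energy is lower than that of the reference, then rank them) can be sketched as follows. The energies are those reported in the text; the four long systematic names are replaced by shorthand labels that are introduced here only for readability and do not appear in the source.

```python
REFERENCE_ENERGY = -8.4  # X77, kcal/mol, from the re-docking result

# Binding energies (kcal/mol) as reported in the text.
# Labels ending in "derivative" are shorthand for the long systematic names.
binding_energy = {
    "phosphonic acid derivative": -10.4,
    "Ergotamine": -9.7,
    "PF-03882845": -9.7,
    "Bromocriptine": -9.4,
    "indenopyrazole urea derivative": -9.4,
    "Omipalisib": -9.3,
    "naphthyridine ethylamine derivative": -9.3,
    "tetrazolium chloride derivative": -9.3,
    "MK-7622": -9.3,
    "SGX-523": -9.3,
}

# Keep compounds that bind more favorably (more negative) than the reference,
# ordered from lowest to highest energy; ties keep their listed order.
hits = [
    name
    for name, energy in sorted(binding_energy.items(), key=lambda kv: kv[1])
    if energy < REFERENCE_ENERGY
]
top_five = hits[:5]
```

All ten compounds pass the threshold, and the first five entries of the ranking are the compounds that were carried forward to MD simulation.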

Fig. 1

(a) 3D structure of the protein–reference complex. (b) 3D superimposition of the docked reference molecule (X77) on its X-ray crystal structure. (c) 2D superimposition of the docked reference molecule (X77) on its X-ray crystal structure

Fig. 2

Hydrogen bonds and hydrophobic interactions in the protein–ligand complexes obtained from virtual docking. Green indicates hydrogen bonds and red indicates hydrophobic interactions with amino acids of 3CLpro

Table 1 Drug compounds name and their binding energy against 3CLpro

Molecular dynamics simulation (MDS)

In this study, we conducted MDS to evaluate the stability of the 3CLpro–ligand complexes and to gain deeper insight into the conformational and structural changes of the top-ranking lead compounds, as a final filter for selecting hit compounds. The top five compounds, {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, Ergotamine, PF-03882845, Bromocriptine, and 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, which showed better binding energy than the reference, were subjected to MD simulation. Two of them, {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, showed good stability with 3CLpro over the 100 ns simulation in terms of RMSD, RMSF, SASA, Rg, and PCA.

Root mean square deviation

The root mean square deviation (RMSD) of all protein–ligand complexes was calculated to analyze the deviation of the compounds over the 100 ns trajectory. The RMSD plots of all protein–ligand complexes (3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea) indicated that they were stable (Fig. 3a). The average RMSD was 0.181 nm (green) and 0.189 nm (blue) for the two hit complexes, respectively, compared with 0.17 nm (red) for the reference (Table 2). The RMSD analysis thus showed that the protein and complexes reached good stability within 100 ns and produced stable trajectories for further investigation. Rg, RMSF, SASA, hydrogen bonds, interaction energy, and principal component analysis were also evaluated over the 100 ns trajectory.
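For reference, the quantity being averaged can be written as a short function. This sketch assumes the two coordinate sets are already superimposed and uses unweighted atoms; it is not the GROMACS implementation, which typically performs a least-squares fit to a reference structure first. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom example (nm): every atom shifted by 0.1 nm along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0)]
value = rmsd(reference, frame)
```

Because every atom is displaced by the same 0.1 nm, the RMSD of this toy frame is 0.1 nm.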

Table 2 Average values of RMSD, Rg, SASA, and interaction energy of the protein and protein–ligand complexes, and number of hydrogen bonds in the protein–ligand complexes
Fig. 3

Binding stability analysis of the screened ligands during 100 ns molecular dynamics simulation (3CLpro-reference (red), 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid (green) and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (blue)). a Root mean square deviation (RMSD), b root mean square fluctuation (RMSF), c solvent accessible surface area (SASA), and d radius of gyration (Rg)

Root mean square fluctuation (RMSF)

The local changes of compounds, as well as the protein chain residues, were analyzed using the Root Mean Square Fluctuation (RMSF) measurement at a particular temperature and pressure.

During the 100 ns trajectory, there were very few variations in the constituent residues of 3CLpro and of all the protein–ligand complexes (3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea), which were plotted to compare the flexibility of each residue in the protein and the complexes. Figure 3b shows that the fluctuations of all complexes were below 0.2 nm, except that 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea showed fluctuations of more than 0.25 nm at residues 45, 46, 277, and 278. All complexes also showed fluctuation at the starting point, but these residues are not involved in hydrogen bonds, as shown in the LigPlot+ diagrams, and can therefore be neglected. In conclusion, the residue fluctuations of the hit complexes were similar to those of the reference, indicating low fluctuation and good stability.
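The per-residue fluctuation can be illustrated for a single atom as the root mean square deviation of its position from its time average. This is a minimal sketch with hypothetical positions and no trajectory fitting; the 0.25 nm cutoff is the value mentioned above for the most mobile residues.

```python
import math


def rmsf(positions_over_time):
    """RMSF of one atom: root mean square deviation from its time-averaged position."""
    n = len(positions_over_time)
    if n == 0:
        raise ValueError("at least one frame is required")
    # Time-averaged position of the atom, one component at a time.
    mean = [sum(p[k] for p in positions_over_time) / n for k in range(3)]
    msf = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions_over_time
    ) / n
    return math.sqrt(msf)


# Hypothetical positions (nm) of one C-alpha atom in four frames.
frames = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
fluctuation = rmsf(frames)
highly_mobile = fluctuation > 0.25  # cutoff taken from the text
```

The atom oscillates 0.1 nm around its mean position, so its RMSF is 0.1 nm and it would not be flagged as highly mobile.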

Transformation in the accessibility of solvent

The solvent accessible surface area (SASA) was calculated to measure the proportion of the protein surface accessible to the water solvent during MDS. SASA can indicate the extent of the conformational changes that occur during the simulation [16]. Figure 3c shows the plot of SASA vs. time for all the protein–ligand complexes (3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea). The average SASA of the hit complexes was 148.40 nm2 (green) and 152.01 nm2 (blue), compared with 149.2 nm2 (red) for the reference, over the 100 ns trajectory (Table 2). All complexes showed SASA values very similar to that of the reference 3CLpro complex. From the SASA analysis, we conclude that the 3CLpro–ligand complexes are relatively stable.

Radius of gyration

The radius of gyration (Rg) was analyzed to assess the stability of the protein–ligand systems by calculating their structural compactness along the MD trajectories [17]. Rg also indicates whether the protein and its complexes remain stably folded or unfold. The 100 ns trajectories were used for this analysis. The graph of Rg as a function of time for the protein and all protein–ligand complexes (3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea) is shown in Fig. 3d. The average Rg was 1.88 nm (green) and 1.88 nm (blue) for the hit complexes, the same as the 1.88 nm (red) of the reference (Table 2). All complexes therefore have similar and consistent Rg values compared with the native protein and the reference, which indicates that they superimpose well on each other and have good stability.
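The compactness measure can be sketched as a mass-weighted root mean square distance of the atoms from their center of mass. The coordinates and masses below are hypothetical, and the function is an illustration rather than the GROMACS routine.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total_mass = sum(masses)
    # Center of mass, one Cartesian component at a time.
    com = [
        sum(m * c[k] for c, m in zip(coords, masses)) / total_mass for k in range(3)
    ]
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3)) for c, m in zip(coords, masses)
    )
    return math.sqrt(weighted / total_mass)


# Hypothetical system: four equal masses at the corners of a 2 nm square.
coords = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
rg = radius_of_gyration(coords, masses)
```

Each corner lies at a distance of √2 nm from the center of mass, so the radius of gyration of this toy system is √2 ≈ 1.414 nm.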

Calculation of interaction energy

The interaction energy was calculated to estimate the interaction energies associated with the 3CLpro–ligand complexes using the Parrinello-Rahman parameter of GROMACS. The average interaction energy of all complexes was within the acceptable range of −99 to −200 kJ mol−1. The interaction energy was −137.521 kJ mol−1 for the reference complex 3CLpro-X77, −186.60 kJ mol−1 for 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and −131.07 kJ mol−1 for 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (Table 2). The first hit complex thus showed a more favorable interaction energy than the reference, while the second was comparable to it. The interaction energy results supported the molecular docking results and indicated that the screened drug compounds could bind to 3CLpro favorably and may be useful as drugs to treat COVID-19.

Analysis of hydrogen numbers

Hydrogen bonds are essential in ligand binding to receptors because they affect drug specificity, metabolization, and adsorption. Therefore, the total number of hydrogen bonds present in the complexes was estimated over the 100 ns simulation. About four hydrogen bonds were observed in the reference complex 3CLpro-X77, compared with five in 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and four in 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (Table 2). These results showed that all compounds were bound to 3CLpro as effectively and closely as the reference compound, X77.

Analysis of principal component in protein–ligand complexes

The projections onto the first (PC1) and second (PC2) eigenvectors were used to examine the Gibbs free energy landscape (Fig. 4a, b, c). The Gibbs free energy landscape describes the path of fluctuation for all Cα atoms of the 3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea complexes; the range of Gibbs energy values was 0–12.5, 0–12, and 0–12.5, respectively. Lower energy is shown by a deeper blue color on the accompanying free energy contour diagram. The free energy ranges of the hit complexes were nearly identical to that of the reference compound. These free energies indicate stable conformational states of these molecules with the protein.
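The text does not state how the landscape was computed. One common approach, sketched here as an assumption, is Boltzmann inversion of the population of frames in the PC1/PC2 plane, G = −RT ln(P/Pmax), so that the most populated bin has G = 0. The projections, bin width, and function name below are hypothetical; the temperature of 300 K is taken from the simulation settings.

```python
import math
from collections import Counter

GAS_CONSTANT = 0.0083144626  # kJ mol^-1 K^-1


def free_energy_landscape(pc1, pc2, bin_width, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to a relative free energy in kJ/mol.

    Uses Boltzmann inversion, G = -RT ln(P / P_max): the most populated
    bin gets G = 0 and less populated bins get higher values.
    """
    if len(pc1) != len(pc2) or not pc1:
        raise ValueError("projections must be non-empty and of equal length")
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    rt = GAS_CONSTANT * temperature
    return {b: -rt * math.log(n / max_count) for b, n in counts.items()}


# Hypothetical projections: three frames fall in one bin, one frame in another.
pc1 = [0.1, 0.2, 0.3, 1.5]
pc2 = [0.1, 0.1, 0.2, 1.5]
landscape = free_energy_landscape(pc1, pc2, bin_width=1.0)
```

In this toy case the sparsely visited bin lies RT ln 3 (about 2.74 kJ/mol at 300 K) above the most populated one.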

Fig. 4

Gibbs free energy landscape of compounds (a) 3CLpro-X77 (b) 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid (c) 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, and principal component analysis (d) plot of eigenvalues vs. first 40 eigenvectors, (e) projection on the first two eigenvectors describing the protein motion in phase space for all the complexes. The color code for all panels is protein-reference (red), 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid (green) and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (blue)

PCA was performed to calculate the first few eigenvectors, which are important for the overall motion of the protein during MD simulation of 3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea. For this study, 40 eigenvectors were selected for the calculation of concerted motions. The eigenvalues and corresponding eigenvectors for all protein–ligand complexes are presented in Fig. 4d. The first ten eigenvectors account for 74.52%, 69.01%, and 75.28% of the motions in the 100 ns simulation for 3CLpro-X77, 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, respectively. Further, a porcupine plot was generated using the extreme projections on principal component PC1 to visualize the movement directions captured by the eigenvectors (Fig. S3). In the porcupine plot, each Cα atom has a cone (arrow) pointing in the direction of its motion; the length of the cone reflects the amplitude of the motion, and the size of the cone indicates the number of such Cα atoms [18]. The top two eigenvectors are used for visualizing the motion of the backbone atoms [19], showing the direction and magnitude of the selected eigenvectors. There might, however, be some differences between the simulations concerning the motions: the result suggests that the motions in the three protein–ligand complexes were described differently by the first two principal components. The porcupine plot represents the rotational movements that occur in the protein–ligand complex during the simulation.
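The percentage of motion attributed to the leading eigenvectors is the sum of their eigenvalues divided by the sum of all eigenvalues. A minimal sketch of that calculation, with hypothetical eigenvalues that are not taken from the study:

```python
def cumulative_fraction(eigenvalues, k):
    """Fraction of the total variance captured by the k largest eigenvalues."""
    if not eigenvalues or any(v < 0 for v in eigenvalues):
        raise ValueError("eigenvalues must be a non-empty list of non-negative numbers")
    if not 1 <= k <= len(eigenvalues):
        raise ValueError("k must be between 1 and the number of eigenvalues")
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("total variance is zero")
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    return sum(ordered[:k]) / total


# Hypothetical eigenvalues (nm^2), for illustration only.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
share_top2 = cumulative_fraction(eigenvalues, 2)
```

In this example the two largest eigenvalues carry 6.0 of the total 8.0 nm2, that is, 75% of the variance.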

For the PCA, the first 40 eigenvectors of the MD trajectory were used to calculate concerted motions. 3CLpro-X77 (red), 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid (green), and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (blue) showed eigenvalues of 5.5 nm2, 2.4 nm2, and 5.4 nm2, respectively. The eigenvalues were obtained for all complexes by diagonalizing the covariance matrix of atomic fluctuations and are plotted in decreasing order against the corresponding eigenvector index (Fig. 4d).

Further, a 2D projection plot was generated to analyze the dynamics of the protein–ligand complexes via PCA, using the first two principal components (PC1 and PC2). Figure 4e displays the projection on these two eigenvectors for the reference complex 3CLpro-X77 (red) as well as the hit complexes 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid (green) and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea (blue). In the 2D projection plot, a complex that occupies less phase space represents a stable cluster, while a complex that occupies more space represents a non-stable cluster. From the plot, all complexes occupied a similar space to the reference. Hence, the complexes of all compounds are stable and suitable for further drug development.

Average binding energy calculation of protein–ligand complexes

The binding energy is a measure of the ligand's affinity for a receptor. It was calculated using the MM-PBSA method by adding the polar and non-polar solvation energies and the non-bonded interaction energies (Van der Waals and electrostatic interactions). The last 30 ns of the MD trajectories were used to calculate the binding free energies, which are shown in Table 3. 3CLpro-X77 (reference), 3CLpro-{4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid, and 3CLpro-1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea showed binding free energies of −50.699, −89.343, and −54.648 kJ mol−1, respectively. The hit compounds thus showed higher binding affinity than the reference molecule. Based on the MM-PBSA results, all compounds had a strong binding affinity for 3CLpro in terms of binding energy.
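The additive scheme described above can be written out explicitly. In the sketch below the component values are hypothetical (the actual components are listed in Table 3), the function name is illustrative, and "hit 1" and "hit 2" are shorthand labels for the two hit complexes in the order they are named in the text; only the three total energies are taken from the source.

```python
def binding_free_energy(vdw, elec, polar_solv, nonpolar_solv):
    """Sum MM-PBSA components (all in kJ/mol) into a total binding free energy."""
    potential = vdw + elec                   # non-bonded molecular-mechanics term
    solvation = polar_solv + nonpolar_solv   # polar + non-polar solvation energy
    return potential + solvation


# Hypothetical component values in kJ/mol, for illustration only.
total = binding_free_energy(vdw=-150.0, elec=-20.0, polar_solv=95.0, nonpolar_solv=-15.0)

# Total binding free energies reported in the text (kJ/mol).
reported = {"X77": -50.699, "hit 1": -89.343, "hit 2": -54.648}

# A more negative value means stronger predicted binding, so sort ascending.
ranked = sorted(reported, key=reported.get)
```

The hypothetical components sum to −90.0 kJ/mol, and ranking the reported totals places both hit complexes ahead of the reference X77.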

Table 3 Van der Waals (VdW), electrostatic (Elec.), polar solvation, SASA, and binding energies of the protein–ligand complexes

Functional group analysis of hits compounds

The frequency of key functional groups was analyzed for both hit compounds (Table S2). Among the nine groups, secondary amine (R2NH), tertiary amine (R3N), and carbonyl (RCOR) groups, along with rings and aromatic groups, were found in 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, whereas only rings and aromatic groups were found in {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid. The reference compound has four: secondary amine (R2NH), tertiary amine (R3N), rings, and aromatic groups. Rings and aromatic groups are thus present in both hit molecules and the reference molecule, but their numbers are higher in both hit compounds than in the reference molecule (Table S2).

Discussion

Drug repurposing is an approach for rapidly discovering potential drugs against a disease. Many studies have shown that drug repurposing is an effective strategy for finding useful drug candidates for a different disease. In the current situation, where the whole world is trying to find a treatment for COVID-19, drug repurposing may be an effective tool. Recent studies also suggest that some existing drugs may be used for the treatment of COVID-19. Clinical trials have reported that drugs such as Chloroquine, Lopinavir, and Ritonavir may be useful. Chloroquine is an anti-malarial drug, and a recent study found that it inhibits the growth of SARS-CoV-2 in vitro [15]. Another study, by Biot et al. (2006), showed that Hydroxychloroquine, an analog of Chloroquine, has in vitro antiviral activity against SARS-CoV [20].

Before this study, many researchers had identified candidate drugs against the 3CLpro receptor using a repurposing strategy. A recent study [21] found that Paritaprevir and Raltegravir have good binding energy against the 3CLpro receptor. In another study [22], Ritonavir showed a higher binding affinity to the 3CLpro receptor. Continuing drug repurposing with a different strategy, we also found some new candidate drugs against the 3CLpro receptor of SARS-CoV-2 in the DrugBank database, which contains 9001 drugs. The screening of DrugBank compounds started with a deep learning model. The deep learning models were built on data for the SARS-CoV-1 3CLpro receptor because the sequence alignment showed that the 3CLpro receptors of SARS-CoV-1 and SARS-CoV-2 share 96.08% identity, so that SARS-CoV-1 data can be used against SARS-CoV-2. The best model achieved a loss of 0.26, R2 of 0.72, MSE of 0.26, RMSE of 0.51, and MAE of 0.41. These metrics help assess whether a network is learning in the right direction: lower values of loss, MSE, RMSE, and MAE indicate a better model, as does an R2 value closer to 1. Among all our models, model number 4 showed the best performance and was selected for screening. Prescreening by deep learning resulted in 500 compounds, which were narrowed down to 10 drugs by molecular docking based on binding affinity against the 3CLpro receptor.

The binding energy of all screened compounds was better than that of the reference compound. These results suggest that these compounds can be used against the 3CLpro receptor. Currently, some of these drug compounds are used to treat other diseases, and some are at the experimental stage. Ergotamine is used as therapy to abort or prevent vascular headaches, e.g., migraine, migraine variants, hypertension [23, 24]. Bromocriptine is an approved drug used for the treatment of acromegaly, Parkinson's disease (PD), type 2 diabetes mellitus, idiopathic hyperprolactinemic disorder, and neuroleptic malignant syndrome (NMS). Omipalisib, PF-03882845, MK-7622, and SGX-523 are not approved, but various studies have shown that Omipalisib has been used against different types of cancer [25, 26], MK-7622 has been used to treat Alzheimer's disease [27, 28], and PF-03882845 has been used as an antagonist [29, 30]. Other compounds are at the experimental stage, but this study suggests that these compounds can be used against coronavirus.

In addition, two drugs were selected based on MD simulation stability and binding energy. These two drugs, namely {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, showed good stability over the 100 ns dynamics simulation trajectory, and the residues also showed low fluctuation in the RMSF results, which means that the compounds and the protein were strongly bound to each other at the specified temperature and pressure. The other calculations (Rg, SASA, interaction energy, and PCA) indicated that these two compounds are stable with the 3CLpro enzyme, supporting their reliability as inhibitors specific to 3CLpro. The Rg calculation showed little deviation, which indicates that the protein is compactly packed and that binding of the compounds did not affect the protein’s rigidity. In the present study, the Rg value remained relatively consistent throughout the MD simulation, which indicates that the protein is stably folded [31]. The SASA and hydrogen bond calculations also support stable interaction of the ligands with the protein. The interaction energy values of the protein–ligand complexes, which indicate the strength of the complexes, were also favorable. This study showed that all compounds have significantly better interaction energy with the protein than the reference compounds, within an acceptable range. Further, the MM-PBSA binding energy analysis also indicated that the compounds have better binding affinity to the 3CLpro enzyme.
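For readers unfamiliar with the Rg and RMSF measures discussed above, the standard definitions can be sketched as follows. This is a minimal illustration on hypothetical coordinates, not the analysis pipeline used in this study. The function names are assumptions, and the RMSF sketch assumes the trajectory frames have already been aligned to a common reference.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples; masses: list of atomic masses.
    """
    total_mass = sum(masses)
    # Centre of mass along each axis.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
                   for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)

def rmsf(frames):
    """Per-atom root mean square fluctuation over pre-aligned frames.

    frames: list of frames, each a list of (x, y, z) tuples.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical two-atom system (illustrative only).
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]))
print(rmsf([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]))
```

An Rg that stays nearly constant across frames corresponds to the compact, stably folded protein described above, and small per-residue RMSF values correspond to low fluctuation.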

Functional groups that support a drug molecule’s lipid solubility are known as hydrophobic or lipophilic functional groups, e.g., aromatic groups and rings. The present study showed that antiviral functional groups such as R2NH (secondary amine) are abundant in the hit compounds, followed by carbonyl groups (RCOR), tertiary amines (R3N), rings, and aromatic groups [32]. Amine groups are weak bases: they are easily ionized at the mildly acidic to alkaline pH of the intestine and in the blood, and most drugs contain these functional groups. These groups are adept at balancing ionized and non-ionized states. Non-ionized forms are able to pass across cell membranes, while ionized forms have high water solubility, allowing for intense protein–ligand interactions [33]. On the other hand, secondary amines have N–H groups, which serve as hydrogen bond donors that allow the compound to bind strongly to the target protein [34]. In this study, the hit compound 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea has a higher frequency of amines. Other functional groups such as aromatic rings are generally involved in van der Waals interactions with the binding site atoms and can associate with an aminium (the cation formed by protonation of an amine) or quaternary ammonium ion through induced dipole interaction or hydrogen bonding [33]. The higher number of rings and aromatic groups in the hit compounds indicates their chemical diversity and drug-like properties. The recent study of Nand et al., 2020 [32] also found that R2NH, R3N, rings, and aromatic groups were more frequent in reference and screened inhibitor compounds against 3CLpro of COVID-19. Various studies have also reported screened inhibitors against COVID-19 whose structures are similar to those of the compounds screened in the current study (Fig. S4) [35, 36, 37, 38]. These compounds also have the aromatic and ring groups that are present in our compounds. This suggests that our compounds could also be used against COVID-19.

Finally, compared with other drug repurposing studies, our screened compounds showed good stability with 3CLpro. In a study carried out by Elmezaven et al., 2020, drug compounds showed higher fluctuation with the main protease enzyme than our compounds [39]. In another study by Bharadwaj et al., 2020, doxycycline, tetracycline, demeclocycline, and minocycline showed higher fluctuation with the Mpro enzyme than our compounds [40]. Compared with those studies, our compounds exhibited better stability with the 3CLpro enzyme. Therefore, we suggest that these compounds be further evaluated against coronavirus under in vitro and in vivo conditions.

Conclusion

The present study was carried out to discover novel inhibitor molecules against the 3CLpro enzyme of SARS-CoV-2 using computational techniques. This study can have an important impact on the treatment of the SARS-CoV-2 virus. It showed that two drugs, namely {4-[(2s,4e)-2-(1,3-Benzothiazol-2-Yl)-2-(1h-1,2,3-Benzotriazol-1-Yl)-5-Phenylpent-4-Enyl]Phenyl}(Difluoro)Methylphosphonic Acid and 1-(3-(2,4-dimethylthiazol-5-yl)-4-oxo-2,4-dihydroindeno[1,2-c]pyrazol-5-yl)-3-(4-methylpiperazin-1-yl)urea, could inhibit the activity of SARS-CoV-2 by targeting the 3CLpro enzyme. We therefore conclude that these compounds can be considered potential antiviral candidates against COVID-19 infection. These molecules could also serve as a basis for further innovation and development of antiviral compounds against coronavirus.