Introduction

The COVID-19 pandemic has created difficult situations throughout the globe. Developments in epidemiology, testing, clinical care, prevention, and management, as well as the development of vaccines and therapeutics, have been greatly accelerated [1,2,3,4,5]. After the initial outbreak of the disease in Wuhan, China, several new mutated strains have been isolated. The B.1.1.7 variant, which spread in the UK, was shown to have a significantly higher transmission rate than previous strains [6]. Several variants have been designated variants of concern (VOC) based on evidence of increased transmissibility, more severe disease outcomes, potential for antibody escape, and failure of diagnosis and detection. B.1.1.7, first detected in the UK, carries genetic alterations such as Δ69/70, Δ144Y, E484K, S494P, N501Y, A570D, D614G, and P681H. The P.1 variant, carrying mutations such as K417N/T, E484K, N501Y, and D614G, was first detected in Japan/Brazil. The B.1.351 variant, with mutations such as K417N, E484K, N501Y, and D614G, was first isolated in South Africa. The B.1.427 variant (L452R and D614G) and the B.1.429 variant (S13I, W152C, L452R, and D614G) were identified in California, USA. These VOCs have shown increased transmissibility and have an impact on antibody neutralization [7,8,9]. A recent press release from the Ministry of Health and Family Welfare in India reported a novel variant of SARS-CoV-2 in India [10]. Since December 2020, an increase in the fraction of samples carrying the E484Q and L452R mutations has been observed. The previously known VOCs have been detected in samples across the country, whereas the new double-mutant variant is present in a smaller fraction of samples [10].

It has become important to discover and develop new therapeutics for this disease [11, 12]. Several potential drug targets have been identified, including viral targets such as 3CLpro (Mpro), PLpro, RNA-dependent RNA polymerase (RdRp), helicase, and the S protein, and host targets such as cathepsin L, furin, TMPRSS2, and ACE2 [13,14,15,16]. RdRp is a viral enzyme with no host cell homolog; therefore, inhibitors of SARS-CoV-2 RdRp are expected to be selective. Such inhibitors are expected to have improved potency and fewer off-target effects on human host proteins, and could thus be safer and more effective therapeutics for treating COVID-19 [17]. By contrast, although the SARS-CoV-2 main protease (Mpro) is also an attractive target, the presence of Ser46 in the Cys44–Pro52 loop significantly reduces the ability of inhibitors to reach the binding site. Additionally, the active site of Mpro is composed of four pockets (S1, S2, S3, and S4), which makes it challenging to develop a drug with affinity for all four pockets in order to inhibit Mpro activity [18].

RdRp, also known as nsp12, is a non-structural protein that plays an important role in the replication cycle of RNA viruses and is conserved in several other viral species, including hepatitis C virus, influenza virus, coronaviruses (CoVs), and Zika virus [19]. The RdRp of SARS-CoV-2 shares 96% sequence similarity with that of SARS-CoV [20], and the structural differences between them lie outside the catalytic domain. RdRp plays a principal role in the replication and transcription cycle of SARS-CoV-2, possibly with nsp7 and nsp8 as cofactors [21]. In CoVs, RdRp is vital for the synthesis of the progeny viral genome [19].

The core structure of SARS-CoV-2 RdRp consists of a right-hand RdRp domain (residues S367–F920) with a large, deep groove, formed by the interlinked palm, fingers, and thumb subdomains that surround the active site of RNA synthesis. The seven structural motifs A to G (A to E embedded in the conserved palm subdomain, and F and G in the fingers subdomain) adopt a comparatively fixed arrangement and influence the catalytic process. Motif A comprises residues 611–626 (TPHLMGWDYPKCDRAM), including the highly conserved divalent-cation-binding residue D618. Motif C, composed of residues 753–767 (FSMMILSDDAVVCFN), contains the catalytic residues S759-D760-D761 in a turn between two β-strands. The entwined fingers and the flexible thumb help form the template channel, which extends across the surface of the fingers to direct incoming nucleotides to the active site; these subdomains also regulate recognition of the initiation site [21, 22].
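As a quick consistency check on the residue numbering quoted above, the short Python sketch below maps each position in the motif A and motif C sequences to its residue number, using only the start residues and sequences given in the text. The dictionary `MOTIFS` and the helper names are illustrative, not part of any structural-biology library, and the sketch only reproduces numbering; it does not analyse the structure.

```python
# Motifs of SARS-CoV-2 RdRp (nsp12) as quoted in the text:
# name -> (number of the first residue, one-letter sequence).
MOTIFS = {
    "A": (611, "TPHLMGWDYPKCDRAM"),
    "C": (753, "FSMMILSDDAVVCFN"),
}


def motif_range(motif: str) -> tuple[int, int]:
    """Return the (first, last) residue numbers covered by a motif."""
    start, seq = MOTIFS[motif]
    return start, start + len(seq) - 1


def residue_label(motif: str, offset: int) -> str:
    """Label such as 'D618' for the residue at a 0-based offset within a motif."""
    start, seq = MOTIFS[motif]
    return f"{seq[offset]}{start + offset}"


def find_residues(motif: str, fragment: str) -> list[str]:
    """Locate a sequence fragment inside a motif and return its numbered residues."""
    _, seq = MOTIFS[motif]
    idx = seq.find(fragment)
    if idx < 0:
        raise ValueError(f"{fragment!r} not found in motif {motif}")
    return [residue_label(motif, idx + i) for i in range(len(fragment))]


if __name__ == "__main__":
    print(motif_range("A"))           # (611, 626)
    print(motif_range("C"))           # (753, 767)
    print(residue_label("A", 7))      # D618, the cation-binding aspartate
    print(find_residues("C", "SDD"))  # ['S759', 'D760', 'D761'], the catalytic triad
```

The sequence lengths agree with the stated residue ranges, and the numbering places D618 in motif A and S759-D760-D761 in motif C, as described.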

Remdesivir (RDV), a small-molecule broad-spectrum antiviral agent, was developed by Gilead Sciences for the treatment of Ebola virus infection [23]. Structurally, it is a monophosphoramidate prodrug of an adenosine analog. It is currently under investigation for the treatment of SARS-CoV-2 infection, and recent randomized clinical trials have shown some evidence of its efficacy [24]. Its administration effectively reduced pulmonary viral loads and ameliorated pathological symptoms in a SARS-CoV-infected mouse model and in a rhesus macaque model of MERS [23]. Its antiviral mechanism involves delayed chain termination of nascent viral RNA [25].

Favipiravir (FPV) is another antiviral drug, approved in Japan for the treatment of influenza. It is a prodrug that selectively inhibits viral RdRp and thereby disrupts the replication cycle of RNA viruses [19]. FPV is phosphoribosylated intracellularly to its active metabolite, favipiravir-4-ribofuranosyl-5′-triphosphate, which viral RdRp recognizes as a purine nucleotide; mammalian cells are unaffected because they lack an RdRp [26]. Recent reports indicate that favipiravir provides some clinical benefit, in the form of faster discharge from hospital and a reduced need for mechanical ventilation, without any effect on mortality [27]. Taken together, the findings for remdesivir and favipiravir indicate that SARS-CoV-2 RdRp is a legitimate target for the treatment of COVID-19.

In this manuscript, we report a structure-based virtual screening of a structurally diverse library of 261,120 compounds against SARS-CoV-2 RdRp, with the aim of identifying new chemotypes for the development of drug candidates against COVID-19. Computational methods can provide insight into biological molecules, and virtual screening experiments can identify hit compounds representing several scaffolds [28,29,30]. Indeed, combining computation with experiment can provide valuable insight into the systems under investigation, especially biological systems [31, 32]. The top-scoring hits were further evaluated in silico for their pharmacokinetic and toxicity properties, with encouraging results. Additionally, the top two hits were subjected to molecular dynamics simulation to understand the dynamics of their protein–ligand interactions. Overall, the study produced several new chemotypes as hit compounds against SARS-CoV-2 RdRp, which can be further developed into potential drug candidates.

Experimental section

All computational studies were performed on a Dell Precision 3630 Tower workstation with an Intel Xeon E-2136 processor, 32 GB DDR4 RAM, and an NVIDIA Quadro P400 2 GB graphics card. Molecular dynamics simulations were performed using the Desmond [33] molecular dynamics tool.

Protein preparation

Since the outbreak of the COVID-19 pandemic, researchers have generated a vast amount of structural data, and several RdRp structures have been solved by X-ray crystallography or cryo-EM. For this study, we used the cryo-EM structure of SARS-CoV-2 RdRp in complex with a 50-base template-primer RNA and remdesivir at 2.5 Å resolution (PDB ID: 7BV2) [34]. The PDB EM validation report indicated a clashscore between 3 and 3.5 and no Ramachandran or backbone outliers, indicating a good structural model. The protein structure was prepared with UCSF Chimera [35] by removing water molecules, fixing bond orders, and adding missing hydrogens.
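The water-removal step can be illustrated with a minimal Python sketch that relies only on the fixed-column layout of PDB files (record name in columns 1–6, residue name in columns 18–20). It is not how UCSF Chimera performs the preparation, and it does not fix bond orders or add hydrogens; the set of water residue names (`WATER_RESNAMES`) is an assumption that may need extending for other files.

```python
# Residue names commonly used for water in PDB files (assumed list; extend as needed).
WATER_RESNAMES = {"HOH", "WAT", "H2O"}

# Coordinate-related records that carry a residue name in columns 18-20.
COORD_RECORDS = {"ATOM", "HETATM", "ANISOU"}


def strip_waters(pdb_text: str) -> str:
    """Return PDB text with all water coordinate records removed.

    PDB is a fixed-column format: the record name occupies columns 1-6
    (slice [0:6]) and the residue name columns 18-20 (slice [17:20]).
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        resname = line[17:20].strip()
        if record in COORD_RECORDS and resname in WATER_RESNAMES:
            continue  # drop water molecules
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "\n".join([
        "ATOM      1  N   SER A 367      10.000  20.000  30.000  1.00 20.00           N",
        "HETATM 1001  O   HOH A 901      11.000  21.000  31.000  1.00 30.00           O",
        "END",
    ])
    print(strip_waters(sample))  # the HOH record is gone; ATOM and END remain
```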

Small molecule database preparation

In recent years, a large number of purchasable small-molecule datasets have become available for screening. In this study, we used the Asinex Gold & Platinum Collection (https://www.asinex.com/screening-libraries-(all-libraries)), which contains 261,120 structurally diverse compounds covering drug-like chemical space [36]. This library has previously been screened to identify potential hit compounds against multiple targets [37, 38]. One of the most important tasks in virtual screening is careful ligand preparation, including desalting, proper ionization at a specified pH, tautomer generation, and generation of stereoisomers based on the chiral centers present in each molecule. Several commercial tools, but only a few open-source ones, are available for ligand preparation. We used Gypsum-DL to prepare the ligand database [39]. After filtering by Lipinski's rule of five, the prepared library contained ~128,000 compounds. Remdesivir was prepared for molecular docking in the same way.
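The Lipinski filtering step can be sketched as follows, assuming the descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed with a cheminformatics toolkit. The text does not state how many rule violations were tolerated, so `max_violations=1` (a common convention) is an assumption, and the compound names and descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Pre-computed descriptors for one molecule (hypothetical values in the demo)."""
    name: str
    mol_weight: float  # molecular weight, Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria a molecule breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def lipinski_filter(library: list[Descriptors], max_violations: int = 1) -> list[Descriptors]:
    """Keep molecules with at most `max_violations` violations (cut-off is an assumption)."""
    return [d for d in library if lipinski_violations(d) <= max_violations]


if __name__ == "__main__":
    library = [
        Descriptors("cmpd-1", 412.5, 2.8, 2, 6),   # no violations
        Descriptors("cmpd-2", 538.6, 4.1, 3, 9),   # one violation (molecular weight)
        Descriptors("cmpd-3", 655.7, 6.3, 4, 11),  # three violations
    ]
    print([d.name for d in lipinski_filter(library)])                    # ['cmpd-1', 'cmpd-2']
    print([d.name for d in lipinski_filter(library, max_violations=0)])  # ['cmpd-1']
```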

Structure-based virtual screening

The virtual screening was performed using AutoDock Vina in the PyRx 0.8 virtual screening platform [40, 41]. First, the compounds were imported into the OpenBabel program [42] available in PyRx and energy-minimized with the MMFF94 force field. The energy-minimized molecules were then converted into AutoDock PDBQT format. We used active-site docking, in which the 3D docking grid was defined using the coordinates of the co-crystallized remdesivir, with a grid size of 25 Å × 25 Å × 25 Å in the x, y, and z directions. The 2D ligand–protein interaction diagrams and the 3D poses of the docked ligands were prepared using the free Maestro visualizer from Schrödinger (https://www.schrodinger.com/). The top ten hits were selected after a detailed inspection of their interactions with the binding-site amino acid residues. The structures of these top ten hits are given in Fig. 1, and the docking scores of the top-scoring compounds are given in Table 1.
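A minimal sketch of how an active-site search box like the one described above could be defined: the box is centred on the centroid of the co-crystallized ligand's atoms and written as a standard AutoDock Vina configuration file. PyRx generates its own settings internally, so this is not the exact input used in the study; the file names, the ligand coordinates, and `exhaustiveness = 8` (Vina's default) are assumptions.

```python
from statistics import fmean


def grid_center(ligand_coords):
    """Centroid (x, y, z) of the reference ligand's atom coordinates, in Å."""
    xs, ys, zs = zip(*ligand_coords)
    return (fmean(xs), fmean(ys), fmean(zs))


def vina_config(receptor, ligand, center, size=25.0, exhaustiveness=8):
    """Build an AutoDock Vina configuration file for a cubic search box of edge `size` Å."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {size:.1f}",
        f"size_y = {size:.1f}",
        f"size_z = {size:.1f}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical heavy-atom coordinates (Å) of a co-crystallized ligand.
    ligand_atoms = [(90.1, 91.7, 104.2), (92.4, 93.0, 105.9), (91.0, 95.2, 103.3)]
    center = grid_center(ligand_atoms)
    print(vina_config("receptor.pdbqt", "ligand.pdbqt", center))
```

The resulting text can be saved as a file and passed to the Vina command-line program with its `--config` option.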

Fig. 1

Top ten hits identified using the docking-based virtual screening approach

Table 1 Docking score of top hits

In silico ADMET prediction

A large proportion of promising molecules fail at later stages of drug discovery and development because of non-optimal ADMET properties. It is therefore advisable to filter out potentially problematic molecules early using in silico prediction methods [43]. We used the SwissADME web server [44], available at http://www.swissadme.ch/, to calculate various ADME parameters for the top ten molecules, including physicochemical properties (such as lipophilicity and water solubility), pharmacokinetic properties, and drug-likeness. The predicted ADME values are given in Table 2.

Table 2 ADME properties of the top ten hits

Furthermore, toxicological endpoints were predicted using the ProTox-II web server, a virtual toxicity lab accessible at https://tox-new.charite.de/protox_II/index.php?site=home [45]. Computational toxicity prediction with machine learning–based approaches is not only high-throughput but also reduces the use of animals in experimentation. ProTox-II uses machine learning–based models to predict various toxicological endpoints, such as acute toxicity (the predicted LD50 of the compound), cytotoxicity, carcinogenicity, hepatotoxicity, mutagenicity, immunotoxicity, and adverse outcome (Tox21) pathways. The canonical SMILES of the hit compounds were used to predict these endpoints, and the results are given in Table 3.

Table 3 Prediction of multiple toxicological endpoints of identified hit compounds

Molecular dynamics simulation

Molecular docking provides only a static picture of the interaction of ligands with the protein's binding pocket. To understand the protein–ligand dynamics, we therefore performed atomistic molecular dynamics simulations, which describe the interactions of the ligand with the protein over the simulated time period. The top two hits from virtual screening, bound to the active site, were simulated with the Desmond molecular dynamics tool following a previously reported procedure [38]. Briefly, the protein–ligand complex was placed in an orthorhombic box with a 10 Å buffer in the x, y, and z directions and solvated explicitly with TIP3P water, with OPLS3e as the force field.
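Assuming the 10 Å value refers to the buffer distance between the solute and each face of the box (the usual way box size is specified in Desmond's System Builder), the box edge lengths follow from the solute's extent along each axis plus twice the buffer. The minimal sketch below illustrates this for an axis-aligned box without any rotation to minimise volume; the coordinates in the example are hypothetical.

```python
def orthorhombic_box(coords, buffer=10.0):
    """Edge lengths (Å) of an axis-aligned box enclosing `coords` with `buffer` Å on every side."""
    xs, ys, zs = zip(*coords)
    # Solute extent along each axis, padded by the buffer on both sides.
    return tuple((max(axis) - min(axis)) + 2 * buffer for axis in (xs, ys, zs))


if __name__ == "__main__":
    # Hypothetical extreme atom positions (Å) of a protein-ligand complex.
    solute = [(0.0, 0.0, 0.0), (95.0, 80.0, 110.0), (40.0, 35.0, 60.0)]
    print(orthorhombic_box(solute))  # (115.0, 100.0, 130.0)
```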

After neutralization of the charge by addition of Na+ and Cl− ions (0.15 M final concentration), the whole system was energy-minimized and pre-equilibrated using the default settings (Brownian dynamics NVT simulation for 100 ps at 10 K, 12 ps NVT simulation at 10 K, 12 ps NPT simulation at 10 K with small time steps and restraints on solute heavy atoms, and finally a 24 ps NPT simulation without restraints). After equilibration, a 50 ns production simulation was performed in the isothermal–isobaric (NPT) ensemble at 300 K and 1.013 bar, with the default relaxation protocol applied before simulation [33]. The RESPA integrator with a time step of 2 fs was used, and long-range electrostatic interactions were calculated with the smooth particle mesh Ewald (PME) method. The Nosé–Hoover chain thermostat and the Martyna–Tobias–Klein barostat were used to maintain temperature and pressure, respectively [46]. The final simulation trajectory was analyzed using the Simulation Interaction Diagram tool available in Maestro.
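The run length and salt concentration above translate into simple numbers. The sketch below computes the number of 2 fs integration steps in a 50 ns production run and gives a rough estimate of the Na+/Cl− pairs needed for 0.15 M salt. The ion estimate uses the whole box volume of a hypothetical 120 Å cubic box and ignores both the volume occupied by the solute and the counterions added for neutralization, so it is an approximation rather than what Desmond computes.

```python
AVOGADRO = 6.02214076e23   # 1/mol (exact SI value)
FS_PER_NS = 1_000_000      # 1 ns = 10^6 fs
LITRE_PER_A3 = 1e-27       # 1 Å^3 = 10^-30 m^3 = 10^-27 L


def production_steps(length_ns: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover `length_ns` at `timestep_fs`."""
    return round(length_ns * FS_PER_NS / timestep_fs)


def salt_ion_pairs(conc_molar: float, box_edges_angstrom: tuple[float, float, float]) -> int:
    """Rough number of salt ion pairs for a given molarity, using the full box volume."""
    a, b, c = box_edges_angstrom
    volume_litre = a * b * c * LITRE_PER_A3
    return round(conc_molar * volume_litre * AVOGADRO)


if __name__ == "__main__":
    print(production_steps(50, 2))                      # 25000000
    print(salt_ion_pairs(0.15, (120.0, 120.0, 120.0)))  # 156 (hypothetical box)
```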

Results and discussion

Structure-based virtual screening

Large-scale efforts have been made to identify new targets for the discovery and development of drugs against COVID-19, and structural biology has provided several structures of these targets determined by X-ray crystallography and cryo-EM. We used the structure of SARS-CoV-2 RdRp in complex with a 50-base template-primer RNA and remdesivir at 2.5 Å resolution [20]. After initial preparation, as described in the "Experimental section," the protein structure was used for virtual screening. The screening library was downloaded from Asinex and prepared as described in the "Experimental section."

The prepared compound library was docked into the active site of RdRp, a highly conserved enzyme in the coronavirus family that is essential for viral replication [47]. The virtual screen ranked the compounds by their docking score against the binding-site amino acid residues, and the top ten compounds were selected after manual inspection of the interactions of the top-scoring compounds. The two-dimensional structures of these compounds are given in Fig. 1, and their docking scores, together with that of remdesivir (included as the reference drug), are given in Table 1. The two-dimensional ligand interaction diagrams and three-dimensional docking poses of remdesivir and the five top-scoring hits are given in Figs. 2, 3, and 4.
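The first-pass ranking can be sketched as a sort on the Vina affinity (kcal/mol), where more negative values mean stronger predicted binding; in the study this ranking was followed by manual inspection of the poses, which the code does not attempt. Compound names and scores are hypothetical, and breaking ties by name is an arbitrary choice made here for reproducibility.

```python
def rank_hits(scores: dict[str, float], shortlist: int = 10) -> list[tuple[str, float]]:
    """Sort compounds by docking affinity (most negative first) and keep a shortlist.

    Ties are broken alphabetically by compound name so the order is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    return ranked[:shortlist]


if __name__ == "__main__":
    # Hypothetical Vina affinities in kcal/mol.
    scores = {"cmpd-A": -7.1, "cmpd-B": -9.0, "cmpd-C": -8.2, "cmpd-D": -8.2}
    for name, affinity in rank_hits(scores, shortlist=3):
        print(f"{name}\t{affinity:.1f}")
    # cmpd-B  -9.0
    # cmpd-C  -8.2
    # cmpd-D  -8.2
```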

Fig. 2

Docked 3D pose and protein–ligand interaction diagram of the co-crystallized ligand remdesivir (A) and top hit AP-263/43503625 (B)

Fig. 3

Docked 3D pose and protein–ligand interaction diagram of hits AQ-149/41812552 (A) and AB-323/25048482 (B)

Fig. 4

Docked 3D pose and protein–ligand interaction diagram of hits AO-022/43455017 (A) and AE-641/00004064 (B)

The identified hits cover diverse chemical scaffolds such as benzenesulphonamide, 2,3-pyrrolidinedione, and pteridinone. As shown in Fig. 2A, remdesivir gave a binding affinity of −8 kcal/mol and interacted with the binding-pocket residues Arg553, Arg555, Tyr619, and Asp623 of RdRp, mainly through hydrogen bonds and π-cation interactions.

Compound AP-263/43503625 (Fig. 2B) showed a binding affinity of −7.6 kcal/mol and interacted with Asp684, Ser682, Arg555, Tyr619, and Asp760, mainly through hydrogen bonds and π-cation interactions. Compound AQ-149/41812552 (Fig. 3A) interacted mainly with Lys551, Arg553, Asp623, and Asp760 through hydrogen bonds. Compound AB-323/25048482 (Fig. 3B) interacted with Asp761, Asp760, Ser759, Glu811, and Asp618 through hydrogen bonds. Compound AO-022/43455017 (Fig. 4A) interacted with Lys621, Thr680, and Ser759 through hydrogen bonds. Compound AE-641/00004064 (Fig. 4B) formed a hydrogen bond with Arg555. Overall, all of the top selected hits interacted with binding-site residues of RdRp.

In silico ADMET prediction

Predicting pharmacokinetic properties in silico and using them for decision-making early in a drug discovery and development program is very important, because many drug candidates fail in late-phase clinical trials owing to pharmacokinetic issues. We therefore evaluated the ADME properties of the top hits in silico using the freely available SwissADME web server (www.swissadme.ch) [29]; the results are provided in Table 2. For comparison, we also calculated the ADME properties of remdesivir. The results indicated that most of the hits possess favorable pharmacokinetic properties. Except for AK-778/15446009 and AE-641/42124185, none of the compounds was predicted by SwissADME to inhibit CYP enzymes. Moreover, none of the hits violated the Lipinski, Ghose, Veber, Egan, or Muegge rules, and all were predicted to have high synthetic accessibility (i.e., to be easy to synthesize). Overall, the ADME predictions indicate that the identified hits have favorable pharmacokinetic properties, which are important for drug development.
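Two of the drug-likeness rules mentioned above reduce to simple thresholds as they are described in the SwissADME documentation: Veber (rotatable bonds ≤ 10 and TPSA ≤ 140 Å²) and Egan (WLOGP ≤ 5.88 and TPSA ≤ 131.6 Å²). The sketch below encodes only these two filters, assuming the descriptor values have already been calculated; the example values are hypothetical and are not taken from Table 2.

```python
def veber_pass(rotatable_bonds: int, tpsa: float) -> bool:
    """Veber filter: at most 10 rotatable bonds and TPSA <= 140 Å^2."""
    return rotatable_bonds <= 10 and tpsa <= 140.0


def egan_pass(wlogp: float, tpsa: float) -> bool:
    """Egan filter (as described by SwissADME): WLOGP <= 5.88 and TPSA <= 131.6 Å^2."""
    return wlogp <= 5.88 and tpsa <= 131.6


if __name__ == "__main__":
    # Hypothetical descriptors: (name, rotatable bonds, TPSA in Å^2, WLOGP).
    compounds = [("cmpd-1", 6, 98.4, 2.7), ("cmpd-2", 6, 135.2, 2.7)]
    for name, rb, tpsa, wlogp in compounds:
        print(name, veber_pass(rb, tpsa), egan_pass(wlogp, tpsa))
    # cmpd-1 True True
    # cmpd-2 True False
```

The second example shows why the filters are reported separately: a TPSA between 131.6 and 140 Å² satisfies Veber but not Egan.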

To evaluate the potential toxicity of the hit compounds, the ProTox-II web server [43] was used to calculate the LD50 (mg/kg) and the probabilities of hepatotoxicity, carcinogenicity, immunotoxicity, mutagenicity, and cytotoxicity, together with the probabilities of activity in the Tox21 nuclear receptor signalling pathways (aryl hydrocarbon receptor (AhR), androgen receptor (AR), androgen receptor ligand-binding domain (AR-LBD), aromatase, estrogen receptor alpha (ER), estrogen receptor ligand-binding domain (ER-LBD), and peroxisome proliferator-activated receptor gamma (PPAR-γ)) and in the Tox21 stress response pathways (nuclear factor (erythroid-derived 2)-like 2/antioxidant responsive element (nrf2/ARE), heat shock factor response element (HSE), mitochondrial membrane potential (MMP), phosphoprotein (tumor suppressor) p53, and ATPase family AAA domain-containing protein 5 (ATAD5)). For comparison, remdesivir was also subjected to toxicity prediction. The results (Table 3) indicated that the predicted LD50 values of the evaluated compounds ranged from 400 to 10,000 mg/kg, suggesting low to moderate acute oral toxicity. All hit compounds except AE-641/42124185 and AK-778/15446009 were predicted to be inactive for hepatotoxicity; for these two compounds, the predicted probability of hepatotoxicity was 0.53. None of the compounds showed potential for carcinogenicity. However, some of the hit compounds (AQ-149/41812552, AO-022/43455017, AQ-149/41812734, AE-641/42124185, and AK-778/15446009) were predicted to exhibit immunotoxicity. Only AB-323/25048482 was predicted to be mutagenic, and only AK-778/15446009 was predicted to be cytotoxic. No compound was predicted to be active in any of the Tox21 nuclear receptor signalling or stress response pathways.
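ProTox-II places predicted LD50 values into six GHS-derived acute oral toxicity classes, from class I (LD50 ≤ 5 mg/kg) to class VI (LD50 > 5000 mg/kg). A minimal sketch of that mapping, assuming inclusive upper bounds of 5, 50, 300, 2000, and 5000 mg/kg for classes I to V, shows that the LD50 range reported above spans classes IV to VI.

```python
# Inclusive upper LD50 bound (mg/kg) for ProTox-II / GHS acute oral toxicity classes I-V.
CLASS_UPPER_BOUNDS = [(5.0, 1), (50.0, 2), (300.0, 3), (2000.0, 4), (5000.0, 5)]


def toxicity_class(ld50_mg_per_kg: float) -> int:
    """Map a predicted oral LD50 (mg/kg) to a toxicity class from 1 (most toxic) to 6."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    for upper, cls in CLASS_UPPER_BOUNDS:
        if ld50_mg_per_kg <= upper:
            return cls
    return 6  # class VI: LD50 > 5000 mg/kg


if __name__ == "__main__":
    for ld50 in (400, 10_000):  # lower and upper ends of the range reported in the text
        print(ld50, toxicity_class(ld50))
    # 400 4
    # 10000 6
```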

Molecular dynamics simulation

Docking score and ADMET properties were considered in selecting the top two hits (AP-263/43503625 and AQ-149/41812552), which were then subjected to molecular dynamics simulations to assess the stability of the RdRp–hit complexes.

Results from the analysis of the MD trajectory of the AP-263/43503625 complex are summarized in Fig. 5. The RMSD values of both the protein backbone and the ligand (Fig. 5A) ranged from 0 to 2.4 Å without any major fluctuation, indicating that the protein–ligand complex remained stable over the simulation period. The protein RMSF values ranged from 0.4 to 3.6 Å (Fig. 5B), suggesting that the protein is relatively rigid, with some fluctuations in the terminal regions. Analysis of the interactions between AP-263/43503625 and the binding site showed that the ligand interacted with the RdRp active-site residues Ser549, Lys621, Asp623, and Asp760 through hydrogen bonds and with Arg555 through a π-cation interaction. Hydrophobic interactions were also observed with Arg553 and Tyr619 (Fig. 5C). Together, these interactions stabilized the ligand in the binding pocket of RdRp. The ligand RMSF values ranged from 0.5 to 2 Å, indicating a stably bound ligand (Fig. 5D). The extent of the molecular interactions during the simulation is shown in Fig. 5E: the 4-hydroxyphenyl substituent interacted with Arg553, Lys621, and Asp623 for 63, 51, and 98% of the simulated time, respectively, through hydrogen bonds, hydrophobic interactions, and water bridges. Finally, Fig. 5F gives a graphical overview of the protein–ligand interactions (hydrogen bonds, hydrophobic and ionic interactions, and water bridges) over the 50 ns simulation: the upper panel shows the total number of specific protein–ligand contacts, and the lower panel shows the interactions at the residue level.
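The RMSD and RMSF values discussed here have simple definitions. Assuming the trajectory frames have already been superposed on the reference (first) frame, which analysis tools normally do before computing these quantities, RMSD measures the deviation of a whole frame from the reference and RMSF measures each atom's fluctuation about its time-averaged position. The pure-Python sketch below uses tiny hypothetical coordinate lists in place of a real trajectory.

```python
import math

Coord = tuple[float, float, float]


def _sq_dist(a: Coord, b: Coord) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def rmsd(frame: list[Coord], reference: list[Coord]) -> float:
    """Root-mean-square deviation of a (pre-superposed) frame from a reference frame."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same number of atoms")
    total = sum(_sq_dist(a, b) for a, b in zip(frame, reference))
    return math.sqrt(total / len(frame))


def rmsf(trajectory: list[list[Coord]]) -> list[float]:
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        msf = sum(_sq_dist(frame[i], mean) for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations


if __name__ == "__main__":
    # Hypothetical 3-frame, 2-atom trajectory (Å); atom 0 moves, atom 1 stays put.
    traj = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ]
    print([round(rmsd(f, traj[0]), 3) for f in traj])  # [0.0, 0.707, 1.414]
    print([round(x, 3) for x in rmsf(traj)])           # [0.816, 0.0]
```

Using the first frame as the reference, as in Fig. 5A, makes the RMSD start at zero by construction.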

Fig. 5

Analysis of MD trajectory of AP-263/43503625-protein complex (A) RMSD plot using first frame as reference (B) RMSF plot of protein (C) ligand–protein contacts histogram plot (D) RMSF plot of ligand (E) ligand–protein contacts diagram (F) molecular interaction as a function of time plot

Similarly, results from the analysis of the MD trajectory of the AQ-149/41812552 complex are summarized in Fig. 6. The RMSD values of both the protein backbone and the ligand (Fig. 6A) ranged from 1.2 to 2.4 Å without any major fluctuation, indicating that the protein–ligand complex remained stable over the simulation period. The protein RMSF values ranged from 0.4 to 4 Å (Fig. 6B), suggesting that the protein is relatively rigid, with some fluctuations in the terminal regions. Analysis of the interactions between AQ-149/41812552 and the binding site showed that the ligand interacted with the RdRp active-site residues Ser549, Lys621, Asp623, and Asp760 through hydrogen bonds and with Arg555 through a π-cation interaction. Hydrophobic interactions were also observed with Arg553 and Tyr619 (Fig. 6C). Together, these interactions stabilized the ligand in the binding pocket of RdRp. The ligand RMSF values ranged from 0.5 to 2 Å, indicating a stably bound ligand (Fig. 6D). The extent of the molecular interactions during the simulation is shown in Fig. 6E: the 4-hydroxyphenyl substituent interacted with Arg553, Lys621, and Asp623 for 63, 51, and 98% of the simulated time, respectively, through hydrogen bonds, hydrophobic interactions, and water bridges. Finally, Fig. 6F gives a graphical overview of the protein–ligand interactions (hydrogen bonds, hydrophobic and ionic interactions, and water bridges) over the 50 ns simulation: the upper panel shows the total number of specific protein–ligand contacts, and the lower panel shows the interactions at the residue level.
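The interaction percentages quoted in these analyses can be read as occupancies: the share of trajectory frames in which a given residue is in contact with the ligand. A minimal sketch is shown below, assuming per-frame contact sets have already been extracted from the trajectory (the residue labels and frames are hypothetical). It counts at most one contact per residue per frame, so, unlike a count of individual contacts, it cannot exceed 100%.

```python
def contact_occupancy(frame_contacts: list[set[str]], residues: list[str]) -> dict[str, float]:
    """Percentage of frames in which each residue contacts the ligand.

    `frame_contacts[t]` is the set of residue labels in contact with the ligand in frame t.
    """
    n_frames = len(frame_contacts)
    if n_frames == 0:
        raise ValueError("trajectory contains no frames")
    return {
        res: 100.0 * sum(res in contacts for contacts in frame_contacts) / n_frames
        for res in residues
    }


if __name__ == "__main__":
    # Hypothetical contacts over a 4-frame trajectory.
    frames = [
        {"Asp623", "Lys621"},
        {"Asp623"},
        {"Asp623", "Arg553"},
        {"Asp623", "Lys621"},
    ]
    print(contact_occupancy(frames, ["Arg553", "Lys621", "Asp623"]))
    # {'Arg553': 25.0, 'Lys621': 50.0, 'Asp623': 100.0}
```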

Fig. 6

Analysis of MD trajectory of AQ-149/41812552 protein complex (A) RMSD plot using first frame as reference (B) RMSF plot of protein (C) ligand–protein contacts histogram plot (D) RMSF plot of ligand (E) ligand–protein contacts diagram (F) molecular interaction as a function of time plot

Conclusions

The COVID-19 pandemic caused by SARS-CoV-2 has created havoc worldwide, with millions of people losing their lives. The current vaccination programs offer some hope, but the constantly mutating virus may evolve to escape the currently available vaccines. Novel therapeutics targeting viral replication are therefore much needed. In this work, we used structure-based virtual screening of a commercially available small-molecule library to identify ten potential hit molecules against SARS-CoV-2 RdRp. Most of the hits showed favorable binding to the RdRp binding pocket. Furthermore, 50 ns molecular dynamics simulations of the top two hits indicated stable protein–ligand complexes, with hydrogen bonding and hydrophobic interactions as the main forces stabilizing the ligands in the active-site cavity. Additionally, the in silico ADMET study indicated that the top hits have favorable pharmacokinetic properties, which are essential for a drug candidate to succeed in development. We believe that this study may help in the development of drugs targeting RdRp to tackle COVID-19.