Introduction

The rabies virus (RABV) is a single-stranded, negative-sense, non-segmented, enveloped RNA virus belonging to the Rhabdoviridae family. RABV infection occurs in more than 150 countries, mainly in Asia and Africa. Without prompt post-exposure prophylaxis (PEP) with the vaccine, rabies exposure results in a worldwide human mortality rate of over 99%.

The RABV genome encodes five proteins associated with either the ribonucleoprotein (RNP) complex or the viral envelope. The nucleoprotein (N), the viral RNA polymerase (L), and the NS (transcriptase-associated) protein, together with the viral RNA, make up the RNP complex. The matrix (M) and glycoprotein (G) proteins are associated with the lipid envelope [1], all of which have complementary 3′ and 5′ termini [2]. The L protein fragment contains the viral RNA-dependent RNA polymerase (RdRp) domain. In the virus cycle, RABV enters host cells by adsorption: the virus attaches to the host cell membrane through the G protein, enters the cytoplasm by pinocytosis, and is uncoated to the RNP in the cytoplasm. The virally encoded polymerase (L gene) transcribes the genomic strand of rabies RNA into a leader RNA and five capped and polyadenylated mRNAs, which are translated into the individual viral proteins. After the viral proteins have been synthesized, replication of the genomic RNA continues with the synthesis of full-length, positive-stranded RNA that acts as a template for the production of progeny negative-stranded RNA [3].

Computational approaches have a wide variety of applications in drug discovery [4,5,6,7] and can give novel insights into the screening of potential drugs that target RdRp. These approaches are also becoming increasingly important for understanding protein-ligand interactions. In particular, virtual screening offers a more effective, faster, and cheaper route to drug discovery by using high-performance computation ahead of in vitro testing [8]. One virtual screening strategy, the receptor-based approach, has identified lead antiviral compounds against viral epidemics, for example, HIV [9,10,11,12,13,14,15], EBOV [16], ZIKV [17,18,19,20], DENV [21], influenza A virus [22,23,24,25,26,27,28,29,30,31,32,33,34], and SARS-CoV-2 [35]. Moreover, molecular dynamics simulation has contributed binding free energy calculations to the search for potent antivirals and helps in understanding protein-ligand interactions [6].

To date, there are still no FDA-approved drugs available for the treatment of rabies. In this study, we aimed to identify potential inhibitory compounds targeting the rabies RdRp. The RNA-dependent RNA polymerase was used as a target to screen ZINC compounds by virtual screening and molecular dynamics for the development of medicines to treat rabies. Five compounds were finally identified as possible templates for anti-rabies therapy.

Materials and methods

Sequence retrieval, homology modeling, and refinement

Because no X-ray crystal structure of the Rabies lyssavirus RdRp was available, we built a homology model. The amino acid sequence of the Rabies lyssavirus RdRp protein was retrieved from the National Center for Biotechnology Information (accession number “NP_056797”) [36]. The protein sequence was queried with BLASTp against the PDB database, and query coverage was used to identify closely related structural homologs of the rabies RdRp. The first hit, with 95% query coverage and an E-value of 4e-63, was the L protein of the vesicular stomatitis virus (PDB code: 5A22), which was taken as the template for homology modeling of the Rabies lyssavirus RdRp via SWISS-MODEL (http://swissmodel.expasy.org).

The RdRp homology model was refined with the GROMACS 5.1.4 software package using the ff54a7 [37] force field. The simulation was performed in a cubic box containing the protein, TIP3P water molecules, and three Cl ions to neutralize the positive charge. The temperature was held constant at 300 K and the pressure at 1 bar. Energy minimization was performed using the steepest descent algorithm with a step size of 0.01 nm and a maximum of 50,000 steps. For the 50 ns production MD simulation, the step size was 0.02 fs. The coordinates of the protein were collected every 10 ps throughout the simulation time. Backbone dynamics were analyzed using the Qtgrace program [38]. The Ramachandran plot was built by PROCHECK [39] (https://servicesn.mbi.ucla.edu/PROCHECK/), and the local quality and comparison plots from SWISS-MODEL were used to check the validity of the models built. Furthermore, clustering analysis of the modeled RdRp was applied to calculate the RMS clusters of RdRp backbone conformations with the g_cluster module of the GROMACS package, using the single-linkage (nearest neighbor) method with an RMSD cutoff of 0.25 nm. The cluster representative structures extracted from the trajectories were analyzed using the PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC [40].
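
The single-linkage step can be illustrated with a minimal sketch, assuming a precomputed symmetric matrix of pairwise backbone RMSD values between frames. Two frames end up in the same cluster whenever a chain of neighbors, each within the cutoff, connects them. The function name and the toy matrix are hypothetical and are not taken from the study; the actual analysis was run with the GROMACS clustering module on the full trajectory.

```python
def single_linkage_clusters(rmsd, cutoff=0.25):
    """Group frame indices by single linkage on a symmetric RMSD matrix (nm)."""
    n = len(rmsd)
    parent = list(range(n))  # union-find forest: one tree per cluster

    def find(i):
        # Walk up to the root of the tree, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of any two frames that lie within the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] <= cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    # Largest cluster first; ties broken by the earliest frame index.
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


# Toy 4-frame matrix with hypothetical RMSD values in nm.
toy_matrix = [
    [0.00, 0.20, 0.40, 0.90],
    [0.20, 0.00, 0.22, 0.85],
    [0.40, 0.22, 0.00, 0.80],
    [0.90, 0.85, 0.80, 0.00],
]
clusters = single_linkage_clusters(toy_matrix)
```

In the toy matrix, frames 0 and 2 are 0.40 nm apart yet still share a cluster because frame 1 bridges them, which is the characteristic chaining behavior of nearest-neighbor linkage.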

Binding site prediction

The RdRp binding sites were predicted using the FTSite server (https://ftsite.bu.edu/) [41], the RaptorX binding server (http://raptorx.uchicago.edu/BindingSite/) [42], and the CASTp server (http://sts.bioe.uic.edu/castp/index.html?2cpk) [43]. All consensus residues were used to map the binding site of the modeled RdRp.
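
One way to derive consensus residues from several predictors is a simple vote count, sketched below. This is an assumption about how "consensus" could be operationalized, not the procedure reported here. The per-server residue lists are placeholders for illustration only, and `consensus_residues` and `min_votes` are hypothetical names.

```python
from collections import Counter


def consensus_residues(predictions, min_votes):
    """Return residues predicted by at least `min_votes` servers."""
    # Count each residue once per server, even if a server lists it twice.
    votes = Counter(
        residue
        for residues in predictions.values()
        for residue in set(residues)
    )
    selected = [res for res, count in votes.items() if count >= min_votes]
    # Labels are assumed to be a three-letter code followed by the residue number.
    return sorted(selected, key=lambda res: int(res[3:]))


# Placeholder predictions (hypothetical, not the servers' actual output).
predictions = {
    "FTSite": ["Tyr619", "Glu620", "Lys621", "Asp729"],
    "RaptorX": ["Glu620", "Lys621", "Asp729", "Asn730"],
    "CASTp": ["Tyr619", "Glu620", "Lys621", "Asn730", "Gly792"],
}
strict = consensus_residues(predictions, min_votes=3)
majority = consensus_residues(predictions, min_votes=2)
```

Requiring all three servers to agree gives a small, high-confidence core, while a two-of-three majority yields a larger pocket definition; which threshold is appropriate depends on how permissive the docking site should be.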

Virtual screening (VS) with a compound library

Virtual screening was performed using GOLD 5.7.1 [44], a structure-based virtual screening tool. Docking simulations were performed within a 15 Å radius of the binding site residues using the default genetic algorithm parameters, 100 runs per molecule and 100,000 operations, with the GoldScore fitness function as the default scoring function [45]. The ZINC Diversity Set III, a free database of commercially available compounds for virtual screening [46], was used as the compound library. The top five hits of the library were selected based on docking score (the higher the better). The docking results were analyzed using Discovery Studio Client v 17.2.0.16349 (Accelrys Software Inc., San Diego).

MD simulations of RdRp-ligand complexes

The screened protein-ligand complexes were further validated for binding affinity with the binding site residues of RdRp by molecular dynamics simulations. MD studies of RdRp in complex with the top five virtual hits were performed using GROMACS 5.1.4 [47]. The topology and parameters of each ligand were generated using the PRODRG online server [48]. System minimization, heating, and equilibration (NVT and NPT) were carried out in the same manner as for the optimization of the 3D RdRp structure described above. Finally, MD simulations were performed for 50 ns for each complex. Each simulation was performed in triplicate, and each resulting trajectory was analyzed using the g_rms and g_hbond modules of GROMACS. Hydrogen bonds were identified with a donor-acceptor distance cutoff of ≤3.5 Å and an angle cutoff of 30°. The graphs were produced using the Qtgrace tool [38].
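
The geometric hydrogen bond criterion can be sketched as follows. The sketch assumes that the angle is measured at the donor, between the donor-hydrogen and donor-acceptor vectors, which is the convention we understand GROMACS to use; the function name and the example coordinates are hypothetical.

```python
import math


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=3.5, max_angle=30.0):
    """Apply a distance (Å) and angle (degrees) test to one donor-H...acceptor triplet."""
    donor_to_acceptor = [a - d for a, d in zip(acceptor, donor)]
    donor_to_hydrogen = [h - d for h, d in zip(hydrogen, donor)]
    dist = math.sqrt(sum(c * c for c in donor_to_acceptor))
    bond = math.sqrt(sum(c * c for c in donor_to_hydrogen))
    if dist == 0.0 or bond == 0.0:
        return False  # degenerate geometry, no angle can be defined
    cos_angle = sum(
        x * y for x, y in zip(donor_to_acceptor, donor_to_hydrogen)
    ) / (dist * bond)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return dist <= max_dist and angle <= max_angle


# Hypothetical coordinates in Å: donor at the origin, hydrogen 1 Å along x.
donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
near_linear = is_hydrogen_bond(donor, hydrogen, (2.9, 0.3, 0.0))
too_far = is_hydrogen_bond(donor, hydrogen, (4.0, 0.0, 0.0))
too_bent = is_hydrogen_bond(donor, hydrogen, (1.5, 2.0, 0.0))
```

Counting the triplets that pass this test in every frame gives the kind of hydrogen bond time series that is usually plotted against simulation time.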

Binding free energy calculations

The binding free energy was calculated using the MM-PBSA (molecular mechanics Poisson-Boltzmann surface area) method as implemented in the g_mmpbsa tool developed by Kumari et al. (https://rashmikumari.github.io/g_mmpbsa/) [49]. The tool is suited for calculating relative binding free energies of similar systems. The binding free energy and the per-residue energy contributions were calculated for RdRp with the five ZINC compounds. A total of 751 snapshots were extracted from the last 15 ns, the equilibrium phase, of each system. In this study, the different components of the interaction energy that contributed to the binding energy were estimated: electrostatic interactions, van der Waals interactions, polar solvation energy, and non-polar solvation energy.

Briefly, the basic principle is given by the following formula:

$$ \Delta {\mathrm{G}}_{\mathrm{binding}}=\Delta {E}_{\mathrm{MM}}+\Delta {\mathrm{G}}_{\mathrm{sol}}-T\Delta S $$

To estimate the free energy of each component, both entropy and enthalpy terms are considered. The total binding free energy (ΔGbinding) is calculated as the sum of the gas-phase interaction energy between protein and ligand (ΔEMM), the solvation energy associated with the transition from the gas phase to the solvated state (ΔGsol), and the change in conformational entropy associated with ligand binding (−TΔS).
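
The bookkeeping behind this formula can be sketched as below, assuming that ΔEMM is split into van der Waals and electrostatic parts and ΔGsol into polar and non-polar parts, in line with the four components listed above. The snapshot values are hypothetical, and the entropic term is passed in directly as −TΔS (zero when it is neglected).

```python
from statistics import mean


def binding_free_energy(e_vdw, e_elec, g_polar, g_nonpolar, minus_t_ds=0.0):
    """Combine energy components (kJ/mol) into a binding free energy."""
    delta_e_mm = e_vdw + e_elec          # gas-phase interaction energy
    delta_g_sol = g_polar + g_nonpolar   # solvation free energy
    return delta_e_mm + delta_g_sol + minus_t_ds


# Hypothetical per-snapshot components in kJ/mol:
# (van der Waals, electrostatic, polar solvation, non-polar solvation)
snapshots = [
    (-200.0, -50.0, 120.0, -20.0),
    (-210.0, -40.0, 130.0, -22.0),
]
per_frame = [binding_free_energy(*components) for components in snapshots]
average_dg = mean(per_frame)
```

Averaging the per-snapshot values over the sampled frames yields the reported ΔG for a complex; the spread of `per_frame` is what an uncertainty estimate would be built from.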

Results and discussion

Structure prediction of RdRp protein

Currently, no 3D X-ray crystal structure of the Rabies lyssavirus RdRp protein has been determined, so we built a homology model. The complete Rabies lyssavirus protein sequence was retrieved from the National Center for Biotechnology Information (accession no. NP_056797) [36]. The structures of the RABV RdRp domains were modeled using the SWISS-MODEL web server (http://swissmodel.expasy.org/) based on the VSV L protein template (PDB: 5A22) [50] (Fig. 1A). The initial alignment of the rabies RdRp sequence with the template sequence was obtained using ClustalW. The final alignment (37.08% identity) was used in the homology modeling, and the model structure was generated using the structure of the VSV L protein determined by cryo-electron microscopy (PDB: 5A22) [51] as a template (Fig. 1B). Like the VSV L protein, the RABV L protein was predicted to contain an RdRp domain [51]. The sequence alignment between the rabies RdRp and the template sequence was generated by ClustalO (supplementary Fig. S1). The RdRp model contained four motifs, A, B, C, and D, that represent the regions of highest similarity [36, 52]. The Mononegavirales polymerase sequence has the tetrad GDNQ conserved among all its families, with the exception of the genus Novirhabdovirus [53,54,55,56]. It was previously reported that the GDN motif located in CRIII is important for the RdRp activity of the L proteins of NNS RNA viruses. Moreover, directed mutagenesis in viruses belonging to other families of the Mononegavirales has shown that mutations of the aspartate or the asparagine of motif C result in complete loss of enzymatic activity [57,58,59]. The final RdRp model from the SWISS-MODEL server was refined through 50 ns of MD simulation to optimize the overall geometry and remove steric clashes before later analysis.

Fig. 1

The structural superimposition of the Rabies lyssavirus RdRp model. A Template of the model (PDB: 5A22), shown in purple. B The modeled RdRp, shown in salmon. The “GDN” tri-residue of the RdRp binding site lies on a β-hairpin in the palm motif

MD simulations have been commonly applied to refine homology models [47]. The initial 3D structure of the rabies RdRp from homology modeling was therefore optimized using MD simulations in a solvent mimicking the physiological environment. The stability of the RdRp model structure during the MD simulations was measured by the RMSD. The RMSD of the RdRp backbone atoms stabilized after 10 ns of the trajectory, as seen in Fig. 2A. To further analyze structural stability, conformational cluster analysis over the simulation time was performed with a cutoff of 0.25 nm, as shown in Fig. 2B. The statistical analysis showed that the distribution ratio of clusters 1 to 5 of the RdRp model was 0.62 to 5.46 Å. Cluster 1 of the RdRp model, at 23.34 ns, was the most stable during the 50 ns MD simulations. The RdRp model from cluster 1 was therefore selected as the representative conformation for molecular docking.
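
For reference, the RMSD used as the stability measure can be sketched in a few lines. The sketch assumes that the two coordinate sets list the same atoms in the same order and have already been superimposed; trajectory tools normally perform a least-squares fit first, which is omitted here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical backbone atoms (nm): reference frame versus a later frame.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
value = rmsd(reference, frame)
```

Evaluating this quantity for every saved frame against the starting structure gives the RMSD-versus-time curve from which a plateau, and hence equilibration, is judged.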

Fig. 2

The RMSD of the RdRp protein over 50 ns. A Backbone RMSD during the molecular dynamics simulation produced by GROMACS [47]. B Clustering analysis performed with a cutoff of 2.5 Å (0.25 nm)

The structural quality of the RdRp model was validated using the Ramachandran plot obtained through PROCHECK, which represents the distribution of the phi and psi angles of the amino acid residues. After refinement by MD simulations, the Ramachandran plot of the optimized RdRp model showed that 82.4% and 99.4% of residues were placed in the favored zone and the allowed zone, respectively (supplementary Fig. S2), higher than for the initial structure (supplementary Fig. S3). This indicates that the MD-optimized rabies RdRp structure is more stable and reliable for use.

Recently, however, the cryo-EM structure of the rabies RdRp (PDB ID: 6UEB) was published in the Protein Data Bank. We therefore superimposed the cryo-EM structure on our optimized RdRp model, which showed that the two structures are highly similar. The RMSD between the cryo-EM structure and the homology model of RdRp was 1.36 Å, indicating the reliability of our RdRp model.

Binding site analysis

The binding site residues of the modeled RdRp were predicted by three servers, FTSite, RaptorX, and CASTp; the predicted binding sites are summarized in Fig. 3 and supplementary Table S1. The three servers use different algorithms, and their combined results increase the probability of correctly locating the binding pockets in the protein structure. The residues identified in the buried cavity of the protein include Tyr619, Glu620, Lys621, Ile696, Asp729, Asn730, Gly792, and Lys793, revealing a deep cleft, as shown in Fig. 3, which indicated the hydrophobic nature of the binding site of the protein.

Fig. 3

Identified binding pocket and hydrophobic cleft of the RdRp protein. The surface area shows the binding site residues (gray). Binding sites were predicted by the FTSite, COACH, and CASTp binding site servers for the protein target and visualized with the PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC [40]

Virtual screening with NCI dataset III

The virtual screening of the RdRp protein was performed against 2045 compounds from the NCI diversity dataset III using the GOLD docking program [44]. Based on the GOLD results, the five top-scoring compounds were identified as possible inhibitors of the RdRp protein, as shown in supplementary Table S2. All the selected compounds had docking scores in the range of 70–90 kcal/mol. The five selected compounds, Z01690699, ZINC29590257, ZINC29590259, ZINC29590262, and ZINC29590263, were examined in terms of their binding modes and intermolecular interactions with important amino acid residues at the binding site of RdRp shown in Fig. 3. Interestingly, the benzimidazole and benzopyran moieties of these compounds seem to play a central role in the intermolecular interactions. Compound Z01690699, which contains a benzimidazole moiety, is arranged in a head-to-head disposition involving the benzene ring. One carbonyl of Z01690699 established hydrogen bonds with Arg552 and Asn623. Furthermore, two imidazole nitrogen atoms of Z01690699 interacted with Lys548, Lys621, and Glu696, while its imidazole ring formed π–π interactions with Leu577 and Leu583. Additionally, a π–sulfur interaction was observed between the imidazole and the side chain of Lys701. The benzene ring moiety of Z01690699 formed π–π and π-alkyl interactions with Leu577, Leu583, Met585, Ala573, Ala726, and Trp622. The 2D-3D interaction profiles of Z01690699 are shown in Fig. 4B.

Fig. 4

2D interaction diagrams of the RdRp protein with the five compounds (A) Z29590259, (B) Z01690699, (C) Z29590263, (D) Z29590257, and (E) Z29590262 at the binding site of the rabies virus RdRp

The remaining four compounds, namely ZINC29590257, ZINC29590259, ZINC29590262, and ZINC29590263, share a similar benzopyran moiety at the center of the molecule. The orientations of all four binders are similar: the hydroxyl group at position 4 of the benzene ring could form hydrogen bonds with the binding pocket residues Lys621, Lys778, and Glu620. Furthermore, these compounds establish hydrophobic interactions with the surrounding residues Lys299, Glu550, Leu698, Lys548, Trp622, Leu583, Leu577, Ala573, Met585, and Glu696, as shown in Fig. 4A, C and D. These interactions suggest that the compounds bind tightly and could inhibit RdRp. The binding free energy was further investigated using MM-PBSA with GROMACS [47].

Molecular dynamics simulation and binding free energy MM-PBSA calculations

MD simulations were carried out for the five RdRp-ligand complexes to understand their dynamic behavior and stability. The five systems, the RdRp-Z01690699, RdRp-Z29590257, RdRp-Z29590259, RdRp-Z29590262, and RdRp-Z29590263 complexes, were evaluated using RMSD changes during the MD simulations. In the current study, the RMSD values, H-bonds, MM-PBSA energies, and per-residue decomposition of the protein-ligand complexes were analyzed. Additionally, the binding free energies of all the complexes were calculated for the last 15 ns of the trajectory, which was stable and equilibrated.

To evaluate the binding free energy of the interactions between the RdRp domain of RABV and the ligands, the complexes were subjected to 50 ns molecular dynamics simulations using GROMACS 5.1.4 [47]. The RMSD of the Cα atoms for all complexes throughout the simulations is plotted as RMSD (nm) against time in Fig. 5A. The Cα backbone RMSD plot indicated that most of the simulations reached equilibrium within 35 ns. The average RMSD values for Z01690699, Z29590257, Z29590259, Z29590262, and Z29590263 were 0.58 ± 0.06 nm, 0.80 ± 0.11 nm, 0.55 ± 0.05 nm, 0.30 ± 0.07 nm, and 0.57 ± 0.08 nm, respectively. Overall comparison of the Cα backbone RMSD showed that all of the complexes except the RdRp-Z29590257 system were much more stable during the simulations than the RdRp Apo form.

Fig. 5

RMSD of the RdRp Apo form and RdRp-ligand complexes. A The RdRp Apo form (brown), RdRp/Z01690699 (black), RdRp/Z29590259 (green), RdRp/Z29590257 (red), RdRp/Z29590262 (blue), and RdRp/Z29590263 (yellow). B The number of hydrogen bonds between RdRp and the compounds during the simulation time

The average number of hydrogen bonds between protein and ligand was calculated from the stable 35–50 ns portion of the trajectories shown in Fig. 5A; the H-bonds were calculated and analyzed using the gmx hbond tool, which determines the presence of H-bonds based on a cutoff distance of 3.5 Å and an angle of 30°. Fig. 5B shows that the five complexes exhibited broadly similar hydrogen bonding, ranging from 0 to 11 bonds. The RdRp-Z01690699 complex formed 0 to 6 hydrogen bonds, while RdRp-Z29590257, RdRp-Z29590259, RdRp-Z29590262, and RdRp-Z29590263 showed 0 to 9, 0 to 11, 0 to 10, and 0 to 8 hydrogen bonds, respectively. All the complexes were consistent in their numbers of hydrogen bonds after 35 ns; it could therefore be concluded that Z29590259 showed the maximum number of hydrogen bonds, while Z01690699 showed the minimum.

The binding free energy between ligands and receptor was calculated for the five complexes. The MM-PBSA method was applied in triplicate to the last 15 ns (35–50 ns), in which the trajectories were stable. A total of 751 frames, taken at 20 ps intervals, were used to calculate the binding free energy (ΔG), which is an estimate of the non-bonded interaction energies. The ΔG values for RdRp-Z01690699, RdRp-Z29590257, RdRp-Z29590259, RdRp-Z29590262, and RdRp-Z29590263 were −240.01 ± 0.75, −193.17 ± 0.89, −77.34 ± 0.83, −176.71 ± 0.80, and −158.95 ± 0.93 kJ·mol−1, respectively. Our analysis indicates that the binding affinity of RdRp-Z01690699 was the highest among the systems. The binding energies of all five compounds were below zero, indicating good binding affinity for RdRp. The individual components of the binding energy for all five complexes are shown in Table 1: the van der Waals, electrostatic, and non-polar solvation energies contributed negatively to the overall interaction energy. The van der Waals energy, electrostatic contribution, and non-polar energy were therefore favorable for the stability of the binding pattern, while the positive polar solvation energies and the non-polar energies of the five compounds showed no significant differences in ligand binding with the RNA polymerase. The main contributions to the binding free energy therefore came from the van der Waals energy and the electrostatic contribution. To gain more detail on the key residues involved in ligand binding, per-residue energy decomposition plots were generated, which show the total binding energy contribution of each residue for all five simulations.
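
The frame count quoted above follows from the window length and the sampling interval once both endpoints are included. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def n_frames(start_ns, end_ns, stride_ps):
    """Number of frames in [start, end] sampled every `stride_ps`, endpoints included."""
    span_ps = round((end_ns - start_ns) * 1000)  # window length in ps
    return span_ps // stride_ps + 1


# 35-50 ns window sampled every 20 ps: 15,000 ps / 20 ps + 1 frames.
mmpbsa_frames = n_frames(35, 50, 20)
```

The extra frame relative to 15,000 / 20 = 750 is the starting frame at 35 ns.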

Table 1 Total binding free energy for all the RdRp-ligand complexes

We further analyzed the energy contributions of the compounds and the key amino acids in the binding site cleft of RdRp. The per-residue energy decomposition analysis of all complexes is represented graphically in Fig. 6 and in supplementary Table S3. Notably, in the RdRp-Z01690699 system, most of the binding site residues make a large contribution to the total binding free energy, except Met585, Glu620, and Lys778 (−0.40 ± 0.01 kcal·mol−1, −0.32 ± 0.02 kcal·mol−1, and 0.38 ± 0.01 kcal·mol−1, respectively), which make less significant energy contributions. Residue Lys621, however, has the highest binding free energy contribution in the RdRp-Z01690699 complex (−5.83 ± 0.22 kcal·mol−1) relative to the other binding site residues. Therefore, Lys621 can be regarded as the stabilizing factor in the binding of Z01690699 to the RdRp protein. Likewise, in the RdRp-Z29590257 complex, Trp622 has the highest binding free energy contribution (−10.31 ± 0.12 kcal·mol−1), compared to a lower energy value (1.94 ± 0.24 kcal·mol−1).

Fig. 6

The per-residue energy decomposition analysis graph of the RdRp-Z01690699, RdRp-Z29590257, RdRp-Z29590259, RdRp-Z29590262, and RdRp-Z29590263 complexes

Glu620 and Lys696 in the RdRp-Z29590262 complex exhibit the lowest polar energy contribution (33.31 ± 0.56 kcal·mol−1 and 7.47 ± 0.43 kcal·mol−1) compared to their contribution in RdRp-Z29590259 and RdRp-Z29590263, which is relatively higher (0.04 ± 0.03 kcal·mol−1 and 0.63 ± 0.05 kcal·mol−1). This might be attributed to the orientation of Z29590262 in the protein conformational space. Therefore, it can be concluded that the binding energy order for the docked candidates was Z01690699 > Z29590257 > Z29590259 > Z29590263 > Z29590262.

Conclusions

In this research, the homology model of the rabies RdRp generated using the VSV L protein as a template showed close agreement with the cryo-EM structure of the rabies RdRp, supporting its reliability. The model structure was subjected to MD simulations and a clustering method. Assessments of the structure before and after MD simulations revealed that RdRp was structurally and dynamically refined. At the clustering step, the structure at 23.34 ns was selected for docking. The GOLD docking results demonstrated that the active compounds were projected toward the binding site residues, namely Met585, Glu620, Lys621, Trp622, Asn623, Glu696, Leu698, Ala726, and Lys778. The lead compounds had different binding modes. Further, it was found that the benzimidazole and benzopyran moieties could bind to the surrounding residues in a central hollow of the RdRp active site. This difference in the binding modes of the lead compounds to RdRp could affect the enzyme activity of the RdRp. Molecular dynamics simulations of the five ligands, Z01690699, Z29590257, Z29590259, Z29590262, and Z29590263, showed that differences in the formation of hydrogen bonds and in electrostatic and hydrophobic interactions with important residues in RdRp, such as Met585, Glu620, Lys621, Asn623, Glu696, Leu698, Ala726, and Lys778, played a crucial role in the inhibition by these ligands. The present study identified NCI compounds that may be good candidates as drugs for rabies virus treatment. Our methodology is a practical way to identify new inhibitors for targets with unknown 3D structure. Further in vitro enzymatic experiments will be required to demonstrate the action of these compounds against the RNA-dependent RNA polymerase of the rabies virus.