1 Background

Coronavirus disease 2019 (COVID-19) was first reported in Wuhan, China, in December 2019. It spread rapidly and was soon declared a global pandemic by the World Health Organization (WHO). The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an enveloped, non-segmented, positive-sense, single-stranded RNA virus with about 30,000 nucleotides coding for 9860 amino acids [1]. The virus has a crown-like appearance owing to the ‘spike’ glycoprotein. The viral genome codes for structural proteins (spike, membrane, envelope and nucleocapsid proteins) and accessory proteins. The spike (S) glycoprotein plays an important role in virus infectivity, as its main functions are to subvert host immunity and to recognise the target receptor [2]. For successful entry into human host cells, the S glycoprotein and the host transmembrane serine protease 2 (TMPRSS2) play a crucial role in binding to the angiotensin-converting enzyme 2 (ACE2) receptor [3,4,5].

The SARS-CoV-2 genome has 10 open reading frames (ORFs). ORF1a/b spans about two-thirds of the viral RNA and codes for polyprotein 1a, polyprotein 1b and 16 non-structural proteins (nsp1–nsp16). The remaining ORFs code for the structural and accessory proteins. ORF1ab is translated into the pp1ab polyprotein through the −1 ribosomal frameshift mechanism [6,7,8], followed by proteolytic processing that releases the main protease (Mpro), also called 3C-like protease (3CLpro). Mpro is responsible for proteolytic cleavage [9,10,11,12] at 11 sites, which contributes to the formation of the replicase–transcriptase complex. This complex is vital for virus replication.

Mpro is essential for viral genome replication and is encoded by the nsp5 gene in the viral genome [9]. It has a mass of around 33.8 kDa [10] and is characterised as a self-cleaving protein [11, 12]. It forms a homodimer of two protomers (A and B), each with three well-defined domains (Additional file 1: Fig. S1) [13]. Domains I and II are made of β-barrels that form a chymotrypsin-like structure and bear the catalytic dyad of histidine 41 (HIS41) and cysteine 145 (CYS145) [6,7,8,9,10,11,12,13,14]. Domain III is made up of α-helices [15]. For catalytic activity, Mpro must dimerise, establishing interactions with both the N- and C-terminal domains of the other protomer [16].

The main protease (Mpro) of SARS-CoV-2 plays a crucial role in the maturation of several viral proteins, such as RNA-dependent RNA polymerase (RdRp) and Nsp4-Nsp16. The dependence of the virus on Mpro, together with the absence of human proteases that share similarity with it, makes this protein a promising drug target [17,18,19]; it is also highly conserved (96.1% similarity) among coronaviruses [20]. Hence, there have been efforts to discover therapeutic candidates targeting Mpro using computer-aided drug design methods such as pharmacophore-based virtual screening, molecular docking and molecular dynamics simulation [21,22,23,24,25,26,27,28]. Drugs such as danoprevir, which is approved in China for the treatment of chronic hepatitis C, and darunavir, which inhibits the maturation of virus particles by obstructing polypeptide cleavage in infected cells [29], have been repurposed clinically to inhibit the viral protease [30, 31]. Su et al. [32] investigated the inhibitory potential of Shuanghuanglian oral preparation, a traditional Chinese patent medicine, against 3CL protease. Baicalein, an active compound from the extract, was observed to be embedded in the core of the substrate-binding pocket, where it interacts with two catalytic residues and acts as a ‘blockade’ in front of the catalytic dyad, preventing substrate access to the active site. Similarly, Al-Zahrani [33] obtained 51 phytochemicals from the extract of Juniperus procera and docked them against the main protease of SARS-CoV-2. Among them, rutin (Additional file 1: Fig. S2) and lopinavir emerged as the best-performing molecules.

Rutin, also known as rutoside or sophorin, is a glycoside that combines the flavonol quercetin and the disaccharide rutinose. Rutin and many other flavonols are under preliminary clinical research for their potential biological activity in post-thrombotic syndrome, venous insufficiency and endothelial dysfunction. However, as of 2018 there was no strong evidence for their safe and effective use [34,35,36]. Compared with other flavonols, rutin has lower bioavailability owing to poor absorption, high metabolism and rapid excretion, which makes it unsuitable for therapeutic use [34]. Although many in silico studies have examined the therapeutic application of rutin against the SARS-CoV-2 main protease [37,38,39,40,41], there is an acute need for more experiments on its bioavailability.

2 Methods

The workflow of the materials and methods is summarised in Fig. 1.

Fig. 1 Working pipeline for proteomic investigation

2.1 Protein selection and structure preparation

Natural compounds were analysed in silico against the ligand-free crystal structure of the SARS-CoV-2 main protease (PDB ID: 7DJR) [42], which is involved in replication of the SARS-CoV-2 genome. The structure was downloaded in PDB format from the RCSB Protein Data Bank website (https://www.rcsb.org/); it has a resolution of 1.45 Å, an R-value work of 0.166 and an R-value free of 0.201. This protein was chosen because few in silico studies have used it and because the sole intention of the study is to inhibit the protease in its immature state, i.e. the monomer state. 7DJR contains a single chain, which was used to prepare the macromolecule; co-crystallised water molecules and non-standard residues were removed using BIOVIA Discovery Studio 2021 Client. AutoDock 4.2 was used to produce a protonation state at physiological pH, to optimise the geometry, and to add polar hydrogens, Kollman charges and Gasteiger charges.

The three-dimensional structures of all the natural products and FDA-approved drugs [43, 44] were retrieved from the ZINC database (https://zinc.docking.org) [45]. Energy minimisation, generation of geometrical conformations and hydrogen bonding were carried out, and the file format was converted from SDF to PDBQT, using the Open Babel programme [46]. Throughout the study, rutin (ZINC4096846) and the FDA-approved drugs were used as reference molecules.

2.2 Virtual screening

The prime objective of molecular docking was to assess the binding interaction between the protein and the ligand molecules as an indicator of biological activity. AutoDock Vina [47, 48], an open-source programme for molecular docking, was used for the primary screening of the library. Because of the large size of the library, the process had to be automated: a shell script was written that iterates the screening over the library and stores the results in a separate directory.
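The batch screening described above can be sketched as follows. The sketch is in Python, not the authors' shell script, and is only an illustration. The file names, the box size and the reuse of the Sect. 2.3 grid centre for the Vina runs are assumptions; the source does not state the Vina search box. The command-line options (`--receptor`, `--ligand`, `--center_x`, `--size_x`, `--exhaustiveness`, `--out`) are standard AutoDock Vina flags.

```python
import tempfile
from pathlib import Path


def build_vina_commands(receptor, ligand_dir, out_dir, center, size, exhaustiveness=8):
    """Return one AutoDock Vina argument list per *.pdbqt ligand in ligand_dir."""
    commands = []
    for ligand in sorted(Path(ligand_dir).glob("*.pdbqt")):
        # Each ligand gets its own output file in a separate results directory.
        out_file = Path(out_dir) / f"{ligand.stem}_out.pdbqt"
        cmd = ["vina", "--receptor", str(receptor), "--ligand", str(ligand)]
        for axis, c, s in zip("xyz", center, size):
            cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
        cmd += ["--exhaustiveness", str(exhaustiveness), "--out", str(out_file)]
        commands.append(cmd)
    return commands


# Demonstration on two empty placeholder ligand files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    ligand_dir = Path(tmp) / "ligands"
    ligand_dir.mkdir()
    for name in ("ligand_a", "ligand_b"):
        (ligand_dir / f"{name}.pdbqt").touch()
    commands = build_vina_commands(
        receptor="7djr_prepared.pdbqt",   # hypothetical file name
        ligand_dir=ligand_dir,
        out_dir=Path(tmp) / "results",
        center=(10.430, -0.021, 20.536),  # grid centre quoted in Sect. 2.3
        size=(27.0, 24.0, 22.5),          # assumed box edges in angstrom
        exhaustiveness=8,
    )

for cmd in commands:
    print(" ".join(cmd))
    # To dock for real, pass cmd to subprocess.run(cmd, check=True)
    # with the vina executable available on PATH.
```

Building the argument lists separately from executing them makes it possible to inspect or log every command before a long screening run is started.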

2.3 Molecular docking

AutoDock4 [49] was used to dock the top-screened ligand molecules (natural products and FDA drugs) individually, to gain confidence in the results. All docking was performed using a Genetic Algorithm, and the number of runs was set to 100. The grid box was adjusted to the receptor binding pocket: it was centred on the centroid of the active site at X = 10.430, Y = −0.021, Z = 20.536, with dimensions of X = 72, Y = 64, Z = 60 (Additional file 1: Fig. S4). The resulting complexes were visualised with PyMOL.
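To make the grid geometry concrete, the short sketch below converts the grid settings into a physical box. It assumes that the quoted dimensions are AutoDock4 grid-point counts and that the grid spacing is the AutoDock4 default of 0.375 Å; the source states neither, so the resulting edge lengths are illustrative only. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class GridBox:
    center: Vec3                   # grid centre in angstrom
    npts: Tuple[int, int, int]     # grid points per axis (assumed meaning)
    spacing: float = 0.375         # assumed AutoDock4 default spacing, angstrom

    def edge_lengths(self) -> Vec3:
        """Box edge length along each axis, in angstrom."""
        return tuple(n * self.spacing for n in self.npts)

    def corners(self) -> Tuple[Vec3, Vec3]:
        """Minimum and maximum corner of the box."""
        half = [edge / 2 for edge in self.edge_lengths()]
        low = tuple(c - h for c, h in zip(self.center, half))
        high = tuple(c + h for c, h in zip(self.center, half))
        return low, high

    def contains(self, point: Vec3) -> bool:
        """True if a coordinate lies inside the box."""
        low, high = self.corners()
        return all(lo <= p <= hi for lo, p, hi in zip(low, point, high))


box = GridBox(center=(10.430, -0.021, 20.536), npts=(72, 64, 60))
print("edge lengths (angstrom):", box.edge_lengths())
print("corners:", box.corners())
```

A check such as `box.contains(...)` on the side-chain coordinates of HIS41 and CYS145 is one way to confirm that the catalytic dyad lies inside the search space before docking.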

2.4 Molecular dynamic simulation

Molecular dynamics simulation studies were carried out using GROMACS [50]. The SwissParam server (https://www.swissparam.ch/) [51] was used for ligand topology generation: the MOL2 coordinates of the ligand molecule were uploaded, and the server returned a zip file containing the ligand topology. The force field was CHARMM27 [52] and the water model was TIP3P. The box type was cubic for the apo-protein and dodecahedral for the complexes, and the salt type was NaCl. The number of energy minimisation steps was set to 50,000, and NVT and NPT equilibration was carried out at 300 K, with a simulation time of 100 ns. This process was repeated for all the complexes, including those of the reference molecules (rutin and the FDA drug), and for the apo-protein. All the results were then analysed individually.
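As a small worked example of how a 100 ns simulation time translates into run settings, the sketch below computes the number of integration steps and prints the matching lines of a GROMACS `.mdp` file. The 2 fs time step is an assumption (the source does not give one), the helper names are hypothetical, and the fragment is not a complete parameter file.

```python
def nsteps_for(duration_ns, dt_fs=2):
    """Integration steps needed to cover duration_ns at a time step of dt_fs."""
    return round(duration_ns * 1_000_000 / dt_fs)  # 1 ns = 1,000,000 fs


def mdp_fragment(duration_ns, dt_fs=2):
    """Return the run-length lines of an .mdp file (GROMACS expects dt in ps)."""
    return "\n".join([
        "integrator = md",
        f"dt         = {dt_fs / 1000:g}   ; time step in ps (assumed)",
        f"nsteps     = {nsteps_for(duration_ns, dt_fs)}   ; {duration_ns} ns in total",
    ])


fragment = mdp_fragment(100)
print(fragment)
```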

2.5 Physiological parameters

Pharmacokinetic and toxicological characteristics are important criteria for the selection of potential drug candidates. As an alternative to clinical trials, computational methods have been developed to assess the bioactivity of a potential drug candidate [53]. ADMETlab 2.0 (https://admetmesh.scbdd.com/) [54] is an online ADME predictor that takes SMILES notation as input and tabulates the predicted physiological characteristics.

3 Results

3.1 Virtual screening and molecular docking

Virtual screening was run with the exhaustiveness set to 8, and the results for the full data set were tabulated. ZINC85893430, ZINC70699832 and ZINC70669786 (structures shown in Fig. 2) were the top three molecules, with ΔG values of −12.8, −11.7 and −11.5 kcal/mol, respectively. These three molecules were used in further experiments against SARS-CoV-2 Mpro. In contrast, rutin and danoprevir (Additional file 1: Table S1) had markedly weaker binding affinities of −7.39 and −9.8 kcal/mol, respectively.
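To give these binding free energies a more intuitive scale, the sketch below ranks the reported scores and converts each to an estimated inhibition constant with the standard relation Ki = exp(ΔG / RT). The temperature of 298.15 K is an assumption, and applying this relation to docking scores gives only a rough order-of-magnitude estimate, not a measured value.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)


def estimated_ki(delta_g, temperature=298.15):
    """Estimated inhibition constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Screening scores quoted in the text (kcal/mol).
vina_scores = {
    "ZINC85893430": -12.8,
    "ZINC70699832": -11.7,
    "ZINC70669786": -11.5,
    "danoprevir": -9.8,
    "rutin": -7.39,
}

# Most negative (strongest predicted binding) first.
ranked = sorted(vina_scores, key=vina_scores.get)
for name in ranked:
    dg = vina_scores[name]
    print(f"{name:14s} dG = {dg:7.2f} kcal/mol   Ki ~ {estimated_ki(dg):.2e} M")
```

Because the relation is exponential, a difference of about 1.4 kcal/mol corresponds to roughly a tenfold change in the estimated constant at room temperature.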

Fig. 2 Top three screened ligands (A ZINC70669786, B ZINC70699832, C ZINC85893430)

To increase confidence in the observed binding affinities, a second docking with 100 Genetic Algorithm runs was performed, and the results obtained for the above molecules were tabulated (Table 1).

Table 1 Summary of molecular docking performed with AutoDock4 (residues in bold form hydrogen bond interactions)

The resulting complexes were uploaded to the Protein–Ligand Interaction Profiler (PLIP) [55], an online web server for studying protein–ligand interactions. The output “.pse” file was visualised using PyMOL (Fig. 3).

Fig. 3 Complex formed by A ZINC70699832, B ZINC70669786, C ZINC85893430, D ZINC4096846 (Rutin), E Danoprevir (FDA drug) visualised in PyMOL

3.2 Molecular dynamic simulation

The molecular dynamics simulation of all the samples was carried out in GROMACS using the CHARMM27 force field and the TIP3P water model. The box type was cubic or dodecahedral, and the salt type was NaCl. The number of energy minimisation steps was set to 5000, and NVT and NPT equilibration was carried out at 300 K, with a simulation time of 100 ns. The results are shown in Figs. 4, 5 and 6.

Fig. 4 RMSD comparison of protein–ligand complexes with the apo-protein. A ZINC70699832, B ZINC70669786

Fig. 5 RMSD comparison of protein–ligand complexes with the apo-protein. A ZINC85893430, B Danoprevir (highest screened FDA drug)

Fig. 6 RMSD comparison of protein–ligand complexes with the apo-protein. A ZINC4096846 (Rutin)

3.3 Physiological parameters

The physiological quality (ADMET scores) of the ZINC molecules was examined in ADMETlab 2.0 and tabulated on the basis of molecular weight, Log P and Lipinski drug-likeness. The data in Table 2 suggest that, among all the compounds, ZINC70699832 has the greatest potential as a drug molecule because it has the fewest violations.
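The idea of counting violations can be illustrated with Lipinski's rule of five (molecular weight ≤ 500 Da, Log P ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors). The sketch below is not how ADMETlab 2.0 computes its output, and the descriptor values are hypothetical placeholders, not the values in Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float   # Da
    log_p: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d):
    """Return the list of rule-of-five criteria that a compound breaks."""
    rules = [
        ("molecular weight > 500 Da", d.mol_weight > 500),
        ("Log P > 5", d.log_p > 5),
        ("H-bond donors > 5", d.h_donors > 5),
        ("H-bond acceptors > 10", d.h_acceptors > 10),
    ]
    return [label for label, broken in rules if broken]


# Hypothetical descriptor values, for illustration only.
candidates = {
    "compound_small": Descriptors(350.4, 2.5, 2, 5),
    "compound_large": Descriptors(610.5, -1.3, 10, 16),
}

report = {name: lipinski_violations(d) for name, d in candidates.items()}
best = min(report, key=lambda name: len(report[name]))

for name, broken in report.items():
    print(f"{name}: {len(broken)} violation(s) {broken}")
print("fewest violations:", best)
```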

Table 2 Physiological parameters checked in ADMETlab 2.0

4 Discussion

4.1 Molecular docking

Among the complexes (Fig. 3), ZINC70669786 has the strongest binding energy (−12.03 kcal/mol), but because of its large size and sterically strained structure it cannot fit in the active site cavity and therefore forms very few hydrogen bonds. Consequently, ZINC70669786 cannot interact with crucial residues such as GLU-166, which is responsible for dimerisation of the protein [56]; GLN-189, which is responsible for catalysis [57]; and HIS-41 and CYS-145, which form the catalytic dyad at the N and C termini of the protein [6,7,8,9,10,11,12,13,14]. In contrast, ZINC70699832 shows a binding affinity of −11.05 kcal/mol and interacts with all the critical active site residues. ZINC4096846 (rutin, reference) also interacts with critical residues but, unlike ZINC70699832, fails to interact with GLN-189, and it has a much weaker binding affinity of −7.39 kcal/mol. Both molecules fit well in the active site cavity. In addition to the conventional hydrogen bonds and hydrophobic interactions in the active site cavity, π-stacking and salt bridge formation at the HIS-41 residue can also be seen. π-stacking is a non-covalent interaction, whereas a salt bridge is an ionic interaction; both appear to increase the stability of the complexes. ZINC85893430 and the FDA-approved drug danoprevir have binding affinities of −10.91 and −9.8 kcal/mol, respectively, but because of their large size and sterically strained structures they are unable to fit in the active site cavity. Instead, they attach to a completely different position on the protein, whose function is unknown.

4.2 Molecular dynamic simulation

Figure 4 shows the RMSD comparison of the complexes formed by ZINC70699832 and ZINC70669786, which have stable deviations under 2 Å and 4 Å, respectively. The spikes in the graphs can be explained by the large number of loops in the protein structure. The ZINC85893430 complex, owing to its steric properties and large size, did not achieve a stable simulation and showed deviations as high as 40 Å (Fig. 5A). The control danoprevir shows a stable RMSD graph (Fig. 5B), but it differs from that of the apo-protein by around 10 Å, which makes it unfit as a repurposed drug. The other control molecule, ZINC4096846 (rutin), shows an unstable simulation pattern (Fig. 6A). The RMSD differences between the apo-protein and the protein–rutin complex are not large, but the complex struggled to attain stability throughout the simulation. Singling out one drug molecule from the RMSD graphs alone is difficult, but among the complexes ZINC70699832 clearly shows better stability than the other samples and the control molecules.
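For readers unfamiliar with the quantity being compared, RMSD is the square root of the mean squared displacement of N atoms from their reference positions. The minimal sketch below computes it for toy coordinates; it deliberately omits the least-squares superposition onto the reference that trajectory analysis tools normally apply first, and the coordinates are made up for illustration.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates, no fitting."""
    if not frame or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(trajectory, reference):
    """RMSD of every frame in a trajectory relative to one reference frame."""
    return [rmsd(frame, reference) for frame in trajectory]


# Toy three-atom system; the result is in the same unit as the input.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

series = rmsd_series([reference, shifted], reference)
print(series)
```

A complex is usually read as stable when such a series levels off at a plateau, which is the behaviour described for ZINC70699832 above.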

5 Conclusion

This study focuses on a swift pipeline for drug discovery. The proposed molecule, ZINC70699832, has a favourable binding affinity of −11.05 kcal/mol and, like the reference molecule, interacts with almost all the important residues in the active site cavity of the protein, such as GLN-189, GLU-166, HIS-41 and CYS-145. The protein–ZINC70699832 complex also showed a more stable molecular dynamics simulation than the protein–reference complex, and the drug-likeness of this compound is higher than that of the others. This molecule can be used as a lead in a pharmacophore study to identify compounds that perform even better than ZINC70699832.