1 Introduction

The novel coronavirus was declared a global pandemic after its devastating effect was felt in most countries across the world [1]. Its adverse impact on the health, social, and economic sectors has pressed the need for rapid solutions. Although intense scientific research has resulted in the development of vaccines, the rapid mutation of the virus remains a major concern. As a result, efficacious therapeutics with fewer side effects are required to combat present and future viral strains. Coronaviruses can be classified into four main subgroups (alpha, beta, gamma, and delta) based on their genomic structure [2]. These viruses are well noted for their structural [spike (S), nucleocapsid (N), matrix (M), envelope (E)] and non-structural proteins (nsp1 to nsp16) [3]. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) belongs to the family Coronaviridae and the genus Betacoronavirus. It is a positive-sense, single-stranded RNA virus with a genome of about 30 kb [4]. The proteins encoded by SARS-CoV-2 function together to aid in transcription, replication, transmission, and virulence. SARS-CoV-2 possesses over 50% genome sequence similarity with both SARS-CoV and MERS-CoV [5].

The main protease (Mpro) of SARS-CoV-2 is an important enzyme for cleaving translated polyproteins from the viral RNA. Proteolysis occurs at 11 distinct cleavage sites on the translated polyprotein, with Leu-Gln↓ (Ala, Gly, Ser) specified as the recognition sequence [6]. Mpro has two protomers, with each protomer possessing three domains, domains I, II, and III. Domains I (residues 10–99) and II (residues 100–182) are made up of six-stranded anti-parallel barrels with the substrate-binding site positioned in the middle of the barrel. Domain III (residues 198–303), which is characterized by a globular cluster of five alpha-helices, is responsible for regulating protease dimerization [7] (Fig. 1).
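The recognition sequence described above can be illustrated with a short pattern search. The following is a minimal sketch, assuming one-letter amino acid codes and treating the motif simply as Leu-Gln followed by Ala, Gly, or Ser; the function name and the toy sequence are hypothetical, and real substrate recognition depends on more than this three-residue pattern.

```python
import re

# Motif as stated in the text: Leu-Gln, then Ala, Gly, or Ser at P1'.
# The lookahead leaves the P1' residue outside the match itself.
CLEAVAGE_MOTIF = re.compile(r"LQ(?=[AGS])")

def find_cleavage_sites(sequence):
    """Return 0-based indices of the P1' residue; the bond is cut just before it."""
    return [m.end() for m in CLEAVAGE_MOTIF.finditer(sequence.upper())]

# Toy sequence (hypothetical, not a real polyprotein fragment).
toy = "MKAVLQSGFTTLQAPPLQK"
sites = find_cleavage_sites(toy)

# Split the toy sequence at every predicted cleavage position.
fragments = [toy[i:j] for i, j in zip([0] + sites, sites + [len(toy)])]
print(sites, fragments)
```

In the toy sequence the final Leu-Gln is followed by Lys, so it is not reported as a site.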

Fig. 1

a Sequence map of Mpro showing the various domains. Domain I (residues 10–99) is colored red, Domain II (residues 100–182) blue, and Domain III (residues 198–303) green. b 3D structure of SARS-CoV-2 Mpro with the domains marked. His41 is shown in yellow and Cys145 in red. c Surface representation of the SARS-CoV-2 Mpro dimer showing protomer A (purple) and protomer B (green)

Mpro is catalytically active in a dimeric state, with His41 and Cys145, a catalytic dyad situated in the cleft between domains I and II, playing a crucial role in proteolysis [6]. Some highly conserved subsites (S1′, S1, S2, S4) are also present in the catalytic machinery, with the S1 site conferring the enzyme's specificity for glutamine at the P1 position of the peptide substrate. During the first step of the hydrolysis reaction catalyzed by Mpro, Cys145 serves as a nucleophile with His41 acting as a base catalyst. An oxyanion hole is formed by the backbone amide groups of Gly143, Ser144, and Cys145. These residues are arranged in the so-called oxyanion loop (residues 138–146) and stabilize the partial negative charge that develops at the P1 carbonyl group of the peptide substrate during the hydrolysis of the P1–P1′ bond (Fig. 2) [8]. Because there are no known human proteases with a similar cleavage specificity, Mpro has become an attractive target for developing potential SARS-CoV-2 inhibitors [9].

Fig. 2

a Structures of the peptide substrate, with labels of the SARS-CoV-2 Mpro proteolytic cleavage sites, and of a peptidomimetic inhibitor (N3). b Molecular details of SARS-CoV-2 Mpro polypeptide hydrolysis in the oxyanion hole (blue structures, Gly143, Ser144, Cys145). (1) Substrate (green) binds to the active site such that the thiol group of Cys145 is positioned to attack the substrate carbonyl carbon. Simultaneously, the carbonyl oxygen hydrogen bonds to three -NH groups of the protein and the histidine removes a proton from the cysteine -SH, thus activating it. The three -NH groups are part of the peptide backbone forming a loop called the oxyanion hole (colored in blue). (2) In the transition state for formation of the acyl-enzyme intermediate, negative charge has developed on the carbonyl oxygen; this is also stabilized by hydrogen bonding in the oxyanion hole. The first product is beginning to be released in this step and the bond between the carbonyl carbon of the substrate and the cysteine sulphur is beginning to form. (3) The acyl-intermediate. In this intermediate, the cysteine side chain is covalently bound to the carbonyl carbon of the substrate. The histidine gives up the proton it acquired from the cysteine to the nitrogen of the substrate. (4) In the transition state for hydrolysis of the acyl-enzyme intermediate, a water molecule attacks the carbonyl carbon of the acyl-enzyme as the histidine accepts a proton from it, analogous to the acylation step (2). (5) The transient negative charge on the oxygen of the acyl-enzyme is again stabilized by hydrogen bonding in the oxyanion hole. (6) Formation of product. The transient state collapses to release the other portion of the substrate and regenerate the cysteine -SH. c The oxyanion hole of Mpro. The glutamine of the substrate is shown in magenta. Backbone NH groups of the oxyanion hole residues are colored blue

The viability of Mpro as a potential drug target has seen efforts channeled towards its inhibition by experimental [10] and computational strategies [11]. These efforts have focused on the development of inhibitors that compete with the native substrate at the active site. The effectiveness of these inhibitors could, however, be limited by competition with the natural substrate or by issues of affinity [12]. Exclusion of electrophiles from potential lead compounds for reasons of safety and other negative effects such as carcinogenesis has also limited the pool of compounds available for development [13]. The search for compounds capable of allosteric modulation of Mpro has thus become an attractive option that is being earnestly pursued. El-Baba and colleagues, for example, employed fragment molecules that bind to areas distant from the active site to inhibit Mpro [14]. Sztain and coworkers also used Gaussian accelerated molecular dynamics to identify and characterize three areas of Mpro, namely the active site, dimer interface, and distal site, which could serve as druggable sites for inhibition [15]. The distal site, found in the C-terminal region, was shown to be highly druggable while having an interdependent dynamic effect on the active site, which is located in the N-terminal region [15, 16]. Allosteric inhibition via the proposed distal site of Mpro could serve as a new strategy for targeting SARS-CoV-2.

Herbal remedies have existed in folkloric medicine for a long time [17,18,19]. Some of the plants used in such herbal remedies provide elegant structural scaffolds that can serve as lead compounds in drug discovery efforts. Natural products have served well in this regard, and various successes in other therapeutic areas (artemisinin and quinine for malaria, taxol for cancer) have encouraged a search for antiviral agents from natural products. A major strategy utilized in the identification of potential therapeutic agents against COVID-19 is repurposing, where either old drugs or existing compounds with antiviral activities are screened against the virus. Several natural products have been reported to possess significant antiviral activities and could therefore serve as a starting point for such screens. The alkaloids of Cryptolepis sanguinolenta have been shown to potentially inhibit some of the viral proteins of SARS-CoV-2 in an in silico study [20]. Jo and coworkers also showed that some natural flavonoids possess significant activity against SARS-CoV-2 in an experimental study that utilized a Fluorescence Resonance Energy Transfer (FRET) protease assay. This was corroborated by an in silico study, which further highlighted baicalin, herbacetin, and pectolinarin as potential inhibitors of SARS-CoV-2 Mpro [21]. The use of computational tools to facilitate lead identification and optimization, and to tease out plausible mechanisms of action of lead compounds, is well documented in the literature [22]. In particular, molecular docking and molecular dynamics simulations have proved very useful in this regard [23]. For example, molecular docking was used to identify potent inhibitors of Aurora Kinase A (AKA), and this resulted in the identification of the (4-methoxy-pyrimidin-2-yl)-phenyl-amine scaffold as a promising warhead. Using a computer-aided drug design (CADD) protocol involving structure-based virtual screening, de novo design, and free energy perturbations, the study served as a starting point for the discovery of anticancer therapeutics [24]. Molecular docking and MD simulations have also been used to tease out new targets for HIV-1 therapeutics [25].

The Hibiscus plant is widely used in ethnomedicine to treat a wide range of diseases. Pharmacological evidence has shown that several species from the genus Hibiscus possess anti-diarrhoeic, antihypertensive, anti-inflammatory, antipyretic, hepatoprotective, anti-spermatogenic, antihelminthic, anti-tumor, antidiabetic, anticonvulsant, immunomodulatory, antioxidant, antimutagenic, and antiviral properties. This study therefore aimed to use in silico tools to evaluate the potential of selected natural product-derived compounds to interfere with the normal functioning of the Mpro of SARS-CoV-2, as a means of identifying structural scaffolds that could serve as lead compounds. Fifteen isolated compounds (including gossypin, canthin-6-one, hibiscanal, myriceric acid, syriacusin A, syriacusin B, hibiscatin, gossypetin, cyanin, and rutin) from various plants of the genus Hibiscus were selected for this study based on their reported antiviral action [26,27,28]. Isolated compounds with biological activities from other plants were also considered. Cynaratriol and 1,4-benzoquinone, which are active compounds of Cynara scolymus and Embelia ribes respectively, are known inhibitors of angiotensin converting enzymes (ACE) [29]. Casticin, present in most plants from the genus Vitex, is a known immunomodulatory compound and an inhibitor of the Lassa virus [30] and was therefore also included in the study. Casticin was observed to interact with Mpro primarily at its distal site and hence was hypothesized to be capable of modulating the activity of the enzyme through a possible allosteric mechanism. Molecular dynamics simulation was used to investigate the effect of casticin binding on events at the active site of the enzyme. We herein report that casticin binding at the distal site distorts the orientation of the peptide substrate, and this could markedly decrease, or potentially abolish, enzyme catalytic activity.

2 Methods

2.1 Pharmacokinetic Properties

The structures of the compounds used are shown in Fig. 3.

Fig. 3

Chemical structures of compounds used in this study

The SwissADME webserver was used to analyze the pharmacokinetic properties of the compounds [31]. Lipinski’s rule of five was employed in assessing the drug-likeness of the selected compounds [32]. Drug-likeness properties investigated included absorption, distribution, metabolism, and excretion. To be drug-like, a compound was required to have a molecular weight < 500, logP < 5, hydrogen-bond donors < 5, and hydrogen-bond acceptors < 10. Compounds with zero violations of these requirements were predicted to be the most drug-like.
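The screening criterion above amounts to counting threshold violations per compound. The sketch below is illustrative only: the class and function names are hypothetical, the descriptor values are made up rather than taken from Table 1, and the thresholds are applied as strict inequalities exactly as written in the text. The commonly quoted form of Lipinski's rule uses "no more than" for each limit, which the `inclusive` flag reproduces.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d, inclusive=False):
    """Count violated criteria; inclusive=True treats the limits as 'no more than'."""
    def ok(value, limit):
        return value <= limit if inclusive else value < limit
    checks = [
        ok(d.mol_weight, 500),
        ok(d.logp, 5),
        ok(d.h_donors, 5),
        ok(d.h_acceptors, 10),
    ]
    return checks.count(False)

# Hypothetical descriptor sets, for illustration only.
small_flavonoid = Descriptors(mol_weight=374.3, logp=2.4, h_donors=2, h_acceptors=8)
bulky_glycoside = Descriptors(mol_weight=520.0, logp=5.07, h_donors=5, h_acceptors=10)

print(lipinski_violations(small_flavonoid))                  # passes the zero-violation filter
print(lipinski_violations(bulky_glycoside))                  # fails under strict limits
print(lipinski_violations(bulky_glycoside, inclusive=True))  # fewer violations with inclusive limits
```

A zero-violation filter, as used in this work, would keep only compounds for which the count is 0.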

2.1.1 Ligand Preparation and Substrate Peptide Modeling

The three-dimensional structures of the library of selected “drug-like” compounds were modeled and optimized using quantum mechanical methods as reported elsewhere [20, 33]. To monitor events at the substrate-binding site of Mpro, a peptide substrate was required. A peptide that mimics the natural substrate of Mpro and bears the recognition sequence needed for catalytic activity, Ala-Val-Leu-Gln-Ser, was designed. The residues of the peptide were designated P4, P3, P2, P1, and P1′, respectively. This peptide (Fig. 2) was labeled N-3′ (due to its similarity to N3, a known peptidic inhibitor of Mpro). The energy of N-3′ was minimized in Discovery Studio 2017 R2 Client [BIOVIA, Discovery Studio Visualizer (2017 R2 Client), San Diego: Dassault Systèmes (2017)] with the CHARMM force field. All ligands were prepared for docking by adding polar hydrogens, merging non-polar hydrogens, and calculating Gasteiger charges for all atoms [20].

2.1.2 Protein Selection and Preparation

The structure 6WNP (X-ray structure of SARS-CoV-2 main protease bound to boceprevir at 1.45 Å) was obtained from the Protein Data Bank (PDB). To prepare the protein for docking, all crystallographic water molecules and co-crystallized ligands were removed, polar hydrogens were added, and Gasteiger charges were calculated for all receptor atoms. All these steps were performed with AutoDockTools-1.5.7rc1 [34].

2.1.3 Molecular Docking

Receptors used for molecular docking simulations included the SARS-CoV-2 main protease under the PDB accession codes 6LU7 and 6WNP. The apo protein was obtained by deleting all co-crystallized ligands and solvents. Global docking was performed with AutoDock Vina embedded in PyRx to ascertain the preferential binding modes of the ligands and their respective binding affinities. Ligand candidates for the binary complex were chosen based on the highest binding affinity at the binding site that contained the majority of the predicted ligand conformations. Following these considerations, casticin was ranked best, and molecular docking with AutoDock Vina in UCSF Chimera was used to model ternary complexes. The peptide substrate from 6LU7 was initially docked into the substrate-binding site of 6WNP. To generate the ternary complex, casticin was subsequently docked to the 6WNP-peptide complex using the following grid dimensions: center: x = −13.4736, y = 10.7715, z = −0.583865 and size: x = 23.4867, y = 17.6548, z = 20.9149. The casticin pose selected for further MD analysis was required to interact with at least one of the residues Ser284, Ala285, and Leu286.
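For readers who wish to reuse the search box, the grid center and size can be written as AutoDock Vina configuration lines. The sketch below is a hedged illustration: the helper names are hypothetical, the z coordinate of the center is taken as −0.583865 (an assumed reading of the printed value), and the box dimensions are assumed to be in ångströms, as is conventional for Vina.

```python
def vina_box_config(center, size):
    """Format a search box as AutoDock Vina configuration lines."""
    lines = [f"center_{axis} = {c}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s}" for axis, s in zip("xyz", size)]
    return "\n".join(lines)

def box_corners(center, size):
    """Return the minimum and maximum corners of the box along each axis."""
    lower = tuple(c - s / 2 for c, s in zip(center, size))
    upper = tuple(c + s / 2 for c, s in zip(center, size))
    return lower, upper

center = (-13.4736, 10.7715, -0.583865)  # z is an assumed reading of the printed value
size = (23.4867, 17.6548, 20.9149)

config = vina_box_config(center, size)
lower, upper = box_corners(center, size)
print(config)
print(lower, upper)
```

The corner coordinates give a quick way to check whether residues such as Ser284, Ala285, and Leu286 fall inside the search volume.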

2.2 Molecular Dynamics Simulation

2.2.1 System Preparation and MD Production

Complexes considered for MD simulations were built in UCSF Chimera. The peptide substrate was placed into the active site by superimposing the 6LU7-peptide complex obtained from the work of Suarez et al. [35] onto 6WNP. The gmx pdb2gmx tool was used to construct protein topologies using the CHARMM36 all-atom force field in GROMACS v. 2018.2 on a local workstation. The CGenFF webserver was used to produce ligand topologies [36]. A high-performance, parallel GROMACS v. 2018.6 [37] module, pre-compiled on the Lengau cluster (Center for High-Performance Computing, Cape Town, South Africa), was used for all MD production runs. Complexes were solvated with the TIP3P water model in a dodecahedral box. Sodium and chloride ions were added to neutralize each system. Energy was minimized without constraints using the steepest descent algorithm for up to 50,000 steps or until a tolerance of 10 kJ/mol/nm was reached. Each system was first temperature-equilibrated at 300 K for 100 ps under constant number and volume using the V-rescale (modified Berendsen) thermostat, and then pressure-equilibrated at 1 atm for 100 ps using the Berendsen barostat. Bond constraints were handled with the linear constraint solver (LINCS) algorithm, while long-range electrostatics were treated with the particle-mesh Ewald technique. Coulomb and van der Waals cut-offs were set to 1.2 nm. The time step for the production run was set at 2 fs, with trajectories written every 10 ps. Finally, periodic boundary conditions (PBC) were set for every production run of 50 ns [38].

Trajectory Analysis, Visualization, and Plotting

The total pressure, temperature, and potential and kinetic energies of all systems were monitored before the production run. The gmx trjconv tool was used to correct all output trajectory files for periodic boundary effects prior to further analyses. Root mean square deviation (RMSD), radius of gyration (Rg), and root mean square fluctuation (RMSF) were computed using the gmx rms, gmx gyrate, and gmx rmsf tools, respectively [39]. Visualization of trajectories was performed with VMD 1.9.3 [40] using the generated structure (.gro) and PBC-corrected compressed trajectory (.xtc) files of all complexes. Molecular graphics images were generated with UCSF Chimera and Discovery Studio (DS) where necessary. Datasets were plotted with Xmgrace.

Binding Free Energy Calculations

The average binding energies were estimated with g_mmpbsa, a tool that works with GROMACS trajectories. The estimated binding energies comprise molecular mechanical (EMM) as well as polar (Gpol) and apolar (Gapol) solvation terms. EMM is the molecular mechanics contribution of each system in vacuum and consists of bonded (bond stretching, angle bending, dihedrals) and non-bonded (electrostatic and van der Waals) interactions. For Gpol, g_mmpbsa solves the Poisson–Boltzmann equation, while the solvent accessible surface area (SASA) model was used for Gapol estimations. The available Python scripts (MmPbSaStat.py and MmPbSaDecomp.py) were used for these calculations [41].
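The way these terms combine can be sketched in a few lines. This is a simplified illustration, not the g_mmpbsa implementation: the function name is hypothetical, the per-frame numbers are invented for the example, and the entropic contribution is assumed to be omitted from the estimate.

```python
from statistics import mean

def binding_energy(vdw, elec, polar, apolar):
    """Sum the end-point terms (kJ/mol): E_MM (vdW + electrostatic) plus solvation."""
    e_mm = vdw + elec          # molecular mechanics term in vacuum
    g_solv = polar + apolar    # polar and apolar solvation terms
    return e_mm + g_solv

# Hypothetical per-frame energy differences (complex minus protein minus ligand), kJ/mol.
frames = [
    {"vdw": -100.0, "elec": -30.0, "polar": 70.0, "apolar": -10.0},
    {"vdw": -104.0, "elec": -34.0, "polar": 74.0, "apolar": -12.0},
]

per_frame = [binding_energy(**f) for f in frames]
average = mean(per_frame)
print(per_frame, average)
```

Averaging over frames is what yields a single reported binding energy for a trajectory; a favorable (negative) total can coexist with an unfavorable polar solvation term.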

3 Results

3.1 Pharmacokinetic Predictions for Ligands

To be considered drug-like, a compound should not violate more than one of the following requirements set by Lipinski: molecular weight < 500, logP < 5, hydrogen-bond donors < 5, and hydrogen-bond acceptors < 10 [32]. A summary of the pharmacokinetic properties of the compounds is presented in Table 1. For this work, a strict criterion of zero (0) violations was used for ligand candidate selection. The logP values of most of the compounds were less than 5, ranging from −3.89 to 2.41. Myriceric acid, however, was an exception, with a logP value of 5.07. Two glycosides, rutin and cyanin, as well as myriceric acid (with 6 rings), had molecular weights greater than 500. The rest of the compounds had molecular weights of less than 500. Gossypetin, hibiscatin, hibiscetin, rutin, cyanin, and gossypin failed the hydrogen-bond donor requirement, since they possessed more than 5 hydrogen-bond donor groups. Hibiscatin, rutin, cyanin, gossypin, and hibiscitrin had more than 10 hydrogen-bond acceptors, violating the hydrogen-bond acceptor rule. In all, 8 compounds (myriceric acid, rutin, cyanin, gossypetin, hibiscatin, hibiscetin, hibiscitrin, and gossypin) violated at least one of Lipinski’s criteria. The compounds that did not violate any of the set rules were 1,4-benzoquinone, canthin-6-one, casticin, cleomiscosin A, C and D, cynaratriol, hibiscanal, and syriacusin A and B. These compounds were therefore selected for molecular docking studies.

Table 1 Pharmacokinetic properties of compounds

3.2 Molecular Docking Studies

The results of the blind docking of ligands with Mpro are summarized in Fig. 4 and Table 2. The ligands showed a binding preference for three notable sites on Mpro: the active site, the active site-distal site interface, and the distal site (Fig. 4). Canthin-6-one, casticin, and hibiscanal had the majority of their poses in the distal site. Cleomiscosin A, C, and D were clustered mainly at the active site of the protease. The other ligands had poses scattered around the aforementioned binding pockets. From the clusters obtained, the docking scores for the best pose of each compound were −6.4 kcal/mol for canthin-6-one, −7.1 kcal/mol for casticin, and −5.9 kcal/mol for hibiscanal. Casticin showed the highest affinity for the distal site and the highest proportion of poses there (78%); it therefore formed the binary complex of choice, hinting at a possible allosteric effect. Casticin was then subjected to precision docking at the Mpro distal site. The best pose considered for MD simulations was selected based on high binding affinity and interaction with the regulatory residues Ser284, Ala285, and Leu286. N-3′ was placed in the active site with the help of UCSF Chimera to generate an Mpro-N-3′ (binary) complex and an Mpro-N-3′-casticin (ternary) complex. These complexes were then subjected to MD simulations.
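The site preference reported above is a matter of tallying where each docked pose lands. A minimal sketch follows, assuming each pose has already been assigned a site label; the labels and counts here are hypothetical and are not the values in Table 2.

```python
from collections import Counter

def site_fractions(pose_sites):
    """Return the fraction of docked poses found at each binding site."""
    counts = Counter(pose_sites)
    total = len(pose_sites)
    return {site: n / total for site, n in counts.items()}

# Hypothetical site assignments for the poses of one ligand.
poses = ["distal"] * 7 + ["active"] * 2

fractions = site_fractions(poses)
preferred = max(fractions, key=fractions.get)
print(preferred, round(fractions[preferred] * 100))
```

Ranking ligands by the fraction of poses at the distal site, together with the docking score of the best pose, is one simple way to shortlist candidates for allosteric binding.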

Fig. 4

a Collection of docked poses of compounds after blind docking, depicting preferential ligand binding regions. Protein is shown in grey while binding modes of ligands are shown in red. b Figure showing the low energy clusters of casticin at the active and distal sites

Table 2 Binding locations of the different poses of the selected compounds in blind docking against Mpro

3.3 MD Simulations

To explore the allosteric potential of casticin, MD simulations were performed in explicit solvent. Since the SARS-CoV-2 main protease performs its catalysis by recognizing conserved glutamine at the P1 site (i.e., cleavage occurs at the P1−P1′ peptide bond), Mpro-N-3′ (binary complex) was considered to provide insights into the orientation of the peptide substrate during the catalytic event across the simulation period. Another complex, with N-3′ bound at the active site and casticin at the distal site, was generated to explore the effect casticin may have on the orientation of N-3’ across time. This casticin—Mpro—N-3′ system was the ternary complex used. An unbound protein (apo form) was used to explore the protein dynamics in the bound and unbound states. The binary and ternary complexes were sampled at various time frames to explore the changes occurring across time. This was to unravel the effect of casticin binding on the orientation of the peptide substrate at the catalytic site. The 3D structures of the complexes were extracted from trajectory times at 10 ns, 20 ns, 30 ns, 40 ns, and 50 ns to sample the trajectories across the entire simulation period. The sampled structures were compared to their minimized structures (0 ns) prior to simulation to explore the change in dynamic events. 3D and 2D interactions were used to elucidate detailed observations made across the simulation. Comparative studies were made for both complexes at the selected time frames.

3.3.1 Protein-Peptide (binary) Complex

In the initial structure of the binary complex, N-3′ interacted with Mpro via hydrogen bonds with Asn142, Gly143, Ser144, Cys145, His163, Glu166, Pro168, Gln189, and Gln192 and hydrophobic interactions with His41, Met49, and Met165. In particular, the carbonyl group of the glutamine residue of N-3′ formed hydrogen bond interactions with the backbone NH of Gly143, Ser144, and Cys145, with bond distances of 2.04, 2.97, and 2.76 Å respectively. Gly143, Ser144, and Cys145 form the all-important oxyanion hole in the S1 pocket of the active site of Mpro during proteolysis. Other hydrogen bonds and non-polar interactions were observed between the remaining residues of Mpro and N-3′. These residues aid in the binding of N-3′ in the active site. At 10 ns, a slight change in the orientation, and hence in the bonding interactions, of N-3′ was observed. While interactions with residues such as Glu166, Met49, and Met165 persisted, some contacts were lost and a few new ones were formed. The carbonyl group of the glutamine residue of the peptide substrate (N-3′) maintained its hydrogen bond interactions with Gly143 (1.86 Å) and Ser144 (1.94 Å) of the oxyanion hole but lost contact with Cys145 (> 5 Å). Similar observations were made at 20 ns: the glutamine residue of N-3′ maintained interactions with Gly143 (1.79 Å) and Ser144 (3.7 Å) but none with Cys145. Interestingly, the hydrogen bond interactions of the N-3′ glutamine carbonyl with the oxyanion hole residues (Gly143, Ser144, and Cys145) were restored at 30 ns. Even though interactions with other residues differed across the rest of the trajectory, the hydrogen bond contact between the glutamine carbonyl and the oxyanion hole persisted throughout the rest of the simulation at distances ranging from 1.91 to 3.0 Å (Figs. 5a and S1).

Fig. 5

Influence of allosteric binding of casticin on the interactions of the oxyanion hole residues with the peptide substrate, shown by comparative 3D and 2D interaction analysis of the binary and ternary complexes. 3D diagrams show the peptide substrate poses in the active site. The oxyanion hole is indicated by red-yellow-green surface patches. Hydrogen bond interactions (green dashes) of the P1 carbonyl group of the peptide substrate (black) with the backbone amide groups of the oxyanion hole are shown in 3D interaction diagrams (bottom panels of a, b). Residues are shown as green stick models, with the backbone amide groups labeled in blue. a, b Trajectory-based, time-specific analysis of the conformations of N-3′ and the presence of oxyanion hole residues in the binary (a) and ternary (b) complex at 0, 10, 20, 30, 40, and 50 ns. c 2D view of interactions of casticin at the distal site in the ternary complex

3.3.2 Peptide-Protein-Casticin (ternary) Complex

Events at the Active Site

The initial structure of the ternary complex featured interactions between N-3′ and Mpro similar to those seen in the binary complex. Hydrogen bond interactions between Mpro and N-3′ involved Thr25, Leu141, Asn142, Gly143, Ser144, Cys145, Glu166, and Gln189, while hydrophobic interactions with His41, Met49, and Met165 were observed. As in the binary complex, the glutamine carbonyl made hydrogen bond contacts with the oxyanion hole residues Gly143, Ser144, and Cys145 at 1.79, 2.06, and 2.53 Å respectively. A change in residue contacts was observed at 10 ns. Interestingly, the peptide substrate lost interactions with all 3 oxyanion hole residues, resulting in large distances (> 5 Å) as seen in Fig. 5b. Even though N-3′ had strong hydrogen bond interactions with Gly143 (2.43 Å) and Cys145 (2.45 Å) at 20 ns, these interactions were with the Ser residue of N-3′ and not the glutamine residue as seen in the binary complex. The glutamine carbonyl instead interacted with Glu166 via a hydrogen bond at a distance of 2.80 Å. No interaction was observed between the glutamine residue of the peptide substrate and any of the oxyanion hole residues at 30 ns, with all distances exceeding 5 Å. At 40 ns, the glutamine residue interacted with only Gly143 (1.84 Å), and a similar scenario prevailed at 50 ns. Details of the interactions are shown in Figs. 5b and S2.

Events at the Distal Site

In the starting structure, casticin interacted with Mpro via hydrogen bonds to Thr199, Ala285, Leu286, and Asp289, and hydrophobic interactions with Leu286. The residues Ala285 and Leu286 have been reported in previous works to be important for modulating activity at the active site of the protein. Much focus was therefore placed on these residues during the analysis. At 10 ns, Leu286 formed a hydrogen bond with the oxygen in casticin’s pyran ring (2.72 Å) and a hydrophobic interaction with the pi system of ring A (4.76 Å). This was accompanied by hydrogen bond interactions with Leu271 (2.80 Å), Tyr239 (2.19 Å), Leu287 (2.16 Å), Met276 (2.74 Å), and Gly275 (2.42 Å). Interaction with Ala285 was, however, lost. The snapshot at 20 ns revealed hydrogen bond contact with Ala285 but not with Leu286, while hydrogen bonds with other distal site residues were observed. Although interactions with Ala285 and Leu286 were lost at 30 ns, they were restored at 40 ns via hydrophobic interactions (4.85 Å). However, binding of casticin at the distal site was short-lived, as it exited the site after 40 ns. A summary of the interactions is shown in Fig. 5c.

3.3.3 Comparison of Binary and Ternary Complexes

Both complexes exhibited similar interactions between N-3′ and the active site residues at the start of the MD run. Interactions in the binary complex involved His41, Met49, Asn142, Gly143, Ser144, Cys145, His163, Met165, Glu166, Pro168, Gln189, and Gln192, while interactions with Thr25, His41, Met49, Leu141, Asn142, Gly143, Ser144, Cys145, Met165, Glu166, Gln189, and Thr190 were observed in the ternary complex. Notably, both complexes exhibited strong hydrogen bond interactions between the glutamine carbonyl of N-3′ and the backbone NH groups of Gly143, Ser144, and Cys145. At 10 ns and 20 ns, the hydrogen bond interactions of Gly143 and Ser144 with the glutamine carbonyl of N-3′ were still present in the binary complex. In the ternary complex, however, hydrogen bond interactions between the glutamine carbonyl and the three oxyanion hole residues were lost. An interesting observation was made at 30 ns. In the binary complex, the glutamine carbonyl of N-3′ established hydrogen bond interactions with the backbone NH groups of Gly143, Ser144, and Cys145. This is in contrast to the ternary complex, where none of these hydrogen bond interactions was observed. At 40 ns, the hydrogen bond interactions in question were retained in the binary complex. In the ternary complex, only the hydrogen bond interaction with Gly143 was observed. These observations made in both complexes at 40 ns persisted to the end of the simulation.
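The comparison can be condensed into a count of oxyanion-hole contacts per snapshot. The sketch below uses only the 0 ns and 10 ns distances quoted in the text for the glutamine carbonyl; the 3.5 Å cutoff is an assumption made here for illustration (no cutoff is stated), and contacts reported as lost (> 5 Å) are stored as None.

```python
CUTOFF = 3.5  # Å; assumed hydrogen-bond distance cutoff, not taken from the text

# Glutamine carbonyl to backbone NH distances (Å) as quoted in the text.
distances = {
    ("binary", 0): {"Gly143": 2.04, "Ser144": 2.97, "Cys145": 2.76},
    ("binary", 10): {"Gly143": 1.86, "Ser144": 1.94, "Cys145": None},
    ("ternary", 0): {"Gly143": 1.79, "Ser144": 2.06, "Cys145": 2.53},
    ("ternary", 10): {"Gly143": None, "Ser144": None, "Cys145": None},
}

def contacts(frame, cutoff=CUTOFF):
    """List oxyanion-hole residues within the cutoff of the P1 carbonyl."""
    return [res for res, d in frame.items() if d is not None and d <= cutoff]

summary = {key: len(contacts(frame)) for key, frame in distances.items()}
for (system, time_ns), n in summary.items():
    print(f"{system:8s} {time_ns:3d} ns: {n} of 3 oxyanion-hole contacts")
```

Under this assumed cutoff, the binary complex keeps most of its contacts at 10 ns while the ternary complex keeps none, which mirrors the qualitative trend described in this section.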

3.4 Stability of Complexes

The RMSD of the apo protein, the binary complex, and the ternary complex were analyzed to determine the deviation of the protein backbone atoms in reference to their initial positions in the trajectory. This was used to examine the complexes’ stability across the simulated time frames. As can be seen in Fig. 6a, an RMSD value of ~ 0.3 nm was recorded for both binary and ternary complexes, while a much higher ~ 1.3 nm was observed for the apo protein. The RMSD of N-3’ and casticin was used to determine the deviation of their heavy atoms to the protein backbone. This was used to measure their stability in their respective pockets in the dynamic event. In the binary complex, the peptide substrate had an RMSD value of ~ 0.5 nm for the first 25 ns, which later increased to ~ 1 nm for the rest of the simulation (Fig. 6b). In the ternary complex, an RMSD value of ~ 0.4 nm was recorded for the peptide substrate from the start of the simulation and this was stable throughout the rest of the simulation period. RMSD for casticin was measured as ~ 0.4 nm for the initial 40 ns of the trajectory. However, there was a huge rise in the RMSD value (~ 8 nm) after 40 ns to the end of the simulation (Fig. 6c). This corresponded to casticin exiting its binding pocket after 40 ns.
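For reference, the RMSD reported here is the root of the mean squared displacement of a set of atoms relative to a reference frame. The following is a minimal pure-Python sketch with made-up coordinates; it omits the least-squares superposition that trajectory tools such as gmx rms perform before measuring the deviation, and the function name is hypothetical.

```python
import math

def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) points, without fitting."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for point, ref_point in zip(coords, reference)
        for a, b in zip(point, ref_point)
    )
    return math.sqrt(squared / len(coords))

# Two hypothetical atoms, positions in nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]

print(round(rmsd(frame, reference), 4))
```

A ligand that leaves its pocket, as casticin does after 40 ns, shows up as a jump of this quantity to values far larger than the dimensions of the binding site.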

Fig. 6

Summary of system stability analysis from RMSD, RMSF, and Rg plots of the apo protein and bound complexes. a RMSD of the Cα atoms of the protein. Bound complexes were relatively stable compared to the apo protein. b RMSD of the binary system relative to the first MD frame. The peptide substrate showed deviation from 25 to 50 ns, which was attributed to torsions in the ligand. The complex was, however, stable throughout the simulation. c RMSD of the ternary complex relative to the first MD frame. Even though the peptide substrate had a lower RMSD value, its conformation was disoriented compared to the binary complex. Casticin binding was generally stable for 40 ns (RMSD ≤ 1.5 nm), before casticin exited the distal site (RMSD ≥ 1.5 nm and ≥ 8 nm). d Radius of gyration of the apo protein and bound complexes. The protein in the bound complexes was more compact than the apo protein. e RMSF of the apo protein and bound complexes. Fluctuations were observed mostly in the N-terminal region and the C-terminal domain compared to the other regions of the protein

The root mean square fluctuation (RMSF) was used to identify the regions of the protein that fluctuate most during the simulation period. This provides insight into the effect of ligand binding on the flexibility of the protein. Unsurprisingly, the graphs show that the most flexible regions occurred at the N and C termini of the protein, with RMSF values less than 0.5 nm. Although similar observations were made in most parts of the various protein structures under consideration (apo, binary, and ternary complexes), residues 50–80 and 190–220 of the apo protein exhibited higher fluctuations compared to the bound states. The protein in its bound state therefore undergoes fewer fluctuations than in its unbound state, indicating stabilization upon binding of the peptide substrate alone or of the peptide and casticin together (Fig. 6e). To determine the compactness of the protein structures, the radius of gyration (Rg) was determined. The Rg of the apo protein was ~2.33 nm, whereas an Rg of ~2.23 nm was observed in the bound (binary and ternary) complexes. Comparing the apo structure to the bound complexes, the protein in its bound states was slightly more compact (Fig. 6d).

3.5 Binding Free Energy Calculations

End-point binding free energies were computed to assess the binding affinity and stability of casticin at the distal site. The van der Waals, electrostatic, and non-polar solvation energies contributed favorably to the total binding energy of casticin, at − 100.156 kJ/mol, − 36.768 kJ/mol, and − 11.485 kJ/mol, respectively. The polar solvation energy, however, made an unfavorable contribution of 71.257 kJ/mol. The overall binding free energy of casticin was reported as − 74.760 kJ/mol. A summary of the results can be found in Table 3. The negative binding free energy indicates a favorable, stable interaction between casticin and Mpro.
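The way the individual terms combine can be sketched as a simple sum, assuming the usual MM-PBSA decomposition into van der Waals, electrostatic, polar solvation, and non-polar solvation terms. The dictionary keys and function name below are illustrative only. The plain sum of the four quoted means is about −77.15 kJ/mol; an overall value averaged over a different set of frames need not match this sum exactly.

```python
# Mean energy components quoted in the text (kJ/mol)
components_kj_mol = {
    "van_der_waals": -100.156,
    "electrostatic": -36.768,
    "polar_solvation": 71.257,      # unfavorable (positive) term
    "nonpolar_solvation": -11.485,
}

def total_binding_energy(components):
    """Sum the MM-PBSA components into a single binding energy."""
    return sum(components.values())

total = total_binding_energy(components_kj_mol)
print(round(total, 3))  # about -77.152
```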

Table 3 MM-PBSA calculations of casticin in the ternary complex (kJ/mol)

4 Discussion

Computer-aided drug design (CADD) has become one of the most important methodologies in modern drug development because it can greatly reduce the cost and effort involved in the drug discovery process. CADD speeds up lead development by cutting down experimental effort. In particular, molecular docking, molecular dynamics, and ADME technologies have become crucial aspects of CADD [42, 43]. Drugs with good pharmacokinetic features are needed to meet efficacy and safety requirements, and these features must therefore be optimized to pass standard clinical trials [44]. Of the 19 compounds considered in this study, 11 showed zero violations of Lipinski's rule of five, which indicates high drug-likeness. Compounds with MW < 500 are usually drug-like because they can diffuse rapidly across cell membranes to reach the target site. Likewise, an appropriate balance of hydrophilicity and lipophilicity ensures that a compound is permeable enough to reach its target in the body. The numbers of hydrogen bond donors (≤ 5) and acceptors (≤ 10) play a major role in oral bioavailability by affecting diffusion across cell membranes [32]. The pharmacokinetic properties of these 11 selected compounds therefore merited further investigation as potential anti-SARS-CoV-2 agents. Given the important role of Mpro in the viral cycle of the coronavirus (cleavage of the translated polypeptide into functional subunits), there have been numerous efforts to identify potent inhibitors of this protease. Many natural products have been explored as potential inhibitors of the major SARS-CoV-2 viral proteins [45,46,47]. Some of these natural products were suggested to act as active site inhibitors of Mpro. While no natural product or natural product-derived compound has yet been approved as a drug for COVID-19, the surge in research highlights the significance of natural products in the development of potential anti-SARS-CoV-2 drugs.
Blind docking studies revealed various binding modes of the compounds to Mpro. Notably, three binding sites were observed: the active site, the active site-distal site interface, and the distal site (Table 2/Fig. 4). Several promising strategies for inhibiting Mpro have been investigated, including covalent inhibition of the active site by peptidomimetic inhibitors such as N3 and boceprevir [10, 48]. Non-covalent modulators, such as compounds that alter the catalytic activity of the protease by binding at sites distant from the active site, have not been studied extensively. Recently, Sztain and coworkers used Gaussian accelerated molecular dynamics to identify and characterize two areas of Mpro, namely the dimer interface and the distal site, which could serve as druggable sites for allosteric inhibition. The distal site, found in the C-terminal region, was predicted to be highly druggable by the PockDrug webserver, which estimates druggability probability from a variety of factors including geometry, hydrophobicity, and aromaticity [15]. Mass spectrometric analyses by El-Baba and coworkers identified small molecules that inhibit Mpro through a potential allosteric mechanism by binding at the distal site. Hydrogen bonding of these allosteric inhibitors (such as 2-{[(1H-benzimidazol-2-yl)amino]methyl}phenol and (2R,3R)-1-benzyl-2-methylpiperidin-3-ol) with Thr196, Thr198, and Glu240 of Mpro was observed [14]. Of the molecules screened in this study, casticin stood out for its drug-like and allosteric features. In addition to showing zero violations of Lipinski's rule of five, 78% of its poses were in the distal site, and its binding affinity was the strongest of all compounds that docked to Mpro in the distal site. Casticin, a natural product identified in Achillea millefolium, Artemisia annua, and most Vitex species (e.g., Vitex agnus-castus), has been reported to have immunomodulatory and anti-inflammatory effects on the lungs [49]. This flavonoid has also been reported to have an antiviral effect against Lassa virus by blocking low-pH-induced membrane fusion [30] and to possess anticancer, antifibrotic, and anthelmintic effects [50, 51].
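The drug-likeness filter referred to above can be sketched as a count of rule-of-five violations. This is a minimal illustration using the commonly stated thresholds (MW ≤ 500, logP ≤ 5, at most 5 hydrogen bond donors, at most 10 acceptors); the function name and the descriptor values are hypothetical and are not taken from the compounds in this study.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five for one compound."""
    violations = 0
    if mol_weight > 500:     # molecular weight above 500 Da
        violations += 1
    if logp > 5:             # octanol-water partition coefficient above 5
        violations += 1
    if h_donors > 5:         # more than 5 hydrogen bond donors
        violations += 1
    if h_acceptors > 10:     # more than 10 hydrogen bond acceptors
        violations += 1
    return violations

# Hypothetical descriptor values for illustration only
print(lipinski_violations(350.0, 2.0, 2, 8))    # within all four limits
print(lipinski_violations(650.0, 6.2, 7, 12))   # exceeds all four limits
```

A compound with zero violations passes the filter, which is the criterion the 11 selected compounds are described as meeting.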

The coronavirus main protease invariably recognizes peptide cleavage sites within polyproteins that carry a strictly conserved glutamine residue at the P1 position [52]. Because the hydrolysis event mediated by the enzyme takes place at the P1–P1′ amide bond (Fig. 2a), the glutamine residue must be correctly positioned in the active site [53]. The insertion of the glutamine carbonyl group into the protease's oxyanion hole, formed by Gly143, Ser144, and Cys145, is a critical prerequisite for catalysis (Fig. 2b). Oxyanion holes are formed by two or three hydrogen bond donors oriented towards a central atom in the substrate. In cysteine proteases, the oxyanion hole contains a cysteine residue whose thiol group reacts with the carbonyl group of the substrate, resulting in the tetrahedral intermediate [54]. The optimum binding arrangement for anions also improves binding of the reaction's starting materials [55]. As the reaction proceeds towards the transition state, insertion of the substrate carbonyl group into the oxyanion hole is the arrangement that best stabilizes the transition state. Moreover, since the oxyanion hole plays a major role in orienting the substrate properly for hydrolysis, any substrate that does not occupy the hole would not meet the structural requirements of the protease [56, 57]. The presence of the oxyanion hole in cysteine proteases is therefore vital for proteolysis [6, 58].

In the initial structure of the binary complex, the carbonyl group of the glutamine residue was well placed within the oxyanion hole, as defined by hydrogen bond interactions with the backbone amide groups of Gly143, Ser144, and Cys145 (Fig. 5a). Within the first 20 ns of the trajectory, the hydrogen bonding network that established the oxyanion hole was unstable, probably because the peptide was searching for a stable position within the active site from which to establish the interactions required for the oxyanion hole. By 30 ns, the glutamine–oxyanion hole network was re-established, and this conformation persisted for the remainder of the simulation. Stable hydrogen bond contacts between the glutamine carbonyl and the oxyanion hole stabilize the transition state, which lowers the activation energy required for the reaction and so promotes catalysis. The alignment of the substrate chain, together with the accommodation of the glutamine of the peptide substrate in the S1 pocket, effectively locks the P1–P1′ amide bond into the Gly143/Ser144/Cys145 oxyanion hole. This observation in the protein–peptide substrate complex suggests a stable orientation of the cleavage site for catalytic activity.

Analysis of the ternary complex sheds light on the effect of casticin binding on the orientation of the peptide substrate at the active site. In the starting structure, the glutamine carbonyl of the peptide substrate was inserted into the oxyanion hole, forming hydrogen bond contacts with the backbone NH groups of the oxyanion hole residues. For the first 20 ns, the peptide substrate searched the active site for a stable conformation with the pocket residues. This distorted the position of the glutamine carbonyl in the oxyanion hole pocket. However, in contrast to the binary complex, in which the hydrogen bond network stabilized at 30 ns, the glutamine of the peptide substrate was never restored to its initial position, leading to the observed disorientation of the glutamine–oxyanion hole hydrogen bond network in the ternary complex. This was confirmed by large distances (> 5 Å) between the glutamine carbonyl and the backbone NH groups of the oxyanion hole residues (Fig. 5b). At 40 and 50 ns, the glutamine of the peptide substrate was observed to move closer to the S1 pocket oxyanion hole. However, the glutamine carbonyl was stabilized by a hydrogen bond with Gly143 only. The loss of hydrogen bond contacts with the other oxyanion hole residues (Ser144 and Cys145) could destabilize the peptide substrate orientation and thus reduce the rate of hydrolysis, or abolish catalysis altogether. Disruption of the hydrogen bonding network that forms the oxyanion hole in subtilisin via site-directed mutagenesis revealed that the catalytic contribution of the oxyanion hole to the overall stability of the complex was about 3.2–3.4 kcal/mol [54]. It has also been reported that double mutations that removed two hydrogen bond donors in the oxyanion hole of 3-oxo-Δ5-steroid isomerase from Pseudomonas putida and Comamonas testosteroni resulted in a 10⁵–10⁸-fold decrease in catalytic activity [59, 60].
Furthermore, strong interactions between N-3′ and the active site residues in the ternary complex might prevent the rotation needed for N-3′ to achieve the correct orientation in the active site.
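The distance-based reading of the oxyanion hole contacts described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis performed in this study: the 3.5 Å donor-acceptor cutoff is a commonly used geometric criterion chosen here for the example, and the function name and coordinates are hypothetical.

```python
import math

OXYANION_HOLE = ("Gly143", "Ser144", "Cys145")

def oxyanion_contacts(carbonyl_o, backbone_n, cutoff=3.5):
    """List oxyanion hole residues whose backbone N lies within `cutoff`
    (in Å) of the glutamine carbonyl oxygen.

    carbonyl_o: (x, y, z) of the P1 glutamine carbonyl oxygen, in Å.
    backbone_n: dict mapping residue name to (x, y, z) of its backbone N.
    cutoff: assumed donor-acceptor distance criterion (not from the study).
    """
    return [
        res for res in OXYANION_HOLE
        if math.dist(carbonyl_o, backbone_n[res]) <= cutoff
    ]

# Hypothetical coordinates: only Gly143 is within hydrogen-bonding distance,
# while Ser144 and Cys145 are more than 5 Å away
carbonyl_o = (0.0, 0.0, 0.0)
backbone_n = {
    "Gly143": (3.0, 0.0, 0.0),
    "Ser144": (6.0, 0.0, 0.0),
    "Cys145": (0.0, 5.5, 0.0),
}
print(oxyanion_contacts(carbonyl_o, backbone_n))  # ['Gly143']
```

Applied frame by frame, a check of this kind distinguishes a fully occupied oxyanion hole (three contacts) from the partially occupied state described for the ternary complex at 40 and 50 ns.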

Analysis of the interactions at the distal site revealed hydrogen bond contacts with Thr199, Ala285, Leu286, and Asp289 and a hydrophobic interaction with Leu286 in the starting structure (Fig. 5c). Ala285 and Leu286 have been shown to be critically important residues responsible for regulating motions in the active site of Mpro [15, 16]. During the first 20 ns, casticin interacted with at least one of these two residues. At 30 ns, the position of casticin in the pocket had shifted such that interactions with Ala285 and Leu286 were not observed. However, casticin quickly regained hydrophobic interactions with these residues, as observed at 40 ns. These interactions were short-lived, as casticin exited the pocket after 40 ns. Nevertheless, the residence time of casticin, coupled with its interaction with Ala285 and Leu286, was sufficient to distort the orientation of the peptide substrate in the active site of Mpro. The importance of Ala285 and Leu286 has been highlighted in several reports. It has been shown that the replacement of Ser284, Thr285, and Ile286 by alanine residues in SARS-CoV Mpro led to enhanced catalytic activity of the protease. This also led to changes in dynamic events in the protease that transmit the effect of the mutation to the catalytic site [61]. Another study revealed that Ser284, Ala285, and Leu286 influenced catalytic efficiency, with Ala285 shown to make SARS-CoV-2 Mpro much more efficient than SARS-CoV Mpro [62]. A molecular docking study also suggested potential allosteric inhibition of SARS-CoV-2 Mpro after the compounds interacted with the regulatory residues Ala285 and Leu286 [63]. Work by Dubanevics et al. likewise confirmed dynamic allostery in SARS-CoV-2 Mpro via Ala285 and Leu286, where a dynamic motion correlation between the active site and the distal site was observed [16].
We therefore propose that the interaction between casticin and the regulatory residues Ala285 and Leu286 caused a change in the internal motion of the active site, thereby disorienting the peptide substrate. This study thus highlights casticin as a potential allosteric modulator that could be considered for further studies. Closer inspection of the trajectory also shows a hydrogen bond interaction between the oxygen in ring C of casticin and Leu271. This interaction was consistently present from the start of the simulation to 30 ns. At 40 ns, however, it was lost, which led to casticin's exit from the distal site. This suggests that the hydrogen bond interaction with Leu271 is responsible for anchoring casticin at the distal site, while casticin interacts with Ala285 and Leu286 to regulate motion at the active site.

Analysis of the RMSD, RMSF, and Rg provides insight into the structural dynamics of the proteins and the ligands. RMSD analysis of the protein in the apo, binary, and ternary systems showed that the protein is stabilized upon peptide substrate and casticin binding. RMSD analysis of the peptide substrate in the binary and ternary complexes showed little deviation in the heavy atoms, indicating stability in the active site (Fig. 6). In the ternary complex, casticin was stable in the distal site for 40 ns before it exited the site. RMSF analysis of the protein in the apo, binary, and ternary systems revealed smaller residue fluctuations in the bound complexes (binary and ternary) than in the apo protein. This shows that ligand binding induced fewer fluctuations in the protein, making it more stable. The radius of gyration indicates the compactness of the protein. The bound systems (binary and ternary complexes) were more compact than the apo protein (Fig. 6). Overall, the binary and ternary complexes were stable throughout the trajectory. This agrees with previous studies in which trajectory analysis showed that ligand binding stabilized Mpro [44, 64]. The binding free energy of casticin during its residence time at the distal site (40 ns) was − 77.151 kJ/mol, with van der Waals, electrostatic, and non-polar solvation energies contributing significantly to binding. The estimated binding energy was sufficient to distort the orientation of the peptide substrate at the active site.
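The RMSD relative to the first MD frame, as used in the stability analysis, can be sketched as below. This is a simplified illustration rather than the procedure used in this study: the function names are hypothetical, no least-squares superposition is performed (the frames are assumed to be pre-aligned), and the coordinates are toy values.

```python
import math

def rmsd(coords, reference):
    """Root mean square deviation between two equally sized coordinate sets.

    Assumption: both sets are already superimposed; no fitting is done here.
    """
    n_atoms = len(reference)
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / n_atoms)

def rmsd_series(frames):
    """RMSD of every frame relative to the first frame of the trajectory."""
    reference = frames[0]
    return [rmsd(frame, reference) for frame in frames]

# Toy trajectory in nm: both atoms shift by 1 nm along z in the second frame
toy_frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
print(rmsd_series(toy_frames))  # [0.0, 1.0]
```

A flat series indicates that the selected atoms stay close to their starting positions, whereas a sharp rise, such as the one described for casticin after 40 ns, indicates that the ligand has moved away from its initial site.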

5 Conclusion

The Mpro protein has been widely considered one of the most druggable targets for anti-SARS-CoV-2 drug discovery. The protease activity of the protein is required to process viral polyproteins in the host and thus aids viral replication. Our study revealed that casticin could serve as a potential allosteric modulator of the SARS-CoV-2 main protease. Interaction of casticin with the reported regulatory residues Ala285 and Leu286 at the distal site could lead to a distortion in the conformation of the peptide substrate (N-3′) in the active site of the protein. This distortion disrupts the key oxyanion hole–peptide substrate interactions needed for proteolysis. RMSD, RMSF, and Rg analyses suggest a stable conformation of the protein while casticin was bound in its pocket for 40 ns. The binding energy calculation reveals a favorable interaction between Mpro and casticin. Taken together, these results indicate that casticin possesses potential allosteric inhibitory activity against the SARS-CoV-2 main protease. Further in vitro and in vivo assays could be considered to explore the inhibitory potential of casticin against the main protease.