Molecular evolution of the SARS-CoV-2 RBDSpike: reviewing key residues
SARS-CoV-2 has a high rate of transmission in humans [56,57,58,59] (though its fatality rate is low), while transmitting only nominally among other closely related species (civets, rodents, ferrets, other primates, etc.). Evolutionary genomic studies have revealed that the RBDSpike is the most variable part of the coronavirus genome [20, 60]. Furthermore, recent literature on the proximal origin of SARS-CoV-2 [1] has localized the essential functional difference between the RBDSpike of CoV and CoV-2 to a 51-amino-acid stretch (residues 442–491 in CoV; 455–505 in CoV-2) within their (evolutionarily mapped) ACE2 binding sites. This stretch is henceforth referred to as the “Spike-RBD-hotspot.” A visual structural examination revealed that the stretch maps primarily to a long, partially folded, disordered loop with a small antiparallel β-strand embedded in it (see Supplementary Fig. S2). The hotspot includes six “critical” amino acid positions that physically bind to the receptor, five of which are mutated in CoV-2 with respect to CoV (Y442 → L455, L472 → F486, N479 → Q493, D480 → S494, T487 → N501) [1]. The overall composition or physicochemical consensus (in terms of hydrophobicity, charge, polarity, aromaticity, amino acid volume, etc.) remains almost unaltered by these evolutionary changes in the two viral species. The only noticeable effective difference is the mutation of one negatively charged amino acid to a polar residue (D480 in CoV → S494 in CoV-2). In a sense, the mutations collectively appear to reshuffle the discrete sequence space formed by these critical positions. Based on the above hypothesis [1], it is therefore quite surprising that such a small, localized change could, by itself, lead to such a large increase in the transmission rate of CoV-2 relative to CoV.
To portray a more comprehensive picture of the evolutionary event, the observation window was broadened to the aligned full-length sequences of the two homologous protein domains (RBDSpike). The total number of point mutations between the RBDSpike of CoV and CoV-2 is 17, of which 12 switch hydrophobic character (i.e., polar/charged ↔ hydrophobic). Interestingly, all of these mutations are located within the “Spike-RBD-hotspot” defined above.
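The physicochemical bookkeeping behind this comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the residue classes (`HYDROPHOBIC`, `NEGATIVE`, `POSITIVE`) are a coarse simplification assumed here, and the helper names are hypothetical. The five critical substitutions are taken from the text; the sequences passed to `hydrophobicity_switches` are toy strings, not the real RBDSpike sequences.

```python
# Coarse residue classes (an assumption: hydrophobicity scales disagree on
# borderline residues such as G, H, Y and W).
HYDROPHOBIC = set("AVLIMFWC")
NEGATIVE = set("DE")
POSITIVE = set("KR")  # His treated as neutral here

# The five critical substitutions quoted in the text: (CoV aa, CoV pos, CoV-2 aa, CoV-2 pos).
CRITICAL = [("Y", 442, "L", 455), ("L", 472, "F", 486), ("N", 479, "Q", 493),
            ("D", 480, "S", 494), ("T", 487, "N", 501)]


def net_charge(aa: str) -> int:
    """Formal side-chain charge at neutral pH under the coarse classes above."""
    if aa in NEGATIVE:
        return -1
    if aa in POSITIVE:
        return 1
    return 0


def charge_changing(substitutions):
    """Return the substitutions whose formal side-chain charge differs."""
    return [s for s in substitutions if net_charge(s[0]) != net_charge(s[2])]


def hydrophobicity_switches(seq1: str, seq2: str):
    """Count point mutations between two aligned sequences and how many switch
    between the hydrophobic and polar/charged classes; gap columns ('-') are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    mutations = [(a, b) for a, b in zip(seq1, seq2) if a != b and "-" not in (a, b)]
    switches = [(a, b) for a, b in mutations if (a in HYDROPHOBIC) != (b in HYDROPHOBIC)]
    return len(mutations), len(switches)


print(charge_changing(CRITICAL))                # [('D', 480, 'S', 494)]
print(hydrophobicity_switches("AVNF", "ASNY"))  # toy input -> (2, 2)
```

Under these classes only D480 → S494 changes formal charge, which matches the observation in the text; applying `hydrophobicity_switches` to the real aligned RBDSpike sequences would be the analogous check for the 17/12 counts.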
Affinity and stability of binding from local and non-local measures of complementarity
The coupling between the dual attributes of complementarity, namely shape and electrostatic matching of the interacting molecular surfaces, is well known in biomolecular recognition [44, 61,62,63,64]. It was subsequently realized that shape complementarity (Sc) is a necessary criterion for macromolecular binding, while electrostatic complementarity (EC) is sufficient [61, 65, 66]. For oligomer formation in proteins, where large surface areas (~1600 Å²) [67] must be buried upon complexation, the surfaces have to be carefully tailored for the complementary interlocking of side chains at the interface. This close association between the interacting molecular partners enhances the match between their protrusions and crevices, so that extended areas can come into close contact [31, 44, 68, 69]. Poor shape complementarity between two macromolecular surfaces is therefore a strong barrier to their close association. For example, two purely convex surfaces (say, two spheroids or ellipsoids) lack the steric fit needed to bind.
On the other hand, complementarity in surface electrostatic potential serves as a secondary criterion in macromolecular interactions, especially for proteins. The interrelation between electrostatic forces and protein stability is well known [62]. For example, optimizing Coulomb interactions through charge substitutions on the protein surface increases stability [70,71,72,73]. However, the same may not be achieved by a mere non-strategic increase in the net charge (positive or negative), as electrostatic repulsion may destabilize the folded state [70, 74, 75]. Along the same line, complementarity in surface charge and/or net charge was ruled out as the representative complementarity term in protein binary complexes [32], and EC was instead redefined as the correlation between surface electrostatic potentials. Sub-optimal (even negative) EC values occasionally result from unfavorable or repulsive interactions in protein complexes, and also in protein-ligand interactions [76], and are often compensated by a strong geometric fit [63]. Such instances occur in a statistically considerable proportion (~20%) of native protein-protein complexes, in which compensatory elevated Sc values have frequently been recorded [65]. Such obligate interactions (Footnote 4) are generally transient in nature and are often linked with signaling pathways [77,78,79,80].
The long- and short-range nature of the forces giving rise to EC and Sc, respectively, leads to their correspondingly stringent and relaxed criteria. Accordingly, the height and width of the “probable” regions vary in the complementarity plots (see Fig. 1). From this conceptual standpoint, it is logical to regard shape complementarity (Sc) as an attractant factor in macromolecular interaction, representing the mutual affinity of the two molecular partners to engage in physical binding. Conversely, since adequate electrostatic matching at the interaction surface stabilizes the bound protein complex, EC may plausibly be treated as the analogous structural parameter representing binding stability.
Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics
Based on the conceptual foundations discussed in the “Affinity and stability of binding from local and non-local measures of complementarity” section, the relative Sc and EC values (see Table 1) computed for SARS-CoV (2AJF) and SARS-CoV-2 (6VW1) were insightful. 2AJF has an Sc of 0.417 and an EC of 0.185. Both numbers are appreciably positive, so the point falls in the “both-positive” (+, +) first quadrant of CPdock, which rationalizes the binding (refer to the “Complementarity plot (CPint and CPdock)” section). However, the ordered pair {Sc, EC} also indicates that the binding is sub-optimal with respect to the corresponding reference ranges, as is clearly reflected by the location of the point in CPdock (see Fig. 2). More precisely, the point falls outside the optimal and near-optimal zones, i.e., outside both the “probable” and “less probable” regions of the plot (refer to the “Complementarity plot (CPint and CPdock)” section). In contrast, for 6VW1, Sc is 0.555 (an absolute increase of ~0.14 w.r.t. CoV), while EC is as low as 0.102 (an absolute drop of ~0.08 w.r.t. CoV). Again, both values are positive, and the resulting {Sc, EC} point lies in the first (+, +) quadrant of CPdock (see Fig. 2), rationalizing the binding (refer to the “Complementarity plot (CPint and CPdock)” section). A side-by-side inspection of the two {Sc, EC} points from 2AJF (CoV) and 6VW1 (CoV-2) on CPdock further reveals their comparative interaction dynamics, which is evolutionarily insightful. Biochemical solution studies [21] had already confirmed that the CoV-2 RBDSpike has a significantly greater affinity towards ACE2 than that of CoV. The same is reflected in the corresponding Sc values: the ~0.14 increase in Sc in CoV-2 relative to CoV brings the Sc value into its non-rigid optimal range (refer to the “Shape and electrostatic complementarity” section).
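The arithmetic behind this comparison, and the placement of each {Sc, EC} point in a CPdock quadrant, can be sketched as follows. Consistent with the text, Sc is assumed to lie on the horizontal axis and EC on the vertical axis; the function name `cpdock_quadrant` is hypothetical, and the sketch does not reproduce the “probable”/“less probable” region boundaries, which are defined in the “Complementarity plot (CPint and CPdock)” section.

```python
def cpdock_quadrant(sc: float, ec: float) -> int:
    """Cartesian quadrant (1-4) of an {Sc, EC} point with Sc on the x-axis; 0 if on an axis."""
    if sc > 0 and ec > 0:
        return 1
    if sc < 0 and ec > 0:
        return 2
    if sc < 0 and ec < 0:
        return 3
    if sc > 0 and ec < 0:
        return 4
    return 0


# {Sc, EC} values quoted in the text.
cov = (0.417, 0.185)   # 2AJF, SARS-CoV
cov2 = (0.555, 0.102)  # 6VW1, SARS-CoV-2

d_sc = cov2[0] - cov[0]  # absolute change in Sc
d_ec = cov2[1] - cov[1]  # absolute change in EC
rel_sc = d_sc / cov[0]   # relative change in Sc
rel_ec = d_ec / cov[1]   # relative change in EC

print(cpdock_quadrant(*cov), cpdock_quadrant(*cov2))  # 1 1
print(f"dSc = {d_sc:+.3f} ({rel_sc:+.1%}), dEC = {d_ec:+.3f} ({rel_ec:+.1%})")
# dSc = +0.138 (+33.1%), dEC = -0.083 (-44.9%)
```

Read as absolute differences on the complementarity scale, the shift is about +0.14 in Sc and -0.08 in EC; in relative terms these amount to roughly +33% and -45%.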
As a result of this appreciably improved shape matching, the CoV-2 RBDSpike would be expected to have a much higher affinity for ACE2 than that of CoV, and hence to be attracted to its cognate receptor much faster. At the same time, however, its interaction with ACE2 yields a sub-optimal EC (0.102). In other words, the anti-correlation between the surface electrostatic potentials generated at the receptor and ligand contact surfaces by their own electric fields and by those of their partners is only ~0.10 (see Fig. 3). By definition (refer to the “Shape and electrostatic complementarity” section), this indicates a weak anti-correlation of surface potentials at the interface, since the close association of two perfectly anti-correlated electrostatic surfaces would ideally return EC = 1 [32]. Hence, although it is attracted to ACE2 faster than its CoV counterpart, the CoV-2 RBDSpike would also be released from the receptor faster, as the unfavorable electrostatic interactions act against stable binding. The lower stability of the ACE2-bound binary PPI complex in CoV-2 relative to CoV can also be cross-validated by comparing the “dG_separated” values of the two complexes, computed by structure-driven thermodynamic calculations using Rosetta [23]. Interestingly, despite the sub-optimal EC, the increase in Sc in CoV-2 relative to CoV shifts the corresponding point (CoV-2) to the right along the horizontal axis of CPdock, so that it maps to the near-optimal zone (~“less probable” region). Overall, the RBDSpike–ACE2 interaction in CoV-2 appears to have a quasi-stable character despite its high affinity. It is also notable that a disease with such a high rate of transmission is triggered by a quasi-stable interaction, which may motivate parallel research efforts to explore the phenomenon at more complex molecular hierarchies.
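A minimal sketch of the EC measure as this paragraph uses it: given the electrostatic potential at each interfacial surface point, computed once from the molecule's own atoms and once from its partner's atoms, EC is taken here as the negative Pearson correlation of the two, so that perfectly anti-correlated surfaces give EC = 1. The potential calculation itself (e.g., a Poisson–Boltzmann solver) is not shown, the function names are hypothetical, and the published EC may additionally combine the values obtained for both partner surfaces; those details are in the “Shape and electrostatic complementarity” section.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("potentials must not be constant")
    return sxy / math.sqrt(sxx * syy)


def electrostatic_complementarity(phi_own, phi_partner):
    """EC of one surface (hypothetical helper): -r between the potentials from its own
    atoms and from its partner's atoms, sampled at the same surface points."""
    return -pearson_r(phi_own, phi_partner)


# Toy potentials (arbitrary units) at four surface points; perfectly anti-correlated.
print(electrostatic_complementarity([1.0, -2.0, 0.5, 3.0], [-1.0, 2.0, -0.5, -3.0]))  # 1.0
```

Under this convention an EC of 0.102 corresponds to a weak anti-correlation, and negative EC values to potentials that reinforce rather than cancel each other across the interface.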
Table 1 Comparison of the complementarity estimates of the homologous RBDSpike-bound binary PPI complexes
In order to compare the available homologs, Sc and EC were computed for all six RBDSpike–ACE2 binary complexes (refer to the “Details of experimental structures used in the study” section) and plotted together in CPdock. Both Sc and EC fall within their corresponding sub-optimal to near-optimal ranges (see the “Shape and electrostatic complementarity” section), scattering the corresponding points around the “improbable” and “less probable” regions of CPdock. Notably, the civet strain 3SCJ comes closest to optimality (see Fig. 2) in terms of the combined {Sc, EC} ordered pair (see the “Shape and electrostatic complementarity” section), reflecting its relative proximity to the “probable” region of CPdock compared with the other candidates in the set. Interestingly, the {Sc, EC} points of all the homologs were found to cluster to the lower left (south-west) of the “probable” (optimal) region in CPdock (see Fig. 2b). Such a distribution of points is indicative of sub-optimal, quasi-stable binding of the two molecular partners throughout evolution. This was also evident from a structural display of the molecular interface (see Fig. 2a). For instance, there were no deep grooves or binding pockets on the receptor into which the ligand could stably fit with high affinity. Nor were there signs of conformation-induced knotting upon binding, or of any other known or intuitive structural model that might map to “high affinity stable binding.” Rather, both CPdock and the corresponding structural displays suggest that the binding is more reminiscent of a “molecular handshake” [82] than of a molecular hug or cling.
It is also noteworthy that the part of the “ACE2 peptidase domain (PD)” that physically binds to the RBDSpike is a single α-helix, known as the “ACE2 PD α1 helix.” The same relative trends among the homologous structures (see Table 1) are also reflected in the CP-based global (Complementarity score, CSl) and local measures [45, 46].
Comparison with equivalent protein complexes from MERS and Ebola
As a point of reference, equivalent binary protein complexes from other deadly viral diseases in humans were surveyed in the same manner. The MERS-CoV RBD (PDB ID: 4L72), bound to its human receptor dipeptidyl peptidase 4 (DPP4), had substantially better shape fit and electrostatic matching along extended, mutually compatible surfaces (see Fig. 4, upper panels). The Ebola virus glycoprotein bound to its endosomal receptor Niemann-Pick C1, on the other hand, displayed signatures of knotting arising from binding-induced conformational changes, with a far greater surface fit coupled with optimal electrostatic matching (see Fig. 4, lower panels).
Comparative stability of the RBDSpike conformers influencing their switch
As discussed in the “Introduction” section, pioneering EM studies [15] have revealed a “surprisingly low kinetic barrier” for the conformational transition between the pre- and post-fusion forms of the Spike protein. The key mediator of this transition is the RBDSpike domain, which, when proximal to ACE2-expressing lung cells, switches from its native “down” (RBDdown) to its active “up” (RBDup) form, primed by a conformation-dependent proteolytic cleavage. Together, this cleavage and the conformational switch set the RBDSpike free and enable it to bind ACE2. Intuitively, RBDdown should be structurally preferred over RBDup, as the “down” state is also known to be functionally coupled to its ability to escape host immune surveillance. We therefore compared the proposed complementarity measures (Sc, EC), computed independently for RBDdown and RBDup (as “target” objects) with respect to their respective (local, global) neighborhoods, to test whether this preference (RBDdown over RBDup) is indeed reflected in the numbers. In addition, the surface area buried upon association (BSA, see the “Buried surface area” section) for both forms (RBDdown, RBDup) was considered as a third measure. In essence, we asked in which of its two surrounding neighborhoods, (i) embedded within the native Spike or (ii) in complex with ACE2, the RBDSpike (as the “target” molecular object) is more harmoniously accommodated. Notably, binding and folding in proteins can be treated equivalently on the basis of complementarity [44], wherein folding can be envisaged as the self-docking of the interior components of a protein chain or domain onto their native environments, consistent with the short- and long-range forces sustaining the native fold. Accordingly, the trimeric RBDdown was regarded as having self-docked onto the rest of the (native) Spike protein.
The full-length Spike protein in its native pre-fusion form is a biological trimer (PDB ID: 6XR8, bio-assembly-1). Structurally, therefore, RBDdown is an assemblage of three symmetry-related RBDSpike (down) units that remain integral to the Spike protein, serving as its limbs. RBDup, on the other hand, refers to the post-cleavage S1 fragment(s) entrapped as the ligand chain(s) in the RBDSpike–ACE2 binary complex, which is a biological monomer (6VW1: two bio-assemblies, both monomeric). The proposed mechanism of viral host cell entry [15] also portrays this “trimeric → monomeric” switch of the RBDSpike (RBDdown → RBDup) upon binding to ACE2. Accordingly, RBDdown was taken as the trimeric association of the RBDSpike (down) units embedded in the full-length Spike protein (6XR8), with the “rest of the Spike protein” (excluding RBDdown) as its neighborhood. RBDup was retained (as throughout the paper) as the ligand chain (E) in 6VW1, with the receptor ACE2 (chain A) as its neighborhood. The following three calculations were then performed:
(i) EC for RBDdown in native Spike (ECRBD_down) was computed (from 6XR8) and compared with the equivalent measure (ECRBD_up, referred to as EC1,2 in Fig. 3) already computed for RBDup (referred to as the “ligand” in 6VW1: see the “Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics” section). For ECRBD_down, RBDdown served as the “target” (refer to the “Shape and electrostatic complementarity” section), while the “rest of the Spike protein” served as its global neighborhood.
(ii) Likewise, Sc for RBDdown (target) in native Spike (ScRBD_down) was computed (from 6XR8) and compared with the equivalent measure (ScRBD_up) already computed for RBDup (refer to the “Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics” section). As for ECRBD_down, RBDdown served as the “target” for computing ScRBD_down, while its local neighborhood was sampled from the “rest of the Spike protein”. To that end, the local neighborhood of RBDdown was delineated by collecting those residues (from the “rest of the Spike protein”) found within a relaxed Cα–Cα cut-off of 12 Å of any residue in RBDdown (see Supplementary Fig. S3). Repeating the calculation at a 15-Å cut-off returned the same ScRBD_down. These deliberately relaxed cut-offs ensured that no potential neighboring atoms were missed, while restricting the calculation to a local region kept it fast.
(iii) BSA was computed (see the “Buried surface area” section) for RBDdown (target) in native Spike (BSARBD_down) and compared with that of RBDup (target) in complex with ACE2 (BSARBD_up).
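The local-neighborhood step in (ii) can be sketched as a simple distance filter over Cα coordinates. This is an illustrative sketch, not the authors' code: the residue identifiers, the coordinate dictionaries and the function name `local_neighborhood` are hypothetical, and parsing the 6XR8 coordinates is not shown. It returns the residues of the “rest of the Spike protein” whose Cα lies within the cut-off of any Cα in the target.

```python
import math


def local_neighborhood(target_ca, rest_ca, cutoff=12.0):
    """Residues in rest_ca with a C-alpha within `cutoff` Å of any C-alpha in target_ca.

    Both arguments map a residue id, e.g. ("A", 417), to an (x, y, z) tuple in Å.
    """
    return sorted(
        res_id
        for res_id, xyz in rest_ca.items()
        if any(math.dist(xyz, t) <= cutoff for t in target_ca.values())
    )


# Toy coordinates, not taken from 6XR8.
target = {("A", 1): (0.0, 0.0, 0.0)}
rest = {("B", 10): (11.0, 0.0, 0.0), ("B", 11): (13.0, 0.0, 0.0), ("B", 12): (0.0, 14.9, 0.0)}
print(local_neighborhood(target, rest))        # [('B', 10)]
print(local_neighborhood(target, rest, 15.0))  # [('B', 10), ('B', 11), ('B', 12)]
```

A larger cut-off can only add residues to the neighborhood, so getting the same ScRBD_down at 12 Å and 15 Å indicates that the extra residues do not contribute to the interfacial surface.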
The expected preference for the “down” over the “up” form of the RBDSpike was reflected in all three measures (EC, Sc, BSA) (see Supplementary Table S1). Although ECRBD_down (referred to as EC1,2 in Supplementary Fig. S4) was fairly low (0.254), the correlation was computed over 16,847 points (p value < 0.00001; significant at p < 0.01), and the value is more than 4.5 times that of ECRBD_up (0.055), computed over 762 points (p value 0.129293; not significant even at p < 0.1). ECRBD_up is the same measure referred to as EC1,2 in Fig. 3 (where RBDup is referred to as the “ligand”). The corresponding shape complementarities followed a similar trend (ScRBD_down = 0.617; ScRBD_up = 0.566), although, as expected (refer to the “Affinity and stability of binding from local and non-local measures of complementarity” and “Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics” sections), the difference was nominal. The preference is perhaps most pronounced and direct in the BSA values. In the native Spike (6XR8), BSARBD_down amounts to 6306.1 Å² over 1538 interfacial atoms (∆ASA ≠ 0, see the “Buried surface area” section), whereas BSARBD_up drops to 875.3 Å² over 189 interfacial atoms in the ACE2-bound complex (6VW1). Thus, the BSA and the number of atoms buried upon association are about 7.2 and 8.1 times higher, respectively, for RBDdown (in the native Spike) than for RBDup in complex with ACE2. All three measures therefore consistently indicate that the RBDSpike prefers to stay in the passive “down” state until it reaches the primary site of infection, switching to its more active “up” state only when proximal and exposed to ACE2 receptors. In effect, this structural preference of the RBDSpike (for “down” over “up”) serves as a “transient” molecular switch that triggers membrane fusion and host cell entry of the virus.
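The ratios and significance levels quoted above can be checked with a short calculation. Which test produced the reported p values is not stated here; this sketch assumes the standard two-sided test of a Pearson correlation against zero, t = r·sqrt((n − 2)/(1 − r²)), and approximates the t distribution by a normal one, which is adequate at these sample sizes. The function name is hypothetical.

```python
import math


def pearson_p_two_sided(r: float, n: int) -> float:
    """Approximate two-sided p value for H0: rho = 0 (normal approximation to t, df = n - 2)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


p_down = pearson_p_two_sided(0.254, 16847)  # EC_RBD_down
p_up = pearson_p_two_sided(0.055, 762)      # EC_RBD_up

ec_ratio = 0.254 / 0.055     # EC down / EC up
bsa_ratio = 6306.1 / 875.3   # BSA down / BSA up (Å²)
atom_ratio = 1538 / 189      # buried interfacial atoms, down / up

print(f"p_down = {p_down:.1e}, p_up = {p_up:.3f}")
print(f"EC ratio {ec_ratio:.2f}, BSA ratio {bsa_ratio:.2f}, atom ratio {atom_ratio:.2f}")
# EC ratio 4.62, BSA ratio 7.20, atom ratio 8.14
```

Under these assumptions p_up comes out close to the reported 0.129, and p_down is many orders of magnitude below 0.00001.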
The expectation that such transitions are energetically costly, and therefore kinetically driven, aligns with the observations that the cleaved S1/S2 complex dissociates in the absence of ACE2 and that the S2 fragment adopts its post-fusion conformation under mild (membrane-mimicking) detergent conditions, which together reveal the “surprisingly low kinetic barrier” for the conformational transition [15].
Reaction-prone nature of the ACE2 binding site in SARS-CoV-2 RBDSpike
As elaborated in the above sections, when compared with analogous ligand-receptor binary PPI complexes from related viral strains in the human host, the RBDSpike–ACE2 interface in SARS-CoV-2 appears to be different and rare. All analyses consistently indicate that the interface maps to protein binary complexes involving transient interactions [79], which is likely to be causally linked to its presumably unique modus operandi. To cross-validate this observation, other independent approaches were also adopted to study the interface: (i) calculation of the accessibility score (rGb) of the binary PPI complex and of different relevant molecular fragments, and (ii) a detailed analysis of the contact map at the interface. Only 6VW1, the sole representative interface structure from CoV-2, was used for these analyses.
As detailed in the “Materials and methods” section (see the “Accessibility score” section), an rGb value greater than 0.011 (the higher, the better) qualifies a globular protein, protein complex, peptide fragment, or protein domain as native-like in terms of hydrophobic burial, i.e., the distribution of amino acid residues with respect to solvent exposure. A value below this empirical threshold renders the input molecule non-native-like, which physically means that hydrophobic residues are exposed to the solvent. Such a molecule would tend to remain in an unfavorable, frustrated, disordered (high-entropy) state. A negative value strongly indicates this instability and may be taken to reflect a reaction-prone nature of the protein fragment.
With this understanding, rGb was computed for (i) the whole native binary protein complex (6VW1_AE in Table 2) and its relevant molecular fragments, namely (ii) the ligand chain (chain E of 6VW1), i.e., the RBDSpike alone (6VW1_E in Table 2), (iii) the “Spike-RBD-hotspot” (residues 455–505; refer to the “Molecular evolution of the SARS-CoV-2 RBDSpike: reviewing key residues” section), where all key mutations are localized (6VW1_E_hotspot in Table 2), and (iv) the actual ACE2 binding site, i.e., the set of interfacial residues on chain E mapped in the contact map (6VW1_E_bs in Table 2). Interestingly, the rGb scores decreased substantially from (i) 6VW1_AE to (iv) 6VW1_E_bs, following the descending order of size of the input protein fragment. The binary PPI complex has the most native-like distribution of hydrophobic burial in the whole set (rGb 0.052, see Table 2), substantially better than that of the ligand chain alone (rGb 0.028). The strongly negative value obtained for 6VW1_E_bs (rGb −0.055) indicates its high reaction-proneness [83]. In other words, the high degree of unfavorable hydrophobic exposure leaves the ACE2 binding site on the RBDSpike strained in its free state, so that it has a strong tendency to embed itself within a complementary surface of an appropriate binding partner.
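The threshold logic described above can be written as a small classifier and applied to the rGb values reported for 6VW1. This sketch only encodes the interpretation rules stated in the text (rGb > 0.011 native-like, below it non-native-like, negative values reaction-prone); it does not compute rGb itself, whose definition is given in the “Accessibility score” section, and the label strings and function name are illustrative.

```python
RGB_NATIVE_THRESHOLD = 0.011  # empirical threshold quoted in the text


def interpret_rgb(score: float) -> str:
    """Map an accessibility score (rGb) to the qualitative categories used in the text."""
    if score < 0:
        return "reaction-prone"  # unfavorable hydrophobic exposure
    if score > RGB_NATIVE_THRESHOLD:
        return "native-like"
    return "non-native-like"


# rGb values quoted in the text (Table 2); the hotspot value is not quoted, so it is omitted.
reported = {"6VW1_AE": 0.052, "6VW1_E": 0.028, "6VW1_E_bs": -0.055}
for name, score in reported.items():
    print(f"{name:10s} {score:+.3f}  {interpret_rgb(score)}")
```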
Table 2 Reaction proneness of the ACE2 binding site on RBDSpike surveyed by the accessibility (rGb) score
As a further cross-check, the contact map at the interface (see the “Contact map at the interface” section) was also scrutinized. The interface was large, with an accessible surface area buried upon complexation (∆ASA) of 1644.4 Å² considering both molecular partners. It involved 23 inter-residue contacts between the two molecular partners, totaling 96 pairwise atomic contacts between their side-chain atoms. The interface has several rare and interesting features. From the rGb calculations above, it was already clear that the RBDSpike interfacial surface had several exposed hydrophobic residues; hence, it is not surprising that the contact map included several hydrophobic residues from the ligand (RBDSpike). Interestingly, most of these hydrophobic residues were in contact with hydrophilic residues from the receptor. Furthermore, a large majority of these hydrophobic residues were bulky aromatic amino acids (see Supplementary Table S2), mostly in contact with either “elongated positively charged” (Lys) or “aromatic yet polar” (His) amino acids from the receptor. The corresponding interactions mapped to close hydrophobic packing between the extended chains of successive methylene groups (–(CH₂)₄–) of the lysines and the aromatic rings (31-Lys-A–489-Tyr-E, 353-Lys-A–505-Tyr-E) (see Fig. 5a, b). There were also instances of polar interactions involving aromatic components (34-His-A–453-Tyr-E) (see Fig. 5d), although there were no clear signatures of cation–π or π–π stacking between the charged residues and the aromatic rings. However, there were instances of regular aromatic stacking with a slide and an open angle separating the otherwise parallel aromatic rings (83-Tyr-A–486-Phe-E) (see Fig. 5e).
There were also instances of hydrophobic packing (79-Leu-A–486-Phe-E, 34-His-A–455-Leu-E) and of electrostatic interactions involving polar atoms (24-Gln-A–487-Asn-E, 42-Gln-A–498-Gln-E, 34-His-A–493-Gln-E) (see Fig. 5f). Interestingly, there was also a salt bridge at the interface (31-Lys-A–484-Glu-E) (see Fig. 5c), whose presence may be further destabilizing owing to desolvation effects, as has been found for salt bridges at protein interfaces in general [32, 38, 84]. Overall, the interface appears to have a high capacity to harbor and withstand unfavorable electrostatic interactions, which may underlie the resulting sub-optimal electrostatic complementarity (EC = 0.102).
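Counting residue-level contacts and the atom pairs behind them, as in the interface summary above (23 inter-residue contacts, 96 atomic contacts), can be sketched as below. The 4.0 Å distance cut-off is a hypothetical placeholder (the actual criterion is given in the “Contact map at the interface” section), the input format is illustrative, and the coordinates in the usage lines are toy values chosen to mimic the 31-Lys-A–484-Glu-E salt bridge, not taken from 6VW1. Backbone atoms are skipped, since the text counts side-chain contacts.

```python
import math
from collections import Counter

BACKBONE = {"N", "CA", "C", "O"}


def side_chain_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Count side-chain atom pairs within `cutoff` Å, grouped by residue pair.

    Each atom is (residue_label, atom_name, (x, y, z)); returns a Counter keyed by
    (residue_in_a, residue_in_b) whose values are the numbers of atomic contacts.
    """
    contacts = Counter()
    for res_a, name_a, xyz_a in atoms_a:
        if name_a in BACKBONE:
            continue
        for res_b, name_b, xyz_b in atoms_b:
            if name_b in BACKBONE:
                continue
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts[(res_a, res_b)] += 1
    return contacts


# Toy coordinates (Å), not from 6VW1.
receptor = [("31-Lys-A", "NZ", (0.0, 0.0, 0.0)), ("31-Lys-A", "CA", (0.0, 0.0, 1.0))]
ligand = [("484-Glu-E", "OE1", (0.0, 0.0, 3.0)), ("484-Glu-E", "OE2", (0.0, 3.5, 0.0)),
          ("505-Tyr-E", "OH", (10.0, 0.0, 0.0))]
contacts = side_chain_contacts(receptor, ligand)
print(len(contacts), sum(contacts.values()))  # 1 2
```

Here `len(contacts)` plays the role of the inter-residue contact count and `sum(contacts.values())` that of the pairwise atomic contact count.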
Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics
The primary objective of the current study was to develop non-virulent structural mimics of the RBDSpike that could bind the ACE2 receptor stably and with high affinity. For convenience, these binary PPI complexes are henceforth referred to as “ACE2-complexes” of the corresponding RBDSpike ligands (native and designed). Such designed mimics could serve as competitive inhibitors of the viral RBDSpike by occupying the binding sites on the ACE2 receptors. To that end, a protein design approach was adopted, aiming to raise the EC of the designed ACE2-complexes above the sub-optimal native reference value (EC6VW1 = 0.102) while retaining or raising Sc at or above its already near-optimal value (Sc6VW1 = 0.555). The plausibility of this design strategy rests on two facts. First, the RBDSpike is an independently foldable, self-sustained protein unit that can fold independently of the rest of the Spike protein [21]. Second, the RBDSpike is resilient to conformational changes upon multiple mutations, as is evident from structural analyses of the homologs (refer to the “Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics” section). This means that the basic fold of the RBDSpike remains unaltered despite evolutionary sequence variation. The pairwise sequence similarity of the CoV RBDSpike sequences with respect to 6VW1 (CoV-2) was ~69%. Cα RMS deviations upon superposing the CoV RBDSpike–ACE2 structures (refer to the “Details of experimental structures used in the study” section) onto 6VW1 ranged from 1.29 Å (3SCL) to 3.18 Å (3D0G) (see Supplementary Fig. S5). Furthermore, the RBDSpike undergoes virtually no conformational change upon binding to ACE2 relative to its structure in the free form (6VXX).
The Cα RMS deviation upon superposing the RBDSpike from 6VW1 onto the free, full-length Spike structure (6VXX) was 0.893 Å. Together, these observations suggest that the finally selected designed mimics could be administered without concern for their (ab initio) folding, as long as their sequences fit the native fold. The compatibility of the designed sequences with this fold (fold compatibility) was tested using state-of-the-art scoring functions for fold recognition (refer to the “Fold recognition” section).
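The Cα RMS deviations quoted above are computed after optimal rigid-body superposition. A common way to do this is the Kabsch algorithm; the sketch below implements it with NumPy, assuming the two Cα sets are already matched one-to-one (sequence alignment and coordinate parsing are not shown). The superposition program used in the study is not stated here, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two matched (N, 3) arrays")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # avoid an improper rotation
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T             # optimal rotation taking p_c onto q_c
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Self-check: a rotated and translated copy superposes with ~zero RMSD.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(20, 3))
theta = np.pi / 6
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 1.0])
print(kabsch_rmsd(coords, moved))  # close to 0
```

The determinant check keeps the transformation a proper rotation, so mirror-image coordinate sets are not superposed by reflection.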
The protein design strategy: sampling and scoring
As mentioned in several earlier sections, a protein design approach was adopted to develop non-reactive structural mimics of the RBDSpike that may serve as competitive inhibitors of the native viral Spike protein and thereby counteract viral pathogenicity. As found above, the interacting surfaces of the CoV-2 RBDSpike and ACE2 have a high shape fit (Sc: 0.555), within its optimal range (refer to the “Shape and electrostatic complementarity” section), coupled with sub-optimal electrostatic matching at the interface (EC: 0.102). Together, these may be interpreted as high affinity combined with low stability upon binding. Consistent observations have also been reported from biochemical solution assays [21] and from structure-based thermodynamic calculations [23] in other studies. This quasi-stable binding may trigger a fast release of the ligand from the receptor, leaving the released RBDSpike free to engage a greater number of cells bearing surface-exposed ACE2 receptors. The primary objective for the designed RBDSpike mimics was therefore to increase the EC at the interface, which would make the interaction more stable. Taking the shape-driven affinity into account as well, the design problem aimed to improve EC while keeping Sc at least native-like, within the “near-optimal to optimal” range. Experimental structural studies along similar lines have already demonstrated the favorable effect of key residue substitutions across the whole C-terminal domain of the CoV-2 Spike protein harboring the RBDSpike (see Supplementary Fig. S1). Such key substitutions were found to strengthen the RBDSpike–ACE2 interaction, yielding a 4-fold higher receptor-binding affinity than that of the native ACE2-complex (see the “Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics” section) [24]. For our purpose, we chose to operate on the RBDSpike itself.
When the native binary PPI complexes of the homologs (refer to the “Details of experimental structures used in the study” section) were superposed onto 6VW1, the average pairwise Cα RMS deviation was 2.05 Å. This evolutionary structural conservation meant that mutations at the ligand (RBDSpike) interface could be performed directly on the native ACE2-complex (6VW1) itself. In a sense, the bound binary PPI complexes were treated as unified globular proteins, so that the design protocol may be considered analogous to a “hydrophobic core design” or a “full sequence design” in globular proteins. Any protein design protocol has two essential steps: (i) sampling and (ii) scoring. In the current study, sampling (i.e., incorporating strategic mutations) followed two approaches, consistent with the main objective of raising EC while retaining an at least native-like Sc. In the first approach, the hydrophobic character of the interfacial amino acid residues was altered while keeping their shape and size as similar as possible. Intuitively, this could alter, and possibly raise, EC while keeping Sc similar. An equivalent strategy was earlier found fruitful for incorporating unbalanced partial charges into native globular protein interiors and, in turn, detecting local “electrostatic” errors [45]. In the second approach, homologous sequences (i.e., direct examples from nature) already found to give appreciably higher EC values were threaded onto the native RBDSpike template in 6VW1, and strategic mutations were then performed on the threaded homologous sequence based on the contact map at the interface. In both approaches, all mutations were performed on the ligand alone, leaving the receptor unchanged.
Scoring and ranking of the binary PPI complexes were primarily based on the complementarity measures (refer to the “Shape and electrostatic complementarity” and “Complementarity plot (CPint and CPdock)” sections). The fitness or compatibility of each designed sequence with the native fold was tested by fold recognition measures, also based on complementarity (refer to the “Accessibility score” section).
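The selection criterion stated above, raising EC while keeping Sc at least native-like, amounts to a filter followed by a sort. The sketch below uses the native 6VW1 values from the text as floors; the design names and their {Sc, EC} values are invented for illustration, and the ordering rule (EC first, Sc as tie-breaker) is an assumption rather than the exact ranking used in the study, which also involves CPdock placement and fold-recognition scores.

```python
from typing import NamedTuple

NATIVE_SC = 0.555  # Sc of the native 6VW1 complex (from the text)
NATIVE_EC = 0.102  # EC of the native 6VW1 complex (from the text)


class Design(NamedTuple):
    name: str
    sc: float
    ec: float


def rank_designs(designs, sc_floor=NATIVE_SC, ec_floor=NATIVE_EC):
    """Keep designs with Sc >= sc_floor and EC > ec_floor; sort by EC, then Sc, descending."""
    kept = [d for d in designs if d.sc >= sc_floor and d.ec > ec_floor]
    return sorted(kept, key=lambda d: (d.ec, d.sc), reverse=True)


# Hypothetical designs, for illustration only.
candidates = [
    Design("des_01", 0.57, 0.21),
    Design("des_02", 0.53, 0.30),  # rejected: Sc below native
    Design("des_03", 0.60, 0.18),
    Design("des_04", 0.56, 0.09),  # rejected: EC not improved
]
print([d.name for d in rank_designs(candidates)])  # ['des_01', 'des_03']
```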
Design strategy-1: altering the hydrophobic character of the amino acids
First, from the distribution of interfacial amino acid residues of the ligand chain (6VW1_E) in the residue-wise Complementarity Plots (CPint), residues falling in the “less probable” and “improbable” regions (see Fig. 1) were collected. They were then combined with the critical residue positions on the ACE2 binding site (the “Spike-RBD-hotspot,” residues 455–505: see the “Molecular evolution of the SARS-CoV-2 RBDSpike: reviewing key residues” section) reported to harbor the determining evolutionary mutations [1]. The full set (S1) consisted of 11 amino acids in total (see Supplementary Table S2); except for 417-Val, all of these residues lie within the aforementioned “hotspot” region. Of the 11 amino acids selected, four were bulky aromatics, three were branched-chain hydrophobics, and the rest were polar. As a first trial (strategy-1a), mutations were made in this set of 11 residues alone. The raw combinatorial space covering all possible amino acid mutations at these positions is astronomically large. To curtail it to the limits of finite sampling, ad hoc filters involving semi-empirical rules of thumb (detailed as follows) were judiciously incorporated. Each designed sequence was unique, as the sampling involved random seeds. Coupled with the random seeds, a weighting scheme was further adopted. In 50% of cases, the amino acids were mutated to (i) residues with alternating hydrophobic character and/or structural properties (S↔S, A↔S, V↔T, L↔N, F↔Y, L↔D, I↔M, M↔R, E↔R, E↔Q, D↔N, R↔M, R↔E, etc.: antonymous changes), and in the other 50% to (ii) amino acids with similar properties (G↔P, V↔L, F↔W, K↔R, E↔D, Q↔N, H↔Y, S↔T, etc.: synonymous changes). Care was taken to retain their size and/or shape as much as possible. This 1:1 ratio of weights was further varied between 1:1 and 2:1.
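To make the weighted sampling concrete, the sketch below draws, for each target position, an antonymous substitution with probability w_a/(w_a + w_s) and a synonymous one otherwise, using the pair lists given above. It is a minimal illustration under stated assumptions: the helper names (`build_table`, `mutate`) are hypothetical, the self-pair S↔S and duplicated pairs are dropped, residues without a listed partner are simply left unchanged, the 2:1 weighting is assumed to favor antonymous changes, and the size/shape screening and visual scrutiny described in the text are not modeled. The last line shows why filtering is needed: if each of the 11 positions could take any of the 20 amino acids, the raw space would hold 20^11 ≈ 2.0 × 10^14 sequences.

```python
import random

# Substitution pairs transcribed from the text. The self-pair "S<->S" and
# the duplicated pairs (R<->M, R<->E) are left out of this sketch.
ANTONYMOUS = [("A", "S"), ("V", "T"), ("L", "N"), ("F", "Y"), ("L", "D"),
              ("I", "M"), ("M", "R"), ("E", "R"), ("E", "Q"), ("D", "N")]
SYNONYMOUS = [("G", "P"), ("V", "L"), ("F", "W"), ("K", "R"), ("E", "D"),
              ("Q", "N"), ("H", "Y"), ("S", "T")]

# Raw space if each of the 11 positions could take any of 20 amino acids.
RAW_SPACE = 20 ** 11


def build_table(pairs):
    """Turn symmetric pairs into a residue -> list-of-partners lookup."""
    table = {}
    for a, b in pairs:
        table.setdefault(a, []).append(b)
        table.setdefault(b, []).append(a)
    return table


ANTO_TABLE = build_table(ANTONYMOUS)
SYNO_TABLE = build_table(SYNONYMOUS)


def mutate(sequence, positions, antonymous_weight=1.0, synonymous_weight=1.0, seed=None):
    """Hypothetical weighted sampler: one draw per target position."""
    rng = random.Random(seed)  # a different seed gives a different design
    p_anto = antonymous_weight / (antonymous_weight + synonymous_weight)
    residues = list(sequence)
    for i in positions:
        table = ANTO_TABLE if rng.random() < p_anto else SYNO_TABLE
        partners = table.get(residues[i])
        if partners:  # no listed partner in the chosen table: keep residue
            residues[i] = rng.choice(partners)
    return "".join(residues)


print(mutate("ALFVEKSTQD", [0, 2, 3, 4], seed=7))       # 1:1 weights
print(mutate("ALFVEKSTQD", [0, 2, 3, 4], 2.0, 1.0, 7))  # 2:1 weights
print(f"raw space: {RAW_SPACE:.1e}")
```

The toy sequence and positions are placeholders; in practice the positions would be those of set S1 (or S2) on 6VW1_E.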
The intent was to raise the residue-wise electrostatic complementarity (Em) of amino acids falling into the “improbable”/“less probable” regions of CPint to such an extent that they could move into the “probable” regions. It was subsequently realized that electrostatic matching is essentially a global effect and need not manifest at the mutated residue itself. Hence, in an alternative approach (strategy-1b), the contact map of the interface was surveyed (refer to the “Contact map at the interface” section), and the ligand residues involved in it (set S2: 13 residues) were chosen as the target positions for mutation (see Supplementary Table S2), keeping the same sampling strategy. There was appreciable overlap (~46%) between the two sets, S1 and S2.
For each of the two strategies (1a and 1b), 50 redesigned sequences were constructed and tested in CPdock. Each individual case was carefully scrutinized visually at all stages of the design protocol. When plotted in CPdock, the points were fairly closely spaced, forming an island to the south-west (see Fig. 6a) of the center of the optimal zone in CPdock (i.e., the “probable” region). The points were more tightly clustered for the first set (strategy-1a) than for the second (strategy-1b) in both complementarity measures (Sc, EC), as reflected in their ranges (Set-1a: [0.394, 0.544] in Sc, [0.113, 0.298] in EC; Set-1b: [0.514, 0.733] in Sc, [0.042, 0.314] in EC).
Despite being more tightly clustered, Set-1a mapped to values farther from the optimal zone than Set-1b. On the other hand, Set-1b appeared more likely to return false-positive points falling in the “improbable” regions (sub-optimal zones) of the plot (see Fig. 6b). The top 25 sequences from each set were then filtered based on their position in CPdock relative to the optimal zone. All filtered sequences passed the test for fold compatibility (averages 2.76 ± 0.17 in CSgl; 0.016 ± 0.0001 in CScp). These sequences were more closely spaced in CPdock than the corresponding original sets. Set-1b mapped more into the “probable”/“less probable” regions (i.e., optimal/near-optimal zone) than Set-1a, though with a greater number of false positives (see Fig. 6a, b). To serve as negative controls, “scrambled” sequences (refer to the “Scrambled sequences as negative control” section) were generated for each set by randomly reshuffling the designed sequences and were plotted alongside the “hits” of the two sets (1a and 1b). The “hits” and the “scrambled” sequences formed clearly distinct clusters with virtually no overlap (see Fig. 6). All points in the corresponding “random” clusters (the red dots in Fig. 6), representing the scrambled sequences, fell in the “improbable” regions of the plot, indicating that they were unambiguously sub-optimal.
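A composition-preserving shuffle is one simple way to generate such decoys. The sketch below assumes that “random reshuffling” means permuting residues, either over the whole sequence or over a chosen subset of (designed) positions; the function names `scramble` and `same_composition` and the `positions` argument are hypothetical, and the exact procedure defined in the “Scrambled sequences as negative control” section may differ.

```python
import random
from collections import Counter


def scramble(sequence, n, positions=None, seed=None):
    """Return n decoys by permuting residues (all, or only `positions`)."""
    rng = random.Random(seed)
    idx = list(range(len(sequence))) if positions is None else sorted(positions)
    decoys = []
    for _ in range(n):
        residues = list(sequence)
        pool = [residues[i] for i in idx]
        rng.shuffle(pool)  # in-place random permutation of the chosen residues
        for i, aa in zip(idx, pool):
            residues[i] = aa
        decoys.append("".join(residues))
    return decoys


def same_composition(a, b):
    """True if both sequences contain the same residues in the same counts."""
    return Counter(a) == Counter(b)


toy = "ACDEFGHIKLMN"  # placeholder, not an RBDSpike sequence
for decoy in scramble(toy, 3, seed=42):
    print(decoy, same_composition(decoy, toy))
```

Because the amino acid composition is preserved, any separation between hits and decoys in CPdock reflects sequence order rather than composition.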
Design strategy-2: homology-based protein design: taking templates from nature itself
In several well-posed, hard-to-solve bioinformatics problems, the direct adoption of empirical natural strategies [85,86,87,88] coupled with trial-and-error modulation has found wide scope and penetration. This includes the very problem of protein structure prediction (considered the “holy grail of structural biology”) and related sub-problems emerging from the core of the protein folding problem (e.g., fold recognition [44], protein design [89]). The “fragment assembly simulated annealing” strategy [87, 90] of Rosetta, arguably the best structure prediction methodology to date, is itself based on natural examples. With the same intuition, we also attempted the direct use of empirical natural examples in our design pipeline, as an alternative to changing the hydrophobic character of amino acids at the interface (strategy-1, a and b). Accordingly, we picked the RBDSpike sequence from 3SCJ (i.e., a predicted civet strain of SARS-CoV; see Table 1), motivated by its complementarity estimates (Sc: 0.523, EC: 0.301), which together stood out as the best among the homologs. Consistently, 3SCJ also made the closest approach to the “probable” region of CPdock (see Fig. 2) among the homologs, i.e., it came closest to being an optimal solution. The sequences of 3SCJ and 6VW1 were aligned, and the aligned 3SCJ sequence (target) was threaded directly onto the main-chain trajectory of the ligand in 6VW1 (template). The threading protocol followed three simple rules of thumb. (R1) For a deletion in the target sequence with respect to the template, the template amino acid was incorporated to fill the gap. (R2) For a substitution, the target amino acid was the obvious choice. (R3) For identical amino acids at corresponding positions in the template and the target, either choice was equivalent.
As a matter of fact, there were no insertions in the target with respect to the template (i.e., no gaps in the template).
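The three threading rules (R1 to R3), together with the absence of insertions, can be written as a short function over a pairwise alignment. This is a hypothetical sketch that assumes gaps are written as '-' in equal-length aligned strings; it only builds the threaded sequence and does not model side-chain placement on the 6VW1 backbone.

```python
def thread(target_aln, template_aln):
    """Thread an aligned target sequence onto an aligned template (R1-R3)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    threaded = []
    for t, s in zip(target_aln, template_aln):
        if s == "-":
            # Insertion in the target: no template backbone position exists.
            # None occurred for 3SCJ vs. 6VW1, so this sketch rejects it.
            raise ValueError("insertion in target has no template position")
        if t == "-":
            threaded.append(s)  # R1: deletion in target, keep template residue
        else:
            threaded.append(t)  # R2: substitution -> target; R3: identical, same either way
    return "".join(threaded)


print(thread("AC-E", "ABDE"))  # toy alignment, not the real 3SCJ/6VW1 pair
```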
Subsequent to threading, dynamic perturbations were introduced to the designed binary PPI complexes (refer to the “Molecular dynamic simulation (short and long)” section), and the final atomic models were surveyed for their contact maps at the receptor–ligand interface. Implausible atomic contacts (design artifacts), such as those between two positively or two negatively charged amino acids (Lys-Lys, Glu-Asp, etc.), were removed wherever found by mutating the corresponding amino acid in the originally threaded sequence (e.g., Lys → Glu, Glu → Arg). These “artifact-cleaning mutations” were chosen based on general knowledge of atomic interactions in proteins and often altered the hydrophobic character of the amino acids as well. This process gave rise to an iterative (threading → mutation → contact map) cycle, repeated n times, in the protein design pipeline. Each resulting contact map was rigorously and manually scrutinized, and other mutable positions that could intuitively raise the EC while retaining the Sc were noted. In some instances, drastic changes such as deleting a bulky side-chain (e.g., Phe → Ala) were also attempted. Charged amino acids were introduced as well as eliminated to favor or forbid the formation of salt-bridges. To eliminate the negative charge of Glu and Asp, they were mutated to their corresponding polar variants (Gln, Asn). Attempts were also made to deliberately incorporate extended hydrophobic packing (e.g., introducing Ile or Met at strategic places) as well as aromatic stacking (introducing Tyr, His, etc.) at the interface. The final evaluation of the binary PPI complexes was based on the complementarity measures and their mapping in CPdock. Again, a total of 50 redesigned alternatives were constructed and tested in CPdock. Among the possible alternatives, this set fairly covered all non-redundant “presumably sensitive” point mutations and their combinations.
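The screen for like-charge contacts can be sketched as a single pass over a residue-level contact list. This is an illustration only: the contact list is assumed to be precomputed from the simulated model, indices are 0-based positions in plain one-letter sequences, and the charge-flip table extends the two examples given in the text (Lys → Glu, Glu → Arg) with guesses of our own for Arg and Asp. In line with the protocol, only the ligand residue is proposed for mutation.

```python
POSITIVE = {"K", "R"}
NEGATIVE = {"D", "E"}
# Hypothetical charge-flip suggestions: K->E and E->R follow the text;
# R->E and D->R are assumptions added for completeness.
FLIP = {"K": "E", "R": "E", "E": "R", "D": "R"}


def like_charge_contacts(contacts, ligand_seq, receptor_seq):
    """Flag (ligand_i, receptor_j) contacts between same-sign charged residues.

    Returns tuples (i, j, ligand_aa, receptor_aa, suggested_ligand_aa).
    """
    flagged = []
    for i, j in contacts:
        a, b = ligand_seq[i], receptor_seq[j]
        same_sign = (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE)
        if same_sign:
            flagged.append((i, j, a, b, FLIP[a]))
    return flagged


# Toy example: K-R and E-D contacts are flagged, Q-D is not.
print(like_charge_contacts([(0, 0), (1, 1), (2, 1)], "KQE", "RDA"))
```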
Each individual case was carefully scrutinized by visual structural inspection of its redesigned interface to remove design artifacts. When plotted in CPdock, the points formed a close cluster that naturally satisfied the empirical thresholds in both measures (Scmin: 0.402, ECmin: 0.173). In other words, the ranges of values obtained for the whole set were tight in both complementarity measures (Sc: [0.421, 0.723], EC: [0.178, 0.342]). Such tightly spaced values do not seem attainable by random design or by a mere reshuffling of the sequence. To test this, scrambled sequences (refer to the “Scrambled sequences as negative control” section) were generated and subjected to the same analysis. As for strategies 1a and 1b, the hits and the scrambled sequences formed clearly distinct clusters (see Fig. 6) with practically no overlap. The separation between the two clusters was clearer and more convincing than for the earlier two sets (strategies 1a and 1b).
An apparent saturation was reached, in the sense that arguably the whole spectrum of “sensitive” mutations was attempted at the plausible mutational hot-spots. The analyses were greatly aided by rigorous, repeated visual structural examination. Interestingly, in this third set (strategy-2), the range of shape complementarity of the “hits” is much wider (~1.5 to 2 times) than that of electrostatic complementarity, in contrast to the other two sets (strategies 1a, 1b). More interestingly, not a single case raised the EC to 0.40. The spread in geometric fit among the designed sequences may arise from mutations that either create undue holes at the interface or lead to short contacts. These two events correspond, respectively, to truncation and to forced incorporation of bulky groups (e.g., Tyr → Val and Gly → Trp) at the designed interface. At the same time, there appear to be natural evolutionary constraints on the upper limit of EC at this interface, which do not seem possible to overstep by different levels of protein engineering using the pool of 20 naturally occurring amino acids. The resulting EC values (natural as well as designed) physically correspond to quasi-stable to stable binding. The stable ones (i.e., optimal in terms of CPdock) were considered further. Overall, there appears to be strong natural and evolutionary control over the dynamics of RBDSpike–ACE2 binding. The top 25 sequences were filtered based on their position relative to the optimal zone in CPdock and considered further; the filtering was accompanied by careful individual visual re-scrutiny of their interfaces. As expected, these sequences were more closely spaced in CPdock and mapped to the “probable”/“less probable” regions (i.e., optimal/near-optimal zones). Again, all filtered sequences were successfully validated for fold compatibility (averages 2.84 ± 0.16 in CSgl; 0.017 ± 0.0002 in CScp).
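The statement about relative spreads can be checked against the ranges quoted for the three design sets. The short script below uses only the reported (min, max) pairs; it gives Sc/EC width ratios of about 0.81 for both Set-1a and Set-1b but about 1.84 for Set-2, consistent with the ~1.5 to 2 times wider Sc range noted for strategy-2. The dictionary and function names are illustrative.

```python
# Reported (min, max) ranges of the full design sets.
RANGES = {
    "Set-1a": {"Sc": (0.394, 0.544), "EC": (0.113, 0.298)},
    "Set-1b": {"Sc": (0.514, 0.733), "EC": (0.042, 0.314)},
    "Set-2": {"Sc": (0.421, 0.723), "EC": (0.178, 0.342)},
}


def width(interval):
    """Width of a closed (low, high) interval."""
    low, high = interval
    return high - low


def sc_to_ec_width_ratio(name):
    """How many times wider the Sc range is than the EC range for a set."""
    r = RANGES[name]
    return width(r["Sc"]) / width(r["EC"])


for name in RANGES:
    print(f"{name}: Sc/EC width ratio = {sc_to_ec_width_ratio(name):.2f}")
```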
It was unambiguous from the comparison of the three plots for the three design sets (see Fig. 6) that the predicted solutions gradually improved from Set-1a through Set-1b to Set-2, as reflected in the gradual north-eastern shift of the clusters (black dots in the plots). In other words, the homology-based design performed the best of the three. These results also suggest that the “scrambled” sequences may indeed serve as negative controls in future experimental validation of the current hypothesis.
A demonstrative example is shown in Fig. 7, where three designed sequences (HM0, HM3, HM5) selected from the pool (Set-2) collectively portray the impact of strategic point mutations. HM0 is the initially threaded sequence (3SCJ_E on 6VW1_E). For HM3, the designed sequence contains a single point mutation (493-Q → N) with respect to HM0. In the third case (HM5), the designed sequence further contains a second strategic point mutation (505-Y → H) in addition to the first. Within this triad, the single mutant (HM3) attains a markedly higher Sc (0.710) at a slightly lower EC (0.224) than the double mutant with the additional aromatic mutation (HM5; Sc: 0.605, EC: 0.243), and both improve on the threaded sequence alone (HM0; Sc: 0.563, EC: 0.248) in Sc, at a marginal cost in EC. This demonstrates the scope and benefit of invoking strategic point mutations on the threaded homologous sequence to further improve the solution. Taken together with the natives (6VW1_E, 3SCJ_E), the results show a gradual shift towards a more balanced optimal solution upon threading (HM0) followed by subsequent strategic point mutations (HM3, HM5). The full-length sequences of these designed RBDSpike mimics are provided in Supplementary Dataset S1.
Dynamic persistence of the binding of the selected designed structural mimics
The two best predicted solutions (HM19, HM21) designed by strategy-2 were subjected to long MD simulations (refer to the “Molecular dynamic simulation (short and long)” section) to study the dynamic persistence of the binding parameters. As a baseline, the native ACE2-complex (6VW1) was also included. HM19 and HM21 originally attained {Sc, EC} values of {0.614, 0.276} and {0.687, 0.310}, respectively. All-atom explicit-water MD production runs were performed for 200 ns each, with coordinates saved every 100 ps, yielding 2000 snapshots (time-stamps) per simulated protein complex. The post-simulation analyses began by collecting all snapshots of each trajectory and superposing them (using TM-align [91]) onto their respective templates (i.e., the starting structures of the respective MD simulations). The time-averaged Cα-RMS deviations of these superposed coordinates were 2.50 (±0.38) Å and 2.66 (±0.39) Å for the designed ACE2-complexes (see the “Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics” section) of HM19 and HM21, respectively (see Supplementary Fig. S6). In contrast, the native average was ~1.5 times larger, with ~1.8 times the fluctuations (3.82 ± 0.66 Å), relative to both mimics. The dynamic persistence of the complementarity measures was analyzed by running CPdock on each sampled snapshot along the trajectory for each of the three subjects (HM19, HM21, native), followed by plotting individual time series for Sc and EC (see Fig. 8) and their statistical analysis.
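As a sketch of the post-simulation RMSD step, the code below superposes each snapshot onto the starting structure by least-squares (Kabsch) fitting on matched Cα coordinates and reports the mean and standard deviation over the trajectory. Note that this swaps in a plain Kabsch superposition for TM-align, which optimizes a TM-score-based alignment, so the values may differ somewhat. The function names are hypothetical, and coordinates are assumed to be (N, 3) NumPy arrays with one-to-one residue correspondence. The snapshot count follows from the sampling interval: 200 ns / 100 ps = 2000.

```python
import numpy as np

SIM_LENGTH_PS = 200_000  # 200 ns production run
SAVE_INTERVAL_PS = 100   # one snapshot every 100 ps
N_SNAPSHOTS = SIM_LENGTH_PS // SAVE_INTERVAL_PS  # 2000


def kabsch_rmsd(mobile, reference):
    """RMSD after optimal rigid-body superposition of two (N, 3) arrays."""
    p = mobile - mobile.mean(axis=0)        # remove translation
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))


def trajectory_rmsd(snapshots, reference):
    """Time-averaged RMSD (mean, standard deviation) over a list of snapshots."""
    values = np.array([kabsch_rmsd(s, reference) for s in snapshots])
    return float(values.mean()), float(values.std())
```

In use, `reference` would be the Cα coordinates of the starting structure and `snapshots` the 2000 frames of one trajectory.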
A direct comparison of the original and time-evolved values (averages and standard deviations) of the complementarity measures (Sc, EC) can be made from the corresponding time-series plots (see Fig. 8) as well as from Table 3. For HM19 and HM21, the time-series averages (and standard deviations) were 0.664 (±0.048) and 0.669 (±0.049) for Sc, and 0.278 (±0.082) and 0.248 (±0.074) for EC, respectively, while those for the native were 0.628 (±0.050) for Sc and 0.149 (±0.080) for EC. Thus, by and large, both complementarity measures retain the trends and nuances revealed by their respective initial values (see Table 3) in all three subjects. The numbers further suggest that the primary descriptor differentiating the native from the designed mimics is indeed EC, while the shape descriptor (Sc) serves as a (threshold-dependent) necessary criterion for complexation, as it does for macromolecular binding in general (refer to the “Affinity and stability of binding from local and non-local measures of complementarity” section). More specifically, once Sc is within its optimal range (refer to the “Shape and electrostatic complementarity” section), it converges over time to a narrower, more optimized range (dependent on the particular protein co-complex system), irrespective of the fine-grained structural differences introduced by the strategic design(s) (see Table 3). The difference between the corresponding ECs (designed vs. native), however, persists throughout the entire 200 ns simulated trajectories. Notably, the native EC, which originally falls in the sub-optimal range (EC of 6VW1 = 0.102), largely remains in that range throughout the entire simulation run. In contrast, the improvement brought about by the strategic design is fairly well retained over time in both selected designed mimics.
Equally notable is that the EC values of the designed mimics (original as well as time-evolved) consistently reach the crucial “near-optimal to optimal” range (refer to the “The protein design strategy: sampling and scoring” section), indicating stable electrostatic matching at the designed interfaces. These observations are consistent with the original proposition that the native ACE2-complex (6VW1) forms with high affinity but lacks stability over time due to sub-optimal electrostatic matching at its interface. The directed design, on the other hand, endows the mimics (HM19, HM21) with the potential to bind ACE2 with equivalently high affinity and also to remain stably bound over time.
Table 3 Complementarity (Sc, EC) and its time-evolution for the selected designed binary complexes compared to the native
Within the entire 200 ns trajectories, Sc reached maxima of 0.797 and 0.793 for HM19 and HM21, while their highest EC values were 0.592 and 0.497, respectively. All numbers indicate that the binding is dynamically stable and of high affinity. The directed improvement in the matching of electrostatic surface potentials for HM19 and HM21 is portrayed in Fig. 9 and Supplementary Fig. S7, respectively, using the MD snapshot(s) with the highest attained EC values. A comparison with Fig. 3 reveals the improvement in EC from the sub-optimal to the optimal range.
Similar dynamical trends are also reflected in the time-series plots of E2d (see Supplementary Fig. S8), which measures the 2D Euclidean distance of a plotted {Sc, EC} point in CPdock from the “probable” region of the plot (refer to the “Measuring the dynamic stability of the proposed ‘optimal’ solutions” section). Note that E2d is zero if the point falls in the “probable” region. For E2d, the native shows substantially greater fluctuations (see Supplementary Fig. S8) than both HM19 and HM21 over different patches of the simulation trajectories. Overall, this leads to a standard deviation ~2.5 times higher in the native than in either of the designed ACE2-complexes (see the “Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics” section). Notably, the time-series average of the native E2d is also more than 4 times that of the designed ACE2-complexes, whereas the averages for HM19 and HM21 are almost identical to each other and close to zero. Together, these numbers indicate the dynamic stability of the designed ACE2-complexes relative to the native.
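One way to compute E2d for a single {Sc, EC} point is sketched below, assuming the “probable” region is available both as a membership test and as a dense set of sampled points (for example, grid cells of the plot). The rectangular region used here is purely hypothetical and is not the CPdock “probable” region; it only illustrates the zero-inside, nearest-distance-outside behavior described in the text. Averaging `e2d` over all snapshots of a trajectory would give the time-series statistics.

```python
import numpy as np


def e2d(point, region_points, inside):
    """2D distance from an {Sc, EC} point to a region given as sampled points."""
    if inside(point):
        return 0.0  # points in the "probable" region score zero
    diffs = np.asarray(region_points, dtype=float) - np.asarray(point, dtype=float)
    return float(np.sqrt((diffs ** 2).sum(axis=1)).min())


# Hypothetical rectangular "probable" region (NOT the real CPdock region),
# sampled on a 0.01 grid in both Sc and EC.
SC_LO, SC_HI, EC_LO, EC_HI = 0.6, 0.8, 0.3, 0.6
grid_sc, grid_ec = np.meshgrid(np.linspace(SC_LO, SC_HI, 21), np.linspace(EC_LO, EC_HI, 31))
TOY_REGION = np.column_stack([grid_sc.ravel(), grid_ec.ravel()])


def toy_inside(p):
    return SC_LO <= p[0] <= SC_HI and EC_LO <= p[1] <= EC_HI


print(e2d((0.70, 0.40), TOY_REGION, toy_inside))  # inside the toy region
print(e2d((0.70, 0.20), TOY_REGION, toy_inside))  # below the toy region
```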
Implicit in the E2d analysis, the distribution of {Sc, EC} points (one per snapshot in a given trajectory) across the three defined regions of CPdock (refer to the “Fold recognition” section) was also surveyed for each ACE2-complex (HM19, HM21, native). For HM19, the fractions of snapshots falling into the “probable,” “less probable,” and “improbable” regions of CPdock were 78.05, 20.5, and 1.45%, respectively, while the corresponding fractions for HM21 were 77.8, 20.9, and 1.3%. In stark contrast, the “less probable” and “improbable” regions together accounted for 56.5% of the native trajectory (“probable” 43.5%, “less probable” 42.55%, “improbable” 13.95%). Overall, the numbers collectively suggest a clear improvement from native instability to stable binding over time in the designed ACE2-complexes. As a formal test of the significance of these deviations, we performed a χ² test between the native and each of the designed sets from their respective raw counts using a 3-bin model (“probable,” “less probable,” “improbable”; df = 3 − 1 = 2; critical value χ²(0.05) = 5.991). The χ² method has traditionally accompanied the Complementarity Plot(s) in several earlier applications that used the plot(s) as a discriminatory metric between different population distributions [44, 45, 48]. For the present purpose, the null hypothesis assumed “no significant improvement in stability over time upon the directed design,” i.e., that “the deviations from the native distribution were simply obtained by chance.” The resultant χ² values (see Eq. 6 defined in the “Buried surface area” section) were, however, computed to be 1001.375 and 990.654 for HM19 and HM21, respectively, both more than 160 times the above-quoted critical value χ²(0.05) for a 3-bin model.
This decisively rejects the null hypothesis and indicates instead that the deviations from the frequencies expected under the null hypothesis are significant and very unlikely to have arisen by chance. That the selected designed ACE2-complexes (HM19 and HM21) remain largely within the “near-optimal to optimal” regions of CPdock over time is also reflected in their three-dimensional population density plots (see Supplementary Fig. S9).
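The χ² values can be reproduced from the reported fractions by treating the native distribution as the expected one and converting percentages to counts over the 2000 snapshots, as in the sketch below. With HM19 fractions of 78.05, 20.5, and 1.45% (which sum to 100%), this gives χ² ≈ 1001.375 for HM19 and ≈ 990.654 for HM21, matching the quoted values; the assumption here is that this Pearson goodness-of-fit form corresponds to Eq. 6. For df = 2 the χ² survival function is exp(−x/2), so the critical value and p-value need no statistics library.

```python
import math

N_SNAPSHOTS = 2000
# Percent of snapshots in ("probable", "less probable", "improbable").
NATIVE = (43.5, 42.55, 13.95)
HM19 = (78.05, 20.5, 1.45)
HM21 = (77.8, 20.9, 1.3)


def to_counts(percentages, n=N_SNAPSHOTS):
    """Convert region percentages to raw snapshot counts."""
    return [round(p * n / 100) for p in percentages]


def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic against the expected (native) counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# For df = 2 the chi-square survival function is exp(-x / 2).
CRITICAL_005 = -2.0 * math.log(0.05)  # about 5.991

expected = to_counts(NATIVE)
for name, pct in (("HM19", HM19), ("HM21", HM21)):
    stat = chi_square(to_counts(pct), expected)
    p_value = math.exp(-stat / 2.0)
    print(f"{name}: chi2 = {stat:.3f} ({stat / CRITICAL_005:.0f}x critical), p = {p_value:.1e}")
```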
Further, as a means to cross-validate the predicted improvement in binding stability reflected by the complementarity measures (Sc, EC), binding/interaction energies (ΔGbinding) of the native (\( \Delta {G}_{binding}^{native} \)) and the selected designed ACE2-complexes (\( \Delta {G}_{binding}^{mimic} \)) were computed using FoldX (refer to the “Estimating changes in binding/interaction energies for the proposed ‘optimal’ solutions” section) along their corresponding (200 ns) simulated trajectories. This was followed by computing their directed difference (\( \Delta \Delta {G}_{binding}^{mimic} \)) following Eq. 5 (defined in the “Estimating changes in binding/interaction energies for the proposed ‘optimal’ solutions” section) and plotting individual time series for all three free-energy-difference terms (see Fig. 10). The time-series averages (and standard deviations) of the corresponding ΔGbinding terms were −5.939 (± 2.581) kcal/mol and −5.634 (± 3.011) kcal/mol for the ACE2-complexes of HM19 and HM21, but only 0.854 (± 4.981) kcal/mol for the native ACE2-complex. The native average may be of physical significance: its near-zero (slightly positive) value of \( \Delta {G}_{binding}^{native} \) means that the dynamic persistence of the native ACE2-complex is thermodynamically marginal at best. The associated standard deviation of ~±5 kcal/mol, reflecting high dynamic fluctuations (σ ≈ 6|μ|) in the native ΔGbinding, further suggests that the native ACE2-complex (6VW1) is indeed energetically unstable over time. Together, these results favor a model of quasi-stable binding/interaction.
Given that the purpose of the complexation here is to switch on membrane fusion and viral entry into the host cell [15], a transient (quasi-stable) interaction between the native RBDSpike and ACE2 is indeed intuitively expected, as perhaps also reflected in the appreciably low (and sub-optimal) native EC values throughout. The “surprisingly low kinetic barrier” revealed for the preceding event (see the “Comparative stability of the RBDSpike conformers influencing their switch” section) also seems to support this proposition. Notably, the proposition of a “low kinetic barrier” for the conformational switching of the Spike protein (“pre-” to “post-fusion” forms) rests purely on experimental biophysical and structural data, in which the dissociated “cleaved S1/S2 complex” was found in the absence of ACE2, as was the adopted “post-fusion conformer of the S2 fragment” under mild detergent conditions mimicking a membrane environment [15].
The relative improvement in binding stability over time brought about by the strategic design is also reflected in the large negative time-averaged \( \Delta {G}_{binding}^{mimic} \) values (Table 3) and their appreciably low standard deviations (roughly |μ| ≈ 2σ for both HM19 and HM21). Consequently, the corresponding \( \Delta \Delta {G}_{binding}^{mimic} \) values are also strongly negative (HM19: −6.793 ± 5.990 kcal/mol; HM21: −6.487 ± 5.781 kcal/mol), which further confirms the predicted improvement in their thermodynamic stability over time. Thus, the improvement in binding stability predicted from complementarity (EC in particular) is also clearly reflected in the corresponding free-energy estimates of the binding events over time.
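Assuming that Eq. 5 defines the directed difference as ΔΔG(mimic) = ΔG(mimic) − ΔG(native), evaluated snapshot by snapshot, the time-averaged ΔΔG values follow directly from the reported averages, as the sketch below shows: −6.793 kcal/mol for HM19 and −6.488 kcal/mol for HM21 (reported as −6.487, the small gap presumably reflecting rounding of the quoted averages). The same numbers give |μ|/σ of about 2.3 and 1.9 for the mimics versus about 0.17 for the native. The standard deviations of ΔΔG are not recomputed here, since they come from the per-snapshot difference series; the dictionary and function names are illustrative.

```python
# Time-series mean and SD of dG_binding (kcal/mol) as quoted in the text.
DG_BINDING = {
    "native": (0.854, 4.981),
    "HM19": (-5.939, 2.581),
    "HM21": (-5.634, 3.011),
}


def ddg_mean(mimic):
    """Mean ddG, assuming Eq. 5 is dG_mimic - dG_native (snapshot-paired)."""
    return DG_BINDING[mimic][0] - DG_BINDING["native"][0]


def mean_over_sd(name):
    """Magnitude of the mean relative to its fluctuation, |mu| / sigma."""
    mu, sd = DG_BINDING[name]
    return abs(mu) / sd


for name in ("HM19", "HM21"):
    print(f"{name}: mean ddG = {ddg_mean(name):.3f} kcal/mol, |mu|/sd = {mean_over_sd(name):.2f}")
print(f"native: |mu|/sd = {mean_over_sd('native'):.2f}")
```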
Nullifying the feasibility of the proposed designed therapeutics to compete with the ACE2–angiotensin II binding
Angiotensin Converting Enzyme 2 (ACE2), a vital counter-regulatory component of the Renin-Angiotensin System (RAS), has recently received great attention in COVID-19 research for acting as the doorway for SARS-CoV-2 into host cells [92,93,94,95,96]. Upon low blood flow, kidney cells convert circulating prorenin into renin, which in turn catalyzes the conversion of angiotensinogen, secreted by liver cells, into angiotensin I [95]. The membrane-bound Angiotensin Converting Enzyme (ACE) present on the surface of vascular endothelial cells in the lungs then converts angiotensin I into angiotensin II, an amphipathic linear octapeptide that serves as a vasoconstrictor [95]. Angiotensin II thus constricts blood vessels and raises blood pressure by engaging the type 1 angiotensin receptor (AT1R) [96, 97]. Angiotensin II also raises blood pressure by stimulating adrenal cortex cells to secrete the hormone aldosterone. ACE2 counteracts these effects by cleaving angiotensin II into the vasodilatory heptapeptide angiotensin-(1–7) [Ang-(1–7)]. Under normal physiological conditions, a fine balance between the ACE–angiotensin II and ACE2–Ang-(1–7) arms has to be maintained to control blood pressure and inflammation. Because SARS-CoV-2 uses the membrane-bound ACE2 receptor to gain entry into host cells, Spike-bound ACE2 receptors become less available to angiotensin II. The resulting shift in equilibrium towards increased angiotensin II activity might drive acute lung injury. Furthermore, according to the current hypothesis, SARS-CoV-2–ACE2 binding causes increased internalization and shedding of the ACE2 receptor, making it even less available to angiotensin II and thereby reducing the production of Ang-(1–7). This can raise blood pressure, along with causing direct parenchymal injury [98].
Our current work has considered whether our designed plausible therapeutics could compete for the binding site of angiotensin II on ACE2 and thereby disrupt the balance in the RAS. In this regard, the NMR structure of angiotensin II (PDB ID: 1N9V) was surveyed; it shows little conformational deviation among its 21 models (average RMS deviation: 0.187 ± 0.09 Å upon aligning to MODEL-1 in PyMOL). When 1N9V (MODEL-1 taken as the representative structure) was superposed onto the ligand (E) chain of 6VW1, the peptide was found distant from the ACE2 binding site (see Supplementary Fig. S10), with an RMS deviation of 4.28 Å. Based on a pairwise sequence alignment (in Clustal Omega [99]), the angiotensin II sequence was then threaded onto “6VW1_E_bs,” the ACE2 binding site on RBDSpike (refer to the “Reaction-prone nature of the ACE2 binding site in SARS-CoV-2 RBDSpike” section). The corresponding atomic model was subsequently built, resulting in an RMS deviation of 3.46 Å over a stretch of just eight mapped amino acids. Thus, the two molecular objects do not seem to have any appreciable structural resemblance. Furthermore, when this built atomic model is placed onto the RBDSpike–ACE2 complex (6VW1), it shows no proximity to the ACE2 receptor (displayed as a solid surface in Supplementary Fig. S10, bottom panel), and none of its atoms are found at the native RBDSpike–ACE2 interface. Naturally, a small, bent, linear octapeptide like angiotensin II (see Supplementary Fig. S10, top panel) has little chance of forming a plausible binding model with the Spike protein binding site on ACE2, which is no more than a single α-helix (refer to the “Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics” section). Rather, a deep groove or pocket is generally required to engulf such small molecules, in which case a close shape and/or electrostatic match at the interface is not necessary [100,101,102].
Thus, the two ligands (angiotensin II and RBDSpike) have no good reason to compete for an identical binding site on ACE2. Moreover, it is well known that, unlike protein–protein binding, where large interacting surfaces (~1600 Å² on average) [103] need to be carefully tailored to fit each other over extended areas, a small-molecule ligand (or co-factor) can present far greater conformational variation upon binding to different binding pockets, which in turn exhibit more variability in shape and physicochemical attributes than can be accounted for by the conformational multiplicity of the ligand [44, 100,101,102]. This further argues against the possibility of a binding conflict with angiotensin II at the Spike binding site of ACE2.
That said, the actual binding site of angiotensin II on ACE2 is not yet known experimentally. Therefore, further computational structural investigation of the two individual partner molecules was carried out to gain more intuitive insight into their plausible binding mode, followed by molecular docking of the two.
The membrane-bound ACE2 receptor represents the extra-membrane domain of the corresponding integral membrane protein. A closer look at its structure reveals an all-α protein domain (helical bundle) resembling an elongated spheroid, which forms a percolative channel fairly open to the aqueous solvent at either pole. It should thus mostly face an aqueous environment and, accordingly, consist of a bulk majority of hydrophilic regions. This was confirmed by the BRANEart webserver (http://babylone.3bio.ulb.ac.be/BRANEart/index.php), which analyzes the strength, stability, and weaknesses of different regions of membrane proteins [104] and colors them accordingly (blue: hydrophilic, white: neutral, red: hydrophobic). BRANEart further lists a residue-wise “Membrane Propensity” score in the range −1 (red: hydrophobic) to +1 (blue: hydrophilic), computed by linear regression trained on a collection of statistical potentials. Both the numerical and the visual outputs (see Supplementary Fig. S11A) showed that most of the ACE2 structure (~85.5%) indeed prefers polar (aqueous) environments. These hydrophilic regions are interspersed with neutral/mildly hydrophobic patches from some of the component helices, thereby forming an amphipathic open inner groove, partially exposed to the solvent at either pole. A small molecule thus has a good chance to pervade and slip along the long axis of the open inner groove and be stably sustained there, which appears genuinely plausible for an open-ended amphipathic linear octapeptide such as angiotensin II (see Supplementary Fig. S11B). To test this structural hypothesis, two docking studies were performed using the protein-docking webserver ClusPro (v2) [105, 106]: (a) docking of angiotensin II vs. ACE2 and (b) docking of angiotensin II vs. the RBDSpike–ACE2 binary PPI complex.
As anticipated from the structural hypothesis, the first docking test (a) indeed showed that angiotensin II prefers to diffuse through the open inner groove of ACE2 and remain stably lodged in the protein core. When superposed onto the global ACE2 frame of reference (as in PDB 6VW1), all of the top 10 docked poses (as ranked and returned by Cluspro) were found to occupy the inner groove/core of the protein (see Fig. 11a), a region with no structural conflict with RBDSpike binding (RBDSpike is displayed alongside the docked poses in the same image). As expected, the second docking test (b) reproduced virtually the same results even in the larger structural context of the RBDSpike–ACE2 binary PPI complex supplied as the receptor (see Supplementary Fig. S12). The top-ranked docked binary complex from (a) was further analyzed with BRANEart, which returned compatible hydrophilicity/hydrophobicity profiles for the two binding partners (angiotensin II and ACE2) in their bound form (see Fig. 11b). The docking results are thus fully consistent with the structural hypothesis reasoned above and practically rule out any realistic chance of a conflict between the two binding events. Taken together, there appears to be no convincing structural rationale for the proposed therapeutic intervention to interfere with the RAS via ACE2.
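The superposition-based argument can be illustrated as a simple steric-overlap test: once each docked angiotensin II pose and RBDSpike are expressed in the same ACE2 frame, a pose conflicts with RBDSpike binding only if some of its atoms approach RBDSpike atoms more closely than a clash cutoff. This is a minimal sketch, not the procedure used for Fig. 11a; the 3.0 Å cutoff, the function names, and the toy coordinates are hypothetical, and real use would read heavy-atom coordinates from the superposed structure files.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def min_distance(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Smallest pairwise Euclidean distance between two atom sets (brute force)."""
    return min(math.dist(p, q) for p in a for q in b)


def poses_without_clash(poses: dict[str, list[Coord]],
                        partner: list[Coord],
                        cutoff: float = 3.0) -> list[str]:
    """Names of poses whose closest approach to `partner` is at least `cutoff` angstroms.

    The default cutoff is a hypothetical clash threshold chosen for illustration.
    """
    return [name for name, atoms in poses.items()
            if min_distance(atoms, partner) >= cutoff]


# Toy coordinates (angstroms), assumed to be already in a common ACE2 frame.
rbd = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
poses = {
    "pose1": [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],  # well away from the RBD atoms
    "pose2": [(2.5, 0.0, 0.0)],                     # 1.0 angstrom from an RBD atom
}
print(poses_without_clash(poses, rbd))
```

A brute-force double loop is adequate for a handful of poses; for full structures a spatial index (e.g., a k-d tree) would be the usual choice.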
Comparing the proposed therapeutic intervention with the current state-of-the-art
A prime focus of recent research on antiviral therapeutics for SARS-CoV-2 has been to exploit existing knowledge of the host cell entry mechanisms of SARS-CoV, MERS, and other coronaviruses. Three general routes could lead to potential antiviral therapeutics: (i) repurposing pre-existing antiviral drugs through testing, (ii) high-throughput screening of small molecules, and (iii) de novo development of new drugs, neutralizing antibodies, or vaccines. Our current study proposes a non-trivial protein design approach to develop antiviral therapeutics that might act as competitive inhibitors of the SARS-CoV-2 RBDSpike. Since insight into the host cell entry mechanisms was gained, notably through the X-ray crystallographic structures of the SARS-CoV-2 Spike protein bound to its cognate receptor ACE2 on human cells [21, 24, 107], drug-design methods have revolved primarily around blockers of S protein subdomains, because the S protein mediates receptor recognition and entry.
Peptide-based approaches involving the strategic design of hybrid and fusion peptides have also been pursued. One such hybrid peptide was constructed computationally by joining two discontinuous fragments of ACE2 (residues 22–44 and 351–357) with a glycine linker [108]. Beyond small peptides designed from the ACE2 sequence, clinical-grade soluble human ACE2 (hACE2) has proven to be a promising therapeutic candidate, having been shown to block SARS-CoV-2 entry and growth in engineered blood vessel and kidney organoids [109]. Besides RBDSpike (S1), researchers developing therapeutics against SARS-CoV-2 have also targeted the heptad repeat 1 (HR1) and HR2 domains of the S2 subunit [110]. Among these, the lipopeptide EK1C4 has been demonstrated to be the most potent fusion inhibitor [110, 111]. Further, evidence has been presented for the significant efficacy of HR2-derived peptide inhibitors that block fusion of the viral and host cell membranes [112].
Alternatively, wet-lab experiments in hACE2-expressing cells have shown that recombinant RBDSpike can block host cell entry of both SARS-CoV and SARS-CoV-2 [113]. A recent study combining MD simulations with bio-layer interferometry [114] targeted the "ACE2 PD α1 helix" (refer to the "Evolution of the CoV-2 RBDSpike–ACE2 interaction dynamics" section), where SARS-CoV-2 RBDSpike binding actually occurs. A 23-mer peptide fragment of this helix (residues 21–43) binds SARS-CoV-2 RBDSpike with nanomolar affinity (Kd = 47 nM) and therefore has a high potential to interfere with viral entry into the host [114]. Importantly, although this peptide-based design approach is intended to avoid altering the physiological functions of ACE2, the actual effect of the RBDSpike blocker on viral titers in humans remains to be tested. Such approaches essentially aim for an "antigen arrest" before the pathogen reaches the host pulmonary system. A similar approach has also been adopted using nanobodies for the directed delivery of RBDSpike-neutralizing antibodies [115]. In contrast, our approach takes the alternative route of developing therapeutics that may block the RBDSpike binding site on the cognate receptor, ACE2. We take advantage of the quasi-stable native binding of RBDSpike to ACE2 in SARS-CoV-2 and aim to appreciably increase the binding stability while retaining near-native high affinity. The mutations were introduced directly into the native experimental RBDSpike–ACE2 complex structure.
The proposed designed variants are the end products of iterative rounds of rigorous computational screening based on high-level structural descriptors, and the predicted improvement over time in the binding stability of their ACE2 complexes (see the "Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics" section) is cross-validated by appropriate free energy estimates. The predicted "high-affinity stable binding" of the designed structural mimics to ACE2 should therefore form the basis for their potential use as blockers of native Spike protein binding to its cognate receptor. Related studies have shown that key residue substitutions in the SARS-CoV-2 CTD (see Supplementary Fig. S1) lead to a fourfold higher receptor-binding affinity compared with the native binary PPI complex [24]. We further cross-checked structurally that the designed RBDSpike mimics have no realistic chance of conflicting with angiotensin II binding to ACE2, and are therefore unlikely to interfere with the native physiological function of ACE2 (refer to the "Nullifying the feasibility of the proposed designed therapeutics to compete with the ACE2–angiotensin II binding" section). Furthermore, because the proposed RBDSpike mimics are substantially smaller (of the order of 1/100th in size) than full virus particles, they should reach the binding sites on a much faster timescale.
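Affinities quoted in this section can be placed on the same footing as free energy estimates through the standard relation ΔG° = RT ln(Kd/c°), with c° = 1 M, under which an n-fold tighter binding corresponds to ΔΔG = −RT ln n. The sketch below is an illustrative back-of-envelope conversion, assuming 298.15 K and ideal dilute behavior; it is not the free energy method used for the designed complexes, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar: float, temp_k: float = 298.15, c0_molar: float = 1.0) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in molar."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar / c0_molar)


def ddg_from_fold(fold_gain: float, temp_k: float = 298.15) -> float:
    """Free energy change (kcal/mol) when Kd is divided by `fold_gain` (tighter binding)."""
    if fold_gain <= 0:
        raise ValueError("fold gain must be positive")
    return -R_KCAL * temp_k * math.log(fold_gain)


# Kd of the ACE2-derived 23-mer peptide [114] and a fourfold affinity gain [24].
print(f"Kd = 47 nM  -> dG0 = {dg_from_kd(47e-9):.2f} kcal/mol")
print(f"4-fold gain -> ddG = {ddg_from_fold(4):.2f} kcal/mol")
```

These give roughly −10.0 kcal/mol for the 47 nM peptide and about −0.82 kcal/mol for a fourfold affinity gain, which shows how modest a free energy shift a fourfold change in Kd represents.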
Although other groups have taken a more direct approach ("antigen arrest" as well as "immunization") to prevent RBDSpike binding to ACE2 by designing mini-proteins [116], peptide blockers [114], nanobodies [115], and vaccines [117,118,119], we have chosen a more indirect and unconventional (reverse) approach in our proposed bio-therapeutic design. The reasons for our choice are as follows:
Firstly, in the absence of viral infection, ACE2–angiotensin II binding is not known to transmit any molecular signal leading to transcription of downstream genes [120, 121]. In this respect, the proposed therapeutics do not appear likely to affect intracellular downstream signaling. Secondly, there is a benefit related to the "systemic clearance" of the therapeutics after their course of action, a common concern for all administered competitive inhibitors. SARS-CoV-2 infection is known to be associated with ACE2 downregulation [121, 122], mostly through endocytic internalization of ACE2 and partly through other, as yet unknown, mechanisms. The proposed RBDSpike mimics will likewise be internalized as ACE2 complexes (see the "Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics" section), but with the distinct advantage of not carrying the rest of the viral particle with them. Additionally, being much smaller than the viral particle, the designed mimics would likely reach ACE2 faster. Given their potentially greater binding stability (as all the computational results consistently indicate), they would occupy the viral attachment sites on the host cell membrane and eventually outcompete viral binding (and infection). In this way, the designed mimics would counteract the endocytic internalization of the native RBDSpike and, at the same time, inhibit host cell entry of the viral particle through the proposed membrane fusion mechanisms [15]. The resulting downregulation of ACE2 would thus, in all probability, be only short-term, followed by rapid restoration of physiological homeostasis for both ACE2 and angiotensin II.
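The size argument can be made semi-quantitative with the Stokes-Einstein relation, D = k_B T / (6πηr), under which the translational diffusion coefficient of a roughly spherical particle scales inversely with its hydrodynamic radius r, and the time needed to diffuse a distance x in three dimensions scales as x²/(6D). The sketch below assumes ideal spheres in water at 37 °C (η ≈ 0.69 mPa·s), treats the "1/100th" size figure as a ratio of hydrodynamic radii, and uses a hypothetical 50 nm virion radius and a hypothetical 1 µm path; it ignores crowding, mucus, and binding kinetics, so it only illustrates the scaling.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein_d(radius_m: float, temp_k: float = 310.15,
                      viscosity_pa_s: float = 6.9e-4) -> float:
    """Translational diffusion coefficient (m^2/s) of a sphere of the given radius."""
    return K_B * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


def mean_time_to_diffuse(distance_m: float, d_m2_s: float) -> float:
    """Time (s) for a 3D mean squared displacement of distance_m**2, from <x^2> = 6Dt."""
    return distance_m ** 2 / (6 * d_m2_s)


r_virion = 50e-9          # hypothetical virion radius (m)
r_mimic = r_virion / 100  # assumes the "1/100th" figure refers to radius
d_virion = stokes_einstein_d(r_virion)
d_mimic = stokes_einstein_d(r_mimic)
x = 1e-6                  # hypothetical 1 micrometre path to the cell surface

d_ratio = d_mimic / d_virion
t_ratio = mean_time_to_diffuse(x, d_virion) / mean_time_to_diffuse(x, d_mimic)
print(f"D ratio (mimic/virion): {d_ratio:.1f}")
print(f"time ratio (virion/mimic): {t_ratio:.1f}")
```

Because D scales as 1/r, both ratios depend only on the assumed radius ratio and not on the chosen temperature or viscosity.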
Moreover, internalization of the ACE2 complexes of the proposed designed mimics (see the "Inherent evolutionary features of RBDSpike naturally aiding the design of its structural mimics" section) will naturally ensure the metabolism and systemic clearance of the therapeutics. Thirdly, because SARS-CoV-2 is extremely pleiotropic, its titers in individuals of different ages, genders, and medical conditions may be challenging to evaluate. Since our reverse approach aims to block the ACE2 receptor, which is native to the individual (rather than a foreign body), precise therapeutic doses will likely be easier to determine. In view of these advantages, we preferred the reverse approach.
The proposed method, however, comes with certain potential caveats. Firstly, the predictions are purely computational (albeit based on available experimental structures) and have yet to be validated in the wet lab. Secondly, an important part of the structural hypothesis rests on the current understanding of the viral entry mechanisms, parts of which are themselves still at the level of hypothesis. Thirdly, the mode of administration (oral/intravenous/inhalation) has yet to be determined through wet-lab experiments. Fourthly, cytokine storms (as immune responses) [123, 124] have been found to be triggered upon coronavirus binding to ACE2, and the consequences of the proposed therapeutics in this respect have yet to be tested, again by wet-lab experiments.