Introduction

The disease caused by SARS-CoV-2 is known as coronavirus disease 2019 (COVID-19). SARS-CoV first emerged in the Guangdong province of China in 2002 and spread to five countries, infecting 8098 people and causing 774 deaths, a mortality rate of 11% [1]. Later, in 2012, MERS-CoV appeared in the Arabian Peninsula and spread to 27 countries, infecting a total of 2494 individuals and claiming 858 lives, a mortality rate of 34% [2]. SARS-CoV-2 emerged in Wuhan city, Hubei province of China, in December 2019. As of 11.01.2022, over thirty crore (300 million) cases of COVID-19 and over 5.4 million deaths (mortality rate around 3.40%) have been reported across 222 countries. A new variant of SARS-CoV-2 named “Omicron”, with a high transmission rate, has also been reported in many countries. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic, a public health emergency of global concern. People of all ages can contract this viral infection, but immunocompromised people with comorbidities are the most vulnerable. Older people and males with chronic diseases (such as diabetes, heart disease, and cancer) are more vulnerable than other groups [3]. The virus is easily transmitted through droplets generated when infected people cough or sneeze [4]. These infectious droplets can spread up to 1–2 m and settle on surfaces. The virus can survive on metal surfaces for several hours, even days, under favorable conditions but can be destroyed by disinfectants such as hydrogen peroxide and sodium hypochlorite [5]. The incubation period varies from 2 to 14 days. Common clinical symptoms include fever (except in asymptomatic cases), dry cough, sore throat, fatigue, headache, breathlessness, and sudden loss of smell and taste. Without proper treatment, the disease can cause pneumonia, respiratory failure, and even death. Generally, recovery starts after -- week.
It has been observed in patients that progression of this disease increases the release of cytokines, including interleukin (IL)-6 and IL-10, whereas the levels of CD4+ and CD8+ T cells are reduced [6]. There is no approved treatment for COVID-19, but drugs such as the antiviral remdesivir and the IL-6 receptor antagonist tocilizumab are in use [7]. In addition, molecular docking studies have identified many chemical and bioactive compounds as candidate drugs for the treatment of COVID-19 [8, 9].

Coronaviruses are enveloped viruses with a positive-sense single-stranded RNA genome; the virions are 60–140 nm in size and carry spike proteins on their surface [10]. There are four genera: alpha-, beta-, gamma-, and delta-coronaviruses. The most highly pathogenic viruses, namely severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2, belong to the β-coronaviruses [11]. Generally, the β-coronavirus genome contains six open reading frames (ORFs); the first (ORF1a/b) spans two-thirds of the whole genome and encodes 16 nonstructural proteins (nsps). A frameshift between ORF1a and ORF1b produces two polypeptides, pp1a and pp1ab. The main protease (Mpro), also known as the 3-chymotrypsin-like protease (3CLpro), is involved in processing these polypeptides [12, 13]. Other ORFs near the 3′-terminus of the genome encode the four main structural proteins: spike glycoprotein, membrane, envelope, and nucleocapsid proteins [14]. Genome analysis of SARS-CoV-2 revealed 79.5% and 97% similarity with the whole genome sequences of SARS-CoV and bat SARS-CoV, respectively [3]. SARS-CoV-2 enters the host respiratory mucosa by binding the angiotensin-converting enzyme 2 (ACE2) receptor through its spike glycoprotein [15]. A recent study has shown that SARS-CoV-2 binds ACE2 with a tenfold higher affinity than SARS-CoV [16]. The basic reproduction number (R0), the average number of secondary infections produced by one patient, is between 2.47 and 2.86 for SARS-CoV-2, whereas it is 2.2–3.6 for SARS-CoV and 2.0–6.7 for MERS-CoV [17,18,19]. These results have been taken to indicate that SARS-CoV-2 has a comparatively high transmission ability among coronaviruses. Sequence analysis of the spike glycoproteins of SARS-CoV-2, SARS-CoV, and other SARS-related coronaviruses (SARSr-CoV) showed that four amino acids are inserted at positions 681–684, between the S1 and S2 subunits of SARS-CoV-2 [20].
SARS-CoV ORF 3b, ORF 6, and N proteins inhibit the expression of beta interferon (IFN-β) [21]. The envelope (E) protein in coronavirus is a small membrane protein that has several functions in virion assembly and ion-channel activity, through which it can interact with the host [22].

Given the unavailability of anti-viral drugs for nCoV, society demands sincere efforts in drug design and discovery for COVID-19 [23, 24]. SARS has been present since 2002, yet a related virus created a pandemic only 18 years later. Why? Why is this virus so harmful to us? What are the fundamental differences between SARS-CoV-2 and SARS? How has evolution made SARS-CoV-2 stronger than SARS? How does it gain stability in such extreme environments? Do intra-protein interactions play a vital role in SARS-CoV-2? This study aims to answer these questions.

Materials and methods

Dataset

A detailed investigation of the sequences and structures of SARS-CoV-2 was performed with reference to the earlier SARS virus. Four types of reviewed protein sequences from SARS-CoV-2 and SARS were considered in this study: spike proteins, membrane proteins, nucleoproteins, and ORF proteins (ORF 3, ORF 6, ORF 7, ORF 8, and ORF 9). All annotated protein sequences of SARS-CoV-2 and SARS were retrieved from the UniProt [25] database. The crystal structures of SARS-CoV-2 and SARS proteins were retrieved from the RCSB Protein Data Bank (PDB) [26]. The structures were chosen based on certain criteria for crystal structures.

Physicochemical properties

The protein sequences were subjected to multiple sequence alignment (MSA) using Clustal Omega [27]. Both block and non-block FASTA [28] formats of the sequences were analyzed. Sequence blocks were prepared from the MSA with Block Maker [29]. Both non-block and block formats were analyzed with the ProtParam server [30,31,32] and the ProtScale server [33] to calculate physicochemical properties such as amino acid composition, GRAVY, aliphatic index, bulkiness, and polarity. The value reported for the ORF proteins is the average over all ORFs (ORF 3, ORF 6, ORF 7, ORF 8, and ORF 9). The total amounts of disorder-forming residues (i.e., E, P, K, S) and order-forming residues (i.e., I, F, W, Y) were calculated from the amino acid compositions based on previous reports [34, 35]. Intrinsically disordered regions of the proteins were analyzed with the DisEMBL [36] server.
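The residue-class totals described above can be illustrated with a minimal Python sketch. It is not the ProtParam implementation; the function names and the toy sequence are hypothetical, and only the residue sets (E, P, K, S and I, F, W, Y) come from the text.

```python
from collections import Counter

# Residue classes as defined in the text (one-letter codes).
DISORDER_FORMING = set("EPKS")
ORDER_FORMING = set("IFWY")


def composition(seq: str) -> dict[str, float]:
    """Return the percentage of each amino acid in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def class_percentage(seq: str, residues: set[str]) -> float:
    """Return the summed percentage of all residues belonging to a class."""
    comp = composition(seq)
    return sum(pct for aa, pct in comp.items() if aa in residues)


# Hypothetical toy sequence, not taken from any SARS-CoV-2 or SARS protein.
toy = "MKSEPIFWYA"
print(class_percentage(toy, DISORDER_FORMING))  # K, S, E, P: 4 of 10 residues
print(class_percentage(toy, ORDER_FORMING))     # I, F, W, Y: 4 of 10 residues
```

Applied to a real FASTA record, the same two calls would give the disorder-forming and order-forming percentages compared between the two viruses.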

Analysis of crystal structure

SARS-CoV-2 protease (5R80) and SARS protease (2H2Z) were retrieved from the RCSB PDB for structural comparison. All structures were energy-minimized for 1000 steps using UCSF Chimera with its force field [37]. Secondary structure was analyzed with the CFSSP [38] server to find the amino acid abundance in coil, helix, sheet, and turn. The number of salt bridges was extracted with the WHAT IF server [39]. Intra-protein interactions were determined with the Protein Interactions Calculator [40] and Arpeggio [41]. Free solvation energy was calculated with the ProWaVE server [42]. Surface area and volume were determined with CASTp [43]. Phosphorylation sites were identified with the NetPhos server [44]. Protein mutations were analyzed with DUET [45].

Results and discussions

Effect of polar residues on SARS-CoV-2 sequence

Here, the amino acids D, E, H, R, and K were considered charged residues and C, S, T, N, Q, Y, and W uncharged polar residues. Amino acid compositions were calculated from the non-block format, whereas the block format was used to calculate disorder-forming residues, order-forming residues, bulkiness, aliphatic index (AI), and polarity. GRAVY (grand average of hydropathy) is calculated by adding the hydropathy values [46] of all residues and dividing by the length of the protein sequence. Is there a preference for particular amino acids in SARS-CoV-2 relative to SARS? To find the answer, all physicochemical properties were calculated.
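As an illustration of the GRAVY definition above, the following Python sketch sums Kyte–Doolittle hydropathy values and divides by sequence length. It is a stand-alone example rather than the ProtParam code used in the study; the function name `gravy` and the test peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: sum of residue values / sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code in the input.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical dipeptides: a hydrophobic and a hydrophilic example.
print(gravy("AV"))  # positive value, hydrophobic
print(gravy("RK"))  # negative value, hydrophilic
```

A negative GRAVY value indicates a more hydrophilic sequence and a positive value a more hydrophobic one.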

Spike proteins of SARS-CoV-2 showed a higher abundance of charged residues (except D) (Fig. 1). Polar residues in spike proteins were also more abundant in SARS-CoV-2 (except T and W). Among the charged amino acids of the SARS-CoV-2 nucleoproteins, D, K, and R showed higher abundance and E and H lower abundance. Polar residues in nucleoproteins were also more abundant in SARS-CoV-2 (except T and N). Surprisingly, C is absent from the nucleoproteins in both groups of sequences. The other proteins, i.e., membrane proteins and ORF proteins, showed abundances similar to these results. Polar residues also help proteins to tolerate temperature [47]. Disorder-forming residues are more abundant in SARS-CoV-2 than in SARS, whereas order-forming residues are less abundant in SARS-CoV-2 (in the case of spike and ORF proteins). The higher number of disorder-forming residues in SARS-CoV-2 suggests that it can more readily increase pathogenicity or virulence. Proline may give a preadaptive advantage by enhancing antioxidant defenses, which in the setting of disease would extend cell viability, raise colonization efficiencies, and enhance virulence [48]. It has also been reported that disorder-forming residues such as S and E are responsible for increased pathogenicity [49, 50]. The aliphatic index is higher in every SARS-CoV-2 protein. The increased aliphatic index suggests that SARS-CoV-2 proteins are more thermally stable than those of SARS [51].
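For reference, the aliphatic index compared above is conventionally computed from the mole percentages of Ala, Val, Ile, and Leu as AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)]. The Python sketch below assumes this standard formulation, as documented for ProtParam; the function name and toy sequences are hypothetical.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of A, V, I and L.

    Assumes the standard coefficients 2.9 (Val) and 3.9 (Ile, Leu),
    i.e. the relative side-chain volumes with respect to alanine.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)

    def mole_percent(aa: str) -> float:
        # Percentage of positions in the sequence occupied by residue aa.
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Hypothetical toy peptides, not real viral sequences.
print(aliphatic_index("AVIL"))  # fully aliphatic, high index
print(aliphatic_index("GGGG"))  # no aliphatic side chains, index 0.0
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is the basis for interpreting it as an indicator of thermal stability.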

Fig. 1 Comparative analysis of physicochemical properties like amino acid compositions, disorder-forming residues, order forming residues, GRAVY, aliphatic index of spike proteins (SP), nucleoproteins (NP), membrane proteins (MP), ORF proteins (OP) from SARS-CoV-2 (red bar) and SARS (green bar)

The polarity of these proteins is slightly higher in SARS-CoV-2 than in SARS (Fig. 2). Bulkiness is likewise higher in SARS-CoV-2 than in SARS. The high bulkiness of SARS-CoV-2 proteins indicates that they need more extended heating periods for hydrolysis [52], which suggests that they tolerate heat better than SARS proteins. The Kyte–Doolittle hydrophobicity scale suggests that SARS-CoV-2 is hydrophilic in nature (Fig. 3). The lower GRAVY values (except for nucleoproteins) also indicate the hydrophilic nature of SARS-CoV-2. This hydrophilic nature suggests that SARS-CoV-2 can interact quickly with water or an aqueous medium and spread more easily than SARS [53, 54]. Intrinsically disordered regions are much more abundant in SARS-CoV-2 than in SARS. This high abundance suggests that disorder assists the folding of SARS-CoV-2 proteins and that they interact more with other proteins than SARS proteins do. Many intrinsically disordered proteins (IDPs) have been found to undergo a disorder-to-order transition, implying that their folding processes are inherently distinct from those seen in globular proteins. After binding to natural partners, certain IDPs can fold into a unique 3D form. Many IDPs and intrinsically disordered protein regions (IDPRs) can fold when they engage with their binding partners and have various binding specificities, allowing them to participate in one-to-many and many-to-one interactions [34, 55,56,57,58,59,60]. At various levels, viral IDPs mediate successful infection and govern pathogenesis. Because of their widespread involvement in host–pathogen regulation and their great prevalence in viral proteomes, viral IDPs are being investigated as possible therapeutic targets [61].

Fig. 2 Comparative analysis of bulkiness and polarity of spike proteins (SP), nucleoproteins (NP), membrane proteins (MP), ORF proteins (OP) from SARS-CoV-2 (red line), and SARS (green line)

Fig. 3 Comparative study of intrinsic disorder regions and Kyte–Doolittle hydrophobic scale of spike proteins (SP), nucleoproteins (NP), membrane proteins (MP), ORF proteins (OP) from SARS-CoV-2 (red line) and SARS (green line)

Analysis of secondary structure of SARS-CoV-2 and SARS

The building blocks of proteins, i.e., amino acids, are found in four types of secondary structure element: coil, helix, sheet, and turn.

Charged residues showed higher abundance in every element (turn, helix, coil, and sheet) of SARS-CoV-2 than of SARS (Table 1). In both proteins, charged residues were most abundant within the helix. The introduction of a higher number of charged residues in the helix makes proteins more resistant to an acidic environment or temperature denaturation, which helps to increase stability [62, 63]. Hydrophobic residues are more abundant in SARS than in SARS-CoV-2 (except in coil). Polar residues also showed higher abundance in every secondary structure element of SARS-CoV-2 than of SARS. It has been shown that polar amino acids on the surface can influence helix formation and increase its stability [64]. However, the highest abundance of polar residues was found in the sheet of both SARS-CoV-2 and SARS. More than 50% of residues were present in the sheet of SARS, whereas SARS-CoV-2 has 39.33% and 31.54% of its residues in sheet and helix, respectively. Thus, SARS-CoV-2 shows an increased amino acid propensity for the helix, which may increase its stability.

Table 1 Amino acid abundance (%) in protein secondary structures (turn, helix, coil, and sheet) of SARS-CoV-2 (5R80) and SARS (2H2Z)

Effect of intra-protein interactions on SARS-CoV-2 and SARS

Salt bridges have a significant effect on protein stability [65,66,67,68]. Charged residues participate in the formation of salt bridges. Usually, two types of salt bridges are found in proteins: isolated salt bridges and network salt bridges. The increased number of charged residues in SARS-CoV-2 suggests that they might enhance salt bridge formation to gain more stability. Other intra-protein interactions, such as metal ion binding sites [69] and aromatic–aromatic interactions [70,71,72], also help in protein stabilization.

SARS-CoV-2 has a larger pocket area than SARS (Fig. 4A, B), which gives it more possibilities for protein–protein or protein–ligand interactions (Table 2). The volume of the protein is also higher in SARS-CoV-2 than in SARS. The protease from SARS-CoV-2 possesses 9 isolated salt bridges and 1 network salt bridge, whereas the SARS protease has 8 isolated and 1 network salt bridge. This result indicates that SARS-CoV-2 is more strongly stabilized by salt bridges. Although the SARS-CoV-2 and SARS proteins each have only one network salt bridge, the network in SARS-CoV-2 is special in that it is cyclic in nature (Fig. 4C, D), formed by the pairs R131-E290, K137-E290, R131-D197, K137-D197, and R131-D289. Residue R131 participates most often in this cyclic salt bridge, in three of the five pairs. This novel cyclic salt bridge might play a great role in protein stability [68, 72].
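The cyclic character of this network can be checked by treating the five residue pairs listed above as edges of an undirected graph. The Python sketch below is only an illustration of that reasoning; it is not how the WHAT IF server reports salt bridges, and the helper names are hypothetical.

```python
from collections import defaultdict

# Salt bridge pairs of the SARS-CoV-2 protease network, as listed in the text.
PAIRS = [
    ("R131", "E290"), ("K137", "E290"), ("R131", "D197"),
    ("K137", "D197"), ("R131", "D289"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from residue pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def has_cycle(graph):
    """Depth-first search: reaching a visited node twice means a cycle."""
    visited = set()
    for start in graph:
        if start in visited:
            continue
        stack = [(start, None)]
        while stack:
            node, parent = stack.pop()
            if node in visited:
                return True
            visited.add(node)
            stack.extend((nb, node) for nb in graph[node] if nb != parent)
    return False


graph = build_graph(PAIRS)
degree = {res: len(partners) for res, partners in graph.items()}
print(degree["R131"])    # 3: R131 takes part in three of the five pairs
print(has_cycle(graph))  # True: R131-E290-K137-D197-R131 closes a ring
```

In this representation, R131, E290, K137, and D197 form a four-membered ring, with D289 attached to R131 as a side branch.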

Fig. 4 Pocket area (red zone) in the 3D protein structures of SARS-CoV-2 (A) and SARS (B); the pocket is larger in SARS-CoV-2. 3D protein structure of SARS-CoV-2 with cyclic salt bridge (C). 3D view of only the cyclic salt bridge of SARS-CoV-2 (D). Phosphorylation sites of SARS-CoV-2 (E) and SARS (F)

Table 2 Volume, pocket area, isolated salt bridges (ISB), network salt bridges (NSB), metal binding site (MBS), and solvation free energy (ΔGsolv) of SARS-CoV-2 (5R80) and SARS (2H2Z)

The number of metal ion binding sites is also higher in SARS-CoV-2 than in SARS. In the SARS-CoV-2 structure, these 3 metal ion binding sites contain dimethyl sulfoxide. Free solvation energy is a thermodynamic factor that determines protein solvation or the nature of denaturation [73]; from this property, the tendency of a protein to denature can be assessed. Solvation free energy is also higher in SARS-CoV-2 than in SARS, which indicates that the SARS-CoV-2 protein is not easily denatured in contact with the solvent.

Aromatic–aromatic interactions are more numerous in SARS-CoV-2 than in SARS (Table 3). Beyond their number, some of the residues participating in aromatic–aromatic interactions form a very long network, which to our knowledge has not been reported in any viral protein. SARS-CoV-2 has 3 isolated and 2 network aromatic–aromatic interactions, whereas SARS has only 9 isolated aromatic–aromatic interactions.

Table 3 Isolated and network (bold and italic) aromatic–aromatic interactions of SARS-CoV-2 (5R80) and SARS (2H2Z)

The number of phosphorylation sites (Fig. 4E, F) is 54 in SARS-CoV-2 and 45 in SARS. The higher number of phosphorylation sites in SARS-CoV-2 increases the strength of protein–protein interactions and helps in stability [74].

Favorable point mutations of SARS-CoV-2

The MSA of both structures revealed some point mutations in SARS-CoV-2, so their effect on protein stability was analyzed. In total, 11 mutations were identified, of which 8 are favorable and 3 are unfavorable in SARS-CoV-2 (Table 4). Residue 35, a threonine in SARS that is substituted by valine in SARS-CoV-2, contributes the highest energy, i.e., −2.24 kcal/mol. The S63N mutation contributes the second highest energy to SARS-CoV-2, i.e., −1.16 kcal/mol. However, the mutations A46S, K180N, and I286L destabilize the SARS-CoV-2 protein. Overall, 6 polar and 5 non-polar amino acids of SARS were mutated to 5 polar and 6 non-polar amino acids in SARS-CoV-2. The point mutations predicted in SARS-CoV-2 contribute a total energy of about −7.46 kcal/mol, which is the main driving force for its greater stability compared to SARS.
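The polar/non-polar bookkeeping for such substitutions can be sketched in Python as below. The residue classes follow the definitions given earlier in this paper (charged: D, E, H, R, K; uncharged polar: C, S, T, N, Q, Y, W); the parsing helper, its names, and the choice to report charged residues as a separate class are assumptions made for illustration, and only the five mutations named in the text are used.

```python
import re

# Residue classes as defined earlier in this paper.
CHARGED = set("DEHRK")
UNCHARGED_POLAR = set("CSTNQYW")

# Mutation label such as "T35V": wild-type residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def residue_class(aa: str) -> str:
    """Classify a one-letter residue code as charged, polar or non-polar."""
    if aa in CHARGED:
        return "charged"
    if aa in UNCHARGED_POLAR:
        return "polar"
    return "non-polar"


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label like 'T35V' into (wild type, position, mutant)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a valid mutation label: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def describe(label: str) -> str:
    """Report the class change caused by a substitution."""
    wild, _, mutant = parse_mutation(label)
    return f"{label}: {residue_class(wild)} -> {residue_class(mutant)}"


# Only the mutations explicitly named in the text.
for label in ["T35V", "S63N", "A46S", "K180N", "I286L"]:
    print(describe(label))
```

For example, T35V replaces a polar residue with a non-polar one, while I286L keeps a non-polar residue at that position.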

Table 4 Effect of amino acid mutations in SARS-CoV-2 with their contributing energies

Conclusion

Acidic and basic residues play a significant role in the evolution of SARS-CoV-2. The presence of charged residues in the helix region contributes to increased protein stability. Increased hydrophilicity helps SARS-CoV-2 to spread easily through air droplets. Disorder-forming residues increase SARS-CoV-2 pathogenicity. The high bulkiness of SARS-CoV-2 proteins makes them heat tolerant. The long network of aromatic–aromatic interactions is an added advantage for protein stability. To our knowledge, this is the first report of a cyclic salt bridge and a long network of aromatic–aromatic interactions in a viral protein. The increased numbers of metal ion binding sites and phosphorylation sites also play a crucial role in SARS-CoV-2 protein stability. The point mutations show how SARS-CoV-2 has evolved to gain more stability. They also offer a clue for reducing the severity of SARS-CoV-2 infection by removing those favorable mutant amino acid residues; protein engineering could help in this process. The findings of the present investigation contribute information that is essential for drug and vaccine development against SARS-CoV-2.