Introduction

The SARS-CoV-2 global pandemic has had a significant impact on public health and economies around the world [1, 2]. Since the end of 2020, a variety of SARS-CoV-2 variants have appeared, which can potentially seriously threaten human health and impose significant long-term impacts on human production and living activities [3,4,5,6,7,8,9,10,11,12]. Thus, vaccines and antiviral drugs against SARS-CoV-2 are urgently needed. Current strategies for addressing SARS-CoV-2 are aimed at attacking the main links in the life cycle of the virus [6, 13,14,15,16]. It is therefore imperative to understand the life cycle of SARS-CoV-2 at the molecular level and to analyze the structure and function of its constituent proteins [17]. Among the four structural proteins and sixteen nonstructural proteins encoded by the SARS-CoV-2 genome, most attention has been focused on the spike (S) protein due to its importance in the viral life cycle [17, 18]. However, although many other SARS-CoV-2 proteins play an equally important role in the virus life cycle, we know relatively little about their structures or biophysical properties [19, 20]. Among them, the N protein is a particularly attractive antiviral target. The N protein is not only the basis for packaging the viral RNA genome into the ribonucleoprotein (RNP) complex and assembling it into virus particles but is also the most abundant protein in virions and a highly immunogenic antigen. Moreover, it is a determinant of virulence and pathogenesis [17, 21,22,23]. The N protein can be used not only as a potential target for therapy or vaccines but also as an important diagnostic marker for COVID-19 [20, 24, 25].
Therefore, this paper focuses on the SARS-CoV-2 life cycle, the structure and function of N protein domains, the post-translational modification of the N protein, the mechanism and function of N protein-related LLPS, and the N protein-based development of vaccines and diagnostics. This information may help in designing effective drugs and vaccines to end the SARS-CoV-2 pandemic.

SARS-CoV-2

SARS-CoV-2 is a pleomorphic particle with a diameter of approximately 60–100 nm [26, 27]. The length of its RNA genome is approximately 30 kb, and the specific structural composition is shown in Fig. 1a and b: it contains two large overlapping open reading frames, ORF1a and ORF1b, which are further processed to produce 16 nonstructural proteins (Nsp1 to 16) [28]. Additionally, it encodes 4 structural proteins, namely the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins and 9 auxiliary proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, ORF9c, and ORF10) [9, 28,29,30,31,32]. The structures of some of these proteins have been resolved, and their functions have been studied in depth [31, 33,34,35,36,37,38].

Fig. 1

Genome organization of SARS-CoV-2 and structural overview of the SARS-CoV-2 nucleocapsid protein. a. Genome organization of SARS-CoV-2. b. Schematic representation of SARS-CoV-2 N protein domains. The N-terminal domain (NTD), C-terminal domain (CTD) and three intrinsically disordered regions (IDRs), i.e. N-arm, linker region (LKR) and C-tail are illustrated. The charged Ser/Arg (SR)-rich region is shown. c. Deletion analyses of N protein of SARS‐CoV‐2. Structures were visualized in PyMol v2.4

The role of the N protein in SARS-CoV-2 life cycle

The life cycle of SARS-CoV-2 is shown in Fig. 2 [26, 39]. The initial stage of coronavirus infection involves the specific binding of the SARS-CoV-2 spike (S) protein to host cell entry receptors [26, 39,40,41]. The spike protein, which is a homotrimeric class I fusion glycoprotein, consists of S1 and S2 subunits with different functions. The receptor-binding domain (RBD) of the surface-exposed S1 subunit specifically binds to the host cell receptor angiotensin-converting enzyme 2 (ACE2) [37, 42,43,44,45,46]. The transmembrane S2 subunit comprises heptad repeat regions and a fusion peptide. When S is cleaved by the host cell-derived serine protease TMPRSS2, these elements mediate fusion of the viral and host cell membranes through extensive conformational rearrangement [39, 47, 48]. The host cysteine proteases cathepsin B (CatB) and cathepsin L (CatL) facilitate this process [39, 47, 50,51,52]. In addition, a polybasic cleavage site (PRRAR) recognized by the proprotein convertase furin is found at the S1–S2 boundary, and furin cleavage leads to increased infection [46, 53,54,55]. SARS-CoV-2 can also enter cells through endocytosis, in which case fusion is induced by cleavage of S by endosomal/lysosomal proteases at low pH [26, 39]. When SARS-CoV-2 enters the host cell, the N protein dissociates from the positive-strand (+) RNA genome of the virus, and the viral gene replication and expression program, which is highly regulated in space and time, begins [26]. Host cells possess an inherent antiviral immune defense mechanism, RNA interference (RNAi), which can degrade the viral genome and thereby inhibit virus replication [56,57,58]. The N protein acts as a viral suppressor of RNAi in host cells [56, 57]. In the initial step, the N protein can sequester dsRNA in infected cells, thus preventing the recognition and cleavage of viral dsRNA [57, 58].
On the one hand, ORF1a and ORF1b of the positive-strand (+) RNA genome are translated into polyproteins pp1a and pp1ab, respectively [26, 39, 59]. These polyproteins are cleaved by two cysteine proteases, nsp3 (papain-like protease, PLpro) and nsp5 (chymotrypsin-like protease, also known as 3C-like protease, 3CLpro, or main protease, Mpro), to produce the nonstructural proteins [26, 39, 59, 60]. Nsp1 shuts down host translation and promotes host mRNA degradation. Nsp2-11 regulate the intracellular environment to facilitate viral replication. Nsp12-16 contain the core enzymes needed for RNA synthesis, including the RNA-dependent RNA polymerase (RdRp; nsp12), while nsp2-16 and the N protein establish the viral replication-transcription complex (RTC) and reshape cellular membranes to form replication organelles, which are essential for keeping RNA replication and transcription orderly [26, 42, 61, 62]. These organelles are connected to the endoplasmic reticulum (ER), providing the best environment for viral RNA replication [26, 39]. On the other hand, viral RNAs are replicated in double-membrane vesicles (DMVs) [26, 39]. Replication begins with the synthesis of negative-strand RNA copies, which are used as templates to synthesize new positive-strand RNA genomes that may enter additional rounds of translation or be incorporated into new virions. Discontinuous transcription of the positive-strand genomic RNA produces subgenomic negative-strand RNAs, which are used as templates to synthesize the subgenomic positive-strand RNAs that encode structural and auxiliary proteins [26, 39]. The newly synthesized viral RNA exits DMVs via transmembrane pores to reach the sites of translation or virion assembly [26]. Subsequently, the positive-strand genomic RNA is encapsidated by the N protein and assembles with the structural proteins S, M, and E in the ER–Golgi intermediate compartment (ERGIC), and new virions are formed by budding into the ERGIC lumen.
Finally, the progeny virions may be released from the host cells via exocytosis [26]. However, recent evidence suggests that SARS-CoV-2 is more likely to exit infected cells through lysosomal trafficking [39, 63]. In general, N proteins are responsible for regulating host cell cycle progression, host–pathogen interactions, and apoptosis [64,65,66,67]. In addition, the N protein is strongly immunogenic, can induce protective immune responses, and is highly expressed during infection [20, 68]. Its exploitation of host cellular mechanisms plays a regulatory role in the virus life cycle and is key to the incorporation of viral RNA into progeny virus particles. Thus, the N protein plays an important role in viral infection of host cells.

Fig. 2

The SARS-CoV-2 life cycle. When the SARS-CoV-2 spike protein binds to the target cell receptor ACE2, the S protein is cleaved by host cell proteases, such as TMPRSS2, triggering fusion of the virus with the plasma membrane. In addition, SARS-CoV-2 can enter cells through endocytosis. The N protein is then dissociated from the positive-strand (+) RNA genome of SARS-CoV-2, which is translated into polyproteins pp1a and pp1ab. These polyproteins are processed into nonstructural proteins nsp1-16, which build viral replication and transcription complexes (RTCs) and reshape the cell membrane to form replicating organelles (DMVs). These organelles form a continuum with the endoplasmic reticulum, and viral RNA replication mainly occurs in DMVs. The newly synthesized viral RNA exits DMVs through transmembrane pores to reach the sites of translation or virion assembly. The translated structural proteins are transported to the endoplasmic reticulum (ER) membrane and transit through the ER-to-Golgi intermediate compartment (ERGIC). The positive-strand genomic RNA wrapped by the N protein assembles with the structural proteins S, M, and E, and new virions are formed by budding into the ERGIC lumen. Finally, the progeny virions are released from the host cell

Structure and function of the N protein

The N protein is a structurally heterogeneous, 419-amino acid-long, multi-domain RNA-binding protein (Fig. 1b and c) [69]. Like the N proteins of other coronaviruses, it has two conserved, independently folded domains, namely the N-terminal domain (NTD) and the C-terminal domain (CTD) (Fig. 1b) [21, 28]. These two domains are connected by an intrinsically disordered region (IDR) called the central linker region (LKR). The LKR includes a Ser/Arg (SR)-rich region, which contains putative phosphorylation sites [26, 28, 39, 70]. In addition, two further IDRs, called the N-arm and the C-tail, flank the NTD and CTD. The NTD is responsible for RNA binding, the CTD is responsible for RNA binding and dimerization, and the IDRs regulate the RNA-binding activity and oligomerization of the NTD and CTD [21, 28]. The following sections present the latest research progress on characterizing the structure and function of the N protein domains.

Structure and function of NTD

Several research teams have resolved the structure of the SARS-CoV-2 NTD [28, 38, 71, 72]. Its structure is very similar to that of the NTDs of other coronavirus N proteins [28, 71, 72]. The SARS-CoV-2 NTD takes the shape of a right-handed fist (Fig. 3a). It consists of a four-stranded antiparallel β-sheet core subdomain, which is located between loops or short 3₁₀ helices and a prominent β-hairpin region formed by the β2 and β3 strands outside the core (Fig. 3c). This large β-hairpin acts as a bridge connecting β2 and β3, protrudes from the core, and is highly flexible (PDB ID: 6YI3 and 7CDZ) [28, 71, 72]. The N protein has the ability to recognize and bind RNA [28, 72, 73]. In the NTD, the prominent β-hairpin is mainly composed of basic amino acid residues. Further analysis of the surface electrostatic potential shows that there is a positively charged pocket at the junction between the basic hairpin and the core structure that is considered to be an RNA-binding site, and this site is conserved among the coronaviruses that infect humans (Fig. 3b) [71]. By constructing an atomic model of the protein–RNA complex, Dinesh et al. [71] demonstrated that both dsRNA and ssRNA bind to the positively charged canyon between the basic β-hairpin and the core of the NTD in a similar way, and that the arginine residues R92, R107, and R149, which directly bind RNA, are located in this canyon [28, 71, 72].

Fig. 3

Structures of SARS-CoV-2 NTD and CTD. a Cartoon representation of SARS-CoV-2 NTD. b The electrostatic surface potential of SARS-CoV-2 NTD. Red and blue colours indicate negative and positive potential, respectively. The RNA-binding sites are highlighted in dotted circles and labelled. c Topology diagram for SARS-CoV-2 NTD; β represents the β-sheet and η represents the 3₁₀ helix. d Cartoon representation of SARS-CoV-2 CTD. e The electrostatic surface potential of SARS-CoV-2 CTD. Red and blue colours indicate negative and positive potential, respectively. f Topology diagram for SARS-CoV-2 CTD; α represents the α-helices, β represents the β-sheet and η represents the 3₁₀ helix

Although the overall structure of the SARS-CoV-2 NTD is very similar to that of other coronavirus NTDs, there are significant structural differences in several regions. First, compared with the available NTD structures of other CoVs that infect humans, the N-terminal loop of the SARS-CoV-2 NTD extends outward, while the circumferential core subdomain of the HCoV-OC43 and HCoV-NL63 NTDs rotates [28, 38, 69]. Second, the β-hairpin region protruding from the SARS-CoV-2 NTD is flexible, and the corresponding flexible loop is not even visible in the HCoV-OC43 or HCoV-NL63 NTD structures. Third, the residues that bind AMP differ. The N-terminal region of the SARS-CoV-2, SARS-CoV, and MERS-CoV NTDs is similar to that of HCoV-OC43 and contains conserved residues. The phenolic hydroxyl of Y124 interacts with the adenine ring of AMP through a hydrogen bond, while the backbone of G68 forms a hydrogen bond with the monophosphate group of AMP [28, 71, 72]. In HCoV-NL63, residues H77 and P24 lack the ability to form hydrogen bonds, which may further block the binding of AMP. Fourth, the electrostatic surface potential shows different charge distribution patterns in different coronavirus NTD structures, especially at the N-terminal loop, the top of the protruding region, and the bottom of the core subdomain [28]. Finally, the N-terminal tail residues (Asn48, Asn49, Thr50, and Ala51) of the SARS-CoV-2 NTD are more flexible and extend outward to a greater extent than their equivalent residues in the HCoV-OC43 NTD, and this may facilitate opening of the binding pocket to adequately fit the high-order structure of the viral RNA genome [38].

Structure and function of N-CTD

Similarly, the crystal structure of the SARS-CoV-2 CTD has been characterized by several research teams [17, 28, 72, 74]. Two CTD monomers usually combine to form a dimer (Fig. 3d) (PDB: 6YUN, 7CE0, and 6ZCO), which is diamond-shaped and tile-like, and each monomer consists of five α-helices, two 3₁₀ helices, and two β-strands (Fig. 3f) [17]. The β-hairpin from one protomer is inserted into the cavity of the other protomer, resulting in the formation of a four-stranded antiparallel β-sheet at the dimer interface. The β-sheet forms one face of the dimer, while the opposite face is formed by α-helices and loops. The extensive hydrogen bonding between the two hairpins and the hydrophobic interactions between the β-sheet and the α-helices make the dimer structure highly stable [17, 28]. The CTD structure of SARS-CoV-2 is highly similar to those of SARS-CoV, MERS-CoV, and HCoV-NL63. All of them show a conserved positively charged groove on the helical face of the N-CTD dimer, which is speculated to contain the RNA-binding site [17, 28, 72, 73, 75, 76]. In the SARS-CoV CTD, the putative RNA-binding site is located at residues 248–280 and is conserved [17, 75, 76]. The residues identified as RNA-binding sites in the SARS-CoV-2 CTD correspond to Arg319, Thr334, and Ala336 [34, 76]. Analysis of the electrostatic surface potential at these positions shows that the positively charged residues in these regions cluster at the edge of the basic groove, extending laterally relative to the dimer interface (Fig. 3e) [17, 28]. The positive charge in this region of the SARS-CoV-2 CTD is mainly due to several positively charged residues, including K256, K257, K261, and R262 [28, 72]. However, the electrostatic potential of the β-sheet face differs among these proteins [28].
The MERS-CoV structure shows a positively charged central region, while the SARS-CoV-2 and SARS-CoV structures both show a negatively charged region. For HCoV-NL63, the central region of its β-sheet face is highly negatively charged. These differences may affect the mode of RNA recognition and binding [28]. According to the proposed model of the coronavirus RNP complex, CTD protomers pack into a helical core around which the genomic ssRNA is wound, so that each CTD accommodates a seven-base stretch of single-stranded RNA in its positively charged groove [17, 64]. Microscale thermophoresis demonstrated that the CTD binds a 7-nucleotide fragment of the single-stranded SARS-CoV-2 RNA genome with micromolar affinity, and this fragment is accommodated by the basic groove of the CTD [17, 28]. In addition, it has been demonstrated that the CTD can self-associate to form oligomers (dimers, trimers, tetramers, or even octamers), and transient interactions between CTD dimers lead to the formation of higher-order oligomers. The degree of oligomerization depends on the protein concentration [17, 72, 75, 77,78,79,80]. Similarly, the high-resolution crystal structure of the CTD shows that it exists as a dimer in solution due to strand exchange resulting from close contact [28, 72]. Static light scattering and chemical crosslinking analyses showed that the SARS-CoV-2 CTD dimer is stable in solution, and the self-association of this domain plays an important role in the overall stability of the SARS-CoV-2 N protein [17, 81]. Most importantly, it was found that the C-terminal domain can also self-assemble and further mediate formation of the N protein tetramer [72, 82]. In addition, the SARS-CoV-2 CTD is crucial for liquid–liquid phase separation (LLPS) of the N protein and for its regulation of NF-κB [83]. The mechanism underlying LLPS is described in detail below.

Structure and function of LKR and arms of the N protein

α-Helices and β-sheets are traditionally recognized as important elements of a protein’s secondary structure, and intrinsically disordered regions (IDRs) are increasingly regarded as an important part of protein function [84,85,86,87,88]. An IDR lacks a fixed structure and is flexible and variable in form, which is why it can interact with biological macromolecules such as RNA, DNA, and proteins [85, 89, 90]. Previous studies have shown that there are three IDRs in the N protein of SARS-CoV, two of which are at the N-terminus and the C-terminus (N-arm and C-tail), whereas the third is in the central region (LKR) (Fig. 1b) [64, 73, 84]. Similarly, there are three IDRs in the N protein of SARS-CoV-2 [28, 72, 84, 87, 88]. The disordered C-tail of the N protein is thought to play a key role in the interaction with the viral M protein and the packaging signal [91]. Structural prediction of the C-tail of the SARS-CoV-2 N protein shows that this region can form a transient helix [87]. Cubuk et al. [82] reported the formation of a transient helix in the leucine-rich region using molecular dynamics simulation and suggested that it provides an interface for oligomerization. Using structure prediction tools, Zhao et al. [87] confirmed the existence of a helix spanning residues 215–235 and found evidence of its role in protein oligomerization and co-assembly with nucleic acid (NA). The potential role of the conserved leucine-rich sequence 218–231 comes from its position in the linker segment at 210–246, which has been found to be essential for RNA-mediated LLPS [87, 92]. In addition, hydrogen–deuterium exchange mass spectrometry analysis has shown that the conserved serine/arginine-rich linker region also has RNA-binding capacity [88]. At the N-terminal end of the disordered linker, the SR-rich region contains charged residues and groups that serve as phosphorylation sites [87].
Their phosphorylation status is thought to regulate the function of the N protein through its interactions with the viral nsp3 protein and with host proteins such as glycogen synthase kinase 3, CDK-1, and 14-3-3 proteins [88, 93,94,95,96]. The R203K/G204R mutation has been experimentally shown to enhance the ability of the N protein to form condensates, while R203M, which is commonly found in the Delta variant, can enhance virus replication [97, 98]. The most recently discovered mutation, G215C, is located in the linker between the SR-rich and leucine-rich regions. G215C enhanced dimer–dimer interactions under reducing conditions and raised the possibility of disulfide bond formation between different protomers [87]. In general, these regions are involved in interactions with viral RNA and proteins [21, 84, 87, 88, 99].
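
As a reading aid, the residue ranges quoted in this section can be written down as a small lookup, which makes it easy to check where a given substitution falls. The following is a minimal sketch only: the feature labels, the dictionary, and the helper names are illustrative assumptions, the ranges are taken verbatim from the text (1-based, inclusive), and the boundaries of the SR-rich region are deliberately left out because they are not stated here.

```python
# Residue ranges quoted in the text for the linker region (1-based, inclusive).
# The labels are illustrative, not standard annotation names.
LINKER_FEATURES = {
    "segment essential for RNA-mediated LLPS": (210, 246),
    "predicted helix": (215, 235),
    "conserved leucine-rich sequence": (218, 231),
}


def parse_substitution(label):
    """Split a label such as 'G215C' into (wild type, position, mutant)."""
    return label[0], int(label[1:-1]), label[-1]


def features_at(position):
    """Return the names of all listed ranges that contain the position."""
    return [
        name
        for name, (start, end) in LINKER_FEATURES.items()
        if start <= position <= end
    ]


if __name__ == "__main__":
    # Substitutions mentioned in the text above.
    for label in ("R203K", "G204R", "R203M", "G215C"):
        _, position, _ = parse_substitution(label)
        hits = features_at(position) or ["outside the listed ranges"]
        print(f"{label}: {', '.join(hits)}")
```

Under these assumptions, G215C falls inside the 210–246 segment and at the first residue of the predicted helix but before the leucine-rich sequence, which agrees with its description as lying between the SR-rich and leucine-rich regions.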

Biological function of SARS-CoV-2 N protein

The N protein is a core component of the SARS-CoV-2 virion [28]. It is mainly responsible for recognizing the viral RNA and wrapping it into a helically symmetric structure, and it plays multiple important roles in the life cycle of the coronavirus [21, 100, 101]. It binds to viral genomic RNA to form the ribonucleoprotein (RNP) complex. In addition to assembly, the N protein has other functions, including roles in viral mRNA transcription and replication, cytoskeleton organization, and immune regulation [21, 28, 102,103,104]. In particular, the N protein has been found to counteract host RNAi-mediated antiviral responses through its RNA-binding activity, acting as a viral suppressor of RNA silencing [56]. In addition, the N protein can induce humoral and cellular immune responses after infection, making it a key target for the development of diagnostics and vaccines [28, 105, 106]. The N protein usually exists as a stable dimer [87]. The NTD, the CTD, and the linker all contribute to the binding of nucleic acid (NA). NA binding induces a more ordered conformation that allows dimer–dimer interactions, which in turn organize the N protein into a scaffold on the NA, resulting in co-assembly into polymers [87, 106]. Recent studies have shown that the N protein can undergo liquid–liquid phase separation (LLPS) [101, 107,108,109]. NA binding also promotes LLPS, and the high concentration of N protein and NA in co-condensates allows the formation of ribonucleoprotein particles [82, 92, 110]. The N protein also interacts with the SARS-CoV-2 membrane (M) protein, which seems to play a role in promoting N protein aggregation, anchoring ribonucleoprotein particles to the viral membrane, and recognizing viral RNA [91, 92]. A recent study found that the SARS-CoV-2 N protein binds to mannan-binding lectin (MBL)-associated serine protease 2 (MASP-2), leading to complement overactivation and the aggravation of inflammatory lung injury [111, 112].
In addition, Oh and Shin [113] identified a role for the N protein in regulating antiviral immunity. N protein overexpression reduces retinoic acid-inducible gene-I (RIG-I)-like receptor-mediated interferon production and interferon-induced gene expression. The N protein inhibits the interaction between tripartite motif protein 25 (TRIM25) and RIG-I [113]. In addition, the N protein inhibits polyinosinic:polycytidylic acid-mediated interferon signal transduction at the level of TANK-binding kinase 1 (TBK1) by interfering with the binding of TBK1 to interferon regulatory factor 3 (IRF3), thus preventing nuclear translocation of IRF3 [113]. Another study showed that the 11S proteasome activator PA28γ can regulate the intracellular abundance of the N protein [39]. Immunoprecipitation has shown that PA28γ is a nucleocapsid-binding protein, and PA28γ binding plays an important role in regulating 20S proteasome activity, which in turn regulates the level of the SARS-CoV-2 nucleocapsid protein [114].

N protein post-translational modifications

The N protein is post-translationally modified, and studying these modifications is important for developing potential medical applications based on the N protein [20]. Post-translational modifications of the N protein are shown in Fig. 4. First, phosphorylation plays an important role in regulating RNA binding and changing the physicochemical properties of the N protein [88]. In the early stage of infection, the SR region is rapidly phosphorylated by cytoplasmic kinases at multiple sites [93, 95, 115, 116]. Phosphorylation leads to binding of the RNA helicase DDX1, which promotes the RNA structural changes needed for transcription of long subgenomic RNAs in the RTC [117]. Multivalent RNA–protein and protein–protein interactions based on unmodified N protein lead to the formation of partially ordered gel-like aggregates and discrete particles [93]. Phosphorylation of the C-terminal region disrupts these interactions, and the phosphorylated protein forms liquid-like droplets for viral genome processing [93]. In the late stage of infection, nucleocapsid formation and virus assembly do not seem to depend on N protein phosphorylation, which is significantly reduced in the nucleocapsids of MHV and SARS-CoV [88, 93, 117]. N proteins form structured oligomers suitable for nucleocapsid assembly. Wu et al. [88] showed that mutations of S176, S188, and S206 in the SR motifs lead to reduced RNA binding and a transition in protein–RNA populations with different solution properties. In addition, the highly phosphorylated state of the SR domain can regulate the interactions of the N protein with the viral nsp3 protein and with host proteins such as glycogen synthase kinase 3, CDK-1, and 14-3-3 proteins [20, 82, 87, 95, 115, 116, 118,119,120,121,122,123]. The phosphorylated N protein binds to the 14-3-3 protein in the host cytoplasm, which regulates nucleocytoplasmic shuttling of the N protein [66, 96].
The phosphorylation of residues in the serine/arginine-rich region of the LKR regulates discontinuous transcription, especially for shorter subgenomic mRNAs that are closer to the 3' end, in the early stage of replication [88, 99, 117].

Fig. 4

Domain structure and post-translational modification of the SARS-CoV-2 N protein

Second, the N protein is modified by methylation. Cai et al. [124] demonstrated that PRMT1 methylates the R95 and R177 residues in the RGG/RG motifs of the N protein, and that this methylation regulates the binding of the N protein to the 5'-UTR of its genomic RNA. The methylation of R95 regulates the ability of the N protein to inhibit the formation of stress granules (SGs). Arginine methylation affects nearby phosphorylation sites, and the two modifications are usually antagonistic [124, 125]. The methylation of N protein residues R95 and R177 is required for RNA binding [124]. Since S176, S180, S183, and S184 are phosphorylated by SRPK1, GSK3, and cyclin-dependent kinase 1, there is likely to be an interplay between phosphorylation and methylation, especially in the vicinity of R177, in regulating binding to the 5'-UTR of SARS-CoV-2 RNA [99, 117, 124, 126].

Finally, the N protein can undergo glycosylation and acetylation. Positions 48 and 270 of the N protein are N-glycosylation sites [127]. The Lys375 site is acetylated by host acetyltransferase, and acetylation-mimicking mutations occur frequently at this site, all of which adversely affect liquid–liquid phase separation of the N protein and RNA [128]. In addition, the post-translational modification of the N protein is also related to its production pathway [20]. MS/MS and 18O labeling experiments showed that N47 and N269 of the N protein are modified by N-glycosylation when it is expressed in HEK293 cells. Phosphorylation of the T393 site was also observed. On the other hand, the native N protein produced in the cell exhibits O-phosphorylation at Ser176 but not glycosylation [20].
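
The modification sites named in this section can be collected into a simple table to make the proposed phosphorylation–methylation crosstalk around R177 easier to see. The sketch below is illustrative only: the list layout, the function names, and the 10-residue window are assumptions chosen for the example, not values from the cited studies. Only sites given with a residue identity in the text are included, so the N-glycosylation sites reported as positions 48 and 270 [127] are left out, as are S188 and S206, which the text describes only as mutated serines.

```python
from collections import defaultdict

# (residue, position, modification) for sites named in the text above.
PTM_SITES = [
    ("N", 47, "N-glycosylation"),
    ("R", 95, "methylation"),
    ("S", 176, "phosphorylation"),
    ("R", 177, "methylation"),
    ("S", 180, "phosphorylation"),
    ("S", 183, "phosphorylation"),
    ("S", 184, "phosphorylation"),
    ("N", 269, "N-glycosylation"),
    ("K", 375, "acetylation"),
    ("T", 393, "phosphorylation"),
]


def sites_by_modification(sites):
    """Group site labels such as 'S176' by modification type."""
    grouped = defaultdict(list)
    for residue, position, modification in sites:
        grouped[modification].append(f"{residue}{position}")
    return dict(grouped)


def neighbours(position, window, sites=PTM_SITES):
    """List other modified sites within `window` residues of `position`."""
    return [
        f"{residue}{pos}"
        for residue, pos, _ in sites
        if pos != position and abs(pos - position) <= window
    ]


if __name__ == "__main__":
    print(sites_by_modification(PTM_SITES))
    # Hypothetical 10-residue window around the methylated R177.
    print(neighbours(177, window=10))
```

With this arbitrary window, the only modified residues near R177 are the four phosphoserines S176, S180, S183, and S184, which is the neighbourhood in which the text suggests the two modifications may influence each other.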

Formation mechanism and function of liquid–liquid phase separation (LLPS)

Many RNA-binding proteins, especially those with a high proportion of intrinsically disordered regions, participate in liquid–liquid phase separation (LLPS) [97, 129,130,131,132]. Protein LLPS is a physicochemical phenomenon that is considered to be the key mechanism for organizing macromolecules, such as proteins and nucleic acids, into membraneless organelles [97, 133]. These membraneless compartments are dynamically assembled by LLPS and give cells the important ability to initiate biological functions or respond to a range of stresses [134,135,136,137]. After RNA virus infection, LLPS mediates the formation of stress granules and P-bodies. These structures play an important role in antiviral immunity by inhibiting the translation of viral mRNA and promoting RNA degradation [129,130,131]. LLPS is also considered to be key to virus assembly [97, 138, 139]. A key step in coronavirus replication is the association of the N protein with viral genomic RNA, which then condenses into a higher-order RNA–protein complex, thus initiating the assembly of virions [97, 140]. To date, phase separation has been invoked or suggested in many viral contexts [82, 141,142,143,144]. LLPS is involved in the interaction between the SARS-CoV-2 N protein–viral RNA complex and other viral proteins, such as nsp12 [82, 92, 101, 107,108,109]. Confocal fluorescence microscopy has shown that the N protein readily self-associates into many micron-sized spherical condensates, which fuse and coalesce into larger condensates on contact, verifying the liquid character of the N protein condensates [101]. Fluorescence recovery after photobleaching was used to study the internal molecular dynamics of N protein condensates [101]. It showed that N proteins can diffuse partly freely within the condensed phase, which is consistent with liquid-like behavior.
In addition, the condensation of the N protein is very sensitive to increases in ionic strength, which indicates that electrostatic interactions are very important for its condensation. The N protein can also undergo LLPS in cells and exhibit liquid-like behavior [101]. Matsuo [30] summarized the formation mechanism of LLPS: RNA binding triggers LLPS of the N protein [97]. The N protein readily binds RNA, and the two co-separate efficiently at physiological salt concentration [101, 145, 146]. These droplets are formed by electrostatic interactions between the positively charged N protein and negatively charged RNA [30]. The shape of the droplets depends on the ratio of N protein to RNA concentration [107, 109], and their size is affected by the length of the RNA used [109]. The phase separation behavior also depends on the pH and salt concentration [118]. ATP can biphasically regulate LLPS by specifically binding to Arg/Lys residues in the intrinsically disordered domains (IDDs): LLPS is induced at low ATP concentrations but dissolved at high concentrations [118, 147, 148]. In addition, each domain of the SARS-CoV-2 N protein contributes to its phase separation [101]. Nsp12 cannot phase separate spontaneously, even when mixed with viral RNA, and it is easily converted into amorphous aggregates with poor dynamics. However, nsp12 can be readily recruited into N protein–RNA condensates without changing their shape or arrangement [101]. Similar results were obtained with RdRp complexes and ubiquitin-like domain 1 (Ubl1), which suggests that N protein-driven LLPS may play an important role in the life cycle of SARS-CoV-2 [101, 102, 149].

LLPS also mediates the interaction between N protein/RNA and the host. By binding to viral RNA, the N protein undergoes liquid–liquid phase separation and forms functional membraneless organelles that recruit TAK1 and IKK complexes, thus promoting the activation of NF-κB. Consistently, 1,6-hexanediol, an inhibitor of LLPS, attenuates SARS-CoV-2-induced NF-κB activation. LLPS of N protein/RNA thus contributes to virus-induced inflammation [83]. In cells, the N protein forms condensates and recruits the stress granule protein G3BP1, highlighting the potential role of the N protein in sequestering G3BP1 and inhibiting stress granules [92]. The phase separation of the N protein helps to inhibit the G3BP1-dependent host immune response and to package genomic RNA during virion assembly [92].

Vaccine research and development

The N protein, spike protein, and membrane protein have been used in vaccine development due to their high immunogenicity [18, 30, 150,151,152,153,154,155,156,157]. Nucleocapsid-specific antibodies can improve protection against SARS-CoV-2 [158]. Gao et al. [159] used high-affinity glycan ligand-decorated glyconanoparticles to develop a universal SARS-CoV-2 vaccine (TCCSia-Ace-Dex-N-Rd vaccine). This vaccine carries the SARS-CoV-2 nucleocapsid protein (N) and can trigger strong N-specific CTL responses against target cells infected with SARS-CoV-2 and its variants of concern. Thura et al. [160] demonstrated novel vaccine candidates against SARS-CoV-2 by using the whole conserved N protein or its fragments/peptides. High titers of specific anti-N antibodies were maintained for a reasonably long duration, suggesting that the N protein is an excellent immunogen that stimulates the host immune system and enhances B-cell activation. Purified inactivated virus vaccines manufactured by Sinovac and Sinopharm, such as CoronaVac and BBIBP-CorV, are options because they integrate not only the S protein but also other viral proteins, including matrix (M), envelope (E), and nucleocapsid (N) proteins [161]. Studies have shown that in populations immunized with inactivated virus in which the seropositivity rate is low, booster vaccination significantly improves immunogenicity [162, 163]. Appelberg et al. [164] designed a universal SARS-CoV-2 DNA vaccine containing receptor-binding domain loops from the huCoV-19/WH01, Alpha, and Beta variants, combined with the membrane and nucleoproteins. The vaccine induced cross-reactive spike antibodies that neutralized huCoV-19/WH01, Beta, Delta, and Omicron virus in vitro, and primed nucleoprotein-specific T cells. Priming of cross-reactive nucleoprotein-specific T cells alone was 60% protective [164].
Similarly, other teams have designed and developed multiple recombinant vaccines based on nucleoproteins [165,166,167,168]. The results showed that recombinant vaccines could significantly increase the levels of serum neutralizing antibody and total immunoglobulin [165] and induce strong specific lymphocyte proliferative and T cell responses [165]. The level of antigen-specific interferon-γ in splenocytes of immunized mice also increased [166, 167]. Recently published reviews provide a good summary of the role of the SARS-CoV-2 nucleocapsid protein in antiviral immunity and vaccine development [169, 170].

In addition, the N protein gene has attracted much attention as a potential drug target because it is more conserved, is more stable, and has fewer mutations than the genes of other viral proteins, such as the spike protein [171]. Drug development targeting the nucleocapsid protein is mainly based on its structure, function, and role in the viral life cycle [30]. Since the RNA-binding activity of the N protein is very important for viral RNP formation and genome replication, the development of drugs that block RNA binding of the NTD or CTD has been proven to be a good antiviral strategy [28]. In recent studies, molecular docking and molecular dynamics simulation were used to identify potential antiviral drugs and to study the stability of NTD–drug complexes [30, 172]. Repurposing existing drugs allows new outbreaks of disease to be treated in a very short time and at a low cost. Accordingly, 34 drugs that have been approved or are under development were studied [30]. The results showed that rapamycin had the best binding affinity for the NTD, and other compounds, such as saracatinib, camostat, trimetini and nafamostat, also showed high binding affinity and high stability [30]. Another study focused on compounds from medicinal plants. Five compounds were selected from 100 plant compounds: aloe-emodin, anthrarine, alizarine, dantron, and emodin [173,174,175,176]. Previous studies have shown that the compound PJ34, which targets the ribonucleotide-binding site in the NTD, can effectively inhibit the RNA-binding activity of the HCoV-OC43 N protein and inhibit viral replication [149]. By comparing the binding sites of PJ34 in the SARS-CoV-2 NTD structure with those in the HCoV-OC43 NTD structure, it was found that the key residues involved in the interaction are conserved [28, 149]. Dhankhar et al. [177] identified three small molecules with a conformation similar to that of guanosine monophosphate (GMP) at the active site of the NTD; these molecules exhibited high binding affinity and stable binding through the formation of hydrogen bonds with Arg107, Tyr111, and Arg149 of the N-terminal domain.

Second, we should consider screening for inhibitors that block normal N protein oligomerization so as to prevent RNP formation or to induce abnormal aggregation. Recently, a new inhibitor, 5-benzyloxygramine (P3), was discovered by virtual screening [178]. This compound can mediate non-native dimerization of the MERS-CoV N protein NTD and induce N protein aggregation, and it has been shown to have strong antiviral activity against MERS-CoV. By comparing the binding cavity of P3 with the corresponding parts of the SARS-CoV-2 N-NTD structure, it was found that almost all the residues involved in the interaction are conserved [28, 178].

Third, many conserved coronavirus proteins, especially N proteins, need to be phosphorylated to be fully functional. Accordingly, Yaron et al. [179] identified, through screening, alectinib, an FDA-approved kinase inhibitor that can inhibit SRPK1/2-mediated N protein phosphorylation and restrict SARS-CoV-2 replication. It has been reported that the phosphorylated N protein dimer binds directly to the dimeric 14-3-3 protein in a phosphorylation-dependent manner. The relatively tight 14-3-3/N protein binding can regulate nucleocytoplasmic shuttling and other functions of the N protein by occluding the SR-rich region, and it can hijack cellular pathways by sequestering 14-3-3 proteins [96]. Therefore, this complex may be a valuable target for therapeutic intervention. GSK-3 inhibitors block N protein phosphorylation and reduce the accumulation of viral RNA in cells, so targeting GSK-3 may provide a new strategy for addressing COVID-19 [96, 123].

The final antiviral therapy strategy involves indirect targeting of N protein regulation by inhibiting host cell kinases [180]. The site-specific phosphorylation of the SR domain by host cell kinases suggests that the N protein hijacks these kinases for spatiotemporal regulation during the virus life cycle. SRPK1 seems to play an important role in the replication of many different viruses, and either inhibition or activation of SRPK1 may be beneficial to viruses [180,181,182,183]. Accordingly, either inhibition or overactivation of SRPK1 alone may be an effective antiviral strategy [182]. In addition, in cell culture models, the inhibition of SRPK1/2 has been shown to inhibit virus replication [182,183,184]. SRPK1/2 inhibitors that have been shown to reduce SARS-CoV-2 replication also continue to be developed as potential cancer drugs. Feasible strategies to interfere with SARS-CoV-2 replication and transmission therefore include the use of the FDA-approved kinase inhibitor alectinib, which strongly interacts with and inhibits SRPK1; the repurposing of SRPK1/2 or GSK-3 inhibitors; or the development of new inhibitors [180, 185].

Furthermore, the formation and regulation of biomolecular condensates may be an important activity of the N protein that is essential for SARS-CoV-2 [180]. Cascarina and Ross [180] suggested that the N protein may use its ability to form or bind biomolecular condensates to disrupt stress granules, enhance viral replication or viral protein translation, and package the viral RNA genome into new virions. Considering the important role of the N protein in many stages of the virus life cycle, regulating the N protein by targeting host cell kinases or membraneless organelles may be a feasible strategy to combat existing SARS-CoV-2 infection [180].

Diagnostic technology development

In addition to being a potential therapeutic or vaccine target, the N protein can also be used as an important diagnostic marker of COVID-19 [20, 24, 25, 186]. The presence of N protein peptides in gargles and nasopharyngeal swabs can be used for the immediate high-throughput detection of SARS-CoV-2 [20]. Fabiani et al. [187] used electrochemical immunosensors to detect SARS-CoV-2 S and N proteins in saliva at concentrations as low as 19 and 8 ng/mL, respectively. Cai et al. [11] developed an ultrasensitive, rapid, duplex digital enzyme-linked immunosorbent assay (dELISA) based on a single-molecule array for the simultaneous detection of the spike protein (S-RBD) and the N protein. The assay shows ultrahigh sensitivity and a high signal-to-noise ratio, which help to improve the accuracy of COVID-19 diagnosis [11]. The detection of SARS-CoV-2 peptides through tandem mass spectrometry can be used as an alternative to polymerase chain reaction (PCR) and immunodiagnosis [20]. Clinical studies have shown that the peptide 41RPQGLPNNTASWFTALTQHGK61 in the N protein can be detected in the saliva of patients with COVID-19 [24]. Another study showed that strong signals for the N protein peptides 375ADETQALPQR385 and 170GYAQGSR177 can be rapidly detected in nasopharyngeal samples of patients with COVID-19 [20, 25].

Detection methods based on the serum level of the N protein help to accurately distinguish PCR-positive patients with COVID-19 from healthy, uninfected individuals [188,189,190,191,192,193,194,195,196]. Li et al. [188, 197] detected the N protein in the serum of COVID-19 patients with a sensitivity of 92% and a specificity of 97%. In another study, S and N proteins were detected in the plasma of patients with COVID-19 at concentrations ranging from 8 to 20,000 pg/mL and 0.8 to 1700 pg/mL, respectively [198]. Tan et al. [199] developed a microfluidic chemiluminescence ELISA platform that can detect S and N proteins in tenfold diluted serum within 40 min. Haljasmägi et al. [189] developed sensitive and specific luciferase immunoprecipitation system (LIPS) assays for different SARS-CoV-2 antigens. The LIPS detection of S and N antigen fragments may provide useful information about the immune response of COVID-19 patients with different clinical courses [189, 190, 193]. Torrente-Rodriguez et al. [200] reported a multiplex electrochemical immunoassay for the detection of the N protein and of S protein IgG and IgM in 100-fold diluted serum samples. These studies have shown that the quantitative measurement of SARS-CoV-2 antigens, such as N and S proteins, in serum/plasma can be used for the accurate early detection of COVID-19 [188, 201].
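For readers less familiar with how the sensitivity and specificity figures quoted above are defined, the following minimal Python sketch computes both from a confusion matrix, taking PCR status as the reference standard. The counts are hypothetical and were chosen only so that the output matches the 92% and 97% values cited in the text; they are not the cohort sizes of the cited study, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-positive (PCR-positive) samples the assay flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative (uninfected) samples the assay flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT data from the cited studies):
# 100 PCR-positive sera, of which 92 test positive for serum N protein;
# 100 PCR-negative sera, of which 97 test negative.
tp, fn = 92, 8
tn, fp = 97, 3

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Sensitivity depends only on the infected group and specificity only on the uninfected group, so neither value changes with disease prevalence in the tested population, whereas predictive values do.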

Conclusion

The outbreak of COVID-19 at the end of 2019 has had wide-ranging medical, social, political, and financial implications. The morbidity and mortality of COVID-19 have far exceeded those of seasonal influenza and other diseases. To curb the rapid spread of SARS-CoV-2 around the world, great efforts have been made to discover effective methods for diagnosis and treatment. A detailed understanding of the molecular events in the life cycle of SARS-CoV-2 and their underlying mechanisms, including virus replication and assembly, is therefore urgently needed. Here, we summarize the progress in research on the SARS-CoV-2 N protein. It is important to deepen our understanding of the structure and function of the SARS-CoV-2 N protein, its role in the life cycle of the virus, and its potential in vaccine and drug development.

In addition, the molecular mechanisms involved in the SARS-CoV-2 life cycle after the virus invades cells are still not fully understood. In particular, SARS-CoV-2 has been constantly mutating and evolving new adaptation mechanisms, such as the "escape mechanism". More in-depth research is needed to discern the details. New coronavirus vaccines and diagnostic techniques robust to virus mutation have been developed.

Finally, specific drugs and treatments for COVID-19 are not yet widely available; accurate diagnosis and a series of prevention and control measures are still the most effective means to prevent disease spread. Therefore, everyone must remain vigilant, on the one hand, by improving their physical fitness to strengthen their own immunity and, on the other hand, by acquiring scientific knowledge of self-protection to prevent infection, in addition to establishing a routine for COVID-19 treatment in the case of future infections.