Introduction

Hepatitis E virus (HEV) infection is currently regarded as a leading cause of acute viral hepatitis worldwide. It is endemic in developing countries and is increasingly recognized in industrialized countries, with an estimated 20 million HEV infections and 70,000 deaths annually [Sayed and Meuleman, 2019; Webb and Dalton, 2019].

HEV is present in feces and bile as a non-enveloped particle of 32–34 nm, whereas in circulating blood and culture supernatants it is found as a lipid-associated, membrane-wrapped particle of ~40 nm known as "quasi-enveloped" HEV [Ji et al., 2021].

HEV is a member of the family Hepeviridae, subfamily Orthohepevirinae [Smith et al., 2014], and has a single-stranded positive-sense RNA genome (RNA ss+) of approximately 7.2 kb [Kumar et al., 2013]. Phylogenetic analysis divides HEV strains into eight distinct genotypes (HEV1-HEV8) and multiple subtypes [Smith et al., 2014]. The anthroponotic genotypes HEV1 and HEV2 are known to cause large waterborne epidemics in developing countries [Kumar et al., 2013], whereas HEV3 and HEV4 are zoonotic and have recently become a growing public-health concern in developed countries. Infections with these zoonotic genotypes have been associated with the consumption of raw or undercooked food, particularly pork and liver sausages [Colson and Decoster, 2019].

Acute HEV infection is usually self-limiting in the general population, with a case fatality rate below 4%. In pregnant women infected with HEV1, however, this rate may reach 20%, as infection can progress to fulminant liver failure [Chandra et al., 2008b; Kumar et al., 2013].

Chronic HEV infection can lead to liver failure in some cases, or to acute-on-chronic liver failure in patients with pre-existing liver disease [Horvatits et al., 2019]. Chronic and acute-on-chronic hepatitis E have recently been recognized as clinical manifestations of major concern in HEV3-infected liver and kidney transplant recipients, as well as in immunocompromised patients with HIV infection, lymphoma, or leukemia [Fang and Han, 2017; Gerolami et al., 2008; Pischke et al., 2010]. Furthermore, chronic hepatitis E is frequently associated with severe extrahepatic manifestations [Narayanan et al., 2019]. Extrahepatic manifestations, such as neurological disorders, have also been reported in immunocompetent individuals with acute HEV infection [Mendoza-Lopez et al., 2020; Wu et al., 2021].

The HEV genome has a 7-methylguanosine cap (m7G) at the 5’ end and a polyA tail at the 3’ end, and it contains three partially overlapping open reading frames (ORFs) [Kumar et al., 2013].

The HEV Sar55 strain (GenBank accession number AF444002) was used as a reference for the nucleotide and amino acid positions mentioned throughout the following text.

Briefly, ORF1 encodes a non-structural polyprotein, ORF2 encodes the capsid protein, and ORF3 encodes a multifunctional protein. Two short untranslated regions (NCRs) of 27 and 68 nt have also been described at the 5’ and 3’ ends, respectively [Nan and Zhang, 2016]. In addition, the viral genome contains four cis-reactive elements (CREs). One overlaps the 3’ end of ORF2 and the 3’ NCR and plays an essential role in viral replication by binding the RNA-dependent RNA polymerase (RdRp). A second is located in the intergenic region between ORF1 and ORF3 and forms a stem-loop structure that has been suggested to be the initiation site for synthesis of a 2.2-kb capped, bicistronic subgenomic mRNA [Cao et al., 2010; Parvez, 2015a]. The remaining two highly conserved CREs are located at the start of ORF1 and at the end of ORF2 [Ju et al., 2020].

The bicistronic mRNA encodes the ORF2 and ORF3 proteins; ORF3 substantially overlaps the 5’ region of ORF2 in a different reading frame [Graff et al., 2006].

In addition, HEV1 strains contain a further ORF (ORF4, nt 2835–3308), whose translation is initiated from an internal ribosome entry site (IRES)-like element (nt 2701–2787) [Nair et al., 2016]. The short-lived ORF4 protein (20 kDa) is expressed under endoplasmic reticulum (ER) stress conditions, and its expression is likely induced by the antiviral response of the host. The N-terminal region of the ORF4 product interacts with multiple viral and host proteins to assemble a replication complex with RdRp, helicase (Hel), and eukaryotic elongation factor 1 isoform-1 (eEF1α1), thereby stimulating RdRp activity [Nair et al., 2016].

Here, we present a thorough update on the main structural characteristics of the HEV proteins (Figs. 1–4) and their functions, discussing current issues and proposing potential experimental strategies.

Fig. 1

Schematic representation of ORF1 of HEV. The HEV genome is approximately 7.2 kb in length, with a methyl guanosine cap (Cap) at the 5’ end and a polyA tail at the 3’ end, and contains two untranslated regions (NCRs) at the 5’ and 3’ ends. HEV contains three partially overlapping open reading frames (ORFs). ORF1 includes eight putative domains: methyltransferase (MT), Y domain (Y), papain-like cysteine protease (PCP), hypervariable region (HVR), proline-rich region (PRO), X domain (X), helicase (HEL), and RNA-dependent RNA polymerase (RdRp). Four cis-reactive elements (CRE) with a stem-loop structure (SL) are indicated, the second of which is located in the junction region (JR). The black lightning bolt symbol represents the cleavage site for the cellular serine protease factor Xa, and the red lightning bolt symbols represent cleavage sites for the serine protease thrombin in ORF1. “MB” indicates the membrane-binding site in the MT-Y iceberg region. The nucleotide and amino acid positions are according to HEV strain Sar55 (GenBank accession number AF444002).

Fig. 2

Summary of ORF2 structure and characteristics. (A) ORF2 contains a signal peptide (SP) and three domains: the shell domain (S), middle domain (M), and protruding domain (P). The black triangles indicate the glycosylation sites in ORF2 (N137, N310, and N562), and the black star indicates the proline-rich hinge between the M and P domains. The secondary structure of the HEV3 capsid protein (PDB ID 2ZTN) is displayed. S, M, and P domains are shown in blue, green, and black, respectively. α-Helices, β-strands, and loops are represented by yellow rectangles, pink arrows, and thick grey lines, respectively. Red dashed lines indicate the disordered regions. The nucleotide and amino acid positions are according to HEV strain Sar55 (GenBank accession number AF444002). (B) 3D structure of the capsid protein (HEV3 PDB ID 2ZTN). N-linked glycosylation sites (N137 and N310 in the S domain and N562 in the P domain) are indicated in red.

Fig. 3

Major features and motifs of ORF3. The two hydrophobic domains (D1 and D2) and the two proline-rich domains (P1 and P2) are shown.

Fig. 4

Novel ORF4, present only in HEV1. The IRES-like element (nt 2701–2787) and ORF4 protein (overlapping ORF1), with its putative ubiquitination site, are shown.

HEV viral proteins

HEV belongs to the alphavirus-like supergroup III of RNA ss+ viruses, whose members encode a type 1 helicase (Hel), a type 1 methyltransferase (MT) domain, and a type 3 RdRp domain. They also produce capped genomic RNAs and subgenomic RNAs encoding the structural proteins, and have a polyA tail [van der Heijden and Bol, 2002]. Animal viruses from this supergroup include members of the genera Alphavirus and Rubivirus and of the subfamily Orthohepevirinae, belonging to the families Togaviridae, Matonaviridae, and Hepeviridae, respectively. Although HEV can circulate as a "quasi-enveloped" particle, it lacks surface glycoproteins, which distinguishes it from enveloped animal alpha-like viruses [van der Heijden and Bol, 2002; Yin et al., 2016]. Furthermore, members of these viral families share a conserved X domain of unknown function [Koonin and Dolja, 1993]. Interestingly, alphaviruses have a genome organization similar to that of plant viruses of the genera Tobamovirus, Tobravirus, Hordeivirus, and Furovirus. However, their structural proteins have at least three different origins, resulting in viruses with very divergent structures [Strauss and Strauss, 1994].

ORF1

ORF1 is the largest open reading frame in the HEV genome. Spanning 5082 nucleotides (nt), it encodes a 1693-amino-acid (aa) polyprotein containing eight putative functional domains. From the N-terminal to the C-terminal end, these are MT, Y domain (Y), papain-like cysteine protease (PCP), hypervariable region (HVR), proline-rich region (Pro), X domain (X), Hel domain, and RdRp [Nan and Zhang, 2016].
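The nucleotide and amino acid coordinates used in this review can be cross-checked with simple codon arithmetic. The sketch below assumes, based on the 27-nt 5’ NCR given earlier, that ORF1 starts at nt 28 of Sar55 and that the stated 5082 nt include the stop codon; the helper names are illustrative only.

```python
# Codon arithmetic for HEV ORF1 (Sar55 numbering).
# Assumption: ORF1 starts right after the 27-nt 5' NCR, i.e. at nt 28,
# and the stated ORF1 length of 5082 nt includes the stop codon.
ORF1_START_NT = 28
ORF1_LENGTH_NT = 5082


def orf1_codon_count(length_nt: int) -> int:
    """Number of amino acids encoded by an ORF of length_nt, stop codon excluded."""
    if length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return length_nt // 3 - 1


def nt_to_orf1_aa(nt: int, orf_start: int = ORF1_START_NT) -> int:
    """1-based ORF1 amino acid position that contains genome nucleotide nt."""
    if nt < orf_start:
        raise ValueError("position lies upstream of ORF1")
    return (nt - orf_start) // 3 + 1


def frame_offset(start_nt: int, orf_start: int = ORF1_START_NT) -> int:
    """0 if start_nt lies in the ORF1 reading frame, otherwise 1 or 2."""
    return (start_nt - orf_start) % 3


print(orf1_codon_count(ORF1_LENGTH_NT))          # 1693
print(nt_to_orf1_aa(2011), nt_to_orf1_aa(2325))  # 662 766
print(frame_offset(2835))                         # 2
```

Under these assumptions the 1693-aa length, the HVR boundaries cited below (nt 2011–2325, aa 662–766), and an ORF4 start codon (nt 2835) lying outside the ORF1 frame are mutually consistent.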

The non-structural ORF1 protein of HEV shares the most sequence similarity with rubi-like viruses of the genera Rubivirus, Betatetravirus, Benyvirus, and Omegatetravirus as well as Sclerotinia sclerotiorum debilitation-associated virus [Batts et al., 2011; Liu et al., 2009].

Localization studies of the ORF1-encoded protein in human cell lines have shown that it associates with intracellular membranes in the perinuclear region, particularly the ER and the ER-Golgi intermediate compartment [Perttilä et al., 2013].

Whether the ORF1 polyprotein needs to be further processed into single domains or can function as a single unit is still unclear, as contradictory results have been obtained [Paliwal et al., 2014; Parvez, 2013; Perttilä et al., 2013; Suppiah et al., 2011].

So far, although several studies have demonstrated proteolytic processing of the ORF1 polyprotein in vaccinia virus, baculovirus, and eukaryotic expression systems, it is not yet clear whether this processing is mediated by viral or host proteases [Sehgal et al., 2006]. It has been hypothesized that the HEV ORF1 polyprotein is cleaved by the cellular proteases factor Xa (in the PCP domain) and thrombin (in the X domain) [Palta et al., 2014], and this processing seems to be essential for viral replication [Kanade et al., 2018]. These serine proteases are involved in the blood coagulation cascade [Palta et al., 2014] and are synthesized in the liver as inactive precursors (factor X and prothrombin, respectively); in plasma, activated factor X (factor Xa) cleaves prothrombin into active thrombin [Wood et al., 2011]. In contrast, ORF1 expression in cell-free and prokaryotic systems does not result in polyprotein processing [Ansari et al., 2000].

Recently, to study the polyprotein processing of HEV ORF1, a novel BacMam strategy was employed in which a complete HEV3 genome (GenBank accession number AY575859) was cloned into a BacMam vector. Huh7 cells were transduced with the recombinant baculovirus carrying the HEV genome, and fragments of 18, 35, 37, and 56 kDa were detected, corresponding to PCP, MT, RdRp, and ORF2, respectively, suggesting that proteolytic processing had occurred. MT activity was also confirmed [Kumar et al., 2020].

Methyltransferase domain

Early studies revealed the expression of a 110-kDa HEV protein (P110) and an 80-kDa putative proteolytic product in insect cells [Nan and Zhang, 2016]. These proteins were shown to participate in the synthesis of the 5’ cap of the viral RNA through their guanine-7-methyltransferase and guanylyltransferase (GT) activities [Nan and Zhang, 2016], which are essential for HEV infectivity [Emerson et al., 2004]. The MTs of the alphavirus supergroup use an unconventional capping pathway [Decroly et al., 2011] in which the P110 polyprotein product first methylates GTP to produce m7GTP, then forms a covalent enzyme-m7GMP intermediate with release of pyrophosphate, and finally transfers m7GMP to the 5’ end of the RNA [Magden et al., 2001]. In alphavirus-like togaviruses, the methyl group is donated by S-adenosyl-L-methionine [Decroly et al., 2011]. In addition, HEV P110 was found to be tightly membrane-bound, behaving like an integral membrane protein, although it lacks the stretches of nonpolar amino acids typical of transmembrane segments [Magden et al., 2001]. In silico predictive approaches have also identified a putative Zn2+ finger domain around aa 73–94 [Karpe and Lole, 2011].

Sequence analysis of the MT “alto” group within the alphavirus supergroup (alphaviruses, orthohepeviruses, piscihepeviruses, tricornaviruses, tobamoviruses, tobraviruses, and hordeiviruses) has shown that these MTs have a core region of 200 aa composed of nine interspersed α-helices and β-strands (αA to αE and βA to βD), followed by three β-strands (βE to βG) [Ahola and Karlin, 2015]. The HEV MT domain resembles those of alfalfa mosaic virus, brome mosaic virus, and cucumber mosaic virus of the family Bromoviridae [van der Poel et al., 2001], containing seven conserved motifs (I, Ia1, Ia2, II, IIa1, III, and IV) with invariant H, DxxR, and Y residues (the Y at the beginning of βG) in motifs I, II, and IV, respectively [Rozanov et al., 1992]. The histidine residue has been shown to be necessary for the GT reaction but not for MT activity in alphavirus-like togaviruses [Decroly et al., 2011]. The conserved DxxR motif in αC is believed to be part of the binding site for the methyl donor S-adenosyl-L-methionine.

Subsequently, sequence analysis of members of over 50 genera of viruses (including HEV) demonstrated that MT has a region of conserved secondary structure downstream of the core region, named the “iceberg region”. In particular, the “iceberg region” in the “alto” group within the alphavirus supergroup, which includes HEV, is composed of six to seven predicted β-strands (βG to βL’), followed by four to five α-helices (αF to αJ), with the insertion of a helix (αE’) between βG and βH, and of two strands (βM and βN) between helices αG and αH, unlike the members of the “tymo” group (order Tymovirales) within the alphavirus supergroup [Ahola and Karlin, 2015]. The “iceberg region” in the “alto” group has three conserved or semi-conserved positions: H at the end of strand βG, D/E in the middle of strand βI, and G/A/S in the loop between βM and βN [Ahola and Karlin, 2015]. According to several studies, the “iceberg region” is crucial for MT and GT functionality, strongly suggesting that it plays an important role in binding the methyl acceptor substrate GTP [Ahola and Karlin, 2015]. The “iceberg” C-terminal region in the “alto” group contains proven membrane-binding amphipathic helices composed of a hydrophobic segment followed by polar, positively charged residues in αH. In the adjacent region, additional membrane-binding amphipathic helices within αI have been predicted [Ahola and Karlin, 2015].

Recently, a study showed that D29N and V27A substitutions in the MT sequence were associated with a more severe outcome in patients with HEV-associated acute liver failure, whereas the H105R mutation was associated with low HEV viremia, suggesting that this region might be a potential antiviral drug target [Borkakoti et al., 2017].

An HEV-host protein-protein interaction network study revealed that the PSMB4 protein (a component of the 20S proteasome) interacts directly with MT, presumably altering the processing of peptides for presentation by major histocompatibility complex (MHC) class I [Subramani et al., 2018; Wißing et al., 2021].

Furthermore, an MT from a cell-culture-adapted HEV strain (47832c) expressed in HEK293T cells was reported to prevent the phosphorylation and activation of interferon regulatory factor 3 and the p65 subunit of NF-κB in a dose-dependent manner [Myoung et al., 2019]. Moreover, HEV MT was shown to strongly inhibit induction of the IFN-β promoter mediated by the pattern recognition receptor (PRR) melanoma differentiation-associated protein 5 (MDA5) [Myoung et al., 2019], as well as activation of type 1 interferons (IFNs) induced by retinoic-acid-inducible gene I (RIG-I) [Kang et al., 2018]. MDA5 and RIG-I are PRRs that sense cytoplasmic double-stranded RNA [Kang and Myoung, 2017a, 2017b; Loo et al., 2008; Takeuchi and Akira, 2010]. Interestingly, this effect was not observed for the other HEV strains analyzed (Sar-55, Mex-14, ZJ-1, and Kernow-C1), which suggests that blockage of IFN-β signaling may be necessary for adaptation of HEV to cell culture. Notably, the MT of strain 47832c lacks the C-terminal Y domain-iceberg region, which is present in the other strains and is now considered an integral part of the MT protein (see section 2.1.2); the presence or absence of the Y domain might therefore alter the functional activity of MT [Myoung et al., 2019]. MT also interferes with ferritin secretion to decrease the inflammatory response and acts on RIG-I and MDA5 to reduce IFN production [Li et al., 2019]. As an acute-phase protein, ferritin is abundantly secreted in HEV-infected patients and is associated with the inflammatory response. Thus, it has been proposed that this domain inhibits the host immune response by preventing ferritin secretion [Li et al., 2019; Yadav and Kenney, 2021], apparently through interaction with the light chain of human ferritin [Lhomme et al., 2020a].

Y domain

This second domain, which spans from aa 216 to 442, seems to be unique to HEV, rubella virus (RubV), and the plant virus beet necrotic yellow vein virus (BNYVV) within the alpha-like supergroup, and it shows the highest sequence similarity to that of RubV [Koonin et al., 1992].

Notably, analysis of alphavirus-like superfamily sequences has suggested that the Y domain might be a C-terminal extension of the MT domain [Ahola and Karlin, 2015]. Consistent with this, the N-terminus of the Y domain in HEV and RubV was observed to overlap the conserved motif III of the MT domain [Koonin et al., 1992]. However, no specific function has yet been assigned to this region.

Sequence analysis of HEV and related alphaviruses has also identified a potential palmitoylation site (C336-C337) that is highly conserved in genotypes HEV1-4. These residues, together with W413, are important for HEV replication and are possibly involved in membrane binding of intracellular replication complexes [Parvez, 2017a]. Tryptophan is a key hydrophobic residue in α-helical protein folding and in protein-protein interactions [Parvez, 2017a].

In addition, an α-helical segment, L410-Y411-S412-W413-L414-F415-E416 (LYSWLFE), has been shown to be conserved in HEV and to be involved in cytoplasmic membrane binding [Parvez, 2017a].
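Membrane-binding amphipathic helices, such as those described for the MT iceberg region, are often characterized by the Eisenberg hydrophobic moment: each residue is treated as a vector rotated by 100° per position (3.6 residues per turn) and weighted by its hydrophobicity. The sketch below applies this calculation to the LYSWLFE segment purely as an illustration of the method; the scale values are the Eisenberg consensus values as commonly tabulated, and the cited study is not claimed to have used this analysis.

```python
import math

# Eisenberg consensus hydrophobicity scale, as commonly tabulated.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0, scale=EISENBERG):
    """Return (mean hydrophobicity <H>, mean hydrophobic moment <muH>).

    Residue n is placed at n * angle_deg around the helix axis; <muH> is the
    length of the hydrophobicity-weighted vector sum divided by the length.
    """
    angle = math.radians(angle_deg)
    total = sin_sum = cos_sum = 0.0
    for n, residue in enumerate(seq.upper()):
        h = scale[residue]
        total += h
        sin_sum += h * math.sin(n * angle)
        cos_sum += h * math.cos(n * angle)
    length = len(seq)
    return total / length, math.hypot(sin_sum, cos_sum) / length


mean_h, mu_h = hydrophobic_moment("LYSWLFE")
print(f"<H> = {mean_h:.3f}, <muH> = {mu_h:.3f}")
```

A high moment relative to the mean hydrophobicity indicates that hydrophobic residues cluster on one face of the helix; a uniform sequence such as 18 leucines gives a moment of essentially zero.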

Furthermore, in terms of RNA secondary structure, three stable hairpins/stem-loops at nt 788–856, 857–925 and 926–994 have been shown to be indispensable for HEV replication and infectivity [Parvez, 2017a].

PCP domain

The main difference in the organization of the functional domains among members of the alpha-like supergroup lies in the protease region. The proteases of RubV and other alphaviruses are positioned differently relative to the putative PCP domain of HEV, whereas BNYVV lacks this domain entirely. Nonetheless, a region of the HEV PCP exhibits moderate similarity to the RubV protease [Koonin et al., 1992].

The PCP domain is a putative chymotrypsin-like protease that can process both ORF1 and ORF2 [Paliwal et al., 2014]. Six highly conserved cysteine residues (C457, C459, C471, C472, C481, and C483) and three histidine residues (H443, H497, and H590) in PCP have been found to be critical for replication of the HEV-Sar55 replicon in S10-3 cells and may form part of the enzyme active site [Parvez, 2013].

An in silico 3D model of HEV PCP was constructed by homology modelling using RubV p150, and the predicted "papain-like β-barrel fold" supported its classification as a protease [Parvez and Khan, 2014]. Based on homology to RubV, residues C434 and H443 were identified as the putative catalytic dyad [Parvez and Khan, 2014], rather than C434 and H590, as had been proposed previously [Koonin et al., 1992]. In another 3D modelling study, a catalytic triad consisting of C483, H590, and N591 was predicted to form the active site at the interface between the N-terminal helical domain and the C-terminal β-sheet domain, a hallmark of papain-like cysteine proteases [Saraswat et al., 2019].

Furthermore, a Zn2+-binding pocket coordinated by C457-H458-C459 and C481-C483 was identified within the β-barrel fold of the HEV protease model. Structural Zn2+-binding sites are commonly coordinated by cysteine and histidine residues [Zhou et al., 2009]. Homology modeling also identified a putative Ca2+-dependent calmodulin (CaM)-binding signature, "D-X-[DNS]-[ILVFYW]-[DEN]-G-[GP]-XXDE". Among the different Ca2+-binding motifs found in living systems, the most common is a "helix-loop-helix" structure called the "EF-hand". Viral EF-hand Ca2+-binding motifs have also been reported in the rotavirus VP7 protein, the HIV-1 gp41 protein, the polyomavirus VP1 protein, and the RubV nonstructural protein p150 [Zhou et al., 2007]. The presence of essential cysteines within the overlapping putative Ca2+/CaM-binding motif of HEV suggests the formation of three intramolecular disulfide bridges that might orient the EF-hand for Ca2+ binding.
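Signatures such as the CaM-binding pattern above are usually searched by translating them into regular expressions. The sketch below converts the string exactly as quoted; note that "XXDE" is not strict PROSITE syntax, so the converter reads runs of plain letters one residue per letter. The function names and the toy sequence are hypothetical, and only the standard library `re` module is used.

```python
import re


def prosite_like_to_regex(pattern: str) -> str:
    """Convert a PROSITE-like motif string into a Python regular expression.

    Elements are separated by '-'. Supported elements: [ABC] (any listed
    residue), {ABC} (any residue except those listed), X/x (any residue),
    an optional repeat suffix (n) or (n,m), and runs of plain letters such
    as 'XXDE', which are read one residue per letter.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z]+)(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = match.groups()
        if core.startswith("["):
            tokens = [core]
        elif core.startswith("{"):
            tokens = ["[^" + core[1:-1] + "]"]
        else:
            tokens = ["." if c in "Xx" else c.upper() for c in core]
        if low is not None:
            # As in PROSITE 'x(2)', the repeat applies to the last token only.
            tokens[-1] += "{" + low + ("," + high if high else "") + "}"
        parts.extend(tokens)
    return "".join(parts)


def find_motifs(sequence: str, regex: str) -> list[int]:
    """1-based start positions of all matches, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer(f"(?=(?:{regex}))", sequence)]


CAM_SIGNATURE = "D-X-[DNS]-[ILVFYW]-[DEN]-G-[GP]-XXDE"
cam_regex = prosite_like_to_regex(CAM_SIGNATURE)
print(cam_regex)                                  # D.[DNS][ILVFYW][DEN]G[GP]..DE
print(find_motifs("AADADIDGGAADEKK", cam_regex))  # [3]
```

A regular-expression hit only shows that the sequence fits the pattern; whether the region actually binds Ca2+ or CaM has to be tested experimentally.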

One of the best-known small protein modules that specifically interact with proline-rich motifs of regulatory proteins is the "WW domain" (also called the "rsp5 domain"), named after two widely spaced tryptophan residues. Although such modules have not been reported in viral proteins so far, a putative W437-W476 WW/rsp5 domain has been identified in the HEV protease model and proposed to interact with the proline-rich hypervariable region of HEV ORF1.

The model therefore suggests that the putative catalytic dyad and divalent metal-binding motifs are essential for the structural integrity of the HEV protease, for polyprotein processing, and for RNA replication [Parvez and Khan, 2014].

On the other hand, a recent computational analysis of the complete ORF1 polyprotein identified an uncharacterized region of ordered secondary structure spanning residues 510–691, flanked by two disordered regions (residues 492–509 and 692–779). The crystal structure of this region was determined by X-ray diffraction (PDB code 6NU9), and no similar amino acid sequences were found in the RCSB Protein Data Bank (PDB). The structure consists of 10 β-strands and four α-helices. β-Strands 1 to 10 are arranged in two antiparallel sheets that form a sandwich-like fold. α-Helices 1 and 2 are located between β-strands 1 and 2, and α-helices 3 and 4 are positioned at the C-terminus of the domain, with α-helix 4 situated between the two antiparallel β-sheets. Furthermore, the domain exhibited significant structural similarity to multiple fatty-acid-binding domains and contained a bound zinc ion coordinated by residues H671, E673, and H686. This region was therefore proposed to have zinc metalloprotease activity, previously attributed to aa 433–592, although whether the coordinated zinc plays a catalytic or structural role remains unknown [Proudfoot et al., 2019].

Detailed studies have suggested that the highly conserved residue E583, located between β-strands 5 and 6, might act as the catalytic residue. However, the crystal structure showed that the geometry of the zinc-binding motif was not ideal for executing a proteolytic reaction. Hence, the authors proposed that the binding of a fatty acid, which would be readily available in the liver, or another endogenous ligand between the two β-sheets could shift α-helices 3 and 4, reorienting the zinc-coordinating amino acids into a catalytically active position [Proudfoot et al., 2019].

HEV PCP has been suggested to process ORF1 because the polyprotein contains LXGG motifs, cleavage sites commonly found in positive-sense RNA viruses, at aa 664 (between the PCP and X domains) and aa 1205 (between the Hel and RdRp domains). PCP has also been proposed to possess deubiquitinating activity [Karpe and Lole, 2011], which is known to require a Zn2+-binding finger [Reyes-Turcu et al., 2009]; as discussed above, such a motif is predicted in the MT region.
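Locating the LXGG motifs mentioned above is a simple pattern search. The sketch below assumes, as described for other viral papain-like proteases such as coronavirus PLpro, that cleavage occurs after the second glycine (LXGG|X); whether HEV PCP follows this rule has not been established. The function name and input sequences are hypothetical.

```python
import re


def lxgg_sites(protein: str) -> list[tuple[int, int]]:
    """Find LXGG motifs in a protein sequence (1-based coordinates).

    Returns (position of L, position of the second G) for each motif; the
    second G would be the P1 residue if cleavage follows the LXGG|X rule
    (an assumption borrowed from other viral papain-like proteases).
    A lookahead is used so that overlapping motifs are not skipped.
    """
    sites = []
    for m in re.finditer(r"(?=L.GG)", protein.upper()):
        start = m.start() + 1
        sites.append((start, start + 3))
    return sites


print(lxgg_sites("MKLRGGAPLAGGS"))  # [(3, 6), (9, 12)]
print(lxgg_sites("LRLRGG"))         # [(3, 6)]
```

The second example is the C-terminal sequence of ubiquitin, which itself matches LXGG; this is consistent with proteases of this specificity also acting as deubiquitinases.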

Ubiquitination is the tagging of proteins with ubiquitin (Ub), which can mark them for selective degradation in proteasomes. Eukaryotes also express small ubiquitin-like modifiers (UBLs) that are conjugated to target proteins to modulate their stability and function [d’Azzo et al., 2005; Haglund and Dikic, 2005; Welchman et al., 2005]. Conversely, deubiquitinating enzymes are proteases that cleave Ub or UBLs from target proteins. The deubiquitinating activity of HEV MT-PCP was tested in vitro using fluorogenic UBL substrates (Ub-AMC, ISG15-AMC, Nedd8-AMC, and SUMO-AMC), and deISGylation of cellular proteins conjugated to interferon-stimulated gene 15 (ISG15) was demonstrated [Karpe and Lole, 2011]. In response to infection and to IFN-α or IFN-β, ISG15 is expressed and conjugated to target proteins, a process known as ISGylation, thereby inhibiting the entry, replication, or release of intracellular pathogens [Villarroya-Beltri et al., 2017]. HEV MT-PCP deISGylation might therefore help the virus evade cellular antiviral pathways [Karpe and Lole, 2011]. This is supported by the observation that other viral proteases, such as the PCP of porcine reproductive and respiratory syndrome virus, inhibit host innate immunity through their deubiquitinase activity [Li et al., 2010; Sun et al., 2010].

Notably, the PCP domain has been shown to deubiquitinate RIG-I and TANK-binding kinase 1 (TBK-1), a downstream kinase activated through mitochondrial antiviral signaling upon RIG-I stimulation; both proteins require ubiquitination for their activation. This was demonstrated in an experimental model in which IFN was induced in hepatoma cells with polyinosinic:polycytidylic acid (poly(I:C)), a synthetic double-stranded RNA analogue [Nan et al., 2014b]. Indeed, HEV PCP has been observed to strongly downregulate MDA5-mediated induction of IFN-β and, consequently, to severely reduce the phosphorylation of interferon regulatory factor 3 (IRF3). Full induction of IFN-β expression requires both IRF3 and NF-κB to be activated and translocated into the nucleus. This study therefore supports the notion that HEV PCP antagonizes and regulates the type 1 IFN antiviral state through IRF3 and NF-κB [Kim and Myoung, 2018].

Furthermore, the entire amino-terminal region of HEV3 ORF1 (MT-Y-PCP) has been shown to inhibit IFN-stimulated response element promoter activation and the expression of several IFN-stimulated genes in response to IFN-β. This region was also found to interfere with IFN-β-induced STAT1 phosphorylation and nuclear translocation, indicating that MT-Y-PCP targets the JAK/STAT pathway. This inhibitory effect seemed to be genotype-dependent, as it was not observed with HEV1 [Bagdassarian et al., 2018].

An intraviral interactome analysis revealed that the PCP domain is able to self-interact and to interact with other viral proteins, including MT, RdRp, and ORF3, suggesting that it might participate not only in cleavage of the ORF1 polyprotein but also in the assembly of replication complexes, along with ORF3 [Osterman et al., 2015].

Currently, the data regarding the structure and function of the HEV protease in ORF1 processing are not fully consistent, and further investigations are needed, especially since it could be considered a potential target for antiviral drugs.

HVR and Pro domain

There is still some debate regarding the nomenclature of these two regions, since they are generally not distinguished as separate domains and their function is currently unknown.

At first, the HVR and Pro segments were considered part of the same hypervariable region because of the extreme sequence divergence around nt 2011–2325 (aa 662–766) in the HEV-Sar55 strain [Nan and Zhang, 2016]. Later, a section overlapping the HVR at aa 712–778 was identified as a proline-rich region because of its many proline residues. The Pro region contains only a few bulky hydrophobic amino acids (I, M, F, W, and Y) and a high proportion of polar and charged residues, as well as small residues such as A, G, P, and S.

The Pro region in HEV1-4 is flanked at its N- and C-terminal ends by the conserved sequences TLYTRTWS and RRLLXTYPDG, respectively [Purdy et al., 2012]. Recently, the HVR was shown to lie in a central region flanked by the N- and C-terminal parts of the Pro region [Muñoz-Chimeno et al., 2020].
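Because the Pro region is defined by conserved flanking sequences, it can be extracted from a translated ORF1 sequence by searching for the two flanks, with X treated as any residue. The sketch below also computes the composition measures discussed above. The helper names and the toy sequence are hypothetical; a real analysis would start from aligned ORF1 translations.

```python
import re

# Conserved flanks reported for the Pro region of HEV1-4; X = any residue.
N_FLANK = "TLYTRTWS"
C_FLANK = "RRLLXTYPDG"


def motif_to_regex(motif: str) -> str:
    """Treat X as a wildcard; every other letter must match exactly."""
    return "".join("." if c == "X" else re.escape(c) for c in motif)


def extract_between_flanks(protein: str, n_flank: str = N_FLANK, c_flank: str = C_FLANK):
    """Residues between the first N-flank and the nearest following C-flank, or None."""
    pattern = motif_to_regex(n_flank) + "(.*?)" + motif_to_regex(c_flank)
    match = re.search(pattern, protein)
    return match.group(1) if match else None


def composition(segment: str, residues: str) -> float:
    """Fraction of residues in segment that belong to the given residue set."""
    if not segment:
        raise ValueError("empty segment")
    return sum(segment.count(r) for r in residues) / len(segment)


# Hypothetical toy sequence, not a real HEV ORF1 fragment.
TOY_PROTEIN = "MA" + "TLYTRTWS" + "PPSAPGSPAPTPI" + "RRLLATYPDG" + "KK"
segment = extract_between_flanks(TOY_PROTEIN)
print(segment)                                 # PPSAPGSPAPTPI
print(round(composition(segment, "AGPS"), 2))  # 0.85
print(round(composition(segment, "IMFWY"), 2)) # 0.08
```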

Moreover, the HVR has been recognized as a hinge between the X domain and the upstream domains, whose inherent flexibility results from multiple "disorder-promoting" proline residues that might lead to an unstable tertiary structure [Dunker et al., 2008; Koonin et al., 1992; Tsai et al., 2001] with incomplete folding [Campen et al., 2008; Radivojac et al., 2007; Williams et al., 2001]. Indeed, certain proteins are known to lack a fixed structure under physiological conditions, and this unstructured state is important for their function [Dunker et al., 2008]. It has also been suggested that the Pro region might be an essential part or modulator of the helicase or protease domains [Gouvea et al., 1998].

Interestingly, it has been suggested that the high genetic heterogeneity of the HVR and the X domain might be associated with persistence of the virus in the acute phase of HEV infection, possibly through the emergence of mutants capable of overcoming the host immune response. Furthermore, the complexity and heterogeneity of the Pro and X domains have been found to be correlated, indicating that the two domains may have co-evolved, which would be consistent with the ORF1 product not undergoing cleavage [Lhomme et al., 2014b].

Previously, the hypervariability of the Pro region was attributed to a high rate of insertions and deletions, but a study showed that the rate in this region is similar to that in the rest of ORF1. The difference more likely lies in the tolerance of mutations at the first and second codon positions, possibly because of the region's intrinsically disordered structure. This variability allows a shift in codon usage towards codons containing cytosine, which in turn encode more proline, alanine, serine, and threonine residues, favoring the formation of disordered proline-rich structures [Purdy, 2012]. In contrast, Smith et al. have proposed that the requirement for certain amino acids in this region drives the increased frequency of cytosines rather than being a consequence of it. Thorough analysis has shown that the evolution of the HEV Pro region is shaped by pressures that increase proline content, with a consequent decrease in the frequency of aromatic amino acids [Lhomme et al., 2014b].
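The link between cytosine-rich codons and proline, alanine, serine, and threonine can be checked directly against the standard genetic code: every codon with C at the second position encodes one of these four amino acids (serine additionally has two AGY codons). The table string below is the standard code (NCBI translation table 1) listed in TCAG order; the helper name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered
# TTT, TTC, TTA, TTG, TCT, ... GGG, i.e. TCAG x TCAG x TCAG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acids_with_base_at(position: int, base: str) -> set[str]:
    """Amino acids encoded by sense codons carrying `base` at `position` (1-3)."""
    return {aa for codon, aa in CODON_TABLE.items()
            if codon[position - 1] == base and aa != "*"}


print(sorted(amino_acids_with_base_at(2, "C")))  # ['A', 'P', 'S', 'T']
```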

It has also been observed that the carboxyl half of the Pro region might be more permissive to mutations and may bind more ligands than the amino half [Purdy et al., 2012].

Curiously, analysis of HEV Pro region sequences has suggested that HEV3 and HEV4 strains share a common ancestor and are twice as heterogeneous as HEV1 strains [Purdy et al., 2012]. The zoonotic strains also share a similar purine/pyrimidine content in the amino half of the Pro domain. The same study showed that this region is the only one within ORF1 of HEV1, HEV3, and HEV4 that contains sites under positive selection, with 4–10 codons having a dN/dS ratio greater than 1, and that it has the highest density of sites with homoplasy values greater than 0.5. In particular, HEV3 and HEV4 showed threefold higher homoplasy values, whereas no difference was observed for HEV1. The presence of numerous highly homoplastic sites indicates recurrent selection pressure on the Pro region in the zoonotic genotypes [Purdy et al., 2012].
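For readers less familiar with the dN/dS ratio (ω), the sketch below computes it for one pair of aligned coding sequences from counted nonsynonymous and synonymous differences and sites, applying a Jukes-Cantor correction in the style of the Nei-Gojobori method. This is not necessarily the method used by Purdy et al.; per-codon estimates such as "4–10 codons with dN/dS > 1" normally come from likelihood-based models. The counts are hypothetical and the ±5% neutrality band is an arbitrary illustrative choice.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """omega = dN/dS from counted differences and sites (Nei-Gojobori style)."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ZeroDivisionError("no synonymous differences: omega is undefined")
    return d_n / d_s


def classify(omega: float, tolerance: float = 0.05) -> str:
    """Coarse label; the tolerance band around 1 is an arbitrary choice."""
    if omega > 1 + tolerance:
        return "positive (diversifying) selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


# Hypothetical counts for one pair of aligned codon sequences.
omega = dn_ds(nonsyn_diffs=30, nonsyn_sites=300.0, syn_diffs=5, syn_sites=100.0)
print(round(omega, 2), classify(omega))  # 2.07 positive (diversifying) selection
```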

Because of its numerous insertions and deletions, this region is mainly responsible for genome size differences among HEV genotypes [Pudupakam et al., 2011]. Indeed, sequence analysis at the amino acid level has revealed that the HVR accounts for up to 71% of the sequence divergence between genotypes, with an intra-genotype variability of 31–46% [Pudupakam et al., 2011], which is possibly related to adaptation to a wide range of hosts [Purdy et al., 2012].
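Percent-divergence figures like these are typically pairwise p-distances: the proportion of aligned positions at which two sequences differ. How Pudupakam et al. handled gaps is not stated here, so the sketch below simply assumes pairwise deletion (columns with a gap in either sequence are skipped); the fragments are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Alignment columns with a gap in either sequence are skipped
    (pairwise deletion), so indels do not count as substitutions here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# Hypothetical aligned fragments, for illustration only.
d = p_distance("PPSAP-GSPA", "PASAPTGTPA")
print(f"{100 * d:.1f}% divergence")  # 22.2% divergence
```

In an indel-rich region such as the HVR, the choice between skipping gaps and counting them as differences can change the reported divergence substantially.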

Experiments with deletion mutants of HEV-1 replicons in Huh7 cells showed that the HVR is not required for viral infectivity in vitro, although it influenced the efficiency of RNA replication, whereas deletion of nearly all of this region from an avian HEV infectious clone attenuated the virus in chickens [Pudupakam et al., 2011, 2009]. Furthermore, the HVR has been reported to be functionally exchangeable between genotypes, although the resulting chimeras show genotype-specific differences in replication efficiency [Pudupakam et al., 2009, 2011]. It has therefore been suggested that the HVR may tolerate small deletions without loss of infectivity but might be needed for interactions with viral and host factors during virus entry and assembly [Parvez, 2017b].

Additionally, SH3-binding PxxP motifs, which appear to be a consequence of the high proline content [Smith et al., 2012], were found in the HVR of HEV1–4, and these interaction motifs are therefore thought to be used by HEV to enhance its replication and/or infectivity [Pudupakam et al., 2011].
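The PxxP core mentioned above is a simple sequence pattern: proline, any two residues, proline. A minimal Python sketch for locating such cores, including overlapping occurrences, is shown below; the function name and the test peptide are hypothetical, and a pattern match alone does not establish SH3 binding, which also depends on flanking residues and structural context.

```python
import re

# Lookahead so that overlapping PxxP cores (e.g. in "PPAPP") are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every PxxP core in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(protein.upper())]


print(find_pxxp("APPAPPAP"))  # [(2, 'PPAP'), (3, 'PAPP'), (5, 'PPAP')]
```

Using a zero-width lookahead rather than a plain pattern matters in proline-rich sequences, where consecutive cores overlap and a non-overlapping search would miss some of them.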

Furthermore, a 3D model of the HEV-3 Pro protein predicted, in the conserved regions flanking the HVR, a peptide cleavage site together with sites for enzymatic modification and for binding to proteins, nucleotides, and metal ions; such sites have been shown to regulate cellular signal transduction, protein phosphorylation, transcription, and translation [Purdy et al., 2012]. In particular, seven putative linear motifs were located within the intrinsically disordered region (IDR) of the HVR/Pro domain of HEV1–4: two protease cleavage sites, three ligand-binding sites, and two kinase phosphorylation sites [Purdy et al., 2012]. Structure-based analysis suggested that these linear motifs can bind a wide range of ligands.

Proline-rich peptides readily act as ligands, because the cyclic side chain of proline restricts the movement of the backbone [Kay et al., 2000; Williamson, 1994]. Furthermore, the Pro 3D model mentioned above indicated that this protein is highly polarized, negatively charged, largely solvent-accessible, and flexible, features that are common in IDRs [Purdy et al., 2012].

Interestingly, a 171-nt insertion derived from the human ribosomal protein S17 gene was detected in the HVR of an isolate (Kernow C1-p6 strain) from a patient chronically infected with HEV and coinfected with HIV-1. This insertion has been suggested to confer cell culture adaptation and a growth advantage in vitro, as well as an expanded host range that allows the virus to infect pig, deer, chicken, cat, dog, mouse, and hamster cells. It has therefore been proposed that divergent HVR sequences might represent evolved host-derived sequences acquired during chronic infection [Nguyen et al., 2012; Shukla et al., 2012]. The authors suggested that this insertion might enhance the stability and/or translatability of the RNA or assist in the folding, processing, or stability of the ORF1 protein [Shukla et al., 2012]. Moreover, other HEV-3 strains have been reported to carry duplications or insertions in the HVR [Debing et al., 2016a; Legrand-Abravanel et al., 2009], suggesting that HEV recombination might not be as rare as previously thought [Parvez, 2017b]. The S17 insertion was shown to confer nuclear/nucleolar trafficking ability on the ORF1 protein, and its lysine residues were associated with enhanced replication of that HEV strain [Kenney and Meng, 2015].

A recent study demonstrated that the HEV3 strain 47832c (originally isolated from a chronically infected transplant patient) carries a bipartite insertion in the HVR, resulting from duplications of an adjacent part of the HVR and of part of the RdRp region, which can also enhance HEV replication in cell culture. This effect appeared to depend on the amino acid sequence encoded by the insertion rather than on its RNA sequence [Scholz et al., 2021].

Additional recombination events in the HEV Pro region have been reported in 11% (3/27) of strains isolated from chronically infected solid-organ transplant recipients in France. These involved parts of the Pro and RdRp regions, a fragment of the human tyrosine aminotransferase gene, and a fragment of the human inter-α-trypsin inhibitor (ITI) gene; the ITI insertion was suggested to confer increased HEV growth capacity in vitro. In silico analysis showed that these inserted sequences, which are rich in aliphatic and basic amino acids, could provide acetylation, ubiquitination, and phosphorylation sites [Lhomme et al., 2014a]. In a later study, seven HEV-3 strains with genomic rearrangements were identified, three of them during the acute phase of infection, and six of the seven were virus-host recombinants [Lhomme et al., 2020b]. Fragments of other human genes have also been found inserted in this region, such as the eukaryotic translation elongation factor 1 alpha 1 pseudogene EEF1A1P13, the 18S ribosomal RNA pseudogene RNA18SP5, the kinesin family member KIF1B, and the zinc finger protein ZNF787 [Lhomme et al., 2020b].

A host-virus interaction analysis showed that the HVR interacts directly with C3 (a core component of the classical and alternative complement activation pathways), suggesting that this binding might alter or inhibit complement activation as a host immune evasion strategy [Subramani et al., 2018].

In summary, the Pro/HVR domain could be important for viral replication, with a structural rather than a regulatory or enzymatic function [Smith et al., 2012].

Due to the characteristics mentioned above, this region has been proposed as a target for development of novel antiviral drugs [Purdy et al., 2012].

X domain

The X domain is also known as the macrodomain because it resembles the non-histone domain of histone macroH2A. Macrodomains are highly conserved throughout evolution in eukaryotes, bacteria, and archaea, and are even present in members of three ss+ RNA virus families: Coronaviridae, Togaviridae, and Hepeviridae [Li et al., 2016].

HEV domain X is classified as a member of the macrodomain protein family of ADP-ribose-1′′-monophosphatases (Appr-1′′-pase), which catalyze the conversion of ADP-ribose-1′′-monophosphate (a side product of cellular pre-tRNA splicing) to ADP-ribose [Parvez, 2015b]. Furthermore, the HEV macrodomain has been shown to have hydrolase activity that removes mono-ADP-ribose (MAR) and poly-ADP-ribose (PAR) chains, known as de-MARylation and de-PARylation, respectively [Li et al., 2016]. In addition, the HEV Hel domain, when located in cis, drastically increases the binding of the macrodomain to poly-ADP-ribose, promoting de-PARylation activity [Li et al., 2016].

When molecular modeling of the X domain was carried out in order to predict possible active ligand binding sites, 10 potential sites were identified, including sites for metallic ligands such as Mg2+ and Zn2+ [Vikram and Kumar, 2018].

In silico and in vitro analyses identified a putative Appr-1′′-pase catalytic site, "N806, N809, H812, G815, G816, and G817". The "G" triad forms a loop that connects the "N"-containing β-strand 3 to α-helix 1, as in the homologous RUBV macrodomain [Parvez, 2015b, 2013]. The mutations G816V and G817V were lethal for replication of HEV strain Sar55 in S10-3 cells. Therefore, the regulatory or catalytic role of the X domain depends on this "N, N, H, G, G, G" sequence and/or its secondary structure elements. It was concluded that the HEV macrodomain is vital for genome replication at the post-translational stage [Parvez, 2015b], but not during transcription [Parvez, 2013].

Moreover, it was proposed that the C-terminal region of the X domain interacts directly with ORF3 and MT through "I66-I67" and "L101-L102", respectively, residues that are highly conserved among HEV genotypes [Anang et al., 2016]. The X domain binding region identified in this study lies almost entirely within the putative core MT domain (aa 56–146) [Anang et al., 2016].

Subsequently, an HEV-human protein-protein interaction analysis showed that the PSMB1 protein (a component of the 20S proteasome) interacts with the X domain, possibly altering the processing of major histocompatibility complex (MHC) class I peptides [Subramani et al., 2018]. Additionally, the HEV X domain was found to interact with the RACK1 protein, an interaction believed to promote viral translation/replication [Subramani et al., 2018]. Interestingly, the HEV macrodomain has been shown to downregulate type I IFN synthesis in vitro by inhibiting poly(I:C)-induced phosphorylation of IRF-3, a key transcription factor for IFN induction [Nan et al., 2014b]. It has also been suggested that this domain binds directly to the light chain subunit of human ferritin, sequestering it to prevent its secretion and possibly suppressing the cellular innate immune response, since ferritin has been reported to be an acute-phase protein in patients with viral hepatitis [Ojha and Lole, 2016].

Finally, it has been suggested that the high genetic heterogeneity of the macrodomain within HEV quasispecies in chronically infected patients might favor the emergence of persistent variants [Lhomme et al., 2014b].

Hel domain

The helicases of RNA viruses can be classified into two superfamilies, SF1 and SF2 [Kadaré and Haenni, 1997]. Like those of other members of the alphavirus-like superfamily, the HEV helicase belongs to SF1 and contains a purine nucleoside triphosphate (NTP)-binding motif composed of two conserved sites, Walker A (aa 975–982) and Walker B (aa 1029–1032). Site A contains a stretch of hydrophobic residues followed by the conserved sequence GxxxxGKS/T (where x is any amino acid) and has been reported to bind directly to the β- and γ-phosphates of the NTP. Site B is formed by a D residue and hydrophobic amino acids and chelates the Mg2+ ion of the Mg-NTP complex [Kadaré and Haenni, 1997].
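The Walker A consensus GxxxxGKS/T translates directly into a regular expression. The minimal Python sketch below scans a protein sequence for it; the peptides are hypothetical (not HEV sequences), the pattern omits the preceding hydrophobic stretch, and a consensus match alone does not demonstrate NTP binding.

```python
import re

# Walker A consensus from the text: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for each Walker A consensus match."""
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(protein.upper())]


wild_type = "AAGPPGTGKTAA"  # hypothetical peptide, not an HEV sequence
k_to_a = "AAGPPGTGATAA"     # same peptide with the invariant K replaced by A (GKT -> GAT)

print(find_walker_a(wild_type))  # [(3, 'GPPGTGKT')]
print(find_walker_a(k_to_a))     # []
```

Replacing the invariant lysine is enough to abolish the consensus match, which mirrors why substitutions of this residue are used experimentally to inactivate the NTP-binding site.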

Seven signature motifs, arranged colinearly, have been identified: I (site A), Ia, II (site B), III, IV, V, and VI [Kadaré and Haenni, 1997; Koonin et al., 1992; Nan and Zhang, 2016]. Motifs Ia, III, and IV are the most variable, and their function was unknown at the time [Kadaré and Haenni, 1997], while motif VI is believed to bind nucleic acids because it is rich in basic residues [Kadaré and Haenni, 1997].

The HEV Hel domain has been shown to have NTPase activity and to unwind duplex RNA with 5′ overhangs in the 5′→3′ direction [Karpe and Lole, 2010a]. HEV Hel can also hydrolyze rNTPs and dNTPs, but with lower efficiency [Karpe and Lole, 2010a]. This domain exhibits RNA 5′-triphosphatase activity (removal of the γ-phosphate from the 5′ end of primary transcripts) and is suggested to participate, together with MT, in the first step of 5′ cap synthesis [Karpe and Lole, 2010b].

A mutagenesis study showed that motifs Ia and III are critical for Hel function, whereas motifs I, IV, and VI are not essential [Mhaindarkar et al., 2014]. Moreover, unique and highly conserved mutations in the Hel domain have been reported in patients with fulminant hepatic failure: L1110F is specific to HEV1, and V1120I is frequent in HEV3 and rare in HEV4. When expressed in Escherichia coli, these mutant proteins showed a slight decrease in ATPase activity, whereas their RNA unwinding activity was not affected. These mutations might modify virus-host protein-protein interactions, altering host responses and thereby leading to more severe disease [Devhare et al., 2014]. In S10-3 cells, however, the V1120I mutant showed a lower viral replication rate, and overall the mutant replicons replicated less efficiently [Devhare et al., 2014].

Notably, mutations in the Walker A and Walker B motifs drastically reduced ATPase and RNA unwinding activity [Karpe and Lole, 2010a], while replacement of critical residues (GKS to GAS in site A and DE to AA in site B) completely eliminated viral RNA replication in a hepatoma cell line [Karpe and Lole, 2010a].

In another study, it was observed that the V1213A mutant had very low replication efficiency, and it was suggested that the amino acid V1213 favors the replication of HEV3 and HEV4, but not HEV1 [Cao et al., 2018a].

Interestingly, the substitution V239A found in Japanese patients infected with HEV3 of zoonotic origin was associated with increased virulence [Takahashi et al., 2009].

Recently, a 3D model of HEV Hel was constructed by homology modelling with the tomato mosaic virus helicase as a template (33% structural identity) to screen potential Hel inhibitors in silico. The most promising results were obtained with three molecules (PubChem ID: JFD02650, RDR03130, and HTS11136), which interacted with residues in the Walker A site [Parvez and Subbarao, 2018].

Moreover, a protein-protein interaction analysis identified interactions of the Hel domain with the host factors C4a and C8 (components of the classical and alternative complement activation pathways), suggesting that this domain might also alter or inhibit complement activation [Subramani et al., 2018].

RdRp domain

HEV has a type 3 RdRp, typical of the "alpha-like" supergroup III of ss+ RNA viruses, within which the RdRps of HEV, RubV, and BNYVV form a distinct, closely related cluster. Eight conserved motifs (I–VIII) have been described in this supergroup [Koonin et al., 1992].

HEV RdRp contains the highly conserved GDD motif, which generally plays a crucial role in catalytic activity and metal ion coordination [Wang and Meng, 2021]; this explains its conservation across a wide range of RdRps [Koonin et al., 1992]. Accordingly, substitutions in the GDD motif have been shown in vitro to abolish the RdRp activity of HEV [Emerson et al., 2004], HCV [Yamashita et al., 1998], RUBV [Wang and Gillam, 2001], calicivirus [Vázquez et al., 2000], and poliovirus [Jablonski and Morrow, 1995].

Localization studies revealed that HEV RdRp is present in the ER, suggesting the involvement of the ER membrane in HEV replication [Rehman et al., 2008].

Furthermore, it has been suggested that HEV RdRp can either initiate synthesis de novo from the RNA template or use the 3′-OH end of the template to prime synthesis [Mahilkar et al., 2016]. Moreover, HEV RdRp has been shown to bind specifically to the 3′ end of the HEV RNA, a binding that requires two stem-loop structures, SL1 (nt 7173–7194) and SL2 (nt 7089–7163), which are separated by a single-stranded region and located adjacent to the poly(A) stretch. The 3′ end of the viral genome therefore acts as a cis-acting replication element (CRE) that is critical for the initiation of HEV genome replication [Agrawal et al., 2001].

The second CRE is located in the junction region (JR) of the HEV genome (between ORF1 and the start site of the subgenomic region), which contains a highly conserved stem-loop structure that is essential for subgenomic RNA synthesis. The JR shows sequence similarity to the corresponding region of RubV [Huang et al., 2004]. Recently, the last 41 nt at the 3′ end of ORF1 (flanking the JR) were reported to also fold into a stem-loop structure that might act as an enhancer for the subgenomic RNA promoter [Cao et al., 2018b]. In summary, it has been proposed that HEV RdRp binds to the stem-loop in the JR and that the upstream nucleotides at the 3′ end of ORF1 stabilize the binding of RdRp to the minus-strand RNA to promote replication, suggesting that the 3′ end of ORF1 might be a component of the subgenomic RNA promoter [Cao et al., 2018b].

Protein-protein interaction analysis revealed that the RdRp interacts with the cellular proteins C3, C8, and C4a, possibly altering or inhibiting complement activation [Subramani et al., 2018], as described for the other domains (Hel and HVR) in this review. Interestingly, HEV RdRp interacts directly with eIF4A2 and recruits the host factors eIF4E and eIF4G into the viral replication complex, forming the eIF4F complex (part of the host translation machinery). HEV RdRp can also interact with eIF3A, which has been shown to be involved in viral replication; modulation of its activity might favor translation of viral RNA by shutting down host protein synthesis. In addition, the host factor eEF1A1 was found to be key for RdRp activity and for stabilization of the viral translation/replication complex [Subramani et al., 2018]. Recently, the host heterogeneous nuclear ribonucleoproteins HNRNPK and HNRNPA2B1 were reported to interact with HEV RdRp and are believed to play a crucial role in viral replication [Kanade et al., 2019].

Some attention has been given to the role of interferon-induced protein with tetratricopeptide repeats 1 (IFIT1), which is encoded by an interferon-stimulated gene activated during the host's innate antiviral response. IFIT1 recognizes cap0 RNA structures (m7G) and blocks the binding of the eukaryotic translation initiation factor eIF4E to the RNA, thus inhibiting translation [Andrejeva et al., 2013]. HEV RdRp was shown to interact directly with IFIT1, preventing IFIT1 from binding to HEV RNA and thereby leaving the viral RNA available for translation [Pingale et al., 2019].

On the other hand, an analysis of the intraviral interactome showed that HEV RdRp can self-interact, which is apparently important for its polymerase function, and at the same time, it can interact with the PCP domain [Osterman et al., 2015].

Ribavirin (RBV; 1-β-D-ribofuranosyl-1,2,4-triazole-3-carboxamide), a synthetic guanosine/adenosine analog with a broad antiviral spectrum, is the only drug currently used (off-label) to treat chronic HEV infection. RBV can be incorporated by the RdRp into nascent viral RNA, where it induces base transitions and early chain termination, and it interferes with replication by competitively inhibiting nucleotide binding [Feld and Hoofnagle, 2005]. Recently, treatment failure in chronically infected patients has been attributed to HEV antiviral resistance, possibly associated with the G1634R, Y1320H, and K1383N substitutions in the RdRp [Debing et al., 2016b]. The K1383N substitution strongly decreases viral replication and increases RBV sensitivity in vitro, the opposite of the observed clinical phenotype [Debing et al., 2016b]. The Y1320H substitution, however, increases HEV replication efficiency without altering RBV sensitivity and may be a compensatory change that helps to overcome the fitness loss caused by K1383N. The G1634R substitution appeared to increase the replicative capacity of HEV and to reduce the efficacy of RBV [Debing et al., 2014]. This substitution has also been shown to increase viral titers in cell culture [Todt et al., 2020].

Other substitutions have also been reported in HEV-infected patients (D1384G, K1398R, V1479I, and Y1587F) that possibly affect HEV replication by modulating RdRp activity [Debing et al., 2016a; Todt et al., 2016b, 2016a]. The substitutions C1483W and N1530T, identified in patients with HEV-associated acute liver failure, have been strongly associated with high viral loads and mortality [Mishra et al., 2013], and the substitution F1439Y has been reported to be significantly associated with fulminant liver failure [Smith and Simmonds, 2015].

Recently, our group built an in silico 3D model of the HEV3 RdRp from a chronically infected patient, identified a region of the enzyme that hypothetically interacts with incoming nucleotides or RBV, and performed molecular docking and molecular dynamics simulations of the enzyme with RBV triphosphate (RBVT) or GTP. The RBVT-HEV3 RdRp interaction was mediated by six hydrogen bonds: Q195-O14, S198-O11, E257-O13, S260-O2, S260-O3, and S311-O11 [Cancela et al., 2021].

Moreover, with the aim of exploring novel antiviral strategies for hepatitis E management, one research group investigated the role of the microRNA miR-122 in HEV infection and replication. miR-122 is the most abundant liver-specific miRNA and is involved in numerous pathophysiological processes. In silico analysis of HEV1 to HEV4 sequences predicted that most of them have at least one miR-122 target site; notably, HEV1 genome sequences contained a highly conserved miR-122 target site in the RdRp region. In vitro studies with HEV1 and HEV3 replicons in hepatoma cells showed that miR-122 promotes HEV replication, whereas inhibition of miR-122 dramatically decreased HEV replication. This role of miR-122 therefore represents an opportunity for the development of new HEV antiviral drugs [Haldipur et al., 2018].
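Target-site prediction of the kind mentioned above usually starts from complementarity to the miRNA "seed" (nucleotides 2–8). The Python sketch below scans a sequence for perfect 7mer-m8 seed matches. It is a simplified stand-in for dedicated prediction tools, not the pipeline used by Haldipur et al., and the miR-122 sequence it uses is an assumption that should be checked against miRBase.

```python
# Assumed mature hsa-miR-122-5p sequence (5'->3'); verify against miRBase before use.
MIR122 = "UGGAGUGUGACAAUGGUGUUUG"

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def seed_sites(target: str, mirna: str = MIR122) -> list[int]:
    """1-based start positions of 7mer-m8 sites (perfect match to miRNA nt 2-8).

    DNA input is accepted; T is converted to U.
    """
    site = reverse_complement(mirna[1:8])  # nt 2-8 of miR-122 -> "ACACUCC"
    target = target.upper().replace("T", "U")
    hits, start = [], target.find(site)
    while start != -1:
        hits.append(start + 1)
        start = target.find(site, start + 1)
    return hits


print(seed_sites("AAACACTCCAA"))  # [3]
```

Real predictors also score site context, accessibility, and conservation, so a seed match should be read as a candidate site only.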

ORF2

ORF2 is 1983 nt in length, starting 37 nt downstream of the ORF1 stop codon and overlapping with ORF3 except for 14 nt, ending 65 nt upstream of the poly-A tail. The encoded viral structural protein has 660 aa residues and a predicted molecular mass of 72 kDa [Nan and Zhang, 2016].
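The length and mass figures above are internally consistent, as the short calculation below shows. It assumes that the 1983-nt length includes the stop codon and uses an average residue mass of about 110 Da, a common rule of thumb rather than a value from the cited source.

```latex
\begin{aligned}
\frac{1983\ \text{nt}}{3\ \text{nt per codon}} &= 661\ \text{codons} = 660\ \text{sense codons} + 1\ \text{stop codon}
  \;\Rightarrow\; 660\ \text{aa},\\
M_{\text{ORF2}} &\approx 660 \times 110\ \text{Da} \approx 72.6\ \text{kDa} \approx 72\ \text{kDa}.
\end{aligned}
```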

It has recently been suggested that the ORF2 protein present in the serum of HEV-infected patients and in the supernatant of cultured cells exists mostly as a free form that is not associated with viral particles. Two forms of the ORF2 protein have been described: a secreted form (ORF2s) and a capsid-associated form (ORF2c). ORF2c is translated from a previously unrecognized internal AUG codon (15 codons downstream of the ORF2s start) and remains in the cytosol to be incorporated into infectious virus particles.

On the other hand, ORF2s is secreted into the extracellular space as a glycosylated dimer and lacks the regions involved in cell binding. Studies in cultured cells have suggested that ORF2s is not essential for the HEV life cycle but can reduce antibody-mediated neutralization [Montpellier et al., 2018; Yin et al., 2018].

A 3.5-Å crystal structure of an HEV virus-like particle (VLP) revealed three linear domains: S, the shell domain (aa 129–319); M, the middle domain (aa 320–455); and P, the protruding domain, also known as E2s (aa 456–606). The icosahedral S domain adopts a classical antiparallel jelly-roll β-barrel fold with eight β-strands (named B to I) and four short α-helices, and it is strengthened by the 3-fold protrusions formed by M and the 2-fold spikes formed by P. Its inner region is rich in basic amino acids (six arginine residues per subunit), which could help neutralize the negative charges of the genomic RNA [Guu et al., 2009]. Four loops connecting the β-strands of the S domain have been identified around the center of the pentamer: loops B–C (aa 139–152), D–E (aa 196–206), F–G (aa 236–241), and H–I (aa 281–296); α1 and α4 were found between strands C/D and D/F, respectively [Yamashita et al., 2009].

The M domain, which is closely associated with the S domain (through βB, βC, and loops CD, EF, GH) and positioned on the surface around the icosahedral 3-fold axis, forms a twisted antiparallel squashed β-barrel structure consisting of six β-strands and four short α-helices and is involved in the trimeric interaction [Yamashita et al., 2009]. This domain also contains a putative sialic acid binding site in a helix-turn-helix motif (aa 376–391) positioned at one end of the β-barrel [Guu et al., 2009].

The P domain in the HEV VLP forms a twisted antiparallel β-sheet structure and is connected to the M domain by a long proline-rich hinge, "445-NQHEQDRPTPSPAPSRPF-462" (which makes the capsid more resistant to proteases), and it contributes to dimer formation on the capsid surface [Yamashita et al., 2009]. Compared with the M domain, the P domain contains a large insertion (aa 504–533) between β20 and β22 of the central β-barrel. This 30-aa insertion (three β-strands and one α-helix) mediates the interaction between the surface spike and the 3-fold protrusion [Guu et al., 2009]. Three highly exposed loop insertions (aa 482–490, 550–566, and 583–593) are found at the top of this surface spike and are thought to contribute to antigenicity [Guu et al., 2009].

The region located at residues 118–131 of the HEV capsid protein forms the N-terminal arm, which makes a sharp turn at the beginning of βB, initiating an extended loop that interacts with a 2-fold-related and a 3-fold-related adjacent molecule [Guu et al., 2009].

The HEV capsid protein contains a signal sequence in the N-terminal region of 22 aa (consisting of arginine-rich residues, a 14-aa hydrophobic core and a turn-inducing stretch of proline residues) and N-linked glycosylation sites (N137, N310, and N562). N562 appears to be important for dimerization of the capsid protein [Xu et al., 2016]. It also contains a putative ER localization signal at its N-terminus [Surjit et al., 2007].

Interestingly, expression of the complete ORF2 in insect cells resulted in proteolytic removal of the first 111 and the last 52 residues (thereby eliminating the signal sequence) [Zhang et al., 1997], producing a 55-kDa protein that can self-assemble into VLPs [Xing et al., 1999]. In contrast, expression in mammalian cells yielded two forms of the protein: a 74-kDa unglycosylated form and an 88-kDa glycosylated form [Jameel et al., 1996].
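A back-of-the-envelope check, assuming the 660-aa full-length protein and an average residue mass of about 110 Da, shows that removing 111 N-terminal and 52 C-terminal residues is consistent with the observed 55-kDa product:

```latex
660 - 111 - 52 = 497\ \text{aa}, \qquad
497 \times 110\ \text{Da} \approx 54.7\ \text{kDa} \approx 55\ \text{kDa}
```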

Xing et al. reported that infection of insect cells with a recombinant baculovirus expressing ORF2 with a deletion of the N-terminal 13 aa produced two types of particles, HEV-VLP/T = 1 and HEV-VLP/T = 3 [Xing et al., 2010].

HEV-VLP/T = 1 produced in vitro has icosahedral symmetry with an external diameter of 270 Å and is composed of 60 subunits of truncated capsid protein arranged around the icosahedral 2-, 3-, and 5-fold axes.

The capsid subunit interactions needed for HEV-VLP T = 1 assembly are dimeric, trimeric, and pentameric, occurring at the 2-, 3-, and 5-fold axes, respectively [Guu et al., 2009]. The particle also has 30 dimeric protrusions at the 2-fold axes on its surface, with deep depressions at the 3- and 5-fold axes. Moreover, the P domain dimers form the protruding spikes around the 2-fold axes of HEV-VLP T = 1, stabilizing capsid protein interactions [Guu et al., 2009]. Dimerization of the P domain is mediated by an extended loop (aa 550–566) and three β-strands from the central β-barrel (β18, β24, and β27).

Mutagenesis analysis revealed that residues A597, V598, A599, L601, and A602 are critical for the dimeric interaction [Li et al., 2005]. Additionally, it was suggested that amino acid substitutions in β-strand 27, at the dimer interface formed by residues 585–610, could alter folding and thereby disrupt the compact packing between the two β-sheets [Guu et al., 2009].

In contrast to the T = 1 particle, the T = 3 particle (180 capsid subunits), with an outer diameter of 370 Å, has a total of 90 surface spikes (C-C and A-B dimers) and 60 trimeric protrusions (A-B-C) [Guu et al., 2009]. Structural modeling showed that assembly of the native T = 3 capsid, unlike that of HEV-VLP T = 1, requires flat capsid protein dimers [Yamashita et al., 2009]. The A-B subunits form a dimer with a bent conformation around the 5-fold axis, whereas the C monomers have a flat conformation at the icosahedral 2-fold axis. In the HEV-VLP T = 3 particle, the P domain of the C-C dimer is rotated by 90° relative to the M and S domains compared with the A-B dimer and the dimer in HEV-VLP T = 1, a rotation possibly facilitated by the proline-rich hinge between the P and M domains [Mori and Matsuura, 2011].
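The subunit counts of the two particle types follow the quasi-equivalence rule for icosahedral capsids, in which a capsid with triangulation number T contains 60T subunits. The bookkeeping below checks the dimer and trimer counts given above:

```latex
\begin{aligned}
N &= 60\,T,\\
T = 1:\quad N &= 60 = 30 \times 2 \quad \text{(30 dimeric protrusions)},\\
T = 3:\quad N &= 180 = 90 \times 2 \quad \text{(C-C and A-B dimer spikes)}\\
  &= 60 \times 3 \quad \text{(A-B-C trimeric protrusions)}.
\end{aligned}
```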

However, the HEV-VLP T = 1 particle seemed to exhibit properties similar to those of the native virion with respect to antigenicity and surface substructure [Li et al., 2004; Xing et al., 1999].

The capsid protein has been shown to inhibit the induction of type I and type III IFNs by interacting with the MAVS-TBK1-IRF3 complex, thereby blocking IRF3 phosphorylation. The arginine-rich motif in the N-terminus of the ORF2 protein is critical for this inhibition [Lin et al., 2019].

The capsid protein has also been found to inhibit signaling by the RIG-I and Toll-like receptor (TLR) adapters IPS-1, MyD88, and TRIF [Hingane et al., 2020].

It has also been shown that cells expressing ORF2 can activate the pro-apoptotic gene CHOP through ATF4. ORF2 also increases phosphorylation of eukaryotic initiation factor 2 alpha and promotes ATF4 translation. However, no apoptosis has been reported in these cases. Instead, ORF2 induces upregulation of chaperones (such as Hsp70B′ and Hsp72) and co-chaperones (such as Hsp40), which could correspond to a survival mechanism [John et al., 2011].

Interestingly, the glycosylated form of ORF2 inhibits NF-κB through its direct association with βTRCP, thereby blocking the assembly of the IκBα ubiquitination complex [Surjit et al., 2012].

The HEV virion resembles plant ss+ RNA viruses (tombusviruses and sobemoviruses) in its assembly pathway, as it uses a long, positively charged N-terminal domain to interact with the genomic RNA. Molecular simulations have suggested that the ORF2 decamer is the assembly intermediate of the T = 3 HEV particle [Yamashita et al., 2009].

The E2s domain (P domain) has been identified as the minimum antigenic domain (aa 455–602) capable of inducing HEV-neutralizing antibodies [Zhao et al., 2015], as it contains the immunodominant epitopes [Guu et al., 2009; Li et al., 2009; Yamashita et al., 2009]. The crystal structure of E2s has been reported previously [Li et al., 2009], and the dimerization of this domain has been demonstrated to be important for host interactions in all HEV genotypes [Li et al., 2009].

It has been observed that the packing of this domain results in a flat conformation of the dimer, which is thought to be stabilized by the hydrophobic residues 585–595 [Bai et al., 2020].

E2s forms a β-barrel in which residues from β2, β3, β6, and β7, together with the connecting loops, protrude at one side of the structure to form a surface groove (15 Å wide and 11 Å deep). This β-barrel has nine antiparallel β-strands; on one side, three loops connect adjacent β-strands, while on the other side, three loops and a double-stranded β-sheet link adjacent β-strands. The inner pore is lined by 28 hydrophobic residues and is blocked at the top and bottom by loops connecting residues T586-A590 and A467-F462, which are possibly involved in recognizing hydrophobic ligands [Li et al., 2009].

In 2018, the first case of hepatitis E caused by rat HEV-C (species C) was reported in a liver-transplant recipient [Sridhar et al., 2018]. Very recently, the crystal structure of the HEV-C E2s domain was determined at 1.8 Å resolution. HEV-C E2s has 41% amino acid sequence identity with E2s from species A (HEV-A E2s), yet the two share a conserved overall structure. HEV-C E2s consists of a compact barrel of 12 β-strands linked by loops, plus a unique groove region. Inside the β-barrel, a highly hydrophobic cavity (30 Å deep) was identified, blocked by loops containing residues that are conserved (I583-P594 and T552-D567) among members of the family Hepeviridae [Bai et al., 2020]. The groove region (15 Å wide and 10.5 Å deep), formed by β2, β6, β7, β9, and β10, is connected at one side by fusion loops containing hydrophobic residues (A481, A486, M487, G488, P491, G433, and L534) [He et al., 2008] and has been reported to be the likely antibody recognition site of HEV [Li et al., 2009].

Interestingly, structure-based mutagenesis performed on VLPs revealed that the E549A, K554A, G591A, and D430A substitutions in the E2s region completely abolished HEV host-cell penetration [Gu et al., 2015]. Furthermore, the amino acid mutations F51L, T59A, and S390L have been associated with attenuation of HEV in pig models [Córdoba et al., 2011].

A monoclonal antibody, 8C11, has been reported to recognize a neutralizing conformational epitope present only on HEV1 and to prevent VLPs from binding to and entering host cells. The 8C11 epitopes on E2s, identified by X-ray crystallography, were D496-T499, V510-L514, and N573-R578, with R512 identified as the key residue for neutralization [Tang et al., 2011].

Moreover, studies using the monoclonal antibodies 3E8 and 1B5 against the avian HEV capsid protein showed that the motifs I/VPHD and VKLYM/TS are critical for antibody binding, and both epitopes appear to be present in avian, swine, and human HEV [Wang et al., 2015].

Although the cell receptor for HEV has not been identified, several host factors have been suggested to be involved in cell attachment and/or entry of HEV. In the case of the naked HEV particle, the following putative cell receptors have been described: heparan sulfate proteoglycans (HSPGs), glucose-regulated protein 78 (GRP78/Bip), asialoglycoprotein receptor (ASGPR), ATP synthase subunit 5β (ATP5B), and integrin alpha 3 (ITGA3) [Wißing et al., 2021; Yin and Feng, 2019].

VLPs assembled from recombinant ORF2 protein have been reported to bind Huh7 cells via cell-surface HSPGs, specifically syndecans [Kalia et al., 2009], whereas HSPGs were not essential for cell attachment and infection by "quasi-enveloped" HEV [Yin et al., 2016].

Interaction studies showed that GRP78/Bip, a molecular chaperone located in the ER, binds the recombinant ORF2 protein p239 (aa 368–606) [Zheng et al., 2010].

ASGPR is present on the basolateral membrane of hepatocytes. Its expression in HeLa cells increased HEV binding, whereas depletion of ASGPR in PLC/PRF/5 cells decreased HEV binding but did not affect virion release [Zhang et al., 2016].

On the other hand, the "quasi-enveloped" HEV membrane contains phosphatidylserine, which might bind to the cell-surface receptor T-cell immunoglobulin and mucin domain 1 (TIM-1), acting as an attachment factor on target cells [Wißing et al., 2021; Yin and Feng, 2019].

In the past few years, significant efforts have been made to develop HEV vaccines based on the ORF2 protein, either as a subunit or as a VLP. So far, attempts to produce VLPs in plants have been unsuccessful [Ma et al., 2003; Maloney et al., 2005; Mazalovska et al., 2017; Zhou et al., 2006]. Similarly, expression of full-length ORF2 in the baculovirus-insect cell system did not produce VLPs [Li et al., 1997]. However, in the Tn5 insect cell line, VLP formation was achieved once the N-terminal region was truncated [T.-C. Li et al., 2015; Zhou et al., 2015].

Expression of recombinant ORF2 in E. coli has produced highly immunogenic VLPs (HEV 239), which were shown to be safe and effective in humans in phase II and phase III clinical trials [Zhang et al., 2009; Zhu et al., 2010]. This vaccine, Hecolin® (HEV 239), has been licensed for use in China since 2012 [S.-W. Li et al., 2015; Park, 2012] and provided long-term protection over 4.5 years, with 86.6% efficacy [Zhang et al., 2015]. However, before global use, its safety and efficacy must still be assessed further in risk groups. Two clinical trials are currently underway: a phase I trial in the USA (NCT03827395) and a phase IV trial in pregnant women in Bangladesh (NCT02759991) [Zaman et al., 2020]. In China, a phase I clinical trial of a more recent VLP vaccine (p179), derived from HEV4, is ongoing [Cao et al., 2017].
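
In its simplest form, a vaccine efficacy figure such as the 86.6% quoted above is one minus the ratio of attack rates (cases per participant) in the vaccinated and control groups. Trial analyses typically work with person-time incidence or hazard models instead, so the Python sketch below, with a hypothetical function name and made-up counts, only illustrates the definition and does not reproduce the cited trial.

```python
def vaccine_efficacy(cases_vaccinated: int, n_vaccinated: int,
                     cases_control: int, n_control: int) -> float:
    """Efficacy = 1 - (attack rate in vaccinated) / (attack rate in controls)."""
    attack_vaccinated = cases_vaccinated / n_vaccinated
    attack_control = cases_control / n_control
    if attack_control == 0:
        raise ValueError("no cases in the control group: efficacy is undefined")
    return 1 - attack_vaccinated / attack_control


# Hypothetical counts: 10 cases among 10,000 vaccinees vs. 50 among 10,000 controls.
print(f"{vaccine_efficacy(10, 10_000, 50, 10_000):.1%}")  # 80.0%
```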

ORF3

ORF3 is the smallest ORF in the HEV genome. It is translated from a subgenomic RNA and overlaps ORF2 by approximately 300 nt in a different reading frame, producing a protein of 113 to 115 aa. This overlapping region (nt 5145–5475) has been identified as the most conserved region among HEV strains [Nan and Zhang, 2016]. The ORF3 product, a phosphoprotein also known as VP13, has a molecular weight of 13 kDa [Holla et al., 2013].

VP13 contains two major hydrophobic domains in its N-terminus, D1 (aa 7–23) and D2 (aa 28–53), and two proline-rich regions in its C-terminus, P1 (aa 66–77) and P2 (aa 95–111) [Holla et al., 2013]. The D1 domain is rich in cysteine and is necessary for the association of VP13 with the cytoskeleton [Holla et al., 2013].

VP13 has been shown to interact with microtubules through both hydrophobic N-terminal domains via electrostatic interactions, behaving like a microtubule-associated protein. This interaction possibly inhibits the release of cytochrome c, thus protecting the cell from apoptosis and favoring successful HEV infection. Moreover, VP13 has been suggested to be transported to the microtubule-organizing center through its association with dynein [Kannan et al., 2009].

The P1 domain contains a PMSP motif in which residue S71 can be phosphorylated by extracellular signal-regulated kinase (ERK), a member of the mitogen-activated protein kinase (MAPK) family [Nan and Zhang, 2016]. However, VP13 phosphorylation appears not to be necessary for HEV replication and infection in cultured cells or in rhesus monkeys [Graff et al., 2005].

Moreover, VP13 binds to the linker region of MAPK phosphatase 3 (MKP-3) and blocks the conformational change needed for its activity; inhibition of this phosphatase in turn leads to ERK activation [Kar-Roy et al., 2004]. Activation of ERK also reduces pSTAT3 levels [Chandra et al., 2008a], thus promoting cell proliferation.

The P2 domain has one PSAP motif (aa 95–98) that is conserved in all HEV strains, while HEV3 possesses an additional PSAP motif at aa 86–89. The PSAP motif at aa 95–98 has been shown to be a functional domain for virion release. The PXXP motifs in the P2 domain can bind the SRC homology 3 (SH3) domains of many other proteins [Nagashima et al., 2011b], which suggests that the ORF3 protein can modulate the cellular environment for infection [Holla et al., 2013]. On binding SH3, P2 adopts a type II polyproline helix with three residues per turn [Cohen et al., 1995; Pawson, 1995], and this complex is stabilized by a salt bridge between the terminal arginine of P2 and a conserved acidic residue in SH3 [Korkaya et al., 2001].
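
Short linear motifs such as PSAP and PXXP (X = any residue) are conventionally written as patterns and can be located by a simple regular-expression scan. The Python sketch below is an illustration under stated assumptions: the motif dictionary and function are hypothetical helpers, a lookahead is used so that overlapping matches are reported, and the toy sequence is not VP13. Note that every PSAP is itself a PXXP motif.

```python
import re

# Motifs from the text as regular expressions: "X" (any residue) becomes ".".
# The lookahead (?=(...)) reports overlapping occurrences, e.g. in PXXPXXP.
MOTIFS = {
    "PSAP": r"(?=(PSAP))",
    "PXXP": r"(?=(P..P))",
}


def find_motifs(sequence: str) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based inclusive (start, end) coordinates of each motif occurrence."""
    hits: dict[str, list[tuple[int, int]]] = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)))
            for m in re.finditer(pattern, sequence.upper())
        ]
    return hits


print(find_motifs("AAPSAPGPAAP"))  # toy sequence, not VP13
```

On this toy sequence, PSAP is reported at residues 3-6 and PXXP at 3-6 and 8-11. A pattern match only shows that the sequence is present; whether a motif is functional (for example, in Tsg101 or SH3 binding) requires experimental evidence such as that cited above.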

VP13 interacts with CIN85 (a protein involved in the downregulation of receptor tyrosine kinases) and delays the internalization of activated growth factor receptors [Chandra et al., 2008a].

Another protein that has been shown to bind the D2 domain of VP13 is human hemopexin, which is an acute-phase protein involved in heme transport in plasma and protection of hemoglobin against oxidative damage [Ratra et al., 2008].

Cells expressing ORF3 also exhibit high levels of hexokinase and of oligomeric forms of the voltage-dependent anion channel, resulting in downregulation of the mitochondrial death signaling pathway [Moin et al., 2007]. Together, these effects result in a reduced inflammatory response in the liver that facilitates HEV infection [Holla et al., 2013].

The ORF3 protein has also been reported to interact with α1-microglobulin/bikunin precursor (AMBP) and its corresponding product α1m (an immunosuppressive molecule) and to promote α1m secretion. This secretion is mediated by tumor susceptibility gene 101 (Tsg101), a central component of the endosomal sorting complex required for transport (ESCRT). VP13 binds to Tsg101 through the conserved PSAP motif in the P2 domain, and the increased secretion of this immunosuppressive molecule creates a protective state for the infected hepatocyte [Surjit et al., 2006].

ESCRT is involved in the budding of several enveloped viruses, and studies have shown that HEV forms membrane-associated particles in the cytoplasm, possibly through interaction with the ESCRT machinery, in a process that depends on Vps4, a class E vacuolar protein sorting ATPase [Nagashima et al., 2011a]. The multivesicular body pathway is subsequently required to release the "quasi-enveloped" viral particles [Nagashima et al., 2014]. VP13 has been reported to be bound to the surface of "quasi-enveloped" HEV virions in patients' blood and in cell culture, but not in feces [Takahashi et al., 2008]. Thus, the PSAP motif acts as a functional domain for HEV egress [Nagashima et al., 2011b]. VP13 phosphorylated at S80 has also been reported to interact with non-glycosylated ORF2 [Tyagi et al., 2002].

Interestingly, VP13 has been shown to be an ion channel viroporin (similar to class IA viroporins) that is required for the release of infectious viral particles [Ding et al., 2017].

VP13 stabilizes the highly unstable α subunit of hypoxia-inducible factor (HIF-α) by activating the PI3K/Akt signaling pathway, leading to HIF-α accumulation and recruitment of phosphorylated p300/CBP, and thus to transcriptional activation of genes encoding glycolytic enzymes. This action may regulate energy homeostasis to create a favorable environment in HEV-infected cells [Moin et al., 2009].

Notably, VP13 has been observed to regulate several liver-specific proteins by inducing phosphorylation of hepatocyte nuclear factor 4 (HNF4), which reduces its translocation to the nucleus and thus its activity as a transcription factor; this could also contribute to the HEV infection state [Chandra et al., 2011].

In vitro studies have also demonstrated that, in HeLa cells, VP13 enhances IFN expression induced by poly I:C, a synthetic analog of double-stranded RNA (dsRNA), by increasing RIG-I expression. VP13 interacts with the N-terminal domain of RIG-I and promotes its ubiquitination, which is necessary for RIG-I activation. Of note, only VP13 from HEV1 and HEV3 enhances RIG-I signaling, whereas VP13 from HEV2 and HEV4 does not. Since VP13 is required for HEV infection in vivo, it has been suggested that this enhancement may be involved in HEV invasion [Nan et al., 2014a].

Moreover, the P2 domain of VP13 represses the NF-κB pathway downstream of Toll-like receptor 3 (TLR3, a sensor of dsRNA) by promoting degradation of tumor necrosis factor receptor type 1-associated death domain protein (TRADD) and decreasing K63-linked ubiquitination of receptor-interacting serine/threonine-protein kinase 1 (RIP1) in poly I:C-stimulated A549 cells. This effect reduces the inflammatory response and therefore likely promotes cell survival [He et al., 2016].

On the other hand, VP13 has been reported to inhibit IFN-α signaling by blocking STAT1 phosphorylation, and to downregulate some IFN-α-stimulated genes in A549 cells [Dong et al., 2012].

Additional in vitro interaction studies revealed that VP13 can associate with hepsin, a type II transmembrane serine protease associated with cancer progression [Wang et al., 2014], and with the fibrinogen Bβ chain (FBG), which is involved in the inflammatory response [Ratra et al., 2009]. Furthermore, VP13 has been found to interact with 32 proteins, mostly related to blood coagulation and homeostasis, suggesting that this viral protein may alter coagulation and fibrinolysis [Geng et al., 2013]. Consistent with this, elevated transaminase levels in patients with hepatitis E have been associated with coagulopathies and severe disease [Ibrahim et al., 2009].

VP13 can also activate the MAPK-JNK1/2 pathway in infected hepatocytes ex vivo, which has been suggested to induce pro-survival cell signaling, thus allowing chronic HEV infection [Parvez and Al-Dosari, 2015].

In addition, VP13 has been found to be palmitoylated in a cysteine-rich part of the N-terminal region. This palmitoylation is critical for VP13 membrane association and subcellular localization, and it is possibly involved in stabilization of the viral protein. These cysteine residues have also been shown to be necessary for the secretion of infectious virions, indicating that posttranslational modifications mediated by the host cell play a key functional role for HEV [Gouttenoire et al., 2018].

ORF4

ORF4 (nt 2835–3308), which is present exclusively in HEV1, is encoded in an alternative reading frame and is synthesized only under ER stress conditions. It is a short-lived protein whose amino acid sequence is generally conserved among HEV1 strains. ORF4 translation depends on an IRES-like element at nt 2701–2787. The ORF4 product is indispensable for HEV1 replication: it interacts with multiple viral proteins to assemble a viral replication complex of RdRp, Hel, and X, and it promotes RdRp activity by interacting with host eEF1α1 and tubulin β [Nair et al., 2016].

Interestingly, ORF4 possesses a proteasomal degradation signal and has been shown to be degraded by the host proteasome, which might be an antiviral strategy to restrict virus spread. This putative ubiquitination site lies in a region containing residue K51, which is flanked by two P residues. Sequence analysis of HEV from infected patients showed that K51 was conserved in most of the HEV1 isolates analyzed, whereas in some strains the ubiquitination site was lost owing to a P50L substitution, suggesting that HEV in those patients produced a proteasome-resistant ORF4.
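
The K51 site described above can be checked position by position in aligned isolate sequences. The following is a minimal Python sketch, assuming that the two flanking prolines are P50 and P52 (consistent with the P50-to-L50 change described) and that residues are numbered from the first ORF4 residue. The function is hypothetical, tests only this local P-K-P pattern, and does not predict proteasomal degradation.

```python
def classify_k51_region(orf4: str) -> str:
    """Classify residues 50-52 (1-based) of an ORF4 protein sequence.

    Assumes numbering starts at the first ORF4 residue. This is a local
    pattern check only, not a predictor of proteasomal degradation.
    """
    if len(orf4) < 52:
        raise ValueError("sequence is shorter than 52 residues")
    # 1-based residues 50, 51, 52 are 0-based indices 49, 50, 51.
    p50, k51, p52 = orf4[49], orf4[50], orf4[51]
    if k51 != "K":
        return "K51 not conserved"
    if p50 == "P" and p52 == "P":
        return "P-K-P site intact"
    if p50 == "L":
        return "P50L: site predicted lost"
    return "K51 present, flanking residues altered"


# Toy sequences (not real isolates): filler residues around positions 50-52.
print(classify_k51_region("A" * 49 + "PKP" + "A" * 10))  # P-K-P site intact
print(classify_k51_region("A" * 49 + "LKP" + "A" * 10))  # P50L: site predicted lost
```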

Because ORF4 is produced only under ER stress and is degraded by the proteasome, HEV1 replication in cell culture is very inefficient, except in cell lines stably expressing ORF4 or with viral mutants encoding a proteasome-resistant ORF4 [Nair et al., 2016].

In silico sequence and structure analysis has shown that ORF4 contains an intrinsically disordered region (IDR) enriched in typically disorder-promoting residues (R, P, and S) and neutral residues (A, G, and T). Moreover, a high abundance of structure-breaking residues (G and P) further supports this prediction of disorder [Shafat et al., 2021].
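
The compositional part of such an analysis reduces to counting residues in the classes named above. The Python sketch below computes the fraction of a sequence made up of each class; the class sets follow the text, but the function is a hypothetical helper, and composition alone is only a crude proxy, since dedicated disorder predictors also take sequence context into account.

```python
# Residue classes as listed in the text.
DISORDER_PROMOTING = frozenset("RPS")
NEUTRAL = frozenset("AGT")
STRUCTURE_BREAKING = frozenset("GP")


def class_fraction(sequence: str, residue_class: frozenset[str]) -> float:
    """Fraction of positions in `sequence` occupied by residues in `residue_class`."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(aa in residue_class for aa in sequence.upper()) / len(sequence)


toy = "RPSAGTLL"  # toy peptide, not ORF4
for name, residue_class in [("disorder-promoting", DISORDER_PROMOTING),
                            ("neutral", NEUTRAL),
                            ("structure-breaking", STRUCTURE_BREAKING)]:
    print(f"{name}: {class_fraction(toy, residue_class):.3f}")
```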

Recently, analysis of ORF4 codon usage patterns showed that C was overrepresented, while A was the least represented nucleotide. The preferred codons mostly ended in C or G, which might be useful information for achieving efficient expression of the ORF4 protein [Shafat et al., 2022].
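
Codon preference is commonly quantified as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so that RSCU > 1 marks a preferred codon. We assume, without confirming it from the cited study, that a measure of this kind underlies the result above. The Python sketch below uses the standard genetic code; the function names are hypothetical, and the toy coding sequence is not ORF4.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for each codon observed in an in-frame CDS.

    RSCU = count(codon) * (number of synonymous codons) / count(all synonymous codons),
    so a value above 1 means the codon is used more often than under uniform usage.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    # Group sense codons by the amino acid they encode; stop codons are ignored.
    synonyms: dict[str, list[str]] = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms[aa].append(codon)

    values: dict[str, float] = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the CDS: RSCU undefined
        for c in codons:
            if counts[c]:
                values[c] = counts[c] * len(codons) / total
    return values


def preferred_codons(cds: str) -> list[str]:
    """Codons with RSCU > 1, sorted alphabetically."""
    return sorted(codon for codon, value in rscu(cds).items() if value > 1)


print(preferred_codons("ATGGCCGCCGCAGCC"))  # ['GCC'] (toy CDS, not ORF4)
```

Overall nucleotide composition, the other quantity mentioned, can be obtained directly with `collections.Counter(cds)`.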

Conclusions and perspectives

Although important breakthroughs have been achieved in the last few years in deciphering HEV protein structure and function, many crucial aspects involving functional domains, host-cell interactions, pathogenesis, and interactions with antiviral drugs remain to be elucidated, which limits our understanding of HEV biology. Some important open issues regarding HEV proteins are summarized in Table 1.

Table 1 Important questions about different aspects of the HEV ORF1, ORF2, ORF3, and ORF4 proteins that remain to be elucidated

A particularly interesting question that remains to be addressed is the involvement of the ORF3 protein in the formation of "quasi-enveloped" particles and virion release. Determining the molecular mechanisms of this process might help explain why the two HEV particle types (naked and "quasi-enveloped") seem to bind different cellular receptors. It would also be relevant to study whether this difference in cellular receptor use influences tissue tropism, given that several extrahepatic manifestations have been reported.

Another relevant aspect that needs to be studied is the structure and function of ORF2s, which could help explain its role in immune evasion and infection, particularly its possible immunomodulatory function in HEV persistence. During HEV infection, ORF2s has been suggested to act as a decoy against humoral immunity, while the "quasi-enveloped" particles in the bloodstream are themselves insensitive to neutralizing antibodies.

From the outset, in vitro isolation of HEV has posed a challenge, and the lack of an efficient, standardized culture system has hampered characterization of the virus. However, several in vitro and in vivo models, including novel human liver chimeric mice, have been reported to support HEV replication (especially of certain adapted strains) from HEV replicons, recombinant proteins, or fully infectious particles [Fu et al., 2019; Sayed et al., 2019]. Recently, a human-liver-derived 3D organoid system was reported to be highly permissive for HEV infection [Li et al., 2022], representing a promising platform for future research on HEV cellular receptors and antiviral drug development.

To fill the knowledge gaps about HEV proteins, more studies using efficient cell-culture and animal models are still needed. Recent methods such as CRISPR/Cas9 and ribosome profiling (Ribo-Seq) could also deepen our knowledge of HEV molecular biology. CRISPR/Cas9 is a powerful and robust gene-editing tool that could help identify the host factors acting as cellular receptors for the naked and "quasi-enveloped" particles. Ribo-Seq, developed by Ingolia et al. [Ingolia et al., 2009], has not yet been employed in HEV research. By high-throughput sequencing of ribosome-protected mRNA fragments, it allows the viral elements being actively translated in infected cells to be identified and characterized, and the translation efficiency of expressed genes to be calculated [Stern-Ginossar, 2015]. So far, ribosome profiles have been reported for only a few viruses, including SARS-CoV-2 [Finkel et al., 2020]. For HEV, this method would allow mapping of the HEV translatome, quantification of the expression of the canonical ORFs, identification of possible unannotated ORFs, and investigation of virus-cell interactions.
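
Translation efficiency in Ribo-Seq studies is commonly computed as the ratio of ribosome-footprint density to mRNA density for the same ORF, each normalized for ORF length and sequencing depth. The following is a minimal Python sketch, assuming RPKM normalization (one of several options; many pipelines use other normalizations or statistical models) and hypothetical read counts; the function names are illustrative only.

```python
def rpkm(reads: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of ORF per million mapped reads."""
    return reads / (length_nt / 1_000) / (total_mapped_reads / 1_000_000)


def translation_efficiency(footprint_reads: int, mrna_reads: int, length_nt: int,
                           total_footprint_reads: int, total_mrna_reads: int) -> float:
    """Ribosome-footprint density divided by mRNA density for a single ORF."""
    footprint_density = rpkm(footprint_reads, length_nt, total_footprint_reads)
    mrna_density = rpkm(mrna_reads, length_nt, total_mrna_reads)
    if mrna_density == 0:
        raise ValueError("no mRNA reads: translation efficiency is undefined")
    return footprint_density / mrna_density


# Hypothetical counts for one 2-kb viral ORF in libraries of 1 million reads each.
print(translation_efficiency(500, 250, 2_000, 1_000_000, 1_000_000))  # 2.0
```

Because the same ORF length appears in numerator and denominator, it cancels for a single ORF; length normalization matters when footprint or mRNA densities are compared across ORFs, for example among the canonical HEV ORFs.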

Finally, structural data obtained by NMR, X-ray crystallography, or in silico 3D modeling are still needed to determine the structural features of the remaining HEV proteins.