Background

Hepatitis E virus (HEV) is a major zoonotic pathogen that causes acute hepatitis E worldwide. HEV is a single-stranded RNA virus belonging to the family Hepeviridae [1]. According to recent data, about 15–110 million individuals worldwide are currently infected, and about 939 million people have experienced past HEV infection [2]. In India, reporting of hepatitis infections to the Central Bureau of Health Intelligence (CBHI) is exceptionally low because most people believe that the disease cannot be cured by allopathic medicine; as a result, the burden of HEV infection is inaccurately estimated. Nevertheless, HEV has been reported to account for 10–40% of acute hepatitis cases and 15–45% of acute liver failure cases in India [3, 4]. HEV strains of the species Orthohepevirus A (genus Orthohepevirus) are classified into eight genotypes (genotypes 1–8). Genotypes 1 and 2 infect only humans and are major causes of waterborne outbreaks in endemic regions of Africa, South and Southeast Asia, and Mexico, whereas genotypes 3 and 4 infect diverse mammals, such as humans, rabbits, swine, and deer, and cause sporadic hepatitis E cases, mainly in developed countries of East Asia and Europe [5–9]. Genotypes 5 and 6 have been identified in wild boars in Japan [10, 11], genotype 7 in dromedaries in Middle Eastern countries, and genotype 8 in Bactrian camels in China [12, 13]. In developed nations, consumption of raw or undercooked meat products from animals is the chief cause of sporadic infections [14]. Moreover, blood-borne [15], person-to-person [16], and animal (pet)-to-human [17, 18] transmission have been reported, making HEV a major public health concern.
Diagnostic tests for HEV mostly target anti-HEV IgG and anti-HEV IgM antibodies. In diagnosis, HEV RNA and anti-HEV IgM antibodies are detected first, followed by anti-HEV IgG antibodies. The presence of anti-HEV IgM antibodies in serum is considered an important marker of acute HEV infection. Anti-HEV IgG antibodies persist for a long (though uncertain) duration and are therefore considered markers of past infection [19, 20]. In contrast, anti-HEV IgM antibodies persist only for a short time (3 to 4 months) and are therefore considered markers of recent infection [19, 20].

The HEV genome comprises three open reading frames (ORFs); ORF1, ORF2, and ORF3 encode the nonstructural polyprotein, the capsid protein, and a regulatory protein, respectively [21]. Although ORF2 was first identified as the capsid protein [22], it was later shown to take part in multiple other crucial HEV functions [14, 23–27].

Intrinsically disordered regions (IDRs), a term covering intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs), constitute part of the so-called "dark proteome", the fraction of a proteome with no noticeable similarity to any PDB structure. IDRs lack a unique structure because they do not fold into stable three-dimensional structures in viral proteomes [28]. Owing to this lack of a defined 3D structure, IDRs can perform specific functions [29–31]. Intrinsic disorder in IDRs (IDPs or IDPRs) has also been implicated in various human diseases, such as cancer [32]. Interestingly, most viral proteins possess MoRFs (molecular recognition features), i.e., short regions within IDRs that undergo a disorder-to-order transition upon binding to interacting partners [33]. These unique characteristics help proteins interact with diverse biological partners and are therefore important for the regulation of multiple cellular pathways through protein–protein interaction networks [34]. Moreover, disordered regions in proteins are potential drug targets because of their association with important biological processes [35, 36]. Because IDPs/IDPRs are indispensable for several crucial functions in viruses, they have been analyzed using computational approaches [29–31]. Recent studies have revealed that, besides its capsid function, ORF2 plays crucial roles in other processes, such as viral replication, regulation of the immune response, cellular signalling, host tropism, and HEV pathogenesis. Moreover, ORF2 has been suggested as a potential basis for the development of a vaccine against HEV [27].

In this study, we therefore analyzed the intrinsic disorder of HEV ORF2 proteins with a set of bioinformatics predictors and evaluated its functional significance. The disorder analysis predicted ORF2 to be either a moderately or a highly disordered protein, i.e., an IDPR or IDP variant. The presence of MoRF regions in ORF2 sequences indicated a propensity to bind interacting partners. We also carried out a structure-based analysis of ORF2 (using sequences from different hosts and genotypes) to infer its molecular functions. The predicted ion binding, protein binding, and metal binding sites, together with diverse biological processes such as viral replication and RNA biosynthesis, suggest that ORF2 is important for interaction with the host cell membrane. Our study may shed new light on ORF2 functions beyond its role as a viral capsid protein.

Methods

Sequence retrieval

The HEV ORF2 protein sequences were obtained from GenBank at the NCBI (National Center for Biotechnology Information). The analysis included sequences representing different genotypes (GTs) and hosts; they are listed in Table 1.

Table 1 Secondary structure and disorder prediction by Phyre2

Three-dimensional (3D) structure

The secondary structure elements and 3D tertiary structures of the HEV ORF2 proteins were predicted using the Phyre2 (Protein Homology/AnalogY Recognition Engine) protein fold recognition server and the I-TASSER (Iterative Threading ASSEmbly Refinement) webserver, respectively.

Disorder plot evaluation

The disorder propensity of the HEV ORF2 proteins was examined with the online PONDR (Predictor of Natural Disordered Regions) tools using default parameters. The PONDR predictors PONDR-VSL2 [37], PONDR-VL3 [38], and PONDR-VL-XT [39], together with DISOPRED3 (PSIPRED Workbench), were used to identify the disordered regions in the ORF2 proteins.
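Each predictor assigns a per-residue disorder score, and a figure such as "43.25% by VLXT" is the share of residues scored above the cutoff. The sketch below assumes the scores are already available as a list of floats (parsing a predictor's output is not shown, and that input format is an assumption), uses the 0.5 threshold drawn in the disorder plots, and also reports the longest continuous disordered stretch, which is what flags a disordered domain. The function name `disorder_summary` is hypothetical.

```python
def disorder_summary(scores, threshold=0.5):
    """Summarize per-residue disorder scores for one sequence.

    Returns (percent of residues scored above threshold, length of the
    longest run of consecutive above-threshold residues).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Residues strictly above the threshold are counted as disordered.
    disordered = [s > threshold for s in scores]
    longest = current = 0
    for is_disordered in disordered:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    percent = 100.0 * sum(disordered) / len(scores)
    return percent, longest


# Toy scores for a 10-residue peptide (illustrative, not real predictor output).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.55, 0.4, 0.3, 0.2]
print(disorder_summary(toy_scores))  # (50.0, 3)
```

Applied to a full ORF2 score profile, the first value corresponds to a predictor's percentage of disordered residues and the second to the length of its longest disordered domain.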

MoRFs (molecular recognition features) prediction

Disorder-based binding residues within the ORF2 proteins were identified using three online predictors: DISOPRED3 (PSIPRED Workbench), IUPred3, and IUPred2A. A cutoff score of 0.5 was used for all predictors (residues scoring ≥ 0.5 were considered binding residues).
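Turning per-residue binding scores into MoRF regions amounts to grouping consecutive residues that meet the ≥ 0.5 cutoff. A minimal sketch, assuming the scores are given as a Python list; `score_segments` and its `min_length` filter are hypothetical illustrations, not options of DISOPRED3, IUPred3, or IUPred2A.

```python
def score_segments(scores, cutoff=0.5, min_length=1):
    """Return 1-based inclusive (start, end) spans where score >= cutoff.

    Spans shorter than min_length residues are discarded.
    """
    segments = []
    start = None
    for i, s in enumerate(scores, start=1):
        if s >= cutoff:
            if start is None:
                start = i  # a new candidate segment begins here
        elif start is not None:
            if i - start >= min_length:  # segment covers start..i-1
                segments.append((start, i - 1))
            start = None
    # Close a segment that runs to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


toy = [0.2, 0.6, 0.7, 0.3, 0.5, 0.9, 0.8, 0.1]  # illustrative scores
print(score_segments(toy))                # [(2, 3), (5, 7)]
print(score_segments(toy, min_length=3))  # [(5, 7)]
```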

Phosphorylation pattern evaluation

The online predictor DEPP (disorder enhanced phosphorylation prediction; http://www.pondr.com/cgi-bin/depp.cgi) was used to predict phosphorylated residues (Ser, Thr, and Tyr) in the HEV ORF2 sequences.

Structure-based function prediction

The COFACTOR algorithm [40, 41] was used to predict the probable molecular functions and biological processes of the HEV ORF2 proteins from their 3D structural models.

Results

The HEV genome-encoded ORF2 starts at nucleotide 5147 and terminates at nucleotide 7127. A schematic of the HEV genome (with reference to the Sar55 strain, GenBank accession number AF444002) is shown in Fig. 1 [42].

Fig. 1

Schematic representation of the HEV genome. The genome is organized into three open reading frames (ORFs), i.e., ORF1, ORF2, and ORF3. Nucleotide numbers refer to strain Sar55 (GenBank accession number AF444002).

3D structures with predicted disorder

The 3D structures of the HEV ORF2 proteins modeled with I-TASSER (homology modeling) consisted of α-helices, β-strands, and coils (Fig. 2A–H). Coils predominated over both helices and strands. The percentages of helix, strand, and disorder predicted by Phyre2 are given in Table 1.
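Phyre2 reports these percentages directly; the sketch below only illustrates how a helix/strand/coil composition is derived from a per-residue secondary structure string. It assumes DSSP-style codes (as defined by Kabsch and Sander) and one common reduction to three states (H, G, I as helix; E, B as strand; everything else as coil). Other reduction schemes exist, and `ss_composition` is a hypothetical name.

```python
from collections import Counter

# Assumed three-state reduction of DSSP codes; unlisted codes map to coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def ss_composition(dssp):
    """Percent helix (H), strand (E), and coil (C) for a DSSP-style string."""
    if not dssp:
        raise ValueError("empty secondary-structure string")
    reduced = [DSSP_TO_3.get(code, "C") for code in dssp]
    counts = Counter(reduced)
    n = len(reduced)
    return {state: 100.0 * counts[state] / n for state in "HEC"}


print(ss_composition("HHHGEEBT--"))  # {'H': 40.0, 'E': 30.0, 'C': 30.0}
```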

Fig. 2

Generated 3D models of HEV ORF2. A JF443720 (GT 1); B M74506 (GT 2); C AB222182 (GT 3); D GU119961 (GT 4); E AB573435 (GT 5); F AB602441 (GT 6); G KJ496143 (GT 7); and H KX387865 (GT 8). The prediction was carried out using I-TASSER.

These results indicated that the HEV ORF2 proteins contain a substantial fraction of disordered regions, which prompted us to analyze their intrinsic disorder content computationally.

Intrinsic disorder distribution

The intrinsic disorder of the HEV ORF2 proteins was evaluated with different disorder predictors (Table 2). The disorder profiles of the ORF2 proteins are shown in Fig. 3A–H. Proteins are classified as structured, moderately disordered, or highly disordered according to their overall predicted intrinsic disorder fraction [43]. In addition, the disorder variants ORDP (ordered protein), IDPR (intrinsically disordered protein region), and IDP (intrinsically disordered protein) are defined according to the length of disordered domains and the overall fraction of disordered residues [44, 45].
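The first classification depends only on the overall predicted percentage of intrinsic disorder (PPID). A sketch of the rule: the 30% boundary between moderately and highly disordered matches the per-sequence results in this section, while the 10% boundary for "structured" and the treatment of exactly 30% as highly disordered are assumptions taken from common usage, not stated in the text. The function name is hypothetical.

```python
def disorder_category(ppid):
    """Map a PPID (in %) to a structural category.

    Assumed cutoffs: < 10% structured, 10-30% moderately disordered,
    >= 30% highly disordered.
    """
    if not 0.0 <= ppid <= 100.0:
        raise ValueError("PPID must be a percentage between 0 and 100")
    if ppid < 10.0:
        return "structured"
    if ppid < 30.0:
        return "moderately disordered"
    return "highly disordered"


# Per-predictor PPIDs reported for JF443720 (GT 1).
for predictor, ppid in [("VLXT", 43.25), ("VSL2", 35.66),
                        ("VL3", 28.53), ("DISOPRED3", 20.33)]:
    print(predictor, disorder_category(ppid))
```

With these inputs, VLXT and VSL2 give "highly disordered" and VL3 and DISOPRED3 give "moderately disordered", mirroring the split between predictors described for JF443720.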

Table 2 The predicted percentage of intrinsic disorder scores of the ORF2 protein in hepatitis E viruses

ORF2 protein (JF443720)

The disorder distribution analysis of the ORF2 polypeptide sequence (JF443720) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 43.25% and 35.66%, respectively) and as a moderately disordered protein according to VL3 (<30%: 28.53%). In addition, a long N-terminal disordered domain (59 to 132 consecutive residues) placed it in the IDP category (a protein with a significant fraction of disordered regions) according to VLXT and VSL2, or in the IDPR category (a structured protein with intrinsically disordered segments) according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 20.33% (<30%) disordered residues and a long continuous disordered stretch of about 45 residues.

ORF2 protein (M74506)

The disorder distribution analysis of the ORF2 polypeptide sequence (M74506) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 43.55% and 36.57%, respectively) and as a moderately disordered protein according to VL3 (<30%: 25.19%). In addition, a long N-terminal disordered domain (60 to 150 consecutive residues) placed it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 20.48% (<30%) disordered residues and a long continuous disordered stretch of about 48 residues.

ORF2 protein (AB222182)

The disorder distribution analysis of the ORF2 polypeptide sequence (AB222182) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 40.30% and 37.12%, respectively) and as a moderately disordered protein according to VL3 (<30%: 28.48%). In addition, a long N-terminal disordered domain (57 to 91 consecutive residues) placed it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 19.39% (<30%) disordered residues and a long continuous disordered stretch of about 48 residues.

ORF2 protein (GU119961)

The disorder distribution analysis of the ORF2 polypeptide sequence (GU119961) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 40.06% and 33.23%, respectively) and as a moderately disordered protein according to VL3 (<30%: 24.04%). A long N-terminal disordered domain (60 to 131 consecutive residues) was also observed, placing it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 21.22% (<30%) disordered residues and a long continuous disordered stretch of about 53 residues.

ORF2 protein (AB573435)

The disorder distribution analysis of the ORF2 polypeptide sequence (AB573435) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 39.02% and 32.64%, respectively). In addition, a long N-terminal disordered domain (56 to 130 consecutive residues) placed it in the IDP category according to VLXT and VSL2. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 21.81% (<30%) disordered residues and a long continuous disordered stretch of about 55 residues.

ORF2 protein (AB602441)

The disorder distribution analysis of the ORF2 polypeptide sequence (AB602441) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 41.97% and 38.18%, respectively) and as a moderately disordered protein according to VL3 (<30%: 26.97%). In addition, a long N-terminal disordered domain (57 to 147 consecutive residues) placed it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 18.93% (<30%) disordered residues and a long continuous disordered stretch of about 43 residues.

ORF2 protein (KJ496143)

The disorder distribution analysis of the ORF2 polypeptide sequence (KJ496143) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 41.52% and 36.82%, respectively) and as a moderately disordered protein according to VL3 (<30%: 26.21%). In addition, a long N-terminal disordered domain (90 to 108 consecutive residues) placed it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 18.78% (<30%) disordered residues and a long continuous disordered stretch of about 46 residues.

ORF2 protein (KX387865)

The disorder distribution analysis of the ORF2 polypeptide sequence (KX387865) classified it as a highly disordered protein according to VLXT and VSL2 (>30% disordered residues: 41.06% and 34.70%, respectively) and as a moderately disordered protein according to VL3 (<30%: 27.19%). In addition, a long N-terminal disordered domain (64 to 148 consecutive residues) placed it in the IDP category according to VLXT and VSL2, or in the IDPR category according to VL3. DISOPRED3 predicted ORF2 to be a moderately disordered protein (IDPR), with 19.39% (<30%) disordered residues and a long continuous disordered stretch of about 47 residues.

Combining the results from the four predictors (VLXT, VL3, VSL2, and DISOPRED3), we inferred that the HEV ORF2 proteins exhibit significant intrinsic disorder.

Categorizing disorder variants on the basis of predicted disorder percentage

Next, we combined the results of the disorder predictors for the HEV ORF2 protein sequences to assign each to a disorder variant category, i.e., ORDP, IDP, or IDPR. The mean PPID (predicted percentage of intrinsic disorder) was calculated by summing the percentage disorder scores from the four predictors (VLXT, VL3, VSL2, and DISOPRED3) and dividing by 4. The mean PPID scores of the ORF2 proteins are given in Table 3.
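The averaging step is simple arithmetic. For JF443720, the four values reported for that sequence (VLXT 43.25%, VL3 28.53%, VSL2 35.66%, DISOPRED3 20.33%) sum to 127.77, giving a mean PPID of about 31.94%, just above the 30% boundary. A minimal sketch of the calculation; the function name and the dictionary layout are illustrative assumptions.

```python
def mean_ppid(scores):
    """Arithmetic mean of per-predictor PPID values (in %)."""
    if not scores:
        raise ValueError("no predictor scores given")
    return sum(scores.values()) / len(scores)


# PPIDs reported for JF443720 (GT 1) by the four predictors.
jf443720 = {"VLXT": 43.25, "VL3": 28.53, "VSL2": 35.66, "DISOPRED3": 20.33}
print(round(mean_ppid(jf443720), 2))  # 31.94
```

Dividing by the number of entries rather than a hard-coded 4 keeps the function correct if a predictor's value is unavailable for a sequence, although the text itself averages over four predictors.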

Table 3 Categorization of the disorder variant of the HEV ORF2 proteins

As Table 3 shows, most HEV ORF2 proteins were predicted to be highly disordered proteins (IDPs), with more than 30% disordered residues in their polypeptide chains. Two proteins (GU119961 and AB573435) were categorized as moderately disordered proteins (IDPRs), as they contained less than 30% disordered residues together with disordered domains (Table 2). However, these two sequences were borderline, with percentages close to 30%, and thus also showed IDP-like character. All ORF2 sequences from the different genotypes contained a substantial fraction of disordered residues, suggesting that ORF2 belongs to the IDP category.

Disorder-based binding site in protein

In addition to intrinsic disorder, we computationally predicted disorder-based binding sites (MoRFs) in each HEV ORF2 protein. The MoRFs predicted for each ORF2 protein by DISOPRED3, IUPred3 (ANCHOR), and IUPred2A (ANCHOR) are listed in Table 4.

Table 4 Identified MoRF regions in HEV ORF2 proteins

Analysis of phosphorylation sites

Several phosphorylation sites (P-sites) were predicted in the ORF2 sequences. The predicted phosphorylated residues (Ser, Thr, and Tyr) in the HEV ORF2 sequences and their DEPP scores are summarized in Table 5 and Fig. 4.

Table 5 Predicted number and percentage of phosphorylated residues in ORF2 of hepatitis E viruses

Ser residues accounted for a higher fraction of the predicted P-sites than Thr and Tyr (Fig. 4). Most P-sites were located within disordered ORF2 regions, suggesting a correlation between disordered regions and phosphorylation sites (Figs. 3 and 4).
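The residue-type counts and the overlap between P-sites and disordered regions can be tallied as follows. A sketch assuming three aligned inputs: the protein sequence, one DEPP-style score per residue, and a per-residue disorder mask (for example, from thresholding disorder scores at 0.5). The 0.5 phosphorylation threshold follows Fig. 4; the function name and the toy values are illustrative, not ORF2 data.

```python
def phospho_summary(sequence, depp_scores, disordered_mask, threshold=0.5):
    """Count predicted P-sites (S/T/Y scored above threshold) and the
    fraction of them that lie in disordered residues."""
    if not (len(sequence) == len(depp_scores) == len(disordered_mask)):
        raise ValueError("sequence, scores and mask must have equal length")
    counts = {"S": 0, "T": 0, "Y": 0}
    in_disorder = 0
    for aa, score, is_disordered in zip(sequence, depp_scores, disordered_mask):
        if aa in counts and score > threshold:
            counts[aa] += 1
            if is_disordered:
                in_disorder += 1
    total = sum(counts.values())
    fraction_in_disorder = in_disorder / total if total else 0.0
    return counts, total, fraction_in_disorder


seq = "MSTSYAS"
scores = [0.1, 0.8, 0.6, 0.7, 0.4, 0.9, 0.55]
mask = [True, True, True, False, False, False, True]
print(phospho_summary(seq, scores, mask))
# ({'S': 3, 'T': 1, 'Y': 0}, 4, 0.75)
```

Only Ser, Thr, and Tyr are eligible, so the high-scoring Ala in the toy input is ignored, and the Tyr falls below the threshold.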

Fig. 3

Analysis of the intrinsic disorder predisposition of HEV ORF2. A JF443720 (GT 1); B M74506 (GT 2); C AB222182 (GT 3); D GU119961 (GT 4); E AB573435 (GT 5); F AB602441 (GT 6); G KJ496143 (GT 7); and H KX387865 (GT 8). Graphs A–H show the intrinsic disorder profiles of the HEV ORF2 sequences. Disorder probability was calculated using three members of the PONDR (Predictor of Natural Disordered Regions) family, i.e., VLXT, VL3, and VSL2. A threshold value of 0.5 (line) was set to distinguish between ordered and disordered regions along the sequence. Regions above the threshold are predicted to be disordered.

Fig. 4

Prediction of phosphorylation sites showing the scores of phosphorylated residues (Ser, Thr, Tyr) within ORF2. A JF443720 (GT 1); B M74506 (GT 2); C AB222182 (GT 3); D GU119961 (GT 4); E AB573435 (GT 5); F AB602441 (GT 6); G KJ496143 (GT 7); and H KX387865 (GT 8). Graphs A–H show the phosphorylation patterns of the HEV ORF2 sequences. Scores were computed using DEPP (disorder enhanced phosphorylation prediction). A threshold value of 0.5 (line) was used; predicted phosphorylated residues above the threshold are shown as Ser (S): blue, Thr (T): green, and Tyr (Y): red.

Prediction of consensus GO terms

The molecular functions and biological processes predicted by COFACTOR from the 3D structural models are summarized in Table 6.

Table 6 Predicted consensus GO terms for homology modeled ORF2 structures

The predicted molecular functions included structural molecule activity, ion binding, metal ion binding, transition metal ion binding, iron-sulfur cluster binding, and oxidoreductase activity. These gene ontology findings suggest that ORF2 has both binding and catalytic activities. Binding functions such as metal cluster binding (GO:0051540), protein binding (GO:004280), transition metal ion binding (GO:0046914), and iron-sulfur cluster binding (GO:0051536) indicate a propensity of ORF2 to bind a variety of molecules (ions, metals, and proteins). Furthermore, the predicted biological processes, such as electron transport chain (GO:0022900), DNA replication (GO:0006260), and regulation of cell differentiation (GO:0045595), together with oxidoreductase activity (GO:0016638), suggest possible mitochondria-related functions as well as other significant processes attributable to ORF2 (Table 6).

Discussion

The three ORFs (ORF1, ORF2, and ORF3) constitute the HEV genome [21]. ORF2 encodes a 660-amino-acid polypeptide [46] that forms the viral capsid [22, 23]. Intrinsic disorder in diverse viral proteins has been predicted with different bioinformatics tools [47–49]. Disordered segments (IDRs) in viral proteins perform indispensable functions: they help viruses adapt to hostile environments, assist in the management of viral processes, and facilitate invasion of host cell pathways [50, 51]. IDPs are frequently associated with disease progression and are considered druggable targets [35, 36, 52, 53]. The ORF2 protein performs various regulatory roles in addition to its roles in viral replication and pathogenesis [54], and its application in vaccine development has recently been documented [54]. ORF2 is therefore an attractive target for developing treatments against HEV. Some of our recent investigations have shown varied levels of disorder in different HEV ORF-encoded proteins [55–60]; however, despite the significance of ORF2 in the viral life cycle, its disorder has not been explored across different GTs and hosts. It is therefore essential to investigate the disorder status of ORF2 (using sequences from different hosts and genotypes) to understand its disorder-related functions. The present study used different computational tools and GenBank data to shed light on how ORF2 disorder relates to HEV functionality.

Because detailed examination of a protein's structure provides insight into its function, we scrutinized the ORF2 structures modeled with the I-TASSER webserver (a portal for protein modeling and analysis). The modeled structures comprised the major secondary structure elements (α-helices and β-strands) and coils. Using the secondary structure definitions of Kabsch and Sander [61], a previous study [62] showed that although not all coils/loops are disordered, protein disorder occurs only within loops [62]. Examination of the ORF2 homology models showed a high proportion of coils, providing initial evidence that the ORF2 proteins contain a significant percentage of IDRs within loops. The ORF2 proteins were then examined in detail with various disorder predictors. The present study used three PONDR family members, VLXT, VL3, and VSL2 [37–39], to examine the HEV ORF2 proteins. VL3 was chosen because it accurately predicts long disordered segments [63], and VLXT because of its high sensitivity [64, 65]. DISOPRED3 was used because it precisely predicts disordered segments within protein sequences [66].

A virus completes its life cycle by establishing a variety of interactions with different components of the host cell. The stages of the viral life cycle, such as attachment, entry, commandeering of the host machinery, synthesis of viral components, and assembly, up to exit from the host as new infectious particles, depend heavily on the intrinsic disorder prevalent in viral proteomes [67]. Studies have linked intrinsically disordered proteins to specific roles [68] in viruses such as hepatitis C virus (HCV) [69], measles virus (MeV) [70], and Hendra virus [71]. Nonstructural HEV ORF1 domains such as the polyproline region (PPR) [72] and the Y-domain [55], as well as other proteins [56–60], have also been linked to HEV regulation owing to their intrinsic disorder. The HEV ORF2 proteins were first classified as structured, moderately disordered, or highly disordered on the basis of their overall degree of intrinsic disorder [43]. They were then classified as ORDPs, IDPRs, or IDPs on the basis of disordered domain length and the overall fraction of disordered residues [44, 45]. These three intrinsic disorder variants are defined as follows. ORDPs contain less than 30% disordered residues and no disordered domain, whether at a terminus (N- or C-) or at an internal position. IDPRs contain less than 30% disordered residues but have a disordered domain at a terminus (N- or C-) or at an internal position. IDPs contain more than 30% disordered residues. Applying these criteria, our intrinsic disorder analysis identified ORF2 as a highly or moderately disordered protein, i.e., an IDP or IDPR.
Our results showed that the disorder at the ORF2 N-terminus arises from long continuous disordered domains. One study showed that the N-terminal arginine-rich region (residues 1–111) of ORF2 inhibits phosphorylation of IRF3 (interferon regulatory factor 3) by interacting with a multiprotein complex [73], although the exact domain that binds this complex remains to be determined. A recent study showed that arginine-rich motifs mediate the nuclear translocation of ORF2 by serving as nuclear localization signals [74]. It is therefore plausible that the intrinsically disordered regions of ORF2 perform crucial regulatory functions by interacting with other viral and host components.
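The ORDP/IDPR/IDP criteria can be expressed as a small decision rule. A sketch under two stated assumptions: a PPID of exactly 30% is treated as IDP (the text says only "more than 30%"), and a "disordered domain" is taken to be a run of at least 30 consecutive disordered residues, a convention common in the literature but not specified here. The function and parameter names are hypothetical.

```python
def disorder_variant(ppid, longest_disordered_run, min_domain_length=30):
    """Classify a protein as ORDP, IDPR, or IDP.

    ppid: overall percentage of disordered residues.
    longest_disordered_run: longest stretch of consecutive disordered
    residues (at either terminus or internal).
    min_domain_length: assumed minimum length of a disordered domain.
    """
    if ppid >= 30.0:
        return "IDP"
    if longest_disordered_run >= min_domain_length:
        return "IDPR"
    return "ORDP"


print(disorder_variant(20.33, 45))  # IDPR: DISOPRED3 values for JF443720
print(disorder_variant(43.25, 59))  # IDP
print(disorder_variant(12.0, 10))   # ORDP (toy values)
```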

Studies have also shown the importance of MoRFs in several viruses [55, 56, 75–77]. A MoRF is a short segment within a disordered region of a protein (an IDPR or IDP) that undergoes a disorder-to-order transition upon binding its partner [33]; such segments are prone to interactions [33]. Our study predicted MoRFs in ORF2 with three predictors (DISOPRED3, IUPred3, and IUPred2A). DISOPRED3 predicts protein-binding disordered regions within target sequences [66], and we used it because it performs significantly better than DISOPRED2 [66]. The IUPred servers (IUPred3 and IUPred2A) were also used to predict disordered binding regions within the ORF2 sequences [78, 79]; both allow identification of disordered protein regions (using IUPred2/IUPred3) and of disordered binding regions (using ANCHOR2) [78, 79]. The multiple MoRFs identified at the ORF2 N-terminus provide further evidence that ORF2 can interact with multiple partners through its N-terminus via its order/disorder tendency. According to some reports, ORF2 also contributes to interferon production and immune recognition [80]. Together, these findings support our prediction that the disordered N-terminus of ORF2 interacts with various partners. ORF2 has also been linked to host tropism [81, 82]. This suggests involvement of ORF2 in HEV regulation and pathogenesis, consistent with a recent report [54].

Studies have also documented the role of post-translational modifications (PTMs) in various processes, such as protein folding, signal transduction, transcriptional regulation, cell cycle progression, survival, and apoptosis [83]. Phosphorylation is also essential for establishing a productive infection cycle for most intracellular pathogens [84, 85]. In RNA viruses, such as alphaviruses [86, 87] and flaviviruses [88–90], phosphorylation has been shown to be essential for critical protein functions. We therefore evaluated the phosphorylation scores of the ORF2 sequences with the DEPP online tool. All ORF2 sequences contained predicted P-sites, and these were prevalent within the disordered segments of the polypeptide chains. This suggests a correlation between protein phosphorylation and the intrinsically disordered ORF2 regions, in agreement with previous reports of an interconnection between phosphorylated residues and overall protein disorder [91, 92]. It has been suggested that disordered protein regions harbor PTM sites because of the conformational flexibility they provide [93, 94]. The hydroxyl groups of serines within disordered protein segments have been reported to be targets for phosphorylation by protein kinases [95]. Therefore, the higher number of predicted phosphorylated serine residues in the ORF2 proteins points to their interaction ability and flexible nature, and to an important role in protein regulation and associated biological processes.

Furthermore, the molecular functions predicted from the 3D ORF2 models with the COFACTOR algorithm [40, 41], such as ion binding, metal ion binding, transition metal ion binding, iron-sulfur cluster binding, and protein binding, indicate a propensity of ORF2 to bind a variety of molecules, such as ions, metals, and proteins. Such interactive functions have been reported to regulate various processes, such as cellular signal transduction, phosphorylation, transcription, and translation [96]. The electron transport chain (ETC) operates in the inner mitochondrial membrane and involves redox reactions catalyzed by oxidoreductase enzymes. The predicted ETC involvement and oxidoreductase activity suggest a role for ORF2 in HEV regulation, as mitochondria serve as signaling hubs for the immune response [97, 98]. Complex III of the ETC has also been implicated in HEV infection [99] and performs diverse biological functions [100, 101]. Together with structural molecule activity, these predicted biological processes suggest that ORF2 has multiple crucial roles beyond its role as a capsid protein [54], further supporting our hypothesis. Taken together, our observations suggest that ORF2 is a multifunctional protein involved in cell regulation and pathogenesis in addition to its role as the HEV capsid protein.

Conclusions

The present study provides a novel investigation of HEV ORF2 biology in terms of its intrinsic disorder. Using disorder predictors, it revealed distinctive intrinsic disorder patterns in HEV ORF2 that may help in understanding its behavior. The extent of intrinsic disorder was calculated with different bioinformatics predictors using ORF2 protein sequences obtained from a publicly available database. Initial analysis of the ORF2 structural models showed high percentages of coil. Analysis of the extent of intrinsic disorder identified ORF2 as an IDP (highly disordered protein), suggesting various significant roles in the viral life cycle. The predicted MoRFs in the HEV ORF2 proteins further suggest a propensity for numerous disorder-based binding functions. These disordered regions and disorder-based binding residues could play diverse important roles, such as in viral replication, pathogenicity, and particle assembly. The presence of several phosphorylation sites suggests involvement of ORF2 in important mechanisms such as cellular and signalling pathways. Moreover, structure-based prediction of molecular functions and biological processes indicated multiple functions beyond its capsid function. Our study is expected to provide critical information for further studies of HEV ORF2 behavior.