Background

Hepatitis E virus (HEV) is a non-enveloped RNA virus of the family Hepeviridae [1] and a major causative agent of acute hepatitis worldwide. Infection is largely asymptomatic in the general population; however, HEV can cause severe disease in pregnant women, such as fulminant hepatic failure with a high mortality rate (20–30%) [2]. It has recently been estimated that around 939 million people worldwide have experienced past HEV infection and that 15–110 million have recent or ongoing infection [3].

HEV is currently classified into eight genotypes (GT 1 to GT 8). GT 1 and GT 2 infect humans and are transmitted mainly through contaminated water, causing acute hepatitis. GT 3 and GT 4 strains have a broader host range that includes humans, rabbits, wild boars, and pigs [4,5,6,7], and they cause chronic HEV infections, especially in organ transplant patients [8, 9]. Other HEV strains have been isolated from specific hosts, such as GT 5 and GT 6 from wild boars in Japan [10, 11], GT 7 from dromedary camels [12], and GT 8 from Bactrian camels [13]. Consumption of raw or undercooked animal meat products is regarded as the main cause of sporadic HEV cases in developed countries [14]. New strains continue to be discovered and the host range keeps expanding, so the implications of HEV for human health remain uncertain [14]. This further complicates assessment of HEV transmission and infection risk. Besides water- and food-borne transmission, blood-borne transmission has been reported in patients receiving organ transplants [15], and person-to-person transmission has recently been reported [16]. Evidence also indicates that pet animals, including cats, dogs, rabbits, and horses, act as accidental hosts in the transmission of HEV to humans [17, 18]. HEV has thus become a global health burden in both developing and developed countries, and preventive measures are urgently needed.

Anti-HEV IgG antibody is considered a marker of past infection because it usually persists for many years [19, 20]. In contrast, anti-HEV IgM antibody is short-lived (up to a few months) and is regarded as a marker of ongoing or recent infection. Detection of HEV RNA is considered the bona fide marker of active infection.

The HEV genome is organized into three partially overlapping ORFs (ORF1, ORF2, and ORF3) [21]. ORF1 is the largest and encodes the nonstructural proteins required for viral replication [22, 23]. ORF2 encodes the viral capsid protein [24, 25], and ORF3 encodes a protein with regulatory functions [26,27,28]. The Y-domain region (YDR) of the ORF1 nonstructural polyprotein is the second domain from the 5′ end and lies between the methyltransferase (MTase) and putative cysteine protease (PCP) domains [29, 30]. The indispensability of critical YDR residues was first reported by Parvez [30]. That study identified universally conserved residues (L410, S412, and W413) in the predicted YDR alpha-helix homolog (LYSWLFE) and showed that these residues play a crucial role in viral RNA replication [30]. It also found that mutations in the highly conserved cysteine dyad (C336–C337), implicated in membrane binding, completely abolished RNA replication. These functional and structural constraints indicate that the YDR is essential for HEV replication. They may also reflect common principles of interaction between the YDR and cytoplasmic membranes [30]. That study proposed a role for the YDR in HEV replication based on the essentiality of two conserved motifs (a putative palmitoylation site and an alpha-helical segment) in the HEV life cycle [30]. However, a direct link between the function of the conserved YDR segments and viral adaptation has not been established. We therefore attempted to delineate the role of the YDR in viral adaptation.

The present study analyzed the structurally “unknown” regions of the HEV YDR, that is, the fraction of a proteome that has no detectable similarity to any PDB structure. We refer to this fraction as the “dark proteome.” Disordered protein regions exist as highly dynamic ensembles that interconvert rapidly under different physiological conditions [31,32,33]. One disordered region can bind many partners, and several disordered regions can bind the same ligand. Because of this, intrinsically disordered regions are widely used in protein–protein interactions [34, 35]. They are also considered potential drug targets because they can undergo a disorder-to-order transition upon drug binding [36]. Here, we analyzed the disordered side of the HEV YDR with a combination of computational methods to detect disordered regions and gain insight into their disorder-related functions. Unstructured regions of viral proteins are strongly associated with virulence. Identifying the disorder-related functions of the YDR may therefore shed light on its role in HEV adaptation.

Methods

Sequences

The HEV YDR protein sequences were obtained from GenBank. Eight sequences were analyzed, one for each of the eight currently recognized HEV genotypes (GT 1–GT 8). The sequences were selected to encompass different host organisms (human, swine, wild boar, and camel). Multiple computational predictions were carried out on these eight sequences, and the results were compared.

Structural analysis

3D models of the HEV YDR sequences were predicted and analyzed using the Phyre2 (Protein Homology/AnalogY Recognition Engine) server (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index) [37].

Amino acid distribution

The amino acid composition of each HEV YDR sequence was computed with the Expasy ProtParam webserver (https://web.expasy.org/protparam/) and analyzed in detail.
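The composition percentages can be recomputed from a sequence string as a rough local cross-check of the ProtParam values. The sketch below is illustrative and is not the ProtParam implementation. Skipping non-standard letters (X, gaps, stop symbols) is an assumption of this example.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> dict:
    """Return the percentage of each standard amino acid in `seq`.

    Characters outside the 20 standard residues (e.g. X, '-', '*') are
    skipped, so percentages are relative to standard residues only.
    """
    counts = Counter(ch for ch in seq.upper() if ch in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in STANDARD_AA}
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}


if __name__ == "__main__":
    # Toy input: the conserved YDR motif quoted in the text
    composition = aa_composition("LYSWLFE")
    print({aa: round(pct, 1) for aa, pct in composition.items() if pct > 0})
```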

Protein disorder and flexibility prediction

Intrinsically disordered regions (IDRs) of the YDR sequences were predicted using PONDR® (Predictor of Natural Disordered Regions) with default settings. Three members of the PONDR® family, PONDR® VSL2 [38], PONDR® VL3 [39], and PONDR® VLXT [40], were used to predict the intrinsic disorder predisposition of the YDR. These predictors identify residues or regions that lack the propensity to form ordered structure. Residues with predicted scores between 0.2 and 0.5 were considered flexible. Residues with scores above the 0.5 threshold were considered intrinsically disordered.
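The thresholding rule above can be written as a small function over the per-residue scores exported from a PONDR predictor. This is a sketch under assumptions. The text does not say how scores of exactly 0.2 or 0.5 were handled; here 0.5 counts as flexible. Labelling scores below 0.2 as “ordered” is also an assumption of this example.

```python
def classify_residues(scores, flexible_min=0.2, disorder_cutoff=0.5):
    """Label per-residue disorder scores as 'ordered', 'flexible' or 'disordered'.

    Scores above `disorder_cutoff` are disordered. Scores from `flexible_min`
    up to and including `disorder_cutoff` are flexible (boundary handling is
    an assumption). Anything lower is treated as ordered.
    """
    labels = []
    for score in scores:
        if score > disorder_cutoff:
            labels.append("disordered")
        elif score >= flexible_min:
            labels.append("flexible")
        else:
            labels.append("ordered")
    return labels


if __name__ == "__main__":
    # Toy scores, not PONDR output for any YDR sequence
    print(classify_residues([0.10, 0.25, 0.50, 0.73]))
```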

Protein-binding region prediction

The MoRFpred [41] online predictor was used to identify protein–protein interaction regions within the HEV YDR sequences. This webserver is designed to recognize molecular recognition features (MoRFs). Residues scoring above the threshold of 0.5 were considered MoRF regions.
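Per-residue MoRF scores can be turned into region boundaries by collecting contiguous runs above the 0.5 cutoff. The function name and the `min_length` filter are hypothetical additions for illustration. MoRFpred's own output format is not described here, so the input is assumed to be a plain list of per-residue scores in sequence order.

```python
def regions_above(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs where score > threshold.

    Runs shorter than `min_length` residues are discarded.
    """
    regions = []
    start = None  # 1-based start of the run currently open, if any
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:  # the run covered start..pos-1
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the last residue
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    # Toy scores for illustration only
    print(regions_above([0.1, 0.6, 0.7, 0.2, 0.9]))
```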

Nucleotide-binding region prediction

Several online servers predict RNA- and DNA-binding regions from protein sequence. DisoRDPbind predicts RNA-, DNA-, and protein-binding residues located in intrinsically disordered regions of proteins. DRNApred provides sequence-based prediction of DNA- and RNA-binding residues within proteins. PPRInt predicts RNA-interacting residues in a given sequence. These tools were therefore used in combination to predict RNA- and DNA-interacting residues within the HEV YDR sequences.

RNA-binding residue prediction

For RNA-binding residue identification, we used a combination of three webservers, i.e., DisoRDPbind [42], DRNApred [43], and PPRInt [44].

DNA-binding residue prediction

For DNA-binding residue identification, we used a combination of two webservers, i.e., DisoRDPbind [42] and DRNApred [43] webservers.
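The text does not specify how the outputs of the different webservers were combined. One simple way to formalize agreement is a per-residue vote, shown here only as an assumed example. A residue is reported when at least `min_votes` predictors flag it. The position lists in the usage example are invented for illustration and are not results from this study.

```python
from collections import Counter


def consensus_residues(predictions, min_votes=2):
    """Return sorted 1-based positions flagged by at least `min_votes` predictors.

    `predictions` maps a predictor name to the positions it called as binding.
    Each predictor contributes at most one vote per position.
    """
    votes = Counter()
    for positions in predictions.values():
        votes.update(set(positions))
    return sorted(pos for pos, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Toy position lists for illustration only, not results from this study
    rna_calls = {
        "DisoRDPbind": [210, 211, 212],
        "PPRInt": [5, 211, 212, 212, 230],
        "DRNApred": [],
    }
    print(consensus_residues(rna_calls))
```

Raising `min_votes` trades sensitivity for agreement. With only two or three servers, a threshold of 2 is a common, if arbitrary, middle ground.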

Phosphorylation prediction

Phosphorylated Ser, Thr, and Tyr residues in the HEV YDR sequences were predicted using the online tool DEPP (Disorder Enhanced Phosphorylation Predictor) (http://www.pondr.com/cgi-bin/depp.cgi). DEPP uses disorder information to improve discrimination between phosphorylation and non-phosphorylation sites. Its reported accuracy is 76.0 ± 0.3%, 81.3 ± 0.3%, and 83.3 ± 0.3% for Ser, Thr, and Tyr, respectively.
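Given a sequence and per-residue DEPP scores, the predicted P-sites can be grouped by residue type. The 0.5 cutoff follows the threshold described for Fig. 8. The sketch assumes that scores have been exported as one value per residue in sequence order; the function name is hypothetical.

```python
def predicted_psites(seq, scores, threshold=0.5):
    """Return {'S': [...], 'T': [...], 'Y': [...]} of 1-based positions whose score > threshold."""
    if len(seq) != len(scores):
        raise ValueError("sequence and score list must have the same length")
    sites = {aa: [] for aa in "STY"}
    for pos, (aa, score) in enumerate(zip(seq.upper(), scores), start=1):
        # Only Ser, Thr and Tyr are candidate phosphoacceptors here
        if aa in sites and score > threshold:
            sites[aa].append(pos)
    return sites


if __name__ == "__main__":
    # Toy sequence and scores, not DEPP output for any YDR sequence
    print(predicted_psites("ASTYKS", [0.9, 0.6, 0.4, 0.7, 0.95, 0.3]))
```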

Structure-based function prediction

Because HEV has a broad host range, 3D structural models of the YDR were generated from sequences obtained from different host organisms. Their probable molecular functions were predicted using the COFACTOR algorithm [45, 46]. The analysis used sequences AF444002 (HEV), JF443720 (human), GU119961 (swine), AB222182 (wild boar), and KJ496143 (camel).

Results

The HEV genome comprises three ORFs (ORF1, ORF2, and ORF3). ORF1 consists of seven domains: MTase, methyltransferase; Y, undefined; PCP, papain-like cysteine protease; P/HVR, proline-rich/hypervariable region; X, macro; Hel/NTPase, helicase/nucleotide triphosphatase; and RdRp, RNA-dependent RNA polymerase. The Y-domain region (YDR) is 228 amino acids long (nucleotides 650–1339). It contains a potential palmitoylation site (C336C337) and an alpha-helical segment (L410Y411S412W413L414F415E416). These segments are indispensable for cytoplasmic membrane binding and are highly conserved among HEV genotypes. The YDR of HEV GT 1 (accession number AF444002) is shown in Fig. 1.

Fig. 1

Diagrammatic representation of the hepatitis E virus nonstructural polyprotein (ORF1) domains, showing the Y-domain. ORF1 comprises seven domains: MTase, methyltransferase; Y, undefined; PCP, papain-like cysteine protease; P/HVR, proline-rich/hypervariable region; X, macro; Hel/NTPase, helicase/nucleotide triphosphatase; and RdRp, RNA-dependent RNA polymerase. The Y-domain region (YDR) is 228 amino acids long (nucleotides 650–1339). It contains a potential palmitoylation site (C336C337) and an alpha-helical segment (L410Y411S412W413L414F415E416). These segments are indispensable for cytoplasmic membrane binding and are highly conserved among HEV genotypes

Retrieval of sequences

The YDR sequences were analyzed with different computational approaches to assess their disorder-based binding functions. The sequences analyzed are listed in the supplementary material (S1 Table).

Structural annotation

Comprehensive structural analysis relates a protein's conformation and function to its amino acid sequence and composition. The YDR structure was therefore examined in detail using a web portal for protein modeling and analysis. The 3D models of the YDR sequences were generated by homology modeling (S1A–H Figure). Three secondary structure states were identified in the YDR models:

  • helix (H; including alpha-, pi-, and 3_10-helices);
  • beta-strand (E; an extended strand in beta-sheet conformation at least two residues long);
  • loop (L).

Coils dominated the YDR sequences, followed by helices and strands (S1A–H Figure). Secondary structure elements were connected by long loops (coiled regions). In addition, residues missing from the YDR models indicated regions of high conformational flexibility (S1A–H Figure).
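The relative dominance of coil, helix, and strand can be quantified from a three-state secondary structure string using the H, E, and L states defined above. This sketch assumes the string has already been extracted from a model. It does not parse Phyre2 output files, and the function names are hypothetical.

```python
def ss_fractions(ss):
    """Fraction of helix (H), strand (E) and loop (L) in a three-state string."""
    ss = ss.upper()
    unknown = set(ss) - set("HEL")
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    if not ss:
        raise ValueError("empty secondary-structure string")
    return {state: ss.count(state) / len(ss) for state in "HEL"}


def dominance_order(fractions):
    """States sorted from most to least abundant (ties keep H, E, L order)."""
    return sorted(fractions, key=fractions.get, reverse=True)


if __name__ == "__main__":
    # Toy string, not taken from any YDR model
    f = ss_fractions("LLLHHHHLLEELLL")
    print(f, dominance_order(f))
```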

Analysis of amino acid distributions

The amino acid composition was examined in detail to identify characteristic residue features of the YDR. The predicted amino acid percentages of the YDR sequences are given in Table 1 and Fig. 2.

Table 1 The predicted amino acid percentages of YDR in hepatitis E viruses
Fig. 2

Amino acid percentage composition of the YDR sequences analyzed in this study: (A) JF443720 (GT 1), (B) M74506 (GT 2), (C) AB222182 (GT 3), (D) GU119961 (GT 4), (E) AB573435 (GT 5), (F) AB602441 (GT 6), (G) KJ496143 (GT 7), and (H) KX387865 (GT 8)

Categorization of protein structure

Unexpectedly, both hydrophobic and polar residues were favored in the YDR sequences. Based on their relative abundance ratios in ordered (O), disordered (D), and dual personality (DP) fragments, amino acids cluster into three major groups [47]:

  • The first group comprises the very small (Ala, Gly, Ser) and some hydrophilic (Glu, Lys) amino acids. These amino acids are prevalent in D fragments and depleted in O fragments.

  • The second group comprises mostly hydrophilic amino acids (Asp, Thr, Gln, Asn, Pro, and Arg), most of which preferentially occur in DP fragments.

  • The third group comprises mostly hydrophobic amino acids (Ile, Phe, Tyr, His, Met, Cys, and Trp). These amino acids are depleted in D fragments and enriched in O fragments.

The YDR sequences showed a higher preference for both order-promoting (Leu, Phe, Tyr, Val) and disorder-promoting (Ala, Arg, Gly, Pro, Ser) residues [48,49,50,51,52,53,54] (Fig. 2). This abundance of both residue types is characteristic of protein hybrids, that is, proteins having both intrinsically disordered protein regions (IDPRs) and structured regions. Furthermore, the abundance of DP signature residues such as Thr, Arg, Gly, and Pro indicates that the YDR has the characteristics of “Dual Personality” (DP) fragments, i.e., a mixture of order and disorder characteristics [47]. DP fragments exist in either the ordered (O) or the disordered (D) state, which is why they are called dual personality fragments. They are more rigid (ordered) under some conditions and more flexible (disordered) under others. Consequently, DP fragments are marginally stable in both the buried and exposed parts of the protein model [47].
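The order-/disorder-promoting balance can be summarized as the fraction of residues in each class. The residue sets below are those listed in the Discussion of this article [48–54]. Counting the remaining residues (Asp, His, Met, Thr, and any non-standard letters) as “other” is an assumption of this sketch and is not the DP grouping of [47].

```python
# Residue classes as listed in the Discussion of this article [48-54]
ORDER_PROMOTING = set("WCYIFVNL")
DISORDER_PROMOTING = set("RPQGESAK")


def promoting_fractions(seq):
    """Fractions of order-promoting, disorder-promoting and other residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_order = sum(aa in ORDER_PROMOTING for aa in seq)
    n_disorder = sum(aa in DISORDER_PROMOTING for aa in seq)
    n_other = len(seq) - n_order - n_disorder  # Asp, His, Met, Thr, non-standard
    return {
        "order": n_order / len(seq),
        "disorder": n_disorder / len(seq),
        "other": n_other / len(seq),
    }


if __name__ == "__main__":
    # Toy input: the conserved YDR motif quoted in the text
    print(promoting_fractions("LYSWLFE"))
```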

Analysis of protein disorder and flexibility

PONDR

PONDR predicts naturally disordered regions from a single protein sequence. The predicted disorder profiles indicate that the YDR sequences are moderately disordered (Fig. 3A–H). They have flexible N- and C-termini and multiple flexible regions along the entire polypeptide chain. The intrinsically disordered residues predicted by the three disorder predictors for each YDR sequence are summarized in Table 2 and Fig. 4A–H.

Fig. 3

Analysis of intrinsic disorder predisposition of HEV YDR. (A) JF443720 (GT 1); (B) M74506 (GT 2); (C) AB222182 (GT 3); (D) GU119961 (GT 4); (E) AB573435 (GT 5); (F) AB602441 (GT 6); (G) KJ496143 (GT 7); and (H) KX387865 (GT 8). Graphs A–H show the intrinsic disorder profiles of the HEV YDR sequences. Disorder probability was calculated using three members of the PONDR (Predictor of Natural Disordered Regions) family, i.e., VLXT, VL3, and VSL2. A threshold of 0.5 (dashed line) was used to distinguish ordered from disordered regions along the sequence. Regions above the threshold are predicted to be disordered

Table 2 The predicted percentage of intrinsic disorder scores of YDR in hepatitis E viruses
Fig. 4

Prediction of disordered residues in HEV YDR. A JF443720 (GT 1); B M74506 (GT 2); C AB222182 (GT 3); D GU119961 (GT 4); E AB573435 (GT 5); F AB602441 (GT 6); G KJ496143 (GT 7); and H KX387865 (GT 8). Disordered residues were predicted using three members of the PONDR (Predictor of Natural Disordered Regions) family, i.e., VLXT, VL3, and VSL2. A threshold of 0.5 (dashed line) was used to distinguish ordered from disordered regions along the sequence. Regions above the threshold are predicted to be disordered. Predicted disordered residues are marked with the letter “D”

Each YDR sequence was analyzed for disordered regions. Proteins are categorized by their overall degree of intrinsic disorder, i.e., the predicted fraction of disordered residues [55, 56]:

  • structured proteins (0–10%);
  • moderately disordered proteins (10–30%);
  • highly disordered proteins (30–100%).

VLXT and VSL2 predicted 10–30% disordered residues in the YDR sequences. This classifies them as moderately disordered proteins with multiple flexible regions. The YDR sequences therefore did not show extensive disorder and consisted mostly of structured regions. No stretch of 30 or more consecutive disordered residues was predicted, indicating that the YDR sequences lack long disordered regions (Table 2). Figure 3A–H shows the disorder profiles of the YDR sequences obtained from the three PONDR predictors. The profiles show similar disorder at both the N- and C-termini across the YDR sequences.
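The categorization scheme [55, 56] and the long-region criterion (30 or more consecutive disordered residues) can be applied to a list of per-residue scores. The published ranges share their boundaries at 10% and 30%. Assigning exactly 10% to “moderately disordered” and exactly 30% to “highly disordered” is therefore an assumption of this sketch, as is the strict > 0.5 test for a disordered residue.

```python
def disorder_category(percent):
    """Map a percentage of disordered residues to the categories in [55, 56]."""
    if percent < 10:
        return "structured"
    if percent < 30:
        return "moderately disordered"
    return "highly disordered"


def longest_disordered_run(scores, threshold=0.5):
    """Length of the longest stretch of consecutive residues with score > threshold."""
    best = current = 0
    for score in scores:
        current = current + 1 if score > threshold else 0
        best = max(best, current)
    return best


def summarize_disorder(scores, threshold=0.5, long_region=30):
    """Percent disordered, category, and whether a long disordered region exists."""
    if not scores:
        raise ValueError("no scores given")
    percent = 100.0 * sum(score > threshold for score in scores) / len(scores)
    return {
        "percent_disordered": percent,
        "category": disorder_category(percent),
        "has_long_region": longest_disordered_run(scores, threshold) >= long_region,
    }


if __name__ == "__main__":
    # Toy profile: 20 disordered residues followed by 80 ordered ones
    print(summarize_disorder([0.8] * 20 + [0.1] * 80))
```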

Disordered residues were predicted within the conserved “LYSWLFE” segment in all YDR sequences, indicating that this conserved motif is structurally flexible.
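Once the motif has been located in the sequence, checking which of its residues exceed the disorder threshold is a simple lookup. The `first_residue_number` parameter is a hypothetical convenience for mapping positions onto ORF1 numbering, so that the motif reads L410–E416. The correct offset depends on where the analyzed fragment starts and must be supplied by the user.

```python
def motif_disorder(seq, scores, motif="LYSWLFE", threshold=0.5, first_residue_number=1):
    """Return (position, residue, score) for disordered residues inside each motif occurrence.

    Positions are reported as first_residue_number + 0-based index, so with the
    default they are 1-based positions in `seq`.
    """
    seq, motif = seq.upper(), motif.upper()
    if len(seq) != len(scores):
        raise ValueError("sequence and score list must have the same length")
    hits = []
    start = seq.find(motif)
    while start != -1:
        for idx in range(start, start + len(motif)):
            if scores[idx] > threshold:
                hits.append((first_residue_number + idx, seq[idx], scores[idx]))
        start = seq.find(motif, start + 1)  # look for further occurrences
    return hits


if __name__ == "__main__":
    # Toy fragment and scores, not PONDR output for any YDR sequence
    toy_seq = "AALYSWLFEKK"
    toy_scores = [0.1, 0.1, 0.2, 0.6, 0.7, 0.3, 0.55, 0.1, 0.9, 0.8, 0.8]
    print(motif_disorder(toy_seq, toy_scores))
```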

Analysis of protein-binding propensity

MoRFpred: The results of the MoRF (protein-binding region) analysis (Fig. 5) indicate that the YDR has a flexible C-terminus. These regions contain MoRFs and are structurally flexible, so they may mediate protein–protein interactions.

Fig. 5

Analysis of protein-binding propensity of HEV YDR, i.e., JF443720 (GT 1), M74506 (GT 2), AB222182 (GT 3), GU119961 (GT 4), AB573435 (GT 5), AB602441 (GT 6), KJ496143 (GT 7), and KX387865 (GT 8). The protein-binding profile was calculated using MoRFpred. The YDR contains MoRFs mainly at the C-terminus. Protein-binding residues are shown in blue and non-interacting residues in black

DisoRDPbind: DisoRDPbind did not predict the protein-binding residues within the YDR sequences.

The MoRFs at the N- and C-termini of the YDR sequences suggest interactions with the MTase and PCP domains, respectively, that may contribute to ORF1 function. The MoRFs predicted within the conserved “LYSWLFE” segment further suggest a role in interaction with the host cell receptor. Overall, the protein-binding propensity analysis points to an important role for disorder in YDR function.

Analysis of nucleotide-binding propensity

A combination of online predictors (DisoRDPbind, DRNApred, and PPRInt) was used to identify residues with a propensity to bind nucleotides (DNA and RNA).

Identification of RNA-binding regions

DisoRDPbind: Several RNA-binding residues were identified at the C-terminus of the YDR sequences (Fig. 6A).

Fig. 6

Analysis of RNA-binding propensity of HEV YDR, i.e., JF443720 (GT 1), M74506 (GT 2), AB222182 (GT 3), GU119961 (GT 4), AB573435 (GT 5), AB602441 (GT 6), KJ496143 (GT 7), and KX387865 (GT 8). The RNA-binding profiles were calculated using the webservers (A) DisoRDPbind and (B) PPRInt. The RNA-binding residues were located at the C-terminus of the YDR. Identified RNA-binding residues are shown in red and non-interacting residues in black

DRNApred: The RNA-binding residues were not predicted using the DRNApred server.

PPRInt: Numerous RNA-binding residues were identified throughout the polypeptide chain of the YDR sequences (Fig. 6B).

The RNA-binding propensity analysis revealed several RNA-binding residues in the YDR sequences. However, only C-terminal residues of the YDR were predicted to bind RNA by both DisoRDPbind and PPRInt. These included residues within the highly conserved “LYSWLFE” segment (α-helix counterpart) of the YDR.

Identification of DNA-binding regions

DisoRDPbind: The DNA-binding residues were found to be absent in the YDR sequences.

DRNApred: DNA-binding residues were predicted at both the N- and C-termini of the YDR sequences (Fig. 7).

The DNA-binding propensity analysis thus revealed several DNA-binding residues in the YDR sequences, located at both the N- and C-termini and along the entire polypeptide chain. These included residues within the highly conserved “LYSWLFE” segment (α-helix counterpart) of the YDR (as predicted by DRNApred).

Fig. 7

Analysis of DNA-binding propensity of HEV YDR, i.e., JF443720 (GT 1), M74506 (GT 2), AB222182 (GT 3), GU119961 (GT 4), AB573435 (GT 5), AB602441 (GT 6), KJ496143 (GT 7), and KX387865 (GT 8). The DNA-binding profile was calculated using the DRNApred webserver. DNA-binding residues are distributed throughout the polypeptide chains of the YDR sequences

Overall, the nucleotide-binding analysis indicated a high propensity of the predicted residues to bind RNA and DNA. The residues predicted within the “LYSWLFE” segment suggest that this segment is involved in the critical function of viral replication.

Analysis of phosphorylation sites

The phosphorylation analysis predicted phosphorylation sites (P-sites) in all the YDR sequences. The predicted phosphorylated Ser, Thr, and Tyr residues and their DEPP scores are summarized in Table 3 and Fig. 8.

Table 3 Predicted number and percentage of phosphorylated residues in YDR of hepatitis E viruses
Fig. 8

Predicted phosphorylation sites and scores of Ser, Thr, and Tyr residues within the YDR. A JF443720 (GT 1); B M74506 (GT 2); C AB222182 (GT 3); D GU119961 (GT 4); E AB573435 (GT 5); F AB602441 (GT 6); G KJ496143 (GT 7); and H KX387865 (GT 8). Graphs A–H show the phosphorylation patterns of the HEV YDR sequences. Scores were computed using DEPP (Disorder Enhanced Phosphorylation Predictor). A threshold of 0.5 (line) was used to distinguish phosphorylated from non-phosphorylated residues. Predicted phosphorylated residues above the threshold are shown as Ser (S), blue; Thr (T), green; and Tyr (Y), red

Ser accounted for a higher fraction of the predicted phosphorylated residues than Thr or Tyr (Fig. 8A–H). Most phosphorylation sites (P-sites) were located within intrinsically disordered regions of the YDR (S2A–H Figure). VLXT is considered the most accurate of these predictors because of the combination of attributes on which the algorithm is built [57]. We therefore used the VLXT disorder predictions for the YDR to compare the locations of P-sites and non-phosphorylation sites. Figure 8A–H shows the phosphorylation profiles of the YDR sequences with the predicted DEPP scores. The phosphorylated Ser, Thr, and Tyr residues lay within disordered fragments of the YDR, indicating an association between disordered regions and phosphorylation sites (S2A–H Figure). The positions of the predicted phosphorylated residues in the YDR are shown in Fig. 9.
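The comparison between P-sites and non-phosphorylation sites can be expressed as the fraction of each group that falls on predicted disordered residues (VLXT score > 0.5). This descriptive sketch assumes that the VLXT scores and the DEPP-derived P-site positions refer to the same sequence. It is not a statistical test of association, and the function name is hypothetical.

```python
def disorder_overlap(seq, psite_positions, disorder_scores, threshold=0.5):
    """Fraction of P-sites and of non-P-site Ser/Thr/Tyr lying in disordered residues.

    `psite_positions` are 1-based; positions that are not S/T/Y are ignored.
    A group with no residues gets a fraction of 0.0.
    """
    if len(seq) != len(disorder_scores):
        raise ValueError("sequence and disorder scores must have the same length")
    psites = set(psite_positions)
    counts = {"psite": [0, 0], "non_psite": [0, 0]}  # [disordered, total]
    for pos, aa in enumerate(seq.upper(), start=1):
        if aa not in ("S", "T", "Y"):
            continue
        group = "psite" if pos in psites else "non_psite"
        counts[group][1] += 1
        if disorder_scores[pos - 1] > threshold:
            counts[group][0] += 1
    return {g: (d / t if t else 0.0) for g, (d, t) in counts.items()}


if __name__ == "__main__":
    # Toy data, not VLXT or DEPP output for any YDR sequence
    print(disorder_overlap("ASTYKS", [2, 4], [0.1, 0.7, 0.2, 0.9, 0.8, 0.3]))
```

A higher disordered fraction for P-sites than for non-phosphorylated Ser/Thr/Tyr is the pattern the text describes. Judging whether such a difference is meaningful would need a proper test across sequences.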

Fig. 9

Depiction of phosphorylated residues within HEV YDR: (A) JF443720 (GT 1); (B) M74506 (GT 2); (C) AB222182 (GT 3); (D) GU119961 (GT 4); (E) AB573435 (GT 5); (F) AB602441 (GT 6); (G) KJ496143 (GT 7); and (H) KX387865 (GT 8). The prediction was carried out using DEPP (Disorder Enhanced Phosphorylation Predictor). Predicted phosphorylated residues in the YDR proteins are marked with an asterisk (*)

Prediction of molecular functions

The putative molecular functions of the YDR were predicted from the 3D models using the COFACTOR algorithm. The consensus GO annotations associated with the models are summarized in Table 4.

Table 4 Predicted consensus GO terms for YDR models

The predicted molecular functions included heme binding, copper ion binding, ubiquinone binding, nucleotide binding, ATP binding, ion binding, electron transfer activity, cytochrome-c oxidase activity, N-glycosylase activity, ligase activity, kinase activity, and citrate-synthase activity. Binding interactions and catalytic activities were thus the major functional roles attributed to the YDR in the respective hosts. Binding terms such as heme binding (GO:0020037), ion binding (GO:0043167), and nucleotide binding (GO:0000166) indicate that the YDR can bind a variety of molecules (protein, nucleotide, ion), consistent with our earlier results. The predicted catalytic activities, such as electron transfer activity (GO:0009055) and cytochrome c oxidase activity (GO:0004129), suggest possible mitochondria-associated functional roles for the YDR in the respective host organisms (Table 4).

Discussion

The functional implications of the YDR for HEV adaptation remain to be explored. To complete their life cycle, viruses require various interactions with host cell components [58]. These begin with attachment and entry and continue through commandeering of the host machinery, synthesis of viral components, and particle assembly. The final phase is the exit of new infectious particles from the host cell [58]. All these stages rely heavily on the intrinsic disorder prevalent in viral proteins [58], linking intrinsic disorder to viral pathogenesis and infection. The present study therefore analyzed the unstructured regions of the YDR to shed new light on its function in HEV. Other protein properties, such as structure, function, and protein–protein interactions, also influence adaptation [59]. We therefore used bioinformatics predictors based on different algorithms to analyze these factors in the YDR and to delineate its role in viral adaptation.

Diversification in structure and amino acid composition plays a vital role in evolutionary adaptation. Our initial structural investigation of the YDR models revealed all three secondary structural components: alpha-helices (α), beta-strands (β), and coils. All YDR sequences contained a higher percentage of α-helices than β-strands, with coils predominating, in agreement with a recent study [60]. We next examined the amino acid composition of the different YDR sequences. Disordered regions owe their properties to a distinctive amino acid composition. They are depleted in order-promoting residues (Trp, Cys, Tyr, Ile, Phe, Val, Asn, and Leu) and enriched in disorder-promoting residues (Arg, Pro, Gln, Gly, Glu, Ser, Ala, and Lys) [48,49,50,51,52,53,54]. Sequence-based analysis of the YDR uncovered both order-promoting (Val, Leu, Phe, Tyr, and Ile) and disorder-promoting (Arg, Ala, Ser, Pro, and Gly) residues. This is consistent with DP fragments, i.e., segments consisting of both structured (ordered) and unstructured (disordered) regions [47]. DP fragments have characteristics intermediate between order and disorder. These distinguish them from both regularly folded proteins and intrinsically disordered proteins or protein fragments. Dunker and colleagues also demonstrated the dominance of six signature amino acids (Thr, Arg, Gly, Asn, Pro, and Asp) in DP fragments, which determine their distinctive conformational features. The predominance of the signature residues Thr, Arg, Gly, and Pro therefore further supports our finding that the YDR has the characteristics of “Dual Personality” (DP) fragments [47].

In line with this, our intrinsic disorder propensity analysis classified the YDR sequences as moderately disordered. By overall degree of intrinsic disorder (the predicted fraction of disordered residues), proteins are categorized as structured (0–10%), moderately disordered (10–30%), or highly disordered (30–100%) [55, 56]. The YDR sequences in this study contained 10–30% disordered residues. They were therefore categorized as moderately disordered, i.e., protein hybrids consisting of both structured and unstructured (disordered) regions. Notably, the YDR thus possesses both ordered and disordered domains [47]. Evidence also suggests that some DP fragments undergo order/disorder transitions in response to signals, which can contribute to the regulation of protein activity [47]. Our analysis predicted some disordered regions in the YDR sequences that are found to be ordered in other databases. This suggests that these regions tend to undergo disorder-to-order transitions upon binding. Such behavior is characteristic of dual-personality fragments, which straddle the boundary between ordered and disordered protein states [61, 62]. Highly flexible and disordered segments in DP fragments can also become ordered upon binding a substrate or upon protein phosphorylation, again indicating order/disorder transitions in DP fragments [47]. This is consistent with a regulatory role for the YDR through order/disorder transitions.

Furthermore, it is well documented that disordered protein segments possess enormous flexibility [34]. Intrinsically disordered segments perform a variety of important cellular functions through specific interactions with RNA, DNA, and protein ligands [35, 36]. Many computational methods can predict intrinsically disordered proteins (IDPs) or intrinsically disordered protein regions (IDPRs) within protein sequences. However, only a few can infer a protein's functions from its binding propensity.

MoRFpred is a sequence-based tool that identifies short binding regions in the target protein that undergo a disorder-to-order transition upon binding. It is based on a novel design and accurately identifies all types of MoRFs (α, β, coil, and complex) [41]. The DisoRDPbind webserver predicts RNA-, DNA-, and protein-binding residues located within the disordered segments of target proteins. We used DisoRDPbind because it is user-friendly and provides accurate predictions. It also offers insight into the multiple functions carried out by disordered protein regions [42]. Protein–RNA and protein–DNA interactions also play diverse and essential cellular roles [35, 36]. Many sequence-based predictors are relatively slow and cannot accurately predict RNA- and DNA-binding residues. They sometimes cross-predict RNA-binding residues as DNA-binding residues and vice versa. We therefore used DRNApred, a relatively fast sequence-based method that accurately predicts and distinguishes RNA- and DNA-binding residues [43]. In summary, we used a combination of predictors (MoRFpred, DisoRDPbind, DRNApred, and PPRInt) to identify the disorder-based functions of the YDR from its sequence-based binding propensity.

MoRFs specifically mediate protein–protein interactions and are considered a specific subset of DP fragments [47]. MoRFs are short disordered segments in IDPs/IDPRs that interact with their binding partners upon a disorder-to-order transition [44]. The presence of MoRFs at the C-terminus of YDR suggests its engagement with the ORF1 PCP domain. Likewise, the MoRF at the N-terminus of two YDR sequences (JF443720 and AB602441) suggests that YDR engages with the MTase domain. The sequence alignment of HEV and the closely related viruses (EEV, SFV, and SINV) showed universally conserved residues (Leu, Ser, and Trp) in the amphipathic α-helical segment (LYSWLFE), which has been implicated in intracellular membrane binding. Similarly, the YDR of the nonstructural ORF1 polyprotein contains a membrane-binding motif of structural and functional significance in HEV replication and infection [30]. The multiple sequence alignment of the HEV strains showed a highly conserved α-helical segment (LYSWLFE) within the YDR of ORF1. This highly conserved α-helical motif plays an indispensable role in membrane-binding interactions. Moreover, Trp, a hydrophobic residue within this conserved segment, has been shown to play a crucial role in PPIs through protein folding. Thus, the presence of disordered residues in the conserved LYSWLFE segment suggests that this motif is essential for the interaction of YDR with its binding partners because of its structural flexibility. Additionally, the presence of MoRFs in this conserved region of YDR indicates that these conserved residues might help direct the specific function of membrane binding. Therefore, our MoRF predictions in this signature α-helical segment support YDR involvement in membrane binding through PPIs.
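Checking whether predicted MoRF (or disordered) residues fall inside the conserved LYSWLFE segment amounts to locating the motif and intersecting its positions with the predicted ones. The following is a minimal sketch assuming exact motif matches and a hypothetical set of predicted positions. The toy sequence in the example is invented and is not an HEV sequence.

```python
from typing import List, Set

# Conserved amphipathic alpha-helical segment discussed in the text.
MOTIF = "LYSWLFE"


def find_motif(sequence: str, motif: str = MOTIF) -> List[int]:
    """Return 1-based start positions of every exact (possibly overlapping) match."""
    seq, target = sequence.upper(), motif.upper()
    starts: List[int] = []
    i = seq.find(target)
    while i != -1:
        starts.append(i + 1)
        i = seq.find(target, i + 1)
    return starts


def motif_hits(start: int, predicted: Set[int], motif: str = MOTIF) -> List[str]:
    """Motif residues whose sequence positions are in `predicted`, labeled e.g. 'W8'."""
    return [f"{aa}{start + k}" for k, aa in enumerate(motif) if start + k in predicted]


if __name__ == "__main__":
    toy_seq = "MAAALYSWLFEGG"        # invented sequence, not an HEV isolate
    toy_morf = {8, 9, 10, 11, 12}    # hypothetical predicted MoRF positions
    for s in find_motif(toy_seq):
        print(s, motif_hits(s, toy_morf))  # 5 ['W8', 'L9', 'F10', 'E11']
```

Exact matching is the simplest choice. Variant strains with substitutions in the motif would need an alignment-based search instead.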
Furthermore, we predicted the RNA- and DNA-binding residues of YDR to gain deeper insight into its functional role. Our nucleotide-binding analysis revealed that YDR has a high propensity for RNA and DNA binding. The nucleotide-binding residues identified at the C-terminus, which included some residues within the LYSWLFE segment (as predicted by DisoRDPbind), revealed flexible (disorder-based) RNA-binding regions, supporting the critical role of these residues in HEV replication suggested earlier [30]. Moreover, the presence of both RNA- and DNA-binding residues within the conserved LYSWLFE segment indicates that these residues may act at the transcriptional or translational level, in accordance with a previous report [30]. Thus, the presence of molecular recognition features (protein-, RNA-, and DNA-binding residues) in the conserved LYSWLFE segment (C-terminus) suggests that YDR is structurally and functionally essential for HEV replication and intracellular membrane binding, consistent with a previous report [30]. Although these findings enhance our knowledge of this poorly understood Y-domain, further work is required to delineate its function and the importance of its conserved residues in viral replication.

Conformational fluctuation of intrinsically disordered protein regions transiently exposes dynamic interaction motifs, which are subject to post-translational modifications (PTMs) and mediate interactions with several target proteins that affect cell cycle control [63, 64]. PTMs are essential for regulating many protein functions. Phosphorylation of viral proteins has been shown to be critical for protein function in many acute RNA viruses, including alphaviruses [65,66,67,68] and flaviviruses [69,70,71,72,73]. Protein phosphorylation is also essential for many intracellular pathogens to establish a productive infection cycle [74, 75]. Phosphorylation is further required for protein folding, signal transduction, intracellular localization, PPIs, transcription regulation, cell cycle progression, survival, and apoptosis [76]. We therefore analyzed the phosphorylation patterns of YDR with the online algorithm DEPP to study its related functions. All YDR sequences contained phosphorylation sites (P-sites), and P-sites were predicted within the disordered regions of the YDR polypeptide chains, suggesting a tight interconnection between protein phosphorylation and the disordered regions of YDR. These findings agree with the existing literature, which shows that phosphorylated residues are preferentially located in disordered rather than ordered protein segments [77, 78]. Indeed, computational analyses with various prediction tools have shown that disordered protein segments are enriched in P-sites [77, 78]. This underlines the significance of disordered regions as display sites for PTMs, probably because disordered regions give these sites greater conformational flexibility than ordered regions [79, 80].
Furthermore, DP fragments have been closely linked to PTMs, as post-translationally modified sites are located at or close to DP segments [47], further indicating that YDR has the characteristics of a DP molecule [47]. Moreover, the hydroxyl group of serine residues in disordered protein segments has been suggested as a target for phosphorylation by protein kinases [81]. Thus, the high number of predicted phosphorylated serine residues in the YDR sequences reflects its flexibility and interaction ability, pointing to an important role in protein regulation across various biological processes [47].
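The observation that predicted P-sites fall in disordered regions can be expressed as an enrichment ratio. The sketch below is one simple formulation, not the DEPP algorithm and not the analysis performed in this study. It compares the share of predicted P-sites lying in disordered residues with the share of all Ser/Thr/Tyr residues that are disordered. The inputs (per-residue disorder flags and 1-based P-site positions) are assumed to come from upstream predictors, and the example data are invented.

```python
from typing import Sequence, Set

# Residues with a side-chain hydroxyl group that protein kinases can phosphorylate.
ACCEPTORS = frozenset("STY")


def disorder_enrichment(sequence: str, disordered: Sequence[bool], psites: Set[int]) -> float:
    """Observed / expected share of predicted P-sites that lie in disordered residues.

    Expected share: fraction of all S/T/Y residues that are disordered. A value above 1
    means P-sites sit in disordered segments more often than S/T/Y residues overall.
    `psites` are 1-based positions and must point at S, T, or Y residues.
    """
    seq = sequence.upper()
    if len(seq) != len(disordered):
        raise ValueError("sequence and disorder flags differ in length")
    if not psites:
        raise ValueError("no P-sites given")
    for p in psites:
        if not 1 <= p <= len(seq) or seq[p - 1] not in ACCEPTORS:
            raise ValueError(f"position {p} is not an S/T/Y residue")
    acceptors = [i for i, aa in enumerate(seq) if aa in ACCEPTORS]
    expected = sum(disordered[i] for i in acceptors) / len(acceptors)
    if expected == 0:
        raise ValueError("no S/T/Y residue is disordered; ratio undefined")
    observed = sum(disordered[p - 1] for p in psites) / len(psites)
    return observed / expected


if __name__ == "__main__":
    # Invented 10-residue example: serines at 1, 4, 7, 10; residues 6-10 disordered.
    seq = "SAASAASAAS"
    flags = [False] * 5 + [True] * 5
    print(disorder_enrichment(seq, flags, {7, 10}))  # 2.0
```

Using S/T/Y residues rather than all residues as the background avoids inflating the ratio merely because phosphorylatable residues are themselves more common in disordered segments.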

Furthermore, the predicted YDR 3D models were used to predict molecular functions using GO annotations [35, 36]. The molecular function analysis revealed numerous potential binding sites. These sites were predicted to interact with several ligands, including modified sites that bind enzymes as well as sites that bind nucleotides, proteins, and metal ions. Thus, our results suggest that YDR binds a wide range of substrates. Such interactions have been reported to contribute to the regulation of various cellular processes such as signal transduction, phosphorylation, transcription, and translation [34]. Moreover, the multiple catalytic functions associated with YDR in different hosts indicate its multifunctionality. Electron transfer activity and cytochrome c oxidase activity were among the major catalytic functions, suggesting YDR involvement in HEV regulation, as mitochondria not only serve as signaling hubs for immune responses but also facilitate downstream signaling that results in IFN synthesis [82, 83]. Mitochondria remain in constant communication with the cytosol to initiate biological events. Additionally, viruses strategically alter mitochondrial functions, affecting energy production, metabolism, and immune signaling [84]. Moreover, complex III of the mitochondrial electron transport chain has been suggested to perform diverse biological functions [85, 86]. A recent study reported an important role of complex III in HEV infection, demonstrating that its inhibition blocks HEV replication, i.e., complex III is required to sustain HEV infection [87]. These findings provide further evidence that YDR participates in regulatory functions of HEV.

Moreover, several disordered regions in nonstructural proteins have been shown to perform specific regulatory functions in viruses [88]. For instance, replication of hepatitis C virus (HCV) depends on the nonstructural NS5A protein, which forms a multi-protein complex [89] by interacting with numerous viral and host proteins [90, 91] via its disordered domain [92]. A disordered region at the N-terminus (PNT) of the P (phosphoprotein) has been reported in Paramyxoviridae [93,94,95] and Rhabdoviridae [96, 97]. The PNT domain has been shown to interact with N (nucleocapsid) and cellular proteins in MeV (measles virus) and with the N and L proteins in SeV (Sendai virus). The disordered N of MeV has been shown to interact with several viral and host cellular components [98]. Disordered regions alongside structured regions have also been observed in the nucleoprotein and phosphoprotein of Nipah and Hendra viruses [99]. These protein–protein interactions underlie several significant biological functions. Moreover, the polyproline region (PPR) of nonstructural ORF1 has been associated with the regulation of HEV, in addition to its role in replication, owing to its intrinsic disorder [100]. Thus, the intrinsically disordered regions in YDR could perform crucial regulatory functions by interacting with other viral and host components.

To sum up, we hypothesize that YDR has regulatory functions, in addition to its role in HEV replication, that are essential for viral adaptation. The comprehensive information provided in this study thus supports a role for YDR in HEV adaptation.

Conclusions

The current study provides novel data on the role of YDR in HEV adaptation. The amino acid distribution revealed signature residues prevalent in DP fragments. The presence of both ordered and disordered residues identified YDR as a protein hybrid. The unstructured regions in YDR sequences indicated their disorder and flexibility. We also established that all YDR sequences contained MoRFs, revealing a disorder-based propensity for binding protein partners. Furthermore, the identification of several RNA- and DNA-binding sites in YDR sequences suggested a critical role in interaction with the host and in viral infection. The presence of multiple phosphorylation sites in YDR further indicated that it is an important constituent of cellular and signaling pathways. Additionally, the location of P-sites within disordered segments of YDR supported our findings, as PTM sites lie at or close to DP fragments. Furthermore, structure-based analysis of YDR models revealed several potential binding sites, suggesting roles in vital processes such as cellular signal transduction, phosphorylation, transcription, and translation through interactions with various ligand molecules, and thus multiple functions. The predicted involvement of YDR in mitochondrial functions further linked it to regulatory functions. Given the flexibility of DP fragments to associate with different physiological partners, our analysis is expected to provide important insights into the interaction of YDR with other HEV proteins. Delineating these interactions could contribute to future research on the molecular biology of HEV.