1 Background

Hepatitis E virus (HEV) is the major aetiological agent of hepatitis E, also called enteric hepatitis (enteric meaning related to the intestines) [1]. Worldwide, about 20 million HEV infections and 3.3 million symptomatic hepatitis E cases occur annually, resulting in 44,000 deaths [2]. HEV is a quasi-enveloped Orthohepevirus [3] with a single-stranded, positive-sense RNA genome of around 7.2 kb, flanked by short 5′ and 3′ non-coding regions (NCRs) [4]. The HEV genome comprises three partially overlapping open reading frames (ORFs): ORF1, ORF2 and ORF3, which encode the non-structural polyprotein (pORF1), the capsid protein (pORF2) and the pleiotropic protein (pORF3), respectively [5].

The ORF1 consists of seven domains: methyltransferase (MTase/MeT), Y (Y), papain-like cysteine protease (PCP), hypervariable region/proline-rich hinge (HVR/PPR), X (macro), helicase (Hel/NTPase) and RNA-dependent RNA polymerase (RdRp) [6]. Several studies have reported the expression and characterization of full-length pORF1, but whether it functions as a single polyprotein with multiple functional domains remains debated [6,7,8,9]. Recent work has suggested a role for Y-domain sequences in the HEV life cycle through gene regulation and/or ER membrane binding in replication complexes [9, 11]. A highly conserved cysteine dyad ‘C336–C337’ in the HEV Y-domain has been identified as a potential palmitoylation site, homologous to that of the closely related alphavirus non-structural polyprotein, where it is attributed to membrane binding; a C→A mutation of this dyad completely abolished RNA replication. In addition, substitutions of the universally conserved Y-domain residues (L410, S412 and W413) in the predicted alpha-helix homolog (L410Y411S412W413L414F415E416) have also been shown to abort HEV RNA replication. Despite its important role, the putative Y-domain is not well characterized structurally or functionally. We therefore conducted computational analyses to provide insight into the molecular characteristics of this region.

2 Methods

2.1 Amino acid sequence retrieval

The ORF1 Y-domain amino acid sequence of HEV was retrieved from the GenBank database of the NCBI (National Center for Biotechnology Information). The source of the sequence was AF444002.1 with protein ID AAL50055.1 (26…0.5107). The putative Y-domain was explored using a 218-amino-acid sequence, which was used for the structural and functional analyses in the present study.

2.2 Multiple sequence alignment and phylogenetic analysis

A total of 50 Y-domain protein gene sequences of HEV were retrieved from GenBank. Sequences from different genotypes and various hosts, including humans, pigs and rabbits, were included in the present study. Multiple sequence alignment was performed using Clustal X2 in BioEdit v.7.2 [12]. The phylogenetic tree was generated in MEGA v.6.06 [13] using the best-fitting nucleotide substitution model, namely the general time-reversible (GTR) model with gamma distribution. The reliability of the tree was evaluated by bootstrap analysis with up to 1000 replicates.

2.3 Physicochemical properties analysis

The amino acid sequence of the HEV Y-domain was retrieved in FASTA format and used as the query sequence for determination of physicochemical parameters. The physical and chemical parameters of the retrieved sequence were computed using ProtParam (Expasy), a web-based server [14]. The parameters reported by the ProtParam tool were: amino acid composition, instability index (II; protein stability) [15], aliphatic index (AI; relative volume occupied by the protein’s aliphatic side chains) [16], extinction coefficient (EC; used in quantitative studies of protein–protein and protein–ligand interactions) [17], grand average of hydropathicity (GRAVY; sum of all hydropathicity values divided by the number of residues in the sequence) [18], theoretical pI, half-life [19] and the number of positive and negative residues.
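Two of these indices follow directly from the amino acid composition and can be illustrated with a minimal sketch. It assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy value, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. The function names and the toy peptide are hypothetical, and the code is not ProtParam's own implementation.

```python
# Kyte-Doolittle hydropathy scale, the scale underlying GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy_peptide = "AVIL"  # hypothetical example, NOT the Y-domain sequence
    print("GRAVY:", round(gravy(toy_peptide), 3))
    print("Aliphatic index:", round(aliphatic_index(toy_peptide), 1))
```

A negative GRAVY value indicates a net hydrophilic sequence, and only the four aliphatic residues contribute to the aliphatic index.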

2.4 Primary and secondary structural analysis

The structural analysis was conducted using three online webservers: ProtParam (Expasy) [14], PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred) and SOPMA (self-optimized prediction method with alignment) [20]. Initially, the primary structure of the Y-domain, in terms of amino acid composition, was examined using ProtParam (Expasy) [14] and PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred). SOPMA was then used to predict the secondary structure of the Y-domain and to report the secondary element content as the fraction of alpha-helix (α), beta-strand (β) and random coil.

2.5 Homology modelling and 3D structure validation

Because no experimentally determined 3D structure of the Y-domain is available in the Protein Data Bank (PDB), we modelled the unexplored domain using a homology modelling approach. The tertiary structure of the target protein domain was modelled using three different online programs: RaptorX (http://www.raptorx.uchicago.edu/), Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2) and I-TASSER [21]. The generated 3D models of the Y-domain were validated using Ramachandran plots, which were generated with PROCHECK (http://nihserver.mbi.ucla.edu/SAVES) in order to find the energetically favourable residues within the 3D models. A Ramachandran plot provides an overview of the φ-ψ torsion angles of the protein backbone. It also provides the percentage of residues in the favoured regions as well as the residues present within the allowed and outlier regions. The most suitable 3D model of the Y-domain was selected for the final analyses.
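The φ and ψ values plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms. The following minimal sketch shows how such an angle can be computed from Cartesian coordinates; it is a generic geometric routine, not PROCHECK's code, and the helper names and test coordinates are hypothetical. For residue i, φ uses the atoms C(i−1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates chosen so that the four points are eclipsed.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applying this function to every residue of a model yields the (φ, ψ) pairs whose distribution over favoured, allowed and outlier regions is then summarized as percentages.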

2.6 Functional analysis

Post-translational modification predictions

N-linked and O-linked glycosylation sites in the Y-domain were predicted using the NetNGlyc-1.0 (https://services.healthtech.dtu.dk/service.php?NetNGlyc-1.0) and NetOGlyc-4.0 (https://services.healthtech.dtu.dk/service.php?NetOGlyc-4.0) servers, respectively. Phosphorylation sites were predicted using the NetPhos-3.1 (https://services.healthtech.dtu.dk/service.php?NetPhos-3.1) server, with both generic and kinase-specific predictions. The servers were provided by the Centre for Biological Sequence Analysis, Technical University of Denmark (CBS DTU).

Motif prediction

The presence of motifs and other modified sites in the Y-domain was predicted using ANTHEPROT v.6.9.3.

Peptide signal detection

The location of signal peptide cleavage sites and of nuclear localization signals (NLS) in the Y-domain was predicted using Signal P-4.1 [22] and cNLS Mapper [23,24,25], respectively.

Cysteine residues prediction

The CYC_REC tool was used to predict the SS bonding of cysteine residues in the Y-domain protein sequence.

Subcellular localization prediction with functional annotation

CELLO2GO [26], a web-based public system, was used to infer the biological function of the non-structural Y-domain and to identify its subcellular localization.

Mutational analysis

The PROVEAN (Protein Variation Effect Analyzer) version 1.1 (http://provean.jcvi.org/seq_submit.php) and I-Mutant2.0 (https://folding.biofold.org/i-mutant/i-mutant2.0.html) webservers were used to predict the effect of amino acid mutations on the biological function of the Y-domain.

3 Results

The HEV genome comprises three ORFs (ORF1, ORF2 and ORF3). The ORF1 consists of seven domains, of which we have characterized the Y-domain in the present study. The seven domains are: MTase: methyltransferase; Y: Y; PCP: papain-like cysteine protease; P/HVR: proline-rich/hypervariable region; X: macro; Hel/NTPase: helicase/nucleotide triphosphatase; and RdRp: RNA-dependent RNA polymerase [6, 9]. The Y-domain of non-structural ORF1 of HEV consists of 228 amino acid residues (650–1339 nucleotides) and comprises a potential palmitoylation site (C336C337) and an alpha-helix segment (L410Y411S412W413L414F415E416) [11], as represented in Fig. 1. These segments have been found to be indispensable for cytoplasmic membrane binding and are highly conserved within HEV genotypes [11]. The HEV Y-domain (accession number: AF444002) was retrieved from the NCBI and analysed to assess its structural and functional properties using different in silico approaches.

Fig. 1

Diagrammatic representation of the HEV largest non-structural polyprotein (ORF1) showing the Y-domain of HEV. The HEV ORF1 constitutes seven domains which include MTase: methyltransferase; Y: Y; PCP: papain-like cysteine protease; P/HVR: proline-rich/hypervariable region; X: macro; Hel/NTPase: helicase/nucleotide triphosphatase; and RdRp: RNA-dependent RNA polymerase. The Y-domain (AF444002) constitutes 228 amino acids and comprises a potential palmitoylation-site (C336C337) and an alpha-helical motif (L410Y411S412W413L414F415E416)

3.1 Analysis of phylogenetic tree

Our phylogenetic analysis of the Y-domain gene sequences shown in Fig. 2 revealed that the AF444002 sequence was closest to the HEV reference strain NC_001434 (Additional file 1: Figure S1). The study sequences, collected from different geographical regions, showed conservation of the Y-domain protein genes across all HEV isolates (Fig. 2). In the HEV Y-domain alignment, non-synonymous mutations were more prevalent at the N-terminal than at the C-terminal (Fig. 3). Further, the sequences formed different clades according to genotype and showed dissimilar tree topology (Additional file 1: Figure S1).

Fig. 2

Alignment of amino acid sequences in Y-domain protein genes of HEV genomes showing the sequence conservation in different hosts across all genotypes. The analysis includes a total of 50 sequences

Fig. 3

Alignment showing the comparative analysis of amino acid sequences in Y-domain protein genes of HEV genomes at (A) N-terminal and (B) C-terminal. The substitutions are shown by amino acid symbols at the respective positions, and similarities are represented by the dots
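Per-position conservation in an alignment of this kind can be quantified with Shannon entropy, where 0 bits denotes a fully conserved column and higher values denote more substitutions. The sketch below is only an illustration of that idea; it is not the procedure used in BioEdit or MEGA, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned_sequences):
    """Entropy for every column of equal-length, pre-aligned sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Hypothetical toy alignment, NOT taken from the 50 study sequences.
    toy_alignment = ["MKV", "MKL", "MRV", "MRL"]
    for position, bits in enumerate(alignment_entropy(toy_alignment), start=1):
        print(position, round(bits, 3))
```

Comparing the mean entropy of N-terminal and C-terminal columns would give a numerical counterpart to the visual comparison in Fig. 3.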

3.2 Analysis of physicochemical parameters

Physicochemical analysis showed that the HEV Y-domain polypeptide (with reference to AF444002) is 218 amino acids long (24.63 kDa), with an isoelectric point (pI) of 9.13. The computed instability index was 41.57, which classified it as an unstable protein (a value > 40 implies an unstable protein). A high aliphatic index (82.75) suggested increased thermostability of the protein over a wide temperature range. Further, the grand average of hydropathicity (GRAVY) value of −0.141 indicated the hydrophilic nature of the protein (a positive score would indicate hydrophobicity). Taken together, the protein was found to be basic in nature and appeared to interact favourably with water (Table 1).

Table 1 Physicochemical parameters of Y-domain
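The interpretation rules applied to these parameters can be written out explicitly. The sketch below encodes only the cut-offs stated in the text (instability index above 40 means unstable, a negative GRAVY means hydrophilic) plus the conventional reading of a pI above 7 as basic; the function name and the returned labels are hypothetical.

```python
def interpret_parameters(instability_index, gravy, isoelectric_point):
    """Translate ProtParam-style values into qualitative labels."""
    if isoelectric_point > 7:
        charge = "basic"
    elif isoelectric_point < 7:
        charge = "acidic"
    else:
        charge = "neutral"
    return {
        "stability": "unstable" if instability_index > 40 else "stable",
        "hydropathy": "hydrophilic" if gravy < 0 else "hydrophobic",
        "charge_character": charge,
    }


if __name__ == "__main__":
    # Values reported in the text for the Y-domain.
    print(interpret_parameters(41.57, -0.141, 9.13))
```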

3.3 Analysis of primary structure

Proteins differ from one another in their structure, primarily in their sequence of amino acids. The linear sequence of the amino acid polypeptide chain refers to its primary structure. The amino acid composition of Y-domain is summarized in Table 2 (Fig. 4).

Table 2 Amino acid composition of Y-domain

Fig. 4

Representation of amino acid composition in HEV Y-domain using PSIPRED. Thirty-eight percent of total amino acids were non-polar, 35.32% were small non-polar, 27.06% were polar, 20.64% were hydrophobic, and 13.30% contributed to aromatic plus cysteine

The distribution of amino acids revealed Thr, Leu, Arg, Ala and Val/Ser as the five top-most contributing residues. Also, the prevalence of Gly and Pro was observed in the Y-domain.

3.4 Analysis of secondary structure

SOPMA predicted the secondary structure of the model that consisted of 40.37% alpha-helix, 20.64% beta-strand and 32.57% random coil (Fig. 5). The default parameters (similarity threshold: 8; window width: 17) were considered by SOPMA for the secondary structure prediction with > 70% prediction accuracy, utilizing 511 proteins (sub-database) and 15 aligned proteins. Although α-helix was one of the prominent secondary structures found in our protein, the presence of other conformations was also predicted. Secondary structures predicted in the Y-domain are described as follows (Table 3).

Fig. 5

HEV Y-domain showing the secondary structure elements. The analysis was conducted using SOPMA. SOPMA predicted that 40.37% of total amino acids contributed to alpha-helices, 20.64% to beta-strands and 32.57% to random coils

Table 3 Secondary structure elements prediction by SOPMA

The protein secondary structure consists of helices, beta-strands and coils, and coil comprises turns, bulges and random coils [27]. The α-helix, a right-handed coiled structure (40.37%), was the most prevalent helical arrangement found in the Y-domain protein. Other helical conformations such as π- and 3₁₀-helices were not detected. Helices have minimal steric hindrance and a high potential for the formation of hydrogen bonds. β-structures are also among the major secondary structure elements found in proteins. Our protein consisted of 20.64% β-strands. A β-sheet consists of several β-strands stabilized by inter-chain or intra-chain hydrogen bonds. The sharp or tight turns in proteins are called β-turns [28]. The Y-domain consisted of 6.42% β-turns. β-turns are short stretches of four amino acid residues and play a crucial role in both the conformation and the function of proteins [28]. Random coils were also prevalent among the secondary structure elements found in the Y-domain (32.57%).
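As a consistency check, the four SOPMA fractions quoted above can be converted back into residue counts for the 218-residue query. The short sketch below does this; the residue counts it prints are inferred from the reported percentages and are not values stated in the SOPMA output described here, and the variable names are hypothetical.

```python
SEQUENCE_LENGTH = 218  # length of the Y-domain study sequence

# Percentages reported in the text for each SOPMA state.
sopma_percent = {
    "alpha-helix": 40.37,
    "beta-strand": 20.64,
    "beta-turn": 6.42,
    "random coil": 32.57,
}

# Inferred residue counts (rounded to the nearest whole residue).
residue_counts = {
    state: round(percent * SEQUENCE_LENGTH / 100)
    for state, percent in sopma_percent.items()
}
total_percent = round(sum(sopma_percent.values()), 2)

if __name__ == "__main__":
    print(residue_counts)
    print("total residues:", sum(residue_counts.values()))
    print("total percent:", total_percent)
```

The counts come to 88, 45, 14 and 71 residues, which sum to 218, and the four percentages sum to 100%, so the reported fractions account for every residue of the query.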

3.5 Analysis of predicted 3D structure via homology modelling and its validation

The structural diversity of amino acids plays a vital role in protein self-assembly. The three-dimensional spatial arrangement of amino acid residues in a protein is known as the tertiary structure. The secondary structure elements (helices and strands) combine in different ways to form the three-dimensional structure of a protein. A reliable model is essential for structure-based drug design. Thus, the target Y-domain sequence was submitted (in FASTA format) to three different servers and structural models were predicted (Fig. 6, Additional file 3–Additional file 6: S3–S6 Files). The generated 3D structures of the Y-domain were inspected by visualization. All the predicted 3D models were assessed using Ramachandran plot analysis (PROCHECK). The overall stereochemical quality of the protein, the amino acids present in the allowed and disallowed regions and the G-Factor were evaluated for the various models (Fig. 7, Additional file 2: Figure S2).

Fig. 6

The 3D structures of the Y-domain of HEV modelled using different online servers: (A) RaptorX; (B) Phyre2; (C) I-TASSER (model 1); and (D) I-TASSER (model 5)

Fig. 7

The Ramachandran plots of the generated 3D models of Y-domain of HEV showing the favoured regions: (A) RaptorX; (B) Phyre2; (C) I-TASSER (model 1); and (D) I-TASSER (model 5). The analysis was conducted using PROCHECK

A good quality model should have more than 90% of its residues in the favoured regions (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum) [29]. The stereochemical evaluation by PROCHECK of the backbone φ and ψ dihedral angles of the Y-domain modelled with RaptorX revealed that 88.4% of the residues were in the most favoured regions, compared with the structures modelled by Phyre2 (78.4%), I-TASSER (model 1) (63.2%) and I-TASSER (model 2) (67.4%). Additionally, the overall average G-Factor value of the I-TASSER models was found to be unusual (i.e. values below −0.5), whereas the RaptorX and Phyre2 models had values of −0.22 and −0.14, respectively (Additional file 2: Figure S2, Table 4). Combining these two parameters, the model obtained from RaptorX was judged the most reliable, as it had 88.4% of residues in favoured regions (closest to 90%) and a usual G-Factor value. The Ramachandran plots of the predicted models showing the percentage of residues in favoured regions are illustrated in Fig. 7. Thus, the most reliable model (generated by RaptorX) was selected and used for further analysis (Fig. 7A).

Table 4 PROCHECK statistics of Y-domain 3D structures obtained using different tools
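The two-criterion selection described above can be summarized as a small sketch: models with an unusual G-Factor are set aside, and the remaining model with the highest percentage of residues in favoured regions is chosen. The class and function names are hypothetical; the numbers are those quoted in the text, and the exact I-TASSER G-Factors, which the text does not give, are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelQuality:
    name: str
    favoured_percent: float        # residues in most favoured regions (%)
    g_factor: Optional[float]      # None where the text gives no exact value
    g_factor_unusual: bool         # True when reported as below -0.5


MODELS = [
    ModelQuality("RaptorX", 88.4, -0.22, False),
    ModelQuality("Phyre2", 78.4, -0.14, False),
    ModelQuality("I-TASSER (first reported model)", 63.2, None, True),
    ModelQuality("I-TASSER (second reported model)", 67.4, None, True),
]


def select_model(models):
    """Drop models with an unusual G-Factor, then maximize favoured residues."""
    usable = [m for m in models if not m.g_factor_unusual]
    return max(usable, key=lambda m: m.favoured_percent)


if __name__ == "__main__":
    print(select_model(MODELS).name)
```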

For the RaptorX model, the best template selected was 4n20A (a hydrolase protein from Homo sapiens). However, the server did not provide details of the similarity between the chosen template and the Y-domain (Additional file 7: Figure S7). Notably, the modelled Y-domain structure consisted of 36% α-helix, 22% β-strand and 41% coil, which is in good agreement with the secondary structure prediction by SOPMA (40.37% α-helices, 20.64% β-strands and 32.57% coils) (Additional file 7: Figure S7). Further, this 3D model was analysed for the presence of clefts, tunnels and pores. The overall modelled protein structure was irregular and contained ten clefts, one pore and five tunnels (Fig. 8). The secondary structure of the modelled protein consisted of various motifs, as predicted by PROCHECK, which included 2 sheets, 2 beta-hairpins, 1 beta-bulge, 5 strands, 5 helices, 7 helix–helix interactions, 14 beta-turns and 1 gamma-turn.

Fig. 8

Surface representation of the 3D Y-domain structure of HEV. The overall shape of the protein is irregular having several clefts and tunnels

3.6 Analysis of functional characteristics

3.6.1 Prediction of modified sites and motifs

Several post-translationally modified sites were predicted within the Y-domain. One possible N-linked (Additional file 8: Figure S8) and two possible O-linked glycosylation sites were found in the Y-domain. Additionally, a total of 12 phosphorylation sites, including 6 Ser, 5 Thr and 1 Tyr, were predicted in the Y-domain. The predicted phosphorylation sites and their scores are summarized in Additional file 9: Table S9. Further, we identified several motifs in the Y-domain, which included four protein kinase C phosphorylation sites, two casein kinase II phosphorylation sites and two N-myristoylation sites. The identified motifs are listed in Table 5 (Additional file 10: Table S10).

Table 5 Motif regions present in the Y-domain protein sequence
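Motif classes of this kind are commonly defined by short PROSITE-style consensus patterns, which can be scanned with regular expressions. The sketch below illustrates the idea under the assumption that the standard PROSITE patterns apply (N-glycosylation N-{P}-[ST]-{P}, protein kinase C [ST]-x-[RK], casein kinase II [ST]-x(2)-[DE], N-myristoylation G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}). It does not reproduce ANTHEPROT or the neural-network scoring of the NetNGlyc and NetPhos servers, and the function name and test peptides are hypothetical.

```python
import re

# PROSITE-style consensus patterns translated to regular expressions.
MOTIF_PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "PKC phosphorylation": r"[ST].[RK]",
    "CK2 phosphorylation": r"[ST]..[DE]",
    "N-myristoylation": r"G[^EDRKHPFYW]..[STAGCN][^P]",
}


def scan_motifs(sequence):
    """Return {motif name: [(1-based start, matched peptide), ...]}."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead keeps overlapping occurrences of the same motif.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits


if __name__ == "__main__":
    # Hypothetical peptide, NOT the Y-domain sequence.
    for motif, found in scan_motifs("ANGTASARKTAAE").items():
        print(motif, found)
```

Such pattern matches only flag candidate sites; short consensus motifs occur frequently by chance, which is why the predicted sites are reported as possible rather than confirmed modifications.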

3.6.2 Prediction of signal and localization

No potential signal peptide cleavage site was found in the amino acid sequence (Fig. 9). An NLS was also absent, which suggested that the Y-domain is non-nuclear. The subcellular localization of the Y-domain was then predicted using the CELLO2GO tool, which indicated plasma membrane localization (Additional file 11: Table S11). Prediction of the SS-bonding states of cysteines in the Y-domain protein sequence identified 5 cysteines, at positions 46, 91, 121, 122 and 166 (Additional file 12: Table S12). In addition, one functional motif was detected in the Y-domain in the functional study, but it was not found to be a member of any family (Additional file 13: Figure S13).

Fig. 9

The signal peptide likelihood was absent in the HEV Y-domain. The analysis was conducted using SignalP-5.0 prediction

3.6.3 Prediction of molecular functions

A previous investigation reported the involvement of the Y-domain in viral replication and pathogenicity; however, the lack of extensive data prompted us to explore its other functions. Thus, we explored in detail the molecular function, biological process and cellular component of the HEV Y-domain. The identified molecular functions and biological processes are listed in Table 6.

Table 6 Functional annotation returned by CELLO2GO for Y-domain

As shown in Table 6, binding interactions and catalytic activities were the major molecular functional roles attributed to the Y-domain. The identified molecular functions (RNA binding, RNA-directed RNA polymerase activity) and biological processes (viral reproduction, genomic replication, viral protein processing) of the Y-domain suggested its involvement in several crucial cellular processes. The binding interactions, such as nucleotide binding, RNA binding and ATP binding, revealed the propensity of the Y-domain to bind a variety of molecules. The predicted hydrolase activity is consistent with our earlier result, in which RaptorX selected a hydrolase as the best template, and further supports the involvement of the Y-domain in hydrolase activity. These identified functions and processes further highlight the significance of the Y-domain in the HEV life cycle.

3.6.4 Prediction of effect of mutations on protein function

A previous investigation reported that the Y-domain palmitoylation site (C336C337) and an alpha-helical motif (L410Y411S412W413L414F415E416) are indispensable in the life cycle of HEV [11]. Thus, we used two different webservers, PROVEAN and I-Mutant2.0, to analyse the impact of mutations in these conserved segments. The results from both predictors were in accordance with each other and indicated the functional and structural importance of these conserved segments. The amino acid substitutions and their effects predicted by the PROVEAN webserver are summarized in Table 7.

Table 7 Amino acid mutations with predicted effect using the PROVEAN tool

It has been postulated that mutations with slightly negative or slightly positive DDG values do not contribute much to the overall stability of the protein structure. However, mutations with highly positive or highly negative DDG values suggest stabilization or destabilization of the protein, respectively (Additional file 14: Table S14). The mutational study indicated that the highest score was observed for variants of the highly conserved cysteines (C336C337), situated in the core region of the Y-domain. The W413 variant also had a high PROVEAN score, which again points to the essentiality of this Trp residue in HEV replication.
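The reading of the two predictors' outputs can be sketched as follows. The PROVEAN cut-off of −2.5 is that tool's documented default (scores at or below it are called deleterious); the ±0.5 kcal/mol band used to separate small from large DDG values is an assumption made for illustration and is not stated in the text. The function names and example scores are hypothetical and are not values from Table 7 or Table S14.

```python
PROVEAN_CUTOFF = -2.5     # PROVEAN default threshold
DDG_NEUTRAL_BAND = 0.5    # kcal/mol; ASSUMED band for "slightly" +/- values


def classify_provean(score, cutoff=PROVEAN_CUTOFF):
    """Scores at or below the cutoff are predicted deleterious."""
    return "deleterious" if score <= cutoff else "neutral"


def classify_ddg(ddg, band=DDG_NEUTRAL_BAND):
    """Negative DDG indicates decreased stability, positive DDG increased."""
    if ddg < -band:
        return "destabilizing"
    if ddg > band:
        return "stabilizing"
    return "little effect"


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    for label, provean_score, ddg in [("variant A", -9.0, -1.2),
                                      ("variant B", -1.0, 0.1)]:
        print(label, classify_provean(provean_score), classify_ddg(ddg))
```

Because deleterious PROVEAN scores are the more negative ones, a substitution is flagged by both tools when its PROVEAN score falls below the cut-off and its DDG is strongly negative.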

4 Discussion

Although Y-domain is an important genomic component attributed to HEV replication, its functional implication remains unexplored [9, 11]. In the study presented here, we determined the functional and structural properties of the Y-domain through assessing its phylogenetic relationships, physicochemical properties, secondary and tertiary structure prediction, motif prediction and functional analysis.

The phylogenetic analysis revealed that the Y-domain has been highly conserved throughout evolution across all genotypes. Physicochemical parameters are vital in deciphering a protein’s characteristics and were therefore analysed computationally. The half-life of a protein is the time it takes for half of the amount of the protein in a cell to disappear after its synthesis. In this study, the estimated half-life of the protein was 30 h. The aliphatic index plays a role in governing the thermal stability of a protein: proteins with high aliphatic index values have a higher content of aliphatic amino acids and are comparatively more thermally stable. Thus, the high aliphatic index value (84.33) suggested that the Y-domain is a thermostable protein owing to its content of aliphatic hydrophobic amino acids (Ala, Val, Ile and Leu). Since aliphatic amino acids are hydrophobic in nature, they govern the protein–ligand interactions of the Y-domain [30]. The instability index is another factor governing the protein’s nature. A protein whose instability index is less than 40 is predicted to be stable, while a value above 40 predicts that the protein will be unstable. The results from this study revealed an instability index above 40 for the Y-domain, indicating its unstable nature [31, 32]. The protein was predicted to be thermostable and basic on the basis of its high aliphatic index and its pI of about 9.13. Furthermore, GRAVY is an important factor in determining the physicochemical properties of a protein. GRAVY values between −0.310 and −0.514 have been reported, and lower values suggest better interaction between water and protein [31, 32]. The Y-domain was found to be hydrophilic in nature, since its GRAVY value was −0.141. The prevalence of Thr, Leu, Arg, Ala, Val/Ser, Gly and Pro was observed in the Y-domain. Leu belongs to the “regulatory” group of the eight most potent amino acids, together with Tyr, Phe, Gln, Pro, His, Trp and Met [33, 34]. Charged amino acids such as Arg are mostly involved in ligand binding [35]. Ser is generally classified as a nutritionally nonessential (dispensable) amino acid but plays an essential role in several cellular processes [36]. It is well established that Gly residues provide enormous flexibility to the polypeptide chain owing to the absence of a side chain [37]. Pro has important structural and functional implications in proteins and performs important functions such as molecular recognition and intracellular signalling [38]. Evidence also suggests a role for Pro in essential signalling cascades [38]. Thus, our initial structural analysis, in terms of the major amino acid residues contributing to the Y-domain structure, points to its role in various regulatory functions.

Further, we carried out the structural analysis of the Y-domain of HEV. The secondary structure predicted by SOPMA showed the presence of α-helices, β-strands and random coils. The results revealed that the Y-domain had a higher percentage of α-helix than β-strand (40.37% α-helices, 20.64% β-strands and 32.57% random coils). The presence of Ser and Gly residues was also observed in the Y-domain. Structure prediction theoretically forms the basis for determining the functions of a novel protein [39,40,41,42,43]. Therefore, we next examined the tertiary structure of the Y-domain via homology modelling. The Y-domain structure prediction was performed successfully, and the generated 3D models were assessed by PROCHECK. After stereochemical evaluation, the 3D structure modelled through RaptorX was found to be of the best quality among the models, with 88.4% of residues in favoured regions (a good quality model should have more than 90% of its residues in the favoured regions). The 3D structure generated by RaptorX also showed a higher percentage of helices than strands (36% α-helix, 22% β-strand and 41% coil). Thus, the modelled Y-domain 3D structure was found to be stabilized by the secondary structure elements. To sum up, the structural investigation revealed that the ORF1 Y-domain of HEV has a mixed α/β structural fold (with a higher content of α-helices) and a prevalence of coils. It is noteworthy that our structural analyses of the Y-domain at the secondary level (as predicted by SOPMA) and the tertiary level (the 3D model generated by RaptorX) were in good agreement with each other. The hierarchical gap between secondary and tertiary structure is sometimes bridged in different ways through ‘compounds’ of secondary structure elements. In the Y-domain 3D structure, this connectivity was found to be made by long loops, called the coiled region.

Additionally, identification of clefts, tunnels and pores accessible to ligand molecules is essential in the structure-based drug design process [44, 45]. Thus, the modelled structure of the Y-domain (RaptorX) was examined using PDBsum analysis to reveal the presence of binding sites. Interestingly, several clefts and tunnels, in addition to a pore, were found. Clefts are defined as gaps in the protein structure and are important in determining the interaction of the protein with other molecules [46]. Clefts or pockets present on a protein’s surface are sizeable depressions that tend to be enzyme active sites [46]. Tunnels are defined as access paths that connect the interior of the protein molecule to the surrounding environment. Tunnels influence the reactivity of the protein and determine the nature and intensity of its interactions [47]. Thus, the presence of clefts and tunnels also strengthens our analysis and suggests the involvement of the Y-domain in interactions with other target molecules, including protein–protein interactions.

Post-translational modifications (PTMs) are considered a vital requirement for a protein to carry out the regulation of its various functions [48]. PTMs include diverse types of modifications, including phosphorylation, glycosylation, ubiquitination, acetylation, nitrosylation, etc. [49]. Several motifs were predicted in the Y-domain, including modified sites for glycosylation, phosphorylation and myristoylation. Such modifications have been shown to contribute to the regulation of cellular signal transduction, protein phosphorylation as well as transcription and translation [50,51,52]. Protein phosphorylation constitutes an essential mechanism for the proper establishment of an infection cycle in several intracellular pathogens [53, 54]. Phosphorylation is required for protein folding, signal transduction, intracellular localization, PPIs, transcription regulation, cell cycle progression, survival and apoptosis [48, 55, 56]. As suggested in previous reports, attachment of a myristoyl group regulates cellular signalling pathways in several biological processes [51]. Glycosylation has also been shown to modulate the intracellular signalling machinery [52]. From these findings, it is noteworthy that the Y-domain could perform crucial regulatory functions by interacting with other viral and host components, which points to its essentiality in HEV pathogenesis.

Furthermore, algorithm-based approaches were employed to examine the changes in protein stability in response to mutations. A previous investigation reported that the Y-domain palmitoylation site (C336C337) and alpha-helical segment (L410Y411S412W413L414F415E416) are indispensable in the life cycle of HEV [11]. Therefore, a combination of two different online predictors, PROVEAN and I-Mutant2.0, was used in order to increase the reliability of the predicted results. These two webservers examined the effect of single point mutations in these conserved Y-domain segments (palmitoylation site and alpha-helical segment). The PROVEAN server predicts whether a variation in the sequence of a protein affects its function [57, 58]. I-Mutant predicts the changes in the stability of a protein upon single point mutations (https://folding.biofold.org/i-mutant/i-mutant2.0.html). The PROVEAN tool considered these mutations deleterious, in line with earlier investigations [11]. Additionally, the I-Mutant2.0 analysis revealed seven mutations with highly negative DDG values, suggesting a destabilizing effect on the Y-domain. Thus, the in silico mutational analyses indicated that amino acid changes in the conserved regions may alter the secondary structure of the Y-domain, which might affect the structure–function relationship and, in turn, the overall infectivity of the virus. Our predicted molecular functions suggested the involvement of the Y-domain in RNA binding and RNA-directed RNA polymerase activity, which points to its involvement in significant processes of HEV replication. Moreover, the hydrolase activity identified among the molecular functions is consistent with our earlier result, in which the best-chosen template was a hydrolase, and further supports the involvement of the Y-domain in hydrolase activity. Furthermore, the identified biological processes, such as RNA processing, viral protein processing, replication and reproduction, were in accordance with earlier findings [11, 59]. Thus, our gene ontology findings are consistent with the previous investigation [11].

To sum up these observations, our proposed hypothesis is further substantiated by the existing literature, which has demonstrated the critical role of the Y-domain in the life cycle of HEV.

5 Conclusions

The non-structural ORF1 Y-domain plays an essential role in the intracellular membrane binding and replication of HEV. Owing to the presence of two conserved segments (a potential palmitoylation site and an alpha-helix segment), the Y-domain is an indispensable component of the HEV life cycle. Therefore, a structural and functional analysis of the Y-domain was conducted to clarify its role in viral pathogenesis. The in silico analysis revealed that the Y-domain is unstable, hydrophilic and basic in nature. We modelled the 3D structure of the Y-domain of HEV to assist further in-depth analysis. The structural analysis revealed a mixed α/β structural fold of the Y-domain, with a higher percentage of alpha-helices than beta-strands and a prevalence of random coils. The mutational analysis suggested that mutations in the conserved segments may affect the overall structure of the protein, which might in turn affect its function. Our gene ontology findings on the Y-domain showed its involvement in several binding and catalytic activities as well as significant biological processes, in accordance with the previous report. Detailed experimental confirmation of these analyses is envisaged to provide a better understanding of the HEV life cycle. Our data can be used as an initial platform for further research to determine the structural characteristics of the Y-domain of HEV.