Introduction

Hepatitis E is a zoonotic liver disease that infects 20 million people and causes 44,000 deaths each year [1]. Hepatitis E virus (HEV) has been detected in a wide range of species and is the sole hepatotropic virus with documented zoonotic transmission [2]. HEV belongs to the family Hepeviridae, which is subdivided into the subfamilies Orthohepevirinae and Parahepevirinae. The Orthohepevirinae subfamily comprises four genera (Avihepevirus, Chirohepevirus, Paslahepevirus and Rocahepevirus), whose members infect mammals and birds, while the Parahepevirinae subfamily contains the genus Piscihepevirus, whose members infect trout and salmon. The members of the species Paslahepevirus balayani are divided into eight HEV genotypes. Genotypes 1–4 are associated with self-limited acute hepatitis, with transmission that is either exclusively human-to-human (HEV-1 and HEV-2) or zoonotic (HEV-3 and HEV-4) [3, 4]. Genotypes 5–8 infect wild mammals [5,6,7], with a single report of HEV-7 in a human so far [8].

HEV subtyping criteria have been debated, mainly because of their subjectivity and lack of consistency [9,10,11]. Although subtyping is not within the remit of the International Committee on Taxonomy of Viruses (ICTV), the ICTV Hepeviridae Study Group has tried to develop an agreed-upon set of subtype reference sequences to improve clarity for researchers, epidemiologists and clinicians [12]. Currently, viruses in genotype 3 of the species Paslahepevirus balayani are classified into 13 recognized subtypes. Unassigned HEV sequences can be recognized as a newly proposed subtype if at least three complete genomic sequences from epidemiologically unrelated sources form a distinct phylogenetic group [13]. The distribution of genotypes and subtypes differs between geographical regions and hosts [13, 14] and may be associated with different clinical outcomes [15, 16]. Although distance-based criteria can be used to assign new sequences to subtypes, their thresholds are not clearly established. Therefore, sequences are assigned to subtypes according to their phylogenetic position relative to reference sequences [10, 13].
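The criterion for proposing a new subtype described above can be written as a simple decision rule. The following is a minimal sketch, not an official ICTV procedure: the `Genome` record, its `source_id` field (a stand-in label for an epidemiologically distinct source) and the `forms_distinct_group` flag (the outcome of a separate phylogenetic analysis) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Genome:
    accession: str
    is_complete: bool  # True if a complete genomic sequence is available
    source_id: str     # hypothetical label for an epidemiologically distinct source


def qualifies_as_proposed_subtype(
    clade: Iterable[Genome],
    forms_distinct_group: bool,
    min_genomes: int = 3,
) -> bool:
    """Return True if the clade meets the criterion stated in the text:
    at least three complete genomes from epidemiologically unrelated
    sources that form a distinct phylogenetic group.

    Whether the sequences form a distinct group must be decided from the
    phylogeny itself; here it is passed in as a precomputed flag.
    """
    # Count each unrelated source only once, and only for complete genomes.
    unrelated_sources = {g.source_id for g in clade if g.is_complete}
    return forms_distinct_group and len(unrelated_sources) >= min_genomes
```

Under this reading, two complete genomes from the same outbreak or herd count as a single source, so a clade needs complete genomes from three different sources to qualify.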

In South America, knowledge of HEV genetic variability is limited because few partial and complete sequences have been deposited in GenBank (~340 entries compared with ~11,000 from Europe) (Supplementary Table 1). All previously reported sequences clustered into genotype 3 and were detected in swine herds, pork products, human cases, experimental studies in primates or environmental samples. The few partial sequences available from Brazil indicate that they cluster into subtypes 3b, 3c, 3d, 3f, 3h and 3i. However, the absence of complete genomes prevents accurate analysis of HEV genetic diversity in Brazil. Therefore, in this study, we sequenced available HEV-positive samples from previous studies to obtain complete genomic sequences.

Materials and methods

Serum and fecal samples were obtained from six HEV-positive swine and one human case previously reported in studies from the northeastern, southeastern, and southern regions of Brazil [17,18,19,20,21]. The study with human sampling was approved by the Fiocruz institutional ethical committee (protocol PO 307/06) [18]. Sample data, including accession numbers, hosts, geographical area, genotype and reported subtype, are detailed in Table 1. RNA was extracted and purified from stool or serum samples using TRIzol™ Reagent (Invitrogen) following the manufacturer’s recommendations. cDNA was synthesized with the ImProm-II™ Reverse Transcription System (Promega). RT-PCR was performed using the SuperScript™ III One-Step RT-PCR System with Platinum™ Taq DNA Polymerase kit (Invitrogen) following the manufacturer’s instructions. Primers were designed using, as templates, complete genomes genetically close to the 156 Brazilian partial HEV-3 sequences available in GenBank.

Table 1 Sample origin and subtyping

The complete genomes of PRsw1 and PEsw1 were obtained by Sanger sequencing, and the other positive samples were also initially sequenced by the Sanger method (primers available in Supplementary Table 1). Samples for which Sanger sequencing failed were subsequently subjected to DNA-based high-throughput sequencing (HTS) on a MiSeq instrument (Illumina) [22]. Sequencing libraries were prepared using the Nextera XT Library Kit, and sequencing was performed using a paired-end strategy with a V3 150-cycle flow cell. Low-quality sequencing reads were removed using the default parameters of Trimmomatic 0.36 [23]. Assembly was performed using Velvet 1.0.10 [24] with default parameters. Sequence data were compared to all HEV genotype 3 complete or nearly complete genomes (genome size > 6.5 kb; n = 747) available in GenBank as of 28 March 2023, including the references for each HEV-3 subtype [12, 13]. To assess the proportion of unresolved versus fully resolved quartets, the phylogenetic signal was verified through likelihood mapping analysis of 10,000 random quartets [25]. Then, Maximum Likelihood phylogenetic reconstruction was performed on the IQ-Tree server (http://iqtree.cibiv.univie.ac.at/) with default settings, using the best-fit substitution model TIM2 + F + R4 and 1000 ultrafast bootstrap replicates.
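The dataset selection and tree inference steps described above can be approximated locally. The sketch below is an illustration under stated assumptions, not the pipeline used in this study: the analysis was run on the IQ-Tree web server, whereas the command assembled here follows the IQ-TREE 1.x command-line flags (`-s`, `-m`, `-bb`, `-alrt`, `-abayes`, `-lmap`). The function names, the file name in the usage note and the number of SH-aLRT replicates are assumptions.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_near_complete(
    records: Iterable[Tuple[str, str]], min_length: int = 6500
) -> Dict[str, str]:
    """Keep genomes longer than 6.5 kb, the cut-off stated in the text.

    Gap characters are ignored so that aligned input is handled as well.
    """
    return {h: s for h, s in records if len(s.replace("-", "")) > min_length}


def iqtree_command(
    alignment_path: str,
    model: str = "TIM2+F+R4",
    ufboot: int = 1000,
    lmap_quartets: int = 10000,
) -> List[str]:
    """Build (but do not run) an IQ-TREE 1.x command line.

    Model, ultrafast bootstrap replicates and likelihood-mapping quartets
    come from the text; 1000 SH-aLRT replicates is an assumption.
    """
    return [
        "iqtree", "-s", alignment_path,
        "-m", model,
        "-bb", str(ufboot),            # ultrafast bootstrap replicates
        "-alrt", "1000",               # SH-aLRT replicates (assumed value)
        "-abayes",                     # approximate Bayes test
        "-lmap", str(lmap_quartets),   # likelihood mapping on random quartets
    ]
```

For example, `iqtree_command("hev3_alignment.fasta")` returns an argument list that could be passed to `subprocess.run`, where `hev3_alignment.fasta` is a hypothetical file name. IQ-TREE 2 uses different option names for some of these settings, so the flags should be checked against the installed version.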

Results and discussion

We obtained two complete genomic sequences (PRsw1 and PEsw1) (Table 1), four nearly complete porcine-derived genomic sequences (RJsw1, RJsw2, RJsw3 and RJsw4), and a fragment of approximately 1000 nucleotides of the first recorded human-derived sequence in Brazil (RJh1) (Fig. 1a). The two complete genomic sequences exhibited the typical features of an HEV genome and were 7215 and 7233 nucleotides in length, respectively. Both HEV-3 genomes contain three partially overlapping open reading frames, with ORF3 overlapping ORF1 and ORF2 (Fig. 1b).

Fig. 1

Sample location and HEV genome organization. (A) Geographic distribution of HEV sequenced in domestic pigs sampled in different regions of Brazil. Map built with Quantum GIS (http://qgis.osgeo.org). (B) Comparison of the genome organization of the two complete sequences obtained in this study. Complete sequence length is given in nucleotides; the methyltransferase (Met), putative papain-like cysteine protease (PCP), hypervariable region (HVR), X domain, RNA helicase (Hel) and RNA-dependent RNA polymerase (RdRp) are indicated

Because we were not able to obtain complete genomic sequences for all positive samples, and considering the epidemiological importance of the genetic information, we performed further analyses with multiple datasets to compare our sequences with HEV subtype reference sequences. Accordingly, we compared the two complete genomic sequences obtained, the nearly complete genomic sequences and the complete capsid sequences of all six sequences from pigs to a set of HEV-3 reference sequences (Fig. 2a and b). Our analysis suggests that the genomic sequences from the southern (PRsw1) and southeastern (RJsw1) regions, previously classified as subtype 3i, likely represent a new subtype, with a nucleotide divergence of 15.5% from subtype 3m, together with two other sequences found in Uruguay (MW596896 and MZ969073) (Fig. 3; Supplementary Table 2). Moreover, PEsw1 from the northeastern region, classified as subtype 3f in our previous study [20], diverged from the HEV-3f reference sequence by 12.0% at the nucleotide level but remained within subtype 3f. Interestingly, both the nearly-complete-genome-based and capsid-based analyses suggest that strain RJsw2 (from the southeastern region) clusters between subtype 3i and a group of unassigned sequences represented by MF959764 (3-unV). Strains RJsw3 and RJsw4, from the same region, are related to subtype 3b (Fig. 2b). On the one hand, the robust results of the capsid-based phylogeny corroborate that ORF2 sequences can be used as an alternative in the absence of full genomic sequences (Fig. 2c), as previously suggested [11, 20, 26], including subgenomic regions of ORF1 and ORF2 optimized for robust phylogenetic inference and subtyping [26]. On the other hand, these findings illustrate the high genetic variability of the HEV strains circulating in Brazil and underscore the need to increase the number of complete genomic sequences from different samples from Latin America. Overall, the consistent topology derived from multiple inferences supports the robustness of the newly proposed subtype.
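Nucleotide divergence values such as those reported above are typically derived from pairwise comparisons of aligned sequences. As a minimal sketch, the function below computes an uncorrected pairwise distance (p-distance), skipping alignment columns that contain a gap or an ambiguous base in either sequence. This is an assumption about the kind of metric involved: the text does not state whether the reported divergences are uncorrected or model-corrected, and the function name is hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned nucleotide sequences.

    Columns with a gap or ambiguous base in either sequence are skipped,
    so the denominator is the number of sites actually compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

Percent divergence is `100 * p_distance(a, b)` and percent identity is `100 * (1 - p_distance(a, b))`, so the same function covers both the divergence from a subtype reference and the identity between two closely related strains.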

Fig. 2

Phylogenetic trees based on nucleotide sequences of (A) complete genomic, (B) nearly complete genomic, (C) complete capsid gene and (D) partial capsid sequences. SH-aLRT/aBayes/ultrafast bootstrap support values of ≥ 0.80. Detailed information on the evolutionary divergence between our sequences and the reference sequences for each dataset is available in Supplementary Table 2. Reference sequences for HEV-3 subtypes, including the remaining unassigned (un) HEV-3 sequences: AF082843 (3a), AP003430 (3b), FJ705359 (3c), AB248521 (3e), AB369687 (3f), AF455784 (3g), JQ013794 (3h), FJ998008 (3i), AY115488 (3j), AB369689 (3k), JQ953664 (3l), KU513561 (3m), FJ906895 (3ra), AB290313 (3-unI), MF959765 (3-unII), LC260517 (3-unIII), MK390971 (3-unIV), MF959764 (3-unV), KP294371 (3-unVI)

Fig. 3

Phylogenetic tree based on nucleotide sequences of nearly complete genomic sequences, including 747 HEV genotype 3 sequences. To improve visualization, subtype branches are collapsed; hence sequences RJsw3 and RJsw4, clustering into subtype 3b, and PEsw1, clustering into subtype 3f, are omitted. Detailed information, including all sequences used, the alignment, and the complete tree, is available in the supplementary material. SH-aLRT/aBayes/ultrafast bootstrap support values of ≥ 0.80 were replaced by a circle. Reference sequences for HEV-3 subtypes, including the remaining unassigned (un) HEV-3 sequences: AF082843 (3a), AP003430 (3b), FJ705359 (3c), AB248521 (3e), AB369687 (3f), AF455784 (3g), JQ013794 (3h), FJ998008 (3i), AY115488 (3j), AB369689 (3k), JQ953664 (3l), KU513561 (3m), FJ906895 (3ra); unassigned sequences are AB290313, MF959765, LC260517, MK390971, MF959764, KP294371

To investigate the zoonotic origin of the first autochthonous human case of acute hepatitis E reported in Brazil, we compared partial fragments of the capsid gene (ORF2) and ORF1. Phylogenetic analyses using the ORF2 fragment, as well as the previously published ORF1 fragment, indicated that the sequence from the human patient (RJh1) clusters among those from pigs from the southeastern region (RJsw3 and RJsw4), with nucleotide identities of 99.5 and 99.6%, respectively (Fig. 2d; Supplementary Fig. 1 and Supplementary Table 2). In the previous study [18], only small fragments of ORF1 and ORF2 were available; by providing a larger fragment of ORF2, our data strengthen the evolutionary inferences and the evidence of zoonotic transmission. Therefore, regardless of whether strains RJsw3 and RJsw4 represent a new subtype, our results provide strong evidence that the first reported human case of hepatitis E in Brazil is likely of zoonotic origin.

Notably, our study is limited by the failure to obtain larger fragments for four sequences, which was probably related to the available volume and preservation state of the samples. However, except for the human sample, the study provided a good amount of retrospective information that allowed robust analysis. Although our findings and those previously published indicate that capsid-based analyses are robust, full genomic sequences would be needed before proposing a new subtype, as recommended by the ICTV Hepeviridae Study Group [12].

Finally, the genetic variability of HEV in Brazil, along with the occurrence of zoonotic transmission, illustrates the challenges and gaps in knowledge of HEV epidemiology in South America. Therefore, further studies, including more representative sequences, are needed to investigate the genetic diversity of HEV in humans, animal reservoirs, food, and environmental samples, in order to obtain an accurate picture of the genetic variability of HEV in South America and its implications for zoonotic transmission and pathogenicity.