Porcine epidemic diarrhea virus (PEDV) is an enveloped, single-stranded, positive-sense RNA virus classified within the family Coronaviridae, genus Alphacoronavirus [16]. It is the causative agent of porcine epidemic diarrhea (PED), which was first reported in Belgium and the United Kingdom in 1978 [16]. PEDV causes a devastating enteric disease characterized by watery diarrhea, dehydration and significant mortality in piglets, resulting in tremendous economic losses to the swine industry in Europe and Asia, including Japan, Korea, and China [6, 7, 17, 20].

The PEDV genome is approximately 28 kb in length and contains seven open reading frames (ORFs), which encode replicase 1a and 1b, the spike (S) protein, ORF3, the envelope (E) protein, the membrane (M) protein, and the nucleoprotein (N), arranged in the order 5′-replicase(1a/1b)-S-ORF3-E-M-N-3′ [14]. Two long ORFs (ORF1a and ORF1b), which occupy two-thirds of the genome, encode the non-structural replicase polyproteins (replicase 1a and 1b). Genes for the major structural proteins S, E, M, and N are located downstream of the replicase gene. The S glycoprotein makes up the large surface projections of the virion and plays an important role in binding to specific host cell receptor glycoproteins, with subsequent penetration into the cells occurring via membrane fusion. The S glycoprotein also stimulates production of neutralizing antibodies by the host [8]. The M protein is essential for viral envelope formation and release. It not only induces antibodies that neutralize the virus in the presence of complement [18] but also stimulates the production of interferon-α (IFN-α) [13]. ORF3 is the only accessory gene, and it has been suggested to be an important determinant of virulence in PEDV. The virulence of the virus can be reduced by altering the ORF3 gene through cell culture adaptation, in a manner similar to that for transmissible gastroenteritis virus (TGEV) [19, 23], and variation in ORF3 may be a marker of adaptation to cell culture and virus attenuation [15, 19]. Thus, differentiating strains by their ORF3 gene sequences could be a valuable tool for molecular epidemiological studies of PEDV [6, 15, 19].

To reveal the characteristics of this virus and determine more precisely the relationships among the PEDV strains currently circulating in China and other PEDV strains, the complete genomic sequence of the SHQP/YM/2013 strain was determined and analyzed.

Intestinal tracts were collected from dead piglets during an outbreak of diarrhea on a breeding farm in Shanghai in 2013. The swine on this farm had been immunized with an inactivated vaccine against transmissible gastroenteritis (TGEV H) and porcine epidemic diarrhea (CV777). The intestinal samples were confirmed to be positive for PEDV using a commercial real-time reverse transcription polymerase chain reaction (RT-PCR) kit (SuoAo, Beijing, China).

The SHQP/YM/2013 strain was characterized further by complete genome sequencing. Primers were designed to anneal to sites that are highly conserved among PEDV sequences available in the GenBank database, and the complete genome sequence was determined using a primer-walking strategy. Viral RNA extraction, RT-PCR amplification, and cloning were performed according to conventional protocols. The 5′ and 3′ end regions were amplified using 5′ and 3′ full RACE kits (TaKaRa, Dalian, China) according to the manufacturer’s instructions. The resulting plasmids were sequenced by the Invitrogen Company (Life Technologies, Shanghai, China). Each nucleotide position was confirmed by three identical sequencing results. Sequences were assembled and analyzed using the DNASTAR software package (DNASTAR Inc., Madison, WI, USA).

Phylogenetic analysis based on the seven coding regions of the SHQP/YM/2013 strain and the other PEDV strains available in the GenBank database was carried out. Phylogenetic trees were constructed by the neighbor-joining method using MEGA, version 4 [21]. The topology of trees based on the nucleotide sequences was obtained by majority-rule consensus using 1000 bootstrap replicates, shown as percentages, and bootstrap values greater than 60 % were considered statistically significant for grouping. The newly characterized sequence has been deposited in the GenBank database under the accession number KJ196348.
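The 60 % bootstrap criterion described above can be illustrated with a minimal sketch. It does not reproduce the MEGA analysis or build any tree. It only shows how support percentages are tallied, assuming each bootstrap replicate tree has already been reduced to the set of clades it contains. The taxon labels A to D and the four toy replicates are hypothetical.

```python
from collections import Counter

def bootstrap_support(replicate_clades):
    """Return the percentage of replicates in which each clade appears.

    replicate_clades: one set of clades per bootstrap replicate,
    where each clade is a frozenset of taxon names.
    """
    counts = Counter()
    for clades in replicate_clades:
        counts.update(set(clades))  # count each clade once per replicate
    n = len(replicate_clades)
    return {clade: 100.0 * c / n for clade, c in counts.items()}

def supported(support, threshold=60.0):
    """Keep only clades whose support exceeds the threshold (in percent)."""
    return {clade for clade, pct in support.items() if pct > threshold}

# Hypothetical toy data: four replicates over taxa A, B, C, D.
replicates = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"B", "C"})},
    {frozenset({"A", "C"}), frozenset({"B", "D"})},
]

support = bootstrap_support(replicates)
well_supported = supported(support)
```

In this toy case the clade {A, B} appears in three of four replicates (75 %) and passes the threshold, whereas {C, D} appears in two of four (50 %) and does not.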

The complete genomic sequence of SHQP/YM/2013, excluding the poly(A) tail, comprises 28,038 nucleotides (nt) and has an overall GC content of 41.81 %. The genomic organization of SHQP/YM/2013, which is typical of all previously characterized PEDV strains, is as follows: 5′ UTR-replicase (1a/1b)-spike (S)-ORF3-envelope (E)-membrane (M)-nucleoprotein (N)-3′ UTR (Table 1). Untranslated regions (UTRs) were found at both ends. The 5′ UTR was 292 nt long, with a GC content of 44.86 %, which is relatively high compared with that of the whole genome. The 3′ UTR, located immediately downstream of the N gene, was 334 nt long and had a GC content of 46.41 %. The seven coding regions encode four structural proteins (S [nt 20,634 to 24,794], E [nt 25,449 to 25,679], M [nt 25,687 to 26,367], and N [nt 26,379 to 27,704]), two nonstructural proteins (replicase 1a [nt 293 to 12,601] and replicase 1b [nt 12,601 to 20,637]), and the single accessory protein, ORF3 (nt 24,794 to 25,468).
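The coordinates above can be cross-checked with simple arithmetic. The following sketch takes the reported positions as 1-based and inclusive and assumes that each structural or accessory gene span includes one stop codon. Both conventions are assumptions made for illustration, and the helper names are hypothetical. The replicase genes are left out of the protein-length calculation because ORF1b is expressed through ribosomal frameshifting.

```python
# Coordinates as reported in the text (assumed 1-based, inclusive).
GENOME_LENGTH = 28_038
CODING_REGIONS = {
    "replicase 1a": (293, 12_601),
    "replicase 1b": (12_601, 20_637),
    "S": (20_634, 24_794),
    "ORF3": (24_794, 25_468),
    "E": (25_449, 25_679),
    "M": (25_687, 26_367),
    "N": (26_379, 27_704),
}
UTR_LENGTHS = {"5' UTR": 292, "3' UTR": 334}

def region_length(start, end):
    """Length in nt of a 1-based, inclusive interval."""
    return end - start + 1

def protein_length(nt_length):
    """Amino acid count, assuming the span ends with one stop codon."""
    if nt_length % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return nt_length // 3 - 1

lengths = {name: region_length(*span) for name, span in CODING_REGIONS.items()}
proteins = {name: protein_length(lengths[name])
            for name in ("S", "ORF3", "E", "M", "N")}

# The UTR lengths should account for everything outside the coding span.
first_start = min(start for start, _ in CODING_REGIONS.values())
last_end = max(end for _, end in CODING_REGIONS.values())
five_prime_ok = first_start - 1 == UTR_LENGTHS["5' UTR"]
three_prime_ok = GENOME_LENGTH - last_end == UTR_LENGTHS["3' UTR"]
```

Under these assumptions the computed gene lengths agree with the values given elsewhere in the text (for example, 4161 nt and 1386 aa for S, and 675 nt and 224 aa for ORF3), and the two UTR lengths are consistent with the total genome length.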

Table 1 Differences in genome organization between the SHQP/YM/2013 strain and the CV777 strain

The nucleotide and amino acid (aa) sequences of different regions of the SHQP/YM/2013 genome were compared with those of 32 other PEDV strains (Table 2). The SHQP/YM/2013 genome exhibited 96.6 %–99.8 % identity to those of other strains, with the highest identity (99.8 %) to JS-HZ2012. The 5′ UTR of SHQP/YM/2013 shared 96.6 %–99.3 % identity with those of other PEDV strains. The 3′ UTR of SHQP/YM/2013 was more conserved than the 5′ UTR and displayed 97.3 %–99.8 % identity to those of other PEDV strains.
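Identity percentages of this kind are computed from pairwise alignments. The sketch below shows one common convention, in which columns that are gaps in both sequences are skipped and a gap opposite a residue counts as a mismatch. The source does not state which convention the DNASTAR analysis used, so this is an assumption, and the function name and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored;
    a gap opposite a residue is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical toy alignments, not PEDV data.
identity_no_gap = percent_identity("ACGTACGTAC", "ACGTACGTAA")
identity_with_gap = percent_identity("ACG-T", "ACGAT")
```

The first toy pair differs at one of ten positions (90 % identity), and the second has one gapped column out of five (80 % identity under this convention).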

Table 2 Nucleotide and amino acid sequence identity (%) of different regions of the SHQP/YM/2013 genome compared with those of the other viruses (values in bold show highest identities)

Two open reading frames, ORF1a and ORF1b, reside in the first two-thirds of the SHQP/YM/2013 genome and are translated into non-structural proteins 1a and 1b. The replicase 1a gene was predicted to encode a protein of 4103 aa, and the replicase 1b gene a protein of 2678 aa. Nucleotide sequence analysis revealed no deletions or insertions in the replicase gene of any of the PEDV strains. Both the replicase 1a and the replicase 1b proteins were highly conserved, with the replicase 1b protein exhibiting the highest amino acid sequence similarity among strains (99.0 %–99.9 %) (Table 2). The frameshift “slippery sequence” UUUAAAC [4] in ORF1b was identified in SHQP/YM/2013, as shown in Fig. 1. The sequences downstream of UUUAAAC were predicted to form a pseudoknot that supports the translational frameshift [4, 10, 12]. Based on analysis of the nucleotide sequences of ORF1a and ORF1b, all of the PEDV strains could be divided into three groups (Fig. 2A and B). Group 1 comprised one Chinese strain (LZC), one Korean strain (SM98) and one European strain (CV777). Group 2 consisted of one vaccine strain (attenuated DR13) and two Chinese PEDV field strains (SD-M and JS2008). Group 3 was made up of virulent DR13, 10 USA strains isolated in 2013 and 16 Chinese strains (except CH/S) isolated during PED outbreaks in China in 2011–2013.
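Locating the slippery sequence mentioned above is a plain motif search. The sketch below reports every 1-based position at which UUUAAAC occurs, accepting either RNA or cDNA input. It does not predict the downstream pseudoknot, and the toy sequence is hypothetical rather than taken from the SHQP/YM/2013 genome.

```python
SLIPPERY = "UUUAAAC"

def find_slippery_sites(seq, motif=SLIPPERY):
    """Return 1-based start positions of the motif in an RNA or cDNA sequence."""
    rna = seq.upper().replace("T", "U")  # treat cDNA input as RNA
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = rna.find(motif, start + 1)
    return positions

# Hypothetical toy sequence containing the motif twice (once written as cDNA).
toy = "AUGGCUUUAAACGGGUACGCTTTAAACGG"
sites = find_slippery_sites(toy)
```

In a real genome, a candidate site would be of interest only if it lies in the ORF1a/ORF1b overlap and is followed by sequence capable of forming the pseudoknot described above.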

Fig. 1 The putative ribosomal frame shift region. The nucleotide sequence covers the pseudoknot structures of PEDV reference strains. The putative slippery sites are represented by the dotted-line box, the stems are boxed, the loops are indicated by thick dark lines, and the 144 nucleotides (N144) are boxed in gray

Fig. 2 Phylogenetic analysis using the neighbor-joining method based on nucleotide sequences of different genes (A: ORF1a; B: ORF1b; C: S; D: M; E: N; F: ORF3) of PEDVs. Bootstrap analysis with 1,000 replicates was performed to determine the percentage reliability of each internal node, with values >60 % considered significant. PUR46-MAD is an outgroup control. The scale bar indicates nucleotide substitutions per site. The sequence of the SHQP/YM/2013 strain is indicated by the black triangle

The S gene was located immediately downstream of ORF1b and encoded a predicted protein of 1386 amino acids. The gene was 4161 nt long and was therefore 9 nt longer than that of the PEDV reference strain CV777 (Table 1). Compared with CV777, two insertions (at positions 56–59 and 140) and one deletion (at positions 156–157) were observed (Fig. 3). The S gene shared 93.8 %–99.4 % nucleotide sequence identity with those of other PEDV strains (Table 2). Phylogenetic analysis of the S gene nucleotide sequences revealed that all PEDV strains in this study could be separated into three groups (Fig. 2C). Group 1 comprised one Japanese strain (NK) and seven strains from South Korea (KNU-0801, KNU-0901, KNU-0903–KNU-0905, Spk1 and Chinju 99). Group 2 comprised two vaccine strains (the attenuated strains DR13 and CV777 vs), two European strains (CV777 and Br1/87), two Korean field strains (SM98 and virulent DR13), three strains from Japan (parent 83p-5, 100th-passaged 83p-5 and MK) and 12 strains from China, including six field strains from 2011. SHQP/YM/2013 belonged to group 3, which also included four Korean field strains (KNU-0802, KNU-0902, CNU-091222-01 and CNU-091222-02) from 2008–2009, 10 USA field isolates from 2013, and 15 Chinese strains isolated during severe PED outbreaks in China during 2011–2013.

Fig. 3 Amino acid sequence alignment of the S glycoprotein genes of the SHQP/YM/2013 and PEDV reference strains. The dashes (−) indicate the deleted sequences. Insertions and deletions in PEDV isolates are boxed

The E gene was 231 nucleotides long, encoding a protein of 76 amino acids. It shared 93.5 %–98.7 % amino acid sequence identity with those of other PEDV strains. The M gene was 681 nucleotides long and encoded a protein of 226 amino acids. It had no nucleotide deletions or insertions but did show point mutations. The M gene had a conserved ATAAAC sequence 11 nucleotides upstream of the initiator ATG, as previously recognized in Br1/87 [9]. A highly conserved domain of 12 amino acids (SWWSFNPETDAL) was located at aa 108 to 119. These residues are almost completely conserved across the entire family Coronaviridae [1]. Sequence analysis of the complete M gene showed that the PEDVs fell into three groups (Fig. 2D). The SHQP/YM/2013 strain belonged to the third group, which comprised three Japanese strains (JMe2, parent 83p-5, 100th-passaged 83p-5), 10 Korean strains, 10 USA field isolates from 2013, and all Chinese isolates (excluding LZC).

The N gene was 1326 nucleotides in length, encoding a polypeptide of 441 amino acids. The amino acid sequence had 93.2 %–99.3 % identity to those of other PEDV strains, with the highest identity (99.3 %) to BJ-2011-1, CH/FJZZ-9/2012 and JS-HZ2012. The N protein of SHQP/YM/2013 had 16 amino acid substitutions compared with CV777. All PEDV strains could be divided into two groups based on their N gene sequences (Fig. 2E). Group 2 comprised 10 USA field isolates from 2013 and 16 Chinese strains, including SHQP/YM/2013, isolated in China during 2004–2013.

The ORF3 gene, which is an accessory gene, is located between the structural genes for S and E. It had a single ORF of 675 nucleotides encoding a protein of 224 amino acids. The ORF3 gene had a conserved sequence (CTAGAC) 46 nucleotides upstream of the initiator ATG, similar to that described above for the M gene. Based on phylogenetic analysis of ORF3, all of the PEDV strains could be divided into three groups (Fig. 2F). The first group had two subgroups (G1-1 and G1-2). Seven Chinese field isolates from 2011–2012 formed one subgroup (G1-1), and three Chinese field isolates (CH/GSJIII/07, JS2008 and SD-M) formed the second subgroup (G1-2), together with vaccine strains (the attenuated DR13 and CV777 vs) and the DBI865 Korean field isolate. The second group included the CV777, Br1/87, SM98 and LZC strains. The third group, containing six Korean field isolates, 10 USA field isolates from 2013 and 19 Chinese isolates, had two subgroups (G3-1 and G3-2). Two Korean strains (virulent DR13 and Chinju99) and one Chinese field isolate formed one subgroup (G3-1), and 18 Chinese field strains, including SHQP/YM/2013, formed the second subgroup (G3-2) with 10 USA field isolates from 2013 and four Korean field isolates from 2007.

As mentioned above, SHQP/YM/2013 was obtained in 2013 from a piglet with severe diarrhea on a farm in eastern China where the swine had been vaccinated with inactivated transmissible gastroenteritis (TGEV H) and porcine epidemic diarrhea (CV777) vaccine. According to the investigation of the pig farm during the outbreak, about 90 % of piglets presented typical clinical signs of PED, and most of them died, whereas none of the sows immunized with CV777 showed any clinical signs. However, we also detected PEDV in sows’ milk, which suggests vertical transmission of the virus, as reported previously [20].

We have determined the complete genomic sequence of SHQP/YM/2013 and analyzed the phylogenetic relationships among PEDV strains. According to the complete M gene sequence, all Chinese PEDV strains (except LZC) belong to group 3, and SHQP/YM/2013 is especially closely related to 10 strains obtained from the USA in 2013 and two Korean field strains isolated in 2007.

The most highly variable regions were located in the S and ORF3 genes. As in other coronaviruses, the S and ORF3 genes have been hypothesized to be important in the virulence and pathogenesis of PEDV infections [3, 8, 19, 22]. The S protein is known to play pivotal roles in interacting with cellular receptors to mediate viral entry and in inducing neutralizing antibodies in the natural host [2, 5, 11]. The same insertions and deletion in the S gene were observed among prevailing PEDV strains in China (except SD-M, JS2008, CH/BJSY/2011 and CH/JL/2011) and in two Korean strains (KNU-0802 and KNU-0902) isolated during 2008–2009. Our analysis of the S gene showed that all of the PEDV strains fell into three groups. Recent prevalent Chinese PEDV field isolates were divided into two different groups (group 2 and group 3), which shows that two different genotypes are prevalent in China. The prevalent PEDV isolates in China, including SHQP/YM/2013, are most closely related to 10 USA strains from 2013 and four Korean field strains from 2008–2009.

Phylogenetic analysis of the complete ORF3 gene sequence showed that all of the Chinese PEDV strains fell into three groups, and again the recent prevalent Chinese PEDV field isolates were divided into two different groups (group 1 and group 3). Eighteen Chinese field strains, including SHQP/YM/2013, formed the second subgroup (G3-2) with 10 USA field isolates from 2013 and four Korean field isolates from 2007. These strains are genetically different from the CV777 and attenuated DR13 vaccine strains (G1-2), which have been used for prevention of PEDV infection in China. Therefore, recent prevalent Chinese PEDV field isolates represent a new genotype that differs from the genotype that includes the vaccine strains. Based on phylogenetic analysis of the M, S and ORF3 genes, our study suggests that the prevalent PEDV isolates in China may have originated from Korean strains.

Since October 2010, outbreaks of PED characterized by high morbidity (approaching 100 %) and mortality (80 % to 100 %) in newborn piglets have emerged in China [20]. Although bivalent commercial vaccines against TGEV and PEDV infection (inactivated transmissible gastroenteritis [TGEV H] and porcine epidemic diarrhea [CV777]) are used on swine farms, PED still occurs in immunized pig herds in China. This suggests that these vaccines no longer confer adequate protection against the strains currently circulating. Therefore, it is necessary to investigate further the molecular biology of PEDV, as well as the mechanisms of its immunogenicity and pathogenesis. There is an urgent need to develop more effective vaccines to prevent outbreaks of PEDV-induced diarrhea.