Introduction

Porcine transmissible gastroenteritis virus (TGEV) is an enteropathogenic coronavirus. Like other coronaviruses, it is a pleomorphic enveloped virus that contains a large, positive-sense single-stranded RNA genome [20]. It is a major pathogen that replicates in the cytoplasm of villous epithelial cells in the small intestine, leading to severe villous atrophy and malabsorptive diarrhea and resulting in significant economic losses in the swine industry, where the mortality rates may reach 100% in the newborn piglets [23]. TGEV was identified as an etiological agent of transmissible gastroenteritis in swine in the United States in 1946 [7] and was reported in many swine-producing countries between the late 1970s and the 1980s, including England, Japan, China, Belgium, Africa and Australia [13, 28, 34]. The genomic sequence length is about 28.5 kb [33]. The genome contains nine open reading frames (ORFs) encoding four structural proteins (spike [S], envelope [E], membrane [M] and nucleoprotein [N]) and five nonstructural proteins (replicase 1a and 1b, 3a, 3b, and protein 7) arranged in the genome in the order 5′-replicase (1a/1b)-S-3a-3b-E-M-N-7-3′ [9].

About two-thirds of the coronavirus genome is devoted to encoding the viral replicase, which mediates coronavirus replication, transcription and translation [21]. The ORF1 region of the replicase gene is composed of two open reading frames, ORF1a and ORF1b, and translation of the ORF1a/1b polyprotein involves efficient ribosomal frameshifting [27]. The S protein is highly glycosylated and is believed to be the viral attachment protein. It interacts with porcine aminopeptidase N, which acts as a cell receptor for TGEV [17]. The interaction between coronavirus S protein and eIF3f plays a functional role in controlling the expression of host genes, especially genes that are induced during coronavirus infection [35]. Porcine respiratory coronavirus (PRCV) has a large deletion in the S gene, resulting in the loss of major antigenic sites B and C in the S protein [10]. The cell-culture-passaged TGEV strain TOY56 has point mutations in the spike gene that cause a shift from intestinal to respiratory tract tropism [29]. Mutations in the spike protein may therefore be an important indicator for evaluating the tropism and virulence of TGEV. The M protein, the main virion membrane protein, is mainly embedded in the lipid vesicle membrane and is connected to the capsular membrane during viral nucleocapsid assembly [15]. The E protein is a membrane-spanning protein, while the N protein is found within the viral membrane. The N protein has been shown to interfere with interferon signaling through various mechanisms [16, 19]. It has also been observed that the N protein of several coronaviruses can localize in the nucleolus, where it may perturb cell cycle activities of the host cell for the benefit of viral mRNA synthesis [6, 37]. ORF3 is composed of two open reading frames, ORF3a and ORF3b, and deletions in ORF3a are found in many TGEV strains and in PRCV [14, 22]. Some studies have suggested that while ORF3a is not essential for virulence, the deletion of this gene may affect viral virulence and tissue tropism [32]. ORF7 counteracts host-cell defenses and affects TGEV persistence, increasing TGEV survival through the negative modulation of downstream caspase-dependent apoptotic pathways [3, 4].

The objective of this study was to determine the complete genomic sequence of strain TGEV SHXB isolated in Shanghai, China. To distinguish strain TGEV SHXB from other domestic and international strains at the molecular level, we analyzed the differences between the nucleotide and deduced amino acid sequences of structural and nonstructural proteins of the TGEV strains as well as the PRCV strain ISU-1. This will enhance our understanding of the evolution of coronaviruses and lay the foundation for further development of a genetically engineered TGEV vaccine.

Materials and methods

Virus, cell culture, and virus passages

TGEV SHXB was isolated from porcine intestinal contents. STC cells were grown in Dulbecco’s modified Eagle medium (DMEM, HyClone, USA) supplemented with 10 % fetal bovine serum (FBS, GIBCO) and were maintained in maintenance medium (DMEM supplemented with 2 % FBS) at 37 °C in a 5 % CO2 incubator. Strain SHXB was passaged ten times in STCs.

Extraction of genomic TGEV RNA and RT-PCR

When virus-infected STCs showed 70-80 % cytopathic effect (CPE), cell culture flasks were frozen and thawed three times, and cell debris was pelleted by centrifugation for 20 min at 7000 × g. Culture supernatants from infected cells were collected and used for preparation of viral RNA. Total RNA was extracted using TRIzol Reagent (Invitrogen) according to the manufacturer’s instructions. The extracted RNA pellet was washed with 1 ml of 75 % alcohol, collected by centrifugation for 10 min at 8000 × g, and dried for about 5 min, and the resulting RNA pellet was resuspended in 20 µl of diethylpyrocarbonate (DEPC)-treated deionized water. Viral cDNA was generated by reverse transcription using PrimeScript Reverse Transcriptase (TaKaRa) according to the manufacturer’s instructions.

Extraction of the viral genome sequence

Specific oligonucleotide primers were designed based on the sequence of TGEV strain H16 (FJ755618.2), which was used as the reference genome (Table S1). As the viral genome is too long to clone in its entirety, it was divided into 27 segments. Each segment was 1000-1500 nt long, and adjacent segments overlapped by 200-400 nt. Primers were designed using Primer 5. For amplification of the TGEV subclones, reactions were carried out in a total volume of 50 µl containing 10 µl of 5× Q5 reaction buffer, 4 µl of 2.5 mM dNTPs, 0.5 µl of Q5 High-Fidelity DNA Polymerase (New England BioLabs), 2.5 µl of each specific primer, 1 µl of viral cDNA, and sterile deionized water. The PCR protocol was as follows: denaturation for 30 s, followed by 35 cycles of 98 °C for 7 s, 55-72 °C for 15 s (depending on the primers used, based on optimal annealing temperatures predicted using NEB Tm Calculator (www.neb.com /Tm Caculator)), and 72 °C for several minutes (depending on the size of the PCR product), and a final elongation step at 72 °C for 2 min. The 5′ and 3′ ends of the viral genome were confirmed by rapid amplification of cDNA ends using a 3′ and 5′-Full RACE Core Set with PrimeScript (TaKaRa). The PCR products were purified using a Gel Extraction Kit (Omega), cloned into the pJET1.2 vector (Thermo), and used to transform E. coli DH5α. Three positive colonies (validated by colony PCR) were sequenced by Invitrogen.
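The segmentation scheme and the reaction mix described above can be illustrated with a short sketch. It is not the procedure used to place the actual primers, which are listed in Table S1. The function names and the uniform amplicon length of 1300 nt with a 250-nt overlap are assumptions chosen for illustration; they happen to yield 27 segments for a 28,571-nt genome. The second function only works out the volume of water needed to bring the listed components to 50 µl.

```python
def tile_genome(genome_len, amplicon_len, overlap):
    """Return 1-based inclusive (start, end) coordinates of overlapping segments.

    Assumes a uniform amplicon length and overlap (a simplification; real
    primer positions depend on local sequence).
    """
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_len - overlap
    segments = []
    start = 1
    while True:
        end = min(start + amplicon_len - 1, genome_len)  # clip the last segment
        segments.append((start, end))
        if end == genome_len:
            break
        start += step
    return segments


def water_volume_ul(total_ul, component_volumes_ul):
    """Volume of water needed to reach the total reaction volume."""
    used = sum(component_volumes_ul)
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


if __name__ == "__main__":
    # Hypothetical uniform design: 1300-nt amplicons overlapping by 250 nt.
    segments = tile_genome(28571, 1300, 250)
    print(len(segments), segments[0], segments[-1])

    # Components from the text: buffer, dNTPs, polymerase, two primers, cDNA.
    print(water_volume_ul(50, [10, 4, 0.5, 2.5, 2.5, 1]))
```

Under these assumed settings, the last segment is clipped at the genome end and is slightly shorter than the others, which is still within the 1000-1500 nt range given in the text.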

Sequence analysis

Sequence data were assembled and analyzed using Lasergene software. Multiple sequence alignments were made using the Clustal W method. Phylogenetic trees were constructed by the neighbor-joining method using the MegAlign program from the DNASTAR software package (Version 7.1.0, DNASTAR Inc., USA). The reliability of the neighbor-joining tree was estimated by bootstrap analysis with 1000 replicates. The nucleotide and amino acid sequences of strain TGEV SHXB were compared with the corresponding sequences of TGEV strains in the GenBank database. Sequences were analyzed using the computer program MEGA version 5.0. The TGEV strains used in this study were: Virulent Purdue, DQ811789; Purdue P115, DQ811788; PUR46-MAD, NC_002306; Miller M60, DQ811786; Miller M6, DQ811785; SC-Y, DQ443743; TS, DQ201447; H16, FJ755618; Attenuated H, EU07421; WH-1, HQ462571.1; PRCV-ISU-1, DQ811787.1.
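The bootstrap step mentioned above rests on resampling alignment columns with replacement; each resampled alignment is then used to rebuild a tree, and the fraction of replicates supporting a branch is reported. The sketch below shows only the resampling step. It is a generic illustration, not the MegAlign or MEGA implementation, and the function name, the toy alignment, and the strain labels in it are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that results are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; the same indices are
    # applied to every sequence so that columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment with made-up sequences (not real TGEV data).
    toy = {"strain_A": "ATGGCT", "strain_B": "ATGGCA", "strain_C": "ATGTCA"}
    rng = random.Random(0)
    replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
    print(len(replicates), replicates[0])
```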

Results

Complete genome sequence of TGEV SHXB

The full-length genome sequence of strain SHXB was deduced by combining the sequences of several overlapping cDNA fragments. The genome sequence of strain SHXB was 28,571 nucleotides (nt) long, including the poly(A) tail. The 5′ portion of the genome (nt 1-20,368) contained the 314-nt non-translated region (NTR), ORF1a (nt 315-12,368), and ORF1b (nt 12,326-20,368), which encode the viral RNA-dependent RNA replicase. The structural proteins S, E, M and N were encoded by ORFs S (nt 20,365-24,708), E (nt 25,857-26,110), M (nt 26,035-26,904), and N (nt 26,917-28,065), respectively. The three remaining non-structural proteins were encoded by ORF3a (nt 24,827-25,042), ORF3b (nt 25,136-25,870), and ORF7 (nt 28,071-28,307). The 5′ NTR consisted of 314 nt and included a potential short AUG-initiated ORF (nt 114-121), beginning within a Kozak context (UCUaugA) (Fig. 1a). The 3′ end of the genome contained a 264-nt untranslated sequence and the poly(A) tail. At nt 106-113 upstream from the poly(A) tail, there was the octameric sequence “GGAAGAGC” (Fig. 1b).
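The relationship between the genome coordinates above and the protein lengths reported later in the text can be checked with simple arithmetic. The sketch below covers a subset of the ORFs and assumes that each coordinate range is 1-based, inclusive, and includes the stop codon, so that the predicted protein length is the number of codons minus one. That convention is an assumption for illustration, as are the function names.

```python
# Coordinates (1-based, inclusive) as reported in the text for strain SHXB.
ORFS = {
    "ORF1a": (315, 12368),
    "ORF1b": (12326, 20368),
    "S": (20365, 24708),
    "ORF3b": (25136, 25870),
    "N": (26917, 28065),
    "ORF7": (28071, 28307),
}


def orf_length_nt(start, end):
    """Length in nucleotides of an inclusive coordinate range."""
    return end - start + 1


def predicted_aa(start, end):
    """Predicted protein length, assuming the range includes the stop codon."""
    length = orf_length_nt(start, end)
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1


def overlap_nt(a, b):
    """Number of nucleotides shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        print(name, orf_length_nt(start, end), "nt ->", predicted_aa(start, end), "aa")
    print("ORF1a/ORF1b overlap:", overlap_nt(ORFS["ORF1a"], ORFS["ORF1b"]), "nt")
```

For these ORFs the computed values agree with the protein lengths given in the text (for example, 4,017 aa for ORF1a and 1,447 aa for S) and with the 43-nt region shared by ORF1a and ORF1b.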

Fig. 1
(a) The 5′ NTR and a potential short AUG-initiated ORF beginning within a Kozak context (UCUaugA). (b) Octameric sequence of “GGAAGAGC” upstream of the poly(A) tail in all strains except strain Miller M60

The non-structural genes

The replicase gene was composed of ORF1a and ORF1b, which shared a 43-nt common region that included a typical coronavirus “slippery site” (5′-UUUAAAC-3′, nt 12,333-12,339; Fig. 2a), which allows the ORF1a translation termination site to be bypassed and an additional ORF, ORF1b, to be read. As has been shown for other TGEV strains [8], the ORF1a gene of strain SHXB was predicted to encode a protein of 4,017 amino acids, and the ORF1b gene was predicted to encode a protein of 2,680 amino acids. Nucleotide sequence analysis indicated that there were no deletions or insertions in the ORF1ab region of any of the TGEV strains. The ORF1a of strain PRCV-ISU-1 had a 3-amino-acid deletion (Table 1).
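The frameshift described above implies that ORF1b is read in a different frame from ORF1a and that the slippery site lies inside the region shared by the two ORFs. Both points can be checked from the reported coordinates, as in the following sketch. The function names are illustrative, and the toy RNA string used to demonstrate motif searching is made up rather than taken from the SHXB genome.

```python
ORF1A = (315, 12368)             # 1-based inclusive, from the text
ORF1B = (12326, 20368)
SLIPPERY_SITE = (12333, 12339)   # 5'-UUUAAAC-3'
SLIPPERY_MOTIF = "UUUAAAC"


def frame_offset(ref_start, other_start):
    """Reading-frame offset (0, 1 or 2) of other_start relative to ref_start."""
    return (other_start - ref_start) % 3


def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def find_motif(rna, motif):
    """1-based start positions of every occurrence of motif (overlaps allowed)."""
    positions = []
    idx = rna.find(motif)
    while idx != -1:
        positions.append(idx + 1)
        idx = rna.find(motif, idx + 1)
    return positions


if __name__ == "__main__":
    # An offset of 2 is equivalent to -1 modulo 3, i.e. a -1 frameshift.
    print("ORF1b frame relative to ORF1a:", frame_offset(ORF1A[0], ORF1B[0]))
    shared = (ORF1B[0], ORF1A[1])
    print("slippery site inside shared region:", contains(shared, SLIPPERY_SITE))
    print(find_motif("GGUUUAAACGG", SLIPPERY_MOTIF))  # toy sequence
```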

Fig. 2
(a) A typical coronavirus “slippery site” 5′-UUUAAAC-3′, found in all TGEV strains. (b) A 16-nt deletion before the initiation codon “ATG”, found in strains attenuated H, H16, Miller M60, TS, and Virulent Purdue. (c) A 29-nt deletion before the stop codon “TAA”, found in strains attenuated H, H16, Miller M60, TS, and Virulent Purdue

Table 1 Length in amino acids of the predicted structural and nonstructural proteins of twelve TGEV strains and PRCV-ISU-1

ORF3a and ORF3b of strain SHXB were predicted to encode proteins of 72 and 244 amino acids, respectively. In strain Miller M60, a 531-nt deletion in the ORF3b gene results in a truncated ORF3b-encoded protein of 67 amino acids; in strain PRCV-ISU-1, a 184-nt deletion in the ORF3a gene disrupts the predicted ORF3a-encoded protein, and a 117-nt deletion in the ORF3b gene makes the predicted ORF3b-encoded protein shorter than in the other TGEV strains [36]. As shown in Fig. 2b, a 16-nt deletion before the initiation codon “ATG” was found in strains attenuated H, H16, Miller M60, TS, and Virulent Purdue, and a 29-nt deletion before the stop codon “TAA” was also found in these strains (Fig. 2c). No deletions or insertions were found in the ORF3a or ORF3b gene of strain SHXB. The ORF7 gene of strain SHXB was predicted to encode 78 amino acids, and no deletions or insertions were found in comparison to other TGEV strains and strain PRCV-ISU-1 (Table 1).
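Whether a deletion shortens a protein by a whole number of residues or disrupts the reading frame depends on whether its length is a multiple of three. The sketch below applies that rule to the deletion sizes reported above. It assumes that each deletion lies entirely within the coding region, and the function name is illustrative; the actual consequence for the protein also depends on where the deletion falls.

```python
def classify_deletion(deleted_nt):
    """Classify a coding-region deletion by its length.

    Returns ("in-frame", residues_lost) when the length is a multiple of
    three, otherwise ("frameshift", None).
    """
    if deleted_nt <= 0:
        raise ValueError("deletion length must be positive")
    if deleted_nt % 3 == 0:
        return ("in-frame", deleted_nt // 3)
    return ("frameshift", None)


if __name__ == "__main__":
    # Deletion sizes taken from the text.
    for label, size in [("Miller M60 ORF3b", 531),
                        ("PRCV-ISU-1 ORF3a", 184),
                        ("PRCV-ISU-1 ORF3b", 117)]:
        print(label, size, classify_deletion(size))

    # The 531-nt deletion removes 177 codons from the 244-aa ORF3b protein.
    kind, lost = classify_deletion(531)
    print("Miller M60 ORF3b protein length:", 244 - lost)
```

Under this rule, the 531-nt deletion accounts for the 67-amino-acid ORF3b protein of Miller M60, and the 184-nt deletion in ORF3a of PRCV-ISU-1 is not a multiple of three, in line with the statement that it disrupts the predicted protein.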

The structural genes

The S gene of strain SHXB was 4,344 nt in length, encoding a predicted protein of 1,447 amino acids. It had the same length as those of strains Purdue P115, PUR46-MAD, WH-1, TGEV-HX, and SC-Y (Table 1). At position 655 of all TGEV strains, the nucleotide was G (Fig. 3a). A 6-nt deletion was found in the S gene at position 1,123-1,128 of strains SHXB, Purdue P115, PUR46-MAD, WH-1, SC-Y, and TGEV-HX (Fig. 3b), which caused the S protein to be two amino acids shorter than in strains Virulent M6, Virulent Purdue, and TS (Table 1). At nt position 1,753 of strains Virulent M6, Virulent Purdue, and TS, the nucleotide was T, while in the other strains, it was G (Fig. 3c). A 3-nt deletion was found in the S gene at position 2,386-2,388 of strains attenuated H, H16, and Miller M60 (Fig. 3d), which caused a one-amino-acid deletion in the S protein compared to strains Virulent M6, Virulent Purdue, and TS (Table 1); this deletion was not found in strain SHXB. Sequence analysis confirmed a previous report of a 681-nt deletion in the 5′ end of the S gene of PRCV-ISU-1 [24, 36].

Fig. 3
(a) A guanine (G) at position nt 655 of all TGEV strains. (b) A 6-nt deletion in the S gene at nt position 1123-1128 of strains SHXB, Purdue P115, PUR46-MAD, WH-1, SC-Y, and TGEV-HX. (c) A thymine (T) residue at position nt 1753 of strains Virulent M6, Virulent Purdue, and TS and G in the other strains. (d) A 3-nt deletion in the S gene, at nt position 2386-2388 of strains attenuated H, H16, and Miller M60. (e) A 3-nt deletion at position 151-153 in the M gene of strain PRCV-ISU-1. (f) A 6-nt insertion in the M gene of Miller M60 compared with other TGEV strains

Sequence analysis revealed no deletions or insertions in the E and N genes of any of the TGEV strains or strain PRCV-ISU-1. The predicted E and N proteins were 82 and 382 amino acids long, respectively (Table 1). At nt position 151-153, there was a 3-nt deletion in the M gene of strain PRCV-ISU-1 (Fig. 3e), making its M protein one amino acid shorter than those of the other strains. There was a 6-nt insertion in the M gene of Miller M60 when compared with other TGEV strains (Fig. 3f), making its M protein two amino acids longer than those of the other strains (Table 1).

Homology comparisons

To investigate the homology of strain SHXB to other TGEV strains and PRCV-ISU-1, the nucleotide and predicted amino acid sequences of the nonstructural and structural protein genes (replicase ORF1, S, 3a, 3b, E, M, N, ORF7) of strain SHXB were compared. As shown in Table 2, the amino acid sequence identity in ORF1 was 96.7 %-100 %; in protein S it was 97.7 %-100 %; in ORF3a it was 86.1 %-100 %; in ORF3b it was 44.1 %-100 %; in protein E it was 91.6 %-98.8 %; in protein M it was 96.2 %-99.7 %; in protein N it was 97.9 %-99.7 %; and in protein ORF7, it was 94.9 %-97.5 %. Interestingly, when comparing the amino acid sequences of ORF3a, strain SHXB showed 100 % sequence identity to strains PUR46-MAD, Purdue P115, SC-Y, TGEV-HX, and WH-1, and showed 98.9 % identity to strain Virulent Purdue, but showed less than 90 % identity to strains Attenuated H, H16, Miller M6, Miller M60, and TS. Protein N showed the highest amino acid similarity. A 531-nt deletion in the ORF3b gene of Miller M60 resulted in 44.1 % amino acid similarity to strain SHXB.
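Percent identity values such as those in Table 2 are derived from pairwise comparison of aligned sequences. The sketch below shows one common way to compute the value: the number of identical positions divided by the number of aligned positions, skipping columns where both sequences have a gap. Alignment programs differ in how they treat gaps and which denominator they use, so this is an illustrative definition and not necessarily the one applied by MegAlign. The function name and the toy peptide strings are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned peptides (not real TGEV sequences).
    print(percent_identity("MKTLL", "MKTLL"))
    print(percent_identity("MKTLL", "MKALL"))
    print(percent_identity("MK-LL", "MKTLL"))
```

With a definition of this kind, a large in-frame deletion lowers the identity sharply because every deleted residue is counted as a mismatch, which is the pattern seen for ORF3b of Miller M60.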

Table 2 Nucleotide and amino acid sequence identity of the genome of TGEV SHXB to other TGEV strains and PRCV-ISU-1 (%)

Phylogenetic analysis

Based on phylogenetic analysis of the complete genomic nucleotide sequences, the TGEV strains were divided into two groups, one consisting of the Purdue strains and the other of the Miller strains. To further explore the evolutionary relationships among these TGEV strains and strain PRCV-ISU-1, a phylogenetic tree was constructed using the nucleotide sequence of the S gene. As shown in Fig. 4b, strain SHXB also had a close relationship to strains Purdue P115, TGEV-HX, WH-1, and PUR46-MAD. Overall, strain SHXB belongs to the Purdue strains group and is more distant evolutionarily from the Miller strains group and strain PRCV-ISU-1, but all of these strains appear to share a common ancestor.

Fig. 4
(a) Phylogenetic analysis of the complete genome sequences of the strain SHXB, other TGEV reference strains, and strain PRCV-ISU-1. (b) Phylogenetic analysis of the S protein of strain SHXB, other TGEV reference strains, and strain PRCV-ISU-1. Both of these phylogenetic trees were constructed using the neighbor-joining method with the MegAlign program

Discussion

We sequenced the complete genome of strain TGEV SHXB. To investigate differences in genetic structure, diversity and evolution, TGEV SHXB was compared with other TGEV strains and the ISU-1 strain of PRCV. Detailed comparisons of the sequences of TGEV strains have been reported previously, and those results provide a reference for understanding changes in the virulence of SHXB. Compared with the coronavirus PEDV, the number of complete sequences of TGEV in public databases is limited. Before the present study, there were twelve complete genomic sequences of TGEV strains in GenBank. Analyzing the whole genomic sequence of SHXB will help provide information about its genetic structure, diversity, and evolution, and in particular, the epidemic characteristics of the coronavirus in China.

The 5′ and 3′ NTRs of TGEV are critically important for viral replication and transcription [1]. In this study, no deletions or insertions were found in the 5′ and 3′ NTRs of strain SHXB, suggesting that the replication and transcription mechanism of strain SHXB was not changed. A homopolymeric “slippery” sequence of nucleotides (5′-UUUAAAC-3′) and a pseudoknot structure, which are critical for translation of the ORF1ab gene through ribosomal frameshifting [2], were also found in strain SHXB.

A similar nucleotide change in the S gene has been reported previously. It was shown that a G residue at nt position 655 of the S gene was essential for maintaining the enteric tropism of TGEV strain PUR46-MAD, and mutation of this nucleotide caused a shift in tropism from enteric to respiratory [30]. As shown in Fig. 3a, at position 655, all TGEV strains contain a G, except strain PRCV-ISU-1, suggesting that strain SHXB is an enteric virus. Antigenic site A/B has been mapped to aa 506-706 of the spike protein [11]. At nt position 1753, a T-to-G mutation caused a serine-to-alanine mutation at amino acid 585, which is located in the major antigenic sites A/B of the TGEV S protein [5]. This change may significantly influence receptor binding or interactions with neutralizing antibodies. A single amino acid change in the S protein can have a significant effect on antigenicity [36]. In the S gene of strain SHXB, the nucleotide at position 1753 was G, indicating that the antigenicity of the S protein may be changed. There was a 6-nt deletion (nt 1,123 to 1,128) in the Purdue strains group, except strain Virulent Purdue, but not in the Miller strains group, strain PRCV-ISU-1, and strain PUR46-MAD. Purdue P115 is attenuated, and this deletion may also play a role in attenuation and may be a distinguishing mark of the attenuated Purdue strains group. The same deletion was found in the S protein of attenuated Purdue strain PUR46-C8 but was not found in strain PUR46-C11, which was maintained in vivo [26]. This phenomenon should be investigated by generating specific mutants by reverse genetics, followed by animal experiments.
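The link between nucleotide position 1753 and amino acid 585 follows from codon numbering: position 1753 is the first base of codon 585, and in the standard genetic code a T-to-G change at the first base of a TCN codon converts serine to alanine. The sketch below works through that arithmetic. The function names are illustrative, and the third base of the example codon ("TCT") is a hypothetical choice, since the text does not give the full codon.

```python
def codon_position(nt_pos):
    """Map a 1-based position in a coding sequence to (codon number, base within codon)."""
    if nt_pos < 1:
        raise ValueError("position must be 1 or greater")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# Standard genetic code, restricted to the two codon families needed here:
# all four TCN codons encode serine and all four GCN codons encode alanine.
CODON_TABLE = {}
for base in "TCAG":
    CODON_TABLE["TC" + base] = "Ser"
    CODON_TABLE["GC" + base] = "Ala"


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


if __name__ == "__main__":
    codon_number, base_in_codon = codon_position(1753)
    print(codon_number, base_in_codon)

    original = "TCT"  # hypothetical serine codon; the third base is assumed
    mutated = substitute(original, base_in_codon, "G")
    print(original, CODON_TABLE[original], "->", mutated, CODON_TABLE[mutated])
```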

There were two large deletions (16 and 29 nt, respectively) in the ORF3a gene of strains H16, TS, and Virulent Purdue, and attenuated strains H and Miller M60. Large nucleotide deletions in the ORF3 gene have been reported previously [22], and it has been suggested that deletions in the ORF3 gene may affect the virulence of TGEV [25]. However, other researchers have found that these deletions in the ORF3 gene do not affect virus replication [18], which suggests that the ORF3a gene is not necessary for viral replication and has a minor effect on the virulence of the virus [32]. Apart from strain Virulent Purdue, the two large deletions were found only in the Miller strains group, a feature that may also distinguish the Miller strains group from the Purdue strains group.

An accumulation of point mutations has been proposed to be a driving force for coronavirus evolution [29]. These mutations, together with recombination, lead to the generation of new coronaviruses and may alter their pathogenicity, change tissue tropism, and even break the barrier between host species. The SARS-CoV-like coronaviruses of bats may have become adapted to humans through genomic mutation and recombination events, either directly or via intermediate hosts [12]. The evolution and tissue tropism shift between TGEV and PRCV have been described previously [31]. Homology comparisons and phylogenetic analysis help us to understand the evolution of strain SHXB, and homology comparisons showed that strains PUR46-MAD, Purdue-P115, SC-Y, TGEV-HX, Virulent Purdue, and WH-1 are highly similar to SHXB. The nucleotide and amino acid sequence homology of the structural and non-structural proteins was between 97 % and 100 %, and the ORF3a gene in particular showed significant species specificity. Phylogenetic analysis also showed that strain SHXB is closely related to strains PUR46-MAD, Purdue-P115, SC-Y, TGEV-HX, Virulent Purdue and WH-1, which have the same ancestor, and this is consistent with the results of the homology comparison. Strain PUR46-MAD is a derivative of Purdue P115, and both were derived from strain Virulent Purdue. Therefore, these isolates cluster together in the phylogenetic tree. As shown in Fig. 4, the other branch included the Miller strains and the Chinese strains TS, attenuated H, and H16, which formed a different evolutionary group, but all TGEV strains shared a common ancestor with PRCV-ISU-1.

Previous research has revealed that genetic divergence most frequently occurs within the S and ORF3a/3b genes, suggesting that these regions are frequently mutated and that these changes can occur due to RNA recombination when multiple distinct coronaviruses infect the same host. It has also been found that the 5′ and 3′ UTRs are critically important for viral replication and transcription. By comparing these regions with those of other TGEV strains, we have gained further understanding of the genetic structure, diversity, and evolution of strain SHXB, as well as other coronaviruses. Future studies could involve generating specific mutants by using reverse genetics and characterizing these mutants in animal experiments.