Introduction

Members of the Coronaviridae, within the order Nidovirales, contain positive-stranded RNA genomes that range from 27 kb to 31 kb in size [1]. Transmissible gastroenteritis virus (TGEV), together with porcine epidemic diarrhea virus (PEDV), feline infectious peritonitis virus (FIPV), and the human coronaviruses HCoV-229E and HKU1, belongs to the genus Alphacoronavirus (http://www.ictvonline.org/). TGEV is an enveloped virus with a positive-sense, single-stranded RNA genome of about 28.5 kb. Its genome includes nine open reading frames (ORFs) that encode four structural proteins (spike [S], envelope [E], membrane [M], and nucleoprotein [N]) and five nonstructural proteins (the replicases 1a and 1ab, proteins 3a and 3b, and protein 7) in the order 5′-replicase (1a/1b)–S–3a–3b–E–M–N–7-3′ [2]. TGEV was initially identified as the etiological agent of transmissible gastroenteritis (TGE) in swine in 1946 in the United States [3, 4]. In neonates, TGEV infects the epithelial cells of the small intestine, leading to potentially fatal gastroenteritis. The virus can also infect the upper respiratory tract and, less often, the lungs [5]. In adults, TGEV causes mild disease. In swine, it is the major cause of viral enteritis and fatal diarrhea in neonates, resulting in significant economic losses [6].

TGEV was reported in many swine-producing countries between the late 1980s and the 1990s [7–10]. TGEV strains of varying virulence have been isolated and characterized worldwide [11–14]. Some strains have been used to develop modified live vaccines, with limited success. In China, a TGE outbreak was first reported in the 1970s. Since then, the disease has been prevalent in many provinces and has become one of the most important viral diarrheal diseases in China. The Chinese TGEV vaccine strain H165 was derived from the virulent field strain H16 by 165 passages in PK-15 cells. The H165 virus was shown to be safe in piglets and pregnant sows and efficacious against TGEV infection [15]. Vaccines based on the H165 strain are currently commercially available to prevent and control TGEV infections in China. To characterize the Chinese vaccine strain H165 and its parental strain H16 at the molecular level, we determined the complete nucleotide (nt) sequences of both viruses. To investigate the molecular basis for attenuation of the Chinese TGEV vaccine strain, the deduced amino acid (aa) sequences of the structural and non-structural proteins of the H165 and H16 viruses were compared with those of TGEV reference strains, as well as with those of the porcine respiratory coronavirus (PRCV) strain PRCV-ISU-1.

Materials and methods

Cells and viruses

The pig kidney cell line PK-15 (ATCC, CCL-33) was regularly maintained in Eagle’s Minimum Essential Medium (MEM) supplemented with 10% fetal bovine serum, penicillin (100 U/ml), streptomycin (100 μg/ml), and 1 mM Na-pyruvate. The Chinese vaccine strain H165 and its parental strain H16 were obtained by our laboratory [15].

Extraction of genomic TGEV RNA

Total RNA was extracted from PK-15 cells infected with H165 virus and from purified H16 virus using the QIAamp Viral RNA Mini kit (Qiagen). The resulting RNA pellet was resuspended in 10 μl of DNase-free, RNase-free double-distilled water and used as a template for RT-PCR.

RT-PCR and DNA sequencing

The complete genomes of the H165 virus and its parental strain H16 were amplified by RT-PCR and sequenced. Reverse transcription was performed with Superscript® III reverse transcriptase (Invitrogen) using a combined random and oligo(dT) priming strategy, and the resulting cDNA was amplified by PCR using PrimeSTAR® HS DNA Polymerase (TaKaRa). Primers were designed based on conserved regions of the TGEV strains (their sequences are available on request). For amplification of the TGEV subclones, reactions were carried out in a total volume of 50 μl containing 10 μl of 5 × PCR Buffer, 4 μl of 2.5 mmol/l dNTPs, 0.5 μl of Taq polymerase (TaKaRa), 1 μl of each specific primer (20 μM), about 2.5 ng of template, and sterile deionized water. The PCR protocol consisted of 30 cycles of 98°C for 10 s, 55°C for 15 s, and 68°C for several minutes, depending on the size of the amplicon. The 5′ and 3′ ends of the viral genome were confirmed by rapid amplification of cDNA ends using the SMART™ RACE cDNA Amplification Kit (Clontech). The PCR amplicons were purified using the AxyPrep™ Gel Extraction Kit (Axygen) and cloned into the pGEM-T Easy Vector (Promega). Three to five independent clones of each TGEV amplicon were isolated and sequenced using the M13 universal primers.

Sequence analysis

Sequences were assembled and edited to obtain the complete genome sequences of the viral strains H165 and H16. The complete nt sequences of the H165 and H16 strains were submitted to the GenBank sequence database and assigned accession nos. EU074218 and FJ755618. ORFs were predicted using the Gene Runner program, version 3.00 (http://www.generunner.com). The nt sequences of both genomes and the deduced aa sequences of the ORFs were compared with those of several TGEV strains and the PRCV strain PRCV-ISU-1. Multiple alignments and phylogenetic trees were constructed using the MegAlign program from the DNAStar software package (Version 7.1.0, DNASTAR Inc., USA). The sequences used for analysis were those available in GenBank on 4 February 2010. Signal peptides and their cleavage sites were predicted using SignalP software [16]. Transmembrane domains were predicted using TMHMM software [17]. Potential N-glycosylation sites were predicted using ScanProsite software [18]. The secondary structures of the 5′- and 3′-untranslated regions (UTRs) of the H165 and H16 viruses were predicted using RNAdraw 1.1 software [19].

Results

Genomic sequences of the strains H165 and H16

Full-length genome sequences were assembled from several overlapping cDNA fragments that together encompassed the entire RNA genomes of strains H165 and H16. The genome sequence was 28,569 nt in length, including a poly(A) tail of 30 nt. The genome contains nine ORFs, characteristic of the genus Alphacoronavirus, and 5′ and 3′ untranslated sequences of 314 and 274 nt, respectively (not including the poly(A) tail). The predicted ORFs 1a (nt 315–12,368) and 1ab (nt 12,326–20,368) of the Chinese vaccine strain H165 contained 12,051 and 8,040 nt, encoding the non-structural proteins 1a and 1ab, respectively. ORF 1ab contained a 43 nt region that overlapped with ORF 1a and included a typical coronavirus “slippery site,” 5′-UUUAAAC-3′ (nt 12,332–12,338). Based on evidence from other TGEV coronaviruses [20], this sequence causes a –1 frameshift during the translation of ORF 1a, so that a proportion of ribosomes avoid translation termination and produce protein 1ab, which contains an additional 2,680 aa. The structural proteins S, E, M, and N were encoded by ORFs S (nt 20,365–24,711), E (nt 25,815–26,063), M (nt 26,074–26,862), and N (nt 26,875–28,023), respectively. The three non-structural proteins 3a, 3b, and 7 were encoded by ORFs 3a (nt 24,814–25,023), 3b (nt 25,094–25,828), and 7 (nt 28,029–28,265), respectively. Upstream of eight of the genes of the Chinese vaccine strain H165 and its parental strain H16, there was a repeated intergenic sequence, 5′-CUAAAC-3′, called a transcription regulating sequence (TRS) [21].
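The coordinate arithmetic behind these figures can be checked with a short sketch. The coordinates below are copied from the text; the function names are illustrative, and the assumption that the reported nt counts for ORFs 1a and 1ab exclude the 3 nt stop codon is ours (it is the reading under which the stated numbers agree with the stated coordinates).

```python
# ORF coordinates (1-based, inclusive) as reported in the text for strain H165.
ORFS = {
    "1a": (315, 12368),
    "1ab": (12326, 20368),
    "S": (20365, 24711),
    "3a": (24814, 25023),
    "3b": (25094, 25828),
    "E": (25815, 26063),
    "M": (26074, 26862),
    "N": (26875, 28023),
    "7": (28029, 28265),
}


def span_length(start, end):
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


def coding_length(start, end, stop_codon_nt=3):
    """Span length minus the stop codon (assumption: counts exclude it)."""
    return span_length(start, end) - stop_codon_nt


def overlap(a, b):
    """Number of positions shared by two inclusive intervals."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)


coding_1a = coding_length(*ORFS["1a"])              # 12,051 nt
coding_1ab = coding_length(*ORFS["1ab"])            # 8,040 nt
overlap_1a_1ab = overlap(ORFS["1a"], ORFS["1ab"])   # 43 nt
extra_aa = coding_1ab // 3                          # 2,680 aa
utr5 = span_length(1, ORFS["1a"][0] - 1)            # 314 nt
utr3 = span_length(ORFS["7"][1] + 1, 28539)         # 274 nt
genome_length = 28539 + 30                          # 3' UTR end plus poly(A) tail

print(coding_1a, coding_1ab, overlap_1a_1ab, extra_aa, utr5, utr3, genome_length)
```

Under that assumption the sketch reproduces the 12,051 and 8,040 nt ORF sizes, the 43 nt overlap, the additional 2,680 aa, both UTR lengths, and the 28,569 nt genome length.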

Sequence analysis showed that strains H165 and H16 were more closely related to strains Miller M6 and TS than to the other reference strains (Table 1). Strains H165 and H16 shared between 98.3 and 99.9% nt identity with the other TGEV reference strains, being slightly less closely related to strains Purdue and SC-Y, with around 98.5–98.7% and 98.3% nt identity, respectively (Table 1). They were also less closely related to strain PRCV-ISU-1, with around 97.9% nt identity. Nearly all the ORFs of strains H165 and H16 were highly similar to those of strains Miller M6 and TS (98.4–100% nt and 95.2–100% aa identity), except for 88.9% and 52.9% nt identity with genes 3a and 3b of strain Miller M60, respectively, which is attributable to the 531 nt in-frame deletion in gene 3 of Miller M60 [22]. To investigate the relationships among TGEV strains and PRCV-ISU-1, a phylogenetic analysis was performed on the genome sequences of strains H165 and H16, strain PRCV-ISU-1, and the TGEV strains available in GenBank. As shown in Fig. 1, strains H165 and H16 were confirmed to be more closely related to the TGEV Miller strains than to the Purdue strains or strain PRCV-ISU-1. To further explore the evolutionary relationships, the S protein sequences were used to investigate genetic relatedness among TGEV strains and strain PRCV-ISU-1. A phylogenetic tree constructed using the available S protein sequences showed that the reference strains grouped into five distinct clusters (Fig. 2). Strains H165 and H16 formed a cluster with strains Miller M60, HN2002, Miller M6, TS, TSX, 96-1933, TFI, FS772/70, and TO14. Taken together, these results indicate that strains H165 and H16 are more closely related to strains Miller M60, HN2002, Miller M6, and TS than to the other Chinese TGEV strains, even though they are of Chinese origin.
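For readers who want to reproduce identity values of this kind from their own alignments, the following is a minimal sketch of a pairwise percent-identity calculation. It is not the MegAlign implementation, whose exact treatment of gaps may differ; the function name and the toy sequences are hypothetical, and here a column with a gap in only one sequence is counted as compared but not matching.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns where both rows carry a gap are skipped; a gap opposite a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment (not TGEV data): 10 columns, one mismatch and one gap.
toy_identity = percent_identity("ACGTACGT-A", "ACGTTCGTAA")
print(f"{toy_identity:.1f}%")  # 80.0%
```

A large in-frame deletion, such as the one described for gene 3 of Miller M60, lowers identity sharply under this convention because every deleted column is scored as a mismatch.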

Table 1 Sequence comparisons of the Chinese vaccine strain H165 and its parental strain H16 with other TGEV strains and the PRCV-ISU-1 strain

Fig. 1 Phylogenetic analysis of the entire genome sequences of the Chinese vaccine strain H165, its parental strain H16, several TGEV reference strains, and PRCV-ISU-1 strain. The phylogenetic tree was constructed using the ClustalW method with DNASTAR software

Fig. 2 Phylogenetic analysis of the S protein of the Chinese vaccine strain H165, its parental strain H16, other TGEV reference strains, and PRCV-ISU-1 strain. The phylogenetic tree was constructed using the Jotun Hein method with DNASTAR software

Characterization of 5′ and 3′-UTR within the strains H165 and H16

No mutation, deletion, or insertion was found in the 5′- or 3′-UTR of strain H165 relative to its parental strain H16. Both the 5′ and 3′ ends of the genomes of the two strains contain short UTRs. The 5′-UTR comprises 314 nt and includes a potential short AUG-initiated ORF (nt 117–123) that begins in a suboptimal Kozak context (GCCAUGG) for translation [23] and potentially encodes a peptide of three aa (MKS). Analysis of the 5′-UTR with RNAdraw 1.1 predicted a high level of secondary structure, with three simple and two complex stem-loop structures (data not shown). The 3′-UTR comprises 274 nt (nt 28,266–28,539), possesses the octameric sequence GGAAGAGC at nt 198–205 upstream from the poly(A) tail, and is followed by a poly(A) tail of 30 nt. RNAdraw 1.1 also predicted a high level of secondary structure in the 3′-UTR, with six simple and one complex stem-loop structures (data not shown); secondary structure in the 3′-UTR has been shown to be important in enterovirus replication [24].

Nucleotide and amino acid mutations of the Chinese vaccine strain H165

As shown in Table 2, a total of 27 nt mutations were identified in strain H165, resulting in 16 aa mutations located mainly within proteins 1a, 1ab, S, 3a, 3b, and E. In addition, a point mutation (A25074G [ORF3]) was found in the intergenic region between genes 3a and 3b. In brief, there were 13, 8, 2, and 2 nt mutations in genes 1, S, 3, and E (sM), resulting in 7, 5, 2, and 2 aa differences from the parental strain H16, respectively (Table 2). Furthermore, six nt mutations (G6014T [ORF1a], T12388C [ORF1b], T21937C [S], T21969A [S], A26025C [E], and C27507T [N]) could serve as markers to differentiate the Chinese vaccine strain from the other TGEV strains in GenBank. Comparisons with the other two pairs of virulent and attenuated TGEV strains (virulent Purdue and Purdue P115; Miller M6 and Miller M60) revealed no common mutations among them.
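As a rough illustration of how such mutation labels relate to the genome map, the sketch below parses labels of the form reference base, position, new base and looks up each position in the ORF coordinates reported in the Results. The parser, the function names, and the "intergenic or UTR" label are our own illustrative choices, not part of the original analysis.

```python
import re

# ORF coordinates (1-based, inclusive) as reported in the Results for strain H165.
ORFS = {
    "1a": (315, 12368),
    "1ab": (12326, 20368),
    "S": (20365, 24711),
    "3a": (24814, 25023),
    "3b": (25094, 25828),
    "E": (25815, 26063),
    "M": (26074, 26862),
    "N": (26875, 28023),
    "7": (28029, 28265),
}

# Labels look like "G6014T": reference base, genome position, new base.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    """Split a label such as 'G6014T' into (ref, position, alt)."""
    m = MUTATION.match(label)
    if m is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def locate(pos):
    """Return the names of all ORFs whose interval contains pos."""
    hits = [name for name, (start, end) in ORFS.items() if start <= pos <= end]
    return hits or ["intergenic or UTR"]


def annotate(label):
    ref, pos, alt = parse_mutation(label)
    return {"label": label, "ref": ref, "pos": pos, "alt": alt, "regions": locate(pos)}


labels = ["G6014T", "T12388C", "T21937C", "T21969A", "A26025C", "C27507T", "A25074G"]
annotations = [annotate(label) for label in labels]
for a in annotations:
    print(a["label"], "->", ", ".join(a["regions"]))
```

With the reported coordinates, the six candidate marker mutations fall in ORFs 1a, 1ab, S, S, E, and N, and A25074G falls between ORF 3a (ending at nt 25,023) and ORF 3b (starting at nt 25,094), in agreement with the text.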

Table 2 Nucleotide and amino acid differences unique to strain H165 or to both attenuated and virulent viruses (strains H165 and H16) compared with all other TGEV strains or strain PRCV-ISU-1 listed in the GenBank database

Bioinformatics analysis of the structural proteins of the strains H165 and H16

ScanProsite analysis of the S, M, and N proteins of strains H165 and H16 identified thirty-three (seven NXS and twenty-six NXT), three (two NXS and one NXT), and four (three NXS and one NXT) potential N-linked glycosylation sites, respectively. No N-linked glycosylation site was identified in the E proteins of strains H165 and H16. SignalP analysis of the S proteins of strains H165 and H16 predicted a signal peptide (probability 0.995) with a potential cleavage site between aa 16 and 17. The E proteins of strains H165 and H16 were predicted to contain a signal anchor (probability 0.998) at position 30, with a cleavage site between aa 37 and 38. An N-terminal signal peptide (probability 0.931) was also identified in the M proteins of strains H165 and H16, with a potential cleavage site between aa 16 and 17. The S proteins of strains H165 and H16 were typical type I membrane proteins, with the N-terminal 1,411 aa residues exposed on the outside of the cell surface or virus particle and a transmembrane domain near the C-terminus (aa 1,412–1,434) followed by a cytoplasmic tail rich in cysteine residues (aa 1,435–1,472). There was also a stretch of highly hydrophobic residues at positions 1,394–1,412; the hydrophobicity profile had a maximum value of 3.467 at position 1,398 and a minimum value of –2.967 at positions 953–954. TMHMM analysis predicted one transmembrane domain at positions 15–37 of the E proteins of strains H165 and H16, with the N-terminus of the E protein external to the cell surface or viral envelope. TMHMM analysis also predicted three transmembrane domains at positions 46–68, 78–100, and 112–134 of the M proteins of strains H165 and H16, with a stretch of eight aa (SEESFNPE) directly adjacent to the third hydrophobic domain.
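The NXS/NXT counts above rest on scanning each protein for the N-linked glycosylation sequon. The following is a simplified sketch of such a scan, assuming the basic rule N-X-S/T with X being any residue except proline; the PROSITE pattern used by ScanProsite is somewhat stricter, so counts from this sketch may not match ScanProsite exactly. The function names and the toy peptide are hypothetical.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. N-N-T-S) be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each candidate sequon."""
    seq = protein.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3]) for m in SEQUON.finditer(seq)]


def count_by_type(protein):
    """Count candidate sites separately as NXS and NXT."""
    counts = {"NXS": 0, "NXT": 0}
    for _, tripeptide in find_sequons(protein):
        counts["NXS" if tripeptide[2] == "S" else "NXT"] += 1
    return counts


# Toy peptide (not a TGEV sequence): N-P-T at position 6 is excluded.
toy = "MNASANPTQNNTSK"
toy_sites = find_sequons(toy)
toy_counts = count_by_type(toy)
print(toy_sites)   # [(2, 'NAS'), (10, 'NNT'), (11, 'NTS')]
print(toy_counts)  # {'NXS': 2, 'NXT': 1}
```

Such a scan only flags candidate sites; whether a given asparagine is actually glycosylated depends on protein topology and folding, which is why the sites are described as potential.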

Discussion

Since TGEV was identified in 1946, researchers have tried to attenuate virulent field strains of TGEV in order to develop a suitable attenuated vaccine. Some TGEV isolates gradually lost their virulence after continuous passage in cell culture and were developed into commercial attenuated TGEV vaccines. Strain H165, derived from the virulent strain H16 by 165 passages in PK-15 cells, has been shown to be safe in piglets and pregnant sows and to be efficacious against TGEV infection [15]. In this study, sequence analysis showed that passaging of strain H16 in PK-15 cells resulted in 27 nt mutations that caused 16 aa mutations in strain H165 relative to strain H16. The number of aa mutations was lower than that reported by Zhang [14], in which twenty and thirty-two aa mutations were found in strains Miller M60 and Purdue P115, respectively, compared with their virulent counterparts. To date, only seven TGEV strains and one PRCV strain (PRCV-ISU-1) have been fully sequenced, although partial sequences of other TGEV strains are available in GenBank. Moreover, only two pairs of virulent and attenuated TGEV strains have been reported [14]. The whole genome sequences of strains H165 and H16 will help us to understand the genetic basis of TGEV attenuation and will add to the information available on geographic differentiation among TGEV strains.

The 5′- and 3′-UTRs of the TGEV genome are crucial for viral replication and transcription [25]. Furthermore, the 5′- and 3′-UTRs have been shown to be targets for the attenuation of some viruses [26]. In this study, no mutation, deletion, or insertion was detected in the 5′- or 3′-UTR of strain H165 relative to strain H16. Moreover, the single nt difference in the 5′-UTR between strains H165 and H16 and the two other pairs of virulent/attenuated TGEV strains may not abolish its role in viral replication and transcription. The second conserved region lies in gene 1a: a frameshift region comprising a typical coronavirus “slippery” sequence (5′-UUUAAAC-3′) and a pseudoknot structure, which is proposed to be critical for the expression of gene 1ab by ribosomal frameshifting [20]. Although a nt mutation was found in the TRS of gene 3b in the virulent Purdue and attenuated P115 strains [14], no mutation was found in this study in the conserved TRS, 5′-CUAAAC-3′, which is located upstream of each gene and serves as a signal for the transcription of the subgenomic RNAs (sgRNAs).

In this study, of the fifteen nt mutations identified in ORFs 1a, 1ab, 3a, and 3b, eleven led to aa mutations in nonstructural proteins of strain H165. Seven of the eleven mutations were present in proteins 1a and 1ab, and one aa mutation was present in each of proteins 3a and 3b. Proteins 1a and 1ab are expressed by ribosomal frameshifting and polyprotein cleavage. A comparative analysis of the replicative polyproteins of coronaviruses and arteriviruses identified the most variable regions in the N-terminal half of protein 1a [27–29]. Four aa mutations (S296C [ORF1a], R653K [ORF1a], A745V [ORF1a], and L1900F [ORF1a]) were found in the N-terminal half of protein 1a, while one aa mutation (P3867R [ORF1a]) occurred in the C-terminal region of protein 1a. A point mutation within the replicase gene 1a has been shown to affect coronavirus genome versus minigenome replication differentially [30], although attempts to delete the non-conserved replicase domains to determine whether they are essential have not yet been made. Deletion of genes 3a and 3b reduced TGEV virulence very little [31]; further studies will be needed to ascertain the roles of the two aa mutations (A55S [ORF3a] and A144T [ORF3b]) in proteins 3a and 3b, as well as of the nt mutation (A25074G [ORF3]) identified in the intergenic region between genes 3a and 3b.

No aa mutations were present in proteins M and N, but two aa mutations occurred in protein E and five occurred in protein S, which has been shown to be a target of neutralizing antibodies against coronaviruses and to play vital roles in viral entry, cell-to-cell spread, and the determination of tissue tropism [32–35]. All five aa mutations in protein S (P48S [S], L514P [S], F525P [S], D671G [S], and I740S [S]) occurred within the N-terminal 1,411 aa residues, which were predicted to be exposed on the outside of the cell surface or virus particle. Antigenic site A/B has been mapped to aa 506–706 of protein S [36]. Previous studies showed two aa mutations within this region in M60 compared with M6, and four in Purdue P115 compared with virulent Purdue [14]. In this study, there were three aa mutations in H165 compared with H16 in the corresponding region. It has been suggested that protein S may be related to the virulence of TGEV [35, 37]. However, determining whether these five aa mutations in protein S influenced the virulence of H16 will require additional work using a reverse genetics system and characterization of the mutations in animal experiments. One aa mutation (F23V [E]) was also found within the predicted transmembrane domain of protein E (aa 15–37), whose N-terminus was predicted to be external to the cell surface or viral envelope.

In summary, most of the aa mutations identified were located in functional regions of the TGEV genome: in the most variable regions of protein 1a (S296C [ORF1a], R653K [ORF1a], A745V [ORF1a], and L1900F [ORF1a]), in the N-terminal region of protein S (P48S [S], L514P [S], F525P [S], D671G [S], and I740S [S]), including its antigenic site (L514P [S], F525P [S], and D671G [S]), and in the putative transmembrane domain of protein E (F23V [E]). All of these are hypothesized to have the potential to affect virus replication, signal transport, and antibody neutralization. Future studies could involve generating specific mutants via reverse genetics and characterizing these mutants in animal experiments. In addition, our study revealed six unique nt mutations (G6014T [ORF1a], T12388C [ORF1b], T21937C [S], T21969A [S], A26025C [E], and C27507T [N]) in the genome sequence of the Chinese vaccine strain H165 that could serve as markers to differentiate strain H165 from other TGEV strains. Furthermore, a rapid method to differentiate H165 from wild-type TGEV has been established using restriction fragment length polymorphism of the N gene, based on the nt mutation C27507T [N] (data not shown). The whole genome sequences of strains H165 and H16 may enhance our understanding of the evolution of TGEV, as well as of other coronaviruses.