Introduction

Turkey coronaviral enteritis of varying severity, caused by turkey coronavirus (TCoV), has been reported in turkey flocks from multiple states in the United States since the 1990s. The major clinical signs of TCoV infection include depression, ruffled feathers, diarrhea, decreased body weight, and uneven flock growth. The most apparent gross lesions are markedly distended intestines with gaseous and watery content, particularly in the ileum and ceca. Salient histopathologic findings include shortening of the intestinal villi, an increase in crypt depth, and widening of intervillous spaces [1]. When turkeys are infected with TCoV and other infectious agents such as astrovirus, small round virus, and Escherichia coli (E. coli), they can develop poult enteritis-mortality syndrome (PEMS), which causes high mortality [2, 3]. Subsequent experimental studies of the TCoV isolates VR-911 and TCoV/ON/MG10/08 from Canada have shown that TCoV can cause symptoms similar to those caused by PEMS [3, 4]. Therefore, TCoV has been suggested to be the major causative pathogen for turkey enteritis, and secondary infections caused by other opportunistic microorganisms enhance the severity of TCoV enteritis and contribute to the development of PEMS. Turkey enteritis associated with TCoV infection has caused substantial economic losses in Indiana, North Carolina, Arkansas, and other states in the United States [5, 6], as well as in Canada [4], Europe [2, 7], and Brazil [8]. Currently, there are no vaccines to prevent the disease, and treatment of infected turkeys is often unsuccessful.

A member of the species Avian coronavirus (CoV) in the genus Gammacoronavirus and family Coronaviridae, TCoV has a positive-sense single-stranded RNA genome that is approximately 27 kb in size. The major structural proteins of TCoV include the spike (S), envelope (E), matrix (M), and nucleocapsid (N) proteins. Comparisons of 3’-end coding regions [9, 10] as well as the full genomes [11] of TCoV isolates and infectious bronchitis virus (IBV) have suggested that TCoV arose through recombination in the S gene, because pairwise comparisons of S gene sequences have revealed only a 34 % similarity between TCoV isolates and IBV strains, whereas gene 3, M gene, gene 5, and N gene sequences have over 80 % similarity [9, 11, 12]. The S gene sequences of different TCoV isolates (93 %–99.7 % similarity) are more conserved than those of various IBV strains (67.4 %–94.4 % similarity), which could explain the close antigenic relationship of TCoV isolates compared with the distant antigenicity of different IBV serotypes [5, 10, 13, 14]. Investigations during TCoV outbreaks and genomic analyses of TCoV isolates have revealed that distinct TCoV isolates tend to circulate endemically, and their respective sequences group phylogenetically according to their state of origin [6, 7, 11]. However, these observations may have been biased because of the small number of TCoV sequences that were analyzed. In the present study, 24 TCoV isolates were recovered from clinical cases submitted to the Indiana Animal Disease Diagnostic Laboratory at Purdue University by turkey farms in Minnesota, Indiana, North Carolina, Missouri, Arkansas, Texas, South Carolina, and Pennsylvania between 1994 and 2010. The objective of the present study was to elucidate the relationship between the genotypes and geographic distribution of TCoV isolates from turkey farms in multiple states in the United States by sequencing and comparing the full-length S gene.

Materials and methods

Clinical samples and virus purification

Twenty-four field isolates of TCoV were recovered from clinical cases submitted to the Animal Disease Diagnostic Laboratory at Purdue University by turkey farms in Minnesota, Indiana, North Carolina, Missouri, Arkansas, Texas, South Carolina, and Pennsylvania between 1994 and 2010 (Table 1). Field cases of TCoV were confirmed by clinical signs, gross lesions, histopathologic findings, immunofluorescence antibody (IFA) assay with antiserum against TCoV/IN/540/94, electron microscopy, and reverse transcription polymerase chain reaction (RT-PCR). All 24 TCoV isolates were propagated five times in embryonated turkey eggs as described previously [1]. In brief, intestines from TCoV-infected turkeys were homogenized as 20 % suspensions in chilled sterile phosphate-buffered saline and clarified by centrifugation at 3000 rpm for 10 minutes at 4 °C. The supernatant was filtered through a 0.22-μm membrane filter (Millipore, Bedford, MA, USA). The filtrate was inoculated into the amniotic cavity of 22-day-old embryonated turkey eggs. The embryo intestines were harvested after 3 days of incubation for virus purification. The harvested intestines were homogenized and clarified at 3000 rpm for 10 minutes at 4 °C. The supernatant was layered on top of 30 % and 60 % sucrose and clarified using ultracentrifugation in an SW28 rotor at 24,000 rpm for 3 hours at 4 °C in an Optima XL-100K ultracentrifuge (Beckman Coulter, Fullerton, CA, USA). The interface between 30 % and 60 % sucrose was collected and placed on top of a continuous 40 %–60 % sucrose gradient and clarified by ultracentrifugation at 24,000 rpm for 20 hours at 4 °C. A band of buoyant density 1.16–1.24 g/mL (containing TCoV) was collected and saved at -80 °C as the viral stock.

Table 1 Turkey coronavirus (TCoV) isolates and other coronaviruses used in the molecular analysis of the spike (S) gene

RNA and cDNA

The viral RNA was extracted from the purified virus using RNApure™ Reagent (GenHunter, Nashville, TN, USA) and chloroform, followed by precipitation using cold isopropyl alcohol and ethanol. The extracted RNA was reverse transcribed to cDNA using SuperScript™ III reverse transcriptase (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. The first reaction was 10 minutes of incubation with the RNA, random hexamer primer (100 ng/µL), and 10 mM dNTPs at 70 °C, followed by 1 minute on ice. The second reaction was 5 minutes of incubation with a mixture of 5x first-strand buffer, 0.1 M of dithiothreitol (DTT), 1 U of SuperScript™ III reverse transcriptase, and 40 U of RNaseOUT™ (Invitrogen, Carlsbad, CA, USA) at 25 °C, followed by 1 hour of incubation at 50 °C and 15 minutes of inactivation at 70 °C.

PCR amplification and sequencing

The full-length S gene (approximately 3.9 kb) was amplified by PCR using the cDNA of each TCoV isolate with the primers Sup and Sdown3 (Online Resource 1). A 64:1 (v:v) mixture of Taq DNA polymerase (Promega Corp., Madison, WI, USA) and Pfu DNA polymerase (Stratagene, La Jolla, CA, USA), which has proofreading ability, was used in a 96-well thermal cycler (GeneAmp, Perkin-Elmer Cetus Corp., Norwalk, CT, USA) to maintain the fidelity of the PCR [9]. The PCR products were electrophoresed on 1 % agarose gels and purified using a Zymoclean™ Gel DNA Recovery Kit (Zymo, Irvine, CA, USA) for further sequencing. Several primers (Online Resource 1) designed to sequence overlapping fragments covering the full length of the S gene of TCoV were used to determine the nucleotide sequences of the purified PCR products at the Purdue University Genomics Core Facility (West Lafayette, IN, USA). In addition, the purified PCR product was cloned into the PCR-II plasmid vector and used to transform E. coli strain TOP10F’ according to the manufacturer’s instructions (Invitrogen, Carlsbad, CA, USA).

Sequence analysis

Twenty-four TCoV isolates collected from eight states in the United States from 1994 to 2010 were purified and sequenced in our laboratory. TCoV/IN/540/94 was analyzed in a previous study [15], and the full-length S gene sequences of the other 23 TCoV isolates were published for the first time in the present study (Table 1). The putative peptide cleavage site separating the S1 subunit at the amino-terminus from the S2 subunit at the carboxyl-terminus, as well as a second possible peptide cleavage site in the S2 subunit, were detected using the ProP server (http://www.cbs.dtu.dk/services/ProP/). The S1 subunit of TCoV was further divided into S1a at the amino-terminus (residues 1-204 in TCoV/IN/540/94) and S1b at the carboxyl-terminus (residues 205-536 in TCoV/IN/540/94) [7]. Figure 1 shows a diagram of the S protein of TCoV. The nucleotide and deduced amino acid sequence similarities of the S genes of all 24 TCoV isolates were analyzed using the Clustal W alignment method in MEGA6 [16]. A phylogenetic tree based on full-length nucleotide sequences of the S gene was constructed using the maximum-likelihood method and the Kimura 2-parameter model. A phylogenetic tree based on deduced amino acid sequences of the S1a subunit, containing a hypervariable region (HVR), was constructed using the neighbor-joining method and the Jones–Taylor–Thornton model. A codon-based Z-test of positive selection for S gene sequences of various TCoV isolates was conducted to analyze the differences in the number of nonsynonymous (dN) and synonymous (dS) substitutions per site by using the Nei–Gojobori method [17] in MEGA6. The variances for both trees and for the codon-based Z-test were estimated using 1000 bootstrap replicates.
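As an illustration of what the Kimura 2-parameter model computes for a pair of sequences, the following is a minimal Python sketch. It is not the MEGA6 implementation. It assumes two nucleotide sequences that are already aligned to the same length, skips any site containing a gap or ambiguous base, and uses the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name k2p_distance and the toy sequences in the usage note are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip gaps and ambiguous bases (pairwise deletion of such sites).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, k2p_distance("AAAAAAAAAA", "GAAAAAAAAA") evaluates one transition over ten sites. The formula is undefined for very divergent pairs, where the logarithm argument is not positive; in that case math.log raises a ValueError.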

Fig. 1

Schematic diagram of the spike protein of turkey coronavirus. TRS, transcription regulatory sequence; SP, signal peptide; HVR, hypervariable region; HR, heptad repeat; TM, transmembrane domain; CP, cytoplasmic peptides. Numbers in parentheses indicate the amino acid position from the start codon of the spike protein from TCoV/IN/540/94 (EU022525)

Nucleotide sequence accession numbers

The S gene sequences of the TCoV isolates reported in the present study were submitted to the GenBank database under accession numbers KF652218 to KF652240. The accession numbers of other CoVs used for phylogenetic analysis are also listed in Table 1.

Results

Genetic analysis of the spike sequences of turkey coronavirus isolates

The sizes of the S genes of TCoV isolates reported in the present study ranged from 3609 to 3630 nucleotides. All 24 TCoV isolates exhibited similar S protein sequences (Fig. 1). The consensus transcription-regulating sequence (TRS), CTGAACAA, was identified 52 nucleotides upstream of the start codon of the S protein. Several conserved motifs and one HVR were found in all TCoV isolates, and their sequences are listed for comparison in Table 2. The consensus motif RXRR/X (X is any amino acid, R is arginine, and slash [/] indicates the cleavage position) was found at the cleavage site for the S1 and S2 subunits in 22 TCoV isolates analyzed in the present study and the French TCoV isolates, except for the isolates TCoV/MN/310/96 and TCoV/PA/682/98, which had the amino acid sequence ATS followed by the cleavage site, similar to TCoV/MN/ATCC/76. A conserved sequence, NQGR/S, resembling the furin-dependent cleavage site in IBV [18] was identified in the S2 subunit in all TCoV isolates except the isolate TCoV/MO/2216/99, in which the critical arginine was mutated to glycine and the second probable protein cleavage site was lost. Rather than NQGR/S, French TCoV isolates had the PQGR/S sequence as the conserved cleavage motif in the S2 subunit. Only one HVR, spanning amino acid positions 126 to 134 (TCoV/IN/540/94) was found in the TCoV isolates, rather than the three HVRs identified in IBV [7]. Two 14-amino-acid insertions in heptad repeats (HR1 and HR2), the consensus motif (YIKWPWYVWL) in the transmembrane domain, and the late Golgi retention signal (YYTTF) for S protein were also observed in all 24 TCoV isolates. Among the 45 amino acid residues of the neutralizing-epitope-containing S fragment in the S1 subunit identified in a previous study [19], 33 consensus residues were observed among the 24 TCoV isolates (Online Resource 2).
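The RXRR/X consensus described above can be located in a deduced amino acid sequence with a simple pattern search. The sketch below is a hedged Python illustration and is not the ProP server's method, which uses a trained predictor rather than a fixed pattern. The function name find_rxrr_sites and the toy peptide in the usage note are hypothetical and do not come from any TCoV isolate.

```python
import re

# R, any residue, R, R. The lookahead lets overlapping motifs be reported.
RXRR = re.compile(r"(?=(R[A-Z]RR))")


def find_rxrr_sites(protein):
    """Return (motif, cut_after) pairs for every RXRR motif.

    cut_after is the 1-based position of the last motif residue, i.e. the
    residue immediately upstream of the predicted cleavage position.
    """
    sites = []
    for match in RXRR.finditer(protein.upper()):
        start = match.start(1)  # 0-based index of the first R
        sites.append((match.group(1), start + 4))
    return sites
```

Calling find_rxrr_sites("MKVRSRRSTA") on this toy peptide returns [("RSRR", 7)], meaning the cut would fall between residues 7 and 8. A pattern match only flags candidate sites; it says nothing about whether cleavage actually occurs.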

Table 2 Conserved motifs in the spike (S) protein cleavage sites and the sequences of the hypervariable region (HVR) of turkey coronavirus (TCoV) isolates

Comparison of the spike sequences of turkey coronavirus isolates

A pairwise comparison of the deduced amino acid sequences of the 24 TCoV isolates showed that the sequence identity ranged from 90.0 % to 98.4 % for the full-length S protein, 77.6 % to 96.6 % for the S1a subunit containing HVR, and 92.1 % to 99.3 % for the S2 subunit (Online Resource 3). No positive selection for the S gene was observed among the TCoV isolates (Online Resource 4). The values of dS were greater than those of dN in all comparisons except three pairs of TCoV isolates (TCoV/IN/540/94 and IN/517-Purdue/94, IN/671/04 and IN/834/04, MO/2216/99 and MO/168/99), which had similar dS and dN values and shared high sequence identity in the S gene, exceeding 95 %.
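Pairwise identity values such as those reported above depend on how gapped positions are treated. The following minimal Python sketch shows one common convention, counting identical residues over the aligned positions where neither sequence has a gap. This convention is an assumption; the handling used for Online Resource 3 is not stated in the text. The function name percent_identity and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(aligned1, aligned2, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to the same length")

    matches = compared = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        if a == gap or b == gap:
            continue  # ignore positions with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

For instance, percent_identity("MKV-LT", "MKIALT") compares five ungapped positions, four of which match, and returns 80.0. Counting gapped columns as mismatches instead would lower the value, which matters for regions with deletions such as the HVR.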

Phylogenetic analysis of the spike gene

Phylogenetic trees based on the full-length S nucleotide sequences (Fig. 2A) and S1a amino acid sequences containing HVRs (Fig. 2B) of different CoVs of the genus Gammacoronavirus were generated. As shown in Figures 2A and 2B, the IBV strains were separated from TCoV isolates, and North American TCoV isolates were separated from French TCoV isolates. Three genetic groups, referred to as groups I, II, and III, were observed in North American TCoV isolates (Fig. 2A). Group I included all North Carolina isolates except TCoV/NC/1020/96, Indiana isolates from 2004 and 2010, Missouri isolates from 1999, and isolates from Pennsylvania, Virginia, and Canada. Group II contained all Texas isolates, Indiana isolates from 1994 and 2009, one Missouri isolate each from 1999 and 2006, and North Carolina, South Carolina, and Arkansas isolates from 1996. Group III was composed of two Minnesota isolates that were isolated 20 years apart and TCoV/PA/682/98, and they had sequence identities higher than 98.5 %. Because of the high degree of variation, most phylogenetic groupings based on the S1a deduced amino acid sequences did not have a bootstrap value over 50 % (Fig. 2B). Nevertheless, the Texas TCoV isolates of group II and all three TCoV isolates of group III shown in the phylogenetic tree based on the full-length S nucleotide sequences still clustered according to their S1a amino acid sequences containing their HVR.

Fig. 2

Consensus bootstrap phylogenetic trees based on the full-length spike (S) gene nucleotide sequences (A) and the S1a amino acid sequences (B) of turkey coronavirus (TCoV) isolates, infectious bronchitis virus (IBV) strains, guinea fowl (Gf) CoV, and beluga whale SW1 CoV (accession numbers in parentheses). The nucleotide sequence tree was constructed by the maximum-likelihood method and the Kimura 2-parameter model, and the amino acid sequence tree was constructed by the neighbor-joining method and the Jones–Taylor–Thornton model in MEGA6. The bootstrap values were calculated from 1000 trees

Discussion

Turkey coronavirus isolates from different geographic areas in the United States have been shown to be antigenically related to one another [5]. In the present study, an antiserum against the isolate TCoV/IN/540/94 reacted with all 24 TCoV isolates from TCoV-infected turkeys and embryos by IFA assay (data not shown). The close antigenicity among TCoV isolates was associated with the high similarity of the S gene sequences, which ranged from 90.4 % to 99.4 %. The S genes of TCoV isolates were conserved compared with the diverse S genes among various IBV strains, which range from 67.4 % to 94 %, resulting in the existence of many serotypes of IBV [10]. The emergence of new IBV serotypes has been postulated to involve the recombination of S genes of vaccine strains of IBV in the field [20, 21]. In a previous study, different serotypes of TCoV/VA/73/03, TX/1038/98, and IN/517/94 were identified by using a neutralization test in conjunction with real-time RT-PCR despite the high level of amino acid sequence identity (96 % to 98 %) among these TCoV isolates [11]. Additional studies are necessary to clarify the antigenic relationships among the various TCoV isolates and serotypes.

In the present study, similar to previous findings with IBV strains, most of the variations in the S protein sequences among TCoV isolates were observed in the amino-terminal half. Along the alignment of these S protein sequences, the region of sequence with the most variation was between residues 126 and 134 of TCoV/IN/540/94 from the start codon of the S protein. Various deletions occurred in this region in different TCoV isolates. This region is in the vicinity of HVR II (residues 117 to 131) of the IBV S protein. HVR I (residues 56 to 69), II, and III (residues 250 to 365) of the IBV S protein are associated with three neutralizing epitopes. The sequences of these regions could be used for differential diagnosis of IBV serotypes [13, 22]. By contrast, similar regions of high variation corresponding to HVR I or III of IBV were not detected among the TCoV isolates examined. These differences illustrate why the S proteins of IBV strains are more diverse than those of TCoV isolates. Similar phylogenetic trees were constructed using the full-length S and S1a proteins. Thus, genotyping TCoV field isolates based on the S1a sequence containing HVR rather than the whole S1 gene or full-length S gene is more practical.

The observation that TCoV isolates originating from the same state were closely clustered together in the phylogenetic tree suggested endemic circulation of distinct TCoV genotypes in various geographic locations. Endemic circulation of distinct TCoV genotypes in France and North America is recognized because French TCoV isolates share only 60 % amino acid sequence identity in the S protein with North American TCoV isolates [7]. Distinct sources of recombination promoting the emergence of TCoV in North America and Europe have been suggested [7, 11]. The groupings of North Carolina, Texas, and Minnesota TCoV isolates also support the theory of endemic TCoV genotypes. The TCoV isolates from the outbreaks in Arkansas and North Carolina in 2012 also clustered geographically [6] and could be placed phylogenetically in group I in the present study. The 99.3 % amino acid sequence identity of the S proteins of two TCoV isolates recovered 20 years apart in Minnesota (TCoV/MN/ATCC/76 and TCoV/MN/310/96) implied that the TCoV isolate MN/ATCC/76 remained endemic and that no substantial genetic changes occurred over two decades. The conservation of TCoV isolates is also supported by the finding that no positive selection of the S gene was detected among the 24 TCoV isolates. Because Indiana isolates from 2004 and 2010 clustered in group I with most North Carolina isolates and Indiana isolates from 1994 and 2009 clustered in group II with Texas isolates, it is most likely that the turkey sources were the same for the turkey farms in North Carolina and Indiana in 2004 and 2010, whereas the turkey sources were the same for Texas and Indiana in 1994 and 2009.

In conclusion, the relationship between TCoV genotypes and geographic distribution described in the present study provides crucial information for the monitoring and control of diseases associated with TCoV infection in the United States.