Porcine epidemic diarrhea virus (PEDV) is a member of the genus Alphacoronavirus in the family Coronaviridae, which, together with the families Arteriviridae and Roniviridae, constitutes the order Nidovirales. It has an infectious, single-stranded, positive-sense RNA genome of approximately 28 kb. PEDV was first reported in Belgium and the United Kingdom in 1978 [1]. In China, PEDV was first confirmed by fluorescent antibody test and serum neutralization in 1984 [2]. Although a bivalent attenuated vaccine against transmissible gastroenteritis virus (TGEV) and PEDV is used in China [3], PEDV still occurs frequently on many swine-raising farms there.

Coronavirus nucleocapsid (N) proteins vary from 377 to 455 amino acids in length, are highly basic, and have a high (7 to 11 %) serine content. These serines are potential targets for phosphorylation. Antigenic studies have shown that the N protein is one of the immunodominant antigens in members of the family Coronaviridae [4]. The N protein of infectious bronchitis virus (IBV) is a relevant target for immune recognition in both mice and chickens [5].

The sequences of N genes of PEDV strains CV777 and Br1/87 were first determined in 1993, and the deduced amino acid sequences of these strains are 441 amino acids in length [6]. A nested polymerase chain reaction (PCR) based on a partial sequence of the N gene was designed to detect PEDV in Japan in 1999 [7]. The purpose of this study was to investigate the sequence diversity of N genes of PEDV field strains during 2006–2011 in China. Because the detection of PEDV in feces or contents of the small intestine might be affected by the reliability and sensitivity of the technique [8], a reverse transcription nested PCR (RT-nested PCR) was established to amplify the full-length N genes of field strains in this study.

One hundred twenty-seven porcine fecal samples or contents of the small intestine were collected from piglets showing watery diarrhea and dehydration on 32 swine-raising farms in 15 provinces in China from January 2006 to August 2011. All of the samples were diluted with phosphate-buffered saline to make 10 % (v/v) suspensions. The suspensions were vortexed for 1 min and clarified by centrifugation for 10 min at 5,000 rpm. The supernatants were collected for RT-nested PCR.

Viral RNA was extracted from the supernatants using TRIzol Reagent (Invitrogen Corp., Carlsbad, USA), and the first-strand complementary DNA (cDNA) was synthesized with M-MLV reverse transcriptase (Promega, USA) using a specific primer (N1L, 5′-TCAAATACCTGGCACGCTCT-3′) according to the manufacturer’s instructions.

Two pairs of primers (N1U, 5′-TATAAGGTTGCTACTGGCGT-3′ and N1L, 5′-TCAAATACCTGGCACGCTCT-3′; N2U, 5′-GTCAAAACACGGCGACTATT-3′ and N2L, 5′-TGGCACTACCCTGGAACATA-3′) for RT-nested PCR were designed and synthesized according to the corresponding sequence of CV777 (AF353511). The outer span, including primers N1U and N1L, was 1843 bp, and the inner span, including primers N2U and N2L, was 1465 bp. The fragments containing the full-length N gene were amplified from 94 samples using the inner primers (N2U/N2L). The overall detection rate of PEDV in the samples was 74.0 % (94/127). TGEV was also detected by RT-PCR in our lab, and the overall detection rate was 36.5 % in pigs with diarrhea in China.
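As an illustrative aside, the primer set and the detection rate quoted above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the original protocol: the helper names (`reverse_complement`, `gc_percent`, `detection_rate`) are hypothetical, and only the primer sequences and the counts 94 and 127 come from the text.

```python
# Primer sequences exactly as given in the text (written 5'->3').
PRIMERS = {
    "N1U": "TATAAGGTTGCTACTGGCGT",  # outer forward
    "N1L": "TCAAATACCTGGCACGCTCT",  # outer reverse, also used for cDNA synthesis
    "N2U": "GTCAAAACACGGCGACTATT",  # inner forward
    "N2L": "TGGCACTACCCTGGAACATA",  # inner reverse
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def detection_rate(positive: int, total: int) -> float:
    """Positive samples as a percentage of all samples tested."""
    return 100.0 * positive / total


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        # The reverse complement of a lower primer is the genome-sense
        # site it anneals to on the reference sequence.
        print(name, len(seq), f"{gc_percent(seq):.0f}% GC", reverse_complement(seq))
    print(f"PEDV detection rate: {detection_rate(94, 127):.1f} %")
```

The same `detection_rate` helper could be applied to per-province or per-year subsets if those counts were tabulated.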

PCR products were excised from 1.0 % agarose gels, purified using an AxyPrep™ DNA Gel Extraction Kit (Axygen Scientific, Inc., USA), and cloned into the T-tailed vector pMD18-T; the recombinant plasmids were then introduced into JM109 competent cells (TaKaRa, China) by transformation. Three recombinant DNA clones were sequenced by the dideoxy nucleotide chain terminator method. All of the sequences in this study have been deposited in the GenBank database. The field strains and their accession numbers are as follows: CH/IMB/06 (FJ473387), CH/HNCH/06 (FJ473388), CH/JSX/06 (FJ473389), CH/HLJH/06 (FJ473390), CH/IMT/06 (FJ473391), CH/SHH/06 (FJ473392), CH/HLJM/07 (FJ473393), CH/HNH/07 (FJ473394), CH/GSJ/07 (HM210880), CH/JL/09 (HM210881), CH/GDS/09 (HM210882), CH/HLJQ/2010 (HQ455345), CH/HLJHG/2010 (HQ455346), CH/HNZZ/2011 (JN601052), CH/BJYQ-1/2011 (JN601053), CH/BJYQ-2/2011 (JN601054), CH/FJND/2011 (JN601055), CH/GDQY-1/2011 (JN601056), CH/GDQY-2/2011 (JN601057), CH/GDQY-3/2011 (JN601058), CH/HLJHG/2011 (JN601059), CH/SDRZ-1/2011 (JN601060), CH/SDRZ-2/2011 (JN601061), CH/GXNN/2011 (JN601062), CH/BJSY/2011 (JQ735953), CH/HLJHRB/2011 (JQ743650), CH/HLJHH/2011 (JQ743651), CH/GXWP/2011 (JQ743652), CH/XJUrumqi/2011 (JQ743653), CH/GXQZ/2011 (JQ743654), CH/ZJHZ/2011 (JQ743655), CH/GXWM/2011 (JQ743656).

The N genes of the 32 field strains were found to contain a single open reading frame (ORF) consisting of 1326 nucleotides. There were no nucleotide deletions or insertions in the ORFs of the N genes. All of the N genes had a hexamer motif (CTAAAC), which is the transcription-regulating sequence (TRS), located nine nucleotides upstream of the initiator ATG, as recognized in a previous study [9]. There was a nine-nucleotide conserved sequence (AGAAACTTT) between the TRS and the start codon of the N gene. The N genes of the 32 field strains showed 95.3–100 % sequence identity to each other. They showed lower sequence identity to the field strain LZC (95.0–97.4 %) than to other Chinese reference field strains (95.6–99.7 %). They showed 95.9–100 % sequence identity to three attenuated strains (CV777, DR13 and 83P-5), whose N genes were 100 % identical to each other.
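Because the ORFs contain no insertions or deletions, pairwise identity between them can in principle be computed by comparing equal-length sequences position by position. The sketch below illustrates that calculation and the relationship between ORF length and protein length (441 codons plus one stop codon gives 1326 nucleotides). It is an assumption-laden illustration: the software actually used to calculate identities is not stated in the text, and the function names are hypothetical.

```python
from itertools import combinations


def orf_length(n_amino_acids: int) -> int:
    """ORF length in nucleotides: one codon per residue plus a stop codon."""
    return (n_amino_acids + 1) * 3


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no indels)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_range(seqs: dict) -> tuple:
    """Lowest and highest pairwise identity over all pairs in `seqs`.

    `seqs` maps a strain name to its sequence; at least two entries are needed.
    """
    values = [percent_identity(a, b) for a, b in combinations(seqs.values(), 2)]
    return min(values), max(values)


if __name__ == "__main__":
    print(orf_length(441))  # ORF length implied by a 441-residue protein
    # Toy 8-nt sequences, purely hypothetical, standing in for 1326-nt ORFs.
    toy = {"s1": "ATGCATGC", "s2": "ATGCATGC", "s3": "ATGCATGA"}
    print(identity_range(toy))
```

With real data, `seqs` would hold the 32 deposited N gene ORFs, and the returned pair would correspond to a range such as the 95.3–100 % reported above.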

The N proteins of the 32 field strains were predicted to be 441 amino acids in length, with a serine content of 7.3–8.4 % (32–37 serines). Phosphorylated N protein binds viral RNA with higher affinity than non-viral RNA, suggesting that phosphorylation of the N protein determines the recognition of viral RNA [10]. The phosphorylation sites of the N proteins of the field strains were predicted by the Web tool DISPHOS (http://www.ist.temple.edu/DISPHOS), which uses disorder information to detect phosphorylation sites. Only residues with a prediction value >0.5 were considered to be phosphorylated. The number of predicted phosphorylation sites of N proteins varied from 5 to 12. Of the 32 field strains, one strain had 5 predicted phosphorylation sites, two had 7, sixteen had 8, one had 9, three had 10, two had 11 and seven had 12 (Table 1). The deduced amino acid sequences of the 32 field strains showed 95.0–100 % sequence identity to each other. They showed 95.0–98.0 and 96.2–100.0 % sequence identity to CV777 and attenuated strains, respectively.
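The arithmetic in this paragraph can be reproduced with a short script. The sketch below computes serine content, applies the >0.5 cut-off to per-residue scores, and checks that the reported site-count distribution accounts for all 32 strains. The score dictionary format is a hypothetical stand-in, since the DISPHOS output format is not described here, and the function names are illustrative only.

```python
def serine_content(protein: str) -> float:
    """Serine residues as a percentage of protein length (one-letter codes)."""
    return 100.0 * protein.count("S") / len(protein)


def predicted_sites(scores: dict, threshold: float = 0.5) -> list:
    """Positions whose prediction value is strictly greater than `threshold`.

    `scores` maps a residue position to a prediction value; this layout is
    an assumption made for illustration, not the DISPHOS output format.
    """
    return sorted(pos for pos, value in scores.items() if value > threshold)


# Number of predicted sites -> number of field strains, as reported in the text.
SITE_COUNT_DISTRIBUTION = {5: 1, 7: 2, 8: 16, 9: 1, 10: 3, 11: 2, 12: 7}

if __name__ == "__main__":
    # 32 and 37 serines in a 441-residue protein bound the reported range.
    for n_ser in (32, 37):
        print(f"{n_ser} serines: {100.0 * n_ser / 441:.1f} %")
    print(sum(SITE_COUNT_DISTRIBUTION.values()), "strains in total")
    # Hypothetical scores: only values above 0.5 are kept.
    print(predicted_sites({10: 0.50, 20: 0.51, 5: 0.90}))
```

Note that a residue scoring exactly 0.5 is excluded, matching the strict ">0.5" criterion stated above.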

Table 1 The predicted phosphorylation sites of N proteins of Chinese field strains and CV777

Like the spike glycoprotein gene [11, 12], the N gene is an important target for phylogenetic analysis of the epidemiological situation of coronaviruses in the field [13, 14]. A phylogenetic tree based on the N gene was constructed using MEGA 5.05 software [15], and the tree showed that the PEDV strains were divided into four groups (Fig. 1). The 32 field strains fell into three of these groups. Twelve field strains in group 1 have a close relationship to CH/S and 83P-5. Fourteen field strains in group 3 have a close relationship to five Chinese reference strains (LJB/03, JS-2004-2, DX, BJ2010, HB/HS). Six field strains in group 4 are genetically different from other Chinese field strains and may represent a novel genotype of PEDV. It is notable that these six field strains differ from the other strains at 4–6 nucleotides within nt 1290–1298.

Fig. 1 Phylogenetic analysis of the nucleotide sequences of N genes of PEDV field strains. The tree was constructed based on the neighbor-joining method using the MEGA 5.05 software. The scale bar indicates the branch lengths for 0.2 % nucleotide differences. The Chinese PEDV strains are marked with triangles.
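For readers unfamiliar with the neighbor-joining method named in the figure legend, the following is a minimal, self-contained sketch of the algorithm on a pairwise distance matrix. It is not the MEGA implementation: the use of a simple p-distance (proportion of differing sites) is an assumption, since the substitution model is not stated, and the function names and internal node labels (`node0`, `node1`, ...) are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining.

    names: list of at least three leaf labels (must not look like 'nodeN').
    dist:  dict mapping frozenset({x, y}) to the distance between x and y.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

In use, `dist` would be filled by calling `p_distance` on every pair of aligned N gene sequences, and the returned edge list could then be rendered as a tree. Bootstrap support and tree drawing, which packages such as MEGA provide, are outside the scope of this sketch.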

In summary, PEDV was detected in 74.0 % (94/127) of the samples from diarrheic piglets, indicating a high prevalence in swine herds in China. The field strains are genetically diverse in their N genes, both among themselves and as compared with reference strains. Phylogenetic analysis indicated that there are three genotypes of PEDV prevailing in China. Moreover, the N gene sequences reported here add to the available sequence data and provide a basis for further functional exploration of PEDV.