Introduction

Avian infectious bronchitis virus (IBV) is the causal agent of infectious bronchitis, an acute and highly contagious disease of chickens. It is classified in group 3 of the genus Coronavirus, together with the genetically related turkey coronavirus [1]. Numerous IBV strains exist, causing pathological changes in organs ranging from the respiratory tract to the kidneys and gonads [2–4].

IBV is an enveloped virus that replicates in the cell cytoplasm and has a single-stranded, positive-sense RNA genome of 27.6 kb [5]. Five 5′- and 3′-co-terminal subgenomic mRNAs are transcribed from the IBV genome in virus-infected cells [6]. Gene 1 (the replicase gene) of IBV contains two overlapping open reading frames (ORFs), ORF 1a and ORF 1b, of which the latter is expressed as part of polyprotein pp1ab through a −1 ribosomal frameshift [7]. The replicase polyproteins are proteolytically processed into smaller products, including papain-like proteinase (PLpro), main proteinase (Mpro), RNA-dependent RNA polymerase (RdRp), and RNA helicase, which are involved in RNA synthesis and other aspects of viral pathogenesis. Recently, the replicase gene was demonstrated by reverse genetics to be a determinant of pathogenicity in IBV [8]. Four structural proteins, the spike (S) glycoprotein, small envelope (E) protein, membrane (M) glycoprotein, and nucleocapsid (N) protein, are encoded by mRNAs 2, 3, 4, and 6, respectively [7–9]. The S protein is cleaved into S1 and S2, of which S1 induces neutralizing and serotype-specific antibodies [3, 9].

Presently, vaccination is the main means of preventing IBV infection, but its protective efficacy is unsatisfactory because of the continual emergence of new serotypes and poor cross-protection between them [10–14]. H120, an attenuated live vaccine strain of the Massachusetts (Mass) serotype, was originally obtained by serial passage of strain H (isolated in the Netherlands in 1956 [15]) in embryonated chicken eggs up to the 120th passage. It has been used worldwide as a primary vaccine in broilers, breeders, and future layers for more than 50 years. Even in recent years, H120 has remained widely used and is commonly considered to conform to the rigorous safety standards required for avian vaccines (reviewed by Bijlenga et al. [15]). However, H120 has also been reported to protect poorly against some Chinese field strains [12], which raises the need to further evaluate the antigenicity and safety of H120 at the genome level.

To investigate the genomic features of H120 and better understand its role in the epidemiology of IBV, the complete genome of IBV strain H120 was sequenced and compared with the sequences of several widely used vaccine strains, recently reported field strains, and other typical IBV strains by phylogenetic and recombination analyses.

Materials and methods

Amplification, cloning, and sequencing of the H120 virus

Virus propagation

IBV strain H120 was obtained from the Chinese Institute of Veterinary Drug Control. The H120 virus was inoculated into the allantoic cavity of 9-day-old embryonated specific pathogen-free (SPF) chicken eggs. The embryos were incubated at 37°C and examined twice daily for viability. The allantoic fluid was collected at 72 h post-inoculation and stored at −80°C.

Viral RNA extraction, RACE, and RT-PCR

Genomic RNA was extracted from virus-infected allantoic fluid with TRIzol (Invitrogen, USA) following the manufacturer’s instructions. Primers were designed from the consensus sequence deduced from the alignment of 16 complete IBV sequences in GenBank (Table 1). Five primers were used for RACE PCR to amplify the 5′- and 3′-terminal sequences, and 16 primer pairs were used to amplify the remaining parts of the H120 genome (Table 2). 5′-RACE, 3′-RACE, and reverse transcription were performed using the 5′-full RACE kit (TaKaRa, Japan) and the Superscript III First-Strand Synthesis System (Invitrogen, USA) following the manufacturers’ instructions. PCR was performed in a PCR machine (Bio-Rad, USA) with 2 μl of cDNA as template in a total reaction volume of 25 μl containing 2.5 μl of 10× LA buffer, 5 μl of 2.5 mM dNTPs, 2 μl of 25 mM MgCl2, 1 μl of each primer (10 pmol/μl), and 0.3 μl of LA Taq polymerase (TaKaRa, Japan). The PCR parameters were an initial denaturation at 94°C for 5 min, followed by 30 cycles of denaturation at 94°C for 50 s, annealing at 53°C for 50 s, and extension at 72°C for 1–3 min depending on the size of the product, and a final extension at 72°C for 10 min.
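The reaction composition above can be checked with simple arithmetic. The following is a minimal sketch, assuming that two primers (forward and reverse) are added at 1 μl each and that the balance of the 25 μl is made up with water; neither detail is stated explicitly in the text, and the function names are illustrative only.

```python
# Volumes (in microlitres) of the PCR components listed in the text.
# Assumption: one forward and one reverse primer, 1 ul each.
components_ul = {
    "cDNA template": 2.0,
    "10x LA buffer": 2.5,
    "2.5 mM dNTPs": 5.0,
    "25 mM MgCl2": 2.0,
    "forward primer (10 pmol/ul)": 1.0,
    "reverse primer (10 pmol/ul)": 1.0,
    "LA Taq polymerase": 0.3,
}
TOTAL_UL = 25.0


def water_volume(components, total):
    """Volume needed to bring the listed components up to the total volume.

    Assumption: the remainder of the reaction is water.
    """
    return round(total - sum(components.values()), 2)


def final_concentration(stock, volume, total):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume / total


if __name__ == "__main__":
    print("water (ul):", water_volume(components_ul, TOTAL_UL))
    print("dNTPs (mM):", final_concentration(2.5, 5.0, TOTAL_UL))
    print("MgCl2 (mM):", final_concentration(25.0, 2.0, TOTAL_UL))
    print("each primer (pmol/ul):", final_concentration(10.0, 1.0, TOTAL_UL))
```

Under these assumptions the listed components account for 13.8 μl, leaving 11.2 μl to be made up, and the 10× buffer is diluted to 1×.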

Table 1 Coronavirus sequences analyzed in this study
Table 2 Primers used for H120 genome amplification

Genome sequencing

All PCR products were purified from 0.8% agarose gels using the QIAquick PCR Purification Kit (QIAGEN Inc., Valencia, CA). The purified products were cloned into the pMD19-T Vector (TaKaRa, Japan). The nucleotide sequences of the positive clones were determined by Sangon Biological Engineering Technology & Services Co., Ltd. Each nucleotide was confirmed by three identical sequencing results. Sequences were assembled into the complete genome sequence using the SeqMan II program of the DNAStar software package (DNAStar, Madison, WI).

Genome sequence analysis

Sequence comparison and phylogenetic analysis

Full or partial sequences of 30 IBV strains and four other avian coronavirus strains (as outgroup) were retrieved from GenBank; the accession numbers are listed in Table 1. The MegAlign program of the DNAStar software package was used to generate multiple sequence alignments and to determine nucleotide identities with the Clustal V method. Phylogenetic analyses were performed with the neighbor-joining method using MEGA version 4 [16]. Bootstrap values were determined from 1000 replicates of the original data.
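The nucleotide identities reported in this study are pairwise percentages computed from aligned sequences. The following is a minimal sketch of such a calculation, not the MegAlign implementation: it assumes that alignment columns containing a gap in either sequence are skipped, which may differ from how MegAlign treats gaps, and the three toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other tools may count them.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(alignment):
    """Pairwise identities for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {
        (m, n): percent_identity(alignment[m], alignment[n])
        for m in names
        for n in names
    }


if __name__ == "__main__":
    # Hypothetical aligned toy sequences, for illustration only.
    toy = {
        "strain_A": "ATGCATGCAT",
        "strain_B": "ATGCATGGAT",
        "strain_C": "ATG-ATGGTT",
    }
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```

A matrix of this kind is the input from which distance-based methods such as neighbor-joining build a tree; the tree construction itself is not reproduced here.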

Putative recombination analysis

The aligned complete genome sequences were used for recombination analysis. Potential recombination events were detected using the RDP [17], GENECONV [18], and MaxChi [19] recombination detection methods implemented in RDP3.41 [20]. The highest acceptable P value and the sliding window size were set to 0.05 and 30 bp, respectively.
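The idea behind a sliding-window comparison can be illustrated as follows. This is a simplified sketch and not the RDP, GENECONV, or MaxChi algorithm: it only computes the identity between a query and each of two candidate parents in 30-nt windows and records which parent is closer, without any statistical test. Gaps are treated as ordinary characters, and the toy sequences are hypothetical.

```python
def window_identity(seq_a, seq_b, window=30, step=1):
    """Percent identity of two aligned sequences in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window > len(seq_a):
        raise ValueError("window is longer than the alignment")
    values = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        values.append(100.0 * matches / window)
    return values


def closer_parent(query, parent_1, parent_2, window=30):
    """Label each window 1 or 2 for the more similar parent, or 0 for a tie."""
    id_1 = window_identity(query, parent_1, window)
    id_2 = window_identity(query, parent_2, window)
    return [1 if a > b else 2 if b > a else 0 for a, b in zip(id_1, id_2)]


if __name__ == "__main__":
    # Hypothetical toy data: the query switches from parent 1 to parent 2
    # halfway along the alignment.
    parent_1 = "A" * 60
    parent_2 = "C" * 60
    query = "A" * 30 + "C" * 30
    labels = closer_parent(query, parent_1, parent_2)
    print(labels)
```

A change in the label along the alignment marks a candidate crossover region; dedicated methods then assess whether such a pattern is statistically supported.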

Results and discussion

Complete genome of H120

The complete genome sequence of H120 obtained in this study was submitted to the GenBank database under accession number FJ888351. The complete genome of H120 consists of 27631 nucleotides (nt) and comprises six genes. Gene 1 consists of two ORFs, ORF 1a and ORF 1b, which are 11802 and 8064 nt long, respectively. Gene 2, encoding the S protein, has a single ORF of 3489 nt (encoding 1162 amino acids (aa)). The S1 and S2 genes are 1596 and 1893 nt long, respectively. Gene 3 consists of 709 nt, the 5′-terminal 32 nt of which overlap the S gene. Gene 3 contains three ORFs, 3a (174 nt), 3b (195 nt), and 3c (330 nt), encoding proteins of 57, 64, and 109 aa, respectively. Gene 4, encoding the M protein, has a single ORF of 678 nt (encoding 225 aa). There is a 342-nt non-coding region between the 3′-end of the M protein gene and the 5′-end of gene 5. Gene 5 contains two ORFs, 5a and 5b, of 198 and 249 nt, respectively (encoding 65 and 82 aa, respectively). Gene 6 has a single ORF of 1230 nt, encoding the N protein of 409 aa. The untranslated regions (UTRs) at the 5′- and 3′-ends of the genome are both 528 nt in length. The transcription-regulating sequence (TRS), CT(T/G)AACAA, is present at the start of each gene, and the distances between the TRS and the initiation codons of genes 1 to 6 are 464, 52, 23, 77, 9, and 93 nt, respectively.
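The ORF and protein lengths given above are mutually consistent, as a short check shows. The sketch below assumes that each reported ORF length includes the stop codon, so that the protein length equals the nucleotide length divided by three, minus one; this convention is inferred from the numbers and is not stated in the text.

```python
# ORF lengths (nt) and protein lengths (aa) as reported in the text.
reported = {
    "S": (3489, 1162),
    "3a": (174, 57),
    "3b": (195, 64),
    "3c": (330, 109),
    "M": (678, 225),
    "5a": (198, 65),
    "5b": (249, 82),
    "N": (1230, 409),
}


def protein_length(orf_nt):
    """Protein length for an ORF whose length includes the stop codon (assumption)."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3 - 1


if __name__ == "__main__":
    for name, (nt, aa) in reported.items():
        status = "ok" if protein_length(nt) == aa else "mismatch"
        print(f"{name}: {nt} nt -> {protein_length(nt)} aa (reported {aa}) {status}")
    # The S1 and S2 lengths should add up to the full S ORF.
    print("S1 + S2 =", 1596 + 1893)
```

All eight ORFs satisfy this relationship, and the S1 and S2 lengths sum to the 3489 nt of the S ORF.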

The genomic features of H120, including the locations of the TRSs and hypervariable regions (HVRs), the sizes of the ORFs and the 5′-UTR, and the overlaps between neighboring genes, are similar to those of ZJ971, H52, Ark DPI, M41, and Beaudette. Notably, H120 and H52 were found to possess a longer and more intact 3′-UTR than all the other IBV strains compared, making the 3′-UTR a potential molecular marker for the H strains in IBV detection and phylogenetic studies.

Genome sequence analysis

Sequence comparison

Sequences of the H lineage were first compared with each other and then with those of other IBV lineages. H120 sequences were found to be highly consistent with each other, with no deletions or insertions. Corresponding regions of H120 (FJ888351) were 99.7–100% identical to other H120 sequences in GenBank (Table 3). By comparison, the H52 sequences are more variable. Sequences of the US H52 isolates differ markedly from those of the Chinese isolates, although all originated in the Netherlands and were introduced to the markets of the two countries around 1980 (USA) and 1990 (China). The US H52 isolates exhibit high nucleotide identities with H120. The four sequences of H52-USA1 (located at nt 1–1263, 4186–5496, 8881–9300, and 14046–14744) and the two sequences of H52-USA2 (located at nt 24150–24479 and 24451–25128) are 99.7, 99.8, 99.8, 100, 99.7, and 99.9% identical to H120 at the nucleotide level, respectively. In contrast, the Chinese H52 isolates are less similar to H120. In particular, three regions of the complete sequence of H52-China1 (located at nt 14046–14744, 25682–25930, and 25873–27102) and two regions of H52-China2 (located at nt 24451–25128 and 25873–27102) exhibit comparatively low identities of 91.8, 96, 95.9, 96, and 89.9%, respectively.

Table 3 Comparison between H120 (FJ888351) and other H120 sequences

Compared with other complete IBV sequences, H120 exhibited identities ranging from 84.6 to 99.8% (Table 4). H120 shows the highest similarity to ZJ971 (a Chinese strain that causes swollen proventriculus in chickens [21]), with an identity of 99.8%, and the lowest similarity to most of the other Chinese strains (BJ, S14, LX4, SAIBK, SC021202, and TW2575/98), with identities ranging from 84.6 to 87.7%.

Table 4 Percentage of nucleotide identity of different regions of H120 genome compared with other IBV strains (complete genome and 5′-terminal 20 kb)

Between H120 and the other IBV strains, the 5′-UTR is the most conserved region, with 91.5–99.4% identity, while the 3′-UTR is the most variable region, with identities ranging from 49.9 to 80.9% (Tables 4, 5). Overall, H120 has higher nucleotide identities with other IBV strains in the replicase gene (85.3–96.4%), Mpro (84.5–99.5%), PLpro (82–99.7%), RdRp (86.4–99.9%), M (82.3–97.4%), 5b (91.2–100%), and N (86.1–94.2%) than in S1 (52.2–99.8%), S2 (74.5–98.3%), 3a (76.1–96%), 3b (77.1–99.5%), E (73.9–97.3%), and 5a (79.3–100%). In particular, the frequency of silent mutations was found to be high in RdRp and Mpro, so that their amino acid sequences are more conserved (90–99.3% and 93.6–100%, respectively) than their nucleotide sequences.
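The effect of silent mutations on the two identity measures can be illustrated with a toy example. The sketch below uses the standard genetic code and two short hypothetical coding sequences that differ only at synonymous third-codon positions; the sequences are not taken from any IBV strain.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing incomplete codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical sequences: three substitutions, all synonymous.
    reference = "ATGGCTCTGAAA"
    variant = "ATGGCCTTGAAG"
    print("nucleotide identity:", identity(reference, variant))
    print("amino acid identity:",
          identity(translate(reference), translate(variant)))
```

In this example three of twelve nucleotides differ, yet both sequences encode the same peptide, which is the pattern described above for RdRp and Mpro.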

Table 5 Percentage of nucleotide identity of different regions of H120 genome compared with other IBV strains (3′-terminal 7 kb)

Sequence differences between H120 and ZJ971

As shown in Table 6, 23 point mutations relative to H120 were detected in the coding regions of ZJ971, resulting in 9 amino acid changes. Three of these changes were in nsp2, two in nsp4, and four in S1, three of which are located in HVRs (52I to V, 122I to L, 130S to F). One point mutation and a 3-nt deletion were identified in the 3′-UTR of ZJ971. Which of these mutations are responsible for the reversion to virulence from H120 to ZJ971 remains to be investigated.

Table 6 Nucleotide and deduced amino acid difference between H120 and ZJ971

Phylogenetic trees

Phylogenetic trees were constructed from sequence alignments of the complete or partial genomes of 30 IBV strains and four other avian coronavirus strains (Table 1). Trees based on different genomic regions exhibited considerably different topologies (Fig. 1). In the phylogenetic tree of the complete genome, IBV strains were grouped into five clades (Fig. 1a). H120 fell into clade I with H52, ZJ971, Ark DPI 11, Ark DPI 101, and Cal99. Clade II was formed by Beaudette, Peafowl/GD/KQ6/2003 (abbreviated KQ6), and M41. Clades III, IV, and V were formed exclusively by Chinese strains. Phylogenetic trees based on different regions of the genome are presented in Fig. 1b–i. In the PLpro-based tree, H120 clustered with the Ark strains, whereas in the trees based on the structural genes S1, S2, E, and M, H120 grouped with the Mass strains in one clade. In the trees based on Mpro and RdRp, H120 formed one clade with ZJ971, H52, Ma5, and DE072, distinct from the clades of the Ark and Mass strains. In the N-based tree, H120 clustered with Cal99 and Ark DPI.

Fig. 1

Phylogenetic trees by neighbor-joining method (bootstrapping for 1000 replicates with its value >70%) based on complete genome and different regions of genome. Sequences of H120 strains were labeled by filled upright triangle, sequences of H52 strains were labeled by filled inverted triangle, sequences of Ark DPI strains were labeled by filled diamond, sequences of KQ6 were labeled by filled square and sequences of SAIBK were labeled by filled circle

Recombination analysis

Potential recombination events

With support from all the recombination detection methods employed, five IBV strains, H52-China1, KQ6, SAIBK, Ark DPI 11, and Ark DPI 101, were identified as possible mosaics with H120 as one of their putative parents (Fig. 2).

Fig. 2

RDP screenshots displaying the possible recombination events associated with H120. Each panel displays the pairwise identities among the possible mosaic and its putative parents. Pairwise identity refers to the average pairwise sequence identity within a 30-nt sliding window moved one nucleotide at a time along the alignment of the three sequences. The light areas demarcate the potential recombination regions. Crossover sites are indicated by arrows with nt positions above. a Comparisons among the putative mosaic H52-China1 and its putative parents, H120 and M41. b Comparisons among the putative mosaic KQ6 and its putative parents, H120 and M41. c Comparisons among the putative mosaic SAIBK and its putative parents, H120 and SC021202. d Comparisons among the putative mosaics Ark DPI 11/Ark DPI 101 and their putative parents, H120 and M41. The RDP result for Ark DPI 101 is not shown separately because it was identical to that for Ark DPI 11

H52-China1 was found to be a mosaic between H120 and M41 (Fig. 2a). As indicated in Fig. 2a, the regions of recombination were located at positions 3859–4571, 10954–20511, 22390–23497, 24791–25919, and 26684–27251 (the light areas). These regions exhibit higher identities with M41 (99.7, 99.7, 99.5, 99.5, and 99.4%, listed from 5′ to 3′, here and below) and lower identities with H120 (85.8, 93.3, 96.6, 97.2, and 90.2%), whereas the other regions of the genome exhibit higher identities with H120 (99.5, 99.7, 99.4, 99.7, 99.7, and 99.8%) and lower identities with M41 (88.8, 88.3, 96.9, 98.1, 87.6, and 77.6%). Although H120 was identified as the parent strain, it is more plausible that H52-China1 arose by recombination between M41 and an ancestral H52 highly similar to H120, for two reasons. First, as indicated by a previous study on the sequence changes responsible for the attenuation of Ark DPI [22], similarities between the ancestral strains of the H lineage should be extremely high. Second, all the sequences of the H isolates except those of the Chinese H52 isolates share high identities, suggesting that genetic changes have occurred in the ancestry of these Chinese H52 isolates.

The IBV strain KQ6, isolated from a peafowl in Guangdong province, China, was described in a previous study as a Mass prototype-like IBV strain that had existed in peafowl for a long period and had evolved independently, away from the influence of currently used vaccines [23]. In the current study, however, KQ6 appears more likely to be a mosaic (Fig. 2b). Interestingly, its putative parents were also identified as H120 and M41, indicating that H52-China1 and KQ6 might have arisen in a similar way related to vaccine manufacturing. As indicated in Fig. 2b, five H120-like regions in KQ6 were located at positions 683–1473, 2775–3571, 14220–15159, 15873–16248, and 26002–27237 (the light areas). These regions exhibit higher identities with H120 (99.6, 99.7, 99.6, 100, and 99.5%) and lower identities with M41 (91.5, 77.4, 92.2, 91.7, and 88.8%), whereas the other regions of the KQ6 genome exhibit higher identities with M41 (99.3, 99.7, 99.3, 99.3, 99.6, and 99.4%) and lower identities with H120 (95.8, 88.8, 90.1, 93.1, 95.4, and 77.7%).

In addition, H120-like regions were also found in SAIBK and Ark DPI 11/Ark DPI 101. The H120-like region of SAIBK, which is 99.1% identical to H120, was located at positions 7431–9451 (the light area in Fig. 2c), while the other regions of the genome showed lower identities with H120 (86.1 and 87.7%). As shown in Fig. 2d, the H120-like region of the Ark DPI strains was located at positions 1–8832, with 99.6% identity, and the remainder of the genome exhibits 92.1% identity with H120. The other parents of SAIBK and the Ark DPI strains were identified as unknown IBV strains that are comparatively closely related to SC021202 and M41, respectively.

Specifically, several regions of H52-China1, including the RdRp domain, the 5′-terminal 826 nt of S2, the 5′-terminal 612 nt of M, and the E gene, were found to lie within the regions of recombination (Fig. 2a). As a result, in the phylogenetic trees based on RdRp, S2, M, and E, H52-China1 clustered away from the other isolates of the H strains and with M41 instead. In addition, the 5′-terminal 692 nt of the RdRp domain of KQ6 and the PLpro domain of Ark DPI 11/Ark DPI 101 are also located in regions of recombination and, as with H52-China1, the positions of these strains in the trees of the corresponding regions have been affected by the recombination events.

In previous studies, the vaccine strains H52 [10], D1466 [11], Conn [24], and Ark DPI [25] were reported to act as the “heterologous RNA donor template” in recombination events. However, those studies focused only on recombination in the 3′-terminal 7 kb of the genome. The present study analyzed, for the first time, potential recombination events across the entire IBV genome and found evidence that recombination events also occur frequently in the 5′-terminal 20 kb of the genome.

Recombination hot spots

Recombination in coronaviruses is generally believed to occur via a template-switching mechanism [26, 27], and it is predicted to occur more frequently at RNA sites of strong secondary structure, since these structures promote transcriptional pausing [28]. Potential recombination “hot spots” are reported to be located adjacent to the putative recombinant junctions, and all of these “hot spots” are described as AT-rich motifs, e.g., CT(T/G)AACAA [10], (A/T/G)TTTTG, and CTTTTG [26, 29]. However, a previous report indicated that the crossover sites were distributed almost randomly when coronavirus recombination was examined under non-selective conditions [30]. In the current study, AT-rich motifs (TTTTG or T(T/A)(G/T)AACAA) were found in regions adjacent to nearly all the putative crossover sites. Even so, it is still difficult to determine whether these motifs are related to recombination or merely appear around the recombinant sites by chance, because they occur frequently in the genome of H120 (119 times for TTTTG and 50 times for AACAA).
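Counting motif occurrences of this kind is straightforward to reproduce. The following is a minimal sketch, assuming the genome is available as a plain nucleotide string; degenerate positions such as (T/G) are written as regular-expression character classes, overlapping matches are counted, and the toy sequence is hypothetical rather than part of the H120 genome.

```python
import re


def motif_positions(sequence, pattern):
    """1-based start positions of all matches, including overlapping ones.

    `pattern` is a regular expression, so the degenerate motif
    CT(T/G)AACAA is written as "CT[TG]AACAA".
    """
    lookahead = re.compile(f"(?=(?:{pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(sequence.upper())]


def count_motif(sequence, pattern):
    """Number of (possibly overlapping) occurrences of a motif."""
    return len(motif_positions(sequence, pattern))


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    toy = "GGTTTTGCTTAACAATTTTTG"
    for motif in ("TTTTG", "AACAA", "CT[TG]AACAA"):
        print(motif, count_motif(toy, motif), motif_positions(toy, motif))
```

Applied to the full genome sequence, the same function would give the genome-wide counts against which the motifs near crossover sites need to be judged.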

In summary, this study described the genomic features of H120 and presented putative evidence of virulence reversion and recombination associated with H120, indicating that the attenuated IBV vaccine H120 might contribute to the emergence of new IBV variants through both virulence reversion and recombination. Although more evidence is needed to verify the putative results of this study, close attention should be paid to supervising the use of attenuated live IBV vaccines.