Avian infectious bronchitis virus (IBV), a member of the family Coronaviridae, order Nidovirales [6], is a highly infectious pathogen of domestic fowl. IBV is an enveloped virus that replicates in the cell cytoplasm and has a single-stranded, positive-sense RNA genome of approximately 27.6 kb [2]. The IBV genome comprises ten open reading frames (ORFs). ORF1, the replicase gene, contains two overlapping open reading frames, ORF 1a and ORF 1b [2]. ORF 1b is expressed as a fusion protein with 1a by -1 ribosomal frameshifting [3]. The IBV genome encodes four major structural proteins: the spike (S) glycoprotein, the small envelope (E) protein, the membrane (M) glycoprotein, and the nucleocapsid (N) protein [22, 23]. The spike protein is cleaved into S1 and S2, of which S1 induces neutralizing and serotype-specific antibodies [8, 19]. Because of the error-prone nature of the viral RNA polymerase, coronavirus genomic RNA accumulates point mutations during its replication, which leads to the emergence of new serotypes and variants [13]. In the case of IBV, most mutations occur in the spike glycoprotein, which is necessary for viral attachment and entry into host cells [11, 27, 28].

While it is possible to study the evolution of viruses and its impact on viral pathogenicity by comparing genomic sequences of heterologous strains, the analysis of homologous strains provides a unique opportunity to understand specific genes that are likely to be involved in viral pathogenicity. To identify specific sequence changes responsible for adaptation of the field virus to chick embryonic tissue and subsequent attenuation, we carried out comparative sequence analysis of the virulent Arkansas (Ark) DPI 11 (passage 11 in chick embryo) strain and its egg-adapted attenuated vaccine virus, Ark DPI 101 (passage 101 in chick embryo). Here, we have chosen Ark DPI as a model to determine the molecular basis of attenuation of IBV. This is the first report of comparative and complete genome sequence analysis of two homologous infectious bronchitis viruses to identify sequence changes responsible for adaptation to chick embryo and subsequent attenuation of IBV.

Chicken embryo passages 11, 26, 51, and 101 of the Ark DPI strain were prepared in Dr. Gelb’s laboratory at the University of Delaware. Seed stocks of each passage number were prepared by inoculating 9-day-old specific-pathogen-free (SPF) embryonated chicken eggs and collecting allantoic fluid 72 h post-inoculation. Experimental inoculation of one-day-old chicks was carried out to evaluate the virulence of Ark DPI 11 and Ark DPI 101. Forty one-day-old SPF leghorn chickens (SPAFAS, Inc., Norwich, CT) were assigned to 5 treatment groups of 8 birds each (Table 1). Chicks in groups 1 through 4 were inoculated intratracheally with 10^4.5 50% embryo infectious doses (EID50) per chick of virus from each of the different passage numbers. The results of the pathogenicity studies, summarized in Table 1, clearly demonstrate that the virulent IBV-Ark DPI strain is gradually attenuated by passage in chicken embryos.

Table 1 Results of experimental inoculation of day-old chicks with IBV-Ark DPI strain obtained after different numbers of passages in embryos

Viral RNA was extracted from allantoic fluid seed stocks using the Qiagen RNeasy kit according to the manufacturer’s instructions and stored at −20°C. RT-PCR and cloning were carried out as described earlier [1]. DNA from three independent clones was sequenced for each amplicon to exclude errors introduced by the RT and PCR reactions. The assembly of contiguous sequences and multiple sequence alignments were performed with the GeneDoc software [17]. The complete sequences of Ark DPI strain embryo passage numbers 101 and 11 have been submitted to GenBank with the accession numbers EU418975 and EU418976, respectively.
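The reason for sequencing three independent clones per amplicon can be illustrated with a simple majority vote at each aligned position. The following is a minimal sketch only, not the procedure actually used with GeneDoc; the function name `clone_consensus` and the example reads are hypothetical, and the reads are assumed to be pre-aligned to equal length.

```python
from collections import Counter


def clone_consensus(clone_reads):
    """Return a majority-vote consensus of aligned clone sequences.

    Assumption: all reads are already aligned and of equal length.
    Positions without a strict majority are reported as 'N'.
    """
    lengths = {len(read) for read in clone_reads}
    if len(lengths) != 1:
        raise ValueError("clone reads must be aligned to the same length")

    consensus = []
    for column in zip(*clone_reads):
        # Most frequent base in this alignment column and its count
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count > len(clone_reads) / 2 else "N")
    return "".join(consensus)


# Hypothetical reads: the third clone carries an isolated RT/PCR error
reads = ["ACGTAC", "ACGTAC", "ACTTAC"]
print(clone_consensus(reads))
```

With three clones, an error present in only one clone is outvoted by the other two, whereas a genuine difference between passages appears in all clones of that passage.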

The genomes of both Ark DPI viruses consist of 27,620 nucleotides (nts) excluding the poly (A) tail and include ten ORFs flanked by 5′ (529 nts) and 3′ (507 nts) untranslated regions (UTRs). The genome organization of Ark DPI is 5′-Rep1a-Rep1b-S-3-M-5-N-3′, as shown in Fig. 1. In this comparative study, we found only 21 nucleotide differences between the virulent and avirulent Ark DPI strains, which result in 17 amino acid changes (see Table 2). A single amino acid substitution was found in the p87 protein at nucleotide position 1,107. A similar single amino acid change was reported previously in this coding region (nt 529–1,263) when challenge and vaccine viruses of the M41 strain were compared [14]. The role of the p87 protein is not clearly defined, but it may have a negative effect on PLpro-mediated proteolytic cleavage at the p87/p195 site [30]. The effect of the amino acid substitution found in the PL1pro region at position 945 is difficult to predict, because PL1pro is inactive in IBV [30]. Two nucleotide differences were observed in the viral proteinase PL2pro, one of which is silent. The amino acid substitution from acidic Asp to neutral Gly found in the PL2pro region could be an important one. This substitution is very close to the active catalytic site (nucleophilic cysteine) of PL2pro and could possibly interfere with proteolytic processing. Therefore, we speculate that this amino acid substitution may restrict viral maturation or replication. The effects of the amino acid substitutions found in domain Y and p9 are difficult to predict because the roles of both are unknown. A substitution of Ser to Pro was found in the virulent strain at nucleotide position 10,036 in the HD3 domain. Earlier, it was shown that some of the ORF1a-encoded hydrophobic domains are involved in membrane association of the replication complex of members of the Nidovirales [21, 24, 25].
Hence, this substitution might be critical for adaptation of the virus by controlling its replication rate in different hosts. One amino acid substitution was found in the growth-factor-like (GFL) protein, which is involved in the growth factor signaling pathway [16]. The amino acid change from polar Thr to non-polar Ile may interrupt membrane association of this protein and thereby affect viral replication.
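The distinction drawn above between silent nucleotide differences and those that change an amino acid can be sketched by translating the affected codon before and after the change with the standard genetic code. The codons below are hypothetical illustrations chosen to encode the residues named in the text (Asp, Gly, Thr, Ile); the actual codons of Ark DPI are not given here, and the function name `classify_change` is an assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ref_codon, alt_codon):
    """Classify a codon change as 'silent' or as 'X->Y' (one-letter codes)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "silent" if ref_aa == alt_aa else f"{ref_aa}->{alt_aa}"


# Hypothetical codons, for illustration only
print(classify_change("GAT", "GGT"))  # Asp to Gly
print(classify_change("ACT", "ATT"))  # Thr to Ile
print(classify_change("GGT", "GGC"))  # Gly to Gly, a silent change
```

Applied to every differing position in the coding regions, a tally of this kind separates silent differences from amino acid changes such as those listed in Table 2.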

Fig. 1 Organization of the infectious bronchitis virus genome. The genome of Ark DPI is 27,620 nt long, excluding the poly (A) tract. Middle: ten genes and their ORFs; the scale indicates the approximate positions and sizes of genes in the Ark DPI genome. Bottom: putative domains of the ORF1a/1b polyprotein: nsp, non-structural protein; Ac, acidic domain; X, unknown domain X; PL1, papain-like proteinase 1; PL2, papain-like proteinase 2; Y, unknown domain Y; HD, hydrophobic domain; 3CL, 3C-like proteinase; G, growth-factor-like protein (GFL); RdRp, RNA-dependent RNA polymerase; Hel, helicase; ExoN, exoribonuclease; Ne, nidoviral uridylate-specific endoribonuclease; MT, 2′-O-ribose methyltransferase. Top: details of the spike protein: SP, signal peptide; RRSRR/S, spike protein cleavage site between aa 544 and 545; TM, transmembrane domain of the spike protein. Nucleotide (nt) and amino acid (AA) differences between Ark DPI 11 and 101 and their approximate positions are depicted

Table 2 Nucleotide and deduced amino acid differences between virulent and avirulent IBVs of Ark DPI strain

The replicase gene is usually not subjected to host immunity and is quite conserved in coronaviruses [15]. The main replicase proteins, RNA-dependent RNA polymerase (RdRp) and 3C-like cysteine protease (3CLpro) or main proteinase (Mpro), were highly conserved, and not a single amino acid difference was noted. Virulent and attenuated strains differed by one amino acid in the helicase domain at nucleotide position 15,763. The amino acid change of Arg for His in the attenuated strain might significantly alter viral replication.

Among the structural genes, most of the nucleotide differences were located in the spike gene. Out of eight amino acid differences in the S protein, six were in the S1 region, located between amino acid positions 42 and 324. It has been shown that the S protein of coronaviruses is responsible for cell tropism [4, 12, 20]. Earlier workers predicted three hypervariable regions (HVRs) in S1 of the spike protein [7, 15, 18] based on the clustering of amino acid differences. In this study, we found a single amino acid change in each of the HVRs. The changes were positive His to neutral Tyr in HVR I, polar Ser to nonpolar Pro in HVR II, and positive polar Arg to neutral nonpolar Ile in HVR III. Apart from the HVRs, the region between residues 162 and 214 had three amino acid substitutions. Previous studies have shown that the HVRs encode the serotype- and neutralization-specific epitopes, and the amino acid substitutions observed between Ark DPI 11 and Ark DPI 101 in the S1 region of the spike protein may have a similar function [7, 15]. Our findings are supported by a recent study, which revealed that a single passage of Ark DPI vaccine in a chicken led to selection of virus populations with an S1 gene that is similar to that of the virulent parental strain [26]. This suggests that the markers of virulence and adaptation reside mostly in the S1 protein. Of the two amino acid substitutions in the S2 region, one is located downstream of, and in the vicinity of, the fusion peptide, and the other is located in heptad repeat region 2. These mutations in S2 may alter the fusogenic properties of the S protein. S1 undergoes more nucleotide changes than S2, which is quite conserved. However, minimal changes in S2 are enough to alter the membrane fusion ability of the spike protein and thereby infectivity [9]. Interestingly, out of eight amino acid substitutions in S, six of the charged residues in Ark DPI 11 were mutated to neutral residues in Ark DPI 101.
The two charged amino acids of Ark DPI 11 S2 were changed to membrane-interacting (hydrophobic) residues in Ark DPI 101. These residue changes in the S protein may contribute to adaptation of field virus to chick embryonic tissue and subsequent attenuation of the virus.
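The charge and polarity comparisons made for the HVR substitutions can be expressed as a small lookup. This is a rough sketch that assumes a simplified textbook classification of side chains (His is counted as positively charged, as in the text above); the set and function names are hypothetical, and one-letter residue codes are used.

```python
# Simplified textbook side-chain classes (an assumption, not from the study)
POSITIVE = set("RKH")
NEGATIVE = set("DE")
NONPOLAR = set("AVLIPFMWG")


def charge_class(residue):
    """Return 'positive', 'negative', or 'neutral' for a one-letter residue."""
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    return "neutral"


def polarity_class(residue):
    """Return 'nonpolar' or 'polar' for a one-letter residue."""
    return "nonpolar" if residue in NONPOLAR else "polar"


def describe_substitution(ref, alt):
    """Summarize how charge and polarity change from ref to alt."""
    return (
        f"{ref}->{alt}: {charge_class(ref)} {polarity_class(ref)} to "
        f"{charge_class(alt)} {polarity_class(alt)}"
    )


# The three HVR substitutions named in the text: His->Tyr, Ser->Pro, Arg->Ile
for ref, alt in [("H", "Y"), ("S", "P"), ("R", "I")]:
    print(describe_substitution(ref, alt))
```

A classification like this only summarizes side-chain chemistry; it does not by itself predict the effect of a substitution on receptor binding or membrane fusion.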

There was one amino acid difference between the attenuated and the virulent strain in the 5b protein. The role of gene 5 in pathogenesis and replication is not clear, and it is considered non-essential for replication of the virus [5]. Therefore, this amino acid difference in the 5b protein could be regarded as non-significant for viral attenuation. There was a notable nucleotide substitution in the N protein gene, and one in the 3′UTR. In the attenuated strain, at nucleotide position 27,101 in the N gene, G was changed to A, and at the same time, at nucleotide position 21,580 in the 3′UTR, C was changed to T. These paired nucleotide changes seem to be significant, because earlier studies demonstrated that the N protein binds very strongly to the extreme 3′ end of the UTR [29]. The binding of the N protein to the 3′UTR is essential for synthesis of negative-strand viral RNA. It has been shown that the N protein interacts with the 3′UTR, but the sequence-specific interaction between the N gene and the 3′UTR is not clear [29]. The nucleotide substitutions found in N and the 3′UTR suggest that they may have an impact on viral replication and thereby on viral pathogenesis.

The role of the replicase gene of IBV in pathogenicity is not well understood. However, the amino acid changes in the ORF1a/1b proteins give an insight into putative residues that may be involved in the adaptation to chick embryonic tissue and subsequent attenuation of the virus. Although Ark DPI 11 and Ark DPI 101 are 99.92% identical in their nucleotide sequences, the pathogenicity of these viruses is entirely different. The spike protein is the major determinant of cell tropism in IBV, and the majority of nucleotide differences observed in the S1 gene in this study support and extend earlier observations [4]. The substitutions in the replicase proteins should be considered potentially critical for their role in replication, and thus for the pathogenicity of the virus. Even though only structural genes of IBV are known to affect pathogenicity [4, 10], this study also suggests the involvement of the replicase gene.
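The 99.92% figure follows directly from the numbers reported above: 21 nucleotide differences across a genome of 27,620 nt. The short sketch below reproduces that arithmetic, assuming that all differences are single-nucleotide substitutions (both genomes have the same length); the function and variable names are illustrative.

```python
GENOME_LENGTH_NT = 27_620   # genome length excluding the poly(A) tail
NUCLEOTIDE_DIFFERENCES = 21  # differences between Ark DPI 11 and Ark DPI 101


def percent_identity(length, differences):
    """Percentage of aligned positions that are identical."""
    return 100.0 * (length - differences) / length


identity = percent_identity(GENOME_LENGTH_NT, NUCLEOTIDE_DIFFERENCES)
print(f"{identity:.2f}%")  # 99.92%
```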