Infectious bronchitis (IB) is a viral disease of chickens that causes high morbidity and economic losses in the poultry industry worldwide [1, 10]. The etiological agent is Avian coronavirus [8], also known as avian infectious bronchitis virus (IBV), a member of the genus Gammacoronavirus, family Coronaviridae, order Nidovirales [2, 8, 13]. The virion has a lipid envelope, and the genome is a positive-sense linear RNA of approximately 27.6 kb [1, 2, 10]. The most common genome organization is 5′-UTR-Pol-S-3a-3b-E-M-5a-5b-N-3′-UTR [24, 35]. The first gene encodes proteins involved in replication and transcription and has two open reading frames (ORFs), 1a and 1b [1, 4, 13, 34], which are translated into polyproteins 1a and 1ab owing to a shift in the reading frame [1, 4, 34]. The last third of the genome contains the structural genes for the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, along with additional ORFs that encode the accessory proteins 3a, 3b, 5a and 5b [24, 28, 34].

Different strains of IBV have been described around the world including Massachusetts, Beaudette, Holte, 4/91, Arkansas, Connecticut, D274, and QX-Like, among others [2, 5, 12]. The classification of IBV genotypes is based on the S1 sequence of the spike gene, and six genotypes which comprise 32 lineages have been described [1, 34]. Mass-like, Ark-like and Penn-like variants, plus one unassigned genotype designated as IBV-CR-53 [5, 23], have been reported in Costa Rica since 1990 [17, 23].

From May 2016 until mid-2017, an IBV outbreak affected farms throughout Costa Rica. Affected birds exhibited mild respiratory infections and mortality; nevertheless, great economic losses due to carcass condemnation were reported [30]. The IBV isolate associated with this outbreak was classified as a Georgia 13-like type (GA13-like) based on its S1 gene sequence [30].

We isolated the virus associated with this outbreak in 9-to-11-day-old embryonated specific-pathogen-free (SPF) eggs [2]. For RNA isolation, allantoic fluid was collected and purified through a 30% sucrose cushion by ultracentrifugation at 27 800 rpm for 4 h [19]. RNA was extracted from the pellet using TRIzol (Ambion® CA, ES) in accordance with the manufacturer's instructions [35] and converted to double-stranded DNA using random hexamers, Superscript III and Klenow enzyme. The NGS library was prepared with Nextera™ XT (Illumina®) reagents. RNA and dsDNA quantification and quality control were conducted using a Quantus Fluorometer® (Promega) and a QIAxcel® (QIAGEN®). The library was sequenced on an Illumina MiSeq using a paired-end (2 × 250 bp) protocol. The quality of the sequencing run was assessed with Sequence Analysis Viewer (SAV) (Illumina®). Short-read quality was analyzed using FastQC, and quality trimming was conducted using Trimmomatic [7]. De novo assembly was conducted using SPAdes [6], and the longest contig was compared with viral sequences at NCBI using BLAST. Bowtie2 was used to align all short reads to the draft genome, and Artemis [9] was used to visualize the alignment. Automatic annotation was done using Prokka [31], followed by manual curation using information from the ViPR database. Each CDS and genome feature was compared to existing sequences using BLASTP (nr database).
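The step in which the longest contig is taken from the de novo assembly can be sketched as follows. This is a minimal illustration, not the authors' script: the helper names `read_fasta` and `longest_contig` and the toy contigs are hypothetical, and a real run would read the FASTA file written by the assembler instead of an in-memory string.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contig(handle: TextIO) -> Tuple[str, str]:
    """Return the (header, sequence) pair with the longest sequence."""
    return max(read_fasta(handle), key=lambda record: len(record[1]))


# Toy assembly output (hypothetical contigs, not data from this study).
toy_assembly = StringIO(
    ">contig_1\nACGTACGT\nACGT\n>contig_2\nACG\n>contig_3\nACGTAC\n"
)
name, seq = longest_contig(toy_assembly)
print(name, len(seq))  # contig_1 12
```

The selected sequence would then be the query for the BLAST comparison against viral sequences.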

Possible recombination events were evaluated using the recombination detection program RDP4 v.4.95 [1, 2, 33, 35], and only events detected by at least five of the seven available methods were considered [35]. The phylogenetic analysis was conducted using sequences of the S1 gene region and the IBV whole-genome sequences available in the GenBank database. The sequences were aligned using the MAFFT algorithm available on the Guidance2 server [18, 32], and PartitionFinder2 was used to determine the best substitution model [20, 21]. Phylogenetic trees were constructed by Bayesian inference with MrBayes 3.2.6 [15]. Phylogenetic analyses were performed on the CIPRES Science Gateway cluster [26].
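The acceptance criterion for recombination events (support from at least five of seven methods) amounts to a simple filter. The sketch below assumes that the seven methods are RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq, which the text does not list explicitly; the `Event` class, the `well_supported` helper and the toy coordinates are hypothetical and do not reproduce RDP4 output.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Assumption: the seven detection methods referred to in the text.
RDP4_METHODS = frozenset(
    {"RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan", "3Seq"}
)


@dataclass(frozen=True)
class Event:
    """A candidate recombination event and the methods that detected it."""
    start: int
    end: int
    supporting_methods: FrozenSet[str]


def well_supported(events: List[Event], min_methods: int = 5) -> List[Event]:
    """Keep events detected by at least `min_methods` of the seven methods."""
    kept = []
    for event in events:
        unknown = event.supporting_methods - RDP4_METHODS
        if unknown:
            raise ValueError(f"unknown method(s): {sorted(unknown)}")
        if len(event.supporting_methods) >= min_methods:
            kept.append(event)
    return kept


# Toy candidates (hypothetical coordinates, not results from this study).
candidates = [
    Event(100, 900, frozenset({"RDP", "GENECONV", "BootScan",
                               "MaxChi", "Chimaera", "SiScan"})),
    Event(1500, 2100, frozenset({"RDP", "MaxChi", "3Seq"})),
]
kept = well_supported(candidates)
print([(e.start, e.end) for e in kept])  # [(100, 900)]
```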

The whole-genome sequence of isolate CK/CR/1160/16 was deposited in the GenBank database under accession number MN757859, and the raw data were deposited in the SRA under accession number SRR10547950, BioProject number PRJNA592262, and BioSample number SAMN13419001. The complete sequence was 27 696 bp long, consistent with previously reported lengths [2, 29, 35], and comprised thirteen ORFs across nine genes, two UTRs, and a noncoding region between genes N and 6b (Table 1). The genome organization of isolate CK/CR/1160/16 was 5′-UTR-Pol-S-3a-3b-E-4b-4c-M-5a-5b-N-6b-3′-UTR, which differs from the classic IBV gene organization [24, 35] but has been reported previously [1, 27, 33].
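The difference between the two genome organizations can be made explicit with a short comparison. The sketch below uses only the two organization strings given in the text (written with ASCII apostrophes instead of prime symbols); the helper name `features` is hypothetical.

```python
# Genome organizations as given in the text (ASCII ' used instead of the prime).
CLASSIC = "5'-UTR-Pol-S-3a-3b-E-M-5a-5b-N-3'-UTR"
ISOLATE = "5'-UTR-Pol-S-3a-3b-E-4b-4c-M-5a-5b-N-6b-3'-UTR"


def features(organization: str) -> list:
    """Split an organization string into ordered features, dropping UTR labels."""
    return [t for t in organization.split("-") if t not in {"5'", "3'", "UTR"}]


classic = features(CLASSIC)
isolate = features(ISOLATE)

# ORFs present in the isolate but absent from the classic organization.
extra = [f for f in isolate if f not in classic]
# Features shared with the classic organization, in the isolate's order.
shared_in_order = [f for f in isolate if f in classic]

print(extra)                       # ['4b', '4c', '6b']
print(shared_in_order == classic)  # True
```

The comparison shows that the isolate keeps the classic gene order and differs only by the additional ORFs 4b, 4c and 6b.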

Table 1 Viral genes, genome location, deduced protein length and nucleotide sequence identity (%) of the CK/CR/1160/16 isolate (GenBank accession MN757859), compared to full-length sequences of representative IBV strains

The phylogenetic analysis of the S1 region shows that isolate CK/CR/1160/16 clusters with sequences belonging to genotype I, lineage 17 (GI-17), which includes variants from California and Pennsylvania isolated in the 1990s [34] (Fig. 1a). Moreover, the Costa Rican isolate CK/CR/1160/16 is closely related to isolate Ga-13/14255/14 (KM087780), with a bootstrap value of 100 (Fig. 1a). Sequences from other regions of the Americas form two different clusters: GI-11 (unique to South America, including sequences from Argentina and Brazil) and GI-16 (reported in Asia and Italy and including sequences from Argentina and Chile) [34] (Fig. 1a). The phylogeny obtained using the whole-genome sequence shows that isolate CK/CR/1160/16 clusters with isolates from the USA, specifically with CAV/CAV56b/91 (GU393331) and Cal99/NE15172/95 (FJ904714) from California [25, 33] and with DMV/1639/GA9977/2019 (MK878536) from Georgia [14] (Fig. 1b). These sequences correspond to the GI-17 classification, which is based on the S1 region.

Fig. 1 Consensus phylogenetic trees constructed using (a) the S1 gene and (b) the full-length genome. Both trees were inferred by Bayesian inference with four MCMC runs and 10,000,000 generations, and were corroborated by the maximum-likelihood method using MEGA X. The CK/CR/1160/16 isolate (GenBank accession MN757859) is shown in boldface

Comparison of isolate CK/CR/1160/16 to other whole-genome sequences (Table 1) indicates that this isolate has the highest nucleotide sequence identity (94.03%) with DMV/1639/GA9977/2019, and the identities for every gene in this comparison were higher than 88%. The IBV sequence with the lowest identity (86.03%) was ck/CH/LHLJ/08-6. The S1 gene region is highly variable, with nucleotide sequence identities ranging from 58.3% to 88.48% among the different IBV serotypes, due to a high mutation frequency as well as recombination events [2, 3, 11, 29]. Against this background, it is notable that the isolate in this study exhibits a very high sequence identity in the S1 gene (96.89%) with the Georgia 13 genotype (Ga-13/14255/14). Finally, ORF 6b shows the lowest sequence identity among all the coding regions in the genomes analyzed in this study.
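Identity values of this kind are typically computed from a pairwise alignment. The sketch below shows one common definition, the percentage of matching bases over aligned columns in which neither sequence has a gap. The handling of gaps is an assumption, since the text does not state how the identities in Table 1 were calculated, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns are excluded from the denominator; other
    definitions (e.g. counting gaps as mismatches) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned pair (hypothetical, not IBV data): 8 matches in 9 ungapped columns.
identity = percent_identity("ATGCTGAC-A", "ATGTTGACGA")
print(f"{identity:.2f}%")
```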

Two possible recombination events were detected, each supported by at least six of the seven methods in RDP4. The first event spans nucleotides 20,410 to 23,695, within the S gene (Fig. 2a). In this case, the minor parent was inferred to be a Massachusetts strain (FJ904722), and the putative major parent a Connecticut type (KF696629). The second recombination event spans nucleotides 24,291 to 25,518 and comprises part of the E, M, 4b and 4c genes (Fig. 2b). Its major parent belongs to an Arkansas type (EU418976), and its minor parent to Ma5 (KY6226045). Recombination hotspots in the S region of the genome, which are associated with the appearance of new virus variants, have been described previously [22, 33]. Recombination events in genes 3 and M have also been detected previously [22, 33].

Fig. 2 RDP4 recombination plot and confirmation table of the recombination events detected in isolate CK/CR/1160/16. Putative recombination regions are shaded in red. (a) Breakpoints 20,410–23,695. The minor parent was inferred as Mass41.V and the putative major parent was determined as Conn.V. (b) Breakpoints 24,291–25,518. The major parent was inferred as ArkDPI11 and the putative minor parent was determined as Ma5.V

Our results show that the 2016 outbreak of IBV in Costa Rica was caused by a virus that belongs to the GI-17 group, which includes strains native to the United States. More specifically, we confirmed that the outbreak was caused by a Ga-13/14255/14 strain similar to the one that circulated in the United States during 2016 [16, 30]. Our whole-genome analyses provide the first evidence that isolate CK/CR/1160/16 may be the result of recombination among at least four different variants (Mass, Connecticut, Arkansas and Ma5). The detection of recombination events supports the need to maintain epidemiological surveillance, monitor the variants present in Latin America and optimize vaccination schemes, as outbreaks usually originate from variants not covered by vaccine serotypes [35].

The raw data for isolate CK/CR/1160/16 of the GA13-like strain have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRR10547950, BioProject number PRJNA592262, and BioSample number SAMN13419001. The whole-genome sequence has been deposited in the GenBank database under accession number MN757859.