Background

Poultry accounts for almost half of all meat consumed in the UK, with 875 million chickens, 17 million turkeys, 16 million ducks and 250,000 geese a year supplied by over 2500 poultry farms [1]. Their production can be adversely affected by infection with avian specific viruses such as infectious bursal disease virus (IBDV), infectious bronchitis virus (IBV) and Newcastle virus (NDV) [2,3,4,5,6,7]. Poultry can also serve as the source of zoonotic, or potentially zoonotic, infections with viruses such as H5N1 and H7N9, transmitted to humans through contact with poultry. To reduce the threat to the global food supply and to minimize the risk of zoonotic events, there is an ongoing need to better understand the biology of avian viral infections, the mechanism of natural resistance (viral intrinsic and innate immunity) and the characterization of the biological factors that might be involved.

Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely involved in restricting entry into cells of a broad range of viruses including Influenza viruses, Ebola and Zika [8,9,10,11,12,13,14]. In chickens, four IFITM genes have been annotated to date by the chicken gene nomenclature consortium (CGNC), namely chIFITM1 (LOC422993-3-like), chIFITM3 (LOC770612-1-like), chIFITM5, and chIFITM10 [15,16,17,18,19,20,21,22]. Although not yet annotated by the CGNC, we have previously shown the existence of chIFITM2 (putative LOC107053353-dispanin-2b-like) and suggested a hypothetical genetic structure of the locus based on the human syntenic genome region [23].

As IFN-stimulated genes (ISGs), IFITM abundance within a cell increases following activation of the type 1 IFN signaling pathway in response to the detection of pathogen associated molecular patterns (PAMPs) such as viral nucleic acid in the cytoplasm of the infected cell. In addition, binding of the IFNα/β to their cell surface receptors induces translocation of the transcription factor complex IFN-stimulated gene factor 3 (ISGF3) into the nucleus [24]. This induces the transcription of several ISGs, among which are the IFITM genes. The IFITM proteins target the final stages of viral entry by preventing fusion of the viral and cellular membranes [25]. This mechanism also reflects the localization of the human IFITM2 and IFITM3 which are found predominantly in intracellular membrane compartments such as late endosomes and lysosomes [21]. It is suggested that the membrane-defined site of fusion, namely plasma membrane and endosomes, is critical for the antiviral activity of these proteins [15].

While genetics and cell biology of the human IFITMs has been extensively characterized, lack of an accurate and complete reference genome sequence has hampered progress in characterizing the locus in diverse vertebrates including avian IFITMs. The genetic structure of the chicken locus was proposed based on the human locus however critical differences suggest that the current chIFITM nomenclature might be incorrect [23]. Indeed, the relative intracellular localizations of chIFITM1 and 2 as defined by genome synteny are the opposite of their human counterpart. This prompted Smith at al. to suggest an inversion might have occurred within the locus [23]. Subsequently, it was shown that duck IFITM1 localizes on the plasma membrane, like human IFITM1, highlighting further classification difficulty in avian IFITMs [16]. In addition, in the effort to explain the conserved antiviral activities of the different human IFITMs genes, Compton et al. have recently suggested that these differences, which reflect their localization and abundance in a cell, are a sign of a duplication and mutational events of the IFITM genes that arose millions of years [26]. Although their studies focused in the evolution of the IFITM genes in various non-human primates, it underlines the necessity to consider how avian IFITM genes should be considered as their nomenclature does not reflect necessarily their human orthologues. In this scenario, while ancestral IFITM3 is clearly syntenic with hIFITM3, more studies are required to elucidate the relationship between the other two IFITM proteins.

The most recent version of the chicken genome (v5) has incorporated long PacBio sequencing reads. This new sequencing has improved the chicken genome, including the IFITM locus. However, when sequencing an entire genome and performing whole genome assembly, minor assembly errors can occur, often due to lack of coverage or because paralogous sequences at other loci compromise accurate assembly. The IFITM gene family is one of the most paralogous families known with multiple copies of both IFITM genes and pseudogenes. For this reason, we sequenced just a small region of chromosome 5 containing the IFITM locus at high coverage with PacBio and with Illumina MiSeq.

The average PacBio read length is >10 kb, depending only on the activity of the polymerase [27,28,29] and although PacBio raw reads have a higher error rate compared to other technologies (14% versus 0.1 to 1% for Illumina), high quality consensus sequence can be obtained from overlapping reads. To complement the new Gallus gallus reference we have focused solely on the chIFITM locus and better elucidated its genetic structure by sequencing a bacterial artificial chromosome (BAC), from the BAC library used to generate the original Gallus gallus genome. The 203Kb-long BAC (CH261-109H20 [30]), containing the chIFITM locus, does not include chIFITM10. Given that the current literature focuses mainly on the antiviral activity of chIFITM1, 2, 3 and 5, we present evidence of the high confidence, high coverage sequence of this locus and the expression of these 4 genes by mapping of publicly-available RNA-seq data, to define each of the chIFITM proteins at the transcriptional level. Further, we describe the design and use of hybrid capture (SureSelect) probes and their use in genome capture and sequencing of other Galliform IFITM loci.

Results

De novo assembly of PacBio and MiSeq sequencing reads

In order to obtain a consensus reference sequence from the raw sequencing data, PacBio reads derived from the BAC clone sequencing were quality filtered and de novo assembled with HGAP using the protocols available on the SMRT portal (Additional file 1). Summaries of assembly and mapping statistics for PacBio (and also Illumina, see below) reads are shown in Table 1. Because of the length of the PacBio reads, the PacBio de novo assembled consisted of 6 assembled fragments (compared to 13 with Illumina). Of these, one contig (number 2, Table 2) contained the chicken genome sequence; the others represented genomic sequences from the E.coli BAC vector (Additional file 2). Contig 2, containing chicken sequences, had the highest base coverage and its length suggested it represented the full-length BAC clone. Therefore, to confirm the identity of this de novo assembled fragment, we utilized ACT and sequence similarly plots to compare contig 2 with chromosome 5 reference sequence from both Gallus gallus v4 and Gallus gallus v5 (Fig. 1a-c). Contig 2 contained the full chIFITM locus and highlights the substantial deficiencies to the Gallus gallus v4 genome assembly (Fig. 1a and b). This contrasts with Gallus gallus v5 genome assembly where fewer large gaps are observed, but with the presence of a small INDEL (Fig. 1c-d). Inspection of the similarity plot shows these differences observed at the nucleotide level fall in the genomic region of the chIFITM3 gene, within the intronic region (Fig. 1c, bottom Dot Plot and sequence alignment in 1D). To further analyse this reagion we have also screened the full locus for repeats and low complexity DNA sequences as shown in Additional file 3. We attempted de novo assembly of Illumina MiSeq paired-end reads using three software packages (namely IVA, SGA and HGAP) resulting in only partial consensus sequence covering between 50 and 70% of the full chIFITM locus (including the flanking genes ATHL1 and B4GALNT4) (data not shown). The best assembly was generated using IVA, which produced the least number of contigs (13). In order to identify Illumina contigs that contained the BAC, and specifically the chIFITM locus, sequence similarity was used to compare the Illumina MiSeq contigs with the PacBio contig 2 (Additional file 4). All of the Illumina MiSeq contigs covered either portions of the PacBio contig 2, or just the chIFITM locus. These results suggest that while the longer PacBio reads map well to the reference genome (Additional file 5), Illumina MiSeq raw reads on their own are not be sufficient to assemble this region de novo, although they do map accurately to the de novo PacBio reference.

Table 1 PacBio and Illumina MiSeq de novo assembly and mapping statistics
Table 2 Basic statistic of de novo assembled contigs from PacBio reads
Fig. 1
figure 1

Locus comparison between PacBio consensus sequence (contig 2) and a portion of chromosome 5 of the two versions of the chicken genome. a: The 203 kb BAC reference sequence contained in the PacBio contig 2 (in the middle) is compared with chromosome 5 of Gallus gallus v4 (top) or v5 (bottom) using ACT, Artemis Comparison Tool. The annotation files for Gallus gallus v4 and PacBio contig 2 have been compressed to allow visualization of the whole BAC; for Gallus gallus v5 it was drawn manually only to visualize location of the locus. b: The chIFITM locus (circled in A) is enlarged in B to show only the chIFITM locus including the flanking genes (this is a 40 kb region extracted from the 203Kb total). Gaps are visible in Gallus gallus v4 represented by white bars (N nucleotides), while these are absent in the comparison with the more complete Gallus gallus v5. The graph does not show differences at the nucleotide level, but only an overall view of the locus. c: Dot Plot comparison graphs of the assembled PacBio contig 2 versus Gallus gallus v5 showing differences not visible when using ACT for the 40Kb region. The region enlarged in the right Dot Plot shows a stretch of the genomic region within the intronic region of the chIFITM3 gene which shows differences with chicken genome assembly v5. d Clustal Omega alignment of the PacBio contig 2 consensus sequences and the chicken genome v5 (portion of the IFITM3 gene corresponding to the gap seen in 2C). In yellow is highlighted the gap

Organization of the chIFITM locus in PacBio contig 2 and Gallus gallus v4 and v5 reference sequences

To study in more detail the gene order of the v4 and v5 assembled locus relative to our assembly we used Artemis. Concentrating on the chIFITM genes, we show that combined reads from both sequencing technologies mapped well to v4 or v5 assemblies, covering the locus to significant depths and aligning to all the regions of interest (Fig. 2 and Additional file 5A-D). The deep and accurate sequence of the chIFITM locus allows us to be confident that the chIFITM1 and 2 genes as named and annotated in the v5 genome are indeed inverted in comparison to the human locus with chIFITM1, 2, 3 and 5 genes having their transcriptional units in the same direction (Table 3) [23].

Fig. 2
figure 2

Artemis coverage and stack view of Illumina MiSeq reads mapped against PacBio consensus sequence (contig 2). a Overall coverage and GC content of the Illumina MiSeq BAC reads (203 kb region) mapped against the PacBio contig 2. This reference was built using the annotation of Gallus gallus v4 as scaffold. The chIFITM genes are located between 138150 and 177724 in the 203Kb region. b stack view of the Illumina MiSeq reads showing the chIFITM locus

Table 3 Coordinates of the chIFITM genes within the PacBio consensus sequence (contig 2)

SureSelect probes design and pull down of the IFITM locus from turkey breast tissue and DF1 cells

The consensus sequence we have generated was used to design Agilent SureSelect probes covering the 40 kb region encompassing the IFITM locus. Our primary purpose is to use these probes to study possible IFITM variants in different chicken breeds and further into the phylogeny of Galliformes. We were able to successfully pull down the IFITM locus in DF1 cells (chicken embryonic fibroblasts) as well as turkey breast tissue (Fig. 3), showing we are able to use chicken (Phasianinae, sub-family of Galliformes) IFITM probes to pulldown and sequence the locus in a different Galliform sub-family, namely the Meleagridinae, to which the turkey belongs. The BAC clone, like Gallus gallus v5 of the chicken genome, is from a Red Jungle fowl, inbred line UCD001 (Inbred 256, female) while the DF1 cells are derived from a White Leghorn (East Lansing line-0, 10-day old eggs). Mapping of PacBio reads from DF1 cells against either v5 of the chicken genome sequence or our PacBio contig 2 gives a good coverage but with low coverage gaps detected in IFITM3 and B4GALNT4 (Fig. 3a-b). The IFITM3 gap was closable with the low frequency PacBio reads and the PacBio contig 2 reference, yielding an accurate IFITM locus sequence for DF1 cells. Illumina sequencing of the turkey IFITM locus assembles more poorly to the turkey reference genome (Fig. 3c), suggesting the current turkey genome is in need of improvement with long read PacBio sequences as achieved for the chicken genome. We were however, able to identify all four IFITM genes in the turkey locus. We constructed multiple sequence alignments for the two chicken and turkey genome IFITMs (Fig. 4). Amino acid sequence alignment of the IFITM proteins of DF1, turkey and Gallus gallus v5 shows substantial differences as we can see from Fig. 4. For the known antiviral IFITMs one amino acid change was found between Red Jungle fowl and White Leghorn, namely A63V in the CIL domain of IFITM2. More amino acid substitutions were seen for Turkey compared to chicken IFITMs with 13, 13 and 11 differences between IFITM1, 2 and 3 respectively. Variation in one of the chicken IFITMs is maintained in the turkey gene, namely amino acid 63 A (Red Jungle Fowl) or V (White Leghorn) and 63 V (Turkey) in IFITM2.

Fig. 3
figure 3

a Artemis coverage and stack view of the IFITM locus in DF1 cells following pull down of the IFITM locus using SureSelect probes and sequencing with PacBio. The figure shows an intact locus and successful mapping of the IFITM locus against the Gallus gallus sequence reference, despite two gaps observed within the B4GALNT4 and IFITM3 genes. b Artemis coverage and stack view of the IFITM locus in DF1 cells following pull down of the IFITM locus using SureSelect probes and sequencing with PacBio. These reads were instead mapped against the new PacBio contig 2 sequence reference. As for the mapping above, two gaps (one partial) are observed within the B4GALNT4 and IFITM3 genes, although more reads cross the gaps, allowing full coverage. c Artemis coverage and stack view of the IFITM locus in turkey breast tissue following pull down of the IFITM locus using SureSelect probes and sequencing with Illumina MiSeq. The graph shows successful mapping of MiSeq reads despite using chicken probes to pull down the locus in turkey tissue. The white bars represent actual gaps in the turkey reference as published on both Ensemble and NCBI and to which the probes will not eventually map as gaps are shown in the reference as “NNN”

Fig. 4
figure 4

Clustal Omega alignment of the amino acid sequence of the IFITM proteins derived from the consensus sequence of DF1 and turkey samples following targeted SureSelect pulldown. The amino acid sequences are compared to the Gallus gallus v5 sequences. Domain structures are represented as: IM1 and IM2, intramembrane domain 1 and 2, CIL, conserved intracellular loop. These have yet to be defined for chIFITM5

Mapping RNA-seq data to the PacBio contig 2 reference containing the chIFITM locus

The generation of a high quality de novo assembly of the IFITM locus sequence allows accurate mapping of RNA-sequence data from previous published studies for qualitative and quantitative analysis. To validate which chIFITM transcripts were expressed, and to assess their level of expression, we first used RNA-seq reads from 293 T cells, engineered to express only chicken IFITM proteins constitutively. Reads from the control cells (wild type 293 T) do not map to the chIFITM locus (Table 4). Focusing on the 40 kb region containing the chIFITM locus, including the flanking genes ATHL1 and B4GALNT4, we observed RNA-seq reads from 293 T cell lines stably expressing chIFITM1, 2, or 3 with expected peaks of expression at gene exon locations (Additional file 6). The number of mapped reads and by implication the expression level for chIFITM3 was higher than that of chIFITM2 and in turn both higher than that of chIFITM1 (Additional file 6). We analysed 26 RNA-seq studies totaling 293 sequenced chicken tissues and avian cell lines that were identified in the ENA database. The samples were examined for constitutive expression levels of the chIFITMs in a subset of each study covering at least one immune relevant tissue type (Table 5). To analyze constitutive expression, RNA-seq data from liver, spleen, lung and trachea samples taken from the studies as listed in Table 6, were mapped against the PacBio contig 2. To these, we added expression data from commonly used laboratory cell lines (DF1, CEF, HD11, DT40).

Table 4 chIFITM transcripts average coverage values in the stable cell lines
Table 5 Expression levels of the IFITM transcripts calculated as RPKM in the different RNA-seq studies deposited in the European Nucleotide Archive (ENA) database
Table 6 Tissue types, experimental conditions and species considered in the different RNA-seq studies

ChIFITM3 is constitutively expressed (both exons) in all tissues and cell lines analysed at levels higher than the putative chIFITM1 and chIFITM2. Indeed, putative chIFITM1 is barely detectable in most of the tissues, and much lower compared to the other IFITM transcripts, as also shown from the RPKM values in Table 5. Further, when infected or subject to cellular stress chIFITM2 and chIFITM3 are abundantly expressed, again with little IFITM1 expression. Indeed, it is not possible to detect convincing levels of IFITM1 expression at any time except for Caecal tissue and Ileum tissue infected by influenza A H5N2 or H5N11 (Figs. 5 and 6, Additional file 7 and Table 5). In addition, the coverage graphs confirm that the typical genetic structure of the chIFITM genes is maintained, with two exons separated by a single intron in all cases, although reads were observed to map beyond the boundaries of the annotated genes particularly in the stretch of genomic region between IFITM2 and 5 (Figs. 5 and 6).

Fig. 5
figure 5

The read alignment views in Artemis showing RNA-Seq data from the different studies. Top panel: the ‘coverage view” showing a separate plot for each BAM mapped to our PacBio contig 2 (40Kb region). The coverage shows only data relative to constitutive expression level of chIFITMs in immune-relevant tissues and cell lines (lung, trachea, spleen, liver, DF1, CEF, HD11, DT40). Bottom panel: the “stack view” (paired reads: blue, single reads and/or reads with an unmapped pair: black; reads spanning the same region: green) to show in more detail read depth across each chIFITM transcript. All the features were annotated manually blasting the sequences from the latest version of the chicken genome. Cyan: CDS region, grey: mRNA, white: gene (overlapping with mRNA features)

Fig. 6
figure 6

RNA-seq data alignment of reads from the immune relevant tissues and cell lines in treated conditions: infection with IBDV, ALV-J, ILVV, LPS, H5N5/H5N1 or heat stress-induced conditions. The graph shows that also in these conditions, levels of chIFITM are lower compared to chIFITM2 and chIFITM3. Top panel, overall coverage. Bottom panel stack view of each chIFITM transcript

Discussion

In this study we have sequenced a BAC clone containing the complete chIFITM locus using both PacBio and Illumina MiSeq sequencing technologies producing an accurate assembly of the locus. We analysed expression levels of the chIFITM genes using publicly available RNA-seq data from different chicken lines and tissues, and produced hybrid capture probes for ‘pull-down’ sequencing of another chicken line and the more distant turkey IFITM locus.

The chIFITM locus showed several gaps in the version 4 of the chicken genome release (Gallus gallus 4). It had been improved by sequencing the same DNA reference source (Female Red Jungle Fowl, UCD001 inbred line) with PacBio technology. Comparison of the two public versions of the chIFITM locus with the one generated in our study (PacBio contig 2) still demonstrated differences, despite being the same inbred line. We believe these discrepancies in the public genome assemblies might be a consequence of genome wide assembly required for full chicken genome, suggesting that our BAC sequence (203 kb) is likely to be more accurate, particularly in GC-rich regions. In addition, quality control analysis and type of assembler used will influence the final consensus sequence generated for any region of the chicken genome, leading to the differences observed in the sequences. To produce our sequence, we employed both PacBio RSII and Illumina MiSeq technologies because they have complementary properties that met our requirements for covering gaps and maintaining sequence integrity. Sequencing within Gallus gallus domesticus lines, more outbred chickens and more divergent Galliforms is now possible using hybrid capture genome sequencing. Indeed, we have been able to document many amino acid sequence changes between chickens and turkeys in the antiviral IFITMs in regions of the proteins known to be important for their antiviral activity (Fig. 4).

The importance of obtaining an accurate sequence is vital to understand the genetic structure and confirm the identity of the IFITM locus, thus to correctly annotate the genes. Hypothetical structures of the chIFITM locus have been suggested, based on the human locus but inconsistencies remain between alignments for the putative chIFITM1 and chIFITM2 [16, 17, 23]. Based on the literature and current annotation the four genes are clustered on chromosome 5 which also contains the chIFITM10 gene (the function of which remains to be elucidated). Following the discovery of chIFITM2, Smith at al. [23] proposed an organizational structure for the locus, based on features such as membrane localization and lack of an N-terminal extension (both characteristic of the IFITM2 and IFITM3 proteins), suggesting that chIFITM2 is actually analogous to human IFITM1 [23]. Our immunofluorescence analysis to study localization of the chicken proteins expressed in human (293 T) stable cell lines is in agreement with Smith et al. (data not shown, Bassano et al. in preparation) [23]. Indeed, chIFITM2 is membrane-bound, while chIFITM1 localizes to the early endosomes. Here our RNA-seq analysis of the ENA dataset shows that chIFITM1 basal expression levels are very low compared to chIFITM2 and chIFITM3. The analysis of the samples in presence of IFN\( \boldsymbol{\upalpha} \), H5N2, H5N1, H5N3, IBDV, IRF7, ALV, Lipopolysaccharide or in heat-stress induced conditions, also shows that higher expression levels can be observed for chIFITM3 and chIFITM2 suggesting a key role for these two proteins as antiviral IFITMs compared to chIFITM1, expression of which is only in the intestinal tract and in the testis. Although immunofluorescence staining seems to suggest that chIFITM2 is analogous to hIFITM1 (they are both plasma membrane-bound) the genome organisation supported by long read PacBio sequences now unambiguously confirms that the chIFITM2 and chIFITM1 locus is inverted compared to the human locus. We therefore, propose based on gene expression, genome architecture and published functional data the gene order in the chicken locus on chromosome 5q should be renamed: centromeric – B4GALNT4 – chIFITM3 – chIFITM2 – chIFITM1 – chIFITM5 – ATHL1 – telomeric.

Conclusions

In this report we have produced an updated genomic map of chIFITM locus that includes the two flanking genes ATHL1 and B4GALNT4, by combining and analyzing sequencing data derived from PacBio RS II and Illumina MiSeq sequencing technologies. The only difference detected in our assembled locus sequence relative to the Gallus Gallus (v5) is a 5 bp insertion in the intronic region of chIFITM3. This change in sequence may not have any influence on the function and expression of the chIFITM3 gene. However, RNA-seq analysis shows expression of all IFITMs from this locus but that chIFITM1 has different patterns of expression from the other antiviral IFITMs. Initial analysis of different chicken breeds shows IFITM amino acid variation between different chicken breeds and turkeys.

Methods

Bacterial Artificial Chromosome (BAC) construct recovery

The BAC clone (CHORI-261) from Red Jungle Fowl strain UCD001 covering the predicted IFITM locus was purchased from BACPAC Resources Centre. The BAC clone, delivered as a stab culture was streaked directly on Luria Broth (LB) agar (chloramphenicol 12.5 μg/mL) to isolate single colonies and incubated overnight at the designated growth temperature. Single colonies were picked and cultured in LB media. Plasmid DNA was then extracted and purified according to Qiagen Plasmid DNA kit manufacturer’s protocol.

Sequencing, assembly and alignment

A total of 3 μg isolated plasmid DNA was sequenced across two platforms, the Illumina MiSeq and PacBio RSII. Library preparation and quality control was undertaken by The Wellcome Trust Sanger Institute’s core sequencing facility. Assembly of PacBio sequencing reads was performed using protocols available on the SMRT® Portal. Briefly, sequencing fragments were first filtered to remove reads that did not meet read quality and length thresholds, then de novo assembled using HGAP [31]. Errors in the re-circularization of the BAC as well as sequence consensus generation for the DF1 cell line were corrected using iCORN v2, Interative Correction of Reference Nucleotides [32]. MiSeq reads were first analysed for low quality reads with FastQC [33] and low quality reads were trimmed using Trimmomatic [34]. De novo assembly of MiSeq reads was attempted using IVA [35], SGA [36] and HGAP from the SMRT® Analysis package [31]. SMALT, (http://www.sanger.ac.uk/science/tools/smalt-0) a pairwise sequence alignment program was used to map MiSeq reads onto genomic reference sequences, either chromosome 5 of Gallus gallus (v4 and v5) or the consensus sequence generated from de novo assembly of PacBio sequencing reads. The SAM files generated were converted into indexed BAM files using Samtools 0.1.19 [37]. Artemis (v13.0) and ACT (Artemis Comparison Tool) [38] were used to analyse locus coverage and accuracy of the alignment. Comparison files required to run ACT were generated with megablast [39]. Dot plots were generated calling dotter from the command line [40]. Annotation for the PacBio consensus sequence was generated by RATT, Rapid Annotation Transfer Tool [41] using as scaffold the annotation from Gallus gallus version 4.

All sequences produced in this manuscript are deposited in the ENA under the accession numbers ERS556272, ERS565108, ERS1276179, PRJNA361311.

SureSelect pull down of the IFITM locus

SureSelect probes covering the chicken IFITM locus (40Kb region) were purchased from Agilent and samples processed for targeting pulldown according to the Illumina and PacBio protocols.

Cell culture

Two hundred ninety-three T and DF1 cells were cultured in DMEM medium supplemented with 10% FCS, in absence of any antibiotics. Stable transfections were performed using Fugene (Promega) according to the manufacturer’s instructions and cells maintained in culture in presence of puromycin for positive selection. RNA extraction was performed using Qiagen RNA extraction kit according to the manufacturer’s instructions. Up to 5 μg of extracted RNA was reverse transcribed and sequenced using Illumina MiSeq. DNA extraction from turkey breast tissue was performed using Qiagen Tissue and blood DNA extraction kit, according to the manufacturer’s protocol.

European Nucleotide Archive (ENA) sequencing data download and RNA-seq analysis

RNA-seq datasets for this study were retrieved from ENA records (Table 5). A total of 26 studies for chicken sequencing datasets were identified. FastQC-corrected reads were aligned to the PacBio-derived consensus sequence using BWA version 0.7.12-r1039, Samtools 0.1.19 and MAFFT version 7.205. The BAM files generated were then visualized using Artemis. To quantify transcripts expression, RPKM (Reads Per Kilobase per Million mapped reads) were calculated using Artemis by selecting the feature of interest. Read depth for RNA-seq alignment was calculated using Ugene v1.25.0.