Background

Marek’s disease (MD) is a lymphoproliferative disease of chickens caused by the Marek’s disease virus (MDV or Gallid herpesvirus 2), a naturally occurring alphaherpesvirus [1, 2]. The virus targets lymphoid tissue such as the bursa of Fabricius, thymus, and spleen, where it infects B and T cells [3]. The pathological characteristics of MD include mononuclear infiltration of the peripheral nerves, gonads, iris, various viscera, muscles, and skin. Susceptible chickens develop lymphomas in visceral tissues and enlarged nerves resulting in paralysis, blindness, and eventually death. Economic losses to the poultry industry due to condemnation in broilers (meat-type) and reduction in egg production in layers (egg-type) are estimated to be $1-2 billion per year [4]. Due to the persistent nature of the virus in the feather dander shed from MDV-infected birds, all commercial chickens are exposed at a very early age. Although MD vaccines can prevent the formation of tumors, they do not prevent viral replication and bird-to-bird spread. The lack of sterilizing immunity is thought to be a major contributing factor for MDV field strains evolving to higher virulence [5], which can result in unpredictable and devastating disease outbreaks in vaccinated commercial flocks.

An attractive solution to augment MD vaccinal and husbandry control measures is to increase genetic resistance in commercial chicken lines through marker-assisted or genomic selection, which avoids the need to expose elite flocks under selection to a hazardous pathogen. In the past, focusing on the experimental White Leghorn (layer) lines 63 and 72, which are MD resistant and susceptible, respectively, we have utilized multiple techniques such as genome-wide QTL scans [6, 7], transcript profiling with microarrays [8], and protein-protein interaction [9] screens to understand the response to MDV infection and MD genetic resistance. Integrating the results from these approaches identified three genes and many other strong candidates that confer genetic resistance to MD [8, 10, 11]. While promising, these genes account for only a small fraction of the observed genetic variation, a situation that is similar for many other complex traits.

Allele-specific expression (ASE) is a powerful technique that measures the expression of each allele via a marker single nucleotide polymorphism (SNP) within a RNA sample. When a gene shows allelic imbalance, it is sufficient to identify a cis-acting regulatory element [12]. The key advantage of this approach is identification of a SNP exhibiting ASE, though likely not to be causative, is in tight linkage disequilibrium with the causative polymorphism, thus, essentially identifying a high-confidence candidate gene (genetic factor) with expression variation. And since variation in gene expression is thought to be a major factor accounting for phenotypic variation, genes with ASE SNPs provide candidates and markers that may account for the complex trait of interest.

In this study, the primary aim was to use ASE to identify SNPs associated with variation in transcriptional response to MDV infection in both broilers and layers, which would provide a strong foundation for future evaluations of genetic resistance to MD. In addition, the resulting information allowed us to compare the identified SNPs, genes, and pathways enriched between these two diverse chicken types. Furthermore, since RNA sequencing (RNA-Seq) data is comparable to microarray data, gene counts were used to identify genes differentially expressed (DE) in response to MDV infection and, subsequently, used to evaluate the biological pathways that were shared between the broiler and layer chickens.

Results

MD incidence in broilers

As the long-term objective is to improve MD genetic resistance through genomic selection, we first needed to identify broiler pure lines that differed substantially in MD incidence; layer lines 63 and 72 are already known to be MD resistant and susceptible, respectively [13]. Anecdotal information suggested that Cobb-Vantress lines “Red” and “Blue” used in this study differed in MD genetic resistance. We confirmed these differences in two MDV challenge trials (Table 1), where the Red line birds had about twice the MD incidence as those from the Blue line.

Table 1 Marek’s disease (MD) incidence in two pure line broilers

Whole-Genome transcriptomics and ASE SNP discovery

In the present experiment, we sequenced RNA samples from 7 uninfected and 7 infected birds for each type (broiler or layer). For the broiler dataset, with one exception, the average number of single end raw reads was approximately 27 million (Additional file 1: Table S1), and after quality trimming was between 21 million and 29 million. Tophat/Bowtie [14] aligned 68-81% of the reads to the chicken genome with nearly 98.5% of the reads in the resulting alignments having a mapping quality score of 30 or higher (MAPQ≥30). These high quality scores were necessary to confidently call SNPs with a minimum of false positives. Using Freebayes [15], SNPs were called within alignments from similar treatment samples (e.g., all 7 MDV-infected layers), with criteria to contain at least 14 reads in the population. The average number of SNPs finally called was in approximate range of 200,000-310,000 per sample in the broilers.

For layer samples, similar quality control measures were undertaken to reduce false positives SNPs. For these RNAs, sequencing was performed using the Illumina HiSeq platform, which not only produced more reads compared to the broiler-generated data (Illumina GAII) but also produced longer pair end fragments (Additional file 1: Table S2). The average number of reads ranged from 40 to 95 million per sample. The average number of SNPs called was in approximate range of 195,000-390,000 per sample in the layers. The increase in reads provided more confidence in SNP calls (read depth) but did not significantly increase the number of SNPs called between the samples (broilers vs. layers). Also, higher read depth did not significantly change the Tophat/Bowtie [14] alignment statistics between the samples (broilers vs. layers).

An analysis of variance (ANOVA) was used to identify the SNPs that showed ASE due to MDV infection in broilers or layers. After further filtering of the raw SNP data, an ANOVA was performed on 150,403 SNPs from the broiler SNP dataset and 134,953 SNPs for the layer that passed our test criteria (see Materials and Methods). Based on ANOVA significance (p<0.05), the number of SNPs exhibiting ASE in response to MDV infection in broilers and layers was 6,132 and 4,528, respectively (Additional file 1: Tables S3 and S4). No correction was made for multiple testing because the list of genes would be tested subsequently for pathway enrichment. In the Fisher exact test invoked in DAVID [16] to identify enriched pathways, the proportion of genes submitted must be significantly greater than the proportion of genes found by chance. In this way, Type 2 errors (false negatives) are reduced in the first step while the pathway test in the second step includes an experiment wise error rate to control for Type 1 errors (false positives).

Classification of ASE SNPs

The significant SNPs exhibiting ASE in response to MDV infection for both broilers and layers were distributed on chromosomes 1 to 28 of the chicken genome, with the number of SNPs per each chromosome roughly proportional to the size of the each chromosome in the chicken genome assembly (Additional file 2: Figure S1).

To locate these SNPs on the chicken genome with respect to genes and classify them based on function, ANNOVAR [17] was used. In the broiler population, the largest number of SNPs (1,780 or 29.0%) was classified as exonic (Table 2). The second largest number of SNPs fell in the intergenic group (1,610 or 26.3%), and the third largest group were located downstream of a gene (1,073 or 17.5% of the SNPs). As RNA was sequenced and analyzed, SNPs in both of these latter categories reflect the incomplete or inaccurate annotation of the chicken genome. Similarly in layers, the two largest classifications of SNPs associated with intergenic (1,059 or 25.4%) and exonic (1,501 or 24.8%) regions (Table 2).

Table 2 Classification of allele-specific expression (ASE) SNPs responding to Marek’s disease virus (MDV) infection in broilers and layers

A nucleotide position match between broiler and layer ASE SNPs showed 72 SNPs matched at the exact position (Additional file 1: Table S5). These 72 SNPs were distributed in 71 genes (Table 3). At the gene level, there were 850 genes showing ASE in response to infection that were common between the two chicken types (Table 3). The observed overlap was not significant suggesting different genes are responsible for conferring MD resistance between the broiler and layer lines examined.

Table 3 Classification of allele-specific expression (ASE) SNPs in response to Marek’s disease virus (MDV) infection in common between broilers and layers at the nucleotide position and gene level

The ASE data were further analyzed to quantify the biological implications due to the presence of a SNP. A function-based approach using ANNOVAR classified the SNPs as synonymous, nonsynonymous, stop gain, and stop loss (Table 4). In broiler and layers, 1,818 and 1,163 SNPs, respectively, were identified (Additional file 1: Tables S6 and S7). The largest category in both chicken types was synonymous implying that the SNPs were neutral and downstream amino acid changes were not significant. The second largest category in both the chicken breeds was nonsynonymous SNPs. A very small number of the SNPs fell into the other two categories (stop gains and stop loss).

Table 4 Functional classification of allele-specific expression (ASE) SNPs in response to Marek’s disease virus infection from broilers and layers in known genes

Differential expression (DE) analysis in response to MDV infection

DE was estimated by obtaining count data from each biological sample (broilers and layers) using the Htseq script [18]. The ENSEMBL exon annotation file [19] identified 17,934 chicken genes of which 15,261 and 15,468 genes were expressed in the broilers and layers, respectively. DESeq analysis [20] was used to estimate DE and a total of 548 and 434 genes were revealed following MDV infection (p<0.05) in broilers and layers, respectively (Additional file 1: Tables S8 and S9). The fold change ranged from −6.41 to 35.0 and −8.86 to 48.0 in broilers and layers, respectively. Of the genes that were significant, only 20 were common between the broiler and layer chickens.

Pathway analysis of ASE and DE genes

To identify potential common pathways, gene lists obtained by ASE and DE analysis were analyzed for biological process and pathway enrichment using DAVID [16]. Due to the likely limited effect of SNPs falling in the intergenic, intronic, and downstream regions of the chicken genome (based on ANNOVAR classification), we further filtered the input gene list to only include genes that corresponded to SNPs that were found to be synonymous, nonsynonymous, or located at the 5’ UTR, and 3’ UTR regions. From the initial gene list (Table 2), 2,156 and 1,207 genes following ASE analysis in the broiler and layer samples, respectively, were queried. Using functional annotation clustering, at the highest classification stringency, 109 and 63 clusters were formed in broilers and layers, respectively. When an enrichment cutoff at >1.0 was used, only 12 and 6 clusters were chosen in broilers and layer samples, respectively (Additional file 1: Table S10 and S11). There were no common enriched clusters between both datasets. The pathways analysis invoked in DAVID yielded 13 and 6 pathways in the broiler and layer samples, respectively (Table 5). Of these samples, there were three pathways that were common between the two bird types: apoptosis, DNA replication, and amino and nucleotide sugar metabolism. There were additional pathways found that were unique to each chicken type (Table 5). DAVID analysis of nonsynonymous ASE SNPs from broilers and layers also resulted in no common pathways that were found to be enriched (Additional file 1: Tables S12 and S13) but 14 genes were obtained that were common to both bird lines (Additional file 1: Table S14). Furthermore, out of the 14 common genes obtained, 6 genes were functional annotated using DAVID (Additional file 1: Table S15).

Table 5 DAVID analysis of allele-specific expression (ASE) genes in response to Marek’s disease virus (MDV) infection (P<0.05) in broilers and layers

Similar DAVID analyses were performed on the DE gene list obtained from broiler and layer samples. Based on functional annotation, 55 and 34 clusters were obtained in the broiler and layer samples from an initial input of 528 and 430 gene IDs, respectively. With the enrichment cutoff set to >1.0, there were 15 and 10 clusters in the broiler and layer samples, respectively (Additional file 1: Tables S16 and S17). Of these clusters, 2 clusters were enriched and showed exact matches between the broilers and layers (cluster 15 in broilers and cluster 5 in layers). At the pathway level for DE genes, 7 and 9 pathways were identified in broilers and layers, respectively (Table 6). Two common pathways were found, which were Toll-like receptor (TLR) signaling and JAK/STAT signaling.

Table 6 DAVID analysis of differentially expressed genes between uninfected and MDV-infection (P<0.05) in broilers and layers

Discussion

ASE is a powerful and elegant technique that separates gene expression signals into allelic components resulting in significantly increased sensitivity and added power, and can be used to understand the genetics of gene regulation. In this study, we used ASE to investigate transcriptional regulation in response to MDV infection in two different chicken types, specifically broilers and layers. The primary objective of this study was to identify SNPs exhibiting ASE responding to MDV infection, which would provide a strong foundation for future experiments designed to identify genes conferring genetic resistance to MD. Since it takes 7 days post infection (dpi) for MDV to undergo its first lytic cycle, transcriptional changes are most likely to occur prior to the prevalence of this phenotypic observation and, thus, 4 dpi was chosen as the sampling window captures to capture these transcriptional changes in the birds. In this experiment, we were able to successfully align RNA-Seq reads to the reference chicken genome and identify high quality SNPs in both broiler and layer chickens using our pipeline as described. As a result, we identified 6,132 and 4,528 SNPs significant SNPs in 4,768 and 3,718 genes from broiler and layer samples, respectively.

In chicken, the power of RNA-Seq will not be fully realized until the reference genome sequence is finished [21] (9 of the 39 chromosomes are still wholly absent) and gene models and annotation become more complete. For example, while only exons should be transcribed, nonetheless, a large percentage of the SNPs, when classified using ANNOVAR, fell into categories such as downstream, intronic and intergenic regions. Poor genome annotation is most likely responsible for the classification of SNPs in non-exonic regions as we obtained high quality reads with excellent coverage. We are in the process of developing new gene models based on our experimental RNA-Seq data from this study and additional ones.

At the SNP and gene level, 72 SNPs and 71 genes (Table 2) were common between ASE broiler and layer samples. The occurrence of identical SNP positions was negligible, which was expected given the wide genetic divergence between these two chicken types, and therefore is due to chance. Comparison of the enriched pathways obtained from ASE SNP positions also showed little similarities between the two bird types (Table 5). We conclude, based on the lack of concordance between the lists of identified SNPs, genes and pathways, there is no evidence to suggest common genetic elements that respond to MDV infection in these two bird types; this result also supports the notion that MD genetic resistance is complex and can result from different gene combinations. If true on a broader scale, then genetic markers for complex traits such as genetic resistance to MD based on ASE screens will need to be performed on each population and cannot be extrapolated from one bird line to another.

The chicken major histocompatibility complex (MHC) plays an important role in the determination of resistance to MDV [13]. The chicken chromosome 16 contains 3 loci namely, the B locus, the Y (or Rfp-Y) locus and the nucleolar-organizing region (NOR). Loci B and Y are unlinked and evolve independently [22]. In this study, in our analysis of nonsynonymous ASE SNPs in broilers and layers, we found a SNP associated with a gene that encodes for class I alpha chain of the Rfp-Y loci (ENSGALG00000024348, Additional file 1: Table S15). This result supports previous findings that have linked MHC to MDV resistance. However, our predications are limited due to incomplete sequence information of chromosome 16 for both galGal3 and galGal4 chicken genome builds. This limitation combined with the fact that many of the relevant genes have multiple family members hindered our effects to ascribe ASE on MHC genes and the pathways downstream.

Despite hindrances such as incomplete gene models, we successfully identified two pathways namely, TLR signaling and JAK/STAT signaling [23], by analyzing DE in broilers and layers. Recently, Smith and co-workers [24] reported the enrichment of the above-mentioned pathways in layers in response to MDV infection. Our data suggest that broilers also show similar pathway enrichment, thus, we conclude they have similar pathways for responding to MDV infection.

Understanding the biological implications of large datasets generated by RNA-Seq can be challenging. Moreover, to identify genetic effects that play an important role in downstream regulation and expression is a major concern in fully understanding the significance of pathways or networks. One hypothesis proposed by Clark et al. [25] was to consider the association of upstream gene(s) or loci as genetic factors that influence downstream processes such as gene expression. In other words, downstream DE in genes could be controlled by genetic factors (genes) upstream in key regulatory pathways. To dissect the influence of ASE on downstream DE analysis, we searched for possible associations between pathways enriched for genes with differential expression (i.e., DE gene lists) and genes exhibiting variation in transcriptional response (i.e., ASE gene lists).

Our data supports the above hypothesis of an underlying genetic component amplifying the expression of downstream genes in a pathway. Specifically, examination of the two common pathways (TLR signaling and JAK/STAT signaling), as shown in Figures 1 and 2, genes exhibiting ASE in response to MDV were found at the beginning of each pathway. Specifically, genes encoding CD14, LY96 (aka MD-2), TLR4, and MyD88 exhibit ASE in response to MDV infection and are found at the start of the TLR receptor-signaling pathway, which contains genes with DE in layers and especially for broilers. Likewise, IL5R, IL21R of the Cytokine R module, PTPN6 (aka SHP1), and PTPN11 (aka SHP2) are all found at the beginning of the JAK/STAT pathway for broilers and layers. It is also interesting to note that growth hormone (GH), one of the three MD resistance genes identified [8], also lies at the start of the JAK/STAT pathway though we could not evaluate for ASE due to the lack of SNPs in the F1 progeny. Interestingly, nuclear factor-kappa B (NF-κ B), which regulates genes associated with cell survival, proliferation, programmed cell death (PCD), stress, inflammation, and immunity was previously shown to be a key component of MDV infection [26] and showed ASE in the layers in this study as well as being a member of the TLR signaling pathway among others (e.g., viral carcinogenesis, B and T cell receptor signaling, chemokine signaling). If the genes at the beginning of the aforementioned pathways do confer genetic resistance to MD, then this has strong implications on how to analyze RNA-Seq data to identify strong candidate genes in other complex traits.

Figure 1
figure 1

The JAK/STAT signaling pathway in response to MDV infection. The JAK/STAT signaling pathway is displayed and genes that exhibit either differential expression (DE) or allele-specific expression (ASE) following Marek’s disease virus (MDV) infection in both broilers and layers are shown.

Figure 2
figure 2

Toll-like receptor signaling pathway in response to MDV infection. The Toll-like receptor signaling pathway is displayed and genes that exhibit either differential expression (DE) or allele-specific expression (ASE) following Marek’s disease virus (MDV) infection in both broilers and layers are shown.

Conclusions

In conclusion, we have successfully identified SNPs associated with ASE in both broilers and layers chicken types, as well as revealed genes and pathways that were common or unique to each bird or analysis type. Thus, the method is powerful for identifying SNPs and genes that exhibit differential allelic response to MDV infection as well as providing candidate markers and genes for MD genetic resistance. Furthermore, we believe that greater exploration should be performed placed on genes exhibiting ASE at the beginning of pathways identified by gene expression enrichment analysis. The larger question that still remains is how much do genes exhibiting ASE (i.e., expression variation) account for overall genetic variation in complex traits such as MD genetic resistance? To answer this question, we have developed a custom chip containing ASE SNPs that is being genotyped on an MD resource population, which will determine, if present, the size of effect and which alleles are favorable for genetic resistance to MD.

Methods

Animals and experimental design

Two chicken types, commercial broilers (meat-type) and experimental layers (egg-type), were used in this study. The broiler chickens were either chicks from two outbred Cobb-Vantress pure lines or their F1 progeny as determined by the experiment. The layer chickens were F1 progeny from intermating Avian Disease and Oncology Laboratory (ADOL) lines 63 and 72, two highly inbred layers lines that are MD resistant and susceptible, respectively.

MD incidence in the broilers was measured in two trials. For each trial, ~100 day old chicks were inoculated intra-abdominally with 2,000 pfu MDV (686 strain). All the chickens were housed in Horsfall-Bauer isolators during the entire experiment, and mortality and MD incidences were measured daily or at the termination date (8 weeks post inoculation) by necropsy.

To generate material for determining ASE, F1 progeny broiler and layer chicks were randomly divided into either treatment group (challenged) or control group (unchallenged). For the viral challenge, 2,000 pfu MDV (686 or Md5 strain for broilers and layers, respectively) was injected intra-abdominally at 2 days of age. At 4 days post infection (dpi), 7 birds from each group were euthanized, and splenic tissue samples collected and stored in RNAlater ® (Ambion, Austin, TX, USA). Animal care and management followed the ADOL animal care and usage committee policy.

RNA extraction and sequencing

Total RNA was extracted from splenic tissues using the Stratagene Absolutely RNA Miniprep kit (Santa Clara, CA, USA). RNA quality was determined using an Aligent Bioanalyzer 2100 lab-on-a-chip and only samples with RIN scores over 7 were submitted for sequencing.

Sequencing was done using Illumina sequencing platforms. Broiler RNA (single end, 75 base read lengths) was sequenced at Michigan State University Research Technology Support Facility (http://www.rtsf.msu.edu) using an Illumina Genome Analyzer II. Layer RNA (pair-end, 100 base read lengths) was sequenced at DNA Landmarks (Montreal, Canada) using an Illumina HiSeq.

Mapping and assembling

All RNA-Seq data reads were trimmed of sequencing adapters, and sequence statistics performed using FASTQC program [27]. Based on these results, the reads were further trimmed using Sickle, a program (https://github.com/najoshi/sickle) that removes reads containing ‘N’ as well as pairing reads, if paired end sequencing was used for down stream alignment. A minimum post-trimmed length of 50 bases was set irrespective of single or paired-end reads. The resulting FASTQ files were processed by Tophat (Version 2.0.4) [28]. Alignments were invoked inside Tophat using Bowtie (Version 2.0.6; [14]) by providing the library insert size for paired ended reads, the reference chicken genome (galGal3), and the ENSEMBL chicken gene GTF file list (Version 67). Read quality statistics of the resulting alignment files were analyzed using SAMStat [29] and the files were used to call SNPs and estimate expression of genes.

SNP detection, ASE estimation and functional annotation of SNPs

Unique read groups were added to each alignment file and population level SNPs within each group (broiler or layer) were called using Freebayes (Version 0.9.6), a Bayesian SNP calling program [15]. The variant calling format (VCF) files [30] derived from infected and uninfected birds were merged with VCFtools [30] and further filtered based on quality scores. SNP call statistics subsequently determined by VCFtools and the raw SNPs were further filtered on the basis of quality (Q=100), thereby, additionally increasing the stringency of called SNPs. Subsequently, a 4-column file with chromosome number, nucleotide position, number of reference SNPs, and number of alternate SNPs were parsed out from each sample from the VCF file. Data was formatted and read into SAS for ASE estimation.

The frequency of each SNP allele expressed as a proportion of total number of reads for the allele present in the reference genome, was analyzed using the following linear model:

Y ijk = μ + T i + C j + T C ij + ε ij k

Where (Y ijk ) is the allele frequency in the kth individual (k=1,6) within the ith treatment and jth cross, Ti is the effect of treatment, Cj is the effects of cross, and TC is the interaction of treatment with cross. The dependent variable was the reference allele frequency with variation among biological replicates within treatment used for testing. SNPs were eliminated for a bird type if they had less than 2 samples for any treatment group. A pooled error term was used to test for significance. As a first pass, all genes passing the 0.05 level of significance where chosen for further analysis using pathway enrichment, in this way the rate of Type 2 errors was reduced, yet allowed for a final test using Fisher exact test for pathway enrichment which controlled for Type 1 errors.

ANNOVAR [17] was used to functionally annotate the putative SNPs. The ANNOVAR database was set up as described [17] and each SNP was classified based on its position in the reference chicken genome as exonic, intronic, intergenic, 5’ UTR, 3’ UTR, splice acceptor or donor site, downstream or upstream. Functional annotation such as nonsynonymous, synonymous, stop codon gain or loss, and amino acid changes was also determined for each SNP.

Differential expression (DE) analysis

To test DE, alignments were analyzed using DESeq, a negative binomial distribution-based [20] approach within R [31]. For input data, the number of reads (count data) that uniquely mapped to an exon was counted using Htseq-count [18] and the provided reference chicken gene list. The reads were processed in Htseq using a union overlapping mode, gene as feature, and with no strand-specificity. The resulting data from all the individual samples were converted to text file imported into R and analyzed. A portion of the reads (5.7 and 12.6 million reads on average in broiler and layer samples, respectively), were assigned to “no feature.”

Pathway analysis using DAVID

ASE and DE significant genes (p<0.05) lists were submitted to DAVID [16]. The analysis classification stringency was set to the highest level with suitable controls. The resulting clustering was then limited to an enrichment score of >1.00 and FDR for multiple testing was performed by the Benjamin and Hochberg method invoked within DAVID. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was also performed using DAVID.