Background

Linkage disequilibrium (LD) is important in livestock genetics because of its key role in genomic selection [1] and in detecting the causal mutations of economically important traits [2–6]. Based on LD information, there are two main approaches to detecting genes underlying phenotypic variation: one proceeds from phenotype to genome, the other from genome to phenotype. The first approach targets particular candidate genes or relies on quantitative trait loci (QTL) mapping followed by positional cloning of the QTL. The second approach identifies patterns of LD in populations that are incompatible with the hypothesis of genetic neutrality; these patterns are selection signatures [7]. Its aim is to identify signatures of artificial selection by statistically evaluating genomic data [7].

Allele frequencies at loci under selection are expected to change. A neutral mutation takes many generations to reach a high or low population frequency, and during that time the LD between the mutation and its neighboring loci is degraded by recombination in every generation [8]. By contrast, the frequency of a mutation under artificial selection increases or decreases more rapidly than that of a neutral mutation, so the conserved haplotype surrounding it remains long [9, 10]. This is the rationale for the extended haplotype homozygosity (EHH) statistic used to detect selection signatures [11]. Many other methods are available to detect selective sweeps from DNA sequence data, including Tajima’s D [12] and Fay and Wu’s H-test [13] for selected mutations, FST [14], which measures large allele-frequency differences among populations, and the integrated Haplotype Score (iHS) [15], which is an extension of the EHH statistic [11]. Among these methods, the EHH test is particularly useful [7, 11]. It detects artificial selection from the characteristics of haplotypes within a single population and does not require the ancestral genotype [7]. Furthermore, the EHH test was designed to work with SNP rather than sequencing data and is less sensitive to ascertainment bias than other approaches [7, 16].

The broilers used in this study were selected for eleven generations, and genomic regions controlling abdominal fat (AF) deposition are expected to exhibit signatures of selective sweeps. The aim of this study was to identify the selection signatures arising from artificial selection for AF in chickens and to investigate the genes important for AF deposition.

Methods

Ethics statement

All animal work was conducted according to the guidelines for the care and use of experimental animals established by the Ministry of Science and Technology of the People’s Republic of China (Approval number: 2006–398) and approved by the Laboratory Animal Management Committee of Northeast Agricultural University.

DNA samples and data preparation

Broilers used in this study were from two Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF). The two lines have been selected since 1996 using AF percentage (%AFW or AFP) and plasma very low-density lipoprotein (VLDL) concentration as selection criteria [17]. The two lines were selected for 11 generations and the AFP changes over the 11 generations are shown in Figure 1. A total of 475 individuals from generation 11 of NEAUHLF were used in this study.

Figure 1 The separation of AFP over 11 generations between lean and fat lines.

Genotyping was carried out using the Illumina chicken 60K SNP chip containing a total of 57636 SNPs. Markers were filtered to exclude loci with unknown positions, monomorphic loci and loci with a minor allele frequency <0.05.
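The marker filter described above can be illustrated with a minimal sketch. The record layout (`name`, `chrom`, `pos`, `genotypes`) and the `"--"` code for missing calls are assumptions made for this example, not the format of the actual chip output.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG";
    missing calls are encoded as "--" (an assumed convention).
    """
    counts = Counter(a for g in genotypes if g != "--" for a in g)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # entirely missing or monomorphic
    return min(counts.values()) / total

def filter_markers(markers, maf_threshold=0.05):
    """Keep markers with a known position and MAF >= threshold."""
    kept = []
    for m in markers:
        if m["chrom"] is None or m["pos"] is None:
            continue  # unknown genomic position
        if minor_allele_frequency(m["genotypes"]) < maf_threshold:
            continue  # monomorphic or rare minor allele
        kept.append(m["name"])
    return kept

# Toy data: only snp1 passes all three filters.
markers = [
    {"name": "snp1", "chrom": "1", "pos": 1200, "genotypes": ["AG", "AA", "GG", "AG"]},
    {"name": "snp2", "chrom": "1", "pos": 5600, "genotypes": ["CC", "CC", "CC", "CC"]},
    {"name": "snp3", "chrom": None, "pos": None, "genotypes": ["AT", "TT", "AA", "AT"]},
    {"name": "snp4", "chrom": "2", "pos": 900, "genotypes": ["GG"] * 39 + ["GT"]},
]
print(filter_markers(markers))
```

In this toy set, snp2 is monomorphic, snp3 has no position, and snp4 has a minor allele frequency of 1/80, so each is excluded by one of the stated criteria.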

The haplotype and LD analysis

fastPHASE [18] (http://depts.washington.edu/fphase/download/) was used to reconstruct the haplotypes for every chromosome with the default parameters. The reconstructed haplotypes were imported into HAPLOVIEW v4.1 [19] to estimate LD statistics based on pairwise r² and to construct the block pattern in the candidate regions of interest for selection signature analysis.
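For readers unfamiliar with the pairwise LD statistic, the following sketch computes r² between two biallelic loci from phased haplotypes. It uses the standard textbook definition and is not the HAPLOVIEW implementation; the 0/1 allele coding and the function name `pairwise_r2` are assumptions for illustration.

```python
def pairwise_r2(locus_a, locus_b):
    """r^2 between two biallelic loci coded 0/1 across phased haplotypes.

    Both lists hold one allele per haplotype, in the same haplotype order.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b    # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either locus is monomorphic
    return d * d / denom

# Complete LD: the two loci always carry the same allele.
print(pairwise_r2([0, 0, 1, 1], [0, 0, 1, 1]))
# No LD: alleles at the two loci are independent.
print(pairwise_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```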

The EHH test

The “core region” was defined as a region of the genome characterized by strong LD among SNPs and containing a set of “core haplotypes” [7]. Sweep v.1.1 (http://www.soft82.com/get/download/windows/sweep/) was used to identify the core regions [11]. The algorithm defined a pair of SNPs to be in strong LD if the upper 95% confidence bound of D’ is between 0.70 and 0.98 [20]. The program was set to select core regions with at least two SNPs. EHH was defined as the probability that two randomly chosen haplotypes carrying the candidate core haplotype are homozygous for the entire interval spanning the core region to a given locus [11]. The EHH test [11] contrasts one core haplotype with the other haplotypes at the same position. The “Relative Extended Haplotype Homozygosity” (REHH) statistic corrects EHH for the variability in recombination rates [7]. It was computed as $\mathrm{REHH} = \mathrm{EHH}_t / \overline{\mathrm{EHH}}$, where $\mathrm{EHH}_t$ is the EHH of the tested core haplotype $t$ and $\overline{\mathrm{EHH}}$ is the decay of EHH on all other core haplotypes combined. The REHH value was used in the current study to determine the selection signatures. To determine the significance of REHH values, the haplotypes were ordered into 20 bins according to their frequencies [7]. The REHH value of each haplotype in a candidate region was compared with those of all equally frequent haplotypes to obtain P-values [11]. Significant selection signatures were defined as those with P<0.01.
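The EHH and REHH definitions above can be made concrete with a small sketch. It is not the Sweep implementation. It assumes each haplotype is given as a pair (core alleles, flanking alleles ordered outward from the core), and it assumes the commonly used pooled form of the denominator, in which identical pairs and all pairs are summed over every core haplotype other than the tested one.

```python
from collections import Counter
from math import comb

def pair_counts(haplotypes, n_flank):
    """Map each core haplotype to (identical extended pairs, all pairs).

    `haplotypes` is a list of (core, flank) strings; only the first
    `n_flank` flanking markers are used to extend the core.
    """
    core_n = Counter(core for core, _ in haplotypes)
    ext_n = Counter((core, flank[:n_flank]) for core, flank in haplotypes)
    same = Counter()
    for (core, _), n in ext_n.items():
        same[core] += comb(n, 2)  # pairs identical over the whole interval
    return {core: (same[core], comb(n, 2)) for core, n in core_n.items()}

def ehh(haplotypes, core, n_flank):
    """Probability that two random carriers of `core` are identical."""
    same, total = pair_counts(haplotypes, n_flank)[core]
    return same / total if total else float("nan")

def rehh(haplotypes, core, n_flank):
    """EHH of `core` divided by the pooled EHH of all other cores."""
    counts = pair_counts(haplotypes, n_flank)
    same_t, total_t = counts[core]
    same_o = sum(s for c, (s, _) in counts.items() if c != core)
    total_o = sum(t for c, (_, t) in counts.items() if c != core)
    if total_t == 0 or total_o == 0 or same_o == 0:
        return float("nan")
    return (same_t / total_t) / (same_o / total_o)

# Toy data: core "AC" is carried on a long shared haplotype, core "GT" is not.
haps = [("AC", "GT")] * 4 + [("AC", "GA")] + [
    ("GT", "GT"), ("GT", "GT"), ("GT", "CA"), ("GT", "CT"),
]
print(ehh(haps, "AC", 2), rehh(haps, "AC", 2))
```

With two flanking markers, 6 of the 10 pairs carrying "AC" are identical (EHH = 0.6), whereas only 1 of the 6 pairs carrying "GT" is, so the REHH of "AC" is well above 1.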

Results

Markers and core haplotypes

A total of 43034 SNPs on 28 autosomes in chickens were included in the selection signature analysis (Table 1). These markers covered 950.68 Mb of the genome, with an average of 22.09 kb between adjacent markers.

Table 1 Summary of genome-wide marker and core region (CR) distribution in the lean and fat lines

For the SNPs analyzed in this study, the average minor allele frequency was 0.29 ± 0.13. A summary of genome-wide markers and core haplotype distribution in the data set is shown in Table 1. A total of 5357 and 5593 core regions, spanning 549523.91 kb and 480784.79 kb of the genome, were detected in the lean and fat lines, respectively (Table 1). Mean core region length was estimated as 102.58 ± 37.24 kb and 85.96 ± 26.65 kb, with a maximum of 2288.64 kb and 2191.34 kb in the lean and fat lines, respectively (Table 1). Chromosome 1 is the largest chromosome in chickens, and it had the largest haplotypic structures in the genome, which covered 110644.43 kb and 105728.03 kb in the lean and fat lines, respectively. For each chromosome, the proportion of length covered by core regions vs. total length, as well as the number of SNPs forming core regions vs. the total number of SNPs, are shown in Table 1. The distribution of the size of core regions is shown in Figure 2. Overall, 25069 and 22180 SNPs in the lean and fat lines, respectively, participated in forming core regions, with a range of 2 to 19 SNPs per core.
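The reported averages follow directly from the totals given in the text, as the short check below shows. It only divides figures quoted in this section; the helper name `mean_per_unit` is introduced for illustration.

```python
def mean_per_unit(total, count):
    """Average amount per unit, e.g. kb per marker or kb per core region."""
    return total / count

# 950.68 Mb covered by 43034 SNPs, expressed in kb per marker interval.
marker_spacing_kb = mean_per_unit(950.68 * 1000, 43034)

# Total core-region length divided by the number of core regions.
lean_mean_core_kb = mean_per_unit(549523.91, 5357)
fat_mean_core_kb = mean_per_unit(480784.79, 5593)

print(f"marker spacing: {marker_spacing_kb:.2f} kb")
print(f"mean core length, lean line: {lean_mean_core_kb:.2f} kb")
print(f"mean core length, fat line: {fat_mean_core_kb:.2f} kb")
```

The three values round to 22.09 kb, 102.58 kb and 85.96 kb, matching the figures reported above.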

Figure 2 Distribution of SNP numbers in the core regions (A) and the length of core regions (B) in lean and fat lines.

Whole genome selection signatures

For all 5357 and 5593 core regions in the lean and fat lines, respectively, a total of 44822 and 46775 EHH tests were performed, with an average of 8.37 and 8.36 tests per core region. To find outlying core haplotypes, we calculated REHH at 1 Mb distances on both the upstream and downstream sides. Figure 3 shows the distribution of REHH values vs. haplotype frequencies in the lean and fat lines. Corresponding P-values are indicated by different colored symbols. The –log10 of the P-values associated with REHH was plotted against chromosomal position to visualize the chromosomal distribution of outlying core haplotypes with frequency >25% (Figure 4). The results indicated that these selection signals were not uniformly distributed across all chromosomes, with a substantial overrepresentation on chromosomes 1, 2, 3 and 4.
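One way to picture how a P-value is attached to each REHH value is a rank-based comparison within frequency bins, followed by the –log10 transformation used for plotting. The sketch below is an assumption for illustration: it uses a simple empirical P-value (the share of equally frequent haplotypes with an REHH at least as large), which may differ from the procedure implemented in Sweep.

```python
from collections import defaultdict
from math import log10

def frequency_bin(freq, n_bins=20):
    """Index of the equal-width frequency bin that contains `freq`."""
    return min(int(freq * n_bins), n_bins - 1)

def empirical_p_values(records, n_bins=20):
    """Empirical P-value of each (frequency, REHH) record within its bin."""
    bins = defaultdict(list)
    for freq, rehh in records:
        bins[frequency_bin(freq, n_bins)].append(rehh)
    p_values = []
    for freq, rehh in records:
        peers = bins[frequency_bin(freq, n_bins)]
        # Share of equally frequent haplotypes with REHH >= the observed value.
        p_values.append(sum(1 for v in peers if v >= rehh) / len(peers))
    return p_values

# Toy (frequency, REHH) records: four haplotypes share a bin, one is alone.
records = [(0.26, 1.0), (0.27, 2.0), (0.28, 8.0), (0.29, 1.5), (0.52, 3.0)]
for (freq, rehh), p in zip(records, empirical_p_values(records)):
    print(f"freq={freq:.2f} REHH={rehh:.1f} P={p:.2f} -log10(P)={-log10(p):.2f}")
```

With real data each bin would hold many haplotypes, so much smaller P-values than in this toy example become attainable.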

Figure 3 Distribution of REHH vs. core haplotype frequencies in the lean and fat lines. Core haplotypes with P-values lower than 0.05 and 0.01 are presented in blue and red, respectively.

Figure 4 Genome-wide map of P-values for core haplotypes with frequency >0.25 in lean and fat lines, respectively. Dashed lines display the threshold level of 0.01.

The genome-wide statistics of the selection signature test, including the number of tests and outlying core haplotypes for each chromosome, are presented in Table 2. Of 16677 and 18346 tests on core haplotypes with frequency ≥0.25, there were 51 and 57 tests with P<0.01 in the lean and fat lines, respectively. There were 153 and 251 tests with P<0.05 in the lean and fat lines, respectively.

Table 2 The number of tests on core haplotypes (CH) (both sides) with frequency ≥0.25 and P-values of REHH test

We examined whether the distribution of Tukey’s outliers conformed to the outlying core haplotypes defined at the threshold level of 0.01. Figure 5 displays box plots of the distribution of –log10 (P-values) within each bin of core haplotype frequency. The results indicated that the extreme outliers appear in the low haplotype-frequency bins.
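Tukey’s outlier rule, which underlies the box plots, can be sketched as follows for the values in a single frequency bin. The 1.5 and 3 interquartile-range fences are the conventional choices for mild and extreme outliers; the quartile method and the toy values are assumptions for illustration.

```python
from statistics import quantiles

def tukey_upper_outliers(values):
    """Split the upper-tail outliers of `values` into mild and extreme.

    Mild outliers lie above Q3 + 1.5 * IQR; extreme outliers lie above
    Q3 + 3 * IQR. Only the upper tail matters for -log10(P) values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild_fence = q3 + 1.5 * iqr
    extreme_fence = q3 + 3.0 * iqr
    mild = [v for v in values if mild_fence < v <= extreme_fence]
    extreme = [v for v in values if v > extreme_fence]
    return mild, extreme

# Toy -log10(P) values for one core haplotype frequency bin.
neg_log_p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.5, 5.0]
mild, extreme = tukey_upper_outliers(neg_log_p)
print("mild:", mild, "extreme:", extreme)
```

Here the quartiles are 0.325 and 0.775, so the fences sit at 1.45 and 2.125; the value 1.5 is a mild outlier and 5.0 is an extreme one.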

Figure 5 Box plot of the distribution of P-values in core haplotype frequency bins in the lean (left) and fat (right) lines. The dashed and continuous lines indicate the threshold P-values of 0.01 and 0.001, respectively.

Mapping selection signatures to genes

A summary of statistics for the 51 and 57 positively selected core regions with P<0.01 in the REHH tests in the lean and fat lines, respectively, is presented in Table 3. Corresponding genes were identified by aligning the core positions with the chicken genome sequence (Table 3). Full gene names were taken from Ensembl (http://www.ensembl.org/index.html). A total of 66 and 46 genes were detected in the core regions of the lean and fat lines, respectively, including RB1 (retinoblastoma 1), BBS7 (Bardet-Biedl syndrome 7), MAOA (monoamine oxidase A), MAOB (monoamine oxidase B), EHBP1 (EH domain binding protein 1), LRP2BP (LRP2 binding protein), LRP1B (low-density lipoprotein receptor-related protein 1B), MYO7A (myosin VIIA), MYO9A (myosin IXA) and PRPSAP1 (phosphoribosyl pyrophosphate synthetase-associated protein 1). Haplotype analysis of these genes revealed that the haplotype frequencies were significantly different (P<0.01) between the two lines (Table 4).

Table 3 Statistics summary for core haplotypes with P<0.01 after the relative extended haplotype homozygosity (REHH) test
Table 4 Haplotype frequencies in the lean and fat lines of the core regions including 10 important genes

Mapping selection signatures to QTLs

The chicken QTL database available online (http://www.animalgenome.org/cgi-bin/QTLdb/GG/index) was searched to identify any overlap between the core regions with significant REHH P-values (P<0.01) and published QTLs in chickens. The approximate positions of the overlapping QTLs for each core region are listed in Table 5. There were many overlaps between the core regions with significant REHH P-values (P<0.01) and published QTLs for AF content in chickens.

Table 5 Reported QTL near the core regions with P<0.01 in the lean and fat lines

Discussion

Selective sweep analysis detects genomic regions with reduced variation in allele frequency in populations experiencing divergent selection for specific traits. Here, we assessed the feasibility of the selective sweep approach for finding genes important for AF deposition in chickens. The long-range haplotype test was employed, which detects selection signatures by measuring the characteristics of haplotypes within the lean and fat lines divergently selected for AF content. There were 5357 and 5593 core regions in the lean and fat lines, respectively. Comparing the average marker spacing with the mean core length and the number of SNPs forming cores, we found that core regions are more likely to appear in regions with higher marker density.

Selection signatures were calculated across the whole genome, and a subset of putative core regions with significant REHH P-values (P<0.01) was identified. Among the genes detected in these core regions, 10 appear to be important for fatness: RB1, BBS7, MAOA, MAOB, EHBP1, LRP2BP, LRP1B, MYO7A, MYO9A and PRPSAP1. Seven of these genes (RB1, BBS7, MAOA, MAOB, EHBP1, LRP2BP and LRP1B) lie in QTL regions previously reported for AF in chickens (Table 5). The other three genes (MYO7A, MYO9A and PRPSAP1) do not lie in these QTL regions but may also be important for AF deposition.

The known functions of these 10 genes were analyzed and the results indicated that they were likely to be linked with fatness. The RB1 gene regulates the C/EBP-DNA-binding activity during 3T3-L1 adipogenesis and plays a key role in adipocyte differentiation [40, 41].

The BBS7 gene is a member of the Bardet-Biedl syndrome (BBS) family. BBS is a pleiotropic genetic disorder characterized by obesity, photoreceptor degeneration, polydactyly, hypogenitalism, renal abnormalities, and developmental delay [42]. BBS is recognized to be a genetically heterogeneous autosomal recessive disorder mapped to eight loci [42]. Positional cloning and candidate gene approaches identified six BBS genes, including BBS1, BBS2, BBS4, BBS6, BBS7, and BBS8 [42]. These BBS genes may be important for obesity.

MAOA and MAOB encode two enzymes important for dopamine metabolism. Dopamine levels influence the risk of obesity, and MAOA and MAOB may be implicated in human obesity [43].

The EHBP1 gene is required for insulin-stimulated GLUT4 movements [44]. Insulin stimulates glucose transport in adipose tissues by recruiting intracellular membrane vesicles containing the glucose transporter GLUT4 to the plasma membrane [44]. The mechanisms involved in the biogenesis of these vesicles and their translocation to the cell surface were studied and the results indicated that EHD1 and EHBP1 are required for perinuclear localization of GLUT4, and the loss of EHBP1 disrupts insulin-regulated GLUT4 recycling in cultured adipocytes [44]. This indicates that the EHBP1 gene may be important in adipocyte differentiation.

The LRP2BP and LRP1B genes are two members of the low-density lipoprotein receptor family that participates in a wide range of physiological processes, including the regulation of lipid metabolism, protection against atherosclerosis, neurodevelopment, and transport of nutrients and vitamins [45].

MYO7A and MYO9A are two myosin genes. A spontaneous mutant mouse line, Myo7a(sh1-6J), was used to study the function of the MYO7A gene, and the results indicated that homozygous mutant male mice displayed decreased body weight and body fat [46]. The MYO9A gene is located in the BBS4 region of chromosome 15q22-q23 [47] and might therefore be important for obesity.

The PRPSAP1 gene encodes phosphoribosyl pyrophosphate synthetase-associated protein 1. A study of differentially expressed genes associated with insulin resistance indicated that the PRPSAP1 gene is associated with percentage of body fat [48].

The associations of these 10 genes with obesity or lipid metabolism were mainly reported in humans and mice. Because these genes are highly conserved among humans, mice and chickens, they might also be important for AF deposition in chickens.

Conclusions

Our results provide a genome-wide map of selection signatures in two chicken lines divergently selected for AF content. There were 51 and 57 core regions showing significant P-values (P<0.01) for selection signatures in the lean and fat lines, respectively. These core regions contain a number of important genes, including RB1, BBS7, MAOA, MAOB, EHBP1, LRP2BP, LRP1B, MYO7A, MYO9A and PRPSAP1. These genes may be important for AF deposition in chickens.