Estimates of linkage disequilibrium and effective population sizes in Chinese Merino (Xinjiang type) sheep by genome-wide SNPs

Knowledge of linkage disequilibrium (LD) is important for effective genome-wide association studies and accurate genomic prediction. Chinese Merino (Xinjiang type) is well-known fine wool sheep breed. However, the extent of LD across the genome remains unexplored. In this study, we calculated autosomal LD based on genome-wide SNPs of 635 Chinese Merino (Xinjiang type) sheep by Illumina Ovine SNP50 BeadChip. A moderate level of LD (r 2 ≥ 0.25) across the whole genome was observed at short distances of 0–10 kb. Further, the ancestral effective population size (N e) was analyzed by extent of LD and found that N e increased with the increase of generations and declined rapidly within the most recent 50 generations, which is consistent with the history of Chinese Merino sheep breeding, initiated in 1971. We also noted that even when the effective population size was estimated across different single chromosomes, N e only ranged from 140.36 to 183.33 at five generations in the past, exhibiting a rapid decrease compared with that at ten generations in the past. These results indicated that the genetic diversity in Chinese Merino sheep recently decreased and proper protective measures should be taken to maintain the diversity. Our datasets provided essential genetic information to track molecular variations which potentially contribute to phenotypic variation in Chinese Merino sheep.


Introduction
Linkage disequilibrium (LD) is the nonrandom co-occurrence of alleles within a chromosome or haplotype, i.e., statistical associations between alleles at separate loci that differ from the expectation for independent, and randomly sampled alleles (Wall and Pritchard 2003). The effective population size (N e ) refers to the size of an idealized population that has the same dispersion of gene frequency under random genetic drift or the same degree of inbreeding as the population under consideration (Wright 1938). Both of these are crucial parameters for evaluations of population genetic diversity and can provide a powerful method to characterize and understand the genetic architecture underlying complex traits. With the advent of high-density SNP chips, high-throughput genotyping provides substantial data for more accurate estimate of LD and N e . These datasets help to improve our understanding of historical recombination within population (Reich and Lander 2001). LD over long distances can be used to evaluate recent N e , while LD at short distances can be applied to estimate ancient N e (Hayes et al. 2003). Accordingly, N e provides important information for the protection of populations.
Abstract Knowledge of linkage disequilibrium (LD) is important for effective genome-wide association studies and accurate genomic prediction. Chinese Merino (Xinjiang type) is well-known fine wool sheep breed. However, the extent of LD across the genome remains unexplored. In this study, we calculated autosomal LD based on genomewide SNPs of 635 Chinese Merino (Xinjiang type) sheep by Illumina Ovine SNP50 BeadChip. A moderate level of LD (r 2 ≥ 0.25) across the whole genome was observed at short distances of 0-10 kb. Further, the ancestral effective population size (N e ) was analyzed by extent of LD and found that N e increased with the increase of generations and declined rapidly within the most recent 50 generations, which is consistent with the history of Chinese Merino sheep breeding, initiated in 1971. We also noted that even when the effective population size was estimated across different single chromosomes, N e only ranged from 140.36 to 183.33 at five generations in the past, exhibiting a rapid decrease compared with that at ten generations in the past. These results indicated that the genetic diversity in In domestic livestock, many economic traits have been subjected to natural and artificial selection. These traits are passed on to offspring, and shape pattern of LD in populations. Thus, LD patterns throughout the genome reflect selection in individual breeds in domestic livestock species. In fact, the LD pattern in a population is the basis for genomic prediction, genome-wide association studies (GWAS), and quantitative trait loci (QTL) mapping for complex traits. The extent of LD has been examined in diverse taxa, including pigs (Amaral et al. 2008;Badke et al. 2012), horses (Corbin et al. 2010), cattle (Khatkar 2008;Mokry et al. 2014), and chickens (Qanbari et al. 2010). In the study of sheep, LD has been estimated using microsatellite markers across the genome in the early stage (Meadows et al. 2008).With the development of the Illumina Ovine SNP50 BeadChip, researchers investigated the extent of LD genome-wide in the wild big horn sheep (Miller et al. 2011), Dorper Spanish Churra sheep, Sunite (García-Gámez et al. 2012, German Mutton Merino, Dorper (Zhao et al. 2014), etc. Previous sheep studies revealed that LD in sheep persists for relatively shorter genomic distances compared with LD in other domestic species, and the behavior of LD varies among sheep breeds (Kijas et al. 2012). These findings provided insights into population structure, population diversity and genomic selection programs in sheep.
Chinese Merino (Xinjiang type) is the most well-known sheep breed for fine wool production in China. It has been breeding for highly specific purposes, such as fine and soft wool and hornless females, by the introduction of Australian Merino sheep via stages of grading, crossing, and collective reproduction. However, to date, the extent of LD across the genome in Chinese Merino sheep remains unexplored. The objective of this study was to investigate the behavior of LD in the genome of Chinese Merino (Xinjiang type) using ovine 50 K SNP panels.

Animals
A total of 635 female Chinese Merino (Xinjiang type) sheep were randomly selected. All sheep were born between 2006 and 2012 in Bohu farm and Gongnaisi farm (Xinjiang, China). A very small sample of ear tissue was obtained for genomic DNA preparation using the saturated phenol-chloroform method (Sambrook and Russell 2002). DNA samples (2500 ng) with a 260/280 absorbance ratio of ≥1.8 and a DNA concentration of ≥50 ng/µl were submitted for genotyping. The DNA was genotyped using the Illumina Ovine SNP50 Bead-Chip (Illumina Inc., San Diego, CA,USA) (Teo et al. 2007), which contained 54,241 SNPs and at an average distance of 50.9 kb. The marker-QC (quality control) process included three steps: (i) control of the call rate (≥0.95); (ii) minor allele frequency (MAF) (≥0.05); (iii) correspondence with the Hardy-Weinberg equilibrium (HWE) (P-value > 0.00001). These parameters were calculated using PLINK v1.07.

LD estimation and functional annotation of genes
Two measures, r 2 (William 1974) and D′ (Lewontin 1964), have been proposed to estimate the extent of LD. In this study, we used r 2 as a measure of LD because it is more robust to allele frequency variation than D′ (Ardlie et al. 2002) and population size has a greater influence on D′ than r 2 (Zhao et al. 2007). The r 2 value can be expressed as follows: where P A1 , P A2 , P B1 , and P B2 are the frequencies of each allele at loci A and B, and P 11 , P 12 , P 21 , and P 22 are the frequencies of haplotypes A1B1, A1B2, A2B1, and A2B2, respectively.
PLINK (Purcell et al. 2007) includes a set of options to calculate pair-wise LD between SNPs and to process these data. The average r 2 values for distances of 0-25 kb, 25-50 kb, 50-100 kb, 100 − 500 kb, 0.5-1 Mb, 1-5 Mb, and 5-10 Mb in 635 Chinese Merino (Xinjiang type) sheep were calculated. LD was estimated as the mean r 2 value for each autosomal chromosome. Moreover, LD was calculated for randomly sampled populations (N = 30, 50, 100, 200, and 400). The four highest average r 2 values for a distance of 0-10 kb were detected on the four individual autosomes (OAR25, 24, 18, and 10). Genes were further evaluated when they were located between the pair of SNPs with r 2 ≥ 0.25 or were within 100 kb upstream or downstream of the pair of SNPs. The annotated gene lists were then used as inputs for a UniProt (http://www.uniprot.org/) analysis to obtain NCBI gene and protein IDs. Functional annotations of identified genes were made using BLAST2.0 and a parameter score of alpha greater than 0.6 was used to obtain gene ontology (GO) annotations (Conesa et al. 2005).

Effective population size estimation and genetic diversity
LD data make it feasible to estimate N e . Sved (1971) described the relationship between LD and N e using the following equation: (1) r 2 = (P 11 P 12 − P 12 P 21 ) 2 P A1 P A2 P B1 P B2 (2) Here, r 2 is the LD between different markers, N e is the effective population size, and c is the genetic distance between various markers measured in Morgans; n is the chromosome experimental sample size; α = 1 in the absence of mutation and α = 2 if mutation is taken into account; k = 4 for autosomes and k = 2 for the X chromosome. In contemporary studies, physical distance is used instead of genetic distance to estimate population size. A physical distance of 100 kb is approximately equivalent to a genetic distance of 0.1 cM. The genetic distance reflects the effective population size at a certain number of generations in the past according to 1/2c (Hayes et al. 2003). Generally, the formula that assumes the absence of mutation is used to estimate N e . Hence, we used k = 4, and c = 1 to calculate N e .
It is generally agreed that abundant genetic diversity within a livestock species is a prerequisite for coping with potential changes in livestock farming conditions. Given that genetic diversity influences LD and N e , Chinese Merino (Xinjiang type) were also used to analyze the proportion of polymorphic loci (P N ), inbreeding coefficient and gene diversity (H E ) using PLINK v1.07 (Purcell et al. 2007).

SNP statistics
After quality control, we identified 46,062 SNPs in Chinese Merino (Xinjiang type) sheep distributed over 26 autosomal chromosomes. SNP information for each autosomal chromosome is summarized in Table 1. The total autosomal chromosome length was 2650.80 Mb, with an average chromosome length of 101.95 Mb; the longest Ovis aries autosomal chromosome was OAR1 (299.64 Mb) and the shortest was OAR24 (44.85 Mb). The average distance between adjacent SNPs was 57.49 kb; the longest adjacent SNP interval was 3.42 Mb within OAR10 and the shortest interval was observed in OAR14.

Extent of LD across the genome
Generally, at a distance of greater than 10 Mb, free recombination is assumed (De Roos et al. 2008;Zhao et al. 2014); accordingly, the range between markers was set at 0-10 Mb to estimate LD between SNP markers. The mean values of r 2 for genetic distances of 0-10 kb, 10-25 kb, 25-50 kb, 50-100 kb, 100-500 kb, 0.5-1 Mb, 1-5 Mb, and 5-10 Mb were 0.25, 0.17, 0.11, 0.07, 0.03, 0.02, 0.015, and 0.010, respectively, as summarized in Table 2. There is a decline in average values of r 2 with the increasing physical distance between SNPs. The extent of LD was quite different among individual chromosomes. The average r 2 for SNPs separated by intervals 0-10 kb, 10-25 kb, 25-50 kb, 50-100 kb, 100-500 kb, 0.5-1 Mb, 1-5 Mb and 5-10 Mb in each autosomal chromosome are presented in Table 3. However, when the genetic distance exceeded 100 kb, the mean r 2 values exhibited even smaller differences among chromosomes. The mean r 2 values were 0.017-0.022 within 0.5-1 Mb, and were maximal within 100 kb-1 Mb for OAR10.

Sample size and LD estimates
In order to determine the proper sample size for LD evaluation, population subsets of five different sizes (N = 30, 50, 100, 200, and 400) were randomly selected from the population and the level of LD was estimated for each sub-population. As shown in Fig. 1, as the sample size increased, the  physical distance increased and the degree of LD tended to decrease. However, as shown in Fig. 2, within a physical distance of 50 kb, when N = 200 and physical distance was fixed at 50 kb, LD was not significantly different from what observed for the overall samples (P > 0.05).

Functional annotation of genes
We performed a search for positional candidate genes for the regions with the four highest average r 2 values at 0-10 kb. In total, 280 Ensembl genes were found. Of these, 1 3 236 genes were located between pairs of SNPs (r 2 ≥ 0.25), and 44 genes were located within 100 kb upstream and downstream of pairs of SNPs. These genes were evaluated in a GO analysis. In total, 2353, 320, and 395 GO terms related to biological processes, cellular components, and molecular function, respectively, were identified. Ten GO terms (GO:0001942, GO:0043588, GO:0010631, GO: 0 002064, GO:0060429, GO:0043473,GO:0007173, GO:0016055, GO:0007219, and GO:0030509) were related to hair follicle development, skin development, epithelial cell migration, epithelial cell development, epithelium development, pigmentation, epidermal growth factor receptor signaling pathway, Wnt signaling pathway, Notch signaling pathway, and BMP signaling pathway, respectively (Table 4). Among them, 47 genes were related to epithelium and hair follicle development, of which 28 genes were located on OAR10, 14 genes were located on OAR18, 2 genes were located on OAR24, and 3 genes were located on OAR25.

Effective population size and genetic diversity
We calculated N e of seven generations of whole autosomal and individual autosomes, respectively. Table 5 summarizes the N e of Chinese Merino (Xinjiang type) over time. Estimates of N e for Chinese Merino (Xinjiang type) showed an increasing trend with the increase of generations, as shown in Fig. 3. N e at 2000 generations ago was approximately 4,471.67 and decreased to 159.77 at five generations ago. The effective population size of Chinese Merino (Xinjiang type) differed depending on chromosome, and this was mainly caused by differences among chromosomes in LD between SNP markers (Table 6; Fig. 4). In the 5th generation (from the present), N e was 140.36-183.33. This indicated that the effective population size of Chinese Merino (Xinjiang type) decreased over time, and this decrease was more rapid at 100-50 years ago and recently stabilized. Within the same generation, there were significant differences in effective population size between chromosomes (ANOVA, P < 0.001, F-statistics, SPSSv19). When we calculated the effective population size for each chromosome, we found the value of generations became bigger. Within the current generation, OAR10 exhibited the smallest effective population compared with the other chromosomes. Due to OAR3 exhibit small effective population after 50 generations, OAR3 genetic information was selected in Chinese Merino (Xinjiang type) since 50 generations. In general, sheep breeds have roughly comparable levels of polymorphism, with a polymorphic loci (P N ) value ranging from 0.767 (Border Leicester) (Kijas et al. 2014) to 0.973 (Badger Faced) (Beynon et al. 2015). Accordingly, expected heterozygosity (H E ) was relatively high in all sheep breeds (range 0.306-0.380) (Beynon et al. 2015).
For the 46,062 SNPs analyzed in this study, estimates of polymorphic loci and expected heterozygosity were 0.994 and 0.393, respectively. The mean inbreeding coefficient was 0.017, which was lower than that of other sheep breeds in previous studies (Kijas et al. 2012).

Discussion
LD maps have improved the power and precision of association mapping, and are used to define optimal marker spacing. In this study, we analyzed LD in 635 Chinese Merino (Xinjiang type) using 46,062 SNPs distributed over 26 autosomal chromosomes. LD was distinct from populations, and with the sample size increased, the extent of LD droped down. When N = 200, the estimated LD within a physical distance of 50 kb was close to that of the general population.
We used r 2 as a measure of LD because it is more robust to allele frequency variation than D′ (Ardlie et al. 2002). For Chinese Merino (Xinjiang type) sheep, a moderate level of LD (r 2 ≥ 0.25) was only observed at marker distances of 0-10 kb. When moving from 10 to 500 kb, average r 2 declined from 0.25 to 0.03. Marker pairs with r 2 ≥ 0.25 were, on average, separated by 10 kb. However, not all marker pairs within 10 kb featured by r 2 ≥ 0.25. Our calculated r 2 values were smaller than those observed in Poll Dorset and Border Leicester sheep, but greater than those of Australia Merino and Suffolk breeds for 0-10 kb (Kijas et al. 2014). Compared to LD calculated by Zhao et al. (2014) within 0-10 Mb, Chinese Merino (Xinjiang type) LD was less than that of German Mutton Merino, but greater than that of Sunite sheep. Meanwhile, the extent of LD was different from the same fragment length on different autosomal chromosomes, which was consistent with previous results of sheep and cows (Edea et al. 2015;Lu et al. 2012;Zhao et al. 2014). Potential causes of these differences include variations in recombination rates between and within chromosomes, heterozygosity, genetic drift, and selection for economic traits.
Meat, wool, and milk are the most important economic traits in livestock production. In our analysis, we found 280 genes in 0-10 kb with maximum r 2 values for the first four autosomes (OAR25, 24, 18, and 10), of which 47 genes were involved in biological processes related to wool traits, particularly TNFRSF19, FOXO1, SMAD9, and FGF9. TNFRSF19 is located on OAR10 from 34,589,405 to 34,690,170 bp in sheep. It is a member of the tumor necrosis factor receptor superfamily (Hu et al. 1999) and plays a crucial role in hair follicle development, skin development, epithelial cell development, and epithelium development (Kojima et al. 2000;Pispa et al. 2008). FOXO1 functions on epithelium development and the epidermal growth factor  receptor signaling pathway, as well as Wnt signaling pathway (Guan et al. 2015;Mori et al. 2014;Xu et al. 2015 (Tsukamoto et al. 2014;Yoshimoto et al. 2005). FGF9 located on OAR10 from 35,583,772 to 35,610,032 bp, is a member of the fibroblast growth factor family. This gene is involved in the epidermal growth factor receptor signaling pathway and the Wnt signaling pathway (Loke et al. 2013;Zheng et al. 2012). Wang et al. (2014) found 28 significant SNPs in Chinese Merino (Junken type) mainly through a GWAS approach and from these SNPs, 25 genes were found to be affected. The 25 genes are dispersed located on OAR1, OAR2, OAR3, OAR4, OAR5, OAR6, OAR7, OAR8, OAR9, OAR10, OAR13, OAR23 and OAR25. In contrast, we found 280 genes locating on four chromosomes, and these genes showed a nonrandom distribution. The main reason is that Wang et al. used quantitative trait correlation analysis to detect genes under selection, while we detected genes directly from the perspective of genetic linkage. The genetic difference of these two breeds may also contribute to the distinction of the results. Wang et al. found candidate genes harboured in OAR10 and OAR25. Noteablly, after careful study, we also found some of these candidate genes located at LD blocks, such as CFDP2, TPTE2, NBEA as well as SLC25A5.
The LD pattern in a population is generally shaped by selection, mutation rate, recombination rate, consanguinity, genetic drift, and other factors. Moreover, LD and effective population size are closely related to each other. These data can help us understand the evolutionary history and genetic mechanisms of complex traits formation (Reich and Lander 2001). In the present study, we found that level of LD in Chinese merino sheep was lower and the effective population size of Chinese Merino (Xinjiang type) increased with the number of generations increase. The effective population size within the most recent 100 generations decreased more slowly, and a relatively rapid decrease was observed after 100 generations. The rate of decay increased at 50 generations, which corresponds to the time of Chinese Merino (Xinjiang type) breeding. N e estimated using each chromosome increased with the number of generations raised and decreased with LD drop down. These results were consistent with the analysis of N e for the whole autosomal genome. The effective population size estimated by OAR10 was smaller than other autosomes. The corresponding r 2 value was high. QTL related to horn characters is located on this chromosome, and as expectation, we detected RXFP2 gene, which is a candidate for horn phenotypes (Montgomery et al. 1996). This was consistent with the history of long-term breeding of hornless Merino sheep. OAR3 exhibited a relatively small effective population size after 50 generations (corresponding to the completion of Xinjiang type Merino breeding). This chromosome has QTL related to greasy fleece weight and staple length (Ponz et al. 2001). Keratin family genes related to wool quality were also located on this chromosome (McLaren et al. 1997). Factors that influence LD also influence N e according to the formulae used to estimate these parameters. For example, artificial selection will lead to higher LD, and a high LD would result in decreased N e . Therefore, the small effective population estimated for OAR10 and three may be explained by selection for characters related to wool quality on the basis of hornless sheep. This breed had a larger effective population size than that of Sunite sheep calculated by Zhao et al. (2014) within the same 2000 generations. This indicated that Chinese Merino (Xinjiang type) sheep may have originated from a large effective population. These results may be related to natural selection of Chinese Merino sheep. We did not estimate present-day N e in this study. Therefore, we selected 10 Mb as the maximal fragment length. The average r 2 for Chinese Merino (Xinjiang type) population was 0.010. N e of Chinese Merino (Xinjiang type) was determined in this estimate, which may estimates the effective population size in current populations. For genetic diversity, Al-Mamun et al. (2015) examined crosses of Merino × Border Leicester with Poll Dorset (MxBxP; N = 231) and reported an expected heterozygosity of 0.38, but an estimated N e of 152, which is lower than that of other sheep breeds. In our study, Chinese Merino (Xinjiang type) displayed the highest genetic diversity as measured by polymorphic loci and gene diversity. Historically, Chinese Merino (Xinjiang type) did not undergo intensive selection, unlike the other sheep breeds. Hence, it is reasonable to assume that Chinese Merino (Xinjiang type) sustain higher levels of genetic variability compare with other sheep breeds. However, the N e of Chinese Merino (Xinjiang type) was much lower than that of other sheep breeds, particularly Badger Faced and Llandovery White Faced sheep breeds (Beynon et al. 2015). This difference may be explained by the history of Chinese Merino sheep breeding, in which a few Australia Merino rams were introduced, led to a decrease in N e , while the large number of native ewes facilitated the maintenance of high genetic diversity in Chinese Merino.
In short, in Chinese Merino (Xinjiang type) population, LD decayed as genomic distance increased and the observed LD values were consistent with previously estimates in other sheep populations. After 50 generations (in the past), N e decreased more rapidly than in recent generations and proper protective measures should be taken to maintain the breed resources genetic diversity. Fig. 4 Boxplot represents the trend in log10(N e ) over time. The variability at each time point reflects variation in the estimates among the 26 autosomes. The X axis represents the number of generations in the past; the Y axis represents log10(N e ) he/she has no conflict of interest. Mingjun Liu declares that he/she has no conflict of interest.
Ethical approval All animal work was conducted according to the guidelines for the care and use of experimental animals established by the Ministry of Agriculture of China. The ethics committee of Xinjiang Academy of Animal Science approved this study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.