Background

Numerous studies have shown the applicability of genomics in quantitative genetics and in identifying sources of variation in important phenotypic features such as production traits [1]. Unfortunately, research in this field in most cases requires large study populations that are well characterized in terms of phenotype, relatedness structure and genome features. Because of the high cost of such research, driven by the need to establish genotypes and phenotypes of large animal populations, attempts have been made to identify traces left in animal genomes by selection pressure directed at the consolidation or improvement of particular phenotypic traits [2]. The identification of selection signatures relies on detecting genomic regions harboring gene variants whose allele frequencies are rapidly increasing under selective pressure (ongoing selection), or genome regions that are fixed in a population with well-established phenotypic features. Detection of selection signals, followed by candidate gene identification, may indicate the location of major genes responsible for the selected traits [3]. The advantage of such an approach is that it does not depend on detailed phenotypic information for individual animals and is applicable to relatively small study populations [2].

Like other livestock species, pigs have been under long-term selection during domestication, breed formation and further improvement of production and functional traits. To analyze the mechanisms underlying phenotypic differentiation caused by selection in pigs, evidence of selection has been searched for in the genomes of various pig breeds using whole-genome genotype data or high-throughput sequencing [4,5,6,7,8,9]. These studies detected several selection signals associated with growth traits, reproduction traits, coat color or ear phenotype and indicated several genes with major effects on these traits [5, 7]. Nevertheless, selection patterns in pig breeds differ depending on their evolution and breeding histories, so exploring selection signatures in as many different breeds as possible will help to better understand the genetic variation underlying the traits of interest.

In the present study, we detected selection signatures at the whole-genome level in three conserved pig breeds derived from native pig populations (Puławska, Złotnicka White and Złotnicka Spotted) and in the commercial Polish Landrace breed, which differ in production, reproduction characteristics and exterior features. Among the native breeds we included the Puławska breed, which is valued for good resistance to harsh environmental conditions and diseases, the taste of its meat (aroma and juiciness) and its usefulness for extensive breeding on ecological farms. The two Złotnicka breeds (White and Spotted) are characterized by similar functional features but have lower productivity than the Puławska breed and a more primitive character. In particular, Złotnicka Spotted is considered a meat-lard type breed and is characterized by high subcutaneous fat content and relatively low fertility [10,11,12,13]. A detailed characterization of the studied breeds is presented in Additional file 1.

To identify selection signatures in the analyzed breeds, we applied two different methods: the first, based on FST (a classical measure of population differentiation) [14], aimed at identifying diversifying selection among breeds, and the second, based on relative extended haplotype homozygosity (REHH) statistics [15], allowing detection of mainly within-breed ongoing selection. Both FST and REHH statistics were shown to be useful for detecting selection signatures [16] and they are largely complementary: the REHH test has good power to detect selection signatures within breeds and is more accurate in the case of ongoing selection, while FST is useful for detecting selection signatures across breeds, represented mainly by loci that were differentially fixed in different breeds [17].

Despite the lack of typical selection for production in conservation breeding, the qualification of animals for the conservation program (based on the breed standard and aimed at consolidating breed-specific traits and stabilizing the exterior) may also be considered a form of extensive selection and may lead to similar but less pronounced changes in allele frequencies. The applied combination of breeds and statistical methods allowed us to search for selection signals associated both with fixed traits of individual breeds and with features that are still under improvement, which may help to better understand the adaptation of breeds to local environmental conditions and to provide evidence of the processes behind good health, longevity and low environmental requirements of native pigs.

Methods

Animals and genotyping

The material of the study was genomic DNA obtained from blood or hair bulbs of 530 animals sampled from four pig breeds: Polish Landrace (PL; n = 135), Puławska (PUL; n = 155), Złotnicka White (ZW; n = 141) and Złotnicka Spotted (ZS; n = 99), differing in production, reproduction and exterior features. The animals were selected to be unrelated for at least two generations and to originate from different herds. Each population sample included at least 7% of males. This was because we analyzed the breeding population (reproductive studs), in which the share of males is matched to the number of boars intended for natural mating. All animal procedures were approved by the Local Animal Care Ethics Committee No. II in Kraków (permission number 1293/2016) in accordance with EU regulations. The genomic DNA was purified using a Sherlock AX kit (A&A Biotechnology) and, after quality control, was genotyped using the PorcineSNP60 BeadChip assay (Illumina) according to the standard Infinium Ultra protocol. The obtained genotypes were quality-controlled by evaluating call rates, and only samples with more than 97% of called genotypes were used for further analysis. Of the 61,565 assayed SNPs, a panel of 50,485 markers was retained after removing SNPs mapped to contigs, located on the sex chromosomes (Sscrofa10.2 genome assembly) or classified as intensity-only probes.

Data analysis

The initially filtered SNP set was further reduced by applying population-wide polymorphism filters. The filtering removed SNPs with a MAF lower than 5% and SNPs with more than 20% of missing genotypes across all breeds. The MAF cutoff used for SNP filtering was applied to the whole population (all breeds), which allowed us to retain a small proportion of SNPs that are monomorphic only in some breeds (presumably fixed for some reason, including selection and inbreeding). A MAF value of 0.01 was used to characterize the polymorphism of the remaining SNPs within breeds. SNPs deviating from HWE at a critical P-value of 1.0E-06 in each breed separately were also removed, resulting in a final panel of 43,923 common SNPs with an average inter-marker distance of 55.7 kb (±78.0). The signals of diversifying selection were detected using pairwise Wright’s FST [18], the classical measure of population genetic differentiation. The FST values obtained for pairwise comparisons at each SNP were treated according to the methodology proposed by Akey et al. [19] and further applied in other studies [7]. In brief, standardized FST values (di) were calculated as:

$$ {d}_i={\sum}_{j\ne i}\frac{F_{ST}^{ij}-E\left[{F}_{ST}^{ij}\right]}{sd\left[{F}_{ST}^{ij}\right]} $$

where \( E\left[{F}_{ST}^{ij}\right] \) and \( sd\left[{F}_{ST}^{ij}\right] \) denote the expected value and standard deviation of FST between breeds i and j calculated from all 43,923 analyzed SNPs. This allowed each breed to be compared against all other breeds under study. To account for stochasticity in locus-by-locus variation, a 10-SNP sliding window was then applied to the obtained values. Candidate selected regions were defined as windows exceeding the 99.9th percentile of the empirical distribution of window-averaged di values (the top 0.1%). Adjacent regions under selection were merged and, when searching for gene content, regions were expanded on both ends by 25 kb to detect neighboring, potentially linked genes.
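As an illustration only, the standardization and smoothing steps described above can be sketched in a few lines of Python. The sketch assumes that per-SNP pairwise FST values are already available in a dictionary keyed by breed pair. The function names, the use of the population standard deviation, a window step of one SNP and the simple rank-based percentile cutoff are assumptions of this sketch rather than reported details of the analysis, and windows are not split by chromosome here.

```python
from statistics import mean, pstdev


def standardized_di(fst, focal):
    """Sum of standardized pairwise FST values for one focal breed.

    fst maps frozenset({breed_a, breed_b}) to a list of per-SNP FST values
    (same SNP order in every list). Returns one d_i value per SNP.
    """
    pairs = [pair for pair in fst if focal in pair]
    n_snps = len(fst[pairs[0]])
    d = [0.0] * n_snps
    for pair in pairs:
        values = fst[pair]
        # Genome-wide expectation and spread for this breed pair
        # (population standard deviation is an assumption of this sketch).
        mu, sd = mean(values), pstdev(values)
        for k, value in enumerate(values):
            d[k] += (value - mu) / sd
    return d


def window_means(values, size=10):
    """Average consecutive SNP values in sliding windows (step of one SNP)."""
    return [mean(values[s:s + size]) for s in range(len(values) - size + 1)]


def top_windows(windows, quantile=0.999):
    """Return the empirical cutoff and the indices of windows at or above it."""
    ranked = sorted(windows)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return cutoff, [i for i, v in enumerate(windows) if v >= cutoff]
```

In practice the windows would be computed per chromosome, so that no window spans two chromosomes, before the genome-wide cutoff is taken.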

The signals of positive selection within single breeds were detected using REHH statistics implemented in the Sweep v.1.1 software [6]. First, the obtained genotypes were phased and imputed using the fastPHASE software [20]. The phased genotypes were then used to detect core haplotypes comprising a minimum of three and no more than twenty SNPs. The longest non-overlapping core haplotypes detected were then subjected to the EHH test, which is based on comparing a core haplotype showing both higher frequency and higher EHH with the other core haplotypes at the same locus. Subsequently, the probability that two randomly selected chromosomes carrying the same core haplotype are identical-by-descent over the entire interval from the core region to a given locus was computed [15, 21]. Finally, to account for variation in recombination rates across the genome, the relative extended haplotype homozygosity (REHH) statistic was used [15] and calculated at a distance of about 1 cM (approximated as 1 Mb) [22] in both the upstream and downstream directions (with the exception of chromosome ends) from each core against all other cores within the region. To determine REHH significance, haplotypes were allocated to twenty frequency bins and the REHH values were compared between equally frequent core haplotypes found within the region. REHH P-values were ultimately obtained by a logarithmic transformation of the REHH values within these bins (to approach normality) followed by calculation of the mean and standard deviation. The core haplotypes with the most extreme P-values (extended by 0.5 Mb in each direction) were filtered for frequency (> 0.25) and screened for overlapping pig ENSEMBL genes using the UCSC Genome Browser.
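The EHH and REHH quantities can be made concrete with a minimal sketch. It is not the Sweep implementation. It assumes phased haplotypes stored as equal-length strings of allele codes and follows the commonly cited pair-counting definition: EHH is the share of chromosome pairs carrying a core haplotype that are identical over the extended interval, and REHH is that value divided by the pooled EHH of all other core haplotypes at the locus. The function and argument names are hypothetical.

```python
from collections import Counter
from math import comb


def _identical_pairs(segments):
    """Number of chromosome pairs whose extended segments are identical."""
    return sum(comb(count, 2) for count in Counter(segments).values())


def ehh(haplotypes, core, core_span, extent_span):
    """EHH of one core haplotype over the interval given by extent_span.

    haplotypes: list of strings, one phased chromosome each.
    core_span / extent_span: slice objects; extent_span covers the core
    plus all markers out to the distance of interest.
    """
    carriers = [h[extent_span] for h in haplotypes if h[core_span] == core]
    if len(carriers) < 2:
        return None  # no pair of chromosomes to compare
    return _identical_pairs(carriers) / comb(len(carriers), 2)


def rehh(haplotypes, core, core_span, extent_span):
    """EHH of the tested core relative to the pooled EHH of all other cores."""
    own = ehh(haplotypes, core, core_span, extent_span)
    others = {}
    for h in haplotypes:
        if h[core_span] != core:
            others.setdefault(h[core_span], []).append(h[extent_span])
    identical = sum(_identical_pairs(group) for group in others.values())
    total = sum(comb(len(group), 2) for group in others.values())
    if own is None or identical == 0:
        return None  # ratio undefined
    return own * total / identical
```

A high REHH for a frequent core haplotype indicates that it has retained unusually long-range homozygosity compared with the other haplotypes at the same locus, which is the pattern expected under recent positive selection.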

The functional annotation of the detected genes was performed using the KOBAS 3.0 web server [23] and WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) [24]. Gene list enrichment analysis was performed against a background of all known pig genes, with a correction for multiple testing.
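For readers unfamiliar with gene list enrichment, the underlying calculation can be sketched as follows. This is not the KOBAS or WebGestalt code. The sketch assumes a one-sided hypergeometric test for over-representation and a Benjamini-Hochberg adjustment; the text above does not state which test or correction was applied, so both choices, as well as the function names, are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for an over-representation test.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the candidate list, k: candidate genes annotated to the term.
    """
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With short candidate lists, as obtained for individual breeds, only a few genes fall into any one category, so adjusted P-values are expected to be modest.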

The population differentiation was additionally visualized using the principal component analysis (PCA) based on SNP genotypes and a cladogram of mean pairwise FST distances created using the neighbor joining (NJ) method [25].

Results

SNPs polymorphism parameters and breeds genetic differentiation

The applied SNP filtering criteria yielded a common set of 43,923 SNPs polymorphic across the whole population, with a mean inter-marker distance of 55.7 kb (±78.0). The number of polymorphic SNPs (MAF > 0.01) per breed ranged from 37,423 in ZS to 43,567 in PL. The average MAF across all SNPs was lowest in ZS (0.212) and highest in PL (0.279). The average observed heterozygosity per breed ranged from 0.3 to 0.367 for the same breeds (Table 1). Mean and weighted overall pairwise FST distances were highest between PUL and ZS (0.143 and 0.173), and the lowest level of genetic differentiation was found between PL and ZW (0.085 and 0.097) (Table 2). The visualization of the genetic diversity of the pig breeds using the PCA and NJ methods is presented in Fig. 1.
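The per-SNP quantities reported here (MAF, observed heterozygosity, missingness) follow directly from genotype counts. The sketch below is illustrative only: it assumes genotypes coded as the number of copies of one allele (0, 1 or 2) with None for a missing call, and the function names are hypothetical. The default thresholds mirror the population-wide filters given in Methods (MAF of at least 5%, at most 20% missing genotypes); the HWE filter is not covered.

```python
def allele_stats(genotypes):
    """Return (maf, observed_heterozygosity, missing_rate) for one SNP.

    genotypes: list of 0, 1 or 2 (copies of one allele), None if missing.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return None, None, missing_rate
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(freq, 1 - freq)
    het = sum(1 for g in called if g == 1) / len(called)
    return maf, het, missing_rate


def passes_filters(genotypes, min_maf=0.05, max_missing=0.20):
    """Population-wide polymorphism filter applied across all breeds."""
    maf, _, missing_rate = allele_stats(genotypes)
    return maf is not None and maf >= min_maf and missing_rate <= max_missing
```

Calling `allele_stats` on the genotypes of a single breed gives the breed-level MAF and heterozygosity, whereas `passes_filters` is meant to be applied to the pooled genotypes of all breeds, which is why a SNP can pass the filter and still be monomorphic in one breed.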

Table 1 SNPs panel polymorphism parameters
Table 2 Mean (above diagonal) and weighted (below diagonal) pairwise FST distances between the studied pig breeds
Fig. 1 Genetic differentiation of the analyzed pig breeds based on (a) principal component analysis and (b) the neighbor joining method on mean pairwise FST distances

Signals of diversifying selection

Signals of diversifying selection among the studied pig breeds were detected based on the breed-normalized pairwise FST distances (Additional file 2). After smoothing the data with a moving average, the top 0.1% of observations were considered to pinpoint breed-specific selection signals. After merging overlapping signals, from 7 to 11 genome regions with strong selection signals were detected per breed, ranging in size from 266.8 kb to 2.9 Mb. The highest number of selection signals across all breeds was detected on SSC1, SSC7 and SSC14 (SSC9 only in the ZS breed), and only single regions were detected on SSC2, 4, 5, 15 and 17. No signals were common to different breeds, except for closely positioned regions on SSC7 (between 32 and 34 Mb of the genomic sequence) in PL and PUL (Table 3, Fig. 2).

Table 3 Genome regions spanning the strongest detected diversifying (FST-based) selection signatures
Fig. 2 The genomic distribution of diversifying selection signals for all studied breeds. Dashed line indicates the top 0.1% of the highest standardized FST values

To analyze the gene content of the genome regions spanning the detected selection signals, each region was expanded by 25 kb on each end to account for potentially linked genes. This allowed detection of from 61 (ZS) to 116 (PUL) ENSEMBL genes per breed (Additional file 3). To manage the variety of genes found within the selection signals, we performed a functional analysis aimed at identifying enriched processes. Keeping in mind that the detected selection signatures are associated with a number of phenotypic features differentiating the studied breeds (which are conditioned by complex and very different molecular mechanisms), we expected only very few genes connected with separate biological processes in the enrichment analysis and low statistical significance of the obtained results. This analysis was therefore treated as a supporting method that helped us to manage the extensive gene content. Nevertheless, it reduced the complexity of the obtained data and allowed us to search for processes and underlying genes that are potential targets of diversifying selection. The functional classification of well-annotated genes showed that the genes were mainly involved in GO biological processes connected with metabolic processes, cellular processes, biological regulation, response to stimulus and developmental processes. Among the top ten enriched (pointwise P < 0.05) GO biological processes were, inter alia, those connected with lipid binding, fatty acid metabolic process, cellular senescence and response to muscle stretch. When functional classification was performed for individual breeds, visible differences in the enriched GO categories were detected (Table 4). In PL pigs, a large share of genes was engaged in metabolic pathways and lipid binding, including, e.g., COQ5, GATC, COX6A1, PLA2G1B, HK1, SDSL, PRIM2, ALDH2, SDS, GLTP and RPH3A. In the PUL breed, genes involved in several GO categories were detected, including those connected with striated muscle cell proliferation, skeletal muscle tissue regeneration (PPARD and MAPK14) and embryo implantation (ARHGDIB, PPARD). The diversifying selection signals in the ZW breed encompassed genes enriching very general categories of biological processes, such as transcriptional repressor activity, intracellular processes or processes in the nuclear lumen; however, some of the genes were connected with more specific pathways such as tryptophan metabolism (TDO2, AOX1) or salivary secretion (GUCY1A3, GUCY1B3). In the ZS breed, selection signals were associated with genes responsible for, e.g., neuropeptide Y receptor activity and feeding behavior (NPY1R, NPY5R), metabolic processes (PON1, PON2, PON3) and cortisol metabolic process (REST).

Table 4 Top ten biological processes enriched in genes detected in the genome regions under diversifying selection in separate pig breeds

Selection signals between breeds with different phenotypical features

To detect genome regions bearing variants potentially responsible for the most pronounced phenotypic differences between breeds, the pig breeds with white (PL, ZW) and spotted (PUL, ZS) coat color patterns were compared using FST statistics. The comparison between single-color and spotted pig breeds revealed the strongest (99.9th percentile of the observations) selection signals on SSC3, SSC7, SSC8 and SSC9 (Table 5). The most pronounced selection signal, involving five separate regions (between 50.5 and 58.6 Mb of sequence), was located on SSC8 in close vicinity to the KIT gene locus (SSC8, 41.4–41.5 Mb). Altogether, the regions spanned 72 different ENSEMBL genes, which together did not enrich any biological processes or pathways but again included the PPARD gene.

Table 5 Genomic regions spanning the strongest detected diversifying selection signals between single-color and spotted pig breeds

Within-breed selection signatures

The analysis performed with the Sweep v.1.1 software detected from 4546 (ZS) to 5638 (PUL) longest non-overlapping core regions (CR, a unique genome region spanning locus-specific core haplotypes) consisting of a minimum of three and no more than 20 SNPs, with average lengths ranging from 193.7 kb (PL) to 394.3 kb (ZS). For all these core haplotypes, from 37,809 (ZS) to 46,162 (PUL) EHH tests were performed, on average from 8.2 to 8.3 per core region (Additional file 4). Considering that alleles under positive selection are fixed or approaching fixation, and hence core haplotypes harboring these alleles should be frequent [21], core haplotypes with a frequency < 0.25 were removed. This left from 14,079 (ZS) to 17,580 (PUL) REHH tests per breed, among which from 118 (ZS) to 176 (PL) core haplotypes were outliers at the significance level of 0.01. The obtained REHH P-values were –log10 transformed and plotted against chromosomal positions to visualize outlying core haplotypes and selection patterns across the breeds' genomes (Fig. 3, Additional file 5). This analysis showed a clearly non-random distribution of selection signals across the genomes, with a visible overrepresentation of long and common haplotypes on, e.g., SSC2 and 14 in PL or SSC4 and 18 in the PUL breed. The selection patterns also clearly differed between the analyzed breeds. To limit the number of genes potentially associated with significant core haplotypes, only cores with a P-value lower than 0.001 (representing the strongest selection signals) were analyzed further. The positions, haplotype frequencies and other statistics for the most significant haplotypes belonging to separate core regions are presented in Table 6.
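The way outlying core haplotypes are called can be illustrated with a short sketch of the binning procedure outlined in Methods: REHH values are grouped by core haplotype frequency, log-transformed, and compared with the mean and standard deviation of their own bin. The sketch is not the Sweep code. Equal-width bins, the natural logarithm, the sample standard deviation and a one-sided upper-tail normal P-value are assumptions, and the function name is hypothetical.

```python
from math import log
from statistics import NormalDist, mean, stdev


def rehh_pvalues(tests, n_bins=20):
    """Empirical P-values for REHH tests, binned by core haplotype frequency.

    tests: list of (core_frequency, rehh) tuples with rehh > 0.
    Returns P-values in input order; None where a bin has too few values
    or no spread to estimate a standard deviation.
    """
    bins = {}
    for idx, (freq, value) in enumerate(tests):
        b = min(int(freq * n_bins), n_bins - 1)  # equal-width frequency bins
        bins.setdefault(b, []).append((idx, log(value)))
    pvalues = [None] * len(tests)
    for members in bins.values():
        logs = [v for _, v in members]
        if len(logs) < 2:
            continue
        mu, sd = mean(logs), stdev(logs)
        if sd == 0:
            continue
        dist = NormalDist(mu, sd)
        for idx, v in members:
            # Upper tail: unusually high REHH for this frequency class.
            pvalues[idx] = 1 - dist.cdf(v)
    return pvalues
```

Comparing each haplotype only with equally frequent haplotypes matters because common haplotypes are on average older and are therefore expected to show shorter homozygous tracts than rare ones in the absence of selection.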

Fig. 3 The genomic map of REHH P-values for all core haplotypes showing a frequency > 0.25. Dashed line represents the significance threshold of 0.001

Table 6 Statistics for the most significant core haplotypes being under selection and belonging to separate core regions in the analyzed breeds

After extension of the detected significant (P < 0.001) core haplotypes to 1 cM on both ends, 171 unique genes potentially under selection were found in PL, 116 in PUL, 84 in ZW and 35 in ZS (Additional file 6). The functional annotation of the genes allowed a credible GO category enrichment analysis only for PL and PUL pigs, with only single genes connected with separate GO categories in ZW and ZS pigs (Table 7). Nevertheless, among the top 10 enriched biological processes, genes associated, inter alia, with skeletal system development and morphogenesis (HOXB1, -2, -5, -6, -7, -9) were found in PL pigs, and genes associated with regulation of protein transport and localization (AIP, TBC1D10C, GCC1, LEP) or muscle fiber development (FLNC, SMO) in PUL pigs. In the ZW breed, the suggestive biological processes included, e.g., metal ion homeostasis and regulation (TTC7A, SLC30A10), negative regulation of innate immune response (DUSP10) and negative regulation of T cell apoptotic process (SLC46A2). In the ZS breed, the most commonly represented processes included galactosidase/galactosidase activity and lipid catabolic process (LOC102167689, LOC102167689).

Table 7 Top ten biological processes enriched in genes detected in the core haplotypes under positive selection in individual pig breeds

Discussion

Natural and artificial selection are the major mechanisms driving the differentiation of populations. Pig domestication resulted in considerable changes in the phenotypes and behavior of the animals. In the early stages of domestication, unconscious selection for behavioral traits was applied, and this stage was followed by methodical selection in which specific traits were selected according to breeding goals [26, 27]. This resulted in the development of specialized breeds, improved to yield the desired animal products or to represent a desired morphological standard. Artificial selection increased the variation between domesticated animals and their wild ancestors and generated a variety of populations differing in phenotypic features related to their specialization [28]. Identifying recent positive selection signatures in domesticated animals can provide valuable information on genomic regions that are under the influence of both artificial and natural selection and can thus help to identify beneficial variants and underlying biological pathways influencing economically important traits.

In this study, using two largely complementary methods, we detected selection signatures in four pig breeds, including Landrace and three conserved native breeds representing a maternal breeding component. The native pig populations can be considered unselected, because the conservation programs focus more on preserving genetic diversity and the breed standard than on improving production traits. The analysis of genetic differentiation among the breeds, based on a cladogram obtained with the neighbor-joining method from the averaged FST values (accounting for all pairwise distances), showed the highest genetic similarity between the two Złotnicka pig breeds. This result was expected, because both breeds originated from the same geographic background and were created in a similar time period from the same genetic group. A clear separation of Landrace from the Polish conserved breeds was also observed, which probably reflects its origin from German and Swedish pig breeds and its high degree of specialization for production. The PCA showed the highest genetic similarity between Polish Landrace and Złotnicka White pigs, both of which are white, meat-type pigs with drooping ears.

In the studied pig breeds, we detected several selection signals, representing genome regions that were differentially fixed in different breeds or representing within-breed selection signatures. The diversifying selection signals were detected by comparing a specific breed with all other breeds under study, on the presumption that these signals are characteristic of unique breed features or of strongly fixed breed traits that are poorly developed in the other compared breeds. This may explain the lack of selection signals common to different breeds. With the FST-based method, we detected large (over 1 Mb in size) selection sweeps on SSC7 (32.1–33.3 Mb) and SSC14 (42.2–43.7 Mb) among the top 0.1% of the strongest selection signals in Polish Landrace. Several selection signals on these chromosomes were previously reported in other pig breeds [7, 29]; however, their direct comparison is difficult, since diversifying selection signals strongly depend on the structure of the breeds used for comparison. Several GWAS showed associations of closely positioned SNPs on SSC7 with backfat thickness [30], fatty acid metabolic indices [31], number of teats or body weight and height [32]. A region on SSC14 similar to ours was shown to be associated with the number of teats [33, 34]. The animals of the conserved breeds included in this study differed considerably in this trait, ranging from on average 13.71 (±0.97) teats in ZS to 14.22 (±0.88) in the PUL breed. Unfortunately, no reproductive performance records were available for the PL pigs analyzed in this study. Nevertheless, according to the national evaluation, the average number of teats in the PL breed (15.02) is visibly higher than that observed in the conserved breeds (Additional file 7). This suggests that the detected strong selection signals on these autosomes may be a result of long-term selection for conformation and reproductive performance traits in the PL breed.

While screening the detected selection signals for gene content, we extended the detected regions by additional bases. This extension was motivated by the need to account for linkage disequilibrium and for the applied sliding-window approach. Because a moving average smooths the data, some surrounding genes may not fall directly within the selection signals. In addition, because of accidental differences in allele frequencies (e.g. due to genetic drift), selection signal peaks can be slightly shifted relative to the real locations of functional variants. To avoid losing this information, we extended the FST-based regions by an additional 25 kb, which is approximately half of the mean distance between SNPs in this study. In the REHH method, we extended the detected signals by 0.5 Mb in each direction. In our data, core haplotype domains (their extended homozygosity) span on average over 0.7 Mb in the upstream and downstream directions from the cores. We narrowed these regions by adding 0.5 Mb to each core haplotype to account only for genes within the region of relatively high linkage with the core SNPs. A similar approach was previously proposed by Qanbari et al. [21]. The size of the selection signals detected using the two approaches was comparable, suggesting that, owing to linkage disequilibrium, selection signals are extensive and should not be narrowed down to the detected signal peaks alone.
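The region handling described here (merging adjacent signals, adding a flank, then listing the genes that overlap) can be sketched as below. This is an illustrative sketch, not the procedure actually run through the UCSC Genome Browser: the tuple layouts, function names and the rule that regions are merged when they overlap or abut are assumptions. The flank would be 25,000 bp for the FST-based regions and 500,000 bp for the REHH core haplotypes.

```python
def merge_regions(regions):
    """Merge overlapping or abutting regions on the same chromosome.

    regions: list of (chrom, start, end) tuples with positions in bp.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def extend_regions(regions, flank):
    """Add a flank (bp) on both sides, not going below position zero."""
    return [(chrom, max(0, start - flank), end + flank)
            for chrom, start, end in regions]


def genes_in_regions(genes, regions):
    """Names of genes overlapping any region.

    genes: list of (name, chrom, start, end) tuples.
    """
    hits = set()
    for name, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            if g_chrom == r_chrom and g_start <= r_end and g_end >= r_start:
                hits.add(name)
    return sorted(hits)
```

A typical call chain would be `genes_in_regions(annotation, extend_regions(merge_regions(signals), 25_000))`, where `annotation` and `signals` are placeholders for a gene table and a list of detected signal intervals.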

Within the strongest differential (FST-based) selection signals detected in Polish Landrace, we found several candidate genes that may be responsible for well-developed breed characteristics such as good fertility and high growth rate. One of the interesting genes is RASAL1 (RAS Protein Activator Like 1), which acts as a suppressor of RAS function. The protein enhances the weak intrinsic GTPase activity of RAS proteins, resulting in the inactive GDP-bound form of RAS and thereby allowing control of cellular proliferation and differentiation [35]. A large share of the genes detected in the PL breed was engaged in metabolic pathways and included, e.g., HK1 (Hexokinase 1), which phosphorylates glucose to produce glucose-6-phosphate, the first step in glucose metabolism pathways. Glucose metabolism is important for energy homeostasis but also affects growth and meat quality traits [36]. Two other genes detected within the selection signals in PL pigs, LC8 (LC8 dynein light chain, DYNLL1) and KHDRBS2 (KH RNA Binding Domain Containing, Signal Transduction Associated 2), are related to reproduction traits. In mice, dynein was found to regulate the meiotic checkpoint during oocyte maturation [37], and KHDRBS2 was associated with the number of teats in a GWA study in Large White pigs [33]. Another gene, PRIM2 (which encodes the large subunit of DNA primase), was previously found in a Bayesian GWAS to belong to a gene network associated with both the number of stillborn piglets (SB) and the number of teats (NT) [33].

Among the strongest diversifying selection signals in the PUL breed, we detected one over 1 Mb in length on SSC7 (35.9–37.3 Mb). This region was previously described as bearing several QTLs for health, meat and carcass traits and exterior (Pig QTLdb, [38]). Within all diversifying selection signals in PUL pigs, we detected candidate genes associated with, inter alia, striated muscle cell proliferation, skeletal muscle tissue regeneration (PPARD and MAPK14 - GO:0043403) and embryo implantation (ARHGDIB, PPARD - GO:0007566), which is important for fertility traits. The most interesting diversifying selection candidate in this breed seems to be the PPARD gene (Peroxisome proliferator-activated receptor beta/delta) located on SSC7, which was shown to be associated with both ear morphology and backfat thickness [39, 40]. A previous study identified a missense mutation in the PPARD gene that significantly reduces its transcriptional activity and consequently causes enlarged external ears in pigs [40, 41]. This is concordant with the phenotype of the PUL breed, which is the only breed with prick ear morphology in this study, and the detected signal potentially results from diversifying selection for ear morphology among the studied breeds.

In the ZW breed, large sweeps were detected on SSC1 (230.0–232.5 Mb) and SSC8 (45.2–47.5 Mb). These regions were previously shown to carry mainly QTLs for meat and carcass traits (SSC1) or health (SSC8) (Pig QTLdb, [38]). Within the strongest diversifying selection signals in this breed, we detected genes associated, inter alia, with tryptophan metabolism (TDO2, AOX1) or salivary secretion (GUCY1A3, GUCY1B3). Tryptophan (TRP), as a precursor of neurotransmitters, was shown to be involved in pale, soft, exudative (PSE) pork syndrome [42], and dietary TRP deficiency was shown to correlate with a decline in appetite leading to reduced growth performance [43]. The TRP metabolism-associated genes may therefore be related to the high stress resistance reported in the ZW breed and suggest a potential association with good meat quality and growth in this breed.

The largest genomic regions associated with differential selection in the ZS breed were detected on SSC9. The regions were located between 75.6 Mb and 84.8 Mb of the chromosome sequence and were previously shown to carry mainly QTLs for exterior and reproduction features (Pig QTLdb, [38]). In the ZS breed, the strongest detected selection signals were associated, e.g., with genes responsible for neuropeptide Y receptor activity and feeding behavior (NPY1R, NPY5R), metabolic processes (PON1, PON2, PON3) and cortisol metabolic process (REST). Our interest focused especially on genes associated with neuropeptide Y (NPY) activity. NPY has been implicated in several human diseases involving fat deposition aberrations and obesity [44]. This is of special importance given that the Złotnicka Spotted breed is characterized by relatively high fat content in the carcass and, in comparison to other breeds, the meat of ZS pigs shows specific marbling resulting from higher intramuscular fat content. It was found that NPY reduces energy expenditure by decreasing adipose tissue thermogenesis [45, 46] and that obesity in mice was attenuated when NPY was knocked out [47, 48]. NPY deficiency also impaired responses to a palatable high-fat diet in mice [49], and animals in which proopiomelanocortin neurons were knocked out were obese and hyperphagic [50,51,52]. These studies show a strong association of NPY, acting through several NPY receptors, with body fat generation and suggest that the two genes detected in this study that code for NPY receptors (NPY1R and NPY5R) may be connected with the higher fat content and generally higher fatness of the ZS breed compared with the other studied breeds. The analysis of average back-fat thickness in the studied ZS animals confirmed the general observations on this breed and showed that the average back-fat thickness was statistically significantly higher (P < 0.001) in ZS pigs (23.13 ± 3.53) than in the three other analyzed breeds (ranging from 9.27 in PL to 18.03 in the ZW breed) (Additional file 7).

Our FST-based comparison of white and spotted pig breeds revealed an ambiguous selection signal on SSC8 positioned in close vicinity to the KIT gene, a known coat color modulator [53]. However, a few other strong signals were detected on SSC3, SSC7 and SSC9, which did not overlap with any of the genes commonly implicated in coat pattern in mammals, such as EDNRB, KITLG or MC1R [53]. Nevertheless, a previous study similar to ours detected QTL regions on chromosomes 7 and 9 [54] that were shown to affect coat color extension in crossbred pigs. Significant QTLs affecting black spotting were also detected on SSC3 and SSC9 in exotic pig crosses [55]. This suggests that at least a few other coat color-related variants may be present in the genomes of the studied pigs. Another explanation of the observed pattern of signatures may be that the phenotypes of both studied spotted breeds (PUL and ZS) are not yet fully stabilized, or that the combination of phenotypes used for the comparison was not fully appropriate: PUL pigs can have black or reddish spots on slightly pigmented skin, whereas the absence of skin pigmentation is characteristic of the other studied breeds, which may bias the obtained results.

At least some of the detected divergent selection signals could be easily associated with genes potentially responsible for traits especially pronounced or fixed in the studied breeds, which suggests that the detected genome regions are connected with mechanisms underlying the differentiating traits. To supplement these data with information on within-breed selection signatures, which are independent of breed differentiation, we additionally used the REHH statistic [15], which was shown to be the most powerful for detecting ongoing selection when the target allele has a moderate to high frequency [56]. The use of such data allowed us to search for selection signals connected with traits that are selected across all studied breeds, representing variants that are still segregating and are under ongoing selection. With this approach we detected from 118 (ZS) to 176 (PL) core haplotypes per breed that were outliers at the significance level of 0.01. The general genomic pattern of these signals was visibly similar among the studied breeds and was comparable with that of previously analyzed pig breeds (Fig. 4). Clusters of strong selection signals were found on several chromosomes; the most prominent were located in the distal parts of SSC1, SSC14 and SSC15 (Fig. 4). These chromosomal regions were previously associated with meat color density (SSC14) [3] and with growth, reproduction and immune responses (SSC1, SSC15) [29], which seems reasonable as most pig breeds are selected for the improvement of these traits.
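
To make the REHH statistic more concrete, the sketch below follows its commonly cited definition: EHH for a core haplotype is the probability that two randomly chosen chromosomes carrying it are identical over the extended region, and REHH is the ratio of that EHH to the pooled EHH of all other core haplotypes at the same locus. This is only an illustrative Python sketch under those assumptions, not the software pipeline used in this study; the function names and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(extended_haps):
    """EHH: probability that two randomly drawn chromosomes carrying the
    same core haplotype are identical over the extended region."""
    n = len(extended_haps)
    if n < 2:
        return 0.0
    identical_pairs = sum(comb(c, 2) for c in Counter(extended_haps).values())
    return identical_pairs / comb(n, 2)


def rehh(haps_by_core, core):
    """REHH: EHH of `core` divided by the pooled EHH of all other cores.
    `haps_by_core` maps each core haplotype to the list of extended
    haplotypes (strings) observed on chromosomes carrying that core."""
    pairs_same = 0
    pairs_all = 0
    for other, haps in haps_by_core.items():
        if other == core:
            continue
        pairs_same += sum(comb(c, 2) for c in Counter(haps).values())
        pairs_all += comb(len(haps), 2)
    ehh_others = pairs_same / pairs_all if pairs_all else 0.0
    if ehh_others == 0.0:
        return float("inf")  # undefined ratio; no homozygosity on other cores
    return ehh(haps_by_core[core]) / ehh_others


if __name__ == "__main__":
    # Toy data, for illustration only.
    toy = {"A": ["AAAA", "AAAA", "AAAA", "AAAT"], "B": ["CC", "CC", "CG", "GG"]}
    print(ehh(toy["A"]), rehh(toy, "A"))
```

A core haplotype with a high REHH and a moderate to high frequency carries unusually long-range homozygosity relative to the other haplotypes at the same locus, which is the pattern interpreted as ongoing positive selection.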

Fig. 4

The comparison of the detected, significant (P < 0.01) within-breed selection signatures with previous results obtained using similar methods in different breeds

To analyze in detail the genomic regions with the strongest signs of selection, we took a closer look at regions bearing haplotypes with a P-value lower than 0.001. After extending them by 1 cM on both sides, we detected from 35 (ZS) to 171 (PL) unique genes per breed potentially under positive selection. Several of those genes are connected with important production features such as growth and fertility. For example, in PL pigs we identified 10 genes belonging to the Hox family (HOX1 to HOX9 and HOX13), which are key factors in the implantation and development of the embryo and regulate the function of the endometrium [57,58,59,60,61,62]. In PL pigs, we also detected STAT5, a gene belonging to the ESR receptor signaling pathway that is responsible for signal transduction from the estrogen receptor. The STAT5 gene itself has been broadly tested for association with fertility and milk production traits [63,64,65]. The STAT5 protein plays a key role in signaling in the estrogen receptor pathway and can therefore play an important role in modulating the action of estrogen [66]. The gene encoding the estrogen receptor itself (ER), recognized as a key component responsible for female reproduction traits [67] and reproduction-related behavior in females [68], was also detected within the selection signatures found in the studied PL pigs. Furthermore, our analysis allowed identification of the CXCL12 gene (C-X-C Motif Chemokine Ligand 12), which, as a ligand for a G-protein coupled receptor, plays an important role in many biological processes including embryogenesis [69]. In pigs, CXCL12 was identified as one of the differentially expressed genes up-regulated in the endometrium on the 18th day of pregnancy, which can indicate its potential role in embryogenesis [70].
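
The region-extension step described above can be sketched as a simple interval overlap search. The Python example below is a minimal illustration, not the procedure actually used: it assumes a uniform conversion of 1 cM to 1 Mb (an approximation introduced here only for the example), and the gene names and coordinates are hypothetical placeholders rather than real annotations.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # bp
    end: int    # bp


# Assumption for illustration only: 1 cM corresponds to roughly 1 Mb.
BP_PER_CM = 1_000_000


def extend(region: Interval, flank_cm: float = 1.0,
           bp_per_cm: int = BP_PER_CM) -> Interval:
    """Extend a region by `flank_cm` on both sides, clipping at position 0."""
    flank = int(flank_cm * bp_per_cm)
    return Interval(region.chrom, max(0, region.start - flank), region.end + flank)


def genes_in_regions(regions, genes):
    """Return the sorted unique names of genes overlapping any extended region.
    `genes` maps a gene name to its Interval."""
    hits = set()
    for ext in (extend(r) for r in regions):
        for name, g in genes.items():
            if g.chrom == ext.chrom and g.start <= ext.end and g.end >= ext.start:
                hits.add(name)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical signal region and gene annotations.
    signals = [Interval("1", 5_000_000, 5_200_000)]
    annotation = {
        "GENE_A": Interval("1", 4_100_000, 4_150_000),
        "GENE_B": Interval("1", 6_300_000, 6_400_000),
        "GENE_C": Interval("2", 5_000_000, 5_100_000),
    }
    print(genes_in_regions(signals, annotation))
```

Collecting the hits in a set mirrors the counting of unique genes per breed, since a gene overlapping several neighbouring extended regions is reported only once.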

In the regions overlapping with the strongest within-breed selection signals in the PUL breed, as many as seven genes involved in lipid metabolism were detected (DHCR7, CLPS, PNPLA1, PPARD, BPIFB5, CLPSL2, BPIFB1), which may be associated with former selection of this breed toward reduced fat content in the carcass. We also detected genes connected with immune system processes related to resistance to bacterial (FLNC, TCIRG1) and viral (POLD4, SND1) diseases, which fits the characteristics of this native breed, known for its good disease resistance and adaptation to harsh environmental conditions. In Złotnicka White pigs, the strongest within-breed selection signals encompassed genes associated with, inter alia, immune system processes and adaptation to environmental conditions or stress response. The first group of genes included those connected with inflammation mediated by chemokines and cytokines (GNG10, PRKCE, SOCS5) or with inflammatory mediator regulation of TRP channels (MAPK10, mitogen-activated protein kinase 10, and PRKCE, protein kinase C epsilon), which may contribute to the good health and disease resistance observed in this breed. The second group included the MCFD2 gene, engaged in response to a stimulus, and other genes responsible for response to stress (GO:0006950) or the stress-activated MAPK cascade, such as MARC1, ERCC4, DUSP10 and MAPK10.

In the Złotnicka Spotted breed, some predicted genes involved in lipid and carbohydrate metabolism were detected. They are engaged in processes connected with sphingolipid catabolism (LOC102167689), membrane lipid catabolism or glucosidase/galactosidase activity (LOC102167689), metabolic processes that are often associated with selection signals in pigs [3]. The detection of the SCLY gene, encoding isoform X1 of selenocysteine lyase, which is associated with selenium metabolism and viral mRNA translation, is interesting because selenocysteine lyase has been shown to play a key role in lipid metabolism and can be related to selenium concentration in meat [71].

Conclusions

In summary, in this study we used two different but largely complementary statistical approaches, REHH and pairwise FST, to detect selection signatures in four Polish pig breeds, including three native unselected breeds. FST-derived signatures were useful for detecting diversifying selection signals across breeds (represented by loci at which alleles were differentially fixed in different breeds) and allowed us to indicate genes connected with, for example, fertility, growth and metabolism, which are potentially responsible for the phenotypes of the studied breeds. These selection signals also comprised the PPARD gene, which contributes to the shape of the external ears in pigs, and genes encoding neuropeptide Y receptors involved in fat deposition and stress response, which are important features differentiating the studied breeds. The REHH statistic pointed to several within-breed selection signals overlapping with genes connected with a broad range of functions, including metabolic pathways, immune system response, and implantation and development of the embryo. The study provides many potential candidate genes responsible for traits selected in the individual breeds and gives a strong basis for further studies aimed at closing the gap between the genotype and phenotype of the studied pig breeds.