Novel SNPs and InDels discovered in two promoter regions of porcine pregnancy-associated glycoprotein 2-like subfamily (pPAG2-Ls) in crossbreed pigs

This is a pioneer study of single nucleotide polymorphisms (SNPs) within the entire promoter region (1204 bp) of the dominant pPAG2-L subfamily in the pig. The pPAG2-L promoter was sequenced and examined using genomic deoxyribonucleic acid (gDNA) templates of crossbreed pigs (Landrace x Large White) and compared to two bacterial artificial chromosome (BAC) clones containing gDNA of the Duroc breed (as the positive controls). Our analysis of the pPAG2-L promoter identified 31 SNPs and one InDel mutation in crossbreed pigs. Among 42 SNPs/InDels identified in the two BAC clones, 24 SNPs were not detected in crossbreed pigs. In silico analysis of the pPAG2-L promoter sequence, performed with LASAGNA-Search 2.0, Cluster-Buster and MatInspector software, revealed a total of 28 transcription factor binding sites (TFBS), including 10 TFBS (AP-1, CCAAT, CHOP:C, FOXP1, LSF, MRF-2, Myc, NF1, NF-Y, TGIF) with SNPs within their core sequences. The NF1 site was found to be unique to the pPAG2 promoter sequence containing the SNPs g.-1100G>A(R) and g.-1101T>C(Y), represented by the GA and TC genotypes (p x = 0.12). Our broad-based novel database thus provides an SNP PAG2-L pattern for modern genotyping of female and male progenitors. This is required for further studies of various potential correlations between guiding SNP genotypes of the pPAG2-L subfamily in the sows of many breeds, in which the most economically important reproductive traits are properly documented on each farm.

Martyna Bieniek-Kobuszewska and Grzegorz Panasiewicz contributed equally to this work.
The PAG family encodes multiple chorionic polypeptide precursors whose expression starts during peri-implantation, the period of the highest embryonic mortality in many eutherians (Xie et al. 1997; Szafranska et al. 2006; Wallace et al. 2015). All identified PAG cDNAs permitted genomic DNA (gDNA) cloning and the initial discovery of the exon-intron organization, nine exons and eight introns, in cattle and the pig (Szafranska et al. 2001). To date, 11 promoter sequences of the PAG family have been identified: bovine bPAG1 (Xie et al. 1995), bPAG2, 3, 5, 8, 11, 12, 15, and 18 (Telugu et al. 2009), porcine pPAG2 (Szafranska et al. 2001), and equine ePAG (Green et al. 1999). In the pig, single nucleotide polymorphisms (SNPs) within the promoters of pPAG2 and of the pPAG2-L or pPAG1-L subfamilies have not yet been identified.
Despite all available information concerning the promoter sequences of the PAG family, there is no information on their polymorphism or on the potential influence of SNPs on the regulation of transcription. The present study was conducted to identify polymorphisms in the promoter region of the pPAG2-L subfamily and to determine the genotype frequency in crossbreed and Duroc pigs (as controls). The SNPs examined here in crossbreed pigs will provide the main pattern for genotyping multiparous sows of many breeds with known reproductive traits, which are economically important in the livestock industry.
Sequencing and SNP identification within pPAG2-L promoters
All pPAG2-L amplicons were separated in 1 % agarose gels, purified according to the GenElute™ Gel Extraction Kit procedure (Sigma-Aldrich, USA) and sequenced in both sense and antisense directions on a 3130 Genetic Analyzer using the BigDye® Terminator v.3.1 Cycle Sequencing procedure (Applied Biosystems, USA). The obtained chromatograms were initially examined with Sequencing Analysis Software (Applied Biosystems, USA), and all sequences were verified with FinchTV (Geospiza, Inc., USA) and aligned with DNASIS v.3.0 (Hitachi Software Engineering Co. Ltd., Japan) and the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI BLASTn), using discontiguous Megablast or Blastn against GenBank. All identified SNPs were named according to the International Union of Pure and Applied Chemistry (IUPAC) codes.
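As an illustration of the IUPAC-based naming used for the SNPs, the following Python sketch maps the two alleles of a biallelic SNP to the standard IUPAC ambiguity code and builds labels of the form g.-1101T>C(Y). The function names are hypothetical and were not part of the original analysis; only the code table itself is the standard IUPAC nomenclature.

```python
# Standard IUPAC ambiguity codes for two-base (biallelic) positions.
IUPAC_BIALLELIC = {
    frozenset("AG"): "R",  # purines
    frozenset("CT"): "Y",  # pyrimidines
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
}


def iupac_code(ref: str, alt: str) -> str:
    """Return the IUPAC ambiguity code for a biallelic SNP (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternative alleles must differ")
    try:
        return IUPAC_BIALLELIC[frozenset((ref, alt))]
    except KeyError:
        raise ValueError(f"unsupported alleles: {ref!r}, {alt!r}") from None


def label_snp(position: int, ref: str, alt: str) -> str:
    """Build a label such as g.-1101T>C(Y); position is relative to ATG."""
    return f"g.{position}{ref}>{alt}({iupac_code(ref, alt)})"


if __name__ == "__main__":
    # SNPs named in the text, used only to show the labelling scheme.
    for pos, ref, alt in [(-1101, "T", "C"), (-1100, "G", "A"), (-202, "T", "G")]:
        print(label_snp(pos, ref, alt))
```

The lookup uses unordered allele pairs, so G>A and A>G both map to R, which matches how the ambiguity codes are defined.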

Computer analysis
In silico analysis of the pPAG2-L promoter sequences for the presence of putative TFBS was performed with Cluster-Buster, MatInspector™ and LASAGNA-Search 2.0 with TRANSFAC matrices. The analyses were carried out for all possible TFBS according to the individual settings of each program: Cluster-Buster (Gap Parameter 35; Cluster Score Threshold 2; Motif Score Threshold 2; Residue Abundance Range 100; Pseudocount 0.375), MatInspector (minimize false positives), the Genomatix RegionMiner tool for overrepresentation of TFBSs (Genomatix Software GmbH), and LASAGNA-Search 2.0 (p ≤ 0.001).
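Matrix-based TFBS tools score candidate sites against position weight matrices. The sketch below shows a generic log-odds scoring scheme with a pseudocount, assuming a uniform background of 0.25 per base. It is not the algorithm of any of the three programs; the count matrix is hypothetical (not a TRANSFAC matrix), and the default pseudocount merely mirrors the value 0.375 listed in the settings above.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.375, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts mapping base -> observed count (hypothetical data).
    The pseudocount is added to every base so unseen bases get a finite score.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def score_site(matrix, site):
    """Sum the per-position log-odds scores for one candidate site."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix length")
    return sum(col[base] for col, base in zip(matrix, site.upper()))


# Hypothetical count matrix for a four-base GATA-like core.
GATA_COUNTS = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
GATA_PWM = log_odds_matrix(GATA_COUNTS)

if __name__ == "__main__":
    for site in ("GATA", "CATA"):
        print(site, round(score_site(GATA_PWM, site), 2))
```

A single-base change in the core lowers the score, which is the basic reason a SNP inside a core sequence can remove or create a predicted binding site.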

Results
Identification of sequences and novel SNPs within the pPAG2-L promoter in the crossbreeds
In total, 31 novel SNPs located from g.-91C>T(Y) to g.-1101T>C(Y), plus one InDel mutation (g.-100/101InsG), upstream of ATG were identified in the promoter region of the pPAG2-L subfamily (Fig. 1, Table 1). This provides a novel major pattern of the large genetic variation of the porcine genome resulting from crossbreeding. All SNPs were submitted to the dbSNP/NCBI database and analyzed against the 1204 bp of the pPAG2 promoter (Acc. No. U39198; GenBank). The SNPs were identified within two promoter fragments: F1, a 947 bp proximal regulatory region (-720 bp upstream of ATG), and F2, a 489 bp flanking distal region (from -703 to -1137 bp upstream of ATG). Among the 32 SNPs/InDel, 13 were identified in F1 and 19 in F2. Within F1, one insertion (g.-100/101InsG) and 12 SNPs (4 transitions, TRNs, and 8 transversions, TRVs) were identified, while among the 19 SNPs in F2, we detected 9 TRNs and 10 TRVs.
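The division of SNPs into transitions (TRNs) and transversions (TRVs) follows directly from the chemistry of the exchanged bases: a purine-purine or pyrimidine-pyrimidine change is a transition, and any purine-pyrimidine change is a transversion. A minimal Python sketch of this classification is given below; the function names are hypothetical, and the example list contains only SNPs named in the text, not the complete set of Table 1.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Return 'TRN' for a transition and 'TRV' for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "TRN" if same_class else "TRV"


def count_classes(snps):
    """Count TRNs and TRVs in an iterable of (ref, alt) pairs."""
    return dict(Counter(substitution_class(ref, alt) for ref, alt in snps))


if __name__ == "__main__":
    # (ref, alt) pairs for a few SNPs named in the text.
    named = {
        "g.-91C>T": ("C", "T"),
        "g.-221G>C": ("G", "C"),
        "g.-929A>T": ("A", "T"),
        "g.-1100G>A": ("G", "A"),
    }
    for name, (ref, alt) in named.items():
        print(name, substitution_class(ref, alt))
```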
All original chromatograms of the 31 SNPs and one InDel identified within both the F1 and F2 regions of the pPAG2-L promoter (Table 1), including monoallelic (homozygous) and biallelic (heterozygous) positions visualized with FinchTV (Fig. 2), revealed sequence differences compared to the only available consensus sequence of the pPAG2 promoter for various porcine breeds (U39198; Szafranska et al. 2001). In addition, the BAC clones (CH242-60C13 and CH242-294016), used as the only available commercial control sequences (for pPAG3 and pPAG6), showed a surprisingly large diversity for various members of the entire pPAG family, including the pPAG1-L and pPAG2-L subfamilies.

Identification of sequences and SNPs within the pPAG2-L promoter in control BAC clones
Sequencing of the commercial BAC clones (CH242-60C13 and CH242-294016), used as the major positive controls containing only gDNA specific for the Duroc breed, revealed very high sequence diversity, including the presently identified 36 SNPs and 6 InDels (Table 2). Moreover, our parallel broad-based in silico analysis of both BAC clone sequences revealed a very high multiplicity and diversity of entire pPAG2-Ls and/or numerous additional fragments of various kinds (Panasiewicz et al. unpublished data).
Surprisingly, 42 polymorphisms were discovered with the Duroc gDNA template, including 24 novel SNPs (Table 2) not identified in crossbreed pigs. It was also found that 18 SNPs occurred in both Duroc and crossbreed pigs (underlined in Table 2). The 42 polymorphisms comprised 17 TRNs, 19 TRVs, five deletions (g.-950DelA, g.-956DelG, g.-974DelG, g.-975DelG, and g.-976DelG), and one insertion (g.-101_-102InsG), which was also discovered in all crossbreed pigs (Table 2). A comparative analysis of the SNPs identified in the promoter sequences of BAC clones (Duroc) and crossbreed pigs provided further evidence of the pPAG2-L genetic diversity (Table 2).
In silico identification of various TFBSs in the pPAG2-L promoter sequence of the crossbreeds
The search for vertebrate-specific promoter elements with LASAGNA-Search 2.0 (using 259 motifs from the TRANSFAC® application) in the pPAG2-L promoter sequence revealed 11 various TFBSs (Table 3, Fig. 3). Among all identified motifs, only two were identified on the sense strand, whereas nine were on the antisense strand (p value = 0, e value = 0, and score from 8.14 to 18.64). It was confirmed that allele G (g.-221G>C; according to IUPAC numbering in Table 1) created the GATA core motif, while the C allele was unable to create such a TFBS (Fig. 1). Three SNPs, g.-1101T>C(Y), g.-924G>A(R), and g.-202T>G(K), appeared in the TFBS core sequences of NF-Y, CHOP:C, and MRF-2, respectively. The other SNPs occurred within AP-2rep, MyoD, HNF-1, FAC1, and AREB6, but STAT5A occurred outside of the TFBS core sequences.
Fig. 1 Schematic localization of the SNPs in the promoter sequence (1204 bp upstream from ATG) of the pPAG2-L subfamily examined in the crossbreed pigs. This figure includes the transcription start site (+1, +9, −29); potential binding sites for transcription factors: Ets, GATA, STAT (boxed); TATA-box (TATATAA); unique tandem repeats (double underlined); the occurrence of SNP (p.−168 g > c*) in the coding sequence for GATA
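The observation that one allele creates the GATA core while the other does not can be checked with a simple substitution test: each allele is placed between the flanking bases, and the resulting sequence is searched for the core on both strands. The Python sketch below illustrates the idea with exact string matching, which is a simplification of the matrix-based searches used in the study. The flanking sequences are hypothetical placeholders, not the actual pPAG2-L promoter sequence, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_core(seq: str, core: str) -> bool:
    """True if the core motif occurs on the sense or antisense strand."""
    seq, core = seq.upper(), core.upper()
    return core in seq or reverse_complement(core) in seq


def allele_effect(left: str, right: str, alleles: str, core: str) -> dict:
    """Report, per allele, whether left + allele + right contains the core."""
    return {allele: has_core(left + allele + right, core) for allele in alleles}


if __name__ == "__main__":
    # Hypothetical flanks chosen so that only the G allele completes GATA.
    print(allele_effect(left="CCTA", right="ATAAGG", alleles="GC", core="GATA"))
```

Exact matching only detects complete loss or gain of a fixed core; a matrix-based score would additionally capture partial weakening of a site.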

Discussion
In total, 31 SNPs and one InDel (Table 1, Fig. 1) were identified in the pPAG2-L promoter (up to −1100 upstream of ATG), including 19 SNPs in F2 (15 SNPs in three clusters, and another four SNPs occurring in tandems). In F1, 13 polymorphisms were identified, including one InDel (g.-101/102InsG) inside the GATA sequence (p x = 0.86). Currently, the 32 polymorphisms of the pPAG2-L promoter (from g.-91C>T(Y) to g.-1101T>C(Y)) identified in crossbreed pigs can only be compared to the five deposited promoter sequences: pPAG2 (Szafranska et al. 2001), bPAG1, ePAG, bPAG2 (Green et al. 1999; Telugu et al. 2009), and fPAG (Ensembl). However, such a comparison of SNPs is impossible, because the SNPs of bPAG1, ePAG, and bPAG2 have not been studied.
Surprisingly, a comparative analysis of the BAC clone sequences revealed 97.0-99.3 % homology to the pPAG2 promoter (Szafranska et al. 2001), which suggests SNPs among different breeds. The sequence diversity of the two BAC clones originating from Duroc (used as gDNA control templates) comprised 42 polymorphisms (36 SNPs and 6 InDels; Table 2), among which 24 novel SNPs were not identified in the crossbreed pigs (Table 1). It should be noted that some SNPs identified in crossbreed pigs were identical to those in the BAC clones (Duroc), which undoubtedly indicates that the currently tested crossbreed pigs were hybrids between Duroc and other breeds. Thus, the identified SNPs are evidence of duplication and positive selection of the pPAG2-L subfamily in different breeds.
Previously, specific sequences of various TFBSs were identified in the porcine pPAG2 promoter: Ets, Ets-1, GATA, GATA-like, and STAT (Szafranska et al. 2001). Presently, the location of SNPs identified in the regulatory F1 region of the pPAG2-L promoter suggests their importance due to the potential impact on the TFBSs and, consequently, on transcription activation. In silico analysis using three programs (Table 3) revealed a total of 28 various TFBSs. In F1, we found conserved sequences for the TATATAA box (from −73 to −79 bp), AP1 (activator protein 1) and CCAAT/enhancer binding protein (C/EBP). LASAGNA-Search 2.0 detected SNPs (g.-1101T>C, g.-924G>A, and g.-202T>G) that diminish binding sites for NF-Y, CHOP:C, and MRF-2, which may influence PAG2-L expression in the three heterozygous genotypes: CT (p x = 0.12), AG (p x = 0.29), and TG (p x = 0.44), respectively. Cluster-Buster also confirmed that two SNPs, g.-1100G>A(R) and g.-1101T>C(Y), may affect PAG2-L expression through NF1 and AP-1.
Furthermore, a binding site for the AP-2 transcription factor was detected; it was also found in the promoters of some placental bovine genes, especially bPAG1 and bPAG17. This suggests that the AP-2 family is a major factor regulating cytochrome P-450-dependent genes involved in the production of steroid hormones in binucleated cells (Ushizawa et al. 2007). Commercial bovine microarrays containing 1780 genes, including 30 genes expressed (25-250 dpc) mainly in the bovine binucleated placental cells, have provided significant evidence of the PAG family involvement in the regulation of pregnancy maintenance (Ushizawa et al. 2007). Moreover, Affymetrix microarrays showed a significant correlation of bPAG11 with prostaglandin (PG) synthesis: PGE synthase (r = 0.76), cytosolic PGE synthase 3 (r = 0.69), and endoperoxide synthase 2 (r = 0.86), suggesting an important role of bPAG11 in PG cascade activation (Thompson et al. 2011). It should be underlined that two presently identified SNPs in the pPAG2-L promoter (g.-117A>T and g.-168C>T) were localized within, or close to, the unique 10 nt tandem repeats (TCTTATCAGG, located at -94 to -103 and -113 to -122 upstream of ATG), which are specific to the activation of the PAG gene family in pigs (Szafranska et al. 2001) and/or cattle (Telugu et al. 2009). Both of these SNPs identified in crossbreed pigs are very close to the conserved Ets sequence in proximity to the GATA site within the pPAG2 promoter recognized previously (Szafranska et al. 2001). In cattle, the Ets sites analyzed in eight known bPAG promoters maintained conserved sequences (Telugu et al. 2009). Thus, these SNPs may affect placental development and pregnancy maintenance in both species.
Although there was no prevalence of SNPs in the GATA sequence of the pPAG2-L promoter, Cluster-Buster revealed that allele A (SNP g.-1100G>A) creates an NF1 site, while allele G (genotype GG) does not create this TFBS. However, the frequency of the heterozygous genotypes GA and TC was, in both cases, p x = 0.12, while the frequency of the homozygous genotypes GG and TT (which do not determine the TFBS) was p x = 0.88. The SNP g.-1100G>A(R), located inside the core, determined the occurrence of a new TFBS (NF1).
In contrast, MatInspector revealed sequences characteristic of three variously located FOXP1 ES.01 sites, although only one SNP (g.-929A>T(W)/g. 273A>T) was localized in the core AACA sequence of this TFBS (265-281 bp). Previous studies have shown that FOXP1 deletion causes an embryonic lethal defect that affects the development of a variety of organs, including the heart (Wang et al. 2004) and lung (Shu et al. 2007), as well as B cell differentiation (Hu et al. 2006). The SNPs within the pPAG2-L promoter (from g.-990C>G to g.-954A>G; g.-820C>A; g.-417C>A; g.-276T>C to g.-213T>G; g.-168C>T; and g.-117A>T to g.-91C>T) did not affect the appearance of TFBSs. Another TFBS identified in silico is that of the Tbx20a transcription factor, which is essential for proper heart development in a growing fetus (Song et al. 2006).
The participation of many transcription factors in bPAG activation has been shown by EMSA (electrophoretic mobility shift assay), with the most important role assigned to Ets2 and DDVL (Drosophila dorsal ventral factor) (Telugu et al. 2009), as well as in the regulation of transcription of chorionic genes, e.g., IFNτ in cattle (Ezashi et al. 1998) or hCG in women (Ghosh et al. 2003). Presently, in crossbreed pigs, no SNPs were identified in the Ets2 binding site, although the SNPs identified in the GATA site suggest the possibility of disturbance during pPAG2-L subfamily activation. In addition, bovine microarrays (Kumar et al. 2010) indicate a strong influence of STAT and Pax-2 (signal transducer and activator of transcription; paired homeobox 2) on the promoters of genes that are expressed in the placenta, e.g., bPAG2, PTGS2 (COX2, PG-endoperoxide synthase 2) and LSG 34F, a homologue of a secretory vesicle protein in the male (SSLP-1; seminal vesicle protein secreted).
The investigation of SNP spreading in a selected population requires an important parameter, MAF (minor allele frequency), at the level of >0.1 in commercial pig breeds (McLaren et al. 2013). Although the MAF was not specified in our study, among the 32 SNPs we were able to identify a genotype frequency of p x = 0.56-0.88 for 18 SNPs, which indicates the dominance of homozygotes, while for 14 SNPs it indicates the dominance of heterozygotes (p x = 0.55-1).
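Although the MAF was not reported, it can be derived from genotype frequencies by allele counting: each heterozygote contributes one copy of each allele, and each homozygote contributes two copies of one allele. The Python sketch below shows this calculation. It is an illustration only: the function names are hypothetical, and the example assumes that no homozygotes for the alternative allele were present at g.-1100G>A, an assumption based solely on the reported GG (0.88) and GA (0.12) frequencies summing to 1.

```python
def allele_frequencies(hom_ref: float, het: float, hom_alt: float):
    """Return (ref, alt) allele frequencies from genotype counts or frequencies."""
    total = hom_ref + het + hom_alt
    if total <= 0:
        raise ValueError("genotype values must sum to a positive number")
    # Homozygotes carry two copies of an allele, heterozygotes one of each.
    alt = (hom_alt + het / 2) / total
    return 1 - alt, alt


def minor_allele_frequency(hom_ref: float, het: float, hom_alt: float) -> float:
    """Return the frequency of the less common allele."""
    return min(allele_frequencies(hom_ref, het, hom_alt))


MAF_THRESHOLD = 0.1  # level cited in the text for commercial pig breeds

if __name__ == "__main__":
    # g.-1100G>A: GG = 0.88, GA = 0.12, AA assumed to be 0 (see lead-in).
    maf = minor_allele_frequency(hom_ref=0.88, het=0.12, hom_alt=0.0)
    print(f"MAF = {maf:.2f}; above threshold: {maf > MAF_THRESHOLD}")
```

Under the stated assumption, a heterozygote frequency of 0.12 corresponds to a minor allele frequency of about 0.06, since only half of the alleles carried by heterozygotes are the minor allele.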
It is well known that the smallest polymorphism results from homozygosity of different domestic pig breeds. The occurrence of 1.2 SNPs/kbp in the genome of European Large White breed and only 0.05 SNPs/kbp in the genome of Sus barbatus indicates the inbreeding of pigs from the isolated populations, as a result of natural barriers or a human economic activity (Ferreira et al. 2008). The regions of homozygosity (ROHs) in European, Asian, and wild pigs (60 K SNP microarray) vary about 778.8 ± 349 ROHs/genome (one ROHs range between 10 kbp to 83.6 Mbp; average 1.11 Mbp), which represents about 23 % of the porcine genome (Bosse et al. 2012). A higher level of ROHs occurs in genomes of wild and domestic European pigs, while the lowest ROHs level (and the largest polymorphism) is present in the genome of wild Asian pigs. Furthermore, 1733 SNP ± 0.57/kbp occur in the porcine genome, but only 2.49 SNP ± 0.57/bp are in regions outside the ROHs (Bosse et al. 2012). This may suggest that the greatest heterozygosity of the pPAG2-L promoter occurs within various areas potentially located outside the ROHs in the crossbreed pigs.

Conclusion
This study provides pioneering information on the polymorphism of the pPAG2-L subfamily promoter: 32 SNPs/InDel identified within the regulatory proximal and flanking distal regions in crossbreed pigs and 42 SNPs/InDels identified in the Duroc breed (as inserts in BAC clones used as controls). Many of the pPAG2-L SNPs were identified in various TFBSs (at least 8 to 26, depending on which of the three software tools was used), which suggests the high importance of allelic (homo- and heterozygotic) diversity and a meaningful influence on the transcriptional regulation of pPAG2-L subfamily expression.
Since this is the first study describing the pPAG2-L subfamily diversity in the genome of crossbreed pigs, it also extends the general knowledge of the latest version of the domestic pig genome (Ss10.2). The results present a broad-based novel database (main pattern) as the widest genotyping prototype, which is required for further investigations of various potential correlations between guiding SNP genotypes of the pPAG2-L subfamily in the sows of many breeds, in which the most economically important reproductive traits are properly documented on each farm of female and male progenitors.