Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs

Jiang, Tao; Ling, Ziqi; Zhou, Zhimin; Chen, Xiaoyun; Chen, Liqing; Liu, Sha; Sun, Yingchun; Yang, Jiawen; Yang, Bin; Huang, Jianzhen; Huang, Lusheng

doi:10.1186/s40104-022-00767-3

Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs

Research
Open access
Published: 11 October 2022

Volume 13, article number 112, (2022)
Cite this article

Download PDF

You have full access to this open access article

Journal of Animal Science and Biotechnology Aims and scope Submit manuscript

Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs

Download PDF

Tao Jiang¹^na1,
Ziqi Ling¹^na1,
Zhimin Zhou¹,
Xiaoyun Chen¹,
Liqing Chen¹,
Sha Liu¹,
Yingchun Sun¹,
Jiawen Yang¹,
Bin Yang¹,
Jianzhen Huang¹ &
…
Lusheng Huang¹

3099 Accesses
3 Citations
4 Altmetric
Explore all metrics

Abstract

Background

A comprehensive landscape of chromatin states for multiple mammalian tissues is essential for elucidating the molecular mechanism underlying regulatory variants on complex traits. However, the genome-wide chromatin accessibility has been only reported in limited tissue types in pigs.

Results

Here we report a genome-wide landscape of chromatin accessibility of 20 tissues in two female pigs at ages of 6 months using ATAC-seq, and identified 557,273 merged peaks, which greatly expanded the pig regulatory element repository. We revealed tissue-specific regulatory elements which were associated with tissue-relevant biological functions. We identified both positive and negative significant correlations between the regulatory elements and gene transcripts, which showed distinct distributions in terms of their strength and distances from corresponding genes. We investigated the presence of transposable elements (TEs) in open chromatin regions across all tissues, these included identifications of porcine endogenous retroviruses (PERVs) exhibiting high accessibility in liver and homology of porcine specific virus sequences to universally accessible transposable elements. Furthermore, we prioritized a potential causal variant for polyunsaturated fatty acid in the muscle.

Conclusions

Our data provides a novel multi-tissues accessible chromatin landscape that serve as an important resource for interpreting regulatory sequences in tissue-specific and conserved biological functions, as well as regulatory variants of loci associated with complex traits in pigs.

Enhancer-promoter interaction maps provide insights into skeletal muscle-related traits in pig genome

Article Open access 09 June 2022

Constructing eRNA-mediated gene regulatory networks to explore the genetic basis of muscle and fat-relevant traits in pigs

Article Open access 09 April 2024

A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome

Article Open access 13 April 2021

Background

Eukaryotic genomes are usually packed into nucleosomes, which comprise of 147 bp DNA wrapping around a histone octamer, forming the structural units of chromatin. Chromatin is categorized as active euchromatin or inactive heterochromatin based on its accessibility [1, 2], which is linked to fundamental cellular processes including gene transcription, DNA replication and repair [3]. The accessible chromatin often reflects binding of transcriptional machinery to cis-regulatory elements such as promoters and enhancers [4,5,6,7,8]. Profiling the accessible chromatin in different tissues could help to interpret regulatory basis underlying spatiotemporal transcription of genes [9,10,11], as well as genomic variants causing diseases susceptibility and complex trait variations [12,13,14,15].

Over the past decades, several chromatin accessibility profiling methods including DNase I hypersensitive site sequencing (DNase-seq), micrococcal nuclease sequencing (MNase-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) have been developed. Among these, ATAC-seq has become increasingly popular because of its simplicity, high sensitivity and low cell number requirement [16], and has been applied to construct regulatory elements landscapes in human and mouse [17,18,19,20]. The international consortiums like, Encyclopedia of DNA Elements (ENCODE) [21], Roadmap Epigenomics projects [22] and the Functional Annotation of Animal Genomes (FAANG) [23] have played important roles in generating the epigenetic data in multiple species. There have been efforts to profile accessible chromatin in pigs [24,25,26]. However, only few tissues including fat, cerebellum, cortex, heart, liver, muscle, spleen, lung and hypothalamus were covered in these studies. Moreover, studies have investigated chromatin states of TEs including LINE, SINE, LTR and DNA transposons in human and mouse [27,28,29], allowing better understanding the regulatory function of TEs. However, the chromatin landscape of pig genomic TEs has not been explored yet. In particular, the PERV, as a type of TE, whose sequences are inserted into the pig genome, can be transmitted to humans after xenotransplantation, causing disease risk [30, 31].

Here, we performed ATAC-seq on 40 samples from 20 tissues in 2 female Duroc-Landrace-Yorkshire (DLY) commercial pigs, including 10 tissues (medulla spinalis, bronchia, parotid gland, pharynx, stomach, small intestine (SI), kidney, ovary, cervix, thymus) that have not been investigated in previous studies. We inferred the open chromatin regions displaying tissue-specific or tissue-conserved accessibility. By integrating transcriptomic data (RNA-seq), we revealed open chromatin regions that significantly correlated with gene transcription. In addition, we examined TEs including the PERVs and different TEs in the open chromatin regions, providing insights into the regulatory role of the TEs and their evolutionary history. The data and analyses in this study provide vital resources to explore the function of cis-regulatory elements in determining tissue-specific biological functions and complex traits in pigs.

Methods and materials

Sample collection

The pigs used in this study were obtained from a slaughter house in Nanchang, Jiangxi province. 20 tissues (brain, cerebellum, hypothalamus, medulla spinalis, lung, bronchia, parotid gland, pharynx, stomach, small intestine, liver, kidney, cervix, ovary, heart, thymus, longissimus dorsi, semimembranosus, backfat, leaf fat) was collected from 2 healthy commercial sows at age of 6 months. After slaughter, tissue samples were collected in sterile tube and stored at − 80 °C.

ATAC-seq library preparation and sequencing

ATAC-seq libraries were prepared as previously described with minor modifications [16]. Briefly, samples were ground to powder with liquid nitrogen using mortar. Then, we homogenized the tissue powder in 2 mL ice-cold homogenization buffer (10% NP40, 500 mmol/L EDTA, 1 mol/L sucrose, 100 mmol/L PMSF, 14.3 mol/L β-mercaptoethanol, 1 mol/L CaCl₂, 1 mol/L Mg(Ac)₂, 1 mol/L Tris [pH 7.8], nuclease-free H₂O) using a 7-mL Dounce homogenizer for 20 strokes. We filtered the solution through a 70-μm cell strainer and gradient centrifugation with different concentrations of iodixanol. After that, approximately 60,000 nuclei from each sample were collected, centrifuged at 500 × g for 5 min at 4 °C, washed the pellet with 1 mL ATAC-RSB (5 mol/L NaCl, 1 mol/L MgCl₂, 1 mol/L Tris-HCl [pH 7.4], nuclease-free H₂O) containing 0.1% Tween-20, centrifuged at 500 × g for 10 min at 4 °C; and resuspended the pellet in 50-μL of tagmentation reaction mix (2 × Tagmentation buffer, 10 × PBS, 0.5% Digitonin, 10% Tween20, Tn5 transposase and nuclease-free H₂O), incubated for 30 min at 37 °C. Libraries were prepared using active motif kit (ATAC-Seq Kit #53150) as previously described [32]. The ATAC-seq DNA was sequenced in the 150 bp paired-end sequencing mode with a NovaSeq 6000 platform. Two biological replicates were performed per tissue.

ATAC-seq processing

According to the Fig. 1b, the raw data with sequence adapters and low-quality reads were trimmed with Trim Galore v0.6.6 [33] using default parameters. The high-quality reads were then aligned to the pig genome (Sus scrofa 11.1) using Bowtie2 v2.3.5.1 [34]. Duplicate alignments were removed using Picard toolkit v1.119 [35] and alignments with low mapping quality (mapq < 30) and mitochondrial DNA were removed with SAMtools v1.9 [36]. We used MACS2 v2.1.1 [37] to generate the accessible chromatin regions (peaks) for each sample with the parameter --qvalue 0.01 --nomodel --shift − 100 --extsize 200 -B -SPMR --keep-dup all, and then we merged peaks across all samples within the same tissue in two individuals overlapping peaks using BEDTools merge [38]. We recalculated the number of reads in each merged peak using SAMtools bedcov [36] and peaks per kilobase million values were determined to measure the peak intensity using R program v3.5.1 [39]. The clustering of the samples based on peak intensity were performed using hcluster function with ward. D method in R program v3.5.1 [39], and visualized using R package circlize [40] and dendextend [41]. Genomic annotation for peaks was performed with Genomic Association Tester [42]. And tissue-specific peaks are defined as peak intensity in specific tissue at least twice higher than in the others.

RNA-seq library preparation, sequencing and processing

Total RNA was extracted from 40 frozen samples using Trizol and sequenced using MGI-2000 system with paired end and strand specific. RNA-seq reads of each sample were mapped to pig genome (Sus scrofa 11.1) using STAR v2.7.1a [43]. Expression level of each gene was determined with StringTie [44] and featureCounts [45] and transcript per million (TPM) normalization was performed using R program. Only genes expressed (TPM > 0) across the 40 samples in more than 2 samples were included in the subsequent analyses. The clustering of the samples based on gene expression levels were performed using hcluster function with ward.D method in R program, and visualized using R package circlize [40] and dendextend [41]. And two samples (stomach_sample1 and backfat_sample2) with poor clustering were discarded. And tissue-specific genes are defined as expression level in specific tissue at least twice higher than in the others.

Gene ontology enrichment analysis

The gene ontology enrichment analysis was performed using ClueGO [46]. The P-values for the enrichment of Gene Ontology (GO) terms were corrected using Benjamini-Hochberg approach.

Transcription factor motifs

We tested for enrichment of vertebrates known transcription factor motifs in peaks using HOMER [47] with default parameters.

Correlation of peaks with gene expression

To clarify the regulatory relationship between peaks and genes, three other data were used in this study. The first study was from Kern et al. [26]. A total of 16 samples were collected from 8 tissues (fat, cerebellum, brain, hypothalamus, liver, lung, muscle and spleen) of yorkshire with 2 biological replicates per tissue. The second was designed by Zhao et al. [48], which include 5 tissues (muscle, liver, fat, spleen, heart) of MeiShan, LargeWhite, Enshi and Duroc with 1–2 biological replicates per tissue. And the last one was executed by Yang et al. [49], which we downloaded and only included one tissue (skeletal muscle) of Luchuan and Duroc pigs with 3 biological replicates per breed (Additional file 1: Table S3). In total, combined with our own data, there were 85 samples of ATAC-seq and corresponding RNA-seq. After the above uniform ATAC-seq process analysis, we identified 796,506 peaks with P-value less than 10⁻⁵, and 14 samples with poor tissue clustering were discarded. And then 527,718 peaks exist in at least 2 samples were retained. For RNA-seq, after uniform RNA-seq process analysis above, 24,974 genes were identified, then, we selected 17,634 genes with TPM > 1 in at least 10% samples for further analysis. Using 527,718 peaks and 17,634 genes across 71 samples, we predicted target genes for peaks by implementing correlation between peak intensity and gene expression. Genes whose transcription start sites (TSSs) were within 500 Kb from the center of peaks were considered. We identified 827,942 significant correlations at q-value less than 0.01.

Spatial correlation between peaks and TEs

The data of TEs from pig (Sus scrofa 11.1) were analyzed by RepeatMasker v2.9.0 [50] with parameter -s. And correlated interval sets between peaks and TEs were calculated by GenometriCorr (Genometric Correlation) [51] which was used to calculate the spatial correlation of genome-wide interval datasets. The permutation tests with 100 times were performed to verify whether TEs enriched in the regulatory elements or not.

Identification of accessible TEs

As previously reported [52], accessible TEs were identified by the overlapping between TEs and peaks, and the evaluation criterion is that at least 50% of the overlap should be TE while at least 20% of the overlap should be peak region. Accessible TEs that were only identified in one tissue was defined as tissue-specific accessible TEs while accessible TEs that were identified in all tissues was defined as shared accessible TEs. GO enrichment analysis of tissue-specific accessible TEs and shared accessible TEs were performed using ClueGO [46].

Multiple sequence alignment

Virus sequences were obtained from a study (unpublished) by self-assembly and comparison with the database [53]. The conserved accessible TEs sequences and viral sequences were put together. Then, we used MAFFT v7.407 [54] with the automatically detected parameters (--auto) to identified sequence homology. FastTree v2.1.11 [55] was used to infer approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide sequences. Finally, experiment visualization features in FigTree v1.4.4 [56].

The enrichment of GWAS signals

The pig GWAS data of fatty acid was collected from a published study [57]. A total of 191 significant GWAS loci were obtained. Then, the enrichment analysis of these loci was performed with regulatory elements.

Results

Mapping the open chromatin states of 20 pig tissues

To map genome-wide regulatory elements in pig, we performed ATAC-seq and RNA-seq on 40 samples representing 20 tissues, namely, brain, cerebellum, hypothalamus, medulla spinalis, lung, bronchia, parotid gland, pharynx, stomach, liver, SI, kidney, ovary, cervix, heart, thymus, longissimus muscle (LD), semimembranosus muscle (SM), backfat and leaf fat from two female DLY pigs at 6 months (Fig. 1a), representing two biological replicates per tissue. The data was subjected to analysis following the workflow shown in Fig. 1b.

After quality control, we obtained a total of 2.43G uniquely mapped reads from the 40 samples (Additional file 1: Table S1). The reads showed enrichment around TSSs and a periodical fragment size distribution corresponding to the nucleosome-free regions (NFR) (< 100 bp) and mono-, di-, and tri-nucleosomes (~ 200, 400 and 600 bp, respectively) (Additional file 2: Fig. S1a, S1b). Furthermore, we computed quality related statistics FRIP (fraction of reads in peaks), TSS score, library complexity (NRF (Non-redundancy Fraction), PBC1 and PBC2 (PCR Bottlenecking Coefficient 1 and 2)) and cross correlation (NSC and RSC) (Additional file 1: Table S1). These statistics suggested that the data generated hereby are of good quality. We identified an average of 77,096 peaks with average size of 448 bp per tissue (Additional file 2: Fig. S1d). Peaks from all samples were merged into 557,273 non-redundant peaks, among which 363,638 (65.25%) overlapped with peaks identified in Kern et al. [26] (Fig. 2c), while the rest of peaks were newly identified owing to more tissue types were profiled in this study.

To obtain a set of peaks with higher confidence, we only kept peaks that were evidenced in both biological replicates for each tissue, and finally retained 267,836 merged peaks for further analysis. Most of the 267,836 peaks located in intronic (54.23%) and intergenic (36.75%) regions, meanwhile, 4.23%, 2.59% and 2.20% peaks were evidenced in CDS, 5′UTR and 3′UTR regions (Fig. 2e). Similar to the results obtained in the previous study [25], we found that accessible chromatin regions vary greatly across tissues (range: 9729 to 94,946), covering 0.56% to 3.21% sequences of the reference genome (Fig. 2b), with a good overlap with those reported in independent studies by tissues (Additional file 1: Table S4).

Hierarchical clustering based on the peak intensity clearly separated different tissue types (Fig. 2a), indicating that the peak intensities largely reflect tissue identity. We grouped the 267,836 peaks into 20,488 proximal peaks (within 1 Kb of TSS) and 247,248 distal peaks (away from TSS 1 Kb). As expected, the distal peaks displayed stronger tissue specificity comparing to proximal peaks (Fig. 2d and Additional file 2: Fig. S1c), agreeing with the reported characteristics of promoters and enhancers [58].

To investigate the effect of regulatory elements on gene expression, RNA-seq was successfully performed on 38 out of the 40 ATAC-Seq samples. The summary statistics of each RNA library is listed in Supplementary Table 2. A total of 17,624 genes were identified. The reproducibility of the 38 RNA-seq data was verified by hierarchical clustering of gene expression (Additional file 2: Fig. S2a).

Tissue specificity of the open chromatin peaks

To explore the key regulatory elements that determine specialized functions of tissues, we defined peak whose average intensity was at least two folds in the target tissue relative to any other tissues as tissue-specific peak, and identified 16,704, 7887, 6742, 5960, 5286, 4278, 3089, 884, 649, 407, 326, 277, 261, 177, 118, 84, 21, 16, 6 and 3 tissue-specific peaks in cerebellum, kidney, liver, heart, parotid gland, cervix, thymus, leaf fat, stomach, brain, SI, LD, pharynx, SM, ovary, hypothalamus, medulla spinalis, lung, backfat and bronchia, respectively (Fig. 3a, b). We exemplified a liver specific peak located close to ARG1 (Fig. 3c and Additional file 2: S2e), a protein coding gene expressed primarily in the liver and involved in the urea cycle [59]. The genes locate closest to the tissue-specific peaks were enriched in tissue-specific pathways, such as the heart and circulatory system development for heart specific peaks (Additional file 2: Fig. S2f), such as heart development and circulatory system development in heart. Given that the numbers of tissue-specific peaks were largely affected by the inclusion of physiologically similar tissues, we further determined peaks that specific to a certain tissue system using a similar rule applied on the individual tissues, and identified 11,114, 434, 310, 28, 5053 and 50 specific peaks for locomotor system, reproductive system, endocrine system, respiratory system, nervous system and digestive system, respectively, located close to genes with functions matching the biology of corresponding tissues (Fig. 3a). For example, genes expressed specifically in nervous system were significantly involved in synaptic membrane (q-value = 1.39E-32), while those expressed specifically in locomotor system were significantly associated with muscle structure development (q-value = 5.68E-28) (Additional file 2: Fig. S2d).

Next, we investigated enrichment of TF binding motifs in the tissue-specific peaks using HOMER [47]. We defined the TF motifs with enrichment strength (−log (P-value)) in one tissue were at least two folds relative to the other tissues as tissue-specific, and identified 205 tissue-specific TF motifs (Fig. 3d). These included motif of ATOH1 which controls primary cilia formation that facilitates SHH-Triggered granule neuron progenitor proliferation [60] was evidenced to be cerebellum specific, and motif of MEF2C which controls cardiac morphogenesis [61] was enriched in heart.

Ubiquitously accessible chromatin peaks

Meanwhile, we also identified 8734 peaks that consistently active in at least 90% of investigated tissues (Fig. 3e). The genes locate nearest to the conserved peaks were enriched for pathways like cytoskeleton and mitotic cell cycle that have housekeeping functions (Additional file 2: Fig. S2d). The top 5 enriched TF motifs (SP1, NFY, SP5, ELK4 and KLF3) (Additional file 1: Table S5) participate in chromatin remodeling and transcriptional activation (Fig. 3f), which are indispensable transcription factors in growth and development [62].

Tissue-specific gene expressions

We applied similar approach to investigate tissue specificity of gene expressions, and identified 135, 528, 439, 711, 228, 181, 117, 467, 40, 75, 469, 486, 94, 581, 124, 117, 411, 34, 336 and 1590 tissue-specific genes that were found to be specific in backfat, brain, bronchia, cerebellum, cervix, heart, hypothalamus, kidney, LD, leaf fat, liver, lung, medulla spinalis, ovary, parotid gland, pharynx, SI, SM, stomach and thymus, respectively (Additional file 2: Fig. S2b). For example, MYH7, a gene described to be responsible for hypertrophic cardiomyopathy, was highly expressed in heart than in other tissues (Additional file 2: Fig. S2c). The GO analysis showed these tissue-specific genes were associated with tissue-specific biological processes (Additional file 2: Fig. S2b). For example, genes significantly expressed in cerebellum were related to the function of synaptic signaling, and genes significantly expressed in thymus were related to the function of T cell activation (Additional file 2: Fig. S2b).

Joint profiling of chromatin accessibility and gene expression

To investigate the regulatory elements that link to gene transcription, we examined correlations of gene expressions with intensities of ATAC peaks within 500 Kb from the TSSs of corresponding genes, as majority of promoter-anchored chromatin interactions were within 500 Kb [63, 64]. The analysis was performed on the integrated dataset of 527,718 peaks and 17,634 genes from 71 samples, including 33 samples from public datasets (Additional file 1: Table S3, Additional file 2: Fig. S3). We identified a total of 827,942 significant peak - gene associations (q-value < 0.01) that involves 255,166 peaks and 17,493 genes (Additional file 2: Fig. S4). Majority of the significant correlations (406,896/827,942, 49.15%) are positive only, while we also identified 10.54% negative correlations only. For example, a peak (chr5:10,663,155-10,664,871) has high intensity in liver showed positive correlation with TMPRSS6 (Spearman r² = 0.77, q-value = 6.43E-12) (Fig. 4f), which encodes a type II transmembrane serine protease exclusively produced by liver [65]. We also exemplified negative peak - gene correlation where a peak (chr1:128,083,325-128,085,034) show strong negative correlation with expression levels of PDIA3 (Spearman r² = − 0.60, q-value = 1.42E-6) (Fig. 4g). In addition, there are 40.31% peak - gene correlations showed both positive and negative correlations, indicating that the identity of regulatory elements was mutable.

On average, there were 2.87 genes per peak and 35.86 peaks per gene in positive correlations while 2.13 genes per peak and 14.49 peaks per gene in negative correlations (Fig. 4a, c). Interestingly, we observed that the strength of positive and negative correlations showed distinct distribution against peak - gene distances (Fig. 4b). The positive correlations were enriched around the TSSs of genes, while the negative correlations appeared to be underrepresented in TSS regions (Fig. 4d). We identified 1786 tissue-specific genes were associated with 38,574 tissue-specific peaks within 500 Kb from the TSS regions (Fig. 4e, h and Additional file 2: Fig. S4), suggesting important roles of tissue-specific chromatin accessibility in driving the tissue-specific genes expression thereby encoded tissue-specific biological functions.

Open chromatin states of transposable elements in different pig tissues

Over 40% of the sequences in the non-coding regions of the mammalian genome are TEs whose functions were largely unexplored. To understand the functions of TEs in pig tissues, we examined the chromatin accessibility of genome wide TEs. We found 26.91% of the 267,836 open chromatin regions were overlapped with at least one TE.

The spatial correlation analysis between TEs and open chromatin regions calculated by GenometriCorr [51] suggested underrepresentation of ratio of TEs located in the open chromatin regions than random expectation, indicating that TEs are tend to be epigenetically silenced. Despite this, each tissue contains an average of 17.96% peaks that were overlapped with at least one TE, range from 9.49% to 22.33% (Fig. 5a). The SINE, LINE and LTR accounted for approximately 40%, 30% and 23% of the accessible TEs, respectively, across the 20 tissues (Fig. 5b). We found over 50% of these TEs existed in a single tissue were located close to genes involved in tissue-specific functions (Fig. 5c, d). For example, we observed enrichment of T cell receptor signaling pathway enriched in thymus while muscle system process enriched in LD (Fig. 5d). These results suggested that accessible TEs existed in one tissue exhibit strong tissue-specific regulatory roles, which was consistent with the previous reports [66, 67]. Meanwhile, we observed approximate 0.19% of accessible TEs were commonly accessible in all 20 tissues (Fig. 5c). The conserved accessible TEs were more associated with housekeeping functions (e.g., ribosome assembly) (Fig. 5d).

Mounting evidence suggests that specific TEs in pigs, especially PERV, could potentially lead to immunodeficiency and tumorigenesis, making it difficult for xenotransplantation [68, 69]. Upon further inspection, we found a total of 10 unique PERVs overlapped with 13 peaks (Additional file 1: Table S6). For example, the PERV (sPERV-JX2like-12-23.0 M) was overlapped with the peak (chr12:22,990,219-22,991,950) which showed high intensity in liver (Fig. 5e), significantly correlated with multiple genes (NEUROD2, ENSSSCG00000017507, CACNB1, PLXDC1, PIP4K2B). Among them, PIP4K2B, which participates in 1-phosphatidylinositol-4-phosphate 5-kinase activity and IP6 [70], the product of 1-phosphatidylinositol-4-phosphate 5-kinase, can facilitates assembly of the immature HIV-1 Gag lattice [71], shows highly correlation (Fig. 5f), suggesting potential role of the PERVs in facilitating virus infection.

Besides this, we also focused on the functions of other genes regulated by PERVs. For example, GTPBP8 [72], BCKDK [73], IDH3B [74] and IDH1 [75] were related to the function of mitochondrion; CD2BP2 [76] and FUS [77] involved in mRNA splicing; BOC [78] and FZD5 [79] participated in signal transduction pathway. Taken together, these results suggested that PERV which act as regulatory element involved in a variety of biological functions.

In this study, we identified a total of 162 TEs that are accessible in all investigated tissues. To trace the origin or evolutionary history of these ubiquitously accessible TEs, we calculated sequence similarity between 124 viral sequences (Additional file 1: Table S8) and 162 conserved accessible TEs sequences, and constructed a phylogenetic tree to reconstruct evolutionary history. Interestingly, we observed some conserved accessible TEs sequences were grouped with some viruses whose host was pig (Fig. 6), such as PBoV, TTSuVK2a, TTSuVK2b, PADV-A, PoBuV, PPV7 and Po-Circo-Like Virus 21 (Additional file 1: Table S9). We performed GO and KEGG enrichment analysis of all genes associated with conserved accessible TEs, and found that the biologic process of mitochondrial depolarization was highly enriched (Additional file 1: Table S10), which indicated that accessible TEs derived from viral sequences have a potential regulatory function in clearance of excess or impaired mitochondrial. Altogether, these evidences suggested that some special viral sequences have been integrated into the host genome and retained as regulatory elements that could play constitutive regulatory functions.

An annotation resource for regulatory mechanisms in complex traits

To demonstrate the utility of such resource in annotating candidate casual variants for complex traits, we examined overlap of 191 significant variants for fatty acid composition identified in a genome sequence based imputation GWAS [57] with the ATAC-seq peak regions identified in this study. A total of 22 variants were found to be coincident with the ATAC-seq peaks (Additional file 1: Table S7). Among these, a lead SNP (chr2:9,736,686, P = 2.10E-10) for C20:3n-6/C18:2n-6 was located in an ATAC-seq peak (chr2:9,736,357-9,737,907) that showed high intensity in brain and moderate intensity in muscle (Fig. 7a). This peak showed significant correlation with three genes (FADS1, FADS2 and RAB3IL1) (Additional file 1: Table S7). Among them, FADS1 (q = 1.56E-4) which exhibited strongest correlation (Fig. 7b), encodes delta-5 desaturase, one of the rate-limiting enzymes in the endogenous synthesis of polyunsaturated fatty acids (PUFAs) [80]. FADS2 which converts linoleate and alpha-linolenate into PUFAs, is one of the key limiting enzymes in the lipid metabolic pathway [81] (Fig. 7c). In addition, several other genes related to fatty acid have been identified. Osteoglycin (OGN) is involved in matrix assembly, cellular growth, and migration. Previous study showed that OGN was associated with fat acids composition traits to some extent [82] and regulated lipid differentiation through the Wnt/β-catenin signaling pathway [83]. TEAD3 which participates in the fatty acid, triacylglycerol, and ketone body metabolism pathway from the Reactome Pathways database [84], is a member of the transcriptional enhancer factor (TEF) family of transcription factors. Further experiment is needed to validate effect of the variants on the accessibility of the peaks, and in turn the expression of relevant genes and fatty acid composition phenotypes.

Discussion

Regulatory elements are crucial for gene expression regulation, which involved in the initiation and regulation of transcription in different tissues and organs and provided precise control of genes and restrict gene expression to certain cells and tissues throughout life [85]. Activation of regulatory elements needs ‘pioneer’ factor opening the compacted chromatin to recruit transcription coactivators and chromatin remodeling proteins [86, 87]. Regulatory elements usually contain multiple transcription factors binding sites (TFBS) which recognized by TFs. Genetic variations in regulatory elements affect gene expression usually through disrupting TFBS [88]. Therefore, the generation of multi-tissues atlas of open chromatin enabled exploration of gene expression regulation and non-coding genetic variations.

In this study, we performed ATAC-seq and RNA-seq across 20 tissues in pig at the same development stage to characterized chromatin state landscape, uncovering extensive tissue-specific regulation of gene expression. However, not a single study until now, as far as we are aware, has comprehensively and systematically investigated chromatin accessibility in multi-tissues by ATAC-seq and conjunction with transcriptomic analysis. In our study, we observed that the patterns of TSS proximal regulatory elements were more similar across tissues while distal regulatory elements were more tissue-specific, reflecting the architecture and function of distal regulatory elements have strong tissue specificity [89].

Consistent with the previous reports that regulatory elements mediate specific differentiation of tissues or cells [90, 91], we identified a large number of tissue-specific open chromatin regions which regulate the genes with function matching the biology of corresponding tissues. In addition, we identified tissue-specific TF motifs based on the enrichment of their cognate binding motifs, such as ATOH1 enriched in cerebellum and FOXA1 enriched in liver. And we acknowledge that TF motifs enrichment analysis needs to verify using other approaches such as ChIP-seq for specific TFs.

Connecting peaks to their target genes remains challenge given that Hi-C data is not available for the samples under study, particularly for distal peaks which are far from target genes, and regarding genes nearest to peaks as targets is error prone [92,93,94]. To solve this problem, we combine other published data to reveal the association between peak intensity and gene expression, which has been proved effective [90]. We observed significant negative peak - gene pairs showed strikingly different peak - gene distances compared to that in positive correlation pairs, which suggested that positive and negative correlation pairs may have different regulatory mechanism. To further understand tissue-specific regulatory relationships between peak intensity and gene expression, we filtered for tissue-specific correlation pairs (Additional file 2: Fig. S4). Our results suggest tissue-specific genes expression may occur through the activation of tissue-specific regulatory elements. Whereas, the above predictions require further support using 3D chromosome conformation data or validated through CRISPRi experiments.

To deepen our understanding of regulatory elements, here we performed profiles in TEs as they can provide binding sites for TFs and behave as alternative promoters and enhancers [95]. In this study, we observed a small proportion of accessible TEs, which were in good accordance with the classical polarization theory that TEs promote genetic innovation but also threaten genome stability [96]. Similar to previous study [52], we observed most of accessible TEs only present in one tissue. And tissue-specific accessible TEs were enriched in tissue-specific biological pathways, which support the findings of previous studies [52, 97,98,99]. Endogenous retroviruses (ERVs) are LTR retrotransposons (class I of TEs), which originated from exogenous retroviruses that infected the germ line throughout evolution [100, 101], which could manifest their function across multiple organs in an animal, thereby could have fundamental functional impact. In order to trace the potential source of accessible TEs, we performed multiple sequence alignments between the viral sequences and the conserved accessible TEs sequences, and found some candidate viral sequences (PBoV, TTSuVK2a, TTSuVK2b, PADV-A, PoBuV, PPV7 and Po-Circo-Like Virus 21) might be the source of conserved accessible TEs.

The annotation of regulatory elements has proven highly effective for the identification of candidate causative variants and screening candidate genes for complex traits [88, 102, 103]. Similarly, our results demonstrated that variants of fatty acid were enriched in active regulatory elements annotated by this study. Specifically, we speculate that a potential causative SNP (chr2:9,736,686, P = 2.10E-10) that was associated with C20:3n-6/C18:2n-6 and found within a tissue-conserved promoter, may regulate the expression of FADS1 which encodes the Δ5 desaturase enzyme - one of the rate-limiting enzymes in the endogenous synthesis of polyunsaturated fatty acids (PUFAs) [104]. Taken together, our data unveiled the chromatin landscapes for studying the regulation of gene expression, provided potential sources of chromatin accessibility and deconstructed underlying mechanisms of complex traits in pig. Further investigations with additional complementary data—such as single cell and different development stage data—are warranted to fully dissect biological mechanisms of gene expression.

Conclusions

We performed an integrated analysis of transcriptome sequencing and transposase-accessible chromatin with high throughput sequencing and provided a comprehensive understanding of tissue-specific accessible chromatin regions and tissue-specific TF motifs. Then, we constructed the relationship between regulatory elements and their target genes. This large regulatory network will serve as a foundation for understanding the precise regulatory mechanisms in different tissues of pig. We also analyzed accessible TEs which act as candidate regulatory elements. In addition, we analyzed the sequence characteristics of regulatory elements, and found some virus sequences similar to regulatory element sequences, which provided a certain basis for the essential characteristics of regulatory elements. Finally, the regulatory elements identified in this paper were used to annotate the GWAS loci that affect fatty acid traits, providing a reference for genetic analysis of complex traits.

Availability of data and materials

Supporting data produced in this study have been uploaded to National Genomics Data Center with the accession number CRA007240.

Abbreviations

TF:: Transcription factor
ATAC-seq:: Transposase-accessible chromatin with high throughput sequencing
DNase-seq:: DNase I hypersensitive site sequencing
ERVs:: Endogenous retroviruses
FRIP:: Fraction of reads in peaks
LD:: Longissimus muscle
MNase-seq:: Micrococcal nuclease sequencing
NFR:: Nucleosome-free regions
NSC:: Normalized strand cross-correlation coefficient
OGN:: Osteoglycin
PADV-A:: Porcine mastadenovirus A
NRF:: Non-redundancy Fraction
PBC1:: PCR Bottlenecking Coefficient 1
PBC2:: PCR Bottlenecking Coefficient 2
PBoV:: Parvoviridae Bocaparvovirus Porcine bocavirus
PoBuV:: Protoparvovirus Zsana/2013/HUN
PoClv21:: Po-Circo-Like Virus 21
PPV7:: Porcine parvovirus 7
PUFAs:: Polyunsaturated fatty acids
RSC:: Relative strand cross-correlation coefficient
SI:: Small intestine
SM:: Semimembranosus
TE:: Transposable element
TEF:: Transcriptional enhancer factor
TFBS:: Transcription factors binding site
TPM:: Transcripts per million
TSS:: Transcription start site
TTSuVK2a:: Torque teno sus virus k2a
TTSuVK2b:: Torque teno sus virus k2b

References

Simpson RT. Nucleosome positioning can affect the function of a cis-acting DNA element in vivo. Nature. 1990;343(6256):387–9. https://doi.org/10.1038/343387a0.
Article CAS PubMed Google Scholar
Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang Z, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132(5):887–98. https://doi.org/10.1016/j.cell.2008.02.022.
Article CAS PubMed Google Scholar
Kouzarides T. Chromatin modifications and their function. Cell. 2007;128(4):693–705. https://doi.org/10.1016/j.cell.2007.02.005.
Article CAS PubMed Google Scholar
Lee CK, Shibata Y, Rao B, Strahl BD, Lieb JD. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet. 2004;36(8):900–5. https://doi.org/10.1038/ng1400.
Article CAS PubMed Google Scholar
Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007;25(2):244–8. https://doi.org/10.1038/nbt1279.
Article CAS PubMed Google Scholar
Sheffield NC, Furey TS. Identifying and characterizing regulatory sequences in the human genome with chromatin accessibility assays. Genes (Basel). 2012;3(4):651–70. https://doi.org/10.3390/genes3040651.
Article CAS Google Scholar
Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014;515(7527):355–64. https://doi.org/10.1038/nature13992.
Article CAS PubMed PubMed Central Google Scholar
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82. https://doi.org/10.1038/nature11232.
Article CAS PubMed PubMed Central Google Scholar
Rivera CM, Ren B. Mapping human epigenomes. Cell. 2013;155(1):39–55. https://doi.org/10.1016/j.cell.2013.09.011.
Article CAS PubMed Google Scholar
Lettice LA, Williamson I, Devenney PS, Kilanowski F, Dorin J, Hill RE. Development of five digits is controlled by a bipartite long-range cis-regulator. Development. 2014;141(8):1715–25. https://doi.org/10.1242/dev.095430.
Article CAS PubMed PubMed Central Google Scholar
Reddi PP, Urekar CJ, Abhyankar MM, Ranpura SA. Role of an insulator in testis-specific gene transcription. Ann N Y Acad Sci. 2007;1120:95–103. https://doi.org/10.1196/annals.1411.012.
Article CAS PubMed Google Scholar
Smith RP, Eckalbar WL, Morrissey KM, Luizon MR, Hoffmann TJ, Sun X, et al. Genome-wide discovery of drug-dependent human liver regulatory elements. PLoS Genet. 2014;10(10):e1004648. https://doi.org/10.1371/journal.pgen.1004648.
Article CAS PubMed PubMed Central Google Scholar
Wang D, Chen H, Momary KM, Cavallari LH, Johnson JA, Sadee W. Regulatory polymorphism in vitamin K epoxide reductase complex subunit 1 (VKORC1) affects gene expression and warfarin dose requirement. Blood. 2008;112(4):1013–21. https://doi.org/10.1182/blood-2008-03-144899.
Article CAS PubMed PubMed Central Google Scholar
Smemo S, Tena JJ, Kim KH, Gamazon ER, Sakabe NJ, Gomez-Marin C, et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507(7492):371–5. https://doi.org/10.1038/nature13138.
Article CAS PubMed PubMed Central Google Scholar
Kapoor A, Sekar RB, Hansen NF, Fox-Talbot K, Morley M, Pihur V, et al. An enhancer polymorphism at the cardiomyocyte intercalated disc protein NOS1AP locus is a major regulator of the QT interval. Am J Hum Genet. 2014;94(6):854–69. https://doi.org/10.1016/j.ajhg.2014.05.001.
Article CAS PubMed PubMed Central Google Scholar
Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods. 2017;14(10):959–62. https://doi.org/10.1038/nmeth.4396.
Article CAS PubMed PubMed Central Google Scholar
Markenscoff-Papadimitriou E, Whalen S, Przytycki P, Thomas R, Binyameen F, Nowakowski TJ, et al. A chromatin accessibility atlas of the developing human telencephalon. Cell. 2020;182(3):754–69.e18. https://doi.org/10.1016/j.cell.2020.06.002.
Article CAS PubMed PubMed Central Google Scholar
Wu J, Huang B, Chen H, Yin Q, Liu Y, Xiang Y, et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature. 2016;534(7609):652–7. https://doi.org/10.1038/nature18606.
Article CAS PubMed Google Scholar
Gorkin DU, Barozzi I, Zhao Y, Zhang Y, Huang H, Lee AY, et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2020;583(7818):744–51. https://doi.org/10.1038/s41586-020-2093-3.
Article CAS PubMed PubMed Central Google Scholar
Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018;362(6413):eaav1898. https://doi.org/10.1126/science.aav1898.
The ENCODE Project Consortium. The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004;306(5696):636–40. https://doi.org/10.1126/science.1105136.
Article Google Scholar
Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–30. https://doi.org/10.1038/nature14248.
Article CAS PubMed Central Google Scholar
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, et al. Coordinated international action to accelerate genome-to-phenome with FAANG, the functional annotation of animal genomes project. Genome Biol. 2015;16:57. https://doi.org/10.1186/s13059-015-0622-4.
Article CAS PubMed PubMed Central Google Scholar
Pan Z, Yao Y, Yin H, Cai Z, Wang Y, Bai L, et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat Commun. 2021;12(1):5848. https://doi.org/10.1038/s41467-021-26153-7.
Article CAS PubMed PubMed Central Google Scholar
Yue J, Hou X, Liu X, Wang L, Gao H, Zhao F, et al. The landscape of chromatin accessibility in skeletal muscle during embryonic development in pigs. J Anim Sci Biotechnol. 2021;12(1):56. https://doi.org/10.1186/s40104-021-00577-z.
Article CAS PubMed PubMed Central Google Scholar
Kern C, Wang Y, Xu X, Pan Z, Halstead M, Chanthavixay G, et al. Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research. Nat Commun. 2021;12(1):1821. https://doi.org/10.1038/s41467-021-22100-8.
Article CAS PubMed PubMed Central Google Scholar
Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010;42(7):631–4. https://doi.org/10.1038/ng.600.
Article CAS PubMed Google Scholar
Ito J, Sugimoto R, Nakaoka H, Yamada S, Kimura T, Hayano T, et al. Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses. PLoS Genet. 2017;13(7):e1006883. https://doi.org/10.1371/journal.pgen.1006883.
Article CAS PubMed PubMed Central Google Scholar
Jang HS, Shah NM, Du AY, Dailey ZZ, Pehrsson EC, Godoy PM, et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat Genet. 2019;51(4):611–7. https://doi.org/10.1038/s41588-019-0373-3.
Article CAS PubMed PubMed Central Google Scholar
Sykes M, Sachs DH. Transplanting organs from pigs to humans. Sci Immunol. 2019;4(41):eaau6298. https://doi.org/10.1126/sciimmunol.aau6298.
Denner J, Tonjes RR. Infection barriers to successful xenotransplantation focusing on porcine endogenous retroviruses. Clin Microbiol Rev. 2012;25(2):318–43. https://doi.org/10.1128/CMR.05011-11.
Article CAS PubMed PubMed Central Google Scholar
Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol. 2015;109:21.29.1–9. https://doi.org/10.1002/0471142727.mb2129s109.
Article Google Scholar
Trim Galore. https://github.com/FelixKrueger/TrimGalore. Accessed 17 June 2022.
Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357–9. https://doi.org/10.1038/nmeth.1923.
Article CAS PubMed PubMed Central Google Scholar
Picard. http://broadinstitute.github.io/picard. Accessed 17 June 2022.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. https://doi.org/10.1093/bioinformatics/btp352.
Article CAS PubMed PubMed Central Google Scholar
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. https://doi.org/10.1186/gb-2008-9-9-r137.
Article CAS PubMed PubMed Central Google Scholar
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. https://doi.org/10.1093/bioinformatics/btq033.
Article CAS PubMed PubMed Central Google Scholar
R Core Team. R. A language and environment for statistical computing. Vienna: R Foundation for statistical computing; 2013. URL http://www.R-project.org/
Google Scholar
Gu Z, Gu L, Eils R, Schlesner M, Brors B. circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30(19):2811–2. https://doi.org/10.1093/bioinformatics/btu393.
Article CAS PubMed Google Scholar
Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31(22):3718–20. https://doi.org/10.1093/bioinformatics/btv428.
Article CAS PubMed PubMed Central Google Scholar
Genomic Association Tester. https://github.com/AndreasHeger/gat/blob/master/doc/contents.rst. Accessed 17 June 2022.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2012;29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635.
Article CAS PubMed PubMed Central Google Scholar
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5. https://doi.org/10.1038/nbt.3122.
Article CAS PubMed PubMed Central Google Scholar
Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30. https://doi.org/10.1093/bioinformatics/btt656.
Article CAS PubMed Google Scholar
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091–3. https://doi.org/10.1093/bioinformatics/btp101.
Article CAS PubMed PubMed Central Google Scholar
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89. https://doi.org/10.1016/j.molcel.2010.05.004.
Article CAS PubMed PubMed Central Google Scholar
Zhao Y, Hou Y, Xu Y, Luan Y, Zhou H, Qi X, et al. A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome. Nat Commun. 2021;12(1):2217. https://doi.org/10.1038/s41467-021-22448-x.
Article CAS PubMed PubMed Central Google Scholar
Yang Y, Fan X, Yan J, Chen M, Zhu M, Tang Y, et al. A comprehensive epigenome atlas reveals DNA methylation regulating skeletal muscle development. Nucleic Acids Res. 2021;49(3):1313–29. https://doi.org/10.1093/nar/gkaa1203.
Article CAS PubMed PubMed Central Google Scholar
Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004. https://doi.org/10.1002/0471250953.bi0410s25 Chapter 4:Unit 4.10.
Favorov A, Mularoni L, Cope LM, Medvedeva Y, Mironov AA, Makeev VJ, et al. Exploring massive, genome scale datasets with the GenometriCorr package. PLoS Comput Biol. 2012;8(5):e1002529. https://doi.org/10.1371/journal.pcbi.1002529.
Article CAS PubMed PubMed Central Google Scholar
Miao B, Fu S, Lyu C, Gontarz P, Wang T, Zhang B. Tissue-specific usage of transposable element-derived promoters in mouse development. Genome Biol. 2020;21(1). https://doi.org/10.1186/s13059-020-02164-3.
Roux S, Paez-Espino D, Chen IA, Palaniappan K, Ratner A, Chu K, et al. IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 2021;49(D1):D764–D75. https://doi.org/10.1093/nar/gkaa946.
Article CAS PubMed Google Scholar
Katoh K, Misawa K, Ki K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66.
Article CAS Google Scholar
FigTree. http://tree.bio.ed.ac.uk/software/figtree. Accessed 17 June 2022.
Gorkin DU, Barozzi I, Zhao Y, Zhang Y, Ren B. Author correction: an atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2021;589(7842):1.
Article Google Scholar
Zhang J, Zhang Y, Gong H, Cui L, Ma J, Chen C, et al. Landscape of loci and candidate genes for muscle fatty acid composition in pigs revealed by multiple population association analysis. Front Genet. 2019;10:1067. https://doi.org/10.3389/fgene.2019.01067.
Article CAS PubMed PubMed Central Google Scholar
Ko JY, Oh S, Yoo KH. Functional enhancers as master regulators of tissue-specific gene regulation and cancer development. Mol Cell. 2017;40(3):169–77. https://doi.org/10.14348/molcells.2017.0033.
Article CAS Google Scholar
Iyer R, Jenkinson CP, Vockley JG, Kern RM, Grody WW, Cederbaum S. The human arginases and arginase deficiency. J Inherit Metab Dis. 1998;21 Suppl 1:86–100. https://doi.org/10.1023/a:1005313809037.
Article CAS PubMed Google Scholar
Chang CH, Zanini M, Shirvani H, Cheng JS, Yu H, Feng CH, et al. Atoh1 controls primary cilia formation to allow for SHH-triggered granule neuron progenitor proliferation. Dev Cell. 2019;48(2):184–99.e5. https://doi.org/10.1016/j.devcel.2018.12.017.
Article CAS PubMed Google Scholar
Stone NR, Gifford CA, Thomas R, Pratt KJB, Samse-Knapp K, Mohamed TMA, et al. Context-specific transcription factor functions regulate epigenomic and transcriptional dynamics during cardiac reprogramming. Cell Stem Cell. 2019;25(1):87–102.e9. https://doi.org/10.1016/j.stem.2019.06.012.
Article CAS PubMed PubMed Central Google Scholar
Halstead MM, Ma X, Zhou C, Schultz RM, Ross PJ. Chromatin remodeling in bovine embryos indicates species-specific regulation of genome activation. Nat Commun. 2020;11(1):4654. https://doi.org/10.1038/s41467-020-18508-3.
Article CAS PubMed PubMed Central Google Scholar
Wu Y, Qi T, Wang H, Zhang F, Zheng Z, Phillips-Cremins JE, et al. Promoter-anchored chromatin interactions predicted from genetic analysis of epigenomic data. Nat Commun. 2020;11(1):2061. https://doi.org/10.1038/s41467-020-15587-0.
Article CAS PubMed PubMed Central Google Scholar
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167(5):1369–84.e19. https://doi.org/10.1016/j.cell.2016.09.037.
Article CAS PubMed PubMed Central Google Scholar
Finberg KE, Heeney MM, Campagna DR, Aydinok Y, Pearson HA, Hartman KR, et al. Mutations in TMPRSS6 cause iron-refractory iron deficiency anemia (IRIDA). Nat Genet. 2008;40(5):569–71. https://doi.org/10.1038/ng.130.
Article CAS PubMed PubMed Central Google Scholar
Jin P, Qin S, Chen X, Song Y, Li-Ling J, Xu X, et al. Evolutionary rate of human tissue-specific genes are related with transposable element insertions. Genetica. 2013;140(10-12):513–23. https://doi.org/10.1007/s10709-013-9700-2.
Article Google Scholar
Trizzino M, Kapusta A, Brown CD. Transposable elements generate regulatory novelty in a tissue-specific fashion. BMC Genomics. 2018;19(1):468. https://doi.org/10.1186/s12864-018-4850-3.
Article CAS PubMed PubMed Central Google Scholar
Bendinelli M, Matteucci D, Friedman H. Retrovirus-induced acquired Immunodeficiencies. In: Advances in cancer research. 1985;45:125–81. https://doi.org/10.1016/S0065-230X(08)60268-7.
Google Scholar
Wegman-Points LJ, Teoh-Fitzgerald ML, Mao G, Zhu Y, Fath MA, Spitz DR, et al. Retroviral-infection increases tumorigenic potential of MDA-MB-231 breast carcinoma cells by expanding an aldehyde dehydrogenase (ALDH1) positive stem-cell like population. Redox Biol. 2014;2:847–54. https://doi.org/10.1016/j.redox.2014.06.006.
Article CAS PubMed PubMed Central Google Scholar
York JD. Regulation of nuclear processes by inositol polyphosphates. Biochim Biophys Acta. 2006;1761(5-6):552–9. https://doi.org/10.1016/j.bbalip.2006.04.014.
Article CAS PubMed Google Scholar
Dick RA, Zadrozny KK, Xu C, Schur FKM, Lyddon TD, Ricana CL, et al. Inositol phosphates are assembly co-factors for HIV-1. Nature. 2018;560(7719):509–12. https://doi.org/10.1038/s41586-018-0396-4.
Article CAS PubMed PubMed Central Google Scholar
Verma Y, Mehra U, Pandey DK, Kar J, Perez-Martinez X, Jana SS, et al. MRX8, the conserved mitochondrial YihA GTPase family member, is required for de novo Cox1 synthesis at suboptimal temperatures in Saccharomyces cerevisiae. Mol Biol Cell. 2021;32(21):ar16. https://doi.org/10.1091/mbc.E20-07-0457.
Article CAS PubMed PubMed Central Google Scholar
Sato Y, Tate H, Yoshizawa F, Sato Y. Data on the proliferation and differentiation of C2C12 myoblast treated with branched-chain ketoacid dehydrogenase kinase inhibitor. Data Brief. 2020;31:105766. https://doi.org/10.1016/j.dib.2020.105766.
Article PubMed PubMed Central Google Scholar
Duncan DM, Kiefel P, Duncan I. Mutants for drosophila Isocitrate dehydrogenase 3b are defective in mitochondrial function and larval cell death. G3 (Bethesda). 2017;7(3):789–99. https://doi.org/10.1534/g3.116.037366.
Article CAS Google Scholar
Gonzalez-Menendez P, Romano M, Yan H, Deshmukh R, Papoin J, Oburoglu L, et al. An IDH1-vitamin C crosstalk drives human erythroid development by inhibiting pro-oxidant mitochondrial metabolism. Cell Rep. 2021;34(5):108723. https://doi.org/10.1016/j.celrep.2021.108723.
Article CAS PubMed PubMed Central Google Scholar
Kofler M, Heuer K, Zech T, Freund C. Recognition sequences for the GYF domain reveal a possible spliceosomal function of CD2BP2. J Biol Chem. 2004;279(27):28292–7. https://doi.org/10.1074/jbc.M402008200.
Article CAS PubMed Google Scholar
Orozco D, Edbauer D. FUS-mediated alternative splicing in the nervous system: consequences for ALS and FTLD. J Mol Med (Berl). 2013;91(12):1343–54. https://doi.org/10.1007/s00109-013-1077-2.
Article Google Scholar
Echevarria-Andino ML, Allen BL. The hedgehog co-receptor BOC differentially regulates SHH signaling during craniofacial development. Development. 2020;147(23):dev189076. https://doi.org/10.1242/dev.189076.
Katoh M, Katoh M. Molecular genetics and targeted therapy of WNT-related human diseases (review). Int J Mol Med. 2017;40(3):587–606. https://doi.org/10.3892/ijmm.2017.3071.
Article CAS PubMed PubMed Central Google Scholar
Lee JM, Lee H, Kang S, Park WJ. Fatty acid Desaturases, polyunsaturated fatty acid regulation, and biotechnological advances. Nutrients. 2016;8(1):23. https://doi.org/10.3390/nu8010023.
Xie D, Fu Z, Wang S, You C, Monroig Ó, Tocher DR, et al. Characteristics of the fads2 gene promoter in marine teleost Epinephelus coioides and role of Sp1-binding site in determining promoter activity. Sci Rep. 2018;8(1):5305. https://doi.org/10.1038/s41598-018-23668-w.
Article CAS PubMed PubMed Central Google Scholar
Mroz TL, Eves-van den Akker S, Bernat A, Skarzynska A, Pryszcz L, Olberg M, et al. Transcriptome analyses of mosaic (MSC) mitochondrial mutants of cucumber in a highly inbred nuclear background. G3 (Bethesda). 2018;8(3):953–65. https://doi.org/10.1534/g3.117.300321.
Article CAS Google Scholar
Donati G, Proserpio V, Lichtenberger BM, Natsuga K, Sinclair R, Fujiwara H, et al. Epidermal Wnt/beta-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors. Proc Natl Acad Sci U S A. 2014;111(15):E1501–9. https://doi.org/10.1073/pnas.1312880111.
Article CAS PubMed PubMed Central Google Scholar
Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2022;50(D1):D687–D92. https://doi.org/10.1093/nar/gkab1028.
Article CAS PubMed Google Scholar
Hernandez-Garcia CM, Finer JJ. Identification and validation of promoters and cis-acting regulatory elements. Plant Sci. 2014;217-218:109–19. https://doi.org/10.1016/j.plantsci.2013.12.007.
Article CAS PubMed Google Scholar
Visel A, Rubin EM, Pennacchio LA. Genomic views of distant-acting enhancers. Nature. 2009;461(7261):199–205. https://doi.org/10.1038/nature08451.
Article CAS PubMed PubMed Central Google Scholar
Zhu F, Farnung L, Kaasinen E, Sahu B, Yin Y, Wei B, et al. The interaction landscape between transcription factors and the nucleosome. Nature. 2018;562(7725):76–81. https://doi.org/10.1038/s41586-018-0549-5.
Article CAS PubMed PubMed Central Google Scholar
Huo Y, Li S, Liu J, Li X, Luo XJ. Functional genomics reveal gene regulatory mechanisms underlying schizophrenia risk. Nat Commun. 2019;10(1):670. https://doi.org/10.1038/s41467-019-08666-4.
Article CAS PubMed PubMed Central Google Scholar
Ong CT, Corces VG. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet. 2011;12(4):283–93. https://doi.org/10.1038/nrg2957.
Article CAS PubMed PubMed Central Google Scholar
Zhu Y, Zhou Z, Huang T, Zhang Z, Li W, Ling Z, et al. Mapping and analysis of a spatiotemporal H3K27ac and gene expression spectrum in pigs. Sci China Life Sci. 2022. https://doi.org/10.1007/s11427-021-2034-5.
Smith AD, Sumazin P, Zhang MQ. Tissue-specific regulatory elements in mammalian promoters. Mol Syst Biol. 2007;3:73. https://doi.org/10.1038/msb4100114.
Article PubMed PubMed Central Google Scholar
Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture hi-C. Nat Genet. 2015;47(6):598–606. https://doi.org/10.1038/ng.3286.
Article CAS PubMed Google Scholar
Schoenfelder S, Furlan-Magaril M, Mifsud B, Tavares-Cadete F, Sugar R, Javierre BM, et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 2015;25(4):582–97. https://doi.org/10.1101/gr.185272.114.
Article CAS PubMed PubMed Central Google Scholar
Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489(7414):109–13. https://doi.org/10.1038/nature11279.
Article CAS PubMed PubMed Central Google Scholar
Pehrsson EC, Choudhary MNK, Sundaram V, Wang T. The epigenomic landscape of transposable elements across normal human development and anatomy. Nat Commun. 2019;10(1):5640. https://doi.org/10.1038/s41467-019-13555-x.
Article CAS PubMed PubMed Central Google Scholar
Senft AD, Macfarlan TS. Transposable elements shape the evolution of mammalian development. Nat Rev Genet. 2021;22(11):691–711. https://doi.org/10.1038/s41576-021-00385-1.
Article CAS PubMed Google Scholar
Jacques PE, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 2013;9(5):e1003504. https://doi.org/10.1371/journal.pgen.1003504.
Article CAS PubMed PubMed Central Google Scholar
Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 2014;24(12):1963–76. https://doi.org/10.1101/gr.168872.113.
Article CAS PubMed PubMed Central Google Scholar
Hutchins AP, Pei D. Transposable elements at the center of the crossroads between embryogenesis, embryonic stem cells, reprogramming, and long non-coding RNAs. Sci Bull (Beijing). 2015;60(20):1722–33. https://doi.org/10.1007/s11434-015-0905-x.
Article CAS Google Scholar
Garcia-Montojo M, Doucet-O'Hare T, Henderson L, Nath A. Human endogenous retrovirus-K (HML-2): a comprehensive review. Crit Rev Microbiol. 2018;44(6):715–38. https://doi.org/10.1080/1040841X.2018.1501345.
Article PubMed PubMed Central Google Scholar
Mager DL, Stoye JP. Mammalian endogenous retroviruses. Microbiol Spectr. 2015;3(1):MDNA3-0009-2014. https://doi.org/10.1128/microbiolspec.MDNA3-0009-2014.
Article CAS PubMed Google Scholar
Xiang R, Berg IVD, MacLeod IM, Hayes BJ, Prowse-Wilkins CP, Wang M, et al. Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits. Proc Natl Acad Sci U S A. 2019;116(39):19398–408. https://doi.org/10.1073/pnas.1904159116.
Article CAS PubMed PubMed Central Google Scholar
Fang L, Liu S, Liu M, Kang X, Lin S, Li B, et al. Functional annotation of the cattle genome through systematic discovery and characterization of chromatin states and butyrate-induced variations. BMC Biol. 2019;17(1):68. https://doi.org/10.1186/s12915-019-0687-8.
Article CAS PubMed PubMed Central Google Scholar
Zhao R, Tian L, Zhao B, Sun Y, Cao J, Chen K, et al. FADS1 promotes the progression of laryngeal squamous cell carcinoma through activating AKT/mTOR signaling. Cell Death Dis. 2020;11(4):272. https://doi.org/10.1038/s41419-020-2457-5.
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We are grateful to colleagues in State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University for sample collection.

Funding

This work was supported by National Key Research and Development Program of China (2021YFF1000601).

Author information

Tao Jiang, Ziqi Ling contributed equally to this manuscript.

Authors and Affiliations

State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, China
Tao Jiang, Ziqi Ling, Zhimin Zhou, Xiaoyun Chen, Liqing Chen, Sha Liu, Yingchun Sun, Jiawen Yang, Bin Yang, Jianzhen Huang & Lusheng Huang

Authors

Tao Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Ziqi Ling
View author publications
You can also search for this author in PubMed Google Scholar
Zhimin Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoyun Chen
View author publications
You can also search for this author in PubMed Google Scholar
Liqing Chen
View author publications
You can also search for this author in PubMed Google Scholar
Sha Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yingchun Sun
View author publications
You can also search for this author in PubMed Google Scholar
Jiawen Yang
View author publications
You can also search for this author in PubMed Google Scholar
Bin Yang
View author publications
You can also search for this author in PubMed Google Scholar
Jianzhen Huang
View author publications
You can also search for this author in PubMed Google Scholar
Lusheng Huang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

JZH and BY analyzed the data and wrote the manuscript; TJ performed the experiments, analyzed the data and wrote the manuscript; ZQL performed the experiment and wrote the manuscript; ZMZ, XYC, LQC, SL, YCS and JWY assisted with experiments and data analysis. LSH designed this study, wrote and edited the manuscript. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Bin Yang or Jianzhen Huang.

Ethics declarations

Ethics approval and consent to participate

All of the procedures involving animals are in compliance with the care and use guidelines of experimental animals established by the Ministry of Science and Technology of China. The Ethics Committee of Jiangxi Agricultural University approved this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1: Table S1.

Summary information on the ATAC-seq samples investigated in this study. Table S2. Summary information on the RNA-seq samples investigated in this study. Table S3. The information of download data. Table S4. Comparison with peaks of previous study designed by zhou et al. Table S5. Motif enrichment analysis of shared peaks. Table S6. The information of PERV overlapped peaks. Table S7. The information of GWAS SNPs which located in peak regions. Table S8. The information of virus data. Table S9. The information of virus data with sequence homology to shared open TEs. Table S10. GO and KEGG enrichment analysis of conserved accessible TEs homologous to viral sequences.

Additional file 2: Fig. S1.

The analysis process and characteristics of peaks. (a) and (b) The ATAC-seq signal enrichment around TSSs and insert size distribution of ATAC-seq for 2 samples (brain_sample1 and brain_sample2). (c) Proportion of tissue-specific peaks that were distal peaks or proximal peaks. (d) Distribution of peak size obtained in this study. Fig. S2. Identifying the functional pathway and tissue specificity via RNA-seq. (a) Hierarchical clustering using gene expression. (b) Functional enrichment of tissue-specific expression genes for each tissue based on Gene Ontology (GO) biological processes (left). Distribution of the number of tissue-specific genes (right). (c) Representative examples of heart specific gene which is highly expressed in heart tissue. (d) Bar plot showing representative biological processes enriched in target genes of the system specific peaks and conserved peaks. Different colors refer to different systems, and light yellow indicates biological processes enriched in target genes of conserved peaks. (e) Bar plot showing peak intensity of liver specific peaks which located close to ARG1 (a protein coding gene expressed primarily in the liver and involved in the urea cycle). (f) Functional enrichment of genes for each tissue based on GO biological processes. The columns represent 20 tissues. The rows represent GO terms in each tissue (upper). Representative examples of enriched biological processes in target genes of heart specific peaks (bottom). Fig. S3. Hierarchical clustering inferred from peak intensity and gene expression. (a) Hierarchical clustering using peak intensity. The top 100,000 peaks with high intensity were used for the analysis. (b) Hierarchical clustering using gene expression. The top 5000 highly variable genes were used for the analysis. Fig. S4. The flowchart and outcomes from the correlation between peak intensity and gene expression.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Jiang, T., Ling, Z., Zhou, Z. et al. Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs. J Animal Sci Biotechnol 13, 112 (2022). https://doi.org/10.1186/s40104-022-00767-3

Download citation

Received: 18 April 2022
Accepted: 10 August 2022
Published: 11 October 2022
DOI: https://doi.org/10.1186/s40104-022-00767-3

Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs

Abstract

Background

Results

Conclusions

Similar content being viewed by others

Enhancer-promoter interaction maps provide insights into skeletal muscle-related traits in pig genome

Constructing eRNA-mediated gene regulatory networks to explore the genetic basis of muscle and fat-relevant traits in pigs

A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome

Background

Methods and materials

Sample collection

ATAC-seq library preparation and sequencing

ATAC-seq processing

RNA-seq library preparation, sequencing and processing

Gene ontology enrichment analysis

Transcription factor motifs

Correlation of peaks with gene expression

Spatial correlation between peaks and TEs

Identification of accessible TEs

Multiple sequence alignment

The enrichment of GWAS signals

Results

Mapping the open chromatin states of 20 pig tissues

Tissue specificity of the open chromatin peaks

Ubiquitously accessible chromatin peaks

Tissue-specific gene expressions

Joint profiling of chromatin accessibility and gene expression

Open chromatin states of transposable elements in different pig tissues

An annotation resource for regulatory mechanisms in complex traits

Discussion

Conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Supplementary Information

Additional file 1: Table S1.

Additional file 2: Fig. S1.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation