Background

Fatty acids (FA) are crucial for living organisms, as they serve as important energy sources and, in humans, they are known to play an important role in health. Furthermore, FA composition plays a significant role in meat quality in pigs [1], including technological and sensorial quality of meat products [2]. Fatty acids can be classified as saturated (SFA, absence of double bonds) or unsaturated (presence of one or more double bonds) with monounsaturated (MUFA) and polyunsaturated (PUFA).

Fat, liver, and muscle are important tissues for FA metabolism and have different FA compositions, which are affected by several factors including genetic, management and environmental factors, and by gene expression, among others. Notwithstanding, the relationship between FA composition and gene expression is complex and still not fully elucidated. For instance, the compositional distribution of lipids in pig muscle correlates with intramuscular fat content (IMF, also referred to as marbling). Specifically, the deposition of FA classes when neutral lipid content is adjusted for IMF fractionation as a covariate suggests that deposition of SFA and MUFA increases significantly with age from 6 to 9 months, whereas that of PUFA decreases significantly [3]. For complex phenotypes such as porcine IMF content and its FA composition, in a previous study based on a hierarchical clustering analysis [4], we have shown that lipogenic-related genes are in general positively correlated with MUFA, while the lipolytic-related genes are specifically positively correlated with PUFA. Likewise, by conducting RNA sequencing (RNA-Seq) experiments, several studies have reported numerous candidate genes with differential effects or global changes on intramuscular FA composition across several tissues, such as backfat, liver, and muscle [5,6,7,8,9]. Our research group has also identified links between the muscle transcriptome and its FA profile in Iberian × Duroc backcross (BC1_DU) pigs [9]. In that study, we used a univariate extended linear model and found that several candidate genes involved in carbohydrate and lipid metabolism were significantly associated with FA composition traits (mainly MUFA, PUFA, and FA ratios). Other relevant associations with FA ratios (e.g., ω6/ω3 and C18:2n-6/C18:3n-3, both including major FA) and their main transcriptional regulators still remain to be investigated. A better understanding of biological mechanisms that underlie complex traits requires integrative approaches via gene networks [10], as well as the identification of the main regulators driving gene-by-gene interactions [11]. In the context of such multivariate and integrative approaches, there are several tools that allow the exploration and integration of biological datasets with a focus on variable selection. Among these, the mixOmics R package includes a plethora of multivariate methodologies with extensive statistical approximations [12].

In the study reported here, we analyzed our existing muscle data (i.e., expression levels of genes and only individual FA phenotypes) of the BC1_DU population [9]. From a multivariate perspective and using a complementary integrative analysis, we investigated the gene expression of potential candidate genes and novel regulators for intramuscular FA composition. We focused on integrative analyses between intramuscular FA and gene expression profiles in pigs to identify representative FA phenotypes, key regulators, candidate genes, and biological processes and metabolic pathways related to the FA composition in muscle.

Methods

Animals and phenotypic data

The experimental backcross population (25% Iberian and 75% Duroc, BC1_DU) used in this study is described in [9]. Briefly, it included 129 animals, which were raised under the standard, intensive conditions of production; and fed ad libitum with a cereal-based commercial diet and free access to water. Details on the experimental BC1_DU generation, animal raising, and feeding are in [13]. Animal procedures were carried out according to the Spanish Policy for Animal Protection RD1201/05, which meets the European Union Directive 86/609 about the protection of animals used in experimentation. This study was conducted in accordance with relevant guidelines and regulations of the animal care and use committee of the Institut de Recerca i Tecnologia Agroalimentàries (IRTA), which adopts “The European Code of Conduct for Research Integrity”. The experimental protocol was approved by the Ethical Committee of the IRTA. Our study is also reported in full compliance with the ARRIVE guidelines (https://arriveguidelines.org/).

Animals were slaughtered in the same commercial abattoir in Mollerussa (Spain). Samples of the longissimus dorsi (LD) skeletal muscle (59 females and 70 non-castrated males) distributed in five slaughterhouse batches were collected, immediately snap frozen in liquid nitrogen and stored at − 80 °C until analysis. At slaughter, the average age of the pigs was 190 days (ranging from 174 to 205 days), with an average carcass weight (CW) of 73.70 kg (ranging from 46.10 to 109.20 kg).

The composition of FA in the chain-length range of C14-C20 in LD muscle (n = 129) was determined using a gas chromatography of methyl esters protocol [14]. Briefly, 200 g of muscle sample from each BC1_DU pig were homogenized and used to measure the FA profile in duplicate. Crespo-Piazuelo et al. [15] provide additional information on the muscle FA composition in the BC1_DU population. Then, each individual FA methyl ester (n = 15 FA phenotypes) was quantified and expressed as a percentage of the total amount of FA (Table 1).

Table 1 Descriptive statistics of the FA composition phenotypes measured in percentages in the longissimus dorsi muscle from BC1_DU pigs

Gene expression data

Total RNA was isolated from a muscle sample (100 mg) of each of the 129 animals using the RiboPure™ Isolation kit for High-Quality Total RNA (Ambion®; Austin, TX, USA) following the manufacturer’s recommendations. RNA quantification and purity were estimated with a NanoDrop ND-1000 spectrophotometer (NanoDrop products, Wilmington, DE, USA). RNA integrity was checked by an Agilent Bioanalyzer-2100 (Agilent Technologies, Inc., Santa Clara, CA, USA), and samples with an RNA integrity number (RIN) greater than 7 were used for the RNA-Seq experiment.

Library preparation and sequencing were performed at the CNAG institute (Centro Nacional de Análisis Genómico, Barcelona, Spain). For each sample, one paired-end library with an insert size of approximately 300 bp was prepared using the TruSeq Stranded mRNA kit (Illumina, Inc.; San Diego, USA). Libraries were labeled by barcoding, pooled, and run on the Illumina HiSeq 3000/HiSeq4000 systems (Illumina, Inc.; San Diego, USA), yielding on average 45.09 million 2 × 75 bp paired-end reads per sample.

Bioinformatic and statistical analyses

Quality control and basic statistics of the sequencing data were performed using the FastQC (v0.11.9) [16] and MultiQC v0.7 [17] programs. Sequencing reads were mapped with the STAR software (v2.7.9a) using default parameters [18], and the Sscrofa11.1 pig genome assembly as reference. On average, 90.1% of reads were uniquely mapped (ranging from 80.5 to 96.1%). Gene expression was quantified by the RSEM (v1.2.28) software [19] using default parameters and annotation from Ensembl Pig Genes 97.

Pre-processing of the raw count matrix was performed by filtering based on a minimum of 129 reads per row and 15,091 genes were retained for further analyses. The raw count data were transformed into counts per million (CPM) to normalize the values (i.e., with log = TRUE and prior.count = 1 arguments) using the edgeR v3.38.1 package [20]. Then, the 15,091 retained genes were matched against the newer annotation from Ensembl Pig Genes 104 (Sscrofa11.1) using the biomaRt package [21] v2.52.0, which left 12,381 genes with a gene name or symbol.

A regularized canonical correlation analysis (rCCA) was performed using the expression dataset of the 12,381 genes (matrix Y ) and the 15 FA traits (matrix X) measured on the 129 individuals. The rCCA multivariate approach is implemented in the mixOmics v6.14.1 package [12], which allows subsets of canonical variables that maximize the correlation between two datasets (X and Y, respectively of sizes n × p and n × q) to be identified [22]. The shrinkage method was used to tune out the regularization parameters (λ1 and λ2) with values of λ1 = 0.05 and λ2 = 0.15, and ncomp = 3. Rather than considering all the genes that were included in the first canonical component (CC1) and according to the previous estimate by Ramayo-Caldas et al. [23], we applied a conservative approach and only kept genes for which the correlation between gene expression and FA traits was at least 0.29.

To display the rCCA results and improve their interpretation, we applied three graphical outputs (circle plot, network, and Clustered Image Maps) that are all implemented in the mixOmics package [12] via plotVar, network and cim functions. In particular, the generated network output was exported to Cytoscape file format using the igraph v1.3.2 package [24]. In the network [25], FA were filled with colors to facilitate the identification of each group (i.e., SFA, MUFA and PUFA) and the connected genes. Likewise, the genes were filled with different colors according to their functional group or gene family using a pre-built list according to their functions (see the ‘Gene functional information’ section). In addition, we used a complementary heatmap via the ComplexHeatmap v2.14.0 package [26] to illustrate the different clusters of variables and the degree of correlation between them.

For the functional analysis, genes included in the CC1 (cutoff of r ±|0.29|) were submitted to the ClueGO v2.5.4 plugin [27] in the Cytoscape v3.7.1 software [25], using default parameters. Gene ontology (GO) significance was assessed with a hypergeometric test, keeping only the GO terms (biological processes, molecular function, and pathways) that had a corrected Benjamini–Hochberg (BH) P-value lower than 0.05 [28]. All genes expressed in muscle (12,381) were included as background in this step. Furthermore, GO tree interval levels were set from three to eight and a minimum k-score of 0.44 and a minimum of three genes per cluster with at least 4% in selected genes were used. Results with and without the fusion feature “GO Term Fusion” were generated to evaluate the redundant parent–child terms. In addition, we visualized the ClueGO output using an R script via the ggplot2 v3.3.5 package [29], which allowed us to identify the intersection of significantly associated genes according to over-represented GO terms.

Analysis of regulatory impact factors (RIF)

The RIF analysis was conducted using the runAnalysis function of the CeTF v1.8.0 package [30]. The RIF algorithm is described in detail in Reverter et al. [11]. Briefly, the RIF metrics (RIF1 and RIF2) aim at identifying relevant regulators (i.e., transcription factors, TF) from the gene expression data. This step calculates, for each condition, the co-expression correlation between the TF and the differentially expressed (DE) genes. For the DE analysis, we created two conditions by classifying samples according to their FA profile through a principal component analysis (PCA). In the PCA, the prcomp function with scale = TRUE was used, considering as input the composition data of the pre-selected FA from the rCCA. In fact, we chose the extreme values from PC1 (condition 1 and condition 2 with n = 60, i.e., 30 samples per condition). Here, animals in condition 1 included 10 females and 20 males while those in condition 2 included 15 females and 15 males, in both cases belonging to five slaughterhouse batches. The fviz_pca function of the factoextra v1.0.7 package was used to extract and visualize the PCA results [31], including the FA profile according to the three classes (SFA, MUFA and PUFA). Significant differences (corrected P-value < 0.05) between the means of FA conditions and phenotypes, for each FA selected in the rCCA were determined using a t-test approach, and the standard error of the mean (SEM) was calculated. We also tested the correlation between the FA used in the selection of the two groups for DE gene analysis and we identified the TF present in the expression data based on the list of pig TF available in the AnimalTFDB v3.0 database (http://bioinfo.life.hust.edu.cn/AnimalTFDB/#!/).

Analysis of the target genes of the transcription factors

The list of TF used in the RIF step was also examined to identify in silico putative target genes with an expression that was DE between the two FA conditions (n = 30 samples per condition). This complementary analysis was conducted using the SmearPlot function [30]. Briefly, the predicted target genes identified by this approach were extracted, and, for each condition, co-expression between the TF and the DE genes was calculated using the partial correlation with information theory (PCIT) algorithm. The resulting matrix contained the correlations between genes, and a separate object included information about the significant correlations (adjusted values, padj = 0.05) of target genes (lfc = 1.5 and padj = 0.05) and the TF of interest.

Functional classification of FA-interconnected genes in the network

To facilitate functional classification of the candidate genes, we ranked the global list of rCCA-derived genes. Overall, we focused on the functional annotation of genes and their plausible function in different aspects of FA, lipid, and carbohydrate metabolisms (see Additional file 1: Tables S1, S2 and S3). First, we used information from the ClueGO analysis that divided the genes into different functional groups, which contained biological processes and pathways clustered according to GO term similarities. Second, a trained list was created using the GUILDify v2.0 tool [32], which included genes associated with predefined keywords such as: “adipokine”, "amino acid metabolism”, “electron transport chain”, “enzyme”, “fatty acid beta-oxidation”, “fatty acid metabolism”, “fatty acid synthesis”, “gluconeogenesis”, “glucose metabolism”, “glycolysis”, “tricarboxylic acid cycle”, “lipid metabolism”, “carbohydrate metabolism”, “lipid degradation”, “lipid synthesis”, “nucleic acid”, “nucleotide metabolism”, “nutrient”, “receptor”, “transporter”, and “energy homeostasis”; as well as homo sapiens species and lipogenic tissues (adipose, liver and muscle-skeletal) options. Briefly, GUILDify uses the biological interaction and network analysis (BIANA) knowledge database to create a species-specific interaction network for each gene detected. In the current study, the netcombo prioritization algorithm based on network topology, and the highest guild score for the top 100 gene products (with only seed) were considered to constitute such a list. Third, the presence of TF and cofactors in pigs was corroborated according to the annotation of the aforementioned AnimalTFDB v3.0 database. Thus, the gene functional classification was based on the potential biological functions that were compiled from the overlap of the rCCA-derived gene list with the three previously explained information sources.

Results

Regularized canonical correlation analysis (rCCA)

Here, we used a multivariate integrative approach to explore the relationship between the muscle transcriptome (n = 12,381 genes) and intramuscular FA composition (n = 15 FA) of 129 BC1_DU pigs. The correlation structure between datasets of interest is shown in Fig. 1. The rCCA yielded 365 genes and 15 FA (see Additional file 2: Table S4) that were included in the first canonical component (CC1). As expected, our results revealed that CC1 separated both SFA (C16:0) and MUFA (C18:1n-9) from PUFA (C18:3n-3, C18:2n-6, C20:2n-6, C20:3n-3, C20:3n-6 and C20:4n-6), while the second canonical component (CC2) differentiated C16:1n-7 and C18:1n-7 MUFA from C18:0 SFA. An additional correlation circle plot from the PCA with contribution variables according to CC1 versus the third canonical component (CC3) is presented in Additional file 3: Fig. S1.

Fig. 1
figure 1

Correlation circle plot from the PCA applied to the FA phenotypes and gene expression in muscle of BC1_DU pigs for the first two rCCA dimensions (15 FA and 365 genes selected in total)

The most relevant variables participating in the definition of each component were C16:1n-7, C18:1n-9 and C18:0, as well as C18:2n-6, C20:3n-6, C20:4n-6, C20:3n-3 and C20:2n-6 (Fig. 1). For example, we observed a relevant contribution of C18:2n-6, C20:3n-6, C20:4n-6 (positive) and C18:1n-9, C16:0 (negative) on component 1, and C16:1n-7 (positive) and C18:0 (negative) on component 2. Likewise, Fig. 1 suggests a negative correlation between [C18:2n-6, C20:3n-6, C20:4n-6, C18:3n-3] and C18:1n-9. Notably, the 365 genes were clustered in four groups at radii ~ 0.29 (with prioritized variables of gene expression) (Fig. 1). Details on the genes correlated with each FA (with and without the cutoff) are in Additional file 2: Table S4.

The results of the network approach suggested a complex correlation structure (bipartite relationship) between the FA profile and gene expression data (Fig. 2). Simultaneously, the representation of correlations via edges indicated that positive relationships were more common than negative relationships (i.e., lines in green and in red, respectively, in Fig. 2). As shown in Fig. 2, the network added another layer of information, which allowed visualization of groups of variables (FA and genes). Our results show that 13 FA and 365 genes (378 nodes in total) were selected. Among the 13 FA, five (C16:1n-7, C20:1n-9, C18:0, C18:2n-6, and C20:4n-6) were correlated with a list of specific genes for each FA (see Additional file 2: Table S4). The fourth most interconnected FA (see Additional file 2: Table S4) were C20:4n-6, C18:2n-6, C20:3n-6 and C18.1n-9 (Fig. 2). Remarkably, both C20:4n-6 and C18:2n-6 showed the largest number of correlated genes, with 87% of the genes in common. In particular, C18:1n-9 was grouped very close to seven other FA (C14:0, C20:3n-3, C20:2n-6, C20:3n-6, C16:1n-10, C18:3n-3 and C16:0) (Fig. 2). In fact, C20:3n-6 and C18:1n-9 were the second pair of FA with the largest number of correlated genes, with 80.5% of genes in common. Please note, that the subsets of connected genes suggested more complex relationships with the presence of shared or specific genes for FA.

Fig. 2
figure 2

Network plot for the longissimus dorsi muscle study in BC1_DU pigs. Green and red edges indicate positive and negative correlations. Output obtained for the first three rCCA dimensions (13 FA and 365 genes were selected), showing the correlation structure for all bipartite relationships with a correlation cutoff of 0.29. Color legend: FA: magenta = SFA members; royal blue = MUFA members; orange red = PUFA members; and genes: dark orange = enzyme; aquamarine = adipokine; chartreuse = TF; turquoise = TF cofactors; yellow = lipid metabolism‐related genes; Navajo white = carbohydrate metabolism; crimson = glycolysis; gold = transporter; light pink = fatty acid beta-oxidation; coral = amino acid metabolism; corn silk = receptor family; deep sky blue = nucleic acid metabolism. Out of a total of 365 genes, the 176 colored genes refer to a functional or gene family classification (while 189 genes were unclassified)

It is worth mentioning that we observed a set of 10 rCCA-derived genes (PLIN1, UNC93A, SFRP5, PPP1R1B, LEP, OPRL1, PTPRU, NUDT14, TFRC, and CYP4B1) that showed the largest number of connections with FA (i.e., 7 to 10 connections out of 13 FA). In addition, six other rCCA-derived genes (CNP, LBX1, NDUFAF4, NMNAT2, PIK3C2G, and SDR16C5) showed at least six connections with total FA (Fig. 2) (for details [see Additional file 2: Table S4]). The LBX1, PIK3C2G, and SDR16C5 genes were positively correlated with C16:0 and C18:1n-9 and negatively correlated with C18:2n-6, C20:3n-6 and C20:4n-6, and only the PIK3C2G gene was negatively correlated with C18:3n-3. However, we also detected 107 genes that were uniquely correlated with a specific FA (Fig. 2) and (see Additional file 2: Table S4). For instance, the CEBPG and GCLC genes were negatively correlated with C18.0, while EXOC7 was positively correlated with C18.0. Genes such as ADIPOQ, LGALS12, PAX7, CARHSP1 and YBX2 were positively correlated with C16:1n-7. As other examples, the CEBPA, ELOVL6, and MCTP2 genes were correlated negatively with C18:2n-6 and C18:3n-3, respectively; while SDHD, LPIN1, DLD, and SNW1 were positively correlated with C20:4n-6, and HMGCR, LEPROTL1, and TCEA1 were positively correlated with C18:2n-6 and C20:4n-6. The PNPLA8, HMGCS1, KATNAL1, and ATAD1 genes were negatively correlated with C18:1n-9, but positively correlated with C18:2n-6 and C20:4n-6 (Fig. 2).

A complementary heatmap illustrated the hierarchical clustering of the variables (13 FA and 365 genes) and the degree of correlation between them (see Additional file 4: Fig. S2). In general, two large clusters of FA and four clusters of genes were observed. Regarding the FA, cluster 1 was composed of six FA (C16:1n-7, C18:0, C20:1n-9, C18:1n-9, C16:0 and C14:0), while cluster 2 grouped the other seven FA (C20:2n-6, C16:1n-10, C18:3n-3, C20:3n-3, C20:3n-6, C20:4n-6, and C18:2n-6). For the four gene clusters, a variable number of genes was detected in each of them. Notably, cluster 1 grouped together 41 genes, including the nine genes showing the greatest interconnection with FA, five TF (CARHSP1, CEBPA, DBX1, PAX7, and YBX2), plus 27 protein-coding genes including those functionally related to lipid metabolism (e.g., ELOVL6, FGFR4, LGALS12, NEU3, NGFR, NMRK1, NUDT14, PIK3C2G, SDR16C5, and WNT6). The two major gene clusters were 4 and 2, in which we detected several genes involved in energy and lipid metabolism [e.g., cluster 2: G0S2, SOD3, SLC44A2, PLCD3, EDF1, GDF1, and FBP1; cluster 4: CYCS, CYLD, EIF3A, GALNT15, GOT1, HMGCR, IDH3A, LPIN1, MDH1, MRPL44, MTIF2, NAMPT, NMNAT2, PDHA1, PNPLA8, SLC31A1, TFRC, and UGP2] and other TF [e.g., cluster 2: LBX1, MAFA, and SIX5; cluster 4: CEBPG, HBP1, KLF10, TADA2A, TERF1, and TFAM]. Finally, cluster 3 contained the smallest number of genes, most of which are not well described in the literature. As an example, the CHUK (encoding a component of a cytokine-activated protein complex), HSP90AA1, and GCLC genes are part of cluster 3 and are related to lipid or glucose metabolism. Details on the genes contained in each cluster and their respective correlation with FA are in Additional file 4: Fig. S2 and Additional file 2: Table S4, respectively. In addition, details on the distribution of correlations of FA with gene expression based on a density heatmap (with quantiles and mean values) are in Additional file 5: Fig. S3.

Functional analysis of genes correlated with FA

The 365 genes selected by rCCA were submitted to a GO analysis. Fifty-four GO terms (8 molecular functions, 8 pathways, and 38 biological processes) (see Additional file 7: Table S5) were significantly over-represented (BH corrected P-value < 0.05). In total, 125 genes were annotated into different functional groups, including an enrichment in GO terms related to lipid and carbohydrate metabolism. Notably, some of the closely associated Kyoto encyclopedia of genes and genomes (KEGG) pathways were “regulation of lipolysis in adipocytes”, “citrate cycle (TCA cycle)”, “non-alcoholic fatty liver disease (NAFLD)”, “oxidative phosphorylation” and “Insulin signaling pathway” (Fig. 3).

Fig. 3
figure 3

Functional analysis of the correlated genes from the rCCA approach that were significantly enriched in GO terms according to metabolic pathway delimitation. This output is a representation of the original table of generated with the ClueGO plugin in the Cytoscape software

The GO analysis also suggested that these genes were significantly enriched in biological processes (see Additional file 6: Fig. S4 with “GO Term Fusion”), such as “mitochondrial gene expression”, “tricarboxylic acid cycle”, “electron transport chain”, “ATP hydrolysis coupled cation transmembrane transport”, “regulation of response to nutrient levels”, “magnesium ion transmembrane transport”, “generation of precursor metabolites and energy”, and “respiratory electron transport chain”. The complete results of GO terms with and without “GO Term Fusion” are listed in Additional file 7: Table S5.

Analysis of regulatory impact factors (RIF)

Based on the output of the rCCA, 22 putative regulators (i.e., TF) were identified. To perform the RIF analysis, the dataset was split into two conditions (30 individuals in each) based on the FA profile from the PCA data (Fig. 4). Condition 1 included animals that had a FA profile with fewer SFA and MUFA and more PUFA (and a lower IMF content) and condition 2 was the opposite with more SFA and MUFA and fewer PUFA, concurrently with a higher IMF content (see Additional file 8: Table S6).

Fig. 4
figure 4

Principal component analysis summarizing the separation and similarities among FA profiles for the pigs with extreme FA values. PC1 and PC2 explained 77.40% of the total variance. The visualization of the eigenvectors was implemented to delimit the FA belonging to their respective group (SFA, MUFA, PUFA)

We also found a complex correlation pattern between components of the three classes of FA (SFA, MUFA and PUFA) in muscle. For example, some FA, such as C18:0 SFA and C18:1n-9 MUFA were positively correlated with each other but negatively with the most abundant PUFA (C18:2n-6, C18:3n-3, and C20:4n-6), which were positively correlated with each other. Correlations between the FA are in Additional file 9: Table S7. The results of the DE analysis revealed 293 DE genes (i.e., 274 genes and 19 TF) (lfc = 1.5 and padj = 0.05) between the two FA conditions (see Additional file 10: Table S8).

Table 2 contains a summary of the 19 regulators that were identified by including the RIF parameters. To interpret these results, it is important to note that information from RIF1 and RIF2 is complementary. The RIF1 score classified the TF that were the most differentially co-expressed with the highly abundant and highly DE genes, whereas the RIF2 score classified the TF that had the best ability to act as potential predictors of the abundance of DE genes due to FA differences. Among them, the top five TF with extreme values for the RIF metrics (Table 2) are the following, with a positive score: CARHSP1, TERF1, CEBPG, TFAM, MAF; and with a negative score: TADA2A, MBD2, HBP1, SIX5, and FOXJ3); and for RIF2 with a positive score: KLF10, TADA2A, HBP1, FOXJ3, and ZNF407, and with a negative score MBD2, SIX5, TERF1, CEBPG, and MAF. Based on the absolute values of RIF, we found that the first and second most relevant regulators were TADA2A and CARHSP1 based on RIF1 and KLF10 and TADA2A based on RIF2 (Table 2). Table 2 also shows that the TF were classified into ten different families (i.e., based on AnimalTFDB3.0 database).

Table 2 RIF analysis with 19 significant TFa identified by RIF1 or RIF2 metrics according to extreme FA profiles and gene expression in BC1_DU pigs

In silico prediction of transcription factor target genes in the post-RIF stage

After the RIF analysis, an in silico prediction was carried out based on the co-expression between each DE TF (n = 19) and its possible DE target genes (n = 274) for each FA condition. Consistently, target genes that were DE between the two conditions were identified (Table 3). PCIT indicated that only six TF genes were significantly correlated with 29 potential targets (CARHSP1, LBX1, MAFA, PAX7, SIX5, and TADA2A). These TF genes belonged to five of the 10 families identified (MYB, CSD, Homeobox, PAX and TF_bZIP).

Table 3 In silico prediction of TF target genes for high and low FA profiles, including potential regulators and their putative differentially expressed targets in BC1_DU pigs

Furthermore, as shown in Table 3, the TF genes were positively or negatively correlated with either a specific FA or several FA as shown in Additional file 2: Table S4. Additional information on the distribution of the DE genes and specific TF genes in both conditions and the relationships between log(baseMean) and expression difference are in Additional file 11: Table S9.

Overall, several of the identified candidate target genes (listed in Table 3) are well-known for their role in lipid metabolism (ADIPOQ, DHRS3, FGFR4, NEU3, PLIN1, TFRC, and WNT5B), carbohydrate metabolism (ADIPOQ, FBXO6, GALNT15, GOT1, NEU3, NUDT14, and PPP1R3D), glucose metabolism (MAFA), and ion binding (ADIPOQ, FGFR4, GOT1, NUDT14, SOD3, and TADA2A), among other categories. Thus, this target prediction approach allowed us to uncover target genes of newly identified TF genes in pig muscle, but also target genes that are associated with FA metabolism, such as TADA2A and CARHSP1. As shown in Table 3, both genes are targets for each other, but also share FA-related target genes (e.g., GOT1, PLIN1, and TFRC).

Discussion

Our results confirm a complex and bipartite relationship between intramuscular FA composition and gene expression. Gene expression levels can be associated with either specific or several FA traits (either positively or negatively). Our findings of the integrative analysis using FA composition and gene expression datasets complement and extend previous work reported by Valdés-Hernández et al. [9] who performed a univariate association analysis that ignored gene-by-gene interactions and the putative transcriptomic regulators. Here, we analyzed relevant traits such as FA involved in FA ratios that represent enzyme activites. In addition, an updated version of the pig genome annotation was used (release 104 vs. 97). Furthermore, to highlight the most relevant genes and their regulators, a 3-step analytical pipeline was executed, comprising (1) multivariate analysis; (2) partial correlation calculations with information theory (PCIT), and (3) study of RIF associated with FA metabolism. Taken together, the results may facilitate the implementation of breeding strategies based on the use of functional information and improve our understanding of gene regulation in muscle.

Identification of subsets of canonical variables that maximize the correlation between gene expression and FA profiles, including functional information

Our study provides useful information on representative FA for the SFA, MUFA and PUFA classes, including candidate genes that may be associated with such FA traits. Through exploratory approaches [22, 24,25,26], we illustrated the relationships between muscle gene expression and intramuscular FA composition. In total, we identified a subset of 378 correlated variables (13 FA and 365 genes).

Interestingly, FA were grouped into two large groups based on the heatmap approach [26]. Essential FA that are obtained only from the diet and their derivative molecules (e.g., C18:2n-6, C18:3n-3, C20:2n-6, C20:3n-3, C20:3n-6, and C20:4n-6) were clustered separately from those that can be derived from biosynthesis (e.g., C14:0, C16:0, C18:0, C18:1n-9, and C20:1n-9). We also found an interesting inverse relationship between [C20:4n-6, C18:3n-3] and C18:1n-9 FA. This was previously observed in humans, rats, and chickens, which suggests that the inverse association between relative abundances of C18:1n-9 (oleic acid) and C20:4n-6 (arachidonic acid) is related to C18:3n-3 (alpha-linolenic acid) [33] and could reflect the buffering capacity of n-3 FA over inflammatory signals. Omega-6 and omega-3 PUFA have antagonistic inflammatory functions, with C20:4n-6 being a pro-inflammatory and immunoactive FA [34] and an important constituent of membrane phospholipids involved in signal transduction [35]. In pigs, carcass fatness shows a positive correlation with IMF. In addition, greater fatness has been associated with a lower relative proportion of PUFA and a higher SFA and MUFA content [36]. Although the n-6/n-3 FA ratio is affected by feeding, for a particular diet, the C18:2n-6/C18:3n-3 ratio is higher in lean meat compared to meat with a higher fat level [36]. Hence, in the current work, muscle FA composition may be partially explained by differences in IMF (see Additional file 8: Table S6). The oleic acid, C18:1n-9, is one of the most predominant MUFA in the triacylglycerols, cholesteryl esters, wax esters, and membrane phospholipids [37], and may also improve meat organoleptic properties and overall acceptability parameters of meat [38].

Analysis of the interaction network [25] revealed the five most interconnected FA (C20:4n-6, C18:2n-6, C20:3n-6, C18.1n-9, and C18.0), which had the largest number of associated genes. We found that the n-6 FA (C20:4n-6, C18:2n-6, and C20:3n-6) shared more than 51% of its correlated genes (see Additional file 2 Table S4). Remarkably, we also found overlaps of 100% and 80.54% when comparing the C18.1n-9 list versus the C20:4n-6 and C18:2n-6, and against the C20:3n-6 list, respectively. It turns out that some of the genes that were correlated with C18.1n-9 also displayed other associations, including those with minority FA. This means that C18.1n-9 is a key FA in muscle and captures complex associations, with shared or specific genes linked to FA metabolism. Furthermore, five of the 13 analyzed FA (C16:1n-7, C20:1n-9, C18:0, C18:2n-6, and C20:4n-6) were correlated with a list of specific genes. For instance, emerging evidence in mice suggests that the lipokine, C16:1n-7, is an adipose tissue-derived lipid hormone that strongly stimulates muscle insulin action [39], thereby regulating systemic metabolic homeostasis.

Within the 365 rCCA-derived genes that were grouped into four clusters by the heatmap approach [26], we observed several candidate genes for FA metabolism, which also overlapped with results from previous research in pig populations. Considering the BC1_DU animals, 24 of the 365 FA-correlated genes in LD muscle were identified using a different association analysis strategy [9] (CPD, CYCS, LBX1, LEP, LGALS12, LRFN1, MAMSTR, MDH1, NMNAT2, NMRK1, NUP35, OPRL1, PDCD1LG2, PIK3C2G, PLCD3, PLIN1, PPP1R1B, PTPRU, SFRP5, TENT5C, TFRC, TIMM8A, UNC93A, and WNT6), with overlaps that were mostly observed in only six FA phenotypes (C18:0, C16:1n-7, C18:1n-9, C20:1n-9, C18:3n-3, and C20:2n-6). Therefore, the genes detected by both strategies pointed to several candidate genes related to FA metabolism, which provided further validation of our findings. Overall, we revealed 10 GO terms and two KEGG pathways that were consistent between the two gene functional studies, which were all mainly related to metabolism and energy homeostasis. Among these, the citrate cycle (TCA cycle or Krebs cycle) is an important aerobic pathway for the final steps of the oxidation of carbohydrates, FA, and amino acids [40], providing precursors for many biosynthetic pathways. For example, common functional genes such as LEP, MDH1, CYCS, and NMNAT2 were enriched in such GO terms, indicating a potential regulatory role of these genes in FA and energy metabolism.

Although the aforementioned results indicated some degree of overlap in the detected candidate genes and overrepresented GO terms, the univariate association analysis and the integrative analysis used here should be considered as complementary strategies, as they differ in their analytical methods. However, the current study was based on a combination of multivariate methods (i.e. rCCA) [41], which integrates two datasets measured on the same samples (gene expression and FA composition, here without correction for systemic factors). rCCA achieves dimension reduction in each dataset while maximising the similar information between them, thus selecting variables that maximize the correlation. This allows variable selection and priorization at the gene and FA level, with a higher potential to explain the largest proportion of the relationship between the two subsets of data. We conducted a combination of complementary analyses (network, and RIF, in silico prediction of putative target genes, and GO term enrichment) to further prioritize the informative variables and provide insight into biological processes and pathways, in particular among those, the one that is associated with intramuscular FA metabolim.

More specifically, functional analysis with the 365 rCCA-derived genes indicated overrepresentation of the insulin signaling pathway. It is worth noting that the LEP, MDH1, and CYCS genes were also enriched in the insulin signaling pathway, which can affect intramuscular lipid metabolism [42]. Regulation of lipolysis in adipocytes highlights the potential role of certain candidate genes in lipolysis of skeletal muscle (e.g., PLIN1) [43], as well as FA derived from intramuscular lipolysis (e.g., C16:1n-7 and C18:1n-9). In fact, a previous study of porcine adipocytes showed that PLIN1 (perilipin 1) is a novel candidate gene for IMF deposition and adipocyte differentiation [44]. In addition, taking the exemplified FA into account, activation of adipocyte lipolysis by C16:1n-7 acid treatment has been demonstrated, while C18:1n-9 acid was chosen as a control FA in investigations in mice [45].

Our findings suggest that the six genes that were correlated with FA composition (ADIPOQ, CYP4B1, LEP, PLIN1, SDR16C5, and SFRP5) may also be responsible for IMF deposition [46]. Interestingly, the top 10 highly connected genes with FA (CYP4B1, LEP, NUDT14, OPRL1, PLIN1, PPP1R1B, PTPRU, SFRP5, TFRC, and UNC93A) generally showed positive correlations with MUFA and SFA, but negative correlations with PUFA. Therefore, they might be considered as candidate markers of adipocyte physiology. Among these, the PLIN1, LEP, and CYP4B1 genes are known to play a pivotal role in the metabolism of FA (FA storage and degradation), while the UNC93A gene is related to innate and adaptive immunity [47].

Notably, the gene with the largest number of connections was PLIN1, which was associated with 10 of the 13 evaluated FA. With the exception of C18:3n-3, four other genes (LEP, PPP1R1B, SFRP5, and UNC93A) were connected with the same nine FA as PLIN1, presenting a similar pattern of positive correlations with SFA and MUFA, and negative correlations with PUFA. For instance, adipokine genes such as the LEP (leptin) and SFRP5 (secreted frizzled related protein 5) genes were identified as IMF-correlated genes in the muscle of Duroc × Luchuan pigs [46], and in addition, SFRP5 has been reported to be significantly DE in the muscle of Duroc pigs with an extreme lipid profile [48]. The SFRP5 gene encodes an adipokine with anti-inflammatory and insulin-sensitizing properties and appears to have an effect on cytokine release and insulin action in primary adipocytes and skeletal muscle cells [49]. Furthermore, SFRP5 plays an important role in recognizing FA as well as in lipogenesis and, depending on the type of ligand or co-receptor, it can stimulate or inhibit adipogenesis through the WNT pathways [50]. In mice, a long non-coding RNA of the protein phosphatase 1 regulatory inhibitor subunit 1B (PPP1R1B) gene is involved in skeletal muscle development [51]; such data argue for an important role of the PPP1R1B-lncRNA in promoting myogenic differentiation by competing for polycomb repressive complex 2 (PRC2) binding with chromatin of myogenic master regulators. While the relationship between UNC93A (unc-93 homolog A) and the lipid metabolism in muscle has yet to be explored, a study in mice determined that the expression of this putative solute carrier responded to nutrient availability [52]. In addition, UNC93A was also mentioned as a candidate gene in a quantitative trait loci (QTL) study for meat quality and disease resistance in the Chinese Jiangquhai pig breed [53].

Apart from the five genes mentioned above, among the 365 rCCA-derived genes several other candidate genes that are involved in lipid metabolism are worth mentioning, such as ADIPOQ, ELOVL6, LPIN1, G0S2, PNPLA8, and SOD3. As another adipokine and candidate for meat quality, the adiponectin (ADIPOQ) gene was positively correlated with abundance of. C16:1n-7. The protein encoded by ADIPOQ enhances FA oxidation both in the skeletal muscle and the liver. It stimulates muscle glucose uptake and inhibits glucose production by the liver, thus, decreasing blood glucose levels [54]. Moreover, it has been shown that upregulation of the ADIPOQ gene in muscle is associated with inhibition of fat deposition in castrated male Iberian pigs (Torbiscal variety) [55].

For regulation of lipogenesis, the ELOVL fatty acid elongase 6 (ELOVL6) gene is of particular interest, as it is directly involved in elongation of FA in mammals [56]. In our group, Corominas et al. [57] previously reported a plausible effect of the expression levels of ELOVL6 on the abundance of C16:0 and C16:1n-7 FA in backfat and muscle of Landrace backcross pigs. However, in our study, the BC1_DU pigs showed only a negative correlation with abundance of the essential FA C18:2n-6. As ELOVL6 does not elongate C18:2n-6, this negative correlation may be due to the increase in the relative abundance of other SFA and MUFA caused by ELOVL6. Regarding regulation of adipogenesis, our data revealed that the lipin 1 (LPIN1) gene was positively correlated with abundance of C20:4n-6. This gene has been shown to participate in metabolism of the arachidonic acid in Caenorhabditis elegans [58]. Furthermore, in a previous study from our group, LPIN1 has also been investigated as a potential candidate gene for intramuscular FA composition in a Landrace backcross population [4].

Apart from PLIN1, other genes involved in lipolysis and/or adipogenesis were FBP1, G0S2, and SOD3, which were positively correlated with abundance of C18:1n-9 and negatively correlated with abundances of C18:2n-6, C20:3n-6, and C20:4n-6. For instance, the expression level of the porcine G0/G1 switch 2 (G0S2) gene increased significantly during adipogenesis (both in in vitro and in vivo studies) [59]. In the same study, G0S2 was suggested to be a negative regulator of adipose triglyceride lipase (ATGL)-mediated lipolysis and cell proliferation in adipose tissue, and to be closely related to lipid accumulation [59]. As an antioxidant enzyme, SOD3 (superoxide dismutase 3) is secreted by adipocytes and has the potential to prevent oxidative stress. In mice, Cui et al. [60] suggested that SOD3 had a specific function in blocking adipogenesis and that overexpression of SOD3 suppressed expression of pro-inflammatory genes in adipose tissue, and increased expression of anti-inflammatory genes.

The positive correlations of levels of expression of the FBP1, G0S2, and SOD3 genes with abundance of oleic acid (C18:1n-9), and their negative correlations with the three other PUFA may be due to their involvement in lipid metabolism of the host, rather than to metabolism of the essential FA and their derivatives. In fact, we highlighted candidate genes, such as ADIPOQ, ELOVL6, and PLIN1 that were also identified as overexpressed genes in individuals that were divergent for IMF values and had a higher IMF content according to transcriptome analyses of LD muscle in Iberian pigs [61]. Furthermore, Benítez et al. [62] found that breed (Iberian/Duroc) had a modulatory effect on the expression of the ELOVL6 and LEP genes (adjusted P-value < 0.10) in the adipose tissue from growing pigs; lipogenic (ELOVL6 and LEP) and lipolytic (G0S2 and PLIN1) genes had a higher expression in biopsies obtained from the Iberian pigs. This seems to be the case when lipid deposition in the muscle of pigs is the result of a balance between lipogenesis and lipolysis processes [63]. Nevertheless, fat accumulation in animals results from an imbalance between synthesis and degradation. When synthesis of FA is greater than their consumption, FA are deposited in cells instead of mobilized to provide energy [64]. In addition, in an association study for backfat FA composition in free-range Iberian pigs, two single nucleotide polymorphisms (SNPs) ADIPOQ:g.124646194T>G and ELOVL6:g.112186423A>G were identified to have a significant association [65].

We also found that the levels of expression of genes such as patatin-like phospholipase domain containing 8 (PNPLA8), glutamate oxaloacetate transaminase 1 (GOT1), and 3-hydroxy-3-methylglutaryl-CoA synthase 1 (HMGCS1) were negatively correlated with abundance of C18:1n-9, while they were positively correlated with abundance of C18:2n-6 and C20:4n-6, together with the level of expression of the 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) gene. The PNPLA8 protein (also known as iPLA2γ) plays an important role in lipolysis and FA oxidation. It belongs to a family of phospholipases that catalyze the cleavage of FA from membrane phospholipids [66]. Interestingly, PNPLA8 may preferentially act on arachidonic containing membrane phospholipids (C20:4n-6, a FA that can undergo beta-oxidation) to generate free arachidonic acid, along with lysophosphatidic acid [67]. Therefore, PNPLA8 plays an important role in mobilization of arachidonic acid in response to cellular stimuli [68] and in release of lipid second messengers. As another candidate for meat quality and carbohydrate metabolism, the GOT1 gene controls cellular metabolism by coordinating utilization of carbohydrates and amino acids to meet nutrient requirements [69], but it is also crucial in providing oxaloacetate at low glucose levels, likely to maintain the redox homeostasis. In contrast, the HMGCR gene encodes a cholesterol-synthesis limiting enzyme, an enzyme of the mevalonate pathway, which participates in fat deposition and is associated with meat composition traits [70, 71]. In addition, synonymous polymorphisms in this gene, such as HMGCR:c.807A>C, have been shown to be associated with muscle lipid deposition and cholesterol-related traits in Duroc pigs with an extreme lipid profile [70].

It is also worth mentioning that the two transporter genes exocyst complex component 7 (EXOC7) and solute carrier family 44 member 2 (SLC44A2) may have a role in lipid metabolism. The level of expression of the EXOC7 gene was found to be positively correlated with abundance of C18:0. As EXOC7 is a component of the exocyst complex, which regulates free FA uptake by adipocytes [72], it is involved in diverse biological functions, including promoting translocation of the glucose transporter GLUT4 in the cell. Although there are other members of the solute carrier family, the SLC44A2 gene presented the same correlation pattern (positive with MUFA but negative with PUFA) as the FBP1, G0S2 and SOD3 genes. In mice, SLC44A2 mediates choline transport into mitochondria, and regulates synthesis of ATP, platelet activation and thrombosis [73]. In addition, supporting information for SLC44A2 has suggested its important role for normal homeostasis [74].

Identification of potential regulators of gene expression and their putative gene targets in muscle

We used RIF analysis to highlight putative regulators and to assess their potential role in controlling their predicted target genes. We identified six TF-regulator genes, including two novel (TADA2A and CARHSP1) but also four well-documented regulators (MAFA, SIX5, LBX1, and PAX7). Remarkably, among these six TF genes, TADA2A and CARSHP1 based on RIF1 and KLF10 and TADA2A based on RIF2 were scored as the first and second most relevant TF genes. In addition, the LBX1, KLF10, PAX7, and SIX5 genes encode TF that may be putative regulators of lean muscle growth. Moreover, several target genes of TADA2A and CARSHP1 were detected as being functionally related to lipid metabolism (e.g., PLIN1 and TFRC) and/or meat quality (e.g., GOT1). However, no target genes were detected for KLF10.

Considering the rCCA approach, our findings in muscle suggest that the transcriptional adaptor 2A (TADA2A) gene was linked to the four most interconnected FA (positively associated with C20:4n-6, C18:2n-6, and C20:3n-6, respectively, and negatively associated with C18:1n-9). The TADA2A protein is part of the ATAC (Ada-Two-A-Containing) complex that interacts with the TATA-binding protein for transcriptional activation [75]. In addition, TADA2A has been suggested to be involved in de novo hepatic lipogenesis in chickens fed different diets [76]. The level of expression of the calcium regulated heat stable protein 1 (CARHSP1) gene was positively linked to the sixth most interconnected FA (C16:1n-7). In mice, CARHSP1 is regulated by the nutrient status in the liver and was suggested to inhibit hepatic gluconeogenic gene expression via repression of the transcriptional activity of the PPARα transcription factor [77]. Consequently, these two regulators were targets for each other in animals with condition 2 (i.e., more SFA and MUFA but fewer PUFA, and higher IMF content), as well as having shared target genes, such as GOT1, PLIN1, and TFRC. Taken together, these findings point to some of the potential transcriptional circuits through which key regulatory genes exert their impact on their targets and FA. Conversely, FA may also act as signaling candidates to regulate transcription of target genes that encode proteins that are involved in muscle lipid metabolism [78]. In turn, gene expression may also be modulated by FA abundance, as suggested by independent studies that correlate gene expression with FA composition or with IMF content in pig muscle [1, 4, 9, 46, 79, 80]. Other factors can also affect gene expression, such as environmental factors, genetic background, breeding systems, management, and host-factors. It has been reported that the FA composition in pig diets affects subcutaneous IMF FA profile [64, 79]. Recently, Ludwiczak et al. [81] also pointed out that FA profiles (SFA, MUFA, and PUFA) in the loin (e.g., in longissimus thoracis et lumborum muscle) or IMF content of European pigs (e.g., Nero Siciliano, Cinta Sense, and Iberian × Duroc) can be affected by diet or by the interaction between diet and housing system. However, in the present work a uniform diet was provided to all animals.

Conclusions

The findings of this study contribute to a better understanding of the complex relationship between FA composition and gene expression in muscle, but may also reveal patterns of gene expression involving Iberian and Duroc pigs. Based on the results of the rCCA, functional analysis, RIF analysis, prediction of target genes, and supporting literature, all the genes discussed above are promising candidates for muscle lipid deposition and FA composition in Iberian and Duroc pigs. Our rCCA-derived findings highlighted genes encoding enzymes associated with fat deposition, but also bioprocesses and metabolic pathways involved in the determination of FA traits. Using a complementary RIF analysis, we proposed two novel regulators (TADA2A and CARHSP1) for intramuscular FA metabolism. Functional analyses at the GO and pathway levels reinforced the significance of biological processes associated with energy, lipid, and carbohydrate metabolism, as well as of the KEGG pathway of regulation of lipolysis in adipocytes in muscle tissue. Furthermore, our results highlighted the ADIPOQ, ELOVL6, G0S2, HMGCR, LEP, LPIN1, PLIN1, PNPLA8, and SFRP5 genes for their lipogenic and/or lipolytic potential to contribute to intramuscular FA composition. Our study also identified the endogenous antioxidant enzyme SOD3 as having a promising role in regulation of adipogenesis in pig muscle.