Abstract
Background
Soybean is a major oil crop; the nutritional components of soybean oil are mainly controlled by unsaturated fatty acids (FA). Unsaturated FAs mainly include oleic acid (OA, 18:1), linoleic acid (LLA, 18:2), and linolenic acid (LNA, 18:3). The genetic architecture of unsaturated FAs in soybean seeds has not been fully elucidated, although many independent studies have been conducted. A 3 V multi-locus random single nucleotide polymorphism (SNP)-effect mixed linear model (3VmrMLM) was established to identify quantitative trait loci (QTLs) and QTL-by-environment interactions (QEIs) for complex traits.
Results
In this study, 194 soybean accessions with 36,981 SNPs were calculated using the 3VmrMLM model. As a result, 94 quantitative trait nucleotides (QTNs) and 19 QEIs were detected using single-environment (QTN) and multi-environment (QEI) methods. Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with a significant single-environment QTN.
Conclusions
For QTNs and QEIs, further haplotype analysis of candidate genes revealed that the Glyma.03G040400 and Glyma.17G236700 genes were beneficial haplotypes that may be associated with unsaturated FAs. This result provides ideas for the identification of soybean lipid-related genes and provides insights for breeding high oil soybean varieties in the future.
Similar content being viewed by others
Background
Soybean [Glycine max (L.) Merr.], a major oil crop, is commonly used in cooking oil [1]. Soybean oil is mainly composed of saturated fatty acids (FAs) and unsaturated FAs. Among them, saturated FAs include palmitic and stearic acids, and unsaturated FAs include oleic (OA), linoleic acid (LLA), and linolenic acids (LNA) [2, 3]. Unsaturated FA is the main component of vegetable oil, accounting for more than 80% [4]. The increase in the content of OA, a monounsaturated FA, can improve oxidative stability and prevent oxidation [4]. LLA and LNA are polyunsaturated FAs and are very beneficial to human health [5]. However, the LLA and LNA show poor stability at a high temperature and are easily oxidized [6]. Thus, an important goal of soybean breeders is to increase the OA level and reduce the LLA and LNA content [7, 8].
Genome-wide association study (GWAS) mapping can identify the genetic basis of a variety of complex traits [9]. To date, the single-locus GWAS method has been widely applied to mine genetic loci underlying important agronomic traits, including 100-seed weight in soybean and oil content and yield-related traits in maize [2, 10, 11]. However, quantitative trait nucleotides (QTNs) have been detected using the single-locus GWAS method, which has limited ability to detect QTNs because quantitative traits are affected by a polygenic background [12].
Currently, the mixed linear model (MLM) method can correct population structure and family relationships and is widely used [13]. Based on the MLM method, single-locus GWAS methods have been widely proposed, including EMMAX, FaST-LMM and GEMMA [12, 14, 15]. However, single-locus GWAS methods generally require Bonferroni correction and can be affected by a polygenic background. To overcome this problem in single-locus GWAS methods, multi-locus GWAS methods have been applied, in which statistics are applied to all loci [16]. These multi-locus GWAS methods mainly include FASTmrEMMA, FASTmrMLM, FarmCPU, and pLARmEB [17,18,19,20]. However, these methods have a high calculation burden, and the advantages of QTN-by-environment interactions (QEIs) have not been fully considered.
To address this, a new multi-locus GWAS model, the 3 V multi-locus random-SNP-effect mixed linear model (3VmrMLM), has been presented [21]. This method improves the QTL detection capability and can analyze the genetic variation of complex traits. It provides a new method for the gene discovery of complex traits.
In this study, a dataset of 194 soybean accessions with 36,981 SNPs was applied [2]. We analyzed the unsaturated FA content in this population of 194 soybean accessions based on the multi-locus GWAS model (3VmrMLM). Our aim was to detect significant QEIs and stable QTNs compared with the results of our previous study and other independent studies and to further identify candidate genes related to unsaturated FA content.
Results
Phenotypic variation of three soybean unsaturated FA compositions
The distribution of unsaturated FA content (including OA, LLA, and LNA) in the 194 soybean accessions is shown in Table 1. The coefficient of variation (CV%) differed among the three years. In 2013, the unsaturated FA content had the highest CV at 51% (OA), 48% (LLA), and 52% (LNA). In 2014, the CVs of OA, LLA, and LNA were relatively consistent at 28%, 25%, and 29%, respectively. In 2015, the CV of unsaturated FAs was basically the same as in 2014 (Table 1). The heritabilities of OA, LLA, and LNA were 0.41, 0.36, and 0.35, respectively (Table 1). The above results showed that the content of unsaturated FAs was affected by the environment.
The correlation coefficient of the unsaturated FA content was calculated. As shown in Fig. 1, OA, LLA, and LNA content had a high correlation within the same year. However, the OA, LLA, and LNA content was not high between different years. In 2013, OA was positively correlated with LLA and LNA (0.92 and 0.83, respectively). In 2014, OA was positively correlated with LLA and LNA (0.84 and 0.65, respectively). In 2015, OA was positively correlated with LLA and LNA (0.79 and 0.64, respectively). These results show that unsaturated FAs affect soybean oil accumulation.
Identification of QTNs for unsaturated FA-related traits using 3VmrMLM
In this study, the unsaturated FA content was reanalyzed using the single-environment QTN model (3VmrMLM). A total of 94 significant QTNs were associated with the unsaturated FA content (LOD score ≥ 3.0). Among them, 30, 34, and 30 QTNs were associated with OA, LLA, and LNA content, respectively (Table 2).
In 2013, 2014, and 2015, 12, 10, and eight QTNs were associated with OA content, with LOD scores of 4.55–25.92, 6.25–23.29, and 3.37–9.57, respectively. A total of 17, 10, and seven QTNs associated with LLA content were identified with LOD scores of 4.43–24.14, 3.42–20.76, and 4.98–17.08 in 2013, 2014, and 2015, respectively. In 3 years (2013, 2014, and 2015), 10, eight, and 12 QTNs associated with LNA content were detected with LOD scores of 4.17–22.53, 5.07–10.72, and 6.46–20.27, respectively (Table 2, Additional file 1: Fig. S1).
Detection of QEIs for unsaturated FA content using 3VmrMLM with multiple environments
The unsaturated FA content was reanalyzed in 3 years (2013, 2014, and 2015) using the multiple-environment QEI model (3VmrMLM) for identifying QEIs. A total of 19 significant/suggested QEIs were identified (Table 3, Fig. 2). Three significant QEIs overlapped with the above QTNs. In these QEIs, the r2 value was between 2.01 and 14.67, and the variance value was between 0.03 and 1.33 (Table 3).
Candidate gene prediction of significant QTNs associated with unsaturated FA in soybean
There were 1246 genes identified in the flanking genomic region of each significant QTN using the 3VmrMLM method (Additional file 1: Table S1). We further conducted the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. As shown in Additional file 1: Fig. S2A, 201 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems, including lipid metabolism, amino acid metabolism, energy metabolism, transport, and catabolism. The results of the above enrichment analysis showed that some candidate genes around QTN were found in different processes.
The same methods mentioned above were used to analyze candidate genes in the flanking regions of the QEIs. A total of 301 candidate genes were found in the linked regions of significant QEIs (Additional file 1: Table S2). KEGG analysis found that 53 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, and organismal systems, including carbohydrate metabolism and lipid metabolism (Additional file 1: Fig. S2B). In the multiple-environment QEI model, five known SNP markers were identified. In addition, some new SNP markers, including rs6528670, rs1902760, rs4633292, rs2457629, and rs48948953, were related to FA synthesis. Moreover, some known markers were identified in the multiple-environment QEI model, including rs14264702, rs34595703, rs44492166, rs23852645, and rs26951255. Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant QTN in a single year, of which rs14264702 has been reported [22].
Transcriptomic analysis of HUFA and LUFA soybean seeds
RNA-seq analysis was conducted to reveal the transcriptional regulation of unsaturated FA metabolism in HUFA (high unsaturated fatty acid) and LUFA (low unsaturated fatty acid) soybean seeds. Three comparison groups were analyzed: a comparison group of five HUFA and five LUFA varieties (FHUFA vs. FLUFA); a comparison of 10 HUFA and 10 LUFA varieties (THUFA vs. TLUFA); and a comparison of 15 HUFA and 15 HUFA varieties (HUFA vs. LUFA) (Additional file 1: Table S3).
There were 4013, 3504, and 2546 DEGs in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Fig. 3A, B). In each comparison group, the number of upregulated DEGs was higher than the number of downregulated DEGs. As shown in Fig. 3C, 1160 common DEGs were upregulated, while 183 common DEGs were downregulated.
Identification of candidate genes by integrating GWAS and RNA-seq analysis
To further identify candidate genes, DEGs were identified by integrating GWAS and RNA-seq analysis and by analyzing potential candidate genes. In the single-environment QTN model, 91, 85, and 61 DEGs were found in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Additional file 1: Table S1). A total of 30 DEGs were found in all three comparison groups. Among them, Glyma.10G079500, Glyma.19G163600, Glyma.09G033600, and Glyma.02G068900 genes were upregulated (Log2FC > 3), and Glyma.07G205400, Glyma.09G053700, and Glyma.06G175100 genes were downregulated (Log2FC < − 1) (Table 4, Additional file 1: Fig. S3A).
In the multiple-environment QEI model, 26, 28, and 20 DEGs were found in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Additional file 1: Table S2). Among these candidate genes, nine were simultaneously detected by GWAS and common DEGs in all three comparison groups. These nine genes included those encoding an Acyl-CoA-binding protein (Glyma.17G236700), Ankyrin repeat family protein (Glyma.09G053700), Nodulin MtN3 family protein (Glyma.08G025100), Integrase-type DNA-binding superfamily protein (Glyma.18G206600), Calmodulin-domain protein kinase 9 (Glyma.14G023500), ARM repeat superfamily protein (Glyma.03G036700), Protein kinase superfamily protein (Glyma.03G036000), BRI1 kinase inhibitor 1 (Glyma.06G039100), NAC domain-containing protein 73 (Glyma.13G234700), and unknown function protein (Glyma.18G205700, Glyma.18G205400) (Table 5, Additional file 1: Fig. S3B). The expression of these genes was further determined by qRT-PCR and was basically consistent with that of the transcriptome data (Additional file 1: Fig. S4).
Metabolic profiling analysis of MHUFA and MLUFA soybean seeds
To determine the unsaturated FA regulatory network at the seed development stage, a non-targeted metabolic profiling analysis was applied. There were 15 high unsaturated FA (HUFA) and 15 low unsaturated FA (LUFA) soybean varieties applied in this study (Additional file 1: Table S3). Multiple metabolites were detected using non-targeted metabolomics, including secondary metabolites, lipids, amino acids, and flavonoids.
To explore the differences in metabolites between different varieties, three comparison groups were studied: five high-unsaturated FA (FMHUFA) and five low-unsaturated FA (FMLUFA) varieties (FMHUFA vs. FMLUFA); 10 high-unsaturated FA (TMHUFA) and 10 low-unsaturated FA (TMLUFA) varieties (TMHUFA vs. TMLUFA); and 15 high-unsaturated FA (MHUFA) and 15 low-unsaturated FA (MLUFA) varieties (MHUFA vs. MLUFA). The OPLS-DA analysis showed that the model accurately described each sample and was suitable for subsequent analysis (Fig. 4A). According to the OPLS-DA model, 70 differentially abundant metabolites (DAMs) were upregulated, and 291 DAMs were downregulated in FMHUFA vs. FMLUFA (Fig. 4B). These metabolites included lipids, secondary metabolites, and unknown metabolites. In addition, 202 and 322 DAMs were identified in TMHUFA vs. TMLUFA and MHUFA vs. MLUFA, respectively (Fig. 4B). As shown in Fig. 4C, four common upregulated DAMs and 29 common downregulated DAMs were identified.
Differential accumulation of metabolites with MHUFA and MLUFA content
In this study, the metabolic changes of high and low unsaturated FA content in 30 soybean varieties during the R6 period were studied. In FMHUFA vs. FMLUFA, 29 DAMs were annotated into the KEGG pathway. Among them, the isoflavone pathway had the most DAMs, including Genistein, 8-C-glucosylnaringenin, genistin, and biochanin A (Additional file 1: Fig. S5A). In TMHUFA vs. TMLUFA, 16 DAMs were annotated into the KEGG pathway; among them, the TCA cycle had the most DAMs. 1-Pyrroline-4-hydroxy-2-carboxylate, 5-amino-6-ribitylamino uracil, and 2-(acetamidomethylene) succinate were differentially accumulated in TMHUFA vs. TMLUFA (Additional file 1: Fig. S5B). In MHUFA vs. MLUFA, 39 DAMs were annotated into the KEGG pathway, including the TCA cycle, LLA metabolism, and biosynthesis of amino acids. LysoPC (22:2(13Z,16Z)), (2S,5S)-trans-carboxymethylproline, and quercetin 3-sambubioside were differentially accumulated in MHUFA vs. MLUFA (Additional file 1: Fig. S5C).
Co‑expression analysis of candidate genes and DAM metabolites
Candidate genes and metabolite networks were analyzed. In the single-environment model, the co-expression network of 30 candidate genes and DAMs in three comparison groups was constructed. In the FHUFA vs. FLUFA network, the results indicated that the 75 subnetworks were significantly correlated (|r|> 0.5, p < 0.05). PE (22:5) was positively associated with Glyma.14G216100 (r > 0.51, p < 0.02), Glyma.03G040000 (r > 0.51, p < 0.01), Glyma.02G210300 (r > 0.51, p < 0.02), Glyma.17G236200 (r > 0.52, p < 0.01), Glyma.11G100600 (r > 0.59, p < 0.006), and Glyma.09G157500 (r > 0.68, p < 0.0007). Quinone was positively associated with Glyma.17G236700 (r > 0.53, p < 0.01) and Glyma.03G040400 (r > 0.67, p < 0.001) (Fig. 5A). In the THUFA vs. TLUFA network, 1-pyrroline-4-hydroxy-2-carboxylate was positively associated with Glyma.09G051900 (r > 0.88, p < 5.53E−14) and Glyma.03040400 (r > 0.88, p < 2.35E−14). Glyma.17G236700 was positively associated with 2-(acetamidomethylene) succinate (r > 0.51, p < 0.0007), 5-amino-6-ribitylamino uracil (r > 0.69, p < 6.85E−07), and 1-pyrroline-4-hydroxy-2-carboxylate (r > 0.69, p < 6.09E−07) (Fig. 5B). In the HUFA vs. LUFA network, 1-pyroline-4-hydroxy-2-carboxylate B was significantly associated with Glyma.03G040400 (r > 0.84, p < 3.69E−17) and Glyma.09G051900 (r > 0.85, p < 8.61E−18). Glyma.17G236700 was significantly associated with LysoPC(22:2(13Z,16Z)) (r > 0.64, p < 3.04E−08) and 1-pyrroline-4-hydroxy-2-carboxylate B (r > 0.64, p < 2.01E−08) (Fig. 5C).
In the multiple-environment QEI model, a co-expression network of nine candidate genes and DAMs was constructed for three comparison groups. In the FHUFA vs. FLUFA network, the 22 subnetworks were significantly correlated (|r|> 0.5, p < 0.05). Quinone was positively associated with Glyma.08G025100 (r > 0.88, p < 2.14E−07) and Glyma.17G236700 (r > 0.53, p < 0.01) (Fig. 6A). In the THUFA vs. TLUFA network, Glyma.18G205400 was positively associated with 5-amino-6-ribitylamino uracil (r > 0.90, p < 9.27E−16) and 1-Pyrroline-4-hydroxy-2-carboxylate (r > 0.92, p < 2.96E−17). Glyma.17G236700 was significantly associated with 2-(acetamidomethylene) succinate (r > 0.51, p < 0.0007), 1-pyrroline-4-hydroxy-2-carboxylate (r > 0.69, p < 6.09E−07), and 5-Amino-6-ribitylamino uracil (r > 0.69, p < 6.85E−07) (Fig. 6B). In the HUFA vs. LUFA network, Glyma.17G236700 was significantly associated with LysoPC (22:2(13Z,16Z)) (r > 0.64, p < 3.04E−08) and 1-pyrroline-4-hydroxy-2-carboxylate B (r > 0.649, p < 2.01E−08) (Fig. 6C).
Gene-based association and haplotype analysis of candidate genes
To further determine the relationship between candidate genes and traits, the SNPs of the candidate genes were applied for the gene-based association and haplotype analysis of the candidate genes. According to the results of candidate gene screening based on gene expression data from qRT-PCR and transcriptomics, Glyma.03G040400 and Glyma.17G236700, as the candidate genes of QTNs and overlapping SNPs of QTNs and QEIs, were studied to understand the gene variations affecting soybean unsaturated FAs and to further determine beneficial haplotypes. Three SNPs were found in the promoter and CDS regions of Glyma.03G040400 (Additional file 1: Table S4). SNP markers 39,188,954, 39,189,172, and 39,190,333 showed an association with LLA (Fig. 7A, Additional file 1: Table S4). Among the three haplotypes of Glyma.03G040400, Hap 3 and Hap 2 had a significantly higher LNA content than Hap 1 in 2013 and 2014 (Fig. 7B and C).
For candidate gene Glyma.17G236700, seven SNPs were found in the promoter and CDS region. Of these, SNP markers 5,048,564, 5,048,670, 5,048,842, 5,049,415, 5,049,422, 5,049,438, and 5,050,845 were significantly associated with LLA content in 2013 and 2014 (− log10(P) ≥ 2) (Additional file 1: Table S4, Fig. 7D). Four haplotypes of Glyma.17G236700 were defined by the seven SNPs (Fig. 7E and F). Among the four haplotypes, Hap 3 and Hap 4 had a significantly higher LNA content (2013 and 2014) than Hap 1 and Hap 2.
Discussion
Soybean is an important oil crop. However, different proportions of FAs may play an important role in soybean oil. Therefore, it is of great significance to improve the content and quality of soybean oil. The single locus method has been widely used to detect genetic variation in crops, including GLM and MLM [23, 24]. However, single-locus GWAS methods generally need Bonferroni correction and can be affected by a polygenic background. In this study, 194 soybean accessions were analyzed using the 3VmrMLM method (Additional file 1: Figure S1, Table 2). We identified 12, 10, and eight significant/suggested SNPs for OA, 17, 10, and seven significant/suggested SNPs for LA, and 10, eight, and 12 significant/suggested SNPs for LLA in 2013, 2014, and 2015, respectively (Table 2). In addition, we compared 3VmrMLM with a single-locus MLM method by Zhao et al. We detected 63 SNPs using the MLM method. Hence, the 3VmrMLM method detected more significant SNPs than the MLM method. Among these SNPs, four SNPs were found using the MLM and 3VmrMLM methods simultaneously, including rs4953186 rs52833743, rs35024325, and rs6481810, and the discovery of rs35024325 and rs6481810 SNPs has been reported [2, 22].
Environmental changes have an important impact on the quality and yield of crops; analysis of multiple environments can increase the detection capability of SNPs. In this study, six, five, and eight QEIs were found for OA, LA, and LLA, respectively (Fig. 2, Table 3). Among these SNPs, five have been reported [22, 25, 26]. A total of 1246 genes around the significant/suggested QTNs were predicted in this study; of them, 40 genes were involved in lipid synthesis (Additional file 1: Table S1). For example, the MYB transcription factor has been reported to affect oil accumulation [27]. The OsLTP gene is involved in the transport of lipid molecules in rice [28]. In this study, Glyma.03G040400 (GmLTP1), located on chromosome 3, was significantly related to LNA using the GLM method based on gene-based association (Additional file 1: Table S4). In addition, the GmLTP1 gene was a beneficial haplotype (Fig. 7).
A total of 301 genes around the significant/suggested QEIs were detected in this study (Additional file 1: Table S2). Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant single-environment QTNs. Among the overlapping SNPs, genes related to FA synthesis and seed development, such as ACBP and FTSH, were identified. ACBPs can play an important role in maintaining lipid homeostasis [29]. In addition, we found that the Glyma.17G236700 (ACBP) gene had a beneficial haplotype (Fig. 7).
Conclusion
The 3VmrMLM method was more comprehensive for GWAS. This method overcame the huge computational burden of traditional models. In this study, 94 QTNs and 19 QEIs were identified. Five major candidate genes were found. The gene expression data from different soybean tissues and transcriptome data were used to identify Glyma.03G040400 and Glyma.17G236700 as key candidate genes around the SNPs. The beneficial haplotypes of Glyma.03G040400 and Glyma.17G236700 may be helpful for further application in soybean breeding.
Methods
Plant materials, field trials, and phenotypic evaluation
An association panel of 194 soybean germplasm resources was planted at Harbin (162.41° E, 45.45° N) in 2013, 2014, and 2015. Field trials were conducted using single-row plots (2 m long and 0.65 m between rows) and a randomized complete block design with three replicates per experimental site. The unsaturated FA content of each sample was determined using gas chromatography (GC-14C, Shimadzu Company, Japan), according to our previous method [30]. The OA, LLA, and LNA content were applied in single-environment (QTN) and multi-environment (QEI) analyses.
Genotypic data
A genotypic dataset consisting of 36,981 SNPs from 194 soybean germplasm resources was generated by Specific-Locus Amplified Fragment Sequencing (SLAF-seq), which was reported in Han et al. and Zhao et al. [2, 31]. The 36,981 SNPs were distributed on 20 soybean chromosomes, with minor allele frequencies > 0.04 and missing data of < 10% (Fig. 8).
GWAS
The 36,981 SNPs and unsaturated FA content of 194 soybean accessions were used for association analysis via the 3VmrMLM method in 3VmrMLM software [21]. QTNs for OA, LLA, and LNA content were calculated from a single environment (3 years of phenotypic data from 2013, 2014, and 2015). QEIs for OA, LLA, and LNA content were calculated using a joint analysis of multiple environments. The threshold of significance for QTNs and QEIs was set at p = 0.05 and LOD score ≥ 3.0.
Differential expression analysis based on RNA-seq
At the R6 stage, 30 soybean varieties with a high content of three unsaturated FAs and a low content of three unsaturated FAs were collected for RNA sequencing (RNA-seq) with two biological replicates. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). The cDNA sequencing libraries of 18 RNA samples were constructed and sequenced, and RNA-seq data were generated using the Illumina platform. Differentially expressed genes (DEGs) were identified using the edgeR package in R software [32]. The significance level was set as follows: |log2(fold change)|≥ 1.
Identification of candidate genes
The 100-kb flanking region of each identified QTN and QEI was defined to search for candidate genes according to linkage disequilibrium decay analysis, as described in Zhao et al. [2]. Candidate genes for unsaturated FAs were extracted in the following steps. According to previous reports, known genes related to FA content in Arabidopsis were considered references to screen their homologous genes in the soybean genome. The new candidate genes were identified using DEGs for unsaturated FAs.
Metabolite profiling
The non-targeted metabolome was completed by Bioacme Biotechnology Co., Ltd. (Wuhan, China). Briefly, a 100 mg soybean sample was loaded into a 2-mL centrifuge tube, and 300 μL 75% methanol/water was added. The tubes were centrifuged at 12,000 rpm for 15 min at 4 °C. Metabolites were screened and identified using the Metlin database. The differential metabolites were calculated using an orthogonal partial least squares-discriminant analysis (OPLS-DA) model, with a variable importance in the projection (VIP) score of ≥ 1 and a |log2 (fold change)| of ≥ 1.
Haplotype analysis and gene-based association analysis of candidate genes
The SNP variation of candidate genes was analyzed based on genome sequencing data. These SNPs were located at the full length of the gene, including exons, intronic regions, and upstream and downstream of the gene. Therefore, in this study, phenotypic data, including high and low total unsaturated FA content, from 50 soybean germplasm resources were used over 3 years to conduct an association analysis. A general linear model (GLM) was used to further determine the association between the SNP variation of candidate genes and unsaturated FA content using TASSEL software [33]. Significant SNP variation in candidate genes was considered when the P value was less than 0.01.
Quantitative real-time PCR (qRT-PCR)
Soybean seeds with high and low unsaturated FA content were collected at the R6 stage. Total RNA was extracted using the TRIzol method, and cDNA was generated using the ReverTra Ace qPCR RT Master Mix (TOYOBO, Osaka, Japan). Real-time quantitative PCR (qRT-PCR) was performed on an ABI 7500 fast real-time PCR platform with SYBR Green (TOYOBO, Osaka, Japan). GmACTIN4 was used as an internal control, and the primer sequences for candidate genes are listed in Additional file 1: Table S5. The L-13 soybean seed samples were used as a calibrator. The results of qRT-PCR were calculated using the 2−ΔΔCT method [34].
Co-expression analysis
The correlation coefficient was calculated between candidate genes and DAM metabolites, and a Pearson correlation cutoff value of 0.5 was generated. Data were visualized using the Cytoscape package [35].
Statistical analysis
Statistical significance was evaluated using Student’s t-test performed with SPSS 22.0 software (IBM Corp., Armonk, NY, USA). “*” and “**” represent a significance level of p < 0.05 and p < 0.01, respectively. The mean and standard deviation (mean ± SD) were calculated using the data from three biological replicates.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its additional files.
References
Tunde-Akintunde T, Olajide J, Akintunde B. Mass-volume-area. Related and mechanical properties of soybean as a function of moisture and variety. Int J Food Prop. 2005;8:449–56. https://doi.org/10.1080/10942910500267513.
Zhao X, Jiang H, Feng L, Qu Y, Teng W, Qiu L, Zheng H, Han Y, Li W. Genome-wide association and transcriptional studies reveal novel genes for unsaturated fatty acid synthesis in a panel of soybean accessions. BMC Genomics. 2019;20:68. https://doi.org/10.1186/s12864-019-5449-z.
Liu X, Qin D, Piersanti A, Zhang Q, Miceli C, Wang P. Genome-wide association study identifies candidate genes related to oleic acid content in soybean seeds. BMC Plant Biol. 2020;20:399. https://doi.org/10.1186/s12870-020-02607-w.
Ascherio A, Willett W. Health effects of trans fatty acids. Am J Clin Nutr. 1997;66:1006–10. https://doi.org/10.1093/ajcn/66.4.1006S.
Di Q, Piersanti A, Zhang Q, Miceli C, Li H, Liu X. Genome-wide association study identifies candidate genes related to the linoleic acid content in soybean seeds. Int J Mol Sci. 2021;23:454. https://doi.org/10.3390/ijms23010454.
Marangoni F, Agostoni C, Borgh C, Catapano A, Cena H, Ghiselli A, La Vecchia C, Lercker G, Manzato E, Pirillo A, Riccardi G, Risé P, Visioli F, Poli A. Dietary linoleic acid and human health: focus on cardiovascular and cardiometabolic effects. Atherosclerosis. 2020;292:90–8. https://doi.org/10.1016/j.atherosclerosis.2019.11.018.
Flores T, Karpova O, Su X, Zeng P, Bilyeu K, Sleper D, Nguyen H, Zhang Z. Silencing of GmFAD3 gene by siRNA leads to low alpha-linolenic acids (18:3) of fad3-mutant phenotype in soybean [Glycine max (Merr.)]. Transgenic Res. 2008;17:839–50. https://doi.org/10.1007/s11248-008-9167-6.
Al Amin N, Ahmad N, Wu N, Pu X, Ma T, Du Y, Bo X, Wang N, Sharif R, Wang P. CRISPR-Cas9 mediated targeted disruption of FAD2–2 microsomal omega-6 desaturase in soybean (Glycine max L.). BMC Biotechnol. 2019;19:9. https://doi.org/10.1186/s12896-019-0501-2.
Zhu C, Gore M, Buckler E, Yu J. Status and prospects of association mapping in plants. Plant Genome. 2008;1:5–20. https://doi.org/10.3835/plantgenome2008.02.0089.
Zhang X, Guan Z, Li Z, Liu P, Ma L, Zhang Y, Pan L, He S, Zhang Y, Li P, Ge F, Zou C, He Y, Gao S, Pan G, Shen Y. Combination of linkage mapping and GWAS brings new elements on the genetic basis of yield-related traits in maize across multiple environments. Theor Appl Genet. 2020;133:2881–95. https://doi.org/10.1007/s00122-020-03639-4.
Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N, Liu J, Warburton M, Cheng Y, Hao X, Zhang P, Zhao J, Liu Y, Wang G, Li J, Yan J. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet. 2013;45:43–50. https://doi.org/10.1038/ng.2484.
Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4. https://doi.org/10.1038/ng.2310.
Yu J, Pressoir G, Briggs W, Vroh B, Yamasaki M, Doebley J, McMullen M, Gaut B, Nielsen D, Holland J, Kresovich S, Buckler E. Unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–8. https://doi.org/10.1038/ng1702.
Kang H, Zaitlen N, Wade C, Kirby A, Heckerman D, Daly M, Eskin E. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–23. https://doi.org/10.1534/genetics.107.080101.
Lippert C, Listgarten J, Liu Y, Kadie C, Davidson R, Heckerman D. FaST linear mixed models for genome-wide association studies. Nat Methods. 2011;8:833–94. https://doi.org/10.1038/Nmeth.1681.
Wang S, Feng J, Ren W, Huang B, Zhou L, Wen Y, Zhang J, Dunwell J, Xu S, Zhang Y. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep. 2016;6:19444. https://doi.org/10.1038/srep19444.
Liu X, Huang M, Fan B, Buckler E, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12: e1005767. https://doi.org/10.1371/journal.pgen.1005767.
Ren W, Wen Y, Dunwell J, Zhang Y. pKWmEB: integration of Kruskal-Wallis test with empirical bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb). 2018;120:208–18. https://doi.org/10.1038/s41437-017-0007-4.
Wen Y, Zhang H, Ni Y, Huang B, Zhang J, Feng J, Wang S, Dunwell J, Zhang Y, Wu R. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2018;19:700–12. https://doi.org/10.1093/bib/bbw145.
Zhang J, Feng J, Ni Y, Wen Y, Niu Y, Tamba C, Yue C, Song Q, Zhang Y. pLARmEB: integration of least angle regression with empirical bayes for multilocus genome-wide association studies. Heredity (Edinb). 2017;118:517–24. https://doi.org/10.1038/hdy.2017.8.
Li M, Zhang Y, Xiang Y, Liu M, Zhang Y. IIIVmrMLM: the R and C++ tools associated with 3VmrMLM, a comprehensive GWAS method for dissecting quantitative traits. Mol Plant. 2022;15:1251–3. https://doi.org/10.1016/j.molp.2022.06.002.
Bachlava E, Dewey R, Burton J, Cardinal A. Mapping and comparison of quantitative trait loci for oleic acid seed content in two segregating soybean populations. Crop Sci. 2009;49:433–42.
Zhong H, Liu S, Sun T, Kong W, Deng X, Peng Z, Li Y. Multi-locus genome-wide association studies for five yield-related traits in rice. BMC Plant Biol. 2021;21:364. https://doi.org/10.1186/s12870-021-03146-8.
Reinprecht Y, Poysa V, Yu K, Rajcan I, Ablett G, Pauls K. Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome. 2006;49:1510–27.
Kim H, Kim Y, Kim S, Son B, Choi I. Analysis of quantitative trait loci (QTLs) for seed size and fatty acid composition using recombinant inbred lines in soybean. J Life Sci. 2010;20:1186–92. https://doi.org/10.5352/JLS.2010.20.8.1186.
Hyten D, Pantalone V, Saxton A, Schmidt M, Sams C. Molecular mapping and identification of soybean fatty acid modifier quantitative trait loci. J Amer Oil Chem Soc. 2004;81:1115–8. https://doi.org/10.1007/s11746-004-1027-z.
Shi M, Yu L, Shi J, Liu J. A conserved MYB transcription factor is involved in regulating lipid metabolic pathways for oil biosynthesis in green algae. New Phytol. 2022;235:576–94. https://doi.org/10.1111/nph.18119.
Chen L, Ji C, Zhou D, Gou X, Tang J, Jiang Y, Han J, Liu Y, Chen L, Xie Y. OsLTP47 may function in a lipid transfer relay essential for pollen wall development in rice. J Genet Genomics. 2022;49:481–91. https://doi.org/10.1016/j.jgg.2022.03.003.
Hamdan M, Lung S, Guo Z, Chye M. Roles of acyl-coA-binding proteins in plant reproduction. J Exp Bot. 2022;73:2918–36. https://doi.org/10.1093/jxb/erab499.
Xie D, Han Y, Zeng Y, Chang W, Teng W, Li W. SSR- and SNP-related QTL underlying linolenic acid and other fatty acid contents in soybean seeds across multiple environments. Mol Breed. 2012;30:169–79.
Han Y, Zhao X, Liu D, Li Y, Lightfoot D, Yang Z, Zhao L, Zhou G, Wang Z, Huang L, Zhang Z, Qiu L, Zheng H, Li W. Domestication footprints anchor genomic regions of agronomic importance in soybeans. New Phytol. 2016;209:871–84. https://doi.org/10.1111/nph.13626.
Robinson M, McCarthy D, Smyth G. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40. https://doi.org/10.1093/bioinformatics/btp616.
Bradbury P, Zhang Z, Kroon D, Casstevens T, Ramdoss Y, Buckler E. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5. https://doi.org/10.1093/bioinformatics/btm308.
Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9. https://doi.org/10.1038/ng1847.
He L, Xiao J, Rashid K, Yao Z, Li P, Jia G, Wang X, Cloutier S, You F. Genome-wide association studies for pasmo resistance in Flax (Linum usitatissimum L.). Front Plant Sci. 2019;9:1982. https://doi.org/10.3389/fpls.2018.01982.
Panthee D, Pantalone V, Saxton A. Modifier QTL for fatty acid composition in soybean oil. Euphytica. 2006;152:67–73. https://doi.org/10.1007/s10681-006-9179-3.
Acknowledgements
We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.
Funding
This study was financially supported by Heilongjiang Provincial Project (ZD2022C002, JD22A015), the National Key Research and Development Project of China (2021YFD1201604, 2021YFF1001204, 2021YFD1201103), the Chinese National Natural Science Foundation (U22A20473), the Youth Leading Talent Project of the Ministry of Science and Technology in China (2015RA228), the National Ten-thousand Talents Program, The national project (CARS-04-PS07). Young leading talents of Northeast Agricultural University (NEAU2023QNLJ-003). The funding bodies had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
XZ (Xunchao Zhao) performed the experiments and prepared the manuscript. YZ (Yuhang Zhan), KL and YZ (Yan Zhang) prepared the samples. CZ, MY, ML and YL analyzed the data. PZ, YH and XZ (Xue Zhao) conceived the experiments and revised the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
The authors have consented for publication.
Competing interests
The authors declare that they have no competing interests. All authors agree to authorship and approved the final manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1.
QTNs identified for unsaturated fatty acids content using the QTN detection model in 3VmrMLM. Table S2. QEIs identified for unsaturated fatty acids content three environments detected using the QTN-by-environment detection model in 3VmrMLM. Table S3. Unsaturated fatty acids content of the 30 soybean varieties. Table S4. The association between SNP in Glyma.17G23670 and Glyma.03G040400 gene and soybean unsaturated fatty acids content based on 50 soybean germplasms. Table S5. Primers used for qRT-PCR. Figure S1. Manhattan plots of the single-environment analysis for the oleic, linoleic and linolenic traits in 2013, 2014 and 2015 of soybean. Figure S2. A and B KEGG pathway annotation around QTN and QEI candidate genes, respectively. Figure S3. Candidate genes are identified in the transcriptome and (A): QTN detection model, (B): QTN-by-environment detection model. Figure S4. Analysis of candidate genes by qRT-PCR. Figure S5. Differential accumulation of metabolite in the three comparison groups.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Zhao, X., Zhan, Y., Li, K. et al. Multi-omics analysis reveals novel loci and a candidate regulatory gene of unsaturated fatty acids in soybean (Glycine max (L.) Merr). Biotechnol Biofuels 17, 43 (2024). https://doi.org/10.1186/s13068-024-02489-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13068-024-02489-2