Introduction

Soybean [Glycine max (L.) Merr.] is the major oilseed crop in the world (Yaklich and Vinyard 2004). Fatty acids are the predominant components of soybean oil and comprise saturated fatty acids (palmitic acid and stearic acid) and unsaturated fatty acids (oleic acid, linoleic acid, and linolenic acid). Different soybean fatty acids participate in different physiological functions (Mostofsky et al. 2001), so the desired fatty acid composition depends on the end use of the soybean oil (Panthee et al. 2006). For instance, vegetable oil with a high concentration of unsaturated fatty acids is preferred in the human diet for health reasons, because unsaturated fatty acids can decrease detrimental cholesterol in blood and lower the risk of cardiovascular disease (Mensink and Katan 1992; Connor 2000). For industrial applications, however, soybean oil with a high concentration of saturated fatty acids is more suitable because of its stability to oxidation (Henderson 1991; Hu et al. 1997; Spencer et al. 2003). Moreover, the polyunsaturated fatty acids, particularly linolenic acid, are prone to oxidation by lipoxygenase isozymes (Hildebrand et al. 1993), which negatively affects the flavor and shelf life of soybean products (Robinson et al. 1995). Therefore, improving fatty acid composition and increasing oxidative stability have been major goals of soybean breeding programs for decades (Ha et al. 2010; Oliva et al. 2006).

In recent years, quantitative trait locus (QTL) mapping of the main fatty acid components and identification of molecular markers closely linked to specific fatty acids have been carried out in soybean breeding programs. In these studies, simple sequence repeat (SSR) markers (Bachlava et al. 2009; Panthee et al. 2006), single nucleotide polymorphism (SNP) markers (Wang et al. 2012a, b; Xie et al. 2012), and other types of markers (Diers and Shoemaker 1992; Li et al. 2011; Reinprecht et al. 2006) were used to construct soybean genetic linkage maps with different types of mapping populations [e.g. recombinant inbred lines (RILs), doubled haploid lines (DHs), and backcross populations (BC)]. Based on these genetic maps, preliminary mapping of QTLs for fatty acid concentrations was conducted. To date, 165 QTLs for individual fatty acid components have been recorded in the USDA-ARS Soybean Genetics and Genomics Database (SoyBase, http://soybase.org) according to previous studies (Alrefai et al. 1995; Bachlava et al. 2009; Brummer et al. 1997; Diers and Shoemaker 1992; Hyten et al. 2004a, b; Kim et al. 2010; Li et al. 2011; Shibata et al. 2008; Spencer et al. 2003; Panthee et al. 2006; Reinprecht et al. 2006; Wang et al. 2012a, b; Xie et al. 2012). These QTLs are useful for marker-assisted selection (MAS) in breeding programs that aim to alter soybean fatty acid composition.

Although numerous QTLs associated with fatty acid components have been identified, their application is still hampered by low repeatability and stability and by the difficulty of integrating QTLs from different studies, owing to the significant differences among genetic backgrounds and environments. Therefore, it is necessary to identify and validate stable QTLs for individual fatty acid components across multiple environments. In this study, a RIL population derived from a cross of cv. Luheidou2 × Nanhuizao was planted in Beijing from 2009 to 2011 and used to identify QTLs underlying individual fatty acid concentrations. A number of consistent additive QTLs were identified across multiple environments. Moreover, the epistatic interactions of QTLs underlying fatty acids were analyzed. Taken together, this study provides new knowledge on the genetic basis of soybean fatty acid composition.

Materials and methods

Plant materials and field design

The mapping population was developed from a cross between cv. Luheidou2 (distributed in the HuangHuaiHai valley region) and Nanhuizao (distributed in the southern region of China). The RIL population was advanced by single-seed descent from the F2 to the F5 generation, and 200 F5:7 and F5:8 lines were then established. From 2009 to 2011, the 200 RILs together with their parents were planted at the Shunyi Experimental Station (N40°13′, E116°34′) in Beijing. The three environments of 2009, 2010 and 2011 were designated E1, E2 and E3, respectively. The F5:7 and F5:8 populations were planted in rows 2 m long and 0.5 m apart, with 0.1 m between adjacent plants. A randomized complete block design with three replicates was used.

Fatty acid extraction and determination

Fatty acids were extracted as described by Kamal-Eldin and Andersson (1997). About 20 g of soybean seeds from each RIL was ground to a fine powder with a sample preparation mill (Retsch ZM100, Φ = 1.0 mm, Rheinische, Germany) and stored at −20 °C before use. Three hundred milligrams of each soybean powder was weighed on an analytical balance (Sartorius BS124S) and transferred to a 2-ml centrifuge tube preloaded with 1.5 ml n-hexane. After mixing, the mixture was stored at 4 °C for 12 h and then centrifuged at 5000×g for 10 min. The supernatant was transferred to a new 2-ml centrifuge tube containing 350 µl sodium methoxide solution and shaken for 1 h on a twist mixer (TM-300, ASONE, Japan) for complete methyl esterification. The mixture was then centrifuged again at 5000×g for 10 min, and the supernatant was collected for determination of fatty acid composition.

Fatty acid composition was determined by gas chromatography (GC-2010, SHIMADZU, Japan). The instrument operating conditions were as follows (Ma et al. 2015): chromatographic column, RTX-Wax (30 m × 0.25 × 0.25); automatic injection, 1 µl; split ratio, 40:1; injection port temperature, 250 °C; carrier gas, nitrogen at 54 ml min−1; hydrogen, 40 ml min−1; air, 400 ml min−1; temperature program, 180 °C held for 1.5 min, raised to 210 °C at 10 °C min−1 and held for 2 min, then raised to 220 °C at 5 °C min−1 and held for 5 min. The FID detector temperature was 300 °C. The area normalization method was used to calculate the percentages of the five predominant fatty acids (palmitic acid, stearic acid, oleic acid, linoleic acid and linolenic acid) on the GC2010 workstation (Shimadzu, Japan).
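The area normalization step can be illustrated with a minimal Python sketch. It assumes that each fatty acid methyl ester peak has already been integrated and that only the five named peaks contribute to the total; the function name `area_normalize` and the peak areas are hypothetical illustration values, not data from this study.

```python
def area_normalize(peak_areas):
    """Convert integrated peak areas into percentages of the summed area."""
    total_area = sum(peak_areas.values())
    if total_area <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total_area for name, area in peak_areas.items()}


# Hypothetical peak areas (arbitrary detector units) for one sample.
example_areas = {
    "palmitic": 1200.0,
    "stearic": 400.0,
    "oleic": 2400.0,
    "linoleic": 5200.0,
    "linolenic": 800.0,
}
percentages = area_normalize(example_areas)
```

Because every value is divided by the same total, the five percentages always sum to 100, which is why the method reports relative rather than absolute concentrations.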

Polymorphic SSR marker selection and genotyping for RILs

One hundred RILs were selected at random from the 200 F5:7 RILs. For each of the 100 RILs, young leaves collected from approximately 10 seedlings were ground to powder in liquid nitrogen with a mortar and pestle. Total genomic DNA of each line was extracted separately using the CTAB method (Doyle 1990). The resulting DNA was dissolved in 200 µl ddH2O and quantified with a NanoDrop ND-100 spectrophotometer. The genomic DNA of each sample was then diluted to a concentration of 100 ng μl−1 and used in the subsequent genotyping analysis.

A total of 530 pairs of SSR primers selected from SoyBase (http://soybase.org) were synthesized by Shanghai ShengGong Biological Engineering Technology Service Co., Ltd. The polymorphism of these SSR markers was tested between the two parents of the RIL population, and 161 polymorphic SSR markers were identified. These polymorphic markers were used to genotype the 100 RILs. Polymerase chain reaction (PCR) was performed as follows: 94 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 47–55 °C for 30 s, and extension at 72 °C for 45 s; and a final extension at 72 °C for 10 min.

Construction of the genetic map and data analysis

The 100 RILs were genotyped according to the band types of the polymorphic SSR markers. The maternal band type was coded as ‘2’, the paternal band type as ‘0’, the hybrid band type as ‘1’, and missing data as ‘−1’.
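The coding scheme can be written down as a small lookup, sketched here in Python. The band labels (`"maternal"`, `"paternal"`, `"hybrid"`) and the function names are hypothetical; only the numeric codes 2, 0, 1 and −1 come from the text.

```python
# Numeric codes taken from the text; the string labels are illustrative only.
BAND_CODES = {"maternal": 2, "hybrid": 1, "paternal": 0}
MISSING_CODE = -1


def encode_band(band_type):
    """Return the numeric genotype code for one scored band."""
    return BAND_CODES.get(band_type, MISSING_CODE)


def encode_line(band_types):
    """Encode all marker scores of one RIL, in marker order."""
    return [encode_band(band) for band in band_types]


# One hypothetical RIL scored at four markers; None stands for an unreadable lane.
example_scores = ["maternal", "paternal", None, "hybrid"]
example_codes = encode_line(example_scores)
```

Any score that is not one of the three recognised band types falls back to the missing-data code, so unreadable or ambiguous lanes are never silently assigned to a parent.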

The genetic map was constructed using the MAP model of QTL IciMapping v3.3 (Cui et al. 2011). Based on the concentrations of the five main fatty acid components in soybean seeds, QTLs for palmitic acid, stearic acid, oleic acid, linoleic acid and linolenic acid were detected using the inclusive composite interval mapping (ICIM) method in the BIP model of QTL IciMapping v3.3. The LOD threshold for evaluating the statistical significance of QTL effects was determined using 1000 permutations. Based on these permutations, a LOD score of 3.1 was used as the minimum to declare the presence of a QTL in a genomic region. Epistatic effects of QTLs were analyzed by ICIM-EPI in the BIP model of QTL IciMapping v3.3 (P value <0.0005; LOD >5.0). This method has been described in detail by Li et al. (2008) and has been applied to epistatic QTL mapping for a number of important traits (Ding et al. 2014; Wang et al. 2012a, b). The genes within QTL intervals were annotated and analyzed using the Phytozome v9.1 (www.phytozome.net) and NCBI (www.ncbi.nlm.nih.gov) databases.
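The idea behind a permutation-based LOD threshold can be sketched in Python as follows. This is not the ICIM algorithm implemented in QTL IciMapping: a simple single-marker scan is swapped in as a stand-in, and the function names, the 0/2 genotype coding used for the scan, the significance level `alpha = 0.05` and the simulated data are all assumptions made for illustration.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """LOD score from comparing phenotype means of the two homozygous classes."""
    # Heterozygous (1) and missing (-1) calls are ignored in this simplified scan.
    pairs = [(g, y) for g, y in zip(genotypes, phenotypes) if g in (0, 2)]
    n = len(pairs)
    if n < 3:
        return 0.0
    values = [y for _, y in pairs]
    grand_mean = sum(values) / n
    rss_null = sum((y - grand_mean) ** 2 for y in values)
    rss_marker = 0.0
    for code in (0, 2):
        group = [y for g, y in pairs if g == code]
        if group:
            group_mean = sum(group) / len(group)
            rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0.0:
        return 0.0  # no phenotypic variation to explain
    if rss_marker == 0.0:
        return float("inf")  # marker classes explain all variation
    return max(0.0, (n / 2.0) * math.log10(rss_null / rss_marker))


def permutation_lod_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: (1 - alpha) quantile of the per-permutation maximum LOD."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        maxima.append(max(single_marker_lod(m, shuffled) for m in markers))
    maxima.sort()
    rank = math.ceil((1.0 - alpha) * n_perm)  # 1-based rank of the quantile
    return maxima[min(rank, n_perm) - 1]


# Simulated demo: 100 lines, 20 markers, the first marker carries a real effect.
demo_rng = random.Random(1)
demo_markers = [[demo_rng.choice((0, 2)) for _ in range(100)] for _ in range(20)]
demo_phenotypes = [(1.0 if g == 2 else 0.0) + demo_rng.gauss(0.0, 0.5)
                   for g in demo_markers[0]]
demo_threshold = permutation_lod_threshold(demo_markers, demo_phenotypes, n_perm=200)
demo_lod = single_marker_lod(demo_markers[0], demo_phenotypes)
```

Taking the maximum LOD over all markers within each permutation is what makes the resulting threshold genome-wide rather than per-marker.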

Analysis of variance (ANOVA) was performed to determine the significance of differences among RIL genotypes and among environments. Correlation analysis among the five fatty acid components in soybean seeds was conducted using the CORR procedure of SAS version 9.2, and frequency distributions were analyzed with Microsoft Excel 2013.
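For readers without SAS, the pairwise correlation step can be approximated with a short Python sketch. It assumes that the coefficients of interest are Pearson correlations computed on line means; the function names and the trait values below are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(traits):
    """Pearson r for every unordered pair of traits in a {name: values} mapping."""
    names = list(traits)
    return {
        (names[i], names[j]): pearson_r(traits[names[i]], traits[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical line means (%) for two fatty acids in five RILs.
example_traits = {
    "oleic": [20.1, 22.4, 24.0, 25.3, 27.9],
    "linoleic": [55.2, 53.8, 52.1, 51.0, 48.7],
}
example_table = correlation_table(example_traits)
```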

Results

Phenotypic analysis of fatty acid in soybean seeds

Fatty acid composition was determined by gas chromatography, and five predominant fatty acid components (stearic acid, palmitic acid, oleic acid, linoleic acid, and linolenic acid) were identified and quantified in soybean seeds. The fatty acid composition of the RIL population was then characterized across multiple environments. As shown in Table 1, the two parents of the RILs exhibited significant differences for almost all fatty acids except palmitic acid, implying different genetic backgrounds. Moreover, the RILs exhibited a broad range of variation in fatty acid concentrations. Of the five predominant fatty acid components, linoleic acid had the lowest coefficient of variation, ranging from 4.0 to 6.1 %, while stearic acid had the highest, ranging from 10.2 to 18.4 % (Table 1). According to the Kolmogorov–Smirnov test, the frequency distributions of the five fatty acid components were generally continuous and normal across environments, except for the oleic acid distribution in E3 (Fig. S1 and Table S1), suggesting that fatty acid concentrations are typical quantitative traits and suitable for QTL mapping. Notably, transgressive segregation for individual fatty acid concentrations was also observed in the RIL population (Fig. S1), which suggests that favorable alleles for fatty acid concentrations were derived from both parents.
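Two of the summary statistics used above can be made concrete with a brief Python sketch. It assumes that the coefficient of variation is the sample standard deviation divided by the mean, expressed as a percentage, and that a line is counted as transgressive when its value falls outside the range spanned by the two parents; both function names and all numbers are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def transgressive_lines(ril_values, parent_a, parent_b):
    """Return RIL values lying outside the interval spanned by the two parents."""
    low, high = sorted((parent_a, parent_b))
    return [value for value in ril_values if value < low or value > high]


# Hypothetical stearic acid concentrations (%) for a handful of RILs.
example_rils = [3.1, 3.6, 4.0, 4.4, 5.2]
example_cv = coefficient_of_variation(example_rils)
example_transgressive = transgressive_lines(example_rils, parent_a=3.5, parent_b=4.5)
```

In practice a significance test against the parental means would be needed before calling a line transgressive; the range check here only illustrates the concept.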

Table 1 The statistical analysis of fatty acid components in the RIL population

Consistent with previous studies (Dornbos and Mullen 1992; Hou et al. 2006), the concentrations of almost all fatty acid components were significantly influenced by both genetic and environmental factors (Table S2). Nevertheless, the significant correlations of fatty acid concentrations among years (Table S3) suggested that genetic effects still play the most important role in soybean fatty acid composition despite the environmental effects.

QTL mapping of fatty acid components in soybean

Based on the 161 polymorphic SSR markers, a soybean genetic linkage map was constructed. The map spanned 3591.2 cM of the genome, with an average distance of 22.3 cM between adjacent markers. With this map, QTL mapping was conducted for the fatty acid concentrations. In total, 52 QTLs were identified on 19 linkage groups (LGs), all except Gm04; they explained 5–40 % of the phenotypic variation for the individual fatty acid concentrations in soybean seeds, with LOD scores ranging from 3.1 to 16.0. Of these loci, 35 QTLs were detected in a single environment, including nine QTLs for linolenic acid, eight for linoleic acid, one for oleic acid, six for stearic acid, and thirteen for palmitic acid (Table S4). The other 17 QTLs were mapped to 10 LGs and either accounted for specific fatty acid components across multiple environments or underlay multiple fatty acid components (Fig. 1; Table 2). Considering the environmental effect on fatty acid composition, these 17 consistent QTLs may represent the major genetic basis of fatty acid composition and were therefore the focus of the subsequent analyses.
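The map summary figures can be checked with a few lines of Python. The sketch assumes that the reported average spacing is the total map length divided by the number of markers; this reading is an assumption, because the number of marker intervals per linkage group is not given here. The variable names are illustrative.

```python
# Values reported in the text.
map_length_cm = 3591.2
marker_count = 161
single_environment_qtls = 35
consistent_qtls = 17

# Assumed definition: total length divided by the number of markers.
average_spacing_cm = map_length_cm / marker_count
total_qtls = single_environment_qtls + consistent_qtls
```

Under that assumption the quotient rounds to the reported 22.3 cM, and the two QTL classes add up to the reported total of 52.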

Fig. 1

Consistent additive QTLs associated with specific fatty acids across multiple environments or associated with multiple fatty acids. The orange regions indicate the locations of these QTLs on the LGs. The name of each QTL, shown in orange near its location, is composed of the abbreviation of the affected trait [palmitic acid (PA), stearic acid (SA), oleic acid (OA), linoleic acid (LA) or linolenic acid (LNA)] followed by the chromosome number. For QTLs underlying multiple fatty acid components, the name is composed of FA (fatty acid) followed by the chromosome number. The SSR markers are shown on the right of the LGs, and their positions on the LGs are indicated on the left in centimorgans (cM). (Color figure online)
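The naming convention can be expressed as a small parser, sketched here in Python. The caption defines the trait abbreviation and the chromosome number; treating the leading `q` as a fixed prefix and the trailing number after the hyphen as a serial index on that chromosome is an assumption based on names such as qLNA6-1, and the function name is hypothetical.

```python
import re

# Trait abbreviations as defined in the caption.
TRAIT_NAMES = {
    "PA": "palmitic acid",
    "SA": "stearic acid",
    "OA": "oleic acid",
    "LA": "linoleic acid",
    "LNA": "linolenic acid",
    "FA": "multiple fatty acids",
}

# q + trait abbreviation + chromosome number + "-" + serial index (assumed meaning).
QTL_NAME_PATTERN = re.compile(r"^q(PA|SA|OA|LA|LNA|FA)(\d+)-(\d+)$")


def parse_qtl_name(name):
    """Split a QTL name such as 'qLNA6-1' into trait, chromosome and serial index."""
    match = QTL_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised QTL name: {name!r}")
    abbreviation, chromosome, serial = match.groups()
    return {
        "trait": TRAIT_NAMES[abbreviation],
        "chromosome": int(chromosome),
        "serial": int(serial),
    }


example_qtl = parse_qtl_name("qLNA6-1")
```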

Table 2 QTLs detected underlying specific fatty acids across various environments or underlying multiple fatty acids

Of the 17 consistent QTLs, eight accounted for specific fatty acid components across environments (Table 2). Specifically, four QTLs (qLNA3-1, qLNA6-1, qLNA7-1, and qFA8-1) contributed to linolenic acid concentration, accounting for 7–23 % of the phenotypic variation. The favorable alleles of qLNA6-1 and qFA8-1 were inherited from the paternal parent cv. Nanhuizao, whereas the favorable alleles of qLNA3-1 and qLNA7-1 were derived from the maternal parent cv. Luheidou2. Notably, qLNA6-1 stably explained a relatively high proportion of the phenotypic variation for linolenic acid concentration across the three environments (mean of 23 %), with an average LOD score of 8.4, suggesting that this QTL may be a major and stable locus for linolenic acid concentration. Three QTLs (qSA6-1, qSA13-1 and qSA20-1) were associated with stearic acid concentration, accounting for 12–24 % of the phenotypic variation. The favorable alleles of qSA6-1 and qSA13-1 were derived from cv. Nanhuizao, while the favorable allele of qSA20-1 was derived from cv. Luheidou2. In addition, the Nanhuizao-derived qPA18-1 explained 10 % of the phenotypic variation for palmitic acid concentration (Table 2).

Of the 17 consistent QTLs, 10 were associated with multiple fatty acid components (Table 2). They explained 5–40 % of the phenotypic variation, with LOD values ranging from 3.1 to 16.0. Of these loci, qFA3-1, qFA7-1, qFA7-2 and qFA8-1 accounted for both linolenic and stearic acids; qFA9-1 and qFA12-1 were associated with both linoleic and stearic acids; and qFA10-1 and qFA10-2 were detected for both oleic and linoleic acids. qFA3-2 contributed to oleic, linoleic, and stearic acids. Additionally, qFA20-1 was detected for palmitic and stearic acids (Table 2).

Comparison of QTLs underlying fatty acid components

These QTLs were then compared with those reported in previous studies. As shown in Table S4, of all 52 QTLs, 37 overlapped with known QTLs for fatty acid concentration according to the SoyBase database. The other 15 QTLs, explaining 5–24 % of the phenotypic variation for individual fatty acids, were novel loci (Table S4). Of the 17 QTLs detected across multiple environments or associated with multiple fatty acids, 15 had been reported previously, while two were novel (Table S4). Of these two novel loci, qSA20-1 contributed to stearic acid across multiple environments, while qFA12-1 was associated with both stearic and linoleic acid (Table 2).

In addition, with the completion of whole-genome sequencing of soybean, comparative genomic analysis of QTL intervals could also be conducted. Therefore, the genomic information within the 52 QTL intervals was obtained. Five QTL intervals (i.e. qLNA3-1, qFA7-2, qLA2-1, qLA11-1 and qLA14-1) were found to contain genes involved in fatty acid biosynthesis or accumulation (Table S4). These genes (GmFAD2-2, GmFAD3-2b, GmSAD1, GmSAD2, and GmSACPD-C) are responsible for introducing double bonds into fatty acyl chains through the removal of two hydrogen atoms, and are therefore essential for the biosynthesis of unsaturated fatty acids (Ohlrogge and Browse 1995; Fofana et al. 2004). The presence of these genes within QTL intervals suggested a relatively high accuracy of QTL mapping in this study.

Epistatic QTLs associated with fatty acid components

Finally, the epistatic interactions for fatty acid components were analyzed. As a result, 25 pairs of QTLs with epistatic interactions were identified for the five fatty acids across the three environments (Fig. 2). These epistatic QTLs were mapped to almost all LGs except Gm02 and Gm19, and explained between 7 and 29 % of the phenotypic variation (Table 3). Nearly half of these epistatic QTLs were involved in linolenic acid, while fewer were associated with saturated fatty acids. The high phenotypic variation explained by epistatic QTLs suggested that the fatty acids, especially linolenic acid, were significantly influenced by epistatic interactions of QTLs. However, no stable QTL × QTL interaction was detected across multiple environments in this study. This result suggested that the fatty acids were also affected by epistatic QTLs and that the epistatic interactions were considerably complex.

Fig. 2

The epistatic interactions of QTLs for individual fatty acids in soybean seeds among different environments. The figure was generated by ICIM-EPI in the QTL IciMapping v3.3 software using the multi-environmental functionality. The 20 colors in the ring represent the 20 soybean chromosomes. The numbers in the ovals indicate the positions of markers on the chromosomes. The different colors of the dashed lines connecting the loci represent the epistatic interaction between two loci for different fatty acids in a specific environment. The numbers on the dashed lines indicate the LOD scores of the additive × additive effects between two QTLs. (Color figure online)

Table 3 Epistatic QTLs for the fatty acid components in soybean seeds among multiple environments

Discussion

Soybean is the leading oilseed crop in the world. Fatty acid composition determines the quality and nutritional value of soybean oil, and different compositions are desired depending on the end use of the oil. However, soybean fatty acid composition is regulated by a complex network of genetic and environmental factors.

Fatty acids were influenced by both genetic and environmental factors

Soybean fatty acid concentrations are quantitatively inherited traits. They fluctuate even within the same population under different environments (Hou et al. 2006). Therefore, multiple environments should be considered in QTL mapping. On the other hand, a high broad-sense heritability has been observed for fatty acid components despite the environmental influence (Gesteira et al. 2003), indicating that their phenotypic variation is mainly controlled by genetic factors. In this study, QTL mapping was conducted in a RIL population over three environments. As expected, the QTLs underlying specific fatty acids varied among environments. However, 17 consistent additive QTLs were identified that underlay specific fatty acids across environments or underlay multiple fatty acids. These loci may represent the genetic basis of fatty acid biosynthesis.

The comparative analysis suggested stable and novel QTLs for fatty acid concentrations

The comparative analysis of QTLs suggested consistency between our results and previous studies. Most of the QTLs identified in this study have been reported previously, which supports the reliability and accuracy of our study to some extent. Of the 17 consistent QTLs underlying specific fatty acids across multiple environments or underlying multiple fatty acids, 15 have been reported. These loci are stable for fatty acid concentrations across multiple mapping populations and may therefore represent a common genetic basis of fatty acid composition. On the other hand, the 15 novel QTLs, especially the two novel consistent QTLs (qSA20-1 and qFA12-1), provide new QTL information for fatty acid composition and should be emphasized in subsequent studies.

By comparing the genomic regions of the QTLs with the soybean reference genome sequence, five essential genes involved in the biosynthesis of unsaturated fatty acids were discovered within five QTL intervals. This result implies that these genes may represent the genetic basis of the related QTLs, although considerable further work is needed to confirm this. In addition, a bZIP-transcription-factor-like gene (Glyma20g01030.1) was discovered within the genomic interval of qSA20-1, which explained a large part of the phenotypic variation for stearic acid concentration. It has been reported that a bZIP transcription factor (bZIP67) can regulate the omega-3 fatty acid concentration by activating FAD3 (Mendes et al. 2013). Therefore, this gene deserves more attention in further study.

The fatty acid concentrations were influenced significantly by epistatic interaction of QTLs

Genetic variation may be partly attributable to epistatic interactions of QTLs. Significant epistatic effects have been reported in soybean for yield (Lark et al. 1995), plant height (Orf et al. 1999), isoflavone contents (Gutierrez-Gonzalez et al. 2011) and fatty acid components (Li et al. 2011). In this study, epistatic QTLs were also detected on almost all LGs and explained a high proportion of the phenotypic variation. Interestingly, half of these epistatic QTLs contributed to linolenic acid. This result, combined with previous studies, indicates that fatty acids, especially linolenic acid, are significantly influenced by QTL × QTL interactions. However, most of these epistatic interactions were detected in a single environment, and no stable epistatic interaction across multiple environments was detected in this study. This may be due to the limited population size and marker number. Moreover, epistatic QTL × environment interactions for fatty acids should also be considered in subsequent studies.

Taken together, the epistatic interactions for fatty acid components, combined with the additive QTLs, suggest that a complex network controls fatty acid composition. This makes it difficult to apply MAS immediately for improving fatty acid composition in soybean breeding. However, our results provide solid evidence for the involvement of epistatic QTLs in fatty acid accumulation.

The limitation of efficiency and accuracy for QTL detection

The efficiency of QTL identification is determined by numerous factors such as the population size, the density of polymorphic markers, and the mapping algorithm (Darvasi et al. 1993; Li et al. 2010; Stange et al. 2013; Yu et al. 2011). In this study, although 100 RILs are sufficient to detect QTLs explaining a relatively high proportion of the phenotypic variation (PVE >10 %) with the existing marker density (Li et al. 2010), QTLs with minor effects could have been missed. For instance, only 17 consistent QTLs were identified for the five predominant fatty acids in this study. Moreover, fine mapping of these QTLs or MAS for fatty acid composition is not yet feasible because of the large distances within QTL intervals. Therefore, a larger population and a higher density of polymorphic markers are required in further studies to detect more consistent QTLs and narrow the QTL confidence intervals.