Background

Soy food has been taken as a functional food because it contains many health beneficial molecules such as isoflavones [1]. Studies on human nutrition have shown that soybean isoflavones play an important role in preventing a number of chronic diseases [2, 3]. Equally, isoflavones are critical factors in defending soybean crops against pests [4, 5], in promoting nodulation by rhizobia [6], and in changing or adjusting the microorganisms around plant roots [7]. The major bioactive components of soybean isoflavones in human nutrition are daidzein (DZ), genistein (GT) and glycitein (GC). Isoflavone contents in soybean seed are inherited as complex quantitative traits [811]. Since soy seed isoflavones are regulated by multiple genetic factors, their concentrations in seed are highly variable [1, 1214]. Over fifty QTLs underlying individual and/or total soybean isoflavone content have been reported [8, 1523]. However, only 12 of these QTLs were in genomic regions encoding isoflavone synthesis enzymes.

A group of enzymes in the phenylpropanoid pathway lead to the biosynthesis of DZ, GT and GC [11]. Phenylalanine ammonia lyase (PAL; EC 4.3.1.5), chalcone synthase (CHS; EC 2.3.1.74) and flavanone 3-hydroxylase (F3H; EC 1.14.11.9) [24] are the first three enzymes that convert the amino acid phenylalanine into p-Coumaroyl-CoA in this pathway [11]. In the isoflavonoid biosynthetic pathway [25], the co-catalytic action of CHS [26, 27] with chalcone reductase (CHR; EC 2.3.1.170) [28] produces isoliquiritigenin and naringenin chalcone, which are isomers of the central isoflavanone intermediates naringenin and liquiritigenin, respectively. Isoliquiritigenin and naringenin chalcone are respectively converted into liquiritigenin and naringenin by chalcone isomerase (CHI; EC 5.5.1.6) [29]. These two products are the precursors of DZ and GT, which are formed after the catalysis of the precursors by the key enzyme 2-hydroxyisoflavanone synthase (IFS; EC 1.14.13.136) [30, 31]. The enzyme F3H, that competes with IFS in utilizing naringenin, catalyzes the conversion of flavanones to dihydroflavonols, which are intermediates in the biosynthesis of flavonols, anthocyanidins, catechins and proanthocyanidins [32, 33]. For the synthesis of GC, isoliquiritigenin is likely a precursor to form GC after several biochemical steps, which are not entirely known yet [34]. However, seed isoflavone concentrations in soybean can be regulated by metabolic engineering of the complex phenylpropanoid biosynthetic pathways [35].

Regulating transcript abundance is an effective approach to improve phenotypes [36]. The integrated analysis of genotype and transcript abundance data for association with complex traits can be used to identify novel genetic pathways involved in complex traits. ‘Expression QTL’ (eQTL), first defined by Jansen and Nap [37], could identify the genetic determinants of transcript abundances and is widely used for investigating gene regulation pathways. This approach treats transcript abundance of individual genes as quantitative traits in a segregating population. The eQTL map information enables genetic regulatory networks to be modeled that can provide a better understanding of the underlying phenotypic variation. It has been successfully applied in humans [3840], plants [4144], yeasts [45, 46], worms [47], flies [48], mice [49, 50], pigs [51] and rats [52] populations. These studies showed that transcript abundance was highly heritable and could be linked to either a local locus (cis-eQTL) or a distant locus (trans-eQTL). Cis-eQTL is mapped to the same genomic location like an expressed gene (within 5 Mb), and trans-eQTL is mapped to a different genomic location from an expressed gene (>5 Mb or on different chromosomes) [40, 53]. In general, cis-eQTL tends to produce stronger statistical associations than does by trans-eQTL [54]. This phenomenon is regarded as evidence of greater biological plausibility for the existence of true functional cis-eQTL [55]. Trans-eQTL could occur individually at a single genomic locus or could occur collectively as part of eQTL trans-bands [55]. This genomics approach has been employed to identify eQTL related genes in soybean [36, 5658]. To date, no information concerning eQTLs underlying soybean isoflavone synthetic enzyme genes is available.

It has been proved that many enzymes in the phenylpropanoid pathway underlie QTLs that determine the accumulation of isoflavone contents in soybean seeds [11]. Meanwhile, the modification of enzyme encoded genes that are involved in phenylpropanoid pathway could promote the biosynthesis of isoflavone [31, 35]. In this study, PAL, CHS, IFS and F3H in the phenylpropanoid pathway were selected as the target genes (TGs) to analyze isoflavone-relative eQTL. Potential candidate genes underlying the accumulation of isoflavone contents in soybean seed were also evaluated. In addition, overlapped loci both for eQTL and phenotypic QTL (pQTL) were identified.

Results

Total and individual isoflavone contents, target gene transcript abundance and correlation analysis

Transcript abundances of target genes (TGs) between parents from R3 to R8 developmental stages were compared. Total and individual isoflavone contents and transcript abundances of TGs at R6 stage of soybean development were measured in the F5:11 population. The results showed that significant differences among the transcript abundances of TGs between the two parents existed at the R6 stage. The phenotypic variation of individual and total isoflavones showed a continuous distribution (Table 1).

Table 1 Total and individual isoflavone content of the RIL populations and parents

GT showed a high positive correlation coefficient with DZ (r = 0.762, P < 0.01; Table 2). The transcript abundance of PAL was positively correlated with both GT and TI, but exhibited no significant correlation with DZ and GC. The transcript abundance of CHS was positive correlated with DZ, GT and TI, but negatively associated with GC amount. The transcript abundance of IFS displayed a positive correlation with DZ, but showed no correlation with other isoflavone components. The transcript abundance of F3H showed significantly negative correlation with individual and total isoflavone contents.

Table 2 Correlations among individual and total isoflavone contents, as well as the transcript abundances of the four TGs in the RIL populations

Identification of genomic region for target genes

Through BLAST searches (http://www.phytozome.net/soybean), the PAL has six homologous regions (E ≤ 0), which are located on Gm10 (LG O, PAL1/ PAL2), Gm13 (LG F, PAL1), Gm03 (LG N, PAL1), Gm19 (LG L, PAL1), Gm20 (LG I) and Gm02 (LG D1b, PAL1). Homologous regions encoding CHS (E-value ≤ 1.0E-05) are located on Gm11 (LG B1, CHS8), Gm01 (LG D1a, CHS6/CHS7), Gm08 (LG A2, CHS1/CHS2/CHS3/CHS4/CHS5/CHS9), Gm05 (LG A1, CHS2), Gm02 (LG D1b), Gm09 (LG K, CHS6), Gm19 (LG L) and Gm13 (LG F). Genes that encode F3H are located on Gm02 (LG D1b, F3H1/F3H2), Gm16 (LG J), Gm01 (LG D1a), Gm11 (LG B1), Gm18 (LG G) and Gm19 (LG L). Genes encoding IFS are located on Gm07 (LG M IFS1), Gm13 (LG F, IFS2), Gm10 (LG O), Gm03 (LG N), Gm12 (LG H), Gm19 (LG L), Gm17 (LG D2) and Gm11 (LG B1). Genes encoding IFS have the function of P450 cytochromes [27] and might have additional functional homologs.

eQTL analysis for four TGs

The linkage map that included 194 SSR markers (accepted by Molecular Biology Reports) and covered 2,312 cM with mean distance of about 12 cM between markers was used to identify eQTLs associated with the expression of the four TGs. Thirty-three eQTLs that appeared to underlie transcript abundance of the four TGs are detected and located on fifteen LGs (Table 3, Figure 1). Regarding to the locational relationships between the eQTL and the genes, thirteen of the eQTLs were cis-acting (within 5 Mb upstream or downstream of the genes) and twenty of the eQTLs were trans-acting (more than 5 Mb away or on different chromosomes) [40, 53].

Table 3 The eQTLs for target genes of PAL, CHS , IFS and F3H
Figure 1
figure 1

Summary of eQTL and QTL locations detected in the soybean genome. eQTL/ QTL represented by bars were shown on the left of the linkage groups, close to their corresponding markers. The lengths of the bars were proportional to the confidence intervals of the corresponding eQTL/QTL in which the inner line indicates the position of maximum LOD score.

Among the identified eQTLs (Table 3), qPALB2_1 and qPALD2_1 were associated with PAL transcript abundance, and could explain 8.11% and 6.67% of the phenotypic variation, respectively. Eight eQTLs, underlying CHS transcript abundance, were located on six LGs, and could explain 2.07-15.65% of the phenotypic variation. qCHSDla_1 (Satt436-Sat_345, Gm01) was detected with a higher LOD score (8.64) in the regions where cis-elements and CHS family genes were located.

Two eQTLs (qCHSDlb_1, qCHSDlb_2), located in the interval of Satt459 and Satt546, could explain 2.13% and 3.90% of phenotypic variance and overlap with qGCD1b_1. qCHSF_1 (Satt149-Sat_234), associated with CHS and IFS transcript abundance, were overlapped with the marker interval of qGTF_2, and could explain 3.57% of phenotypic variance. qCHSL_1 (Satt278-Sat_134) and qCHSL_2 (Sat_134-Sct_010) were associated with the same SSR marker (Sat_134), and contributed 16.12% and 17.97% of the variation of IFS transcript abundance.

Twelve eQTLs were associated with IFS expression. Of them, qIFSD2_1 (Satt186-Satt226) explained 16.67% of the phenotypic variation. qIFSF_1 (Satt423-Satt569, R2 = 15.84%) shared the same SSR marker Satt569 with other three QTLs (qDZF_2, qGTF_1, qTIF_2). qIFSN shared the same SSR marker (Satt530) with qGCN_1 (Table 3, Figure 1).

Eleven eQTLs were associated with F3H expression (Table 3, Figure 1). Of them, four eQTLs were located on Gm02 (LG Dlb), and explained 5.54-14.32% of the phenotypic variation. qF3HDlb_2 (Sat_135-Sat_096) had higher LOD score and explained 14.32% of the phenotypic variation. qF3HE_1 (R2 = 4.85%) had the same interval (Sat_112- Sat_380) with qGCE_1, qGTE_1 and qTIE_1, meanwhile, qF3HF_1 and qDZF_1 shared the same marker interval (Sat_262- Sat_103) (Figure 1).

Identification of candidate genes underlying the overlapped loci of pQTL and eQTL

Thirty four pQTLs for both individual and total seed isoflavone contents of soybean were compared with eQTLs to identify the overlapped loci. Five eQTL intervals were overlapped with pQTLs, and a total of eleven candidate genes within the overlapped eQTL and pQTL were identified (Table 4). Two genes, C4H (Glyma02g40290.1) and PAL1 (Glyma02g47940.1), were identified on Gm02 (LG D1b) between Satt546-Satt459. CHI (Glyma17g34430.1) and DFR (dihydroflavonol reductase; EC 1.1.1.219) were identified on Gm17 (LG D2) between Satt186-Satt226. Genes encoding 4-coumarate-CoA ligase (EC 6.2.1.12; Glyma13g01080.1/2), FLS (Glyma13g02740.1) and CHS (Glyma13g09640.1) were identified on Gm13 (LG F) between Satt423-Satt569. Additionally, CHS (Glyma13g24200.1) and IFS (Glyma13g09640.1) was found within another eQTL/pQTL interval (Satt149-Sat_234).

Table 4 Identification of candidate genes underlying overlapped locus of eQTL and QTL

Discussion

Soybean isoflavones have been broadly used in food, medicine, cosmetics and animal husbandry [59]. Increasing and decreasing seed isoflavone content will be an important target of soybean breeding. MAS based on genotype selection rather than solely on phenotype selection provides additional power for the selections during soybean breeding [60]. Cultivar ‘Zhongdou 27’ proved to have high-isoflavone content (3,791 μg/g isoflavone in seed) as reported previously [16]. Meng et al. [19] identified two QTL underlying resistance to soybean aphid through leaf isoflavone-mediated antibiosis in soybean cultivar ‘Zhongdou 27’. A number of pQTLs associated with seed isoflavone were identified in multiple environments from cultivar ‘Zhongdou 27’ using 194 SSR markers (accepted by Molecular Biology Reports). Therefore, ‘Zhongdou 27’ should be given more attention as an elite germplasm to improve soybean seed isoflavone concentration, disease and pest resistances.

In our previous studies, some identified QTLs associated with individual/total isoflavone contents showed higher contribution to phenotypic variation. Some specific copies of genes (PAL, CHS, IFS, F3H) in the phenylpropanoid pathway were near or falling into these quantitative trait loci by browsing the reference genome sequence of Williams 82 (http://www.phytozome.net/soybean).

To investigate the regulation mechanism of isoflavone synthetic enzyme genes, the transcript abundances of PAL, CHS, IFS and F3H in the mapping population were examined, and the genomic regions affecting the expression of the TGs were identified using the eQTL methodology [61]. A global microarray eQTL analysis of a limited number of samples can be used for exploring functional and regulatory gene networks and for scanning cis-eQTL, whereas the subsequent analysis of a subset of likely cis-regulated genes by real-time RT-PCR in a larger number of samples may identify QTL region by targeting these positional candidate genes [62]. In this study, real-time PCR reactions were used to analyze the transcript abundance variations of the four TGs in the F5:11 RI lines.

When combined with classical QTL phenotypes, correlation analysis can directly provide an overview of potential genes underlying isoflavone traits [63, 64]. Through the comparison of the transcript abundances of the four TGs (PAL, CHS, IFS and F3H), the parents (‘Zhongdou 27’ and ‘Jiunong 20’) showed different patterns at the R6 stage. This observation was consistent with the previous report by Sarah et al. [65]. Significant correlations between the transcript abundances of TGs and isoflavone contents were found in developing seeds at the R6 stage, indicating that these genes could affect total and individual isoflavone accumulations (Table 2).

Previously, two major QTLs that affect isoflavone content across multiple environments were mapped on Gm05 (LG A1) and Gm08 (LG A2) by Gutierrez et al. [17] and Yang et al. [20], respectively. In the present work, one eQTL qIFSA2_1 (Sat_129-Sat_181) was mapped close to qGCA2_1 on Gm08 (LG A2) (Figure 1, Table 5). This result suggested that qIFSA2_1 might be a cis-enzyme related locus. Some of these identified eQTLs associated with seed isoflavone content did not coincide with the TGs, suggesting that the differences in TGs transcript abundances might be caused by several trans-acting factors [66].

Table 5 Partial QTLs for individual and total isoflavone contents

In this study, since the 194 markers were not uniformly distributed, large gaps appeared with low marker density on chromosomes Gm02, 04, 13, 16 and 18, implying that more markers should be developed among these gaps and the authenticity of pQTL or eQTL should be further clarified. Among these gaps, special attention should be paid to eQTL qF3HDlb_2 on chromosome Gm02 and qIFSC1_1 on chromosome Gm04 because of their higher LOD score and contribution to phenotypic variation (Table 3). Overlapped loci of qF3HF_1 and qDZF_1, and genes that fall into this region should also be further clarified with more markers. Consequently, fine mapping on these intervals with more SSR or SNP markers and to determine the authenticity of these loci as well as the underlying genes were extremely essential in the future work.

The analysis of eQTL overlapped with pQTL suggested that the candidate genes or elements among the marker intervals could affect phenotypic traits [49, 67, 68]. Therefore, overlapped loci of eQTLs and pQTLs were analyzed to find the potential candidate genes affecting the accumulation of isoflavone contents in soybean seed. Five eQTL intervals were overlapped with pQTLs according to the comparison of genomic regions between pQTLs and eQTLs (Table 5). These results indicated that some candidate genes or elements in these intervals could regulate the biosynthesis of isoflavone components, and affect their accumulation. Additionally, some eQTLs overlapped with other eQTLs or shared the same markers with pQTLs, suggesting that some candidate genes or elements were located near these loci.

Several genes involved in isoflavone accumulation in soybean seed had been identified [22, 27, 31]. 11 candidate genes falling into the overlapped intervals of pQTL and eQTL were found (Table 4). Bolon et al. [58] identified eQTL for genes with seed-specific expression and discovered striking eQTL hotspots at distinct genomic intervals on chromosome Gm13. A chalcone isomerase (CHI3) and IFS2 gene were located in the same region identified by qGEN13 on Gm13 [11]. Another QTL for GC that encoded PAL and 4CL paralog was also reported on Gm13 [10, 11]. In the present work, seven candidate genes on Gm13 (LG F) were identified, implying that there could be a hotspot of gene cluster that regulated seed isoflavone content on Gm13. Among them, CHS (Glyma13g09640.1) and FLS (Glyma13g02740.1) were identified on three overlapped loci, implying that they could interact or trans-regulate other genes in the phenylpropanoid pathway. Furthermore, PAL1 (Glyma13g20800.1) and IFS (Glyma13g24200.1) paralogs were identified within two overlapped loci. In the marker interval (Satt149-Sat_234) associated with qCHSF_1, qIFSF_2 and qGTF_2, both Glyma13g24200.1 and Glyma13g09640.1 were found to encode CHS and IFS, indicating that they could be the potential candidate genes. It was supposed that Glyma13g09640.1 could interact or trans-regulate the expression of IFS. However, the function of these potential candidate genes should be tested in future works.

Although open questions about the biology and applications of eQTL mapping still exist [69], there are considerable advances in the eQTL studies. Detailed analysis of eQTL combined with cluster analysis of transcript abundance and eventually gene expression patterns could assist map-based cloning of genes underlying these traits. Markers based on underlying genes are also desirable for MAS in soybean breeding programs. The mechanism underlying seed isoflavone synthesis and its accumulation may contribute to the development of marker-assisted selection for soybean cultivars with high or low isoflavone contents.

Conclusions

A total of thirty three eQTLs (thirteen cis-eQTLs and twenty trans-eQTLs) were identified on fifteen chromosomes. Five eQTL intervals were overlapped with pQTLs and a total of eleven candidate genes within the overlapped eQTL and pQTL were identified. These results might be beneficial for the development of marker-assisted selection to breed soybean cultivars with high isoflavone contents.

Methods

Plant materials and growing conditions

The mapping population of 130 F5:11 recombinant inbred (RI) lines were derived through single-seed-descent from the cross between ‘Zhongdou 27’ (developed by the Chinese Academy of Agricultural Sciences, Beijing, China) and ‘Jiunong 20’ (developed by Jilin Academy of Agricultural Sciences, Jilin, China). ‘Zhongdou 27’ contains high individual and total isoflavone (TI) contents in seed (daidzein, DZ, 1,865 μg/g; genistein, GT, 1,614 μg/g; glycitein, GC, 311 μg/g and total isoflavone, TI, 3,791 μg/g), whereas ‘Jiunong 20’ has low individual and TI contents (DZ, 844 μg/g; GT, 1,046 μg/g; GC, 193 μg/g and TI, 2,061 μg/g).

To detect eQTL, the parents and the 130 F5:11 RI lines were planted at Harbin, Heilongjiang Province, China, in 2011. Randomized complete block designs were used for all experiments with rows 3 m long, 0.65 m apart, and a space of 6 cm between plants. Mature and immature seeds in the reproductive stages (from soybean growth stage R3 to R8) [70] were harvested from a bulked sample collected from three plants in each plot. These samples were quantified for individual and total seed isoflavone contents and transcript abundances.

Isoflavone extraction and quantification

Approximately 150 g of soybean seed samples were ground to a fine power using a commercial coffee grinder. Isoflavones were extracted from flour and separated using HPLC as described previously [16]. Measurements were done as micrograms of isoflavone per gram of seeds plus and minus the standard deviations (μg/g ± SD).

Synthesis of cDNA, Real-Time PCR and data collection

To investigate the expressions of four TGs, total RNA was isolated from soybean seed samples from R3 to R8 stages using plant RNA purification reagent Kit (D9108A, TaKaRa, Japan). RNAs were transcribed to cDNA using the first strand DNA synthesis reagent Kit (D6110A, TaKaRa, Otsu, Shiga, Japan). Four TGs (PAL, GenBank accession: GQ220305; CHS, GenBank accession: EU526827; IFS, GenBank accession: FJ770473 and F3H, GenBank accession: AY595420) in the phenylpropanoid pathway, were selected to analyze the transcript abundance variations in the F5:11 RI line population. These four TGs were analyzed by real-time PCR (Kit DRR081A, TaKaRa, Japan). Gene-specific primers for expression analysis of the four TGs were listed in Table 6. Primer specificity was confirmed based on each primer pair sequence against soybean genome sequences by BLASTing (http://www.phytozome.net/soybean) using the BLASTN algorithm. Moreover, through the BLASTN of the sequences of the TGs, PAL2 (located on Gm10 (LG O)) of the PAL gene family, CHS8 (located on Gm11 (LG B1)) of the CHS gene family, IFS1 (located on Gm07 (LG M)) of the IFS gene family, and F3H1 and F3H2 (located on Gm02 (LG D1b)) of the F3H gene family were amplified [11].

Table 6 Real-time PCR primer pairs for the expression analyses of PAL , CHS , F3H , and IFS genes

PCR amplification was performed as follows: 95°C for 60 s, followed by 40 cycles of 95°C for 11 s, 60°C for 12 s and 72°C for 18 s. The soybean actin4 (GenBank accession: AF049106) gene was used as a reference to quantify the expression levels of the target genes [71]. Three replicates for each reaction were performed. The relative transcript abundance of TGs in different samples was calculated using 2-ΔΔCt method [72], defined as: ΔCt = Ct (target) – Ct (actin). Pearson correlations between total/individual isoflavone contents and the expression of the four TGs in F5:11 RILs were evaluated using SAS 8.2 (Cary, NC, USA) [73].

Identification of genomic region of target genes

The whole genome sequence Glyma1 assembly for Williams 82 [74] provided a powerful tool for interrogating QTL data. Previously reported genes for isoflavone biosynthesis [75] were used in BLAST searches against the whole genome sequence to identify homologous regions in the genome with assigned or putative functions. All twenty soybean chromosomes have regions sharing a high percentage of homology with genes of known function in the phenylpropanoid pathway [11]. The coding regions of TGs were compared with genome of Williams 82 through BLAST (E-value ≤ 1.0E-05, http://www.phytozome.net/soybean) to identify homologous regions.

eQTL analysis

In previous work, fifteen QTL underlying seed isoflavone contents of soybean were identified based on RI line populations derived from a cross between ‘Zhongdou 27’ (high isoflavone) and ‘Jiunong 20’ (low isoflavone) through a genetic linkage map including 99 SSR markers [16]. Another 95 SSR markers were added to the map of Zeng et al. [16] to identify novel phenotypic QTLs (pQTLs) associated with seed isoflavone contents of soybean (accepted by Molecular Biology Reports). In this study, 194 polymorphic markers were assembled onto the 20 linkage groups (LGs) by Mapmaker 3.0b with the Kosambi mapping function [76]. WinQTLCart2.1 [77] was used to detect eQTL between marker intervals by 1,000 permutations at significance (P ≤ 0.05). The genetic linkage map was constructed using Mapchart 2.1 [78]. The nomenclature of the eQTLs/pQTLs included four parts following the recommendations of the Soybean Germplasm Coordination Committee. For example, qCHSF_1, q, CHS, F and 1 represent eQTL, trait (CHS), linkage group name and eQTL order in the linkage group, respectively.

Identification of candidate genes underlying overlapped loci of pQTL and eQTL

Coincident genetic locations of eQTL and pQTL may be available to identify important regulatory genes underlying traits, and lead to the identification of molecular mechanisms [49, 67, 68]. Previous studies have combined eQTL and pQTL mapping to gain insight into regulatory pathways involved in determining phenotypic traits [49, 68, 7981]. eQTL located in the same marker intervals of pQTL might contribute to significant phenotypic variations [49, 67, 68]. In this study, thirty four phenotypic QTL (pQTL) identified with the 194 SSR markers were compared with eQTL to identify overlapped loci. Genetic map positions were estimated by identifying the nearest flanking SSR markers using the genome browser (http://www.soybase.org). The candidate genes underlying overlapped loci of pQTL and eQTL were identified by browsing after using BLAST search of flanking markers against the whole genome sequence of Williams 82 (available at: http://www.phytozome.net/soybean).