Background

Soybeans [Glycine max (L.) Merrill] contain complete protein and oil, providing all the essential amino acids necessary for the human diet [1]. Hence, a great effort has been made to increase soybean yield, while maintaining a high level of quality characteristics [2]. Yield and quality-related traits of soybean are quantitative traits that are controlled by a combination of genetic and environmental factors [3].

Genetic maps built with traditional molecular markers, including restriction fragment length polymorphism (RFLP), simple sequence repeat (SSR) and amplified fragment length polymorphism (AFLP) markers, have long been used to identify the genetic basis of complex traits in plants [4,5,6,7]. However, conventional molecular markers often have a low density and are unevenly distributed throughout the genome. Genetic maps developed with these markers have therefore limited both the efficiency and the accuracy of quantitative trait locus (QTL) positioning. Recently, with the rapid development of high-throughput sequencing technology, single nucleotide polymorphism (SNP) markers have emerged as the markers of choice because of their high density and relatively even distribution across plant genomes, and they have resolved many of the problems associated with the efficiency and accuracy of QTL mapping [8,9,10,11,12]. Several new technologies for SNP genotyping have been developed over the last few years. Huang et al. developed a high-throughput method for genotyping recombinant populations that uses whole-genome resequencing to construct a dense genetic map with recombination bins as markers [13]. Restriction-site associated DNA sequencing (RAD-seq), one of the next-generation sequencing (NGS) methods, has been effectively applied to high-throughput SNP marker discovery and QTL analysis, including the mapping of quality and agronomic trait loci in soybean [14].

Based on these new technologies for SNP genotyping, numerous QTLs associated with yield or quality traits have been identified in soybean [15,16,17]. For example, Kim et al. evaluated two populations for seed yield and other agronomic traits using 1536 SNP markers and identified 8 QTLs for plant height and 3 QTLs for seed yield [18]. In another study, Akond and colleagues identified two QTLs for protein content and six QTLs for oil content using a recombinant inbred line (RIL) population derived from a cross of PI43848913 × Hamilton [19]. Further, a high-density map was developed using the 5376 SNP markers from the Illumina Infinium BeadChip array, and one protein and 11 oil content QTLs were detected in the MD96–5722 by ‘Spencer’ RIL population [20]. Hwang et al. detected 40 SNPs associated with seed protein content and 25 SNPs associated with seed oil content; 7 of these SNPs were significantly associated with both protein and oil content [21].

The objectives of the research reported here were (1) to develop a high-density soybean genetic bin map with the RAD-seq method; (2) to map QTLs for yield and quality-related traits in the RIL population and compare them with previous research (http://www.soybase.org); (3) to determine whether any QTLs were identified in both years and co-localized with QTLs for other traits; and (4) to select candidate genes that may influence both yield and quality using Gene Ontology (GO) enrichment analysis.

Methods

Plant materials and field trials

A RIL population was developed from a cross between Zhonghuang 24 (female parent) and Huaxia 3 (male parent) using a modified single seed method [22]. Zhonghuang 24 is a high-oil variety adapted to the Huang-Huai-Hai region. Huaxia 3 was derived from a cross between ‘Guizao 1’ and the Brazilian variety ‘BRSMG68’ and is a high-yielding soybean cultivar. The 164 F8 RILs were grown together with both parents at the Zengcheng Experimental Station (South China Agricultural University, Guangzhou, China) in a randomized complete block design with three replications in the summer of 2012. Each plot contained 10 plants per row, with 0.5 m between rows and 0.1 m between plants. The 146 F11 RILs were grown using the same methods at the same location in 2015. Field management followed normal soybean production practices for the area.

Measurement of yield-related and two quality traits

The five plants in the middle of each row were individually harvested to score the following traits: plant height (PH), number of nodes (NN), number of branches (BN), number of effective pods (EP), number of invalid pods (IP), 100-seed weight (SW), seed protein content (Pro) and seed oil content (Oil). PH was measured in mature plants as the distance (cm) from the cotyledonary node to the top node of the main stem. NN was determined by counting the nodes from the cotyledonary node to the top of the main stem. BN was determined by counting the pod-bearing branches on the main stem. EP was obtained by counting the pods with more than one filled seed per pod. IP was obtained by counting the pods that contained no seed. SW was measured by weighing 100 randomly chosen filled seeds. For protein and oil determination, 50 g of seed from each line was analyzed with an Infratec 1241 Grain Analyzer on a 10% moisture basis.

Frequency distributions and correlations for the parents and the RIL population were analyzed with SPSS Statistics 17.0 and Microsoft Excel 2007.

Genetic map and QTL detection

SNP genotyping

All genotyping work was conducted at the Beijing Genome Institute (BGI) Tech, Shenzhen, China. The soybean reference genome from Williams 82 was used for read mapping with SOAP software [23]. Input data for SNP calling with realSFS was prepared with SAMtools [24]. Population SNP calling was performed with realSFS according to the site frequency at every site. The genotype likelihoods for each individual were integrated to extract candidate SNPs, which were then filtered using the following criteria: 40 ≤ depth ≤ 2500 and a site probability ≥ 95%. The homozygous genotypes of the parents and their population were obtained from these high-fidelity SNPs. Following the sliding window approach, we used a window of 15 SNPs that was moved one SNP at a time, called the genotype of each window, and identified the exchange sites for each individual; the resulting genotypes of each individual were then used to generate bin information [13].
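
The sliding window step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the 'A'/'B' encoding of parental alleles, the function names, and the agreement cutoff `min_agree` are assumptions introduced only for illustration; only the window size of 15 SNPs and the one-SNP step come from the text.

```python
from collections import Counter

WINDOW = 15  # SNPs per window, as stated in the text


def window_genotypes(calls, window=WINDOW, min_agree=12):
    """Call one genotype per sliding window of consecutive SNPs.

    calls: string of per-SNP parental calls, 'A' or 'B' (hypothetical encoding).
    min_agree: number of SNPs in a window that must carry the same parental
    allele before the window is called homozygous; 12 is an illustrative
    value, not a parameter reported in the paper.
    """
    genotypes = []
    for start in range(len(calls) - window + 1):  # slide one SNP at a time
        counts = Counter(calls[start:start + window])
        allele, n = counts.most_common(1)[0]
        # Windows without a clear majority are marked 'H' (ambiguous/mixed).
        genotypes.append(allele if n >= min_agree else "H")
    return genotypes


def breakpoints(genotypes):
    """Return the window indices at which the called genotype changes."""
    return [i for i in range(1, len(genotypes)) if genotypes[i] != genotypes[i - 1]]
```

Consecutive windows with the same call between two breakpoints would then be merged into one recombination bin; that merging step is left out here to keep the sketch short.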

QTL analysis

A high-density genetic map was constructed using MSTmap software (http://alumni.cs.ucr.edu/~yonghui/mstmap.html). The composite interval mapping (CIM) method was employed to scan for QTLs. The LOD thresholds for declaring a QTL significant were determined by a test with 1000 replications at a genome-wide significance level of 5%. The location of each QTL was described by its LOD peak position and the surrounding 95% confidence interval calculated using WinQTLCart software [25]. The software output also gives the additive effect of each QTL and the phenotypic variation it explains. The LOD values are shown in Additional file 1. QTL mapping results were comprehensively compared with SoyBase (http://www.soybase.org/).
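
A genome-wide LOD threshold based on 1000 replications is commonly obtained by permutation, and the following Python sketch shows that idea under stated assumptions. It assumes the test is a permutation test, and it replaces CIM with a much simpler single-marker regression scan, so it does not reproduce the WinQTLCart analysis; the function names and the 'A'/'B' genotype encoding are hypothetical.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """Single-marker LOD = (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    if rss_null == 0:
        return 0.0  # no phenotypic variation to explain
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0:
        return math.inf  # marker classes explain the phenotype perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from the permutation null distribution.

    Each replicate shuffles the phenotypes across individuals, rescans all
    markers, and records the maximum LOD; the threshold is the (1 - alpha)
    quantile of those maxima.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(lod_score(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]
```

Taking the maximum LOD over all markers in each replicate is what makes the threshold genome-wide rather than per-marker.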

Method for naming QTLs

All QTLs were named according to Cui et al. [26] as follows: the initial ‘q’ denotes ‘QTL’; the letters following it are the abbreviation of the corresponding trait; the next number is the soybean chromosome on which the QTL is located; ‘a’ or ‘b’ indicates that the QTL was identified in 2012 or 2015, respectively; and if more than one QTL for a trait was found on the same chromosome, a serial number (-1, -2, etc.) is added after ‘a’ or ‘b’ to describe their order.
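
As an illustration of this naming scheme, the Python sketch below splits a QTL name into its parts. The regular expression and the function name `parse_qtl_name` are assumptions made for this example; the trait abbreviations and the year codes are those defined in the text.

```python
import re

# q + trait abbreviation + chromosome number + year code + optional serial number
QTL_NAME = re.compile(r"^q(PH|NN|BN|EP|IP|SW|Pro|Oil)(\d{1,2})([ab])(?:-(\d+))?$")
YEAR = {"a": 2012, "b": 2015}


def parse_qtl_name(name):
    """Split a QTL name such as 'qPH19b-2' into its components."""
    match = QTL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid QTL name: {name!r}")
    trait, chromosome, year_code, order = match.groups()
    return {
        "trait": trait,
        "chromosome": int(chromosome),
        "year": YEAR[year_code],
        "order": int(order) if order else None,  # None when no serial number is given
    }
```

For example, `parse_qtl_name("qPH19b-2")` describes the second plant height QTL on Chr19 detected in 2015.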

Results

Phenotypic analysis of the RIL population

Most of the traits of Huaxia 3 showed higher values than those of Zhonghuang 24, with the exception of oil (Additional file 2), providing ideal material for population construction and QTL analysis. Figure 1 shows the frequency distributions for the eight traits in the 2 years. Phenotypic values were continuous, with normal or skew-normal distributions. Transgressive segregation in the RILs was observed for the eight traits, suggesting that alleles with positive effects on the measured traits are distributed between both parents.

Fig. 1 Frequency distribution for eight traits in 2012 (in blue) and 2015 (in red)

The correlation analysis showed that most of the yield-related traits were correlated with each other in both years (Table 1). PH was positively correlated with NN, BN, EP, IP and SW, although the correlations with EP and IP in 2012 and with SW in 2015 were not significant. NN also showed significant positive correlations with BN and EP in both years, but no correlation with SW was detected. SW showed significant negative correlations with BN and EP in both years, ranging from r = −0.215** to r = −0.327**, but a significant positive correlation with protein (r = 0.245**) in 2012. Most previous studies have reported a strong negative correlation between seed protein and seed oil content [20, 27]. In our study, a highly significant negative correlation between protein and oil was likewise observed in both years (r = −0.775** and r = −0.761**).
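
The correlation coefficients above were computed in SPSS; for readers who want to see the calculation itself, a minimal Python sketch of the Pearson coefficient follows. The function name is an assumption, and any values passed to it in an example are made up and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Applied to per-line protein and oil values, a coefficient close to −1 would correspond to the strong negative relationship reported here.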

Table 1 Phenotypic correlations between yield-related traits and quality traits in 2012 and 2015

High-density SNP linkage map construction

Based on 0.2× RAD-seq (restriction-site associated DNA sequencing) of the Zhonghuang 24 × Huaxia 3 RIL population, 57.40 G of sequence reads were obtained, with an average of 311.97 M per individual; half of the individuals had more than 200 M reads. From these data, a total of 47,472 high-quality polymorphic SNP sites were detected in the RILs. All of the SNP sites were integrated into recombination bin units, yielding 2639 recombination bins. The average physical length of the bins was 360.01 kb, ranging from 20.01 kb to 17.43 Mb. In total, 1126 bins were shorter than 100 kb, 609 bins ranged from 100 kb to 200 kb, 291 bins from 200 kb to 300 kb, 175 bins from 300 kb to 400 kb, and 438 bins were longer than 400 kb. Based on the genotypes of the 2639 bins, a high-density bin linkage map was constructed covering 2638.24 cM, with an average distance of 1.00 cM between adjacent markers. For each chromosome, the average genetic distance between adjacent bins ranged from 0.67 to 1.51 cM (Table 2). The linkage map constructed with recombination bins therefore has well-distributed linkage distances and a higher resolution than conventional maps.
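
The bin statistics reported above can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not part of the study's analysis; it assumes that the average marker spacing is the total map length divided by the number of bins, and the variable names are introduced here for the example.

```python
# Bin counts per physical length class, as reported in the text.
bin_counts = {
    "<100 kb": 1126,
    "100-200 kb": 609,
    "200-300 kb": 291,
    "300-400 kb": 175,
    ">400 kb": 438,
}

total_bins = sum(bin_counts.values())  # should equal the 2639 bins reported

map_length_cm = 2638.24  # total genetic length of the bin map
mean_bin_kb = 360.01     # average physical bin length

# Average genetic distance per bin (assumed definition of marker spacing).
mean_spacing_cm = map_length_cm / total_bins

# Total physical length spanned by all bins, in Mb.
covered_mb = total_bins * mean_bin_kb / 1000

print(total_bins, round(mean_spacing_cm, 2), round(covered_mb, 2))
```

The five length classes add up to the 2639 bins, the spacing rounds to the reported 1.00 cM, and the bins together span roughly 950 Mb.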

Table 2 Description of characteristics of 20 chromosomes in the high-density genetic map

QTL analysis for yield-related traits

Forty-seven QTLs associated with the yield-related traits PH, NN, BN, EP, IP and SW were identified on 13 chromosomes (Chr04, Chr05, Chr06, Chr07, Chr08, Chr11, Chr12, Chr13, Chr14, Chr15, Chr17, Chr19, Chr20) (Fig. 2). Individual QTLs explained from 3.78% (qPH13a) to 32.56% (qNN19a) of phenotypic variance. Among these QTLs, 28 were identified on ten chromosomes in 2012. The most prominent QTL, qNN19a, had the highest LOD score (15.63), was located in a 56 kb region, explained 32.56% of phenotypic variation, and displayed a negative additive effect, with the positive allele coming from the male parent Huaxia 3. Nineteen QTLs on nine chromosomes were detected in 2015; of these, qPH19b-2 had the highest LOD score (10.34), explained 24.49% of phenotypic variation, and showed a negative additive effect, with the additive allele from the male parent Huaxia 3. Of the 47 QTLs, 24 were in agreement with earlier reports and 23 were novel (Additional file 3 and Table 3). Eight QTL clusters responsible for more than two traits were detected on four different chromosomes (Additional file 4). A total of 18 QTLs were stable across both years. Thirty-four of the QTLs had a positive additive effect, contributed by the female parent Zhonghuang 24, whereas 13 QTLs had a negative additive effect, with additive alleles from the male parent Huaxia 3.

Fig. 2 The positions of QTLs for eight traits. The 60 QTLs for eight traits identified across 2 years are depicted in different shapes on the right side of each linkage group. The 36 QTLs identified in 2012 are colored in blue and the 24 QTLs identified in 2015 are highlighted in red

Table 3 Novel QTLs detected in the Zhonghuang 24 × Huaxia 3 RIL population in 2 years

QTL analysis for quality traits

A total of 13 QTLs associated with quality traits were identified on ten different chromosomes (Chr01, Chr02, Chr06, Chr07, Chr10, Chr11, Chr12, Chr13, Chr17, Chr20) across the two growing seasons (Fig. 2). In 2012, three QTLs for protein content were identified on Chr07, Chr10 and Chr13. Five QTLs for oil content were identified on Chr01, Chr06, Chr10, Chr11 and Chr20, with the explained phenotypic variance ranging from 6.76% (qOil11a) to 13.30% (qOil01a). Four QTLs (qPro07a, qPro13a, qOil06a, qOil20a) showed positive additive effects ranging from 0.27 (qOil20a) to 0.42 (qPro13a), while the other four QTLs (qPro10a, qOil01a, qOil10a, qOil11a) showed negative additive effects ranging from −0.27 to −0.47. In 2015, a QTL for protein content (qPro17b) was detected in a 52 kb region on Chr17, explaining 9.29% of phenotypic variation. In addition, four QTLs for oil content were identified on Chr02 and Chr12, individually explaining from 7.52% (qOil02b-1) to 12.49% (qOil02b-2) of the phenotypic variation. Three of these QTLs had positive additive effects, indicating that the female parent, Zhonghuang 24, contributed alleles for increased oil content. A total of ten QTLs had been reported in prior studies, and three QTLs were identified for the first time in the present study.

Gene Ontology enrichment analysis based on the QTL hotspot

Notably, an important QTL hotspot was mapped to the physical interval between 43,923,975 and 45,138,371 bp on Chr19. Seven QTLs associated with five traits, explaining up to 32.56% of phenotypic variation, were all detected within this genomic region, which was previously reported to be associated with seed weight, protein and oil in several different studies. To gain an in-depth understanding of which genes/QTLs were related to yield and quality in this region, we retrieved gene calls and annotations using the Glyma.Wm82.a1.v1.1 gene model from SoyBase (https://soybase.org/SequenceIntro.php#mapscompare). A total of 139 genes were found within this region using Gene Ontology enrichment analysis; among them, 51 annotated genes were closely related to yield or quality and could be classified into five groups (Additional file 5). The first group contains 13 genes associated with phytohormone regulation, including hormones such as auxin, abscisic acid and ethylene, which play an essential role in coordinating in vitro and in vivo regulation mechanisms to simultaneously improve yield and quality [28]. The second group comprises 19 genes associated with metabolic processes, including carbohydrate metabolism, lipid metabolism, fatty acid catabolism and brassinosteroid metabolism, which are known to affect the growth and development of soybean. The third group contains 6 genes associated with protein phosphorylation, which could be related to the functional properties of food protein. The fourth group is made up of 16 genes associated with cellular processes, including cell differentiation, cell proliferation, multicellular organism reproduction, and cell growth, which may have positive consequences for grain yield and quality in plants [29]. The fifth group consists of 16 genes associated with organ morphogenesis, including the development of the root, stamen, leaf and seedling, which may directly influence soybean yield and quality.

Discussion

Main effect factors for QTL mapping

The utility of QTL mapping lies in identifying valuable alleles and understanding genetic mechanisms, thereby promoting the genetic improvement of soybean by molecular methods, which is one of the main objectives of soybean breeding. Parental genetic diversity, environmental effects, and marker density are the main factors affecting QTL mapping [30]. In this study, the parents of the RIL population come from geographically distinct regions. Zhonghuang 24 is a main variety grown in central China, while the male parent, Huaxia 3, is derived from Brazilian soybean germplasm, has high yield, and has become the main variety grown in southern China. Our data indicated larger differences in yield and quality-related traits between Zhonghuang 24 and Huaxia 3 than those reported in comparable studies. Thus, the QTLs detected for these traits could be more useful for soybean improvement. In addition, quantitative traits can be strongly affected by environmental factors [31]. To find QTLs that are stably expressed across environments, we chose two non-consecutive years: 2012, which had a suitable climate, and 2015, which experienced greater rainfall throughout all growth stages. According to the Guangzhou Meteorological Service (http://www.gz121.gov.cn/), the total rainfall from July to October was 433 mm in 2012 but 1023 mm for the same period in 2015. Under these contrasting conditions, the QTLs identified in both years can be considered robust and environmentally stable. Furthermore, QTL mapping based on the resequencing genotyping method integrated a total of 47,472 SNPs into 2639 recombination bin units, which were used to construct a high-density bin linkage map with an average distance of 1.00 cM between adjacent markers. The map has well-distributed linkage distances and a higher resolution than conventional maps, making QTL mapping more accurate and reliable.

Comparison of the present study with previous research

In the present study, 14 QTLs were identified for PH, explaining 3.78 to 28.01% of phenotypic variation across the two growing seasons; of these, qPH19a was a major QTL associated with PH and was detected in both years. This QTL has been previously reported by Lee et al. and Specht et al. [32, 33]. The novel QTLs on Chr04 identified in this study (qPH04a, qPH04b) are worth noting because they were expressed in both years and accounted for 15.71 and 21.53% of phenotypic variation, respectively. Three QTLs (qPH06a-1, qPH06a-2, qPH06b-2) were identified on Chr06 in regions similar to those previously reported by Wang et al. and Gai et al. [34, 35]. Four novel QTLs for PH (qPH06b-1, qPH12a-1, qPH12a-2, qPH14a-2) were identified on Chr06, Chr12 and Chr14. NN was found to be influenced by nine distinct QTLs distributed across four chromosomes. The QTL detected on Chr04 in 2012, with an interval of 3,740,934–3,781,822 bp, was in a region similar to that of another QTL identified in 2015 (3,657,048–3,740,933 bp), and it is likely that they are the same. Two QTLs on Chr19, qNN19a and qNN19b, were consistent across both years and explained up to 32.56% of phenotypic variation. Interestingly, no QTLs for NN had been found at similar positions in prior studies. BN, a key component of soybean yield, has been studied extensively. Some researchers have proposed that production could be increased by adjusting the branch number, which was confirmed by Panthee et al. [36]. In their study, sd yld24–1 was mapped for yield traits with satt076 on Chr19. Interestingly, qBN19a, which controls the number of branches in our study, falls within this interval. Moreover, sd wt4–1 and sd wt11–1 for seed weight, identified by Maughan and Lee [37, 38], were located at the same position as qBN11a, qEP11a, and qEP11b on Chr11 in this study. Three other novel QTLs for BN (qBN04a, qBN05b, qBN08b) were detected on Chr04, Chr05, and Chr08, accounting for 6.29 to 13.44% of phenotypic variation. Pod number and 100-seed weight are important parameters in measuring soybean yield and are controlled by multiple genes. Two QTLs on Chr19, qEP19a and qIP19b, were found to be associated with pod number across the two years and were located in the same region as those previously reported by Zhang et al. [39]. Moreover, qSW19a-1, which is associated with 100-seed weight, was also mapped on Chr19 near this interval. Orf et al. reported a fine-mapped 100-seed weight QTL on Chr15 that overlapped the intervals of the QTLs for SW detected in both years in the present study [40].

In our study, a total of 4 protein content QTLs and 9 seed oil content QTLs were identified across the 2 years. Three QTLs (qPro10a, qPro13a, qPro17b) were novel; no QTLs for protein content have previously been identified at similar positions. Ten of the 13 QTLs for protein or oil content detected in the present study were consistent with previous research, and some of them narrowed the previously reported interval. For example, a QTL associated with oil content, qOil06a, was found on Chr06 (37,764,770–38,299,977 bp). Palomeque et al. also reported a QTL for oil content within the same interval, and a similar locus for seed oil and ‘oil plus protein’ related traits was published by other researchers [41, 42], which indicates that this QTL is stable and may have pleiotropic effects. Meanwhile, three other QTLs for yield-related traits (qPH06a-2, qNN06a-2, qBN06a), which explained 6.65 to 19.77% of phenotypic variation, were mapped to a similar region in our study. qOil20a was mapped to a 39 kb region in bin 73 on Chr20 (34,770,628–34,809,740 bp), which falls within the region identified by both Qi et al. and Reinprecht et al. [43, 44]. Moreover, qSW20b-2 (33,207,531–33,259,106 bp) for yield was also located near this position, suggesting that these two regions should be of great value for the genetic improvement of both soybean yield and quality. The remaining QTLs for protein or oil content that agree with previous studies are presented in Additional file 3 [27, 45,46,47,48,49,50]. The coincidence of QTLs across different genetic backgrounds not only demonstrates the stability and reliability of the QTLs detected here but also highlights the significance of these regions for marker-assisted breeding aimed at developing soybean cultivars with higher protein or oil content.
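
Comparisons of physical positions such as those above come down to simple interval arithmetic. The Python sketch below is an illustration only: the `Interval` type and the helper functions are hypothetical names, and the coordinates are the two Chr20 intervals quoted in the text.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def length_bp(interval):
    """Number of bases covered by an interval (both ends inclusive)."""
    return interval.end - interval.start + 1


def gap_bp(a, b):
    """Bases lying strictly between two intervals; 0 if they touch or overlap."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


# Coordinates quoted in the text for the two Chr20 QTLs.
q_oil20a = Interval("qOil20a", "Chr20", 34_770_628, 34_809_740)
q_sw20b_2 = Interval("qSW20b-2", "Chr20", 33_207_531, 33_259_106)

print(length_bp(q_oil20a), gap_bp(q_oil20a, q_sw20b_2))
```

With these coordinates, qOil20a spans about 39 kb, in line with the text, and lies about 1.5 Mb from qSW20b-2.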

Important QTL hotspots

Most of the QTLs were clustered in eight genomic regions, particularly on Chr04, Chr06, Chr11 and Chr19 (Additional file 4). Each of these QTL hotspots included at least two traits, such as PH, NN and SW, and had previously been reported to be associated with other traits in different genetic sources. Four QTLs for yield-related traits were mapped in two intervals within 3,657,048–3,781,822 bp on Chr04 and explained 6.17–17.68% of phenotypic variation. These QTLs have not been published before and add to the growing knowledge on the genetic control of these traits. Three other QTLs were also detected on Chr04 (3,815,206–5,131,478 bp), explaining 9.59–21.53% of phenotypic variation. However, this region was reported to be associated with seed protein and seed weight in some earlier studies [33, 40]. Seven QTLs for PH, NN, BN, and oil were identified in two regions on Chr06 (18,376,759–19,504,937 bp and 37,764,770–41,420,709 bp) that were separated by more than 7 cM and accounted for 5.18–19.77% of phenotypic variation. Previously, Sun et al. located two QTLs for pod number on Chr06 near these two regions [51]. The first region on Chr06 has been shown to be associated with different traits by other researchers [42, 45, 52]. Moreover, Chen et al. found two QTLs for pod number and seed oil plus protein that were consistent with the second region on Chr06 in our study [42]. More seed weight, protein and oil content QTLs were mapped to this locus in previous studies [17, 41, 45, 53, 54]. Three QTLs for BN and EP that explained 4.97–9.31% of phenotypic variation were identified on Chr11. Of these, the two QTLs for EP were expressed in both years. Three previously reported QTLs for protein content and seed weight were located in this region [35, 37, 38]. Seven QTLs were located in one physical interval (43,923,975–45,138,371 bp) on Chr19, of which qPH19a, qNN19a and qPH19b-2 had larger effects on phenotypic variation (28.01, 32.56 and 24.49%, respectively) than the others. Mansur et al. found two QTLs associated with protein and oil close to this region [55]. Orf et al. also reported that this locus was associated with seed weight [40]. In addition, three QTLs (qBN19a, qPH19b-1, qEP19b) were detected on Chr19 (40,662,371–40,701,058 bp) in this study. Similar loci have been previously reported for seed weight, protein and oil content [43, 46, 56]. Another two QTLs (qIP19a, qSW19a-2) on Chr19 were mapped to the interval 42,309,067–42,469,449 bp. Some seed weight QTLs were detected near this position in past studies [40, 45, 56, 57]. Moreover, QTLs for protein and oil content were also previously identified in this region by both Orf et al. and Qi et al. [40, 43]. Interestingly, highly significant correlations were observed among PH, NN, BN, EP and SW in this study. QTL mapping showed that these traits were all linked to the same regions on three chromosomes (Chr04, Chr06, Chr19), which is consistent with the phenotypic correlation analysis and provides a genetic explanation for these associations. These QTL clusters may reflect pleiotropy or close linkage among the loci controlling the related traits. Each cluster may function as a single gene or as closely linked genes [58]. More importantly, some of the QTLs on Chr04, Chr06, Chr11, and Chr19 were identified in both years. These chromosome regions can be considered robust and environmentally stable, which could be helpful for further studies aimed at simultaneously altering soybean yield and quality in a predictable manner.

Three candidate genes on Chr19

Based on the predicted functions of the five groups, three predicted genes (Glyma19g37910, Glyma19g37570 and Glyma19g36990) were selected as the best candidate genes that may affect both yield and quality because they are involved in various biological processes (Table 4). Glyma19g37910 encodes a member of the basic leucine zipper transcription factor family that is involved in Arabidopsis abscisic acid signalling during seed maturation and germination. GO analysis showed that this gene participates in more than ten biological processes, including seed development, lipid storage, gibberellin biosynthesis, and the vegetative to reproductive phase transition of the meristem. Glyma19g37570 has a domain predicted to encode a serine/threonine protein kinase that could influence cells in various ways. This gene is related to stem cell division, protein phosphorylation, gibberellin biosynthesis and the timing of the transition from vegetative to reproductive development. Glyma19g36990 encodes a plastidic triose phosphate isomerase, and GO analysis revealed that this gene participates in three catabolic processes (glycine, tryptophan, and glycerol) and four biosynthetic processes (indoleacetic acid, cysteine, glyceraldehyde-3-phosphate, and isopentenyl diphosphate). Moreover, it also plays a key role in multicellular organism reproduction and primary root development, which may affect the yield and quality of crops. These three candidates should be investigated in more detail in further studies to improve our understanding of the factors involved in improving quality and productivity in soybean.

Table 4 Annotation information for the three candidate genes

Conclusions

In this study, we genotyped a recombinant inbred line (RIL) population (Zhonghuang 24 × Huaxia 3) using a restriction-site associated DNA sequencing (RAD-seq) approach. A high-density soybean genetic map with 2639 recombination bins was constructed and used to identify QTLs influencing six yield-related and two quality traits. A total of 47 QTLs for six yield-related traits and 13 QTLs for two quality traits were identified. Of these, 34 QTLs were coincident with those of previous research [18, 27, 32, 34, 35, 39–50, 56, 57, 59–64]. Eighteen QTLs were stable, being identified in both years. Twenty-six QTLs were reported for the first time in this research, of which 10 were both novel and stable. In addition, eight QTL hotspots on four chromosomes were identified for the correlated traits. Three predicted genes were selected as candidate genes that may directly or indirectly influence both yield and quality in soybean.