Incorporating pleiotropic quantitative trait loci in dissection of complex traits: seed yield in rapeseed as an example

A comprehensive linkage atlas for seed yield in rapeseed. Most agronomic traits of interest for crop improvement (including seed yield) are highly complex quantitative traits controlled by numerous genetic loci, which brings challenges for comprehensively capturing associated markers/genes. We propose that multiple trait interactions underlie complex traits such as seed yield, and that considering these component traits and their interactions can dissect individual quantitative trait loci (QTL) effects more effectively and improve yield predictions. Using a segregating rapeseed (Brassica napus) population, we analyzed a large set of trait data generated in 19 independent experiments to investigate correlations between seed yield and other complex traits, and further identified QTL in this population with a SNP-based genetic bin map. A total of 1904 consensus QTL accounting for 22 traits, including 80 QTL directly affecting seed yield, were anchored to the B. napus reference sequence. Through trait association analysis and QTL meta-analysis, we identified a total of 525 indivisible QTL that either directly or indirectly contributed to seed yield, of which 295 QTL were detected across multiple environments. A majority (81.5%) of the 525 QTL were pleiotropic. By considering associations between traits, we identified 25 yield-related QTL previously ignored due to contrasting genetic effects, as well as 31 QTL with minor complementary effects. Implementation of the 525 QTL in genomic prediction models improved seed yield prediction accuracy. Dissecting the genetic and phenotypic interrelationships underlying complex quantitative traits using this method will provide valuable insights for genomics-based crop improvement.


Introduction
Crop production environments are subject to many abiotic and biotic factors affecting plant growth. In addition, networks of important agronomic traits interact with each other and with the environment in a very complex manner. Seed yield, one of the most important agronomic traits, is both extremely complex and strongly affected by the environment. Furthermore, yield performance depends on factors such as plant growth conditions, plant resistance to biotic and abiotic stresses, and interactions with other physiological and genetic processes through the entire crop life cycle. Yield itself is affected by many underlying traits and thousands of different genes (Bernardo 2008).
In the past few decades, many quantitative trait loci (QTL) analyses have been done using different statistical methods to dissect the genetic architecture of complex quantitative traits such as seed yield (Dargahi et al. 2014;Groos et al. 2003;Lacape et al. 2013;Li et al. 2014;Liu et al. 2015;Shi et al. 2009;Song et al. 2007;Xiao et al. 1996;Zhang et al. 2004). Nevertheless, understanding of such complex traits is still rudimentary, and this knowledge is insufficient to contribute substantially to further crop genetic improvement (Bernardo 2008). Theoretically, several hundred genes or QTL may account for differences in seed yield between two diverse parents, but it is difficult to detect such a large number of QTL in a single mapping study by conventional methods. Difficulties in detecting QTL arise for a number of reasons. Firstly, experimental variables such as soil fertility and time of crop emergence are difficult to control in standard field experiments. Secondly, genes with opposite contributions to seed yield may be located in the same QTL region, offsetting their contributions to yield. Thirdly, genotype-by-environment interactions are relatively poorly understood (Cowling and Balazs 2010;El-Soda et al. 2014). Finally, QTL mapping methods are often also statistically incapable of detecting genes with minor effects (Mackay et al. 2009;Würschum 2012).
More QTL can be detected by increasing the number of detectable recombination events in the segregating population. This usually involves enlarging the size of the population (and hence the number of recombinants) while maintaining genetic marker saturation such that all recombination events are detected (Mackay et al. 2009;Vales et al. 2005;Zeng 1994). However, larger population sizes and greater marker densities make experiments more difficult and costly, and often the trade-off is that fewer replicates are possible, hindering detection of fieldscale environmental variation. To address this issue, a number of statistical or methodological approaches have been suggested to improve QTL detection efficiency. Long et al. (2007) proposed inclusion of minor-effect "micro-real" QTL, which do not meet QTL detection thresholds, but nevertheless are either detected repeatedly in different environments or which overlap with QTL for related traits. For the latter, where QTL overlap with loci for related traits, Shi et al. (2009) coined the term "indicator QTL". In previous studies, indicator QTL with positive or negative effects on traits related to seed yield were often ignored. However, such loci may be very useful to further increase seed yield under selection in different genetic backgrounds and environments.
Multiple single trait interactions are likely to underlie all complex traits such as seed yield. We hypothesize that by considering these component traits and their interactions we can more effectively dissect individual QTL and more accurately predict the trait performance. To test this hypothesis, we investigated a large resource of phenotypic data generated in large-scale field trials: a total of 19 independent experiments in a doubled-haploid (DH) B. napus mapping population (BnaTNDH), with accompanying single nucleotide polymorphism (SNP) genotype data. Multienvironment phenotype data was available for seed yield and its components (Shi et al. 2009), flowering time (Long et al. 2007); seed quality traits (Jiang et al. 2014;Shi et al. 2009), glucosinolate content (Feng et al. 2012), nutrient use efficiency (Shi et al. 2013), tocopherol content (unpublished data), and disease resistance (unpublished data). Firstly, we analyzed multi-trait phenotypic and genetic correlations using this large trait data set in B. napus, and related this to expression of seed yield via path analysis. Secondly, we identified and integrated QTL regions associated with important traits (in particular seed yield) in B. napus, using a dense genetic map based on the Brassica Illumina Infinium 60 K SNP array. Thirdly, we analyzed the direct genetic effects and plasticity of QTL as well as indirect contributions to seed yield across environments, taking trait correlations into consideration. Finally, we tested the impact of yield-associated SNP markers on the accuracy of genomic prediction of seed yield.

Plant material and available phenotype data
A doubled-haploid population (named "BnaTNDH") derived from an F 1 hybrid between "Tapidor" (a European winter-type rapeseed cultivar) and "Ningyou7" (a Chinese semi-winter type rapeseed cultivar) was used for this study (Qiu et al. 2006). From this population, 182 DH lines with genotyping data generated by the Brassica Illumina Infinium 60 K SNP array (Clarke et al. 2016) were recently used for genetic map construction . Therefore, these 182 DH lines and this new map were used for QTL mapping in the present study with collected published phenotypic data (including seed yield and yield component traits, flowering time, glucosinolate presence in different tissues, seed erucic acid and oil content) and unpublished phenotype data (resistance to Leptosphaeria maculans, resistance to Sclerotinia sclerotiorum, and Vitamin E content) (see Tables 1, 2 for details of traits investigated and phenotyping locations).
The field locations were classified into three macro-environments: semi-winter, winter and spring. The harvest year combined with the location was set as an "experiment" and all experiments used a randomized complete block design with three replications, as described in Shi et al. (2009). In addition, three experiments were conducted under greenhouse conditions ( Table 2).

Collection of phenotype data on disease resistance
Resistance to Sclerotinia sclerotiorum (stem rot) was assessed at the seedling stage on the sixth leaf of each plant. The inoculation was performed in a plastic container, where mycelial agar disks of S. sclerotiorum were placed on the upper surfaces of leaves. Plants were kept in the dark for 16 h, then provided with 12 h photoperiod (12 h light at 32 °C/12 h darkness at 22 °C, light intensity at 210 µmol m −2 s −1 ). Diameters of S. sclerotiorum leaf lesions were recorded at three different time points to provide an index of resistance to S. sclerotiorum. This index was used as quantitative data for identification of QTL.
The phoma stem canker pathogen (Leptosphaeria maculans) isolate 99-79 was sourced from the AAFC Saskatoon Research Centre collection, kindly provided by Dr. H. R. Kutcher. This isolate produced distinct resistant (Tapidor) and susceptible (Ningyou7) phenotypes on parental lines and therefore was used to inoculate the BnaTNDH population lines to map the resistance gene.
Conidial suspension of isolate 99-79 was prepared from 12-day-old cultures on V8 agar and spore concentration was adjusted to 10 7 spores per ml. Seedlings of the 200 DH lines were grown in 40-well trays in a controlled environment room (20 °C day/20 °C night, 12 h light/12 h darkness, light intensity 210 µmol m −2 s −1 ). Cotyledons of 14-day-old seedlings were wounded and a 10 µl drop of conidial suspension was placed over each wound. Each seedling had four inoculation sites (one on each cotyledon lobe). Five seedlings of each DH line were inoculated (i.e. 20 inoculation sites). Symptoms were assessed at 16-18 days after inoculation using a 0-9 scale (0: no symptoms; 9: large gray lesions with pycnidia); where 0-4 were classified as a resistant response, 5 as an intermediate resistant response and 6-9 as a susceptible response. The disease scores were used as quantitative data for QTL identification.
Linkage map and marker alignment to the B. napus "Darmor-bzh" reference genome A genetic linkage map of the BnaTNDH population (182 lines) with a total of 2041 genetic bins was constructed from 12960 SNP markers detected with the Brassica 60 K Illumina Infinium array Supplementary Table 1). We aligned all the SNP markers included in the genetic loci of the BnaTNDH genetic linkage map to the reference B. napus genome of "Darmor-bzh 4.1" (Chalhoub et al. 2014) by BLAST (Altschul et al. 1990) as performed in Zou et al. (2016). For those representative bin markers with no hits or uncertain hits, we used the physical location of other markers in the same genetic bin. All information related to the physical alignment of the SNP markers in the genetic map to the reference genome is listed in Supplementary Table 1.

Phenotype and QTL analyses
Phenotypic correlation between all traits in a single environment was estimated using Spearman's rank correlation coefficient (Myers et al. 2010). Genetic correlation between traits investigated in multi-environments was estimated using multivariate restricted maximum likelihood estimation (REML) in SAS Proc MIXED (Holland et al. 2002). Heritability was calculated using the equation: h 2 = σ g 2 /(σ g 2 + σ 2 ge /n + σ e 2 /nr), where σ g 2 is the genetic variance, σ 2 ge is the variance representing genotype-by-environment interactions and σ e 2 is the error variance, n is the number of environments and r is the number of replications. The variance of the components was estimated by SAS using a general linear model (GLM). Path analysis (Wright 1934) was also done for each environment to further investigate links between traits and their effects on seed yield. Seed yield (SY) was set as the dependent variable while other traits investigated in corresponding experiments were set as independent variables. The direct or indirect effects of the independent variables on the dependent variable were estimated. The variables were initially standardized and the following path analysis model was established: where S i was the standard deviation of x, S y was the standard deviation of y, and α 1 , α 2 , …α m were the direct effects of independent variables on SY. A system of normal equations X ′ S X S α = X ′ S Y S was used to estimate the direct and indirect effects of each explanatory variable on SY: The indirect effect of independent variable x j on independent variable x i was calculated as r ij α jy . Statistical analysis was conducted in Microsoft Office Excel ® and R (R Core Team 2015).
WinQTLcart v2.5 (http://statgen.ncsu.edu/qtlcart/ WQTLCart.htm) was used for QTL detection by composite interval mapping (Silva Lda et al. 2012;Wang et al. 2012). The number of control markers, window size, and walking speed were set to 5, 10, and 1 cM, respectively. Permutation analysis with 1000 repetitions was performed to obtain the LOD threshold (Churchill and Doerge 1994). A significance level of (P = 0.05) was adopted as the LOD threshold, and QTL with LOD scores greater than the LOD threshold were accepted as significant QTL. To avoid elimination of minor effect QTL with small LOD scores, a smaller LOD (P = 0.5) was adopted to determine suggestive QTL. Suggestive QTL that were repeatedly detected in multiple experiments were accepted for further analysis. Finally, significant QTL overlapping with suggestive QTL were admitted as authentic QTL and named as identified QTL according to the method reported by Long et al. (2007) and Shi et al. (2009). The identified QTL with R 2 ≥ 20% were accepted as major QTL and those with R 2 < 20% were accepted as minor QTL.
A meta-analysis algorithm (Goffinet and Gerber 2000) was developed to determine the number of "real" QTL within a region containing many independently identified QTL. The BioMercator v4.2 (http://moulon.inra.fr/ index.php/fr/seminairedoc/cat_view/21-logiciels/101abi-project-and-software/104-biomercator) was used to perform this meta-analysis: an adjusted Akaike criterion was adopted to determine the probable QTL numbers from the five given models (1-, 2-, 3-, 4-and N-QTL model, corresponding to the number of predicted true QTL). The software also provided a method to estimate the confidence interval of the predicted QTL position, 2 was the variance of the position of the QTL i and k was the number of selected QTL selected for the meta-analysis (Chardon et al. 2004;Khatkar et al. 2004;Shi et al. 2009). Firstly, all QTL identified for each trait were integrated using meta-analysis to get consensus QTL, using the methods of Shi et al. (2009). Consensus QTL that could be detected in more than one experiment were classified as reproducible QTL, while others were classified as   non-reproducible. As glucosinolate and tocopherol content include a large number of components, meta-analysis was done for these two traits and their components as described in Feng et al. (2012). Secondly, based on the phenotypic analysis and complex QTL distribution, a second round of meta-analysis was done for all consensus QTL for different traits to obtain indivisible QTL related to seed yield across the whole genome.

Accuracy of genomic prediction on seed yield
Ridge regression best linear unbiased prediction (RR-BLUP) was applied with different sets of markers for genomic prediction of seed yield using R software package "rrBLUP" (Endelman 2011). Considering that the seed yield and related trait phenotypes for the QTL detection were mostly detected in the semi-winter environments (9) rather than in the winter environments (4), phenotype data of seed yield from semi-winter environments was processed for testing the prediction. The environmental effects were estimated according to Zhao et al. (2013Zhao et al. ( , 2015. In total, 100 runs of fivefold cross-validation were performed for the model, and the genomic selection accuracy was evaluated by the R 2 value divided by the square root of the heritability and predicted seed yield performance (Zhao et al. 2015;Zou et al. 2016).

Correlations between agronomic traits and their contributions to seed yield in the BnaTNDH population
Phenotype data for a total of 22 traits in the BnaTNDH population were collected over eight growing seasons, eight locations, and three greenhouse experiments (Tables 1, 2). The glucosinolate and tocopherol content measurements included different components as shown in Table 1. We calculated the heritability of the 17 traits which had phenotype data collected over multiple environments ( Fig. 1; Supplementary Table 2). Biomass yield (straw weight + seed yield) showed the lowest heritability (0.48), while heritability of all seed quality traits (erucic acid, oil content, glucosinolate content, linolenic acid) and developmental traits (flowering time, time to maturity) was >0.8 and seed yield showed heritability of 0.72.
Genetic correlations and phenotypic correlations were calculated between the investigated traits across environments and within single environment. Significant correlations with seed yield were observed for all traits across the different environments with the exception of glucosinolate content (GLU) and pod number per main branch (PNM) ( Fig. 1; Supplementary Table 3). These two traits  were not significantly correlated with seed yield across all experiments, but showed significant correlations in a single experiment and hence were also considered as associated traits for the subsequent analyses. Disease resistance traits were investigated separately in different experiments, and hence only phenotypic correlation was calculated for these traits. However, disease resistance traits also showed significant correlation with yield in several experiments (Supplementary Table 3). Path analysis was undertaken to identify complex connections between seed yield and other traits. The results demonstrated that although some specific traits did not affect seed yield directly, these traits affected seed yield indirectly via interactions with other traits (Fig. 2). For instance, in the N6 experiment, the direct path coefficient for plant height was −0.02, but its indirect path coefficient through pod number was 0.39. This suggests that a taller plant would usually have more pods, and therefore increased yield. In the S6 experiment, the direct path coefficient of seed development time was −0.1, but its indirect coefficients through flowering time and seed weight were 0.09 and 0.09, respectively, which explains its positive correlation of 0.15 with seed yield (Supplementary Table 4). Associated traits like oil content, seed weight, pod number and branch number showed a relatively consistent effect (positive correlation) on seed yield. Some traits also had differing impacts on seed yield in different environments. For example, flowering time showed a positive correlation with seed yield in winter environments, but was negatively correlated with seed yield in the semi-winter environments.

Detection of consensus QTL accounting for all investigated traits using the high-density SNP genetic map of the BnaTNDH population
QTL accounting for seed yield and 21 additional traits collected from up to 19 experiments were analyzed using a newly constructed high-density SNP genetic map of the BnaTNDH population ). The newly constructed genetic map spanned a total of 2077.9 cM with an average of one locus per cM, covering 94.92% of the "Darmor-bzh" assembled reference genome as calculated by the physical position of the loci on each chromosome (Supplementary Tables 1, 5).
A total of 2569 identified QTL accounting for the 22 traits were detected with the new map: the average LOD value, R 2 and confidence interval were 4.92, 7.60% and 6.25 cM, respectively (Table 3; Supplementary Table 6). After integrating redundant QTL that were either repeatedly detected from different environments or re-counted by totals versus components of glucosinolate and tocopherol content traits, a total of 1904 consensus QTL were integrated for the 22 traits, with an average confidence interval reduced from 6.25 to 5.67 cM, in which there were 42 major QTL (R 2 > 20%), included seed yield, glucosinolate, erucic acid, disease resistance, seed number per pod, branch number, pod number, relative first primary branch height (Supplementary Tables 6, 7,  8). For those traits investigated in both semi-winter and winter macro-environments, a total of 155 QTL could be detected in both of the micro-environments. For the traits investigated in more than one experiment, 22.18% (420 out of 1894) of the QTL were repeatedly detected, and these were considered to be reproducible QTL. Furthermore, the proportion of reproducible QTL was significantly associated with the heritability of the traits (r = 0.63, P < 0.05).
After the meta-analysis, most consensus QTL for a single trait overlapped with QTL for other traits, illustrating a complex genetic regulatory network. The additive effects of some QTL also corresponded to the genetic correlations between traits. For example, qPN.A3-1 and qSY.A3-2 overlapping on linkage group A3 were responsible for the increase in pod number and seed yield, respectively (Supplementary Table 7), suggesting a pleiotropic effect or close linkage of the genes for the two traits. The two traits  (Table 1). The values for correlation significance "0", "1", "2", "3" represent the significance levels "no significant correlation", "significant in a single environment (i.e., single experiment) with P < 0.01", "significant in several environments (i.e., several experiments) with P < 0.05" and "significant in several environments with P < 0.01", respectively were also significantly correlated (r = 0.25, P < 0.001). A total of 12 QTL clusters distributed in linkage group A2, A3, A4, A7, A9, A10, C4, C5 and C6 were observed (Supplementary Table 7). For instance, in linkage group C6, there were 16 consensus QTL clustered in the region covering 13.9 cM, accounting for seed yield, branch number, pod number, plant height, flowering time, glucosinolates, and erucic acid, and these traits were significantly correlated (Supplementary Tables 3, 7).

Detection of "essential QTL" for seed yield with consideration of trait associations
A total of 80 consensus QTL responsible for seed yield were detected using the seed yield phenotype value: of these, 78 overlapped with QTL for other correlated traits. Correlation and path analysis also showed that seed yield was directly or indirectly correlated with almost every trait investigated. Therefore, meta-analysis was conducted to integrate the QTL for the traits correlated with seed yield (Supplementary Table 9). As the genetic correlation between seed yield and seed quality traits would be affected by the path correlation among the seed quality component traits, seed quality trait-related QTL that did not overlap with yield and yield-component trait QTL after meta-analysis were removed. As a result, 525 inseparable QTL involving 1266 markers across the whole genome were obtained, with an average QTL confidence interval of only 2.48 cM, and denoted as "essential QTL" contributing to the trait of seed yield (Table 4; Fig. 3; Supplementary Tables 9, 10; Supplementary Fig. 1). SNP alleles of the 1266 markers that positively contributed to seed yield are provided in Supplementary Table 10, and the candidate genes in the interval of the essential QTL are provided in Supplementary Table 11. Among the 525 essential QTL, 428 were integrated from QTL for multiple traits and were defined as pleiotropic, while the other 97 QTL were trait-specific. For the trait-specific essential QTL, two were seed yield-specific and 11 were seed yield component-specific. Most trait-specific essential QTL were responsible for developmental traits such as plant height, flowering time and branch number. There were also several seed quality trait QTL controlling, for example, glucosinolate components, seed oil and erucic acid content, which may affect seed yield in specific environments (Supplementary Table 9). Some of the essential QTL (75) were directly comprised of consensus QTL for seed yield and other consensus QTL for yield-related traits. For example, seed yield QTL qSY.C9-2 on linkage group C9 overlapped with QTL for seed number (qSN.C9-2), pod number (qPN.C9-1), and plant height (qPH-LP.C9-2) (Supplementary Table 9, Fig. 4a). Some of the essential QTL (31) did not directly involve consensus QTL for seed yield, but contained at least two consensus QTL for seed yield-related traits with consistent effects (either positive or negative contribution to the traits). Although consensus QTL for seed yield could not be detected from such loci, the accumulated effects from these yield-component QTL could contribute substantially to seed yield. For instance, the allele of QTL qSN.C3-3 associated with increasing seed number overlapped with QTL qPN.C3-1 for increasing pod number (77-86 cM on linkage group C3; Fig. 4b). A number of the essential QTL (25) did not contain consensus QTL for seed yield, but contained consensus QTL for seed yield-related trait components with opposite phenotypic contributions (both positive and negative in relation to seed yield). For example, at 60.9-66.2 cM on linkage group A2, the Tapidor allele of QTL qSW.A2-1 was responsible for reducing seed weight, whilst the overlapping qPN.A2-1 was responsible for increasing pod number (Fig. 4c).
For indirect QTL with consistent effects, 26 showed environment-specific contribution to seed yield, as they overlapped with several seed yield component QTL detected in the same macroenvironment. Meanwhile, five indirect QTL affected the seed yield through different yield components in different macroenvironments. For instance, es.C3-22 included QTL for pod number (qPN.C3-3, detected in the winter environment) and QTL for seed number (qSN.C3-5, detected in the semi-winter environment). This suggested that this essential QTL for seed yield could affect yield performance through different paths in different environments. Of the indirect QTL with mixed effects, 21 had opposite contributions to yield components, offsetting the yield improvement in the same macro-environment. However, four indirect QTL showed such a contribution in different macro-environments. These QTL showed an environmentspecific potential for yield improvement, and mostly showed a positive contribution in the semi-winter environment. Only es. C9-17 showed a positive contribution to seed yield in the winter environment (through increased pod number).

Prediction accuracy for seed yield based on "essential QTL"
To test the prediction accuracy for seed yield based on predictions implementing essential QTL, we used the 2041 marker bins in the TN linkage map to perform prediction on seed yield in semi-winter environments using the prediction model RR-BLUP. Of these 2041 markers, 1266 markers were located within the confidence intervals of essential QTL (Supplementary Table 10) and 775 markers were not located within the confidence intervals of essential QTL. To avoid confounding effects from shared markers between the two datasets, two additional sets of 1266 and 775 markers were also randomly selected from the linkage map to assess prediction accuracy. The highest prediction accuracy of 0.5211 was achieved when the markers used only involved the 525 essential QTL. This prediction accuracy was superior to using the set of markers out of the QTL interval, using the same number of markers randomly chosen from the whole set of markers, or even to using the whole set of 2041 markers for prediction (accuracy of 0.5068) (Fig. 5).

Discussion
Most agronomic traits of interest in crop breeding are complex quantitative traits, controlled by many genes  (Table 1). Path analysis for the traits investigated in the environment S6 (experiment done in a semi-winter environment in 2006). Traits that have a relatively high path coefficient for seed yield are shown. For some traits not investigated in the environment S6, data from other environments (S4 and S5, experiments done in semi-winter in 2004 and 2005, respectively) were used and are indicated with blue and yellow lines. These traits include seed yield and its components (diamond), developmental traits (circle) and seed quality traits (square), linked by three different colors of lines. More results for the trait correlation and path analysis under other environments can be found in Supplementary Tables 3  and 4 with minor effects (Chesler et al. 2005). Detecting these minor-effect genes and biologically and statistically validating their effects on crop phenotypes is challenging. As a result of crop domestication, a considerable number of traits are associated, either by co-selection, or through coregulation of underlying genes (Balasubramanian et al. 2007;Bernardo 2008). When estimating and attempting to improve seed yield, breeders always consider not only the trait of seed yield itself, but also traits associated with seed yield, such as plant height, branching, flowering time and pod number. However, these interrelationships between traits are rarely considered in QTL prediction. This may limit prediction accuracy as well as our understanding of how different phenotypes and underlying genetic loci interact to control complex traits. In this study, we propose the concept of "essential QTL": combined multi-trait or multi-environment QTL that contribute either directly or indirectly (via interactions between related traits) to seed yield. Our results provide a model for predicting and understanding complex traits in crops.
Our analysis of correlations between traits and QTL locations in the large BnaTNDH data set (Figs. 1, 2 Table 4), anchored by at least one unique SNP-bin marker. Most of the essential QTL showed pleiotropic effects, as well as environment effects. Finally, a total of 1266 essential QTL markers (or representative markers for genetic bins) were identified for reference in future studies (Supplementary Table 10).
Using essential QTL significantly improved prediction accuracy for seed yield under a genomic selection model relative to other marker datasets used in our study (Fig. 5). Although there are a few reports on the prediction accuracy for seed yield with GS models as 0.45 (Jan et al. 2016;Würschum et al. 2014), little research related to prediction accuracy of seed yield has been done in rapeseed. In this study, the genomic prediction accuracy for seed yield in semi-winter environment using the whole marker dataset was 0.5068, but the accuracy was improved (to 0.5211) by using markers located in QTL regions (Fig. 5), which highlights the potential use of these markers in increasing prediction accuracy. With the implementation of essential QTL on the whole genome selection model for seed yield, the prediction accuracy would be further improved. Therefore, combining previously acquired QTL information with our proposed prediction strategy could facilitate the process of whole genome selection for complex traits in future. Of the essential QTL we have identified, nearly onefifth had a direct effect on seed yield, but the majority had an indirect effect on seed yield through the contribution of yield-related traits. These latter essential QTL had either a direct effect (all QTL positive or all negative with respect to effect on seed yield) or contained a mixture of positive and negative QTL affecting seed yield in the essential QTL region. In common practice, only direct QTL are detected via the final value of seed yield (Chen et al. 2010;Maccaferri et al. 2008;Moncada et al. 2001;Peng et al. 2011;Radoev et al. 2008). Using the essential QTL approach, we detected an additional 56 QTL that influenced seed yield indirectly through control of other traits. Nearly half (25) of these essential QTL were composed of QTL with opposite effects, offsetting each other in their final contribution to seed yield. The effect of co-localized QTL has also been reported in other studies (Sun et al. 2012;Zhao et al. 2016). In rice, gene OsPTR9 was found to increase seed yield by 16.7-24.5% compared with wild-type lines by improving the nitrogen utilization efficiency (Fang et al. 2013). Recently, Fan et al. (2016) also identified a protein controlling nitrogen uptake efficiency which could increase rice yield by fifty percent. These results provide evidence that yield-related traits such as nutrient uptake efficiency can affect seed yield. The detection of "essential QTL" provides a means of identifying the genetic basis of complex traits under a dynamic growing system with sophisticated trait associations. Our analysis of seed yield comprises an effective approach to dissect complex traits, which would be also used for similar investigation of other traits in other crops.
Comprehensive QTL analysis to discover trait-associated markers is necessary for molecular marker-based selection in rapeseed pre-breeding and breeding. The high-density genetic map newly constructed with the assistance of the Illumina Infinium Brassica 60 K SNP array , combined with eight years of phenotype data across different environments for up to 22 traits (including seed yield and yield components, plant development, seed quality, and biotic stresses), allowed us detect a total of 1904 consensus QTL accounting for 22 traits with an average confidence interval of 5.67 cM (Supplementary Table 7). Furthermore, the available "Darmor-bzh" B. napus reference genome sequence (Chalhoub et al. 2014) allowed us to align 60 K SNP markers to physical positions genome-widely.
From the QTL analysis of BnaTNDH population across multiple-environments, a majority of the QTL were environment-specific, presenting in one macro-environment or single experiment. It indicated significant environment effects on complex traits, and these QTL would provide reference for understanding the environment interactions with genes (Li et al. 2003;Yang et al. 2007). While there were also considerable QTL detected in both macroenvironments (155 QTL) and multiple experiments (420), introgression of the favorable alleles of these QTL would be useful for improving the adaptation of B. napus. Quite a lot of QTL clusters and major QTL were also identified (Supplementary Table 7). These QTL clusters would also provide references for further understanding of the coselection or linkage drags in rapeseed for trait improvement. The major QTL as well as the minor QTL would also provide references for gene cloning. For example, the resistance QTL qBR.A10-1 against L. maculans identified on A10 was found to be the major resistance gene Rlm2, which has now been cloned (Larkan et al. 2014). Two small QTL for resistance against L. maculans were also detected on A4 and C3 of the BnaTNDH population, which suggests that these two QTL may contribute to quantitative resistance. Previous studies also provided evidence that there were several QTL on A4 and C3 contributing to resistance against L. maculans (Jestin et al. 2015;Huang et al. 2016). Candidate genes FLOWERING LOCUS T, FAE1, Rlm2 for flower time, erucic acid content, resistance against L. maculans, were consistently detected in the QTL regions for these traits (Supplementary Table 11).

Fig. 3
The outer circle is the BnaTNDH linkage map, the short black lines indicate the positions of map markers. The red triangles, green triangles and black triangles under the linkage groups indicate the essential QTL for seed yield, seed yield components and other traits, respectively. The red wave in the middle circle indicates the LOD scores for corresponding composite QTL, while the yellow wave in the inner circle indicates the R 2 values for corresponding essential QTL. The details of LOD scores and the R 2 values for corresponding essential QTL are shown in Supplementary Table 9 (color figure  online) 1 3 The BnaTNDH genetic mapping population of B. napus was derived from the cross between the typical European winter-type cultivar "Tapidor" and typical Chinese semiwinter cultivar "Ningyou7", and hence contains genomic diversity between the two major gene pools of B. napus growing in Europe and Asia (Qiu et al. 2006). More QTL were discovered in the A genome than in the C genome in our study (Table 3), supporting previous results (Shi et al. 2009). This may have been due to richer allelic variation in the A genome than in the C genome in the rapeseed Fig. 4 Horizontal lines indicate the QTL intervals while the short vertical lines indicate the positions of peaks. A downward peak position line indicates that the value of the genetic effect of the QTL is negative, and an upward line indicates that QTL has a positive genetic effect on the trait. a On the C9 linkage group, the seed yield QTL (qSY. C9-2) that overlapped with seed yield component QTL were integrated into es.C9-13; b On the C3 linkage group, QTL for seed yield components with positive contribution were integrated into es.C3-17; c On the A2 linkage group, QTL for seed yield components with opposite contributions were integrated via meta-analysis into one essential QTL named as es.   germplasm pool, and this may particularly be the case for the two parents of the BnaTNDH mapping population, as the parent cultivar Ningyou7 is known to carry introgressions from the more diverse A-genome progenitor species B. rapa (Bancroft et al. 2011). In future, as data will be continually obtained through QTL mapping and genome wide association analysis (e.g. Chen et al. 2007;Huang et al. 2016;Lee et al. 2015;Raman et al. 2014;Ramchiary et al. 2015;Xu et al. 2009), it would be possible to integrate the data from this study into further meta-analyses using this approach with consideration of trait association to provide better understanding of complex traits for crop improvement. These data and the integration of genetic and physical maps will provide an important theoretical and applied resource for studies in rapeseed genetics and genomics.
Author contribution statement ZLL and MW performed the research, analyzed the data and wrote the manuscript; YJH and BF provided the unpublished phenotypic data for resistance against Leptosphaeria maculans and revised the manuscript; JLM, CYZ and XL provided the unpublished phenotypic data for resistance against Sclerotinia sclerotiorum and data for tocopherol components; JXX, YL, LS and JLM provided the SNP genotypes, genetic map and the previous published phenotypic data of BnaTNDH population; ASM, RS, PFL and JLM contributed to the methods; JZ designed the research, analyzed the data, and wrote the manuscript. All authors contributed, read, revised and approved the final manuscript. Fig. 5 Prediction of genomic selection accuracy was done using 2041 representative bin markers from the BnaTNDH genetic map, 1266 bin markers located within QTL confidence intervals, 775 bin markers located outside QTL confidence intervals, and by comparison 1266 randomly selected bin markers and 775 randomly selected bin markers, respectively. The top and bottom of each box represent the prediction values of the upper 25% and the lower 25% of the markers; the bold line within each box represents the median prediction value. The upper and lower lines outside the box represent the largest and smallest prediction values All TN map markers (2041) Markers located in QTL (1266) Markers outside of QTL (775) Random selected markers (1266) Random selected markers (