Background

Genomic prediction or genomic selection (GS) was proposed by Meuwissen et al. [1] as a methodology to use genome-wide dense marker information to estimate genetic values for selection of breeding populations. The main difference between GS and previous approaches, such as marker-assisted selection (MAS), is that in MAS a requirement is to identify quantitative trait loci (QTL) first by linkage disequilibrium (LD) in breeding varieties and then use them as candidate genes for selection, whereas in GS it is not necessary to detect QTL and their significance, using markers prior to selection [2].

The application of GS requires phenotypic data and marker information of a training population (TP) of individuals that are used to develop prediction models to calculate genomic estimated breeding values (GEBV). GEBV are then validated through a validation population (VP) of individuals, or selection candidates, which are genetically related to the TP, and for which only marker data is available to predict their own GEBV [35]. Since the introduction of GS, many simulations and experimental results in animal-, crop- and tree breeding have shown the potential of GS to estimate genetic values, to shorten breeding cycles, to increase selection intensities and capture Mendelian segregation effects in order to increase genetic gains [68].

Practical application of GS in animal and crop breeding programs would not have been possible without the rapid and cost-effective development of next-generation sequencing (NGS) technologies which, consequently, have accelerated the discovery of thousands of single nucleotide polymorphisms (SNP) markers [911]. Among all the NGS technologies nowadays available, genome-wide SNP arrays had been shown as preferable for their reproducibility, manageability and storage logistics, as well as their cost efficiency for breeding programs [2]. However, genome-wide SNP arrays require the availability of a reference genome to compare, contrast and detect SNP markers from the genome of the population of interest.

The size and complexity of the conifer mega-genomes (20–40 Gbp), makes the genome assembly process tedious and costly; so far only a few conifer genomes have been assembled, for instance, Picea glauca (Moench) Voss [12], Picea abies (L.) Karst [13], Pinus taeda L. [14, 15], Pinus lambertiana Dougl. [16], and recently, the first draft of the Pinus radiata D. Don genome (G. Sturrock personal communication). Hence, SNP arrays from genome re-sequencing are not available for most conifer species and other technologies as genotyping-by-sequencing (GBS) and/or exome probe panels have been used instead [1722].

Scots pine (Pinus sylvestris L.) is the most widely distributed pine in the world [23, 24]. It is also a highly important commercial species in Europe, particularly in northern countries [25], being the second foremost species for wood production in Sweden [26]. Despite its importance, neither a reference genome, nor a SNP array, are at present available for the species. While the first draft of the Scots pine genome is currently ongoing [27], and until a SNP-chip is developed for the species, it is necessary to use other NGS methodologies in the meantime.

GBS uses restriction enzymes based complexity reduction sequencing method suited for complex, large genomes. GBS utilizes a barcoding system for multiplex sequencing, which increases its efficiency and reduces the genotyping costs [28, 29]. GBS can generate a very large number of SNPs but also produces significant amount of missing data. The latter can be solved with the aid of different imputation methods, such as mean imputation (MI), expectation maximization (EM), family-based k-nearest neighbor (kNN-Fam) or singular value decomposition (SVD) [30, 31]. The imputation with EM algorithm developed at the R package rrBLUP, was specially designed for GBS data assuming that markers follow a multivariate normal distribution and are imputed based on a realized relationship (averaged over all markers), resulting in higher accuracies of GEBV [32, 33]. GBS marker information has been successfully used for parentage reconstruction in Scots pine [34], as well as to perform genomic predictions studies in livestock [35, 36], maize [37], wheat [33], soybean [38], Picea glauca (Moench) Voss ×Picea engelmannii Parry ex Engelm. [39] and radiata pine [40]. Therefore, GBS is an attractive technology that can be used to perform GS and genome-wide association studies (GWAS) for Scots pine [41, 42].

A typical Scots pine breeding program consists of a combination of several selection strategies, essentially conventional progeny testing and breeding value prediction based on pedigree information and reliable phenotypic assessments, at age of 10–15 years. Usually the breeding cycle takes roughly 36 years when the selection strategy is backward selection based on polycross progeny tests of full-sibs, or 21 years when the strategy is forward selection [43]. One of the greatest advantages of GS in conifers is the potential to reduce the length of the breeding cycle, for example by shortening field progeny test time through early evaluation of greenhouse seedlings, based on molecular marker information. Furthermore, selection intensities can increase and therefore higher genetic gains per unit time could be achieved [2, 44, 45].

Traditional breeding value predictions consist on generating kinship coefficients between relatives to estimate the numerator relationship matrix (NRM), based on pedigree, i.e., a relationship matrix based on the expected proportion of the genome shared by two individuals [46, 47]. The NRM is then used in a Best Linear Unbiased Prediction (BLUP) analysis [48] to calculate Estimated Breeding Values (EBV). On the other hand, EBV can be estimated by replacing the NRM in the BLUP analysis with a genomic realized relationship matrix (GRM or RRM), generated with the kinship coefficients or realized proportion of the genome shared between individuals, computed based on the marker information, i.e., the number of loci shared between individuals [45, 49]. Hence, relationships between individuals are more accurately estimated given that the markers can account for Mendelian inconsistencies and for the contemporary and historical pedigree [2, 50, 51], if the number of markers is enough to path the identical-by-descent (IBD) status across the genome [52].

Multiple statistical methods are available to estimate GEBV. Genomic Best Linear Unbiased Prediction (GBLUP) is based on coancestry and the infinitesimal model in quantitative genetics, assuming that QTL allelic effects are normally distributed, and all have a similar contribution to the genetic variance. Conversely, most of the Bayesian approaches presume a prior non-normal distribution of QTL allelic effects (gamma or exponential distribution), thus the variance at each locus can vary [1, 49, 53]. For instance, Bayesian LASSO (BL) assumes that QTL effects follow a Laplace (or double exponential) distribution [54]. Nevertheless, Bayesian ridge regression (BRR) assigns QTL effects to a multivariate normal prior distribution with a common variance, which is modelled hierarchically through a scaled inverted chi-squared distribution [53, 55, 56].

The accuracy of GS predictions depends on the model selected, but also on other factors such as the level of LD, heritability of the trait, effective population size (Ne), TP size, density and amount of the SNP markers, and distribution of QTL effects [3, 7, 57]. Generally large Ne (i.e., low LD between SNP markers and QTL) normally decreases the precision of the GS models, as well as a small number of individuals in the TP or the low heritability of the trait of interest [45, 58, 59]. Increasing the number of high density markers and the size of TP can improve the efficiency of the GS models to a certain extent [6062].

Despite the great number of articles published during the recent years on genomic prediction on forest species, different methodologies have been used to assess the effectiveness or accuracy of predictions, which complicates the comparison of different models and reliabilities between them. By definition, accuracy is the correlation between true breeding values (TBV) and EBV [63], but TBV are never known, therefore approximations to TBV need to be used. The most common methods used in tree breeding to estimate efficiency of genomic prediction models are, 1) the predictive ability (r1) estimated as the Pearson product-moment correlation between the cross-validated GEBV and phenotypes, 2) predictive accuracy (r2) estimated as r1 scaled by the square root of heritability, 3) theoretical accuracy (r3) which is the square root of reliability (i.e., squared correlation between TBV and EBV) [63, 64], and 4) the Accuracy or prediction accuracy (r4) defined as the Pearson product-moment correlation between the cross-validated GEBV and the pedigree based EBV (PEBV) estimated from PBLUP (pedigree based Best Linear Unbiased Prediction). Generally, r4 showed the highest values whereas r1 showed the lowest ones, when compared with the remaining methods, for instance in eucalypt hybrids (Eucalyptus urophylla ×Eucalyptus grandis) [65], maritime pine (Pinus pinaster Ait.) [66, 67], Norway spruce [68], or Eucalyptus nitens [69].

The genomic model may influence the effectiveness of the estimates and Bayesian approaches may seem more appropriate as they can accommodate different distributions of the allelic effects, however the literature on GS in forest trees showed similar results for most models. For instance, Chen et al. [68] observed similar r1 and r4 among four genomic prediction models (GBLUP, BRR, BL and reproducing kernel hilbert space (RKHS) in Norway spruce. Isik et al. [66] detected similar r1 in maritime pine comparing GBLUP, BRR and BL prediction models. Although GBLUP and ridge regression BLUP (rrBLUP) were recommended by Tan et al. [61] for their computational advantages in a eucalypt hybrid study, similar r1 were noted for GBLUP, rrBLUP, BL and RKHS. In an interior spruce study, Ratcliffe et al. [70] stated similar r4 for rrBLUP and BayesC π, which in turn performed better than the generalized ridge regression (GRR), whereas Thistlethwaite et al. [71] observed almost identical predictions with rrBLUP and GRR in Douglas-fir (Pseudotsuga menziensii Mirb. (Franco)). On the contrary, Resende et al. [58] observed better r1 for disease resistance in a loblolly pine study with Bayesian methods when compared with BLUP-based methods.

The objective of this study was to assess, as proof-of-concept for Scots pine, the effectiveness (or efficiency) of genomic versus pedigree predictions for growth and wood quality traits, using two imputation algorithms combined with four prediction models (GBLUP, BL, BRR and PBLUP) and comparing four methods to assess efficiencies (r1,r2,r3 and r4) under several training and validation population scenarios as well as with different numbers of SNPs.

Results

Heritabilities

Narrow sense heritability estimates based on PBLUP were slightly higher than those based on GBLUP, excluding DBH2 (diameter at breast height assessed at 36 years old) which was higher for GBLUP-EM (Table 1). MOEs showed the same heritability for PBLUP and GBLUP-EM. GBLUP heritability estimates calculated from the RRM derived from EM imputation method were higher than those derived from the RND imputation method for almost all traits, except Ht1 (tree height measured at 10 years old) and MOEd. Standard errors were similar for growth traits regardless of the BLUP method used but they were always lower when derived from GBLUP methods.

Table 1 Additive genetic variance \(\left (\widehat {\sigma }_{a}^{2}\right)\), residual variance \(\left (\widehat {\sigma }_{e}^{2}\right)\) and narrow sense heritability with standard errors \(\left (\widehat {h}^{2}\pm \text {SE}\right)\), from PBLUP and GBLUP models for eight phenotypic traits

Among the genomic models, r1,r2 and r3 were larger for traits with higher narrow sense heritabilities (Ht2, DEN, MOEd and MOEs) than for traits with low narrow sense heritabilities (Ht1, DBH1, DBH2 or MFA microfibril angle) (Table 2). Moreover, across all genomic models and imputation methods, a positive linear correlation between r1,r2,r3 and trait heritabilities was observed (respectively r=0.97, p <0.0001; r=0.77, p <0.0001 and r=0.78, p <0.0001), but not between r4 and heritabilities (r=0.15, p=0.3) (Fig. 1).

Fig. 1
figure 1

Regression plots between all genomic models prediction efficiencies and narrow sense heritability (\(\widehat {h}^{2}\)). Prediction efficiencies: a) r1 - predictive ability, b) r2 - predictive accuracy, c) r3 - theoretical accuracy, and d) r4 - prediction accuracy

Table 2 Prediction efficiencies of genetic models for eight phenotypic traits. Four prediction efficiencies (r1 - predictive ability, r2 - predictive accuracy, r3 - theoretical accuracy, and r4 - prediction accuracy, and their standard errors) for eight traits based on pedigree (PBLUP), and three genomic models (GBLUP, BL, and BRR) combined with two imputation methods (EM and RND)

Prediction efficiency of the different models

Through 10-fold cross-validation, r1,r2 and r4 were estimated for all models and imputation methods, and additionally r3 was also estimated for GBLUP and PBLUP (Table 2). The h (square root of heritability) estimated from the GBLUP-EM using the full data, was used to calculate r2, since the RRM captures IBD and identical-by-state (IBS) status between the individuals and can be considered a better estimation than the one from PBLUP.

The lowest prediction efficiency estimates were obtained for r1 (0.19–0.44) and the highest for r4 (0.66–0.84) for all traits, regardless of the model and imputation method used (Table 2). The genomic prediction models performed similarly for all the different calculations of the efficiency (r1 to r4) for most traits, except in terms of r2 for which BRR-EM showed slightly higher estimations for wood traits and diameter, compared with the other genomic prediction methods. Concerning r3, GBLUP-EM showed higher estimations than GBLUP-RND and PBLUP for all traits. Similarly, among all the genomic models r4 showed higher values for GBLUP-EM for all traits. In summary, although the best r4 were observed with PBLUP for all traits, genomic prediction models performed similarly or slightly better than PBLUP regarding r1,r2 and r3 for all traits. Moreover, there was no single genomic prediction model that performed better than others for all the traits, and only EM imputation method combined with GBLUP or BRR had some improvement for some traits.

Relative size effect of the training and validation populations

All models showed similar increasing patterns of r1,r2,r3 and r4 as the number of individuals in the TP increased, for all traits (Fig. 2). The lowest r1,r2,r3 and r4 were observed when half of the individuals were assigned to the TP. However, BRR-EM model only reached its highest r1,r2 and r4 when 90% of individuals were assigned at the TP, for all traits except Ht2 (height measured at 30 years old). In terms of r1 all the remaining models perform similarly when TP size was between 70–90% of individuals, for almost all traits (Fig. 2a).

Fig. 2
figure 2

Prediction efficiencies of different training (TP) and validation population (VP) sizes. Prediction efficiencies: r1 - predictive ability, r2 - predictive accuracy, r3 - theoretical accuracy, and r4 - prediction accuracy. TP sizes: 50%, 60%, 70%, 80% and 90% of the total number of individuals

No clear pattern was observed regarding r2, with the peak for genomic models when 90% of individuals were allocated to the TP, whereas PBLUP for wood traits, by contrast, showed a plateau for a TP containing 70–90% of the individuals (Fig. 2b). Concerning r3, similar efficiencies were seen when TP included 70 to 90% of individuals for GBLUP-EM, performing better than GBLUP-RND and PBLUP (Fig. 2c).

Among all methods, PBLUP had the highest r4 for all eight traits regardless of the TP ratio (Fig. 2d), whereas genomic models performed similarly. Nevertheless, unlike the Bayesian models and GBLUP-RND that required 80–90% of individuals to be allocated to the TP to reach the highest r4, GBLUP-EM needed a subsample of 70% or 80% individuals as TP for almost all traits.

Effect of increasing number of markers on predictions

The impact of the different subsets of SNPs was tested on BRR-EM and BL-EM models since such models consider different distribution of the QTL allelic effects. For instance, the variance at each locus can change, thus the effect of the SNP subsets on the model and its ability to predict BVs can be easily observed. The EM imputation method was selected due to the slightly higher values showed in the previous sections.

The r1,r2 and r4 increased for all traits as the number of SNPs rose (Fig. 3). However, for almost all traits, the greatest increase on r1 and r2 was attained when the subset of markers was 1000 SNPs, yet some oscillations were observed for both models at SNPs subsets of 500 to 6K for almost all traits (Fig. 3a and b). Although BRR and BL models had different patterns for the different number of SNPs, they showed similar r1 and r2 between 6K and 8K SNPs for most of the traits. Ht2, MOEs (static modulus of elasticity), DEN (density) and MOEd (dynamic modulus of elasticity) had the highest r1 and r2.

Fig. 3
figure 3

Prediction efficiencies of the number of markers. Prediction efficiencies: r1 - predictive ability, r2 - predictive accuracy, r3 - theoretical accuracy, and r4 - prediction accuracy. Eleven subsets of SNP markers (100, 200, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000 and 8719)

Conversely, r4 nearly followed an identical pattern of ascent for both models, as the number of SNPs increased (Fig. 3c). However, r4 kept almost constant around the same value at a range of 3K to 8k SNPs, for both models and all traits.

In short, for both models, the highest values of r4 and r2 were observed at SNP subsets that varied between 6K 8K, while within the range 3K 8K SNPs, no substantial increase was identified in r4 for any of the traits.

Relative selection response of GS

The relative genomic selection response, RSRGS:PS, was estimated for each genomic selection model (GBLUP, BRR and BL) considering only the EM imputation method since this method showed equal or slightly higher efficiencies (between 0.00–0.07) than RND method. The Swedish Scots pine breeding cycle combines several selection strategies sorted in two groups, according to their cycle lengths [43]. For the first group of strategies (i.e., backward selection) the cycle length takes up to 36 years, whereas for the second group of strategies (i.e., forward selection) it takes up to 21 years. In order to estimate RSRGS:PS, it was assumed that GS could help to reduce the cycle lengths to 18 and 11 years following two approaches. The first approach assumed that the cycle could be reduced to 18 years, shortening the progeny test time but considering that female flowering starts at 15–18 years in Scots pine [24]. The second approach presupposed that earlier flowering greenhouse stimulation [72] would produce female flowering at an earlier age in Scots pine, thus the breeding cycle could be reduced to about 11 years. In addition and for both approaches a reduction in the progeny test time was also assumed.

The percentage of increase in selection efficiency for all traits and models showed the potential of GS when reducing the breeding cycle by 50% or more (Table 3). For backward selection strategy and aided by GS, a reduction of 50% in the breeding cycle length (i.e., from 36 to 18 years) resulted in percentages of increased selection efficiencies between 57.1%–143.5%. Moreover, a further reduction in the breeding cycle length of more than 50% (i.e., from 36 to 11 years) assisted by GS and flowering greenhouse stimulation, increased considerably the selection efficiency, being greater for prediction efficiencies r1,r2 and r3 (195.6%–298.4%) than for r4 (157.1%–206.8%). On the other hand, for forward selection, a reduction of 3 years in the breeding cycle aided by GS showed small percentages of increase in selection efficiencies for r1,r2 and r2 ((5.4%–42.0%). Furthermore, this small reduction in the cycle length, showed that traditional phenotypic selection would be more effective, for some traits, in terms of r4 given the low percentages of selection efficiencies showed (-8.3%–9.4%). Nevertheless, for forward selection strategy a reduction of 50% in the breeding cycle length (i.e. from 21 to 11 years) assisted by GS and flowering stimulation, increased the selection efficiencies for all prediction efficiency methods varying between 50%–126.9%. In summary, for all traits and genomic prediction models, the percentage of increase in selection efficiency exceeded 50% when the breeding cycle was reduced by 50%, reaching in many cases percentages that varied between 50%–143.5%.

Table 3 Percentage of increase in selection efficiency of GS for each phenotypic trait, estimated for both selection strategies (Strategy 1 and 2), model ratio (i.e., GBLUP/PBLUP, BRR/PBLUP and BL/PBLUP) and prediction efficiency (r1 - predictive ability, r2 - predictive accuracy, r3 - theoretical accuracy, and r4 - prediction accuracy). Approach 1 and 2 are respectively the breeding cycle length assumptions without and with flowering greenhouse stimulation (i.e. 18 and 11 years)

Discussion

After the genomic selection (GS) concept was proposed in 2001 [1], genomic prediction studies were initially implemented in dairy cattle. The execution of GS in animal and crop breeding programs, such as dairy cattle, oat, maize and wheat, increased genetic gains [44, 73]. Implementation of GS in tree breeding is underway with recent publications in eucalypts [61, 7477], white spruce [7880], black spruce (Picea mariana [Mill.] BSP) [60], interior spruce [39, 70], Norway spruce [68, 81, 82], loblolly pine [58, 83, 84], lodgepole pine (Pinus contorta Douglas) [85] and maritime pine [66, 67]. GS was adopted in tree breeding in the last decade and different methods to estimate prediction efficiencies or accuracies of the cross-validated genomic predictions models have been implemented [2, 86]. In the current study, prediction efficiencies were assessed based on the four most common methods used in tree breeding (i.e., r1,r2,r3 and r4). Results from previous studies showed, in most cases, only one or two of these methods, suggesting that there is not a single consensual method within tree breeding community to evaluate genomic prediction estimations (Table S1).

Predictive ability (r1) has been widely used but normally shows lower values than r2,r3 and r4 for different traits (Table S1) as it uses the individual phenotypes as approximation to TBV and could be comparable to heritability [62, 68, 78, 87, 88]. Predictive accuracy (r2) and theoretical accuracy (r3) have been used less frequently [40, 74, 78]; the former one is considered as an unbiased estimate of accuracy of selection from n-fold cross-validation, since the correlation between an individual phenotype and its TBV cannot be higher than the square root of heritability [52, 89]. The r3 [63] was usually used to evaluate the models with full datasets, however in the current study it was used in the cross-validation analyses as well, since different PEV were obtained for each fold, thus estimations can be evaluated in the same way as the remaining methods.

Another method extensively used has been the prediction accuracy (r4) which generally showed higher values than r1,r2,r3 (Table S1). Congruent with previous studies [60, 65, 71, 80], we observed that all genomic models showed the highest values for r4 (0.65–0.75), followed by r3 (0.59–0.68), r2 (0.49–0.70), and r1 (0.19–0.43) that had the lowest values as expected. Nevertheless, using r4 (i.e., corr(EBVVP,PEBVy)) may inflate the prediction efficiency due to that the individuals in the validation population used to estimate EBV (EBVVP) were a proportion of the individuals in the full dataset (y) used to estimate PEBV (PEBVy) and therefore the correlation between them was generally higher [52, 68, 88].

Heritabilities

No clear pattern was detected between r4 and heritability estimates for maritime pine [67] and norway spruce [68]. Additionally, Grattapaglia and Resende [3] noticed that r3 did not significantly change under different simulated heritability scenarios. Whereas no trend was detected among r4 and trait heritabilities, a positive and strong linear trend between r1,r2,r3 and heritabilities was observed in the current study, i.e., traits with lower heritabilities (below 0.25) exhibited the lowest r1,r2 and r3 (Fig. 1). Higher prediction efficiencies were obtained for traits with moderate heritabilities (above 0.30) which is in line with the positive correlation between trait heritabilities and r1 reported in loblolly pine [58] and maritime pine [66]. Chen et al. [68] concluded that values of narrow-sense heritability were more similar to values of r1 than to r4, as r1 involves both phenotypic and genetic values, however using r2 instead of r1 could remove influence of heritability since it is considered an unbiased estimation of the accuracy of selection from n-fold cross-validation [52, 78, 89].

Effect of the imputation method on the genomic prediction efficiencies

For species such as Scots pine with large and complex genomes [90] but without a reference genome, and with no SNP chips or exome panels developed, genotyping-by-sequencing (GBS) method is considered as an attractive alternative to perform GS studies. When using GBS data, large amounts of missing data are produced, thus filtering and imputation of SNP markers are critical steps [42]. In an interior spruce genomic prediction study with GBS data [39], it was observed that the imputation method used had influence in the quality of genomic predictions and concluded that EM and kNN-Fam imputation methods, provided the highest genomic prediction accuracies (r4). EM was the most efficient imputation method in a wheat breeding GS study [33] with GBS data. Our study partially supports those findings, since among the genomic prediction models used, slightly higher predictions were observed in terms of r2,r3, and r4, when EM imputation algorithm was combined with GBLUP and BRR. In contrast, r1 was almost equal for each trait when BL model was used regardless of the imputation method used. We speculate that the slighly better performance of GBLUP-EM and BRR-EM could be due to that GBLUP and BRR respectively assume that QTL allelic effects are normally and multivariate normally distributed, and in addition the EM imputation method uses a kinship-based imputation algorithm which also assumes that marker genotypes follow a multivariate normal distribution [22, 33]. Thus, the combination of the same assumptions during the imputation and breeding value estimations can result in higher prediction efficiencies.

Effect of the model on the prediction efficiencies

Traits of interest in tree breeding programs have different genetic architecture; thus, different genomic prediction models to evaluate prediction efficiencies may be used [44]. In a two generations maritime pine genomic selection prediction study [66] was observed similar r1 among GBLUP, BRR and BL for growth and stem straightness traits, but with larger prediction bias (estimated by regressing the EBV on the GEBV in the validation set) when BL was used. In a different maritime pine study with three generations, larger prediction bias were detected for PBLUP than for GBLUP or BL [67]. Several statistical methods, namely, GBLUP, BRR, BL and RKHS, were compared in a Norway spruce study with relatively similar r1 (0.16–0.44) and r4 values (0.58–0.77) observed for all of them [68]. However in the same study PBLUP outperformed the genomic models in terms of r4. On the contrary, genomic models (GBLUP, rrBLUP, BL, and RKHS) performed better than PBLUP in terms r1 (0.27 and 0.12, respectively) in eucalyptus hybrids, yet pedigree errors were observed in the populations studied, resulting in the underestimation PBLUP estimates [61]. The authors contemplated the possibility that the marker data captured precisely the Mendelian sampling variation, therefore the genetic variation was based on the true proportion of the genome that was IBD or IBS among individuals.

Our study is in line with the studies mentioned above, since similar prediction efficiencies were observed regardless the genomic model used. PBLUP outperformed the genomic models in terms of r4 (0.75–0.84) for all traits, however in terms of r1,r2 and r3 GBLUP, BRR, BL and PBLUP showed similar prediction efficiencies for growth and wood quality traits (Table 2). In short, either GBLUP, BRR or BL provided similar prediction efficiencies for growth and wood quality traits in Scots pine.

Effects of the training and validation populations sizes on prediction efficiencies

Previous studies stated that r1 and r2 increased as the size of the training set increased without reaching a plateau which differed from our findings. For instance, Tan et al. [61] detected that r1 ascended as the TP size rose for all models and traits evaluated in eucalypt hybrids. Similarly, Lenz et al. [60] asserted that r4 increased as the TP size augmented, however after assigning TP of 67% of individuals the increase of r4 was negligible. Nevertheless, some similarities were found with other studies, especially when utilizing GBLUP, for which r4 rose as the TP size increased, achieving similar r4 values for height when TP reached 80–90% of individuals, and 75–90% of individuals for wood quality traits [68]. In the current study, a TP size of 70–80% was enough to obtain similar values as the full TP size in terms of r1,r2,r3 and r4, depending on the trait (Fig. 2). In the studies cited above [60, 61, 68], it was observed that the number of trees per family had an effect on the GS efficiency; however, in the current study in which the number of trees per family was very low, it was still observed the advantage of applying GS prediction methods in Scots pine.

Effect of the number of SNPs

In a general conifer breeding program simulation study Li et al. [57] detected an increase in the accuracy (correlation between GEBV and simulated TBV) of GEBV for traits with low and high heritability when the subset of SNP markers increased from 7K to 90K, for a TP with 1000 clones from five simulated generations. Moreover, the same pattern was observed for GBLUP, BRR, BL and RKHS models in Norway spruce [68], where r1 and r4 increased with number of markers reaching almost a plateau between 4K and 8K SNP markers, regardless of the model used. Similarly, in eucalypt hybrids [61], when the subset of SNP markers dropped below 5K larger reduction in the r1 was observed for GBLUP and RKHS models; further, traits with lower heritabilities were observed to be more sensitive to the reduction in the number of SNP markers. On the contrary, in black spruce when markers were reduced randomly from 5K to 1K no noticeable decrease was found in r4 for GBLUP and Bayesian framwork models [60]; nonetheless, when markers were further reduced to 500, the r4 decreased dramatically.

Our results were in accordance with those studies, reaching similar efficiencies in terms r1 and r2 when the number of SNPs reached 6K–7K, or 3K–7K for r4 (Fig. 3), to those achieved when using all 8719 SNPs and regardless of the genomic model used, therefore the number of SNPs had more influence on the prediction efficiency than the genomic model used.

Relative selection response of GS

A simulation study showed that when the breeding cycle length was reduced by 50% the RSRGS:PS doubled, and that when the cycle length was reduced by 75% the RSRGS:PS tripled at high marker levels [3]. This theory was confirmed by Resende et al. [91] that by reducing 50% the loblolly pine breeding cycle, reported a percentage of increase in selection efficiency of GS between 53–92% for DBH and 58–112% for Ht, compared to the traditional pedigree-based selection. Similarly, percentages of increase in GS efficiency varied between 106% to 139% for Ht when the breeding cycle length of interior spruce was reduced by 25% [70]. In Norway spruce, the percentages of increase in GS efficiency of MOE were between 69–83% when the cycle length was also shortened by 50% [68]. The results of the current study exhibited that a reduction of the cycle length by 50% increased the percentage GS efficiency to double for almost all traits, regardless the selection strategy (Table 3). Such reduction in the breeding cycle length of Scots pine could only be possible by shortening field-testing periods aided by the use genomic prediction at young ages, and that female flowering can start at earlier ages after greenhouse flowering stimulation [72]. Moreover, if cycles could even be shortened more than 50%, higher percentages of increase in GS efficiency could be reached which in the case of this study were almost triple than traditional pedigree-based selection (Table 3).

Conclusions

Our results provide an initial perspective in the use of genomic prediction in Scots pine and are encouraging to develop GS strategies for the species. Similar prediction efficiencies were observed among pedigree and all genomic prediction models for growth and wood quality traits, suggesting that genomic prediction methods can be applied as an alternative to traditional pedigree predictions for Scots pine.

Our study showed that GS could potentially reduce the breeding cycle by half, and under that assumption, the relative genomic selection efficiency could double depending on the selection strategy and the trait.

The results presented here are based on a relatively small population with a shallow pedigree, for which 8K SNPs were sufficient to reach high GS prediction efficiencies. More studies using different populations, preferably populations with deeper pedigrees should be carried out to better understand the predictive power of SNP markers for traits with complex inheritance patterns in the species. The predictive power of SNP markers should be tested over at least two generations because the marker-QTL phase is expected to change once the population undergoes through breeding, due to recombination of homologous chromosomes during the meiosis.

Methods

Plant material

In this study a Scots pine full-sib progeny trial (identified as F261-Grundtjärn), belonging to the Swedish tree improvement program at Skogforsk (The Forestry Research Institute of Sweden) was used. The trial consists of 184 full-sib families and 7240 trees (F1-generation), generated from a partial diallel mating design of 40 plus-trees (F0-generation). The progeny trial was established in 1971 by Skogforsk as a randomized single tree plot design, divided into 14 post-blocks [92]. A more detailed description of the trial can be found in Fries et al. [93]. A number of 694 progeny trees (F1) from 183 families were selected for this study, such that the number of trees per family varied from one to seven with an average of four individuals per family.

Phenotypic data

Growth traits were measured in the 7240 progeny trees whereas wood properties were estimated in a subset of 694 progeny trees. Height (Ht) was measured when the progeny trees were 10 (Ht1) and 30 (Ht2) years old. Diameter at breast height (DBH) was also measured twice, at ages 30 (DBH1) and 36 (DBH2). In 2011, increment cores at breast height were extracted and processed by Silviscan (Innventia AB, Stockholm, Sweden). From the Silviscan analysis, three traits were used in this study: microfibril angle (MFA), static modulus of elasticity (MOEs) and wood mean density (DEN). In addition, dynamic modulus of elasticity (MOEd) predicted by Hitman ST300 (Fiber-gen, Christchurch, New Zealand) was also used in the current study. Wood traits are further described in Hong et al. [94].

DNA extraction and genotyping

The commercial NucleoSpin Plant II kit (Machery-Nagel, Dren, Germany) was used to extract genomic DNA from vegetative buds or needles from the 694 progeny trees and 46 parents. DNA concentration was determined with Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA). Then, three genomic libraries for GBS were prepared following the procedure described in Pan et al. [29] by using 827 samples (replicates included) and PstI high fidelity restriction enzyme (New England Biolabs, MA, USA). The libraries were sequenced on an Illumina HiSeq 2000 platform at SciLifeLab, Sweden.

Thereafter, paired-end raw reads of each GBS library were cleaned and demultiplexed by the process_radtags module of Stacks v.1.40 [95] on the basis of 300 barcodes with 48 bp. Cleaned reads of each sample were aligned to the Pinus taeda v1.0 [96] reference genome, using BWA mem v0.7.15 [97] with default parameters. Alignments were coordinate-sorted and indexed using Samtools v1.5 [98]. SNP markers were called using the mpileup command of SAMtools over all the samples simultaneously, with default parameters, and converted into VCF matrix using BCFtools v0.1.19 [99]. Furthermore, these variants were sorted to keep only high-quality SNPs. Using vcfutils in BCFtools with default parameters, the SNPs within 3bp around an indel or with mapping quality <20 were filtered out; using Vcftools v.0.1.12b [100], only SNPs with coverage ≥5x, genotype quality (GQ) ≥30, genotype calling rate >20% were retained. Using the custom Perl program (ReplicateErrfilter.pl), discordant genotypes of 66 replicated samples were detected and the SNP sites with ≥3 replicate errors were filtered out. After this step, 24,152 informative SNP markers were retained.

Finally missing genotypic data were imputed using two imputation methods to compare their prediction efficiencies. Random (RND) imputation with the codeGeno function in synbreed package [101] in R (R Core Team 2016) and imputation with the expectation maximization (EM) algorithm by the A.mat function implemented in rrBLUP package [32] in R. A total of 15,537 and 15,433 SNPs with minor allele frequency (MAF) lower than 1% and with a missing data threshold lower than 10% were removed using RND and EM imputation methods, respectively.

Statistical analysis

Initial analysis. Growth traits (Ht1, Ht2, DBH1 and DBH2) were available for all progeny trees in the trial, therefore univariate single site spatial analysis were performed in ASReml 4 standalone [64], with the objective to reduce the within-trial micro-environmental effects prior to any other analysis (see supplementary information S1). Briefly, diagnostic tools, variograms and plots of spatial residuals were used to detect design, treatment, local and extraneous effects. The predicted design effects and spatial residuals were extracted from the ASReml output files and used to remove micro-environmental effects from the raw data [102, 103]. Wood properties were assessed for a subset of 694 progeny trees and the micro-environmental effects were scaled for the raw data by removing the variation of the experimental design features and post-block effects. The environmentally adjusted phenotypic data (predicted values of each tree) were used for the genetic analysis [104107].

Best Linear Unbiased Prediction (BLUP). The following model was used for PBLUP and GBLUP:

$$\begin{array}{@{}rcl@{}} y=Xb+Za+e, \end{array} $$
(1)

where y is the vector of the adjusted phenotypic data for each trait, b is the vector of fixed effects (intercept), a is the vector of random effects and e is the vector of residual effects, which is assumed to follow a normal distribution as \(var(e)\sim N\left (0, I \sigma _{e}^{2} \right)\), where \(\sigma _{e}^{2}\) is the residual variance and I is the identity matrix. X and Z are the incident matrices of b and a.

In the PBLUP, the vector a (additive genetic effects) from Eq. 1 is assumed to follow a normal distribution with expectations of \(\sim N\left (0, A \sigma _{a}^{2} \right)\), where \(\sigma _{a}^{2}\) is the additive genetic variance and A is the numerator relationship matrix (NRM). Briefly, the diagonal elements (i) of A were estimated according to Lynch and Walsh [108] as:

$$\begin{array}{@{}rcl@{}} A_{ii} = 1 + \frac{A_{gh}}{2}, \end{array} $$
(2)

where g and h are the parent of individual i.

The off-diagonal elements are the relationship between individuals i and j and were estimated as:

$$\begin{array}{@{}rcl@{}} A_{ij} = A_{ji} = \frac{A_{jg} + A_{jh}}{2}, \end{array} $$
(3)

For the GBLUP, the vector a is assumed to follow a normal distribution with expectations of \(\sim N\left (0, G \sigma _{a}^{2} \right)\),where G is the genomic realized relationship matrix (RRM) estimated according to VanRaden [49] as:

$$\begin{array}{@{}rcl@{}} G=\frac{(M-P) (M-P)^{T}}{2\sum_{j}p_{j}\left(1-p_{j}\right)}, \end{array} $$
(4)

where M is the matrix of genotyped samples, P is the matrix of allele frequencies with the jth column given by 2(pj−0.5), where pj is the observed allele frequencies of the genotyped samples. The elements of M were coded as 0, 1 and 2 (i.e., the number of minor alleles) for the estimation of the G matrix with function kin from the synbreed package in R in the case of RND imputed data, and with the function A.mat from the rrBLUP package in R, for the EM imputed data. PBLUP and GBLUP analyses were conducted in ASReml-R version 4.1.0.106.

Bayesian models. BRR and BL were implemented using the BGLR function from the BGLR package in R [109]. In brief, the following model was used:

$$\begin{array}{@{}rcl@{}} y=1_{n}\mu+Wm+e, \end{array} $$
(5)

where y is the vector of n adjusted phenotypes, 1n is the vector of ones, μ is a scalar denoting the intercept, W is the incidence matrix for the m vector of marker effects, and e is the vector of residual effects that follow a multivariate normal distribution \(e\sim N\left (0,I_{n}\sigma _{e}^{2}\right)\). In BRR, vector m from Eq. 5 is assigned a multivariate normal prior distribution with a common variance to all marker effects, that is \(m\sim N\left (0,I_{p}\sigma _{m}^{2}\right)\), where p is the number of markers, \(\sigma _{m}^{2}\) is the unknown genetic variance which is contributed by each marker and assigned as \(\sigma _{m}^{2}~\chi ^{-}2(df_{m},S_{m})\), where dfm is degrees of freedom and Sm is the scale parameter. Residual variance is assigned as \(\sigma _{e}^{2}~\chi ^{-}2\left (df_{e},S_{e}\right)\), with dfe degrees of freedom and scale parameter for residual variance Se [55]. For the BL method assumes that vector m from Eq. 3 follows a hierarchical prior distribution with \(m\sim N\left (0,T\sigma _{m}^{2}\right)\), where \(T=diag\left (\tau _{1}^{2},\ldots,\tau _{p}^{2} \right)\). \(\tau _{j}^{2}\) is assigned as \(\tau _{j}^{2}~Exp\left (\lambda ^{2}\right), j\,=\,1,\ldots,p\). λ2 is assigned as λ2Gamma(r,δ). Finally, the residual variance is assigned as \(\sigma _{e}^{2}\sim \chi ^{-}2\left (df_{e},S_{e} \right)\), where dfe is degrees of freedom and Se is the scale parameter for residual variance [54].

For the Bayesian methods, GEBV in the VP were estimated as,

$$\begin{array}{@{}rcl@{}} \widehat{g}_{i}=\sum_{j=1}^{n}Z_{ij}^{'}\widehat{a_{j}}, \end{array} $$
(6)

where \(Z_{ij}^{'}\) is the indicator covariate (-1, 0, 1) for the ith tree at the jth locus and \(\widehat {a_{j}}\) is the estimated effect at the jth locus.

Model convergence and prior sensitivity analysis. Bayesian algorithms were extended using Gibbs sampling for estimation of variance components. The Gibbs sampler was run for 20,000 iterations with a burn-in of 1,000 iterations and a thinning interval of 100. The convergence of the posterior distribution was verified using trace plots.

Validation and evaluation methods

Cross validation. For all traits, pedigree based (PBLUP), genomic models (GBLUP, BRR and BL) and imputation method (EM and RND), a 10-fold cross-validation analysis was implemented, i.e., 90% of individuals randomly selected for the TP and 10% in the VP. BRR and BL were tested with eleven different sets of SNP markers randomly selected (i.e., 100, 200, 500, 1K, 2K, 3K, 4K, 5K, 6K, 7K and 8719). Additionally, to evaluate the performance of the pedigree versus genomic prediction models, different sizes of TP and VP were used. All individuals were randomly split into four different proportions of TP/VP, 80%, 70%, 60% and 50% (i.e., 555, 486, 417 and 347 individuals, respectively) for TP and the rest as VP. Each analysis was replicated 10 times.

Prediction efficiency of traditional and genomic genetic evaluations. Prediction efficiencies of pedigree and genomic models were evaluated and compared based on the predictive ability, predictive accuracy, theoretical accuracy, and prediction accuracy.

  1. 1)

    The predictive ability (r1) was defined as the Pearson product-moment correlation between the EBV of the individuals in the VP (EBVVP) and their adjusted phenotypes (y). i.e., r1=corr(EBVVP,y).

  2. 2)

    The predictive accuracy (r2) was estimated as the r1 scaled by h (square root of individual narrow sense heritability), i.e., r2=corr(EBVVP,y)/h [52].

  3. 3)

    The theoretical accuracy (i.e., square root of reliability) was estimated for PBLUP and GBLUP as \(r_{3}=\sqrt {1-\frac {PEV}{G_{ii} \sigma _{a}^{2}}}\), where PEV is the prediction error variance of the VP, and Gii is the diagonal element of the ith individual in the G matrix for GBLUP model or in the case of PBLUP model Gii=Aii, i.e., the diagonal element of A matrix [63].

  4. 4)

    The prediction accuracy (r4) was estimated as the Pearson product-moment correlation between the EBV of the individuals in the VP (EBVVP) and the PEBV estimated with all 697 progeny trees (PEBVy), i.e., r4=corr(EBVVP,PEBVy) [62, 91].

To avoid fold effects, all the methods were estimated within each fold and averaged across folds and replicates [53].

Heritability estimation. Pedigree- and genomic-based narrow sense heritabilities (h2) were estimated for PBLUP and GBLUP as:

$$\begin{array}{@{}rcl@{}} \widehat{h^{2}}=\frac{\widehat{\sigma_{a}^{2}}}{\widehat{\sigma_{p}^{2}}} = \frac{\widehat{\sigma_{a}^{2}}}{\widehat{\sigma_{a}^{2}}+\widehat{\sigma_{e}^{2}}}, \end{array} $$
(7)

where \(\widehat {\sigma _{a}^{2}}, \widehat {\sigma _{p}^{2}}\) and \(\widehat {\sigma _{e}^{2}}\) respectively are the additive genetic, phenotypic and residual variances.

Relative selection response of GS. Assuming that selection response is inversely proportional to the length of the breeding cycle, the relative selection response (RSR) of GS to the traditional pedigree-based selection (PS) can be estimated as a ratio (RSRGS:PS) between the efficiency or accuracy method and the breeding cycle time in years [3].

$$\begin{array}{@{}rcl@{}} RSR_{GS:PS}=\frac{r_{GS}}{r_{PS}}\times\frac{CL_{PS}}{CL_{GS}}, \end{array} $$
(8)

where rGS and rPS are the efficiency of GS and PS, respectively and CLPS and CLGS are the breeding cycle lengths of PS and GS, respectively.

The percentage of increase in selection efficiency of GS was estimated as (RSRGS:PS−1)∗100 [3]. In order to estimate RSRGS:PS, two GS approaches were assumed to reduce the breeding cycle, by shortening the period of field progeny testing needed for phenotypic evaluations. In addition, for the first approach the cycle length was reduced to 18 years considering that female flowering starts at 15–18 years in Scots pine [24], whereas the second approach assumed that flowering greenhouse stimulation [72] would produce female flowering at around 11 years.

RSRGS:PS was estimated considering the four different prediction efficiency methods described in the previous sections (r1,r2,r3 and r4) and both breeding cycle reduction approaches.