Introduction

The phenotype of an individual is controlled by its genotype, the environment and any interactions between genotype and environment (G×E interactions, or G×E). Such interactions are said to exist when the comparative performances of genotypes vary according to the environment. The performance of one genotype that is superior in one environment might be inferior in another environment (de Jong 1990; Falconer and Mackay 1996). In forestry, G×E can lead to unpredictability of some genotypes’ performance across certain sets of environments. Characterising and understanding G×E in order to reduce or even remove such unpredictability has, therefore, been an overarching goal. Where it occurs, G×E complicates the design of breeding programmes but it also provides opportunities for matching the most appropriate planting stocks to targeted deployment conditions to optimise forest health, growth and wood quality in intensively managed forests.

G×E can be categorised into two major types (Lynch and Walsh 1998; White et al. 2007): (1) rank-change interaction, whereby genotypes are ranked in different orders in different environments; and (2) level-of-expression interaction, whereby the expression of genotypic differences (e.g. the spread of the breeding values) varies across environments, not necessarily with any change in the order of the genotype rankings. Breeders, therefore, need to determine the patterns and magnitude of G×E in order to obtain the best genetic gain for the forest industry (Muir et al. 1992; Raymond and Namkoong 1990). Since breeders are primarily concerned with evaluation and selection among candidate genotypes, rank-change interaction will generally be of greater interest for them. While level-of-expression interaction is generally less important for breeding, it may be of major interest for helping to decide what genetic material is chosen for deployment in particular growing environments and for particular end-products.

In forest tree breeding, G×E can often be substantial and create problems in finding consistently superior genotypes, especially for broad adaptation. Breeders often address G×E either by selecting stable genotypes that are not sensitive to environmental changes, or by selecting genotypes for specific environments in order to maximise genetic gain on those sites (Raymond and Namkoong 1990). In trees, significant G×E has been reported in almost all commercially important species such as radiata pine (Pinus radiata D. Don; Raymond 2011; Wu and Matheson 2005), loblolly pine (Pinus taeda L.; McKeand et al. 1997; Paul et al. 1997), Scots pine (Pinus sylvestris L.; Gullberg and Vegerfors 1987; Haapanen 1996), slash pine (Pinus elliottii Engelm. var. elliottii; Hodge and White 1992; Roth et al. 2007), eucalypts (Eucalyptus spp.; e.g. Costa e Silva et al. 2006; Hardner et al. 2010), Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco; Campbell 1992; Dungey et al. 2012), spruces (Picea spp.; Bentzer et al. 1988) and poplars (Populus spp.; Rae et al. 2008; Yu and Pulkkinen 2003).

Increased awareness of the magnitude and nature of G×E helps tree breeders and foresters to see the opportunities to increase genetic gain in the forests (Cullis et al. 2014; Ivković et al. 2013a, b). With this awareness comes a transition from thinking about G×E mainly in relation to selection and structuring of breeding populations to thinking more about G×E also in relation to deployment. For breeding, the expected benefits of capturing additional genetic gain may not outweigh the complexity and additional costs of a multi-environment breeding programme (Carson 1991). For deployment, however, concern among tree-growing businesses in New Zealand over the realisation of genetic gain on some sites led to a reappraisal of the situation by the Radiata Pine Breeding Company (RPBC) (Butcher, 2015, personal communication). Application of multi-environment factor analytic models has also given a new level of statistical efficiency and evidence that there are additional genetic gains possible by deploying the right genotypes on the right sites (Cullis et al. 2014).

In this review, we examine the available analytical methods for identifying G×E and their properties, the empirical evidence of the importance and pattern of G×E in forest trees for selected commercially important species, information on the environmental variables that drive G×E in forestry and strategies for dealing with G×E in tree breeding programmes. Based on this information, the implications and challenges in exploring G×E to optimise genetic gain in forest tree breeding are discussed. The importance of genomic selection in identifying G×E in forest tree breeding is also discussed. Our focus in reviewing the statistical methodology is on the analysis of data from existing trials for selection and other breeding decisions. The aspect of identifying the environmental drivers of G×E for future choices of screening environments is given only limited attention, being seen as a topic for separate review.

Analytical methodology for the estimation of G×E interactions

A number of analytical methods have been proposed to measure the extent of G×E for traits in tree breeding, including the ratio of interaction to genetic variance, stability analysis, type-B genetic correlation, biplot analysis, factor analytic models and reaction norm. All these methods can be implemented as special cases of mixed linear models. The ratio of interaction to genetic variance can be estimated using analysis of variance (e.g. Shelbourne 1972). Stability analysis has been used to identify stable or sensitive genotypes across multiple environments (e.g. Eberhart and Russell 1966; Finlay and Wilkinson 1963; Huehn 1990). Type-B genetic correlation (Burdon 1977) and factor analytic models characterise patterns of ranking changes of genotypes across multiple environments (Cullis et al. 2014). Likelihood ratio tests now provide robust tests of interaction, expressed as departures from perfect type-B correlations. Biplot analysis combines analysis of variance (ANOVA) and principal component analysis (PCA) and allows visualisation of results (Gauch 1992; Mandel 1971; Yan et al. 2007). Reaction norm describes a range of responses or phenotypes produced by a single genotype across a range of environments (Lynch and Walsh 1998; Pierce 2005; Woltereck 1909). The first three methods have been well covered in books or review articles on G×E in crop and forest tree breeding (Freeman 1973; Kang and Gauch 1996; Zobel et al. 1988). This review will cover biplot analysis, factor analytic models and reaction norm.

Biplot analyses

Biplot analysis is most commonly applied in plant breeding, with the aim of developing crops with high yield and good quality. Cultivars selected from one environment might not maintain their high performance in another environment due to G×E. Identifying cultivars with high performance and wide adaptability in multiple environments is the ultimate aim. A biplot analysis uses singular-value decomposition to break down data into a component matrix and displays both column and row information simultaneously (Gabriel 1971). The additive main effects and multiplicative interaction (AMMI) and the genotype main effects and G×E effects (GGE) are the two main biplot analysis methods to identify G×E patterns in plant and forest tree breeding. AMMI and GGE biplot analyses test the significance of G×E, estimate the size of the G×E variance relative to the genetic variance, and visualise the stability of genotypes and the environments in which genotypes perform best.

AMMI is a statistical approach that combines ANOVA and PCA (Gauch 1992; Mandel 1971; Mrode 2014). It decomposes the source of variation, first into the additive effects of genotypes and environments using ANOVA, and then into multiplicative effects for G×E using PCA (Zobel et al. 1988). The interaction effects are decomposed into a portion representing real responses to G×E and a portion due to random variation (Crossa et al. 1990). The linear model used for the AMMI analysis is as follows (Gauch 1992):

$$ {y}_{i j}=\mu +{\alpha}_i+{\beta}_j+\sum_{k=1}^n{\lambda}_k{\xi}_{i k}{\eta}_{j k}+{\rho}_{i j}+{\varepsilon}_{i j} $$

where \( {y}_{ij} \) is the observed phenotype of genotype i in environment j, μ is the grand mean, \( {\alpha}_i \) is the main effect of genotype i as a deviation from μ, \( {\beta}_j \) is the main effect of environment j as a deviation from μ, \( {\lambda}_k \) is the singular value for the interaction principal component (IPC) axis k, \( {\xi}_{ik} \) and \( {\eta}_{jk} \) are the genotype and environment IPC scores (i.e. the left and right singular vectors) for axis k, n is the number of axes or principal components retained by the model, \( {\rho}_{ij} \) is the interaction residual containing all multiplicative terms that are not included in the model, and \( {\varepsilon}_{ij} \) is the residual associated with genotype i in environment j, assumed to be independent and identically distributed. The additive part of the AMMI model (μ, \( {\alpha}_i \) and \( {\beta}_j \)) is estimated from ANOVA and the multiplicative part (\( {\lambda}_k \), \( {\xi}_{ik} \) and \( {\eta}_{jk} \)) from PCA.
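The two-step AMMI fit can be illustrated with a minimal NumPy sketch. It assumes a complete table of genotype-by-environment cell means, so the within-cell residual \( {\varepsilon}_{ij} \) is not estimated separately; the function name `ammi` and the data values are hypothetical and chosen only for illustration.

```python
import numpy as np

def ammi(y, n_axes):
    """Sketch of an AMMI fit on a genotype x environment table of cell means."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()                      # grand mean
    alpha = y.mean(axis=1) - mu        # genotype main effects (deviations from mu)
    beta = y.mean(axis=0) - mu         # environment main effects (deviations from mu)
    # Double-centred interaction matrix left over after the additive part
    interaction = y - mu - alpha[:, None] - beta[None, :]
    # SVD supplies the singular values and the genotype/environment IPC scores
    u, s, vt = np.linalg.svd(interaction, full_matrices=False)
    xi, lam, eta = u[:, :n_axes], s[:n_axes], vt[:n_axes].T
    # Interaction residual: multiplicative terms not retained in the model
    rho = interaction - (xi * lam) @ eta.T
    return mu, alpha, beta, lam, xi, eta, rho

# Hypothetical means for 4 genotypes (rows) in 3 environments (columns)
y = np.array([[10.0, 12.0, 15.0],
              [11.0, 11.0, 13.0],
              [9.0, 14.0, 12.0],
              [12.0, 10.0, 16.0]])
mu, alpha, beta, lam, xi, eta, rho = ammi(y, n_axes=1)
fitted = mu + alpha[:, None] + beta[None, :] + (xi * lam) @ eta.T
```

Adding `rho` back to `fitted` recovers the original table, which is a quick check that the additive and multiplicative parts together account for every cell mean.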

The GGE methodology is also based on ANOVA and PCA, using a sites regression model (SREG) with two principal components. The linear model used in GGE analysis is as follows (Yan and Hunt 2001):

$$ {y}_{ij}-{\beta}_j=\sum_{k=1}^2{\lambda}_k{\xi}_{ik}{\eta}_{j k}+{\varepsilon}_{ij} $$

where \( {y}_{ij} \) is the average performance of genotype i in environment j, \( {\beta}_j \) is the average performance of all genotypes planted in environment j, \( {\lambda}_k \) is the singular value for principal component k, \( {\xi}_{ik} \) and \( {\eta}_{jk} \) are the scores for genotype i and environment j for principal component k, respectively, and \( {\varepsilon}_{ij} \) is the residual associated with genotype i and environment j. The GGE methodology removes the environmental main effects through ANOVA and retains the genotypic main effect (G) and interaction effect (G×E) in the environment-centred data (Yan et al. 2000). It allows direct visualisation of the performance and stability of genotypes across multiple environments through PCA. This model is recommended when the environments are the main source of variation in relation to the contributions of the genotypes and the G×E with respect to the total variability (Kandus et al. 2010). In the GGE biplot, the lines that connect the biplot origin and the markers for the environments are environment vectors. The angle between the vectors of two environments is related to the correlation coefficient between them. The cosine of the angle approximates the correlation coefficient (Yan 2002), which is equivalent to the type-B genetic correlation in the study carried out by Ding et al. (2008b). A limitation of the GGE biplot is that it may explain only a small proportion of the total GGE when the genotype main effect is considerably smaller than the interaction effects or when the G×E pattern is complex (Ding et al. 2008b). The GGE biplot approach is not amenable to rigorous hypothesis testing, so it is better as a hypothesis-generator rather than as a decision-maker (Ding et al. 2008b).
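The environment-centred decomposition and the angle interpretation can be sketched in NumPy as follows. This is an illustration only: the function names and data are hypothetical, and the sketch assumes an environment-focused scaling in which the singular values are assigned entirely to the environment scores, which is one of several scalings used for biplots.

```python
import numpy as np

def gge_biplot_scores(y):
    """Sketch of GGE scores from a genotype x environment table of means."""
    y = np.asarray(y, dtype=float)
    # Remove the environment main effects (column means), keeping G + GxE
    centred = y - y.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    lam = s[:2]                         # first two singular values
    geno = u[:, :2]                     # genotype scores
    env = vt[:2].T * lam                # environment vectors (assumed scaling)
    # Share of the G + GxE sum of squares displayed by the two axes
    explained = float((lam ** 2).sum() / (s ** 2).sum())
    return geno, env, explained

def cosine_between(env, j, k):
    """Cosine of the angle between the vectors of environments j and k."""
    a, b = env[j], env[k]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical means for 4 genotypes (rows) in 3 environments (columns)
y = np.array([[10.0, 12.0, 15.0],
              [11.0, 11.0, 13.0],
              [9.0, 14.0, 12.0],
              [12.0, 10.0, 16.0]])
geno, env, explained = gge_biplot_scores(y)
cos_01 = cosine_between(env, 0, 1)
```

When `explained` is close to 1, `cos_01` is close to the correlation between the genotype means at the two environments; when it is low, the biplot angles are a poor guide, which matches the limitation noted by Ding et al. (2008b).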

Both the AMMI and GGE methods use linear models and treat the main and interaction effects as fixed effects (Crossa 2012). The residuals are assumed to be normally distributed, homogeneous in variance and independent across environments (Gauch 1992; Kang and Gauch 1996; Piepho 1995). A biplot procedure is used to provide a graphical interpretation of results through plotting PCA scores of interaction effects for each genotype and environment (Crossa 1990; Crossa et al. 1990; Kempton 1984; Yan et al. 2000). The difference between them is that GGE biplot analysis is based on environment-centred PCA, whereas AMMI analysis is based on double-centred PCA (Kroonenberg 1995; Rad et al. 2013). The GGE biplot offers many visual interpretations that AMMI does not when presenting 'which-won-where' patterns, particularly when visualising any crossover G×E (Ding et al. 2008b).

The AMMI analysis methodology has been extensively applied in the statistical analysis of multi-environment cultivar trials in crop breeding (Annicchiarico 1997; Crossa et al. 1990, 1999; Table 1; Gauch and Zobel 1989; Gauch and Zobel 1997; Hassanpanah 2010). AMMI has also been used for characterising G×E in forest trees, e.g. in Eucalyptus (Baril et al. 1997a; Karuntimi 2012; Lavoranti et al. 2007), pines (Pinus spp.; Chambel et al. 2008; Falkenhagen 1996; Kim et al. 2008), poplars (Populus spp.; Rae et al. 2008), white spruce (Picea glauca (Moench) Voss; Rweyongeza 2011), birch (Betula spp.; Zhao et al. 2014) and lodgepole pine (Pinus contorta Douglas; Wu and Ying 2001).

Table 1 Examples of evidence of G×E in various forest tree studies in the literature

The GGE method has been used to investigate G×E in many agronomic trials (Yan et al. 2000, 2007) and also in tree breeding (Correia et al. 2010; Ding et al. 2008b; Sixto et al. 2011). Correia et al. (2010) used GGE biplot analysis to study G×E for total height, diameter, stem form and survival of 30 maritime pine (Pinus pinaster Aiton) populations sourced from Portugal, Spain, France and Australia, in a multiple environment provenance trial in Spain. The GGE biplot analyses were used to analyse the biomass production and stability of poplar clones, which were ranked according to mean performance and stability in southwest Europe (Sixto et al. 2011). Ding et al. (2008b) used GGE biplot to investigate G×E of 216 radiata pine families across five environments in Australia. The biplot analysis allowed different groups of clones to be identified according to their performance and degree of interaction displayed, thus providing useful information for the selection process in radiata pine.

Factor analytic models

Factor analytic (FA) models can provide a reliable, parsimonious and holistic approach for estimation of genetic correlations between all pairs of trials (Cullis et al. 2014; Smith et al. 2015) and provide a natural framework for modelling G×E patterns in complex multi-environment experiments (Meyer 2009). The FA model is the most useful for making decisions of selection for breeding populations and decisions of deployment for production populations.

The use of FA models in multi-environment trials is based on the use of eigenvectors from PCA (Jolliffe 1986; Smith et al. 2001) and extended to accommodate both additive and non-additive effects (Oakey et al. 2006a, b). Cullis et al. (2014) used FA models to accommodate a large number of environments and poor connectivity between environments with a reduced animal model. The FA models aim to identify the statistical common factors that give rise to correlations between variables (Mrode 2014). They represent traits assessed under multiple environments as linear combinations of a few latent variables (Cullis et al. 2014; Smith et al. 2001), referred to as common factors (Hardner et al. 2010; Meyer 2009), thereby reducing the dimensionality of the among-sites variation with respect to G×E. The number of factors is called the order of the model and an FA model of order k is denoted as FAk. Assuming a linear mixed model

$$ \mathbf{y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zu}+\boldsymbol{e} $$

where y is the vector of observations for t sites; β is the vector of fixed effects; u is the vector of random additive genetic effects with u ∼ N(0, G ⊗ A), where \( \boldsymbol{G}=\left[\begin{array}{ccc}{\sigma}_{a_1}^2& \cdots & {\sigma}_{a_1{a}_t}\\ {}\vdots & \ddots & \vdots \\ {}{\sigma}_{a_t{a}_1}& \cdots & {\sigma}_{a_t}^2\end{array}\right] \), \( {\sigma}_{a_i}^2 \) is the additive genetic variance for site i, \( {\sigma}_{a_i{a}_j} \) is the additive genetic covariance between site i and site j, A is the numerator relationship matrix and ⊗ denotes the Kronecker product; e is the vector of random residual effects with \( \boldsymbol{e}\sim N\left(0,\left[\begin{array}{ccc}{\sigma}_{e_1}^2& \cdots & 0\\ {}\vdots & \ddots & \vdots \\ {}0& \cdots & {\sigma}_{e_t}^2\end{array}\right]\right) \), where \( {\sigma}_{e_i}^2 \) is the residual variance for site i; X and Z are the design matrices relating the phenotypes to β and u; and t is the number of trials. The FAk model for the additive genetic effects of m genotypes in t trials can be written as \( \boldsymbol{u}=\left(\boldsymbol{\varLambda}\otimes {\boldsymbol{I}}_m\right)\boldsymbol{f}+\boldsymbol{\delta} \) (Costa e Silva et al. 2006; Cullis et al. 2014), where Λ is the t × k matrix of trial loadings, f is the mk × 1 vector of scores and δ is the mt × 1 vector of genetic regression residuals. The vectors of random effects f and δ are assumed to be mutually independent and to follow multivariate Gaussian distributions with zero means, with \( \operatorname{var}\left(\boldsymbol{f}\right)={\boldsymbol{I}}_{mk} \) and \( \operatorname{var}\left(\boldsymbol{\delta}\right)=\boldsymbol{\psi}\otimes {\boldsymbol{I}}_m \), where ψ is a t × t diagonal matrix with a variance (called a specific variance) for each environment. It follows that \( \operatorname{var}\left(\boldsymbol{u}\right)=\left(\boldsymbol{\varLambda}{\boldsymbol{\varLambda}}^{\prime }+\boldsymbol{\psi}\right)\otimes {\boldsymbol{I}}_m \). The between-environment genetic variance matrix is defined as \( {\boldsymbol{G}}_{\boldsymbol{e}}=\boldsymbol{\varLambda}{\boldsymbol{\varLambda}}^{\prime }+\boldsymbol{\psi} \). \( {\boldsymbol{G}}_{\boldsymbol{e}} \) can be estimated with REML algorithms as \( {\widehat{\boldsymbol{G}}}_{\boldsymbol{e}}=\left(\widehat{\boldsymbol{\varLambda}}{\widehat{\boldsymbol{\varLambda}}}^{\boldsymbol{\prime}}+\widehat{\boldsymbol{\psi}}\right) \) and can be converted to a correlation matrix \( {\widehat{\boldsymbol{C}}}_{\boldsymbol{e}}={\widehat{\boldsymbol{D}}}_{\boldsymbol{e}}{\widehat{\boldsymbol{G}}}_{\boldsymbol{e}}{\widehat{\boldsymbol{D}}}_{\boldsymbol{e}} \), where \( {\widehat{\boldsymbol{D}}}_{\boldsymbol{e}} \) is a diagonal matrix with elements given by the inverse of the square roots of the diagonal elements of \( {\widehat{\boldsymbol{G}}}_{\boldsymbol{e}} \). The FA models outlined above are equivalent to the extended factor analytic models specified by Meyer (2009).
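The step from estimated loadings and specific variances to the between-environment genetic covariance and correlation matrices can be sketched in a few lines of NumPy. The loadings and specific variances below are hypothetical values for an FA1 model with three trials, not estimates from any study, and the function name is an assumption made for illustration; in practice these quantities would come from a REML fit.

```python
import numpy as np

def fa_genetic_matrices(loadings, specific_var):
    """Build G_e = Lambda Lambda' + psi and its correlation form C_e = D G_e D."""
    lam = np.asarray(loadings, dtype=float)             # t x k trial loadings
    psi = np.diag(np.asarray(specific_var, dtype=float))  # t x t specific variances
    g_e = lam @ lam.T + psi
    # D_e holds the inverse square roots of the diagonal of G_e
    d_e = np.diag(1.0 / np.sqrt(np.diag(g_e)))
    c_e = d_e @ g_e @ d_e
    return g_e, c_e

# Hypothetical FA1 loadings and specific variances for three trials
loadings = [[0.9], [0.6], [-0.2]]
specific_var = [0.10, 0.30, 0.50]
g_e, c_e = fa_genetic_matrices(loadings, specific_var)
```

With these made-up inputs the third trial has a loading of opposite sign, so its genetic correlations with the other two trials are negative, the pattern that corresponds to rank-change interaction.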

Latent regression plots were used to show genetic responses to trial loadings, indicating the magnitude of G×E (or stability) of selection candidates across multiple environments in the FA models (Chen et al. 2017; Cullis et al. 2014; Smith et al. 2015; Table 1). A latent regression of a selection candidate with a higher slope means that the candidate is more sensitive to the environment. A latent regression with a zero slope indicates that the performance of the candidate is stable across multiple environments.

This approach has been applied to investigate G×E across multiple environments in barley (Smith et al. 2001), potato (Burgueño et al. 2011), maize (Burgueño et al. 2011) and wheat (Burgueño et al. 2011; Oakey et al. 2006a, b), and also in livestock breeding (Meyer 2009). In tree breeding, this approach has been successfully applied to the studies of G×E in Khasi pine (Pinus kesiya Royle ex Gordon; Costa e Silva 2007), radiata pine (Cullis et al. 2014; Ivković et al. 2015), loblolly pine (Zapata-Valenzuela 2012), Eucalyptus and Eucalyptus hybrid clones (Costa e Silva et al. 2006; Hardner et al. 2010) and Norway spruce (Picea abies (L.) H. Karst.; Chen et al. 2017). Despite its statistical power, the FA approach has not yet delivered in identifying the roles of specific environmental factors in driving G×E (B. Cullis, 2016, personal communication).

Reaction norm

Reaction norm, also called a norm of reaction, describes a range of responses or phenotypes produced by a single genotype across a range of environments (Lynch and Walsh 1998; Pierce 2005; Woltereck 1909). It is suitable for analysing data on traits that vary gradually and continuously over an environmental gradient (e.g. temperature) (Kolmodin and Bijma 2004). A linear reaction norm for a single trait has a model (Kolmodin and Bijma 2004; Strandberg et al. 2000):

$$ {\boldsymbol{y}}_{\boldsymbol{i}\boldsymbol{k}}={\boldsymbol{b}}_0+{\boldsymbol{b}}_1{\boldsymbol{x}}_{\boldsymbol{k}}+{\boldsymbol{a}}_{0_{\boldsymbol{i}}}+{\boldsymbol{a}}_{1_{\boldsymbol{i}}}{\boldsymbol{x}}_{\boldsymbol{k}}+{\boldsymbol{e}}_{0_{\boldsymbol{i}}}+{\boldsymbol{e}}_{1_{\boldsymbol{i}}}{\boldsymbol{x}}_{\boldsymbol{k}} $$

where \( {\boldsymbol{y}}_{\boldsymbol{i}\boldsymbol{k}} \) is the phenotypic value of genotype i in environment k; \( {\boldsymbol{b}}_0 \) and \( {\boldsymbol{b}}_1 \) are the fixed effects of the intercept and slope of the population reaction norm, respectively; \( {\boldsymbol{a}}_{0_{\boldsymbol{i}}} \) and \( {\boldsymbol{a}}_{1_{\boldsymbol{i}}} \) are the random additive genetic effects of the intercept (corresponding to the classical EBV for performance potential) and the slope (equivalent to the EBV for environmental sensitivity) of the reaction norm for genotype i, respectively; \( {\boldsymbol{e}}_{0_{\boldsymbol{i}}} \) and \( {\boldsymbol{e}}_{1_{\boldsymbol{i}}} \) are the random residual effects of the intercept and slope of the reaction norm for genotype i; and \( {\boldsymbol{x}}_{\boldsymbol{k}} \) is the effect of environment k on the phenotype. The random additive genetic effects \( {\boldsymbol{a}}_0 \) and \( {\boldsymbol{a}}_1 \) are assumed to be normally distributed with expectation zero and variances \( {\boldsymbol{\sigma}}_{{\boldsymbol{a}}_0}^2 \) and \( {\boldsymbol{\sigma}}_{{\boldsymbol{a}}_1}^2 \), respectively, and covariance \( {\boldsymbol{\sigma}}_{{\boldsymbol{a}}_0{\boldsymbol{a}}_1}. \)

The advantage of the reaction norm is that selection response can be predicted not only in the phenotypic expression in any environment but also in the environmental sensitivity of the trait, quantified through the slope of a linear reaction norm (robustness or responsiveness to changes in the environment) (Kolmodin and Bijma 2004). Disease exposure, stocking density and nutrient quality are among the environmental factors that can be considered when conducting reaction norm analysis of livestock production (Rauw and Gomez-Raya 2015). Similarly, environmental factors related to forest tree breeding, such as temperature, rainfall and soil nutrition, can be applied to reaction norm analysis to identify the drivers of G×E in forest tree breeding. The limitations of the reaction norm are that any environmental variables included in the analysis need to be clearly identified and the G×E patterns identified are valid only within the range of the modelled environmental conditions (Gregorius and Kleinschmit 2001). The stability of a given genotype in the reaction norm model can be visualised by plotting observed phenotypic values against one or more environmental variables. A flat slope of the reaction norm means that the phenotype produced over the entire range of environments is constant. Any divergence between a genotype’s reaction norm and that of the population as a whole constitutes a component of G×E. For instance, a particularly steep reaction norm slope for a given genotype means that its phenotype is more sensitive to environments, and the genotype will contribute strongly to G×E.
The level of G×E can also be expressed as the ratio of the variance of the additive genetic slope (\( {\boldsymbol{\sigma}}_{{\boldsymbol{a}}_1}^2 \)) to that of the additive genetic intercept (\( {\boldsymbol{\sigma}}_{{\boldsymbol{a}}_0}^2 \)) (Kolmodin and Bijma 2004) or by the genetic correlation (\( {\boldsymbol{r}}_{\mathbf{g}} \)) between the additive genetic slope and intercept, with \( {\boldsymbol{r}}_{\mathbf{g}}=\frac{{\boldsymbol{\sigma}}_{{\boldsymbol{a}}_0{\boldsymbol{a}}_1}}{\sqrt{{\boldsymbol{\sigma}}_{{\boldsymbol{a}}_0}^2{\boldsymbol{\sigma}}_{{\boldsymbol{a}}_1}^2}} \) (Strandberg et al. 2000).
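Both summaries follow directly from the three (co)variance components of the linear reaction norm. The sketch below also derives the genetic correlation between two points on the environmental gradient, using the fact that the additive genetic value in environment x is \( a_0 + a_1 x \); this last quantity is an algebraic consequence of the model rather than a statistic named in the text. The variance components and function names are hypothetical.

```python
import numpy as np

def reaction_norm_summary(var_a0, var_a1, cov_a0a1):
    """Slope-to-intercept variance ratio and intercept-slope genetic correlation."""
    ratio = var_a1 / var_a0
    r_g = cov_a0a1 / np.sqrt(var_a0 * var_a1)
    return ratio, r_g

def genetic_cov(var_a0, var_a1, cov_a0a1, x1, x2):
    """Cov(a0 + a1*x1, a0 + a1*x2) under the linear reaction norm."""
    return var_a0 + (x1 + x2) * cov_a0a1 + x1 * x2 * var_a1

def genetic_corr(var_a0, var_a1, cov_a0a1, x1, x2):
    """Genetic correlation between environments with gradient values x1 and x2."""
    c12 = genetic_cov(var_a0, var_a1, cov_a0a1, x1, x2)
    v1 = genetic_cov(var_a0, var_a1, cov_a0a1, x1, x1)
    v2 = genetic_cov(var_a0, var_a1, cov_a0a1, x2, x2)
    return c12 / np.sqrt(v1 * v2)

# Hypothetical variance components (not taken from any study)
var_a0, var_a1, cov_a0a1 = 4.0, 1.0, 1.0
ratio, r_g = reaction_norm_summary(var_a0, var_a1, cov_a0a1)
corr_ends = genetic_corr(var_a0, var_a1, cov_a0a1, -1.0, 1.0)
```

With these values the slope variance is a quarter of the intercept variance, and the genetic correlation between the two ends of the gradient falls well below 1, showing how slope variance translates into G×E between contrasting environments.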

The reaction norm approach has been applied in forest trees (Gregorius and Kleinschmit 2001), including species that are widely distributed with clinal ranges, notably Norway spruce (Oleksyn et al. 1998) and Scots pine (Abraitiene et al. 2002) in Sweden, and lodgepole pine (Rehfeldt et al. 1999; Wang et al. 2006) and its hybrids (Wu and Ying 2001) in Canada. Oleksyn et al. (1998) examined plant growth, partitioning, net CO2 exchange rate, tissue chemistry and phenology of 54 Norway spruce populations to quantify differences in growth and associated plant traits among populations from altitudinal gradients and better understand their relationships. This study showed that Norway spruce populations from cold mountain environments can be characterised by several potential adaptive features, such as mean annual temperature and altitude. Abraitiene et al. (2002) studied genetic variation of pollen viability and susceptibility to ozone in Scots pine. Significant genetic variation in susceptibility of pollen to increased ozone concentration was found, but only 5% of variation was attributed to G×E.

A reaction norm concept was used to derive response functions for incorporating climate variables into analytic models, and considerably improved their reliability in lodgepole pine populations (Wang et al. 2006) and structure of the specific combining ability between two species of Eucalyptus (Baril et al. 1997b). Response functions predicted that small changes in climate greatly affected growth and survival of forest populations and that maintaining contemporary forest productivities during global warming requires a wholesale redistribution of genotypes across the landscape (Rehfeldt et al. 1999). Significant population by site interactions among 10 natural lodgepole pine populations sampled from three lodgepole pine subspecies (Pinus contorta ssp. contorta, ssp. latifolia and ssp. murrayana) were found for 20-year heights measured in 57 provenance test sites across interior British Columbia (Wu and Ying 2001).

Evidence of G×E in forest trees

In forest trees, G×E has been studied in various economically important species, notably in New Zealand, Australia, USA, Europe, Asia and Africa (Table 1). Most studies investigated G×E for growth and form traits with some investigation of G×E for wood density traits and wood property traits. Table 1 presents examples of evidence of G×E in various forest tree studies in the literature. The criteria to measure the magnitude of G×E in forest tree breeding were proposed in two studies. Robertson (1959) suggested as a guideline that a genetic correlation of 0.8 or higher could be interpreted as G×E with less biological importance. Shelbourne (1972) suggested that interactions have a serious effect on genetic gains from selection and testing when the interaction variance reaches 50% or more of the genetic variance.
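The two guidelines can be placed on a common scale under a simplifying assumption that is not made explicit above: when genetic variances are homogeneous across sites, the type-B genetic correlation is commonly written as \( r_B={\sigma}_g^2/\left({\sigma}_g^2+{\sigma}_{ge}^2\right) \), so that it depends only on the ratio of interaction to genetic variance. The short sketch below uses hypothetical function names to convert between the two.

```python
def type_b_from_ratio(ratio):
    """Type-B correlation implied by a ratio of interaction to genetic variance.

    Assumes r_B = var_g / (var_g + var_ge), i.e. homogeneous genetic variances.
    """
    return 1.0 / (1.0 + ratio)

def ratio_from_type_b(r_b):
    """Ratio of interaction to genetic variance implied by a type-B correlation."""
    return (1.0 - r_b) / r_b

# Shelbourne's threshold: interaction variance equal to 50% of genetic variance
r_b_at_shelbourne = type_b_from_ratio(0.5)
# Robertson's guideline: genetic correlation of 0.8
ratio_at_robertson = ratio_from_type_b(0.8)
```

Under this assumption, Shelbourne's 50% threshold corresponds to a type-B correlation of about 0.67, and Robertson's 0.8 corresponds to an interaction variance equal to 25% of the genetic variance, so the two guidelines are of similar order but not identical.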

High G×E is normally found for tree growth (e.g. radiata pine; Carson 1991; Johnson and Burdon 1990; Table 1; Wu and Matheson 2005). The ratio of estimated interaction to genetic variance for diameter-at-breast-height (DBH) was above 0.50 in a diallel experiment covering 10 sites in Australia (Wu and Matheson 2005). The genetic correlation estimate between pairs of environments for DBH was 0.39 (Wu and Matheson 2005), 0.34–0.38 (Ding et al. 2008a) and −0.60 to 1.0 (Raymond 2011). Type-B genetic correlations were 0.27–0.84 for volume (Dieters and Huber 2007; Li and McKeand 1989; Roth et al. 2007; Sierra-Lucero et al. 2003), 0.18–0.95 for height (Gwaze et al. 2001; Li and McKeand 1989; Owino et al. 1977; Paul et al. 1997) and 0.27 for mean annual increment for volume per hectare (Sierra-Lucero et al. 2003). Baltunis and Brawner (2010) reported high G×E for growth traits in clonal trials among Australian sites but not among New Zealand sites. Nearly two-thirds of genetic correlation estimates for DBH between paired sites were below 0.6 in an analysis using data covering 76 sites across the whole of New Zealand (McDonald and Apiolaza 2009). Ivković et al. (2015) reported that pairwise genetic correlations for DBH among 20 control-pollinated trials ranged from −0.51 to 0.98, 48% of which were significantly different from the perfect genetic correlation of 1 with an average of 0.35. Estimates of type-B genetic correlations between trials increased with age, indicating that the importance of G×E appears to decline with age and early growth data may be unreliable for evaluating G×E at maturity (Dieters et al. 1995; Gwaze et al. 2001; Roth et al. 2007; Zas et al. 2003).

Tree form is often an important trait in tree breeding programmes. Traits such as stem straightness and branching confer value in trees at rotation age (Cown et al. 1984; Ivković et al. 2006). The extent of G×E in these traits varies much more among studies. Low levels of G×E have been reported in most studies for stem straightness (Carson 1991; Gapare et al. 2012b; Johnson and Burdon 1990; Pederick 1990), branch angle (Gapare et al. 2012b), branch size (Gapare et al. 2012b; Pederick 1990), branch habit (Carson 1991; Johnson and Burdon 1990) and malformation (Johnson and Burdon 1990). This contrasts with some evidence of G×E in form traits for branch size, numbers of forks and ramicorn branches (Wu and Matheson 2005) and for stem straightness and branch quality among some sites across Australia and New Zealand (Baltunis and Brawner 2010). Similarly, Gwaze et al. (2001) reported high levels of G×E for stem straightness in Zimbabwe and Suontama et al. (2015) reported high levels of G×E for branching in New Zealand. Contrasting levels of G×E were found in two New Zealand studies: Dungey et al. (2012) reported a high level of provenance × site interaction for stem straightness but a low level of family-within-provenance × site interaction in Douglas-fir; Kennedy et al. (2011) found a high type-B genetic correlation (over 0.80) for straightness and form score and a low type-B genetic correlation (0.49) for malformation.

Traits associated with wood structure and quality generally appear to have low G×E in conifers, e.g. wood basic density (Apiolaza 2012; Baltunis et al. 2010; Gapare et al. 2010, 2012a; Johnson and Gartner 2006; Muneri and Raymond 2000), acoustic velocity or modulus of elasticity (Dungey et al. 2012; Gapare et al. 2012a; Jayawickrama et al. 2011; Johnson and Gartner 2006), wood chemical properties (Sykes et al. 2006), wood specific gravity (Jett et al. 1991) and resin canal traits (Westbrook et al. 2014). Osorio et al. (2001) reported minimal G×E for wood density in Eucalyptus grandis in Colombia. Eucalypt hybrid clones (Eucalyptus grandis × Eucalyptus urophylla) in Brazil were, however, found to have significant G×E for wood basic density across four sites (Lima et al. 2000).

Statistically significant type-B genetic correlations may not reflect the true magnitude of G×E. Good experimental design is essential at all stages of testing when estimating genetic parameters. The lack of randomization for the seedling population apparently resulted in a problem with partitioning of the genetic variance, causing among-family variance to be inflated (Baltunis et al. 2007). Poor (i.e. very imprecise) estimates of genetic correlations between environments are often related to a limited number of parents in common between environments (Apiolaza 2012; Raymond 2011). A propagation effect may be related to the season when cuttings are rooted. The worst genetic correlations were observed between the trial established with rooted cuttings from the winter setting and any of the other trials, while the best genetic correlations were obtained from the field trials that included rooted cuttings originating from the spring settings (Baltunis et al. 2005, 2007).

Impacts of G×E in tree breeding

The overall impact that G×E has for tree breeders is to complicate breeding programme design. The efficiency of selecting for a trait in one environment in pursuing genetic gain in another environment is proportional to the genetic correlation between the two environments and the overall heritability of the trait across the two environments (Falconer and Mackay 1996). Overall heritability across environments is calculated as \( h^2=\frac{\sigma_a^2}{\sigma_a^2+\sigma_{ge}^2+\sigma_e^2} \), where \( \sigma_a^2 \) is the additive genetic variance, \( \sigma_{ge}^2 \) is the interaction variance and \( \sigma_e^2 \) is the residual variance. High G×E, therefore, reduces overall heritability across sites in two ways. Firstly, large G×E variance in the denominator directly reduces the overall heritability. Secondly, G×E also compromises the estimation of genetic variance across multiple environments, further reducing the size of the heritability estimate. For example, heritability of mean annual increment in volume was 0.08 in an analysis across seven sites, whereas it was over 0.20 within each of two regions (Sierra-Lucero et al. 2003). G×E can also inflate a heritability estimate obtained from a single environment. When G×E is present and the estimates from a single-location test are used for general genetic prediction, the heritability estimate is inflated because part of the G×E variance is partitioned into the additive genetic variance. For example, the additive G×E was found to be large enough to cause upward biases on heritability estimates and genetic gain predictions of up to 60–100% (Owino et al. 1977). Family-mean heritability was over-estimated by about 15% when estimated from a single site, compared with that estimated from an across-site analysis, even though the type-B genetic correlation for acoustic velocity was 0.85 among sites.
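The two directions of bias described above can be illustrated numerically. The following is a minimal sketch using made-up variance components (not taken from any cited study). It assumes, as a simplification, that a single-site analysis confounds all of the G×E variance with the additive genetic variance; the function names are hypothetical.

```python
def overall_heritability(var_a, var_ge, var_e):
    """Across-site heritability: var_a / (var_a + var_ge + var_e)."""
    return var_a / (var_a + var_ge + var_e)


def single_site_heritability(var_a, var_ge, var_e):
    """Apparent heritability at one site, ASSUMING the G×E variance is
    fully confounded with the additive variance (a simplification)."""
    return (var_a + var_ge) / (var_a + var_ge + var_e)


# Hypothetical variance components, for illustration only.
var_a, var_e = 2.0, 8.0
for var_ge in (0.0, 1.0, 2.0):
    h2_across = overall_heritability(var_a, var_ge, var_e)
    h2_single = single_site_heritability(var_a, var_ge, var_e)
    print(f"var_ge={var_ge:.1f}  across-site h2={h2_across:.3f}  "
          f"single-site h2={h2_single:.3f}")
```

As the interaction variance grows, the across-site value falls while the single-site value rises, which is the pattern of deflation and inflation the text describes.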

G×E can affect estimation of predicted genetic gain in scenarios for testing and selection as it reduces overall heritability or accuracy across environments. Sierra-Lucero et al. (2003) found that selecting families in region 1 for deployment in region 2 resulted in a 4–8% reduction in mean annual increment in volume per hectare. Loss of genetic gain reached more than 10% in a scenario ignoring G×E when compared with a scenario considering G×E when type-B genetic correlation between sites was less than 0.80 (Diaz Solar et al. 2011). Leksono (2009) reported that genetic gains resulting from direct selection were apparently greater than those resulting from indirect selection, with a decrease of 24–60% in genetic gains if breeding populations were transferred between breeding zones. Diameters at the northern-most site in Queensland, Australia, were poorly correlated with those at other sites (\( r_g = 0.39 \)), and if individual selection was based on this site for planting at any other sites, the estimated genetic gain was only 29–57% as efficient as a selection programme based on the plantation site (Woolaston et al. 1991).
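A rough feel for such losses can be obtained from the standard correlated-response result in quantitative genetics (Falconer and Mackay 1996): with equal selection intensities, the gain from indirect selection relative to direct selection is the genetic correlation multiplied by the ratio of the square roots of the two heritabilities. The sketch below is illustrative only; the function name and the heritability values are assumptions, and the cited studies may have used different calculations.

```python
import math


def relative_efficiency(r_g, h2_selection_site, h2_target_site):
    """Gain from selecting at one site, as a fraction of the gain from
    selecting directly at the target site (equal selection intensity)."""
    return r_g * math.sqrt(h2_selection_site / h2_target_site)


# r_g = 0.39 is the correlation quoted above; heritabilities are hypothetical.
for h2_sel, h2_tar in ((0.20, 0.20), (0.15, 0.25), (0.30, 0.20)):
    eff = relative_efficiency(0.39, h2_sel, h2_tar)
    print(f"h2_sel={h2_sel:.2f}  h2_target={h2_tar:.2f}  efficiency={eff:.2f}")
```

When the two heritabilities are equal, the relative efficiency reduces to the genetic correlation itself, so a correlation of 0.39 implies that roughly 61% of the attainable gain is forgone under this simplified formula.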

Xie (2003) found considerable G×E for height in interior spruce, a white spruce (Picea glauca (Moench) Voss) and Engelmann spruce (Picea engelmannii Parry ex Engelm.) complex (Xie and Yanchuk 2002), among five seed planning zones located in north-central interior British Columbia with an average between-sites genetic correlation of 0.64. The five seed planning zones were clustered into two new seed zones, with genetic correlations within the new seed zones of 0.97 and 0.84, respectively, and a genetic correlation between the two new zones of 0.41. When selecting the best 25% of tested parents within each zone, the expected genetic gain was 19% when considering the entire region as one zone, 24% when considering the five original zones and 26% when consolidating the five original zones into the two new zones. Despite the apparently modest enhancements of expected gains from regionalised breeding, failure to evaluate genotypes on sites where performance is poorly correlated with that elsewhere would inevitably sacrifice around half or more of the potential genetic gain on such sites.

Significant G×E does not always result in considerable loss of genetic gain. Jett et al. (1991) found significant G×E for wood specific gravity in loblolly pine. Four of 18 families were classified as unstable for the trait, accounting for half the observed G×E variance. However, this G×E had only a negligible effect on potential genetic gain. Dieters et al. (1996) reported that type-B genetic correlations were over 0.67 for fusiform rust resistance and G×E did not appear to be important in the rust resistance of slash pine in the USA. Carson (1991) found significant G×E for diameter in radiata pine but genetic gains predicted for several regionalisation options suggested the size of G×E in radiata pine in New Zealand appeared to be too small to warrant regionalised breeding populations.

A measure of loss of potential gain (C) has been used as a criterion to evaluate the impact of G×E on breeding programmes for family selection (Matheson and Raymond 1984, 1986) with

\( C=\left(1-{\left[\left(V_f+\frac{V_e}{NBS}\right)/\left(V_f+\frac{V_i}{S}+\frac{V_e}{NBS}\right)\right]}^{\frac{1}{2}}\right)\times 100 \),

where \( V_f \) is the phenotypic variance, \( V_i \) is the variance due to interaction and \( V_e \) is the error mean square; \( S \) is the number of sites, \( B \) is the number of replications at each site and \( N \) is the number of trees per plot. This assumes that the intensities of selection remain the same and that the residual and genetic components remain the same in the models including or not including the interaction term. Using the preceding relationships, a 2% loss of potential gain corresponds to an approximate reduction of 5% of the numerical value of heritability.
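The criterion can be evaluated directly once the variance components and the trial layout are known. The following is a minimal sketch of the formula as written above; the function name, argument names and the numerical inputs are hypothetical and chosen only to show how the loss shrinks as more sites are used.

```python
import math


def loss_of_potential_gain(v_f, v_i, v_e, n_sites, n_reps, n_trees):
    """Percentage loss of potential gain, C, from the formula above.

    v_f, v_i, v_e: the variance terms V_f, V_i and V_e
    n_sites, n_reps, n_trees: S, B and N
    """
    error_term = v_e / (n_trees * n_reps * n_sites)
    without_interaction = v_f + error_term
    with_interaction = v_f + v_i / n_sites + error_term
    return (1.0 - math.sqrt(without_interaction / with_interaction)) * 100.0


# Hypothetical variance components and trial layout.
for n_sites in (2, 4, 8):
    c = loss_of_potential_gain(v_f=1.0, v_i=0.5, v_e=10.0,
                               n_sites=n_sites, n_reps=5, n_trees=6)
    print(f"S={n_sites}: C={c:.2f}%")
```

With no interaction variance the criterion is zero, and for a fixed interaction variance it declines as the number of test sites increases, because the interaction term is divided by \( S \).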

Drivers of G×E in tree breeding

Identifying what environmental factors are the key drivers of G×E is important for both breeding and deployment purposes. It informs the choice of environments in which the candidate genotypes are to be tested and evaluated. It is also important for knowing what the performance of genotypes in a particular environment can tell us about their expected performance in other environments, which may have their attributes characterised, but not necessarily by empirical G×E data.

Various studies have been conducted to characterise the roles of environments in generating G×E in radiata pine. G×E has often been found to reflect differential stress responses among genotypes when an environmental factor is at either a sub-optimal or a strongly sub-optimal level for at least some genotypes (Kang 2002). Stress may be either biotic (diseases or pests) or abiotic (e.g. temperature, salinity and excess or deficiency of water or nutrients). For a given set of genotypes, the more diverse the environments are, the larger the magnitude of possible G×E (Li et al. 2015). G×E may result from different adaptability of subraces or individual genotypes to environmental conditions promoting water and light stresses during the summer and, to some extent, from differential susceptibility to the biotic factors observed (Costa e Silva et al. 2006).

Johnson and Burdon (1990) obtained excellent discrimination between two site categories in New Zealand, representing Northland clays (which are naturally phosphorus-deficient) and pumiceland sites, a pattern that parallels some earlier results in both Australia (Fielding and Brown 1961) and New Zealand (Burdon 1971; Burdon 1976). Wu and Matheson (2005) found that prior land use created some grouping of sites according to interactive behaviour. Since then, the study by Raymond (2011) of G×E for diameter growth in radiata pine in New South Wales found elevation to be the prime driver of G×E, with lesser roles for prior land use and geological parent material. Elevation, however, was associated with a suite of important bioclimatic factors, including rainfall, its seasonality and temperature variables. High-latitude provenances and sites with cool winters and dry summers, and high-elevation provenances and sites with high precipitation and short growing seasons, contributed the most to G×E for height and DBH in white spruce (Rweyongeza 2011).

Wu and Matheson (2005) reported that a large genotype by region interaction in radiata pine was attributed to the extensive snow loading at the two higher-elevation sites. G×E for DBH was found to be possibly driven by extreme maximum temperatures in an analysis of data collected from 76 radiata pine trials across the whole of New Zealand (McDonald and Apiolaza 2009) and by minimum temperature at both provenance and progeny levels (Gapare et al. 2015). A mean daily temperature below 3.2 °C in May and June explained 27.8% of the G×E, and it was moderately correlated with the first factor in the FA model, indicating that spring or autumn frost conditions could be a main driver of G×E in Norway spruce (Chen et al. 2017). Ivković et al. (2013a) reported that high rainfall and cold temperature explained 25% of G×E variance and that they are likely drivers of G×E in New Zealand, based on breeding values of DBH estimated by Cullis et al. (2014).

On the other hand, biotic factors such as foliage diseases, which tend to be strongly related to rainfall and its seasonality, are an obvious potential driver of G×E for stem diameter and volume growth (cf. Ades and Garnier-Géré 1997). G×E caused by differential exposure to Swiss needle cast in Douglas-fir is also a good example, as the needle disease is more prevalent in warmer temperatures and sites can have radically different disease loads and differential growth responses in the host tree population as a result (Dungey et al. 2012).

Li et al. (2015) reported that G×E levels for growth traits were significantly associated with site differences in soil nutrient levels of nitrogen and total phosphorus and mean annual temperature and that G×E levels for foliar calcium content and fascicle weight were significantly associated with site differences in soil levels of magnesium and potassium, respectively. At the broadest scale (across New Zealand and Australia), climatic variables such as temperature and rainfall were the most significant factors driving observed G×E; however, at a local regional scale, soils and topographical factors were of more significance (Ivković et al. 2013a).

Strategies for dealing with G×E in tree breeding programmes

Two main strategies have been proposed for dealing with the presence of strong G×E (Kang 2002; Raymond 2011): (1) select individuals that perform stably across sites; or (2) select individuals that are well suited to each individual environment to maximise genetic gain. The first strategy is applicable when no obvious source of the observed G×E can be found and the interactions are regarded as essentially ‘noise’. The strategy aims to select genotypes with broad adaptation which would be expected to yield dependably across a wide range of environments. Selection for stability is the approach that has been recommended in G×E studies for radiata pine in Australia (Ding 2008; Matheson and Raymond 1984), New Zealand (Carson 1991) and Spain (Codesido and Fernández-López 2009); for loblolly pine in the USA (Owino 1977; Paul et al. 1997); and for Norway spruce in Sweden (Bentzer et al. 1988). This approach aims to eliminate the genotypes that are the most interactive with environments. However, for this to succeed well, the breeder needs to use an appropriate set of test environments that exposes both the interactive and stable genotypes.

The second strategy is to exploit the interactions by analysing and interpreting genetic and environmental differences (Raymond 2011). This strategy would be expected to maximise heritability and genetic gain within each environment individually, and accordingly, this requires the creation of separate breeding populations and seed production with consequent problems of cost, management, recording and ancestry control (Barnes et al. 1984). This strategy might be impracticable when the number of environments is large and spans different climatic and geographic regions and different soil types. A practical adaptation of this strategy is to group similar environments into regions. Application of this strategy relies on the ability to identify which site or environmental factors are causing the interaction (Raymond 2011). Regionalisation is commonly used for Northern Hemisphere species growing in their natural range where a single environmental factor, usually related to temperature gradients, determines seedlot performance. As an example, for Scots pine in Sweden, breeding and seed zones have been established based on the latitudinal temperature gradient, and G×E within these zones for growth traits is very low (Hannrup et al. 2008; Raymond 2011).

Research results from a diallel mating design experiment in radiata pine seem to favour regionalisation of radiata pine breeding for DBH into two main regions in Australia: the high-elevation Tumut region in New South Wales, and regions of Victoria, South Australia and Western Australia (Wu and Matheson 2005). In New Zealand, however, Carson (1991) had reported that the magnitude of G×E in radiata pine appeared to be too small to warrant regionalised breeding populations. Matheson and Raymond (1984) proposed that a better solution to the G×E problem is not to attempt to regionalise the breeding but to omit families that seem to be particularly susceptible to environmental variation. Trade-offs need to be fully evaluated as these approaches have serious implications for operational breeding and a cost/benefit analysis may be needed. A regionalised breeding programme with separate breeding populations might have smaller breeding populations in each environment, lower selection intensities and therefore lower genetic gains than one national programme of the same size (Carson 1991). While extra gain might be available through regionalisation, the costs of land use, operation of multiple seed orchards, additional record keeping and expanded progeny testing would increase proportionally (Barnes et al. 1984; Carson 1991).

Towards applications and future research

The multi-trait context

We have reviewed statistical methodologies for studying G×E primarily in relation to the single-trait case. However, quantitative geneticists and breeders almost always have to consider G×E in the context of multiple traits for selection and deployment. Among the methodologies, the study of type-B genetic correlations (Burdon 1977) can readily be extended to multiple traits between multiple environments. With multiple traits, several more complicated manifestations of G×E can arise. Different traits may exhibit different patterns of G×E. Moreover, genetic and phenotypic between-trait (type-A) correlation matrices can differ between environments as another form of G×E. Correlations between different traits expressed in different environments (for which the term type-AB correlations is proposed) can differ between pairs of environments, in a complex manifestation of G×E, likely to be relevant for traits related to health and frost or drought tolerance.

The multi-trait case creates interplays between rank-change and level-of-expression G×E for both selection and deployment. As indicated above, level-of-expression interaction can generate rank-change interaction, some classic examples being the effects of disease expression which, by influencing tree growth or tree form, can certainly influence genotypic rankings for the latter traits, with obvious implications for stability of genotype performance. One such example involves provenance trials of radiata pine in New Zealand, where the Cambria provenance, which is less resistant to needle-cast diseases, gave much worse relative performance for growth on disease-prone sites (Burdon et al. 1997). Another involves provenance trials of coastal Douglas-fir in New Zealand, in which comparative growth performance was strongly affected by Swiss needle cast (caused by Phaeocryptopus gaeumannii), with the susceptible lowest-latitude native provenance doing comparatively better at a highest-latitude site (46° S), where disease risk was much lower than at the lower-latitude site (38° S) (Dungey et al. 2012). Such disease-related influences are of interest for both breeding and deployment.

Level-of-expression interaction

Level-of-expression G×E does not cause rank change of selection candidates and is generally less important for breeding (Muir et al. 1992), but it is important for deployment across multiple environments (Burdon et al. 2017). If there is no rank-change interaction, testing in one environment should be sufficient. If selection for deployment involves a range of environments and multi-trait selection, level-of-expression interaction may require deployment of quite different sets of genotypes to different environments without there necessarily being rank-change interaction. In this situation, evaluation in several environments may be needed to give good resolution of genetic differences for all the traits of interest. Classic examples of level-of-expression interaction can arise with disease resistance, in which resolution of genotypic differences can depend greatly on disease incidence (e.g. Dieters et al. 1996; Sohn and Goddard 1979). Typically, resolution tends to be best at moderate to severe levels of disease, especially among the most resistant genotypes. Nevertheless, there may be cases in which a disease is present at levels that are of no direct practical importance, but resolution of genetic differences in that environment may still be good. Resistance to that disease does not need to figure as a deployment criterion in such an environment, yet disease resistance expressed there may still be a valuable selection criterion for deployment in other environments, where the disease is troublesome but incidence is not highly heritable because of high levels of ‘noise’ variation.

Genomics will help tackle G×E

Genomic selection is being increasingly explored as a major tool for selection in forest tree breeding (Grattapaglia and Resende 2011; Isik et al. 2011, 2016; Lexer and Stölting 2012; Ratcliffe et al. 2015; Resende et al. 2011, 2012a, b). The major benefit of genomic selection is that selection can be undertaken well before the normal age of phenotyping—around age 8 in radiata pine. DNA can be extracted and genotyping undertaken on only a few needles, before the age of 6 months. This means that generation intervals for breeding can be greatly reduced and the expected genetic gains per unit of time can be increased (Grattapaglia and Resende 2011; Isik 2014).

G×E in genomic selection has been studied for several forest species. When G×E exists for a trait, observed SNP effects for the trait change across environments, and their association with the trait might be significant in one environment but not in others, as shown in radiata pine (Li et al. 2016). Significant QTL by environment interaction was found for QTLs associated with DBH, basic wood density, Kraft pulp yield, and wood chemical compounds of cellulose, klason lignin and extractives, and lignin syringyl to guaiacyl ratio using 663 individuals of one F2 family and three F1 families of Eucalyptus globulus planted in Tasmania, Victoria and Western Australia (Freeman et al. 2011, 2013). Accuracies of genomic selection, however, have been shown to decline drastically when a model was developed using a dataset of one population to predict phenotypes of another population in Eucalyptus (Resende et al. 2012a). In loblolly pine in the USA, G×E severely affects the transferability of models across breeding zones (Resende et al. 2012b). In interior spruce planted on the western coast of Canada, genomic selection accuracy for growth and wood attributes was higher in a multi-site model (where the G×E term was fitted) than in a single-site model when predicting phenotypes for different sites (El-Dien et al. 2015).

To characterise the roles of genotypes and environments in driving G×E in forest tree breeding, we may need to collect data on genotypes at the molecular level, assess an even broader range of phenotypes and collect climatic and geographical data together with edaphic data across all seasons and all years during which tested genotypes grow. Functional genomics (Pevsner 2009), genome sequence annotation (Ouzounis and Karp 2002; Wolf et al. 2001) and high-resolution phenotyping (Crowell et al. 2016) can be useful for characterising the roles of genotypes and environments in driving G×E. Developing environment-specific genomic breeding values is the next challenge to maximise genetic gain in multiple environments when using genomic selection in forest tree breeding. Among the analytical methodologies mentioned above, factor analytic models (Cullis et al. 2014; Smith et al. 2015) are a parsimonious and holistic approach for estimating genetic correlations between all pairs of environments and can eventually play an important role in developing environment-specific genomic breeding values for a large number of environments.

Overview

Four statistical methods (AMMI, GGE biplot analysis, FA and reaction norm) are reviewed in this paper. Table 2 summarises the strengths and weaknesses of these methods. They are tools for identifying the patterns and magnitude of G×E in forest tree breeding. The first three methods infer groupings of environments and genotypes, based purely on phenotypic data and using PCA to reduce dimensionality. AMMI and GGE biplot analyses test the significance of G×E and the size of G×E variance relative to genetic variance, and allow visualisation of stability and of the environments where genotypes perform best. These methods are most suitable for making decisions around the deployment of genotypes across multiple environments. They would certainly be useful in the visualisation of forest data to help make informed decisions for deployment.
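The additive step that precedes the multiplicative (PCA) step in AMMI can be sketched in a few lines. The example below is a simplified illustration, not a full AMMI analysis: it assumes a balanced genotype-by-environment table of cell means, uses invented data, and stops at the interaction residuals and the sums of squares. The function names are hypothetical.

```python
def additive_fit(table):
    """Split a genotype-by-environment table of means into a grand mean,
    genotype effects, environment effects and interaction residuals."""
    n_gen, n_env = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_gen * n_env)
    gen_eff = [sum(row) / n_env - grand for row in table]
    env_eff = [sum(row[j] for row in table) / n_gen - grand
               for j in range(n_env)]
    resid = [[table[i][j] - grand - gen_eff[i] - env_eff[j]
              for j in range(n_env)] for i in range(n_gen)]
    return grand, gen_eff, env_eff, resid


def sums_of_squares(table):
    """Sums of squares for genotype, environment and G×E (balanced table)."""
    _, gen_eff, env_eff, resid = additive_fit(table)
    n_gen, n_env = len(table), len(table[0])
    ss_g = n_env * sum(g * g for g in gen_eff)
    ss_e = n_gen * sum(e * e for e in env_eff)
    ss_ge = sum(r * r for row in resid for r in row)
    return ss_g, ss_e, ss_ge


# Hypothetical mean DBH (cm): rows are genotypes, columns are sites.
means = [
    [20.0, 24.0, 18.0],
    [22.0, 23.0, 17.0],
    [19.0, 26.0, 21.0],
]
ss_g, ss_e, ss_ge = sums_of_squares(means)
print(f"SS genotype={ss_g:.2f}  SS environment={ss_e:.2f}  SS GxE={ss_ge:.2f}")
```

A full analysis would then apply a singular value decomposition to the residual matrix (for example with `numpy.linalg.svd`) and plot the leading components as a biplot.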

Table 2 Summary of utility of G×E methodologies: the additive main effects and multiplicative interaction (AMMI), the genotype main effects and G×E effects (GGE), factor analytic model (FA) and reaction norm

The FA model has the ability to approximate the unstructured genetic variance-covariance matrix for a large number of environments without the use of an excessive number of variance parameters. This type of model can be easily used to explain the nature and the extent of G×E. Type-B genetic correlations estimated from FA models can clearly show if there are rank changes of genotypes among environments. FA models are statistically efficient for breeding value estimation and provide a reliable, holistic approach to estimating genetic correlations between all pairs of environments (Cullis et al. 2014). FA models are most useful for making selection decisions for breeding populations and deployment decisions for production populations. The reaction norm approach uses a combination of phenotypic and environmental data to make inferences on the environmental drivers of G×E. It is suitable for analysing traits that vary gradually and continuously over an environmental gradient and requires the environmental variables included to be clearly defined. Stability analysis has intuitive appeal in forest tree breeding, helping forest growers identify stable genotypes and reduce long-term risks.
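The parsimony argument can be made concrete. In the usual formulation of a factor analytic model with \( k \) factors, the between-environment genetic covariance matrix is written as \( \Lambda \Lambda^{T} + \Psi \), where \( \Lambda \) holds the environment loadings and \( \Psi \) is a diagonal matrix of environment-specific variances. The sketch below assumes that formulation; the loadings and specific variances are invented for illustration, and the function names are hypothetical.

```python
import math


def fa_covariance(loadings, specific_var):
    """Genetic covariance between environments: Lambda Lambda^T + Psi."""
    p, k = len(loadings), len(loadings[0])
    cov = [[sum(loadings[i][f] * loadings[j][f] for f in range(k))
            for j in range(p)] for i in range(p)]
    for i in range(p):
        cov[i][i] += specific_var[i]
    return cov


def to_correlation(cov):
    """Convert a covariance matrix into between-environment correlations."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
             for j in range(p)] for i in range(p)]


def n_params_unstructured(p):
    return p * (p + 1) // 2


def n_params_fa(p, k):
    # p*k loadings + p specific variances - k(k-1)/2 rotation constraints
    return p * k + p - k * (k - 1) // 2


# Hypothetical loadings for three environments on two factors.
loadings = [[1.0, 0.2], [0.9, 0.1], [0.3, -0.8]]
specific_var = [0.1, 0.2, 0.1]
for row in to_correlation(fa_covariance(loadings, specific_var)):
    print(" ".join(f"{r:6.2f}" for r in row))

print(n_params_unstructured(20), n_params_fa(20, 2))
```

In this invented example the third environment loads mainly on the second factor, so its correlation with the other two is low, which is the kind of pattern that would suggest rank changes. For 20 environments, the two-factor model needs 59 variance parameters compared with 210 for the unstructured matrix.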

All of these statistical methods provide solutions to the identification of high-performing stable genotypes in forest tree breeding programmes. However, pursuing stability of performance does not take full advantage of potential genetic gain in specific environments. In order to maximise potential genetic gains for all environments, selecting the right individuals best adapted to specific environments is likely to be the best approach.

Most G×E studies in forest tree breeding investigated the patterns and magnitude of G×E for growth traits. High levels of G×E were reported for these traits, especially in radiata pine, loblolly pine, Eucalyptus and Populus species. Some studies also investigated the patterns and magnitude of G×E for form traits and wood property traits. Half the studies included in this review reported a high level of G×E for form traits whereas no G×E or minimal G×E was reported for wood property traits.

In summary, G×E can be quantified in a number of ways, and taking several different approaches will likely give complementary insights to help understand the interactions involved, and ensure the best outcome. For New Zealand and for forestry in general, breeding value estimation is clearly most effective using new methods such as factor analytic models. For understanding and quantifying the detailed roles of environments, more work is still required. We believe that genomics technologies (e.g. Elshire et al. 2011; Neves et al. 2013) and genome sequence annotation (Ouzounis and Karp 2002; Wolf et al. 2001) will provide more information and help unravel the cause and effect at the genetic level. To maximise genetic gains and economic benefits from forest plantations, the strategy of selecting individuals for specific and known environments should be applied. The end result will be advances in forest productivity and forest management systems that will ensure the long-term sustainability and enhanced profitability of the forest industry.