Introduction

Plants are continually responding to a diversity of environmental cues. Genotypes differ in receptivity and reaction to such external stimuli, giving rise to genetically distinct patterns of phenotypes across a range of environments (Arnold et al. 2019; Kang 2002). Such differential “reaction norms” result in genotype by environment interaction (GE) (Costa-Neto and Fritsche-Neto 2021). In the plant breeding context, GE is typically quantified across an experimental network as variance due to a GE interaction component relative to cross-site genetic variance (Shelbourne 1972) or as a genetic correlation (rB) among sites (Burdon 1977).

GE is pervasive in forest genetics, especially for tree growth (Li et al. 2017). In general, GE can be partially explained by static physical attributes such as soil types and a less predictable component of temporal effects. Variance due to unexplained GE decreases genetic gain by reducing overall heritability across sites and compromising the estimation of genetic variance across multiple environments (Li et al. 2017). A common management response is to delineate spatially discrete breeding zones (BZs) within which GE is minimized (e.g., Raymond 2011; Ukrainetz et al. 2018). For species re-planted within their natural range, local populations are generally considered best adapted and safest to use (Ying and Yanchuk 2006) and their BZs are typically based on obvious environmental variables (EVs) such as latitude and temperature (e.g., Chen et al. 2017) or geographically defined regions (e.g., Johnson 1997).

An initial BZ classification may also be made on a regional basis for non-native species. Examples include for Pinus radiata in Australia (Ivković et al. 2015), Pseudotsuga menziesii in New Zealand (Dungey et al. 2012), and for Eucalyptus globulus in Australia (Dutkowski et al. 2015) and Chile (Sanhueza et al. 2002). Such regional classifications are unhelpful when the environmental drivers of GE are highly variable within regions or are more similar between sites in different regions. In these cases, rB estimates from a comprehensive set of trials across diverse sites can be used to produce a zone classification determined by one or more predictive EVs (e.g., based on altitude; Raymond 2011).

Some progress has been made in understanding the EVs that determine GE in forest trees. Mean annual temperature and precipitation (along with altitude, which is highly correlated) appear to be key in explaining GE in P. radiata (Burdon et al. 2017). Latitude and minimum temperature (also highly correlated with each other) explained some variation in GE for P. taeda (Lauer et al. 2021). In a more comprehensive prediction of GE, Ukrainetz et al. (2018) showed that 3 precipitation-related EVs, 4 temperature-related EVs, and 2 EVs related to the frost-free period provided a good zonation outcome for P. contorta across British Columbia. The resulting 4 BZs were precisely mapped across the province in a spatial modeling process that used these 9 significant EVs to predict spatial BZ extent for past and predicted future climates (Ukrainetz et al. 2018).

Despite the advances in the BZ classifications, the quality and resolution of environmental data have constrained the prediction of GE in forest tree species (Dutkowski et al. 2015; Gapare et al. 2015). On the other hand, recent GE studies in agricultural crops have considered vastly more EVs in terms of dimensional feature resolution, such as detailed soil properties and daily weather data (Romay et al. 2010). One key advance connecting envirotyping and crop ecophysiology in crop experiments has been to consider a division of the testing period into discrete stages of plant development or phenological periods (e.g., Costa-Neto et al. 2022; Heslot et al. 2014; Jarquín et al. 2014). The recent abundance of EV data has given rise to “envirotyping,” which involves collating numerous EVs at sites across a network of trials, identifying the most impactful EVs for traits of interest, and using these to define the “envirotype” as an environmental signature of each site (Costa-Neto and Fritsche-Neto 2021; Resende et al. 2021; Xu 2016). Envirotyping can lead to two main “enviromic” (envirotyping-based) outcomes: (1) the “enviromic assembly,” which is a large-scale characterization of the predominant environmental types for a given region or experimental network, in order to compute environmental variance–covariance similarity matrices among sites (Costa-Neto et al. 2021); and (2) the “enviromic prediction” using the enviromic characterization to predict BZs (Heinemann et al. 2022). Both outcomes can enhance a predictive breeding pipeline, such as one based on genomic selection, providing an additional source of variance to complement the genetic relationship matrices in the multiple-environment decision making context (e.g., Costa-Neto et al. 2022, 2021; Jarquín et al. 2014).

An advantage of using highly dimensional environmental data over modeling a linear reaction norm approach for a specific EV is that a combination of many low-signal EVs may be required to characterize the underlying environmental pressures determining GE, as opposed to expecting one or two overarching EVs to explain GE responses (as in Calleja-Rodriguez et al. 2019). This could play an important role for developing a broader understanding of the past and future GE and enable prediction of complex changes to genetic expression resulting from global warming (Crossa et al. 2021). In general, it is expected that locally adapted genotypes will be better suited to more poleward and higher elevation sites given a warmer future climate (Aitken and Bemmels 2016; Sáenz-Romero et al. 2020). For this reason, climate mapping and modeling can produce detailed maps of current and expected future BZs (e.g., Thomson et al. 2010). However, the BZ predictions under climate change scenarios are most reliable when the environmental determinants of GE are known precisely (e.g., Ukrainetz et al. 2018).

Many statistical approaches are available for characterizing GE (Van Eeuwijk et al. 2016), including most notably additive main effects and multiplicative interaction (AMMI), which combines ANOVA with principal component analysis (PCA) techniques (Crossa et al. 1990), factorial regression with geographic coordinates (Costa-Neto et al. 2020), and random regression using an EV such as temperature (Arnold et al. 2019), an environmental index of productivity (e.g., Alves et al. 2020; de Souza et al. 2020), or a latent environmental variable computed by factor analytic (FA) analysis (Meyer 2009; Smith et al. 2001). The FA method provides a complete rB matrix among trial sites while accommodating poor connectivity between sites that is typical of tree breeding experiments (Cullis et al. 2014). It has therefore been extensively used and recommended for analysis of GE in forest tree populations (Calleja-Rodriguez et al. 2019; Chen et al. 2017; Cullis et al. 2014; Gezan et al. 2017; Hardner et al. 2010; Ivković et al. 2015; Ogut et al. 2014; Shalizi and Isik 2019; Ukrainetz et al. 2018). Nevertheless, interpretation of the resultant rB matrix is dependent on the quality of supporting environmental information and is improved by detailed enviromic data (Tolhurst et al. 2022).

Handling large enviromic datasets is complicated by their high multidimensionality and multicollinearity. Partial least squares analysis (PLS), which is related to PCA and multiple linear regression, provides a way to analyze such data with strongly collinear, correlated, noisy and numerous variables (Wold et al. 2001). The method relates one or more response variables Y to a p variables × n samples matrix (X), achieving dimensionality reduction by selecting a subset of p that contributes most to explaining variation in Y. PLS has previously been applied to the problem of selecting EVs to explain crop GE most parsimoniously (e.g., Crossa et al. 1999; Monteverde et al. 2019; Porker et al. 2020). Additionally, PLS could be used to trace environmental signatures across previously characterized breeding zones (Costa-Neto et al. 2022).

Eucalyptus globulus is a commercially important plantation species, especially for pulp and paper production. GE in E. globulus has generally been quantified in very general terms, without revealing the nature of potentially significant spatial patterns. For example, Costa e Silva et al. (2009) calculated a uniform rB of 0.83 for diameter at breast height (DBH) among eight typical Portuguese sites, which is similar to Callister et al.’s (2011) average of 0.76 (0.52 to 1.00) among seven within-series trial pairs in Western Australia (WA). Similarly, Li et al. (2007) fitted uniform rB to subrace, additive, and dominance effects using full-sib families across 10 Australian sites, and reported estimates of 1.02 and 0.71 for DBH and 0.00 and 0.72 for tree height (HT), for sub-race and additive levels, respectively. In a closer examination of GE patterns in E. globulus, Dutkowski et al. (2015) reported that fitting uniform rB values within each of five Australian regions and between each pair of regions produced realized rB estimates ranging from 0.00 to 0.94 between regions (mean 0.57) and from 0.20 to 1.00 within regions (mean 0.69). Notably, uniform rB within WA was 0.59, within the "Green Triangle" region (GT) was 0.66, and between WA and GT was 0.58 (Dutkowski et al. 2015). Using long-term site climate attributes, Dutkowski et al. (2015) found that minimum monthly evaporation in three classes produced uniform within-zone rB values of 0.64, 0.76, and 0.63, and between-zone of 0.52, 0.38, and 0.60.

Approximately 460,000 hectares of E. globulus plantations are managed in Australia (Downham and Gavran 2019), and an improved understanding of GE in the species could contribute substantially to realized gain from genetic improvement programs. Progeny data from two Australian E. globulus improvement programs were recently combined in a unified single-step genomic BLUP analysis (“HBLUP”), using three broad regions as BZs (Callister et al. 2021). Building on that work, the primary goal of the present study is to use the same full-sib family data, together with environmental attributes of the progeny trial sites, to produce a BZ classification for WA and GT by envirotyping. A second goal is to examine the temporal and spatial variability in BZ distributions across the WA and GT commercial areas. To achieve these goals, we applied an integrative approach combining factor analytic models, enviromics, and spatial modeling.

Materials and methods

Experimental populations

This study was conducted using a subset of the data described by Callister et al. (2021) from two multi-generational breeding populations of E. globulus in southern mainland Australia, EG1 (Australian Bluegum Plantations) and EG2 (HVP Plantations). Forty-eight field trials were established in WA, GT, and Gippsland between 1998 and 2015, representing a total of 126,467 full-sib progeny in 1973 families produced by controlled crossing (see Table 1 for details by population and region). A relatively small number of half-sib families and checks were included in these trials but are not the focus of the present study.

Table 1 Summary of trials, control pollinated families, and control pollinated progeny by breeding program (EG1 and EG2) and region

The EG1 population of full-sib families was founded on 112 base-population trees, and the EG2 population was founded on 83 base-population trees in native forests or international landraces (Table SM1). All E. globulus races apart from Tidal River were represented in the combined pedigree (Fig. 1 and Table SM1). A total of 347 EG1 parents and 107 EG2 parents were represented by progeny included in the present analyses.

Fig. 1

Map of southern Australia with natural distribution and races of E. globulus in brown (after Freeman et al. 2007) and the 33 trial sites used for envirotyping represented by blue diamonds within Albany and Esperance districts (Western Australia region) and Green Triangle region. Gippsland sites used in the FA analysis are located within the natural distribution of Strzelecki Ranges and South Gippsland races (trial sites not shown)

Trials were primarily established in randomized incomplete-block designs with four to eight replicates of each family established in four- to five-tree row plots. Two EG1 trials were single-tree plot designs. Trial layouts were generally contiguous, allowing for the resolution of spatial trend effects within each trial. Planted stocking generally ranged from 800 to 1087 stems/ha, with one site at 556 stems/ha (Fig. SM1). Mean survival was 90% and mean tree height was 12.2 m at assessment (Fig. SM1).

Phenotyping and genotyping

Diameter at breast height (DBH) was measured with a tape for each tree at between 3 and 8 years of age. Tree height (HT) was measured with a hypsometer for each tree in most trials, although a subset of 3–17% (mean 10%) of trees were assessed in 11 of the EG2 trials. In these cases, unmeasured HT data were predicted from the DBH-HT relationship established among measured trees for each trial. Stem volume (VOL) was calculated for each tree using DBH, HT (measured or predicted), and the region-specific volume function provided by each program.

Genotypic relationships among parents in the two populations were used to merge pedigrees for a combined analysis (see Callister et al. 2021 for details). There was little connectivity in the known pedigree, as only four founders were common to both populations. Pedigree-derived relationship coefficients between the two parent cohorts were rare, with two relationships of 0.25, 152 relationships of 0.125, and one expected relationship coefficient of 0.0625.

Leaves were sampled from 164 parents in the EG1 program and from 93 parents in the EG2 program. DNA extraction and genotyping were performed at Gondwana Genomics Pty Ltd, Canberra (Thavamanikumar et al. 2020). The marker panel consisted of 2579 single nucleotide polymorphism (SNP) and small biallelic insertion/deletion (INDEL) markers within previously identified candidate genes (Southerton et al. 2011). The filtered marker matrix included 2444 markers after removing those with minor allele frequency less than 0.05.

Experimental and genetic analysis

Relationship matrix calculation

Joint-program relationship matrices were formed among parents, tested progeny, and their ancestors using preGSf90, which is a module of the Fortran-based BLUPF90 suite (Misztal et al. 2014). The default settings of preGSf90 were used for quality control and calculation of the relationship matrices. The genomic relationship matrix (G) was calculated following the first method of VanRaden (2008). G was rescaled to make it compatible with the comparable section of A representing genotyped individuals (A22) following Christensen et al. (2012). The rescaled matrix Ga was then weighted with 0.05A to avoid difficulties with inversion (Aguilar et al. 2010), giving the weighted matrix Gw, which was used to calculate H, with inverse (Christensen and Lund 2010; Legarra et al. 2009):

$${\mathbf{H}}^{-1}= {\mathbf{A}}^{-1}+ \left[\begin{array}{cc}0& 0\\ 0& {\mathbf{G}}_{\mathbf{w}}^{-1}-{\mathbf{A}}_{22}^{-1}\end{array}\right]$$
(1)
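As a rough illustration of Eq. 1, the following NumPy sketch assembles the inverse of H for a toy pedigree. It is not the preGSf90 implementation: the function and argument names are hypothetical, and the blend G_w = 0.95 G_a + 0.05 A22 is only our reading of the weighting step described above.

```python
import numpy as np

def h_inverse(A, G_a, genotyped_idx, weight_a=0.05):
    """Assemble H^-1 = A^-1 + [[0, 0], [0, G_w^-1 - A22^-1]] (Eq. 1).

    A             : pedigree relationship matrix (q x q)
    G_a           : rescaled genomic relationship matrix of the genotyped individuals
    genotyped_idx : positions of the genotyped individuals within A
    weight_a      : assumed blending weight on A22 (hypothetical default of 0.05)
    """
    A = np.asarray(A, dtype=float)
    idx = np.asarray(genotyped_idx)
    A22 = A[np.ix_(idx, idx)]
    # Assumed blend: G_w = (1 - w) * G_a + w * A22, which helps keep G_w invertible
    G_w = (1.0 - weight_a) * np.asarray(G_a, dtype=float) + weight_a * A22
    H_inv = np.linalg.inv(A)
    # Only the block belonging to genotyped individuals is modified
    H_inv[np.ix_(idx, idx)] += np.linalg.inv(G_w) - np.linalg.inv(A22)
    return H_inv
```

When G_a equals A22, the correction term vanishes and the result reduces to the inverse of A, which provides a convenient sanity check.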

H and A were compared in a preliminary step to highlight identity errors and mistakes in the documented pedigree. These errors were rectified in the pedigree and the process of creating the hybrid relationship matrix was then repeated with the correct pedigree information.

Single-site analyses

All genetic analyses were conducted using ASReml version 4.1 (Gilmour et al. 2015). Single-site, univariate animal models (pedigree BLUP) of VOL were first fitted to data from each trial following Henderson (1984) using the general linear mixed model framework:

$$\mathbf{y}=\mathbf{X}{\varvec{\uptau}}+\mathbf{Z}\mathbf{u}+\mathbf{e}$$
(2)

where y is the vector of phenotypic values, X is the incidence matrix for fixed effects, \({\varvec{\uptau}}\) is the vector of fixed effects, Z is the incidence matrix for random effects, u is the vector of random effects with E(u) = 0, and e is the vector of residual effects expected to be independently normally distributed with E(e) = 0. Races and landraces were included as fixed genetic groups within the pedigree (Westell et al. 1988). Other fixed effects were the overall mean, replicates, and checklots. Random effects were additive and family-specific genetic effects of full-sib families, incomplete blocks, plots, and half-sib families. The dispersion matrices contained elements A \({\widehat{\sigma }}_{a}^{2}\), I \({\widehat{\sigma }}_{f}^{2}\), I \({\widehat{\sigma }}_{b}^{2}\), I \({\widehat{\sigma }}_{p}^{2}\), I \({\widehat{\sigma }}_{h}^{2}\), and I \({\widehat{\sigma }}_{e}^{2}\), where \({\widehat{\sigma }}_{a}^{2}\) is the estimated additive genetic variance fitted to full-sib families, \({\widehat{\sigma }}_{f}^{2}\) is the estimated non-additive full-sib family-specific variance, \({\widehat{\sigma }}_{b}^{2}\) is the estimated incomplete block variance, \({\widehat{\sigma }}_{p}^{2}\) is the estimated plot variance, \({\widehat{\sigma }}_{h}^{2}\) is the estimated half-sib family variance, \({\widehat{\sigma }}_{e}^{2}\) is the residual variance, A is the verified pedigree-based numerator relationship matrix, and I is the identity matrix. Additional within-plot error was fitted for checklots. A two-dimensional autoregressive structure was fitted to model the spatial variability effects within each trial (Dutkowski et al. 2002), which was retained for all but one site after significance testing by two-tailed likelihood ratio tests (Gilmour et al. 2015) and visual inspection of two-dimensional smoothness.

Narrow-sense heritability was estimated for each site: \(\widehat{{h}^{2}}=\frac{{\widehat{\sigma }}_{a}^{2}}{{\widehat{\sigma }}_{a}^{2}+{\widehat{\sigma }}_{f}^{2}+{\widehat{\sigma }}_{b}^{2}+{\widehat{\sigma }}_{p}^{2}+{\widehat{\sigma }}_{e}^{2}}\). VOL data were adjusted for spatial effects (following Ye and Jayawickrama 2008) and standardized by site to have a zero mean and an additive genetic standard deviation of one prior to cross-site analysis (VOLstd).
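The heritability ratio and the site-wise standardization can be sketched in a few lines of Python. The function names are hypothetical, the inputs are assumed to be variance component estimates and spatially adjusted VOL values from a single site, and the sketch is illustrative rather than a reproduction of the ASReml workflow.

```python
def narrow_sense_h2(var_a, var_f, var_b, var_p, var_e):
    """Narrow-sense heritability: additive variance over the sum of the listed components."""
    return var_a / (var_a + var_f + var_b + var_p + var_e)

def standardize_by_site(values, var_a):
    """Centre one site's adjusted VOL values and scale them by the additive genetic SD."""
    mean = sum(values) / len(values)
    sd_a = var_a ** 0.5
    return [(v - mean) / sd_a for v in values]
```

After this scaling, each site has a mean of zero and additive genetic effects expressed in units of one additive genetic standard deviation, which places all sites on a comparable scale for the cross-site model.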

Factor analytic HBLUP analysis of genotype x environment interaction

VOLstd was analyzed across all 48 sites using a parent model for additive genetic variation and family-specific effects:

$$\mathbf{y}= \mathbf{X}{\varvec{\uptau}}+ \frac{1}{2}\left({{\varvec{Z}}}_{{{\varvec{m}}}_{{\varvec{p}}}}+{{\varvec{Z}}}_{{{\varvec{p}}}_{{\varvec{p}}}}\right){{\varvec{u}}}_{{{\varvec{a}}}_{{\varvec{p}}}}+{{\varvec{Z}}}_{{\varvec{f}}}{{\varvec{u}}}_{{\varvec{f}}}+{{\varvec{Z}}}_{{\varvec{s}}{\varvec{i}}{\varvec{t}}{\varvec{e}}:{\varvec{f}}}{{\varvec{u}}}_{{\varvec{s}}{\varvec{i}}{\varvec{t}}{\varvec{e}}:{\varvec{f}}}+{\varvec{e}}$$
(3)

where \(\mathbf{y}\) is now the combined vector of VOLstd data for n full-sib progeny across t sites; \({\varvec{\uptau}}\) is the vector of fixed effects of trials with associated design matrix \(\mathbf{X}\); \({{\varvec{Z}}}_{{{\varvec{m}}}_{{\varvec{p}}}}\) and \({{\varvec{Z}}}_{{{\varvec{p}}}_{{\varvec{p}}}}\) are indicator matrices to assign each progeny with a maternal parent and a paternal parent, respectively; \({{\varvec{u}}}_{{{\varvec{a}}}_{{\varvec{p}}}}\) is the vector of additive genetic effects for the parents; \({{\varvec{u}}}_{{\varvec{f}}}\) is the vector of family-specific genetic effects across sites with associated design matrix \({{\varvec{Z}}}_{{\varvec{f}}}\); \({{\varvec{u}}}_{{\varvec{s}}{\varvec{i}}{\varvec{t}}{\varvec{e}}:{\varvec{f}}}\) is the vector of site × family interaction effects with associated design matrix \({{\varvec{Z}}}_{{\varvec{s}}{\varvec{i}}{\varvec{t}}{\varvec{e}}:{\varvec{f}}}\); and \({\varvec{e}}\) is the vector of residuals, which includes Mendelian sampling effects of progeny. \({{\varvec{u}}}_{{\varvec{f}}}\), \({{\varvec{u}}}_{{\varvec{s}}{\varvec{i}}{\varvec{t}}{\varvec{e}}:{\varvec{f}}}\), and \({\varvec{e}}\) were assumed to be normally distributed with mean 0.

\({{\varvec{u}}}_{{{\varvec{a}}}_{{\varvec{p}}}}\) was modeled with a factor analytic (FA) structure across sites. The general form of additive genetic effect i at site j in a FA model with order k (FAk) can be written as

$${{\varvec{u}}}_{{{\varvec{a}}}_{{{\varvec{p}}}_{{\varvec{i}}{\varvec{j}}}}}={\lambda }_{{a}_{{1}_{j}}}{f}_{{a}_{{1}_{i}}}+{\lambda }_{{a}_{{2}_{j}}}{f}_{{a}_{{2}_{i}}}+\dots +{\lambda }_{{a}_{{k}_{j}}}{f}_{{a}_{{k}_{i}}}+{\delta }_{{a}_{ij}}$$
(4)

which is a sum of k multiplicative terms and \({\delta }_{{a}_{ij}}\), which represents the lack of fit of the regression model. Each multiplicative term is a product of an environmental effect, or “loading” for the jth site (\({\lambda }_{{a}_{{k}_{j}}}\)) and a genetic value for the ith individual in the relationship matrix (\({f}_{{a}_{{k}_{i}}}\)). The FA model can be represented in matrix notation as

$${{\varvec{u}}}_{{{\varvec{a}}}_{{\varvec{p}}}}=({{\varvec{I}}}_{{\varvec{q}}}\otimes {{\varvec{\Lambda}}}_{{\varvec{a}}}){{\varvec{f}}}_{{\varvec{a}}}+{{\varvec{\delta}}}_{{\varvec{a}}}$$
(5)

where \({{\varvec{\Lambda}}}_{{\varvec{a}}}\) is the t (site) × k matrix of environmental loadings, \({{\varvec{f}}}_{{\varvec{a}}}\) is the qk × 1 vector of additive genetic scores, and \({{\varvec{\delta}}}_{{\varvec{a}}}\) is the qt × 1 vector of genetic regression residuals, where q is the number of additive effects. We assume that \({{\varvec{f}}}_{{\varvec{a}}}\) and \({{\varvec{\delta}}}_{{\varvec{a}}}\) are mutually independent and distributed as multivariate normal with zero mean and variances given by var(\({{\varvec{f}}}_{{\varvec{a}}}\)) \(=\mathbf{H}\otimes {\mathbf{I}}_{{\varvec{k}}}\) and var(\({{\varvec{\delta}}}_{{\varvec{a}}}\)) \(=\mathbf{H}\otimes {{\varvec{\Psi}}}_{{\varvec{a}}}\), where \(\mathbf{H}\) is the q × q relationship matrix for parents and their ancestors and \({{\varvec{\Psi}}}_{{\varvec{a}}}\) is a t × t diagonal matrix with a variance for each site. The variance of additive genetic effects is var(\({{\varvec{u}}}_{{{\varvec{a}}}_{{\varvec{p}}}}\)) \(=\mathbf{H}\otimes ({{\varvec{\Lambda}}}_{{\varvec{a}}}{{\varvec{\Lambda}}}_{{\varvec{a}}}^{{\varvec{T}}}+{{\varvec{\Psi}}}_{{\varvec{a}}})\), so that the additive genetic correlation matrix among sites, \({{\varvec{\rho}}}_{{\varvec{a}}}\), is \({{\varvec{\rho}}}_{{\varvec{a}}}={{\varvec{\Lambda}}}_{{\varvec{a}}}{{\varvec{\Lambda}}}_{{\varvec{a}}}^{{\varvec{T}}}+{{\varvec{\Psi}}}_{{\varvec{a}}}\).

FA model performance was evaluated by calculating the percentage of additive genetic variance accounted for by the k multiplicative terms at each site (Cullis et al. 2014):

$${v}_{{a}_{j}}=100\sum_{r=1}^{k}{\lambda }_{{a}_{rj}}^{2}/\left(\sum_{r=1}^{k}{\lambda }_{{a}_{rj}}^{2}+{\psi }_{j}\right)$$
(6)

and an overall (across-site) percentage accounted for

$$\overline{{v }_{a}}=100tr\left({{\varvec{\Lambda}}}_{{\varvec{a}}}{{\varvec{\Lambda}}}_{{\varvec{a}}}^{{\varvec{T}}}\right)/tr\left({{\varvec{\rho}}}_{{\varvec{a}}}\right)$$
(7)
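To make Eqs. 6 and 7 concrete, the sketch below computes both percentages from a matrix of site loadings and a vector of site-specific variances using NumPy. The function name and the example values in the usage note are hypothetical. The final rescaling to a unit diagonal is our addition, so that the returned matrix is a correlation matrix whatever the scale of the inputs.

```python
import numpy as np

def fa_summary(loadings, psi):
    """Summarise a fitted FA-k model from site loadings and specific variances.

    loadings : t x k array of site loadings (Lambda_a)
    psi      : length-t array of site-specific variances (diagonal of Psi_a)
    Returns (per-site % variance accounted for, overall %, correlation matrix).
    """
    L = np.asarray(loadings, dtype=float)
    psi = np.asarray(psi, dtype=float)
    common = (L ** 2).sum(axis=1)              # sum over factors of squared loadings, per site
    v_site = 100.0 * common / (common + psi)   # Eq. 6
    cov = L @ L.T + np.diag(psi)               # Lambda Lambda^T + Psi
    v_overall = 100.0 * np.trace(L @ L.T) / np.trace(cov)  # Eq. 7
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)              # rescaled to unit diagonal (our addition)
    return v_site, v_overall, corr
```

For example, a one-factor model with loadings 0.9 and 0.6 and specific variances 0.19 and 0.64 gives per-site values of 81% and 36%, and a between-site correlation of 0.54.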

Factor analytic models were fitted to the 48-trial data first with one factor (k = 1), then with two, and finally with three (FA3), at which point the additive genetic variance across sites was adequately explained.

Selection of sites and definition of BZ classes for model training

The full set of 48 sites was reduced to 33 well-characterized sites for envirotyping (see the “Results” section). All 6 sites in Gippsland were excluded due to lack of directly observed soils information for the region. Six EG1 progeny trials from an earlier generation were also excluded as these sites displayed good rB between each other but were not well correlated to any other site. Three further sites were excluded as their \({v}_{{a}_{j}}\) values were less than 50. The retained trials were established between 2001 and 2015 and were assessed for volume at age 3 years (3 sites), 4 years (6 sites), or 5 years (24 sites). Eleven of the retained trials were in GT (4 from EG1 and 7 from EG2) and 22 in WA (20 from EG1 and 2 from EG2). Information about the level of genetic connection between sites selected for envirotyping is presented in Figs. SM2 and SM3.

Next, a classification was performed with Euclidean hierarchical clustering to assign each of the 33 sites to one of 5 zone classes based on the rB estimates produced by the FA3 models. However, initial efforts using PLS to identify the EVs contributing to membership of unique zone classes did not produce models with satisfactory prediction capacity. Thus, we re-examined the heatmap of rB values and realized that some sites could be assigned to more than one class based on rB. The hypothesis was considered that class-defining EV combinations might afford sites with non-exclusive class membership. The Euclidean hierarchical clusters were considered “core” site membership and then additional “supplemental” sites were added to each class one by one if the mean rB between the candidate site and the existing sites in the class (“marginal rB”) was greater than 0.70 and if the mean rB within the resulting class remained greater than 0.80.
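The rule for adding supplemental sites can be sketched as a greedy loop. The thresholds of 0.70 and 0.80 come from the text; the function name, the list-of-lists representation of the rB matrix, and the choice to consider the candidate with the highest marginal rB first are assumptions made only for illustration.

```python
def add_supplemental_sites(rb, core, candidates, marginal_min=0.70, within_min=0.80):
    """Greedily extend a core class with supplemental sites.

    rb         : square matrix (list of lists) of between-site genetic correlations
    core       : indices of core sites from the hierarchical clustering (non-empty)
    candidates : indices of sites that may be added
    """
    def mean_within(sites):
        pairs = [rb[i][j] for n, i in enumerate(sites) for j in sites[n + 1:]]
        return sum(pairs) / len(pairs) if pairs else 1.0

    members = list(core)
    remaining = [c for c in candidates if c not in members]
    while remaining:
        # Marginal rB: mean correlation between a candidate and the current members
        marginal = {c: sum(rb[c][m] for m in members) / len(members) for c in remaining}
        best = max(remaining, key=lambda c: marginal[c])  # assumed order: best first
        if marginal[best] <= marginal_min or mean_within(members + [best]) <= within_min:
            break  # no other candidate can pass, since both tests increase with marginal rB
        members.append(best)
        remaining.remove(best)
    return members
```

Because a site may satisfy the rule for more than one class, running the function once per class naturally yields the non-exclusive membership described above.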

Environmental data collection and envirotyping assembly

A total of 145 EVs were included in the envirotyping study: 71 soil and landscape EVs, 73 climate EVs, and 1 management-associated EV (Table 2). Sources of soil and landscape data were directly observed soil pits or drills, laboratory analyses of soil samples from the top 10 cm, and the Soil and Landscape Grid of Australia (SLGA), a fine-resolution (3-arcsecond grid, ~90 m × 90 m) public dataset of predicted soil, landscape, and regolith attributes known to influence crop growth (Grundy et al. 2015). Daily weather data for each site were downloaded from the online SILO platform (Jeffrey et al. 2001) from planting until measurement age or the first 5 years of each trial’s history, whichever was shorter (https://www.longpaddock.qld.gov.au/silo/point-data/, accessed July, 2022).

Table 2 Summary of environmental variables included in the principal component analyses

Soil and landscape attributes were summarized by 71 EVs (Table 2, with details in Table SM2).

Precipitation, minimum temperature, and maximum temperature data were processed into average values for each month across the first 3-year period. These monthly averages were then used to calculate the 19 climate metrics used for species distribution prediction in the BIOCLIM framework (Booth et al. 2014) using the “dismo” R package (see Table SM3 for details).
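For readers unfamiliar with the BIOCLIM metrics, the sketch below derives a handful of them from 12 monthly averages using their standard definitions. It is a plain-Python illustration with a hypothetical function name, not the “dismo” implementation, and it covers only 7 of the 19 metrics.

```python
def bioclim_subset(tmin, tmax, prec):
    """Compute a few BIOCLIM metrics from 12 monthly averages.

    tmin, tmax : monthly mean minimum and maximum temperature (deg C)
    prec       : monthly precipitation totals (mm)
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each input")
    tmean = [(lo + hi) / 2.0 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)   # maximum temperature of the warmest month
    bio6 = min(tmin)   # minimum temperature of the coldest month
    return {
        "bio1": sum(tmean) / 12.0,  # annual mean temperature
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,        # temperature annual range
        "bio12": sum(prec),         # annual precipitation
        "bio13": max(prec),         # precipitation of the wettest month
        "bio14": min(prec),         # precipitation of the driest month
    }
```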

Studies to predict GE in agricultural crops make good use of well-defined phenological stages such as tillering, jointing, booting, heading, etc., to distinguish periods of weather that affect different physiological processes. For a long-lived perennial such as E. globulus, alternating periods of growth and of survival could be considered phenological stages. Growth periods occur when water and temperature balances allow net anabolism, and survival periods occur when resources must be expended to adapt to unfavorable environmental conditions. For established E. globulus on each of the trial locations studied here, the annual survival stage would be prompted by water limitation and perhaps high temperature, rather than by freezing stress. We considered the first year of an E. globulus plantation to include two unique phenological stages. Both growth and survival are important in the first 4–6 months of establishment, due to the risk of frost death in a period when water and temperature are otherwise favorable for growth. The first summer after growth can also be considered a unique growth period because temperatures are highly suitable for anabolism, while stored soil water is usually capable of meeting a plantation’s water requirements.

Daily weather data from SILO were used to translate the growth stage concept into practice at each of the trial sites as follows. Climate wetness index over a 90-day period (CWI90) was defined as the ratio of total precipitation to total evaporation in the preceding 90 days for each day commencing 91 days after planting. The establishment phase (Estab) was defined from planting until CWI90 reduced to a threshold of 0.5 (Table 3). In the absence of a proven system of phenology for E. globulus, this value was used to mark the commencement of the first-year dry period. The critical threshold of 0.5 for CWI90 was used in each subsequent year to define the end of growth periods and the start of dry periods. Each dry period was ended once 120 mm of rainfall had accumulated since the start of the respective dry period (Table 3).
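The stage definitions above translate into simple operations on daily series. The sketch below assumes daily precipitation and evaporation lists indexed from the planting day (index 0); the function names are hypothetical, and details such as how the 90-day window is aligned with the 91st day are our assumptions rather than the authors' code.

```python
def cwi90(precip, evap, window=90):
    """Ratio of total precipitation to total evaporation over the preceding `window` days.

    Returns a list aligned with the daily inputs (index 0 = planting day);
    entries are None until a full window of preceding days is available.
    """
    out = [None] * len(precip)
    for day in range(window, len(precip)):
        p = sum(precip[day - window:day])
        e = sum(evap[day - window:day])
        out[day] = p / e if e > 0 else float("inf")
    return out

def establishment_end(precip, evap, threshold=0.5, window=90):
    """Index of the first day on which CWI90 has fallen to the threshold, else None."""
    for day, value in enumerate(cwi90(precip, evap, window)):
        if value is not None and value <= threshold:
            return day
    return None

def dry_period_end(precip, start, rain_needed=120.0):
    """Index of the first day on which rainfall accumulated since `start` reaches rain_needed."""
    total = 0.0
    for day in range(start, len(precip)):
        total += precip[day]
        if total >= rain_needed:
            return day
    return None
```

Calling establishment_end on a site's series gives the start of the first-year dry period, and passing that index to dry_period_end gives the day on which the 120 mm rule closes it.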

Table 3 Definitions of phenological stages applied to each trial site

In addition to the BIOCLIM variables, up to 54 climate EVs were calculated for each site (Table 2 with details in Table SM4). Cardinal values used for calculation of temperature effects on growth were [3, 14, 28, 35], relating to base, minimum optimum, maximum optimum, and maximum cardinal temperatures in degrees C (Smethurst et al. 2022). T.opt.hours was defined as the number of hours between 14 and 28 °C, T.low.hours as the number of hours less than 3 °C, and T.high.hours as the number of hours above 35 °C.
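The three hour-count variables can be illustrated as follows. The sketch assumes that a series of hourly temperatures is already available (the text does not state how hourly values were derived from the daily SILO data) and that the optimum range includes its end points; the function name is hypothetical.

```python
def thermal_hours(hourly_temps, base=3.0, opt_low=14.0, opt_high=28.0, t_max=35.0):
    """Count hours in each temperature class using the cardinal values from the text."""
    return {
        # Inclusive bounds on the optimum range are an assumption
        "T.opt.hours": sum(1 for t in hourly_temps if opt_low <= t <= opt_high),
        "T.low.hours": sum(1 for t in hourly_temps if t < base),
        "T.high.hours": sum(1 for t in hourly_temps if t > t_max),
    }
```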

Twenty-five of the 33 envirotyped sites were first-rotation plantations, with either pasture or cropping before establishment. The remaining eight sites were second rotation established after a first rotation of E. globulus. This first- versus second-rotation distinction was the only management-associated EV considered in the envirotyping process.

Partial least squares analysis to discover the major envirotypes for each breeding zone

The next task was to produce a parsimonious characterization of the 5 BZ classes from the curated enviromic dataset of 145 EVs. Each BZ was represented by a training dataset that simply classified sites as included or excluded, based on rB estimates. Sparse PLS-discriminant analysis (sPLS-DA) (Chung and Keles 2010; Lê Cao et al. 2011) was implemented in the R package “mixOmics” (Rohart et al. 2017) to identify which EVs were required to predict classification of sites into membership and non-membership classes for each BZ (a PLS primer is provided in supplemental materials). All EV input data were centered and scaled and the NIPALS algorithm was implemented within mixOmics to impute missing values.

For each BZ, the modeling process commenced with specification of 10 components (latent variables) and 20 variables per component, to gauge the maximum classification accuracy possible with the correlation-based training data. The ROC-AUC, or area under the receiver operating characteristic curve (Brown and Davis 2006), was used as an index of classification accuracy. This initial step identified three outlier sites, which were removed from certain training datasets: site no. 16 was removed from BZs 2, 3, and 4 and site no. 31 was removed from BZ1 due to poor EV data quality in both cases, while site no. 21 was removed from BZ2 due to a history of poor weed control.
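The ROC-AUC index has a simple rank interpretation: it is the probability that a randomly chosen member site receives a higher classification score than a randomly chosen non-member site. The sketch below computes it directly from that definition; the function name is hypothetical and the scores are assumed to come from any fitted classifier, not specifically from mixOmics.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a member site outscores a non-member (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member site")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 1.0 indicates that every member site is ranked above every non-member site, whereas 0.5 corresponds to a classifier no better than chance.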

Parsimony in sPLS-DA model prescription was then achieved for each BZ model by iteratively reducing the number of components and EVs per component, while estimating classification error rates and ROC-AUC statistics. The goal was to achieve classification using 1 or 2 components and as few EVs as possible. Maximum, Euclidean, and Mahalanobis distance metrics were compared for classification on a two-dimensional plane involving two-component solutions. However, visually identified lines separating member sites from non-member sites produced fewer classification errors than these algorithms in each case. EVs used in classification and their respective coefficient values were output for each component for mapping purposes.

Spatial mapping of breeding zones

The goal of this exercise was to produce regional maps of BZs calculated for a sample of 15 planting years in the recent past. A network of 80 prediction locations was created; 40 across the WA estate and 40 across the eastern estate (see Fig. SM4). Priority was given to providing a wide spread of positions across each commercial estate. Locations were also generally chosen where three pre-existing soil pits and three sample collections for soil chemistry could be located within close proximity. Soil EVs for the prediction points were then calculated as the mean of values from these nearby datasets. Notable exceptions were eight sites at the western extent of the WA prediction area for which published chemical information was referenced.

Daily weather data were obtained from SILO for each of these 80 prediction points, covering the first 5 years commencing on July 1 of each year from 2002 to 2017 (15 years). Climate EVs that were determinants of BZ class assignment were calculated. The 2010 planting year in GT was excluded because it featured extremely heavy summer rains in the first year: none of the observed progeny trials were planted in GT in 2010, and the heavy summer rain created an atypical environment for the Y1dry phenological stage. BZ class prediction was then performed for each planting year at each of the 80 prediction locations, using the custom distance functions discovered with sPLS-DA. The proportion of planting years (2002–2017) for which each prediction point was classified to each BZ was summarized and mapped in four quartile classes: 0–25%, 25–50%, 50–75%, and 75–100%. The expected area of each BZ was calculated as the sum of the products of each quartile class mid-point and its respective area, expressed as a percentage of the area in each region.
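The expected-area calculation can be sketched as a small worked example. Two points are assumptions rather than details given in the text: that the upper bound of each quartile class is inclusive, and that the regional total is the sum of the mapped areas across the four classes. All numbers and function names are hypothetical.

```python
import math

# Mid-points of the four classes 0-25%, 25-50%, 50-75%, 75-100%.
CLASS_MIDPOINTS = [0.125, 0.375, 0.625, 0.875]


def quartile_class(proportion):
    """Index (0 to 3) of the class containing a proportion between 0 and 1.

    Assumption: upper class bounds are inclusive, so 0.25 falls in class 0.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return max(0, math.ceil(proportion * 4) - 1)


def expected_area_percent(area_by_class):
    """Expected BZ area as a percentage of the total mapped area.

    area_by_class: mapped area (e.g., ha) in each of the four classes.
    """
    total = sum(area_by_class)
    weighted = sum(m * a for m, a in zip(CLASS_MIDPOINTS, area_by_class))
    return 100.0 * weighted / total


# Hypothetical prediction point assigned to a BZ in 9 of 15 planting years.
assigned = [True] * 9 + [False] * 6
proportion = sum(assigned) / len(assigned)   # 0.6
point_class = quartile_class(proportion)     # class index 2 (50-75%)

# Hypothetical mapped areas (ha) per class for one BZ in one region.
example_expected = expected_area_percent([100.0, 50.0, 30.0, 20.0])
```

Under these assumptions, a zone that is never assigned anywhere in a region would still return 12.5%, because all of its area falls in the lowest class.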

Results

Additive genetic correlations across breeding zones

An FA3 model was first used to represent additive genetic variance and site–site correlations (rB) in stem volume across 48 trials of 126,467 full-sib progeny from two separate breeding programs that were connected by HBLUP. This FA3 model explained 85.9% of additive genetic variation across all 48 sites. The percentage of additive genetic variance accounted for by the FA3 model (\({v}_{{a}_{j}}\)) ranged from 79.6 to 100 for the 33 envirotyped trials, with a mean of 94.5 (Table SM5 and Fig. SM5). The resultant rB matrix (\({{\varvec{\rho}}}_{{\varvec{a}}}\)) displayed six clear site groups, two of which partially overlapped (see lower left corner of Fig. 2). “Core” constituency was assigned to sites for each of 5 BZs, with BZ2 including core site members from two branches of the dissimilarity dendrogram (Fig. 2).
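The quantities reported above can be illustrated with a minimal sketch that assumes the usual factor-analytic parameterization, in which the among-site additive covariance is the product of the loadings matrix with its transpose plus a diagonal matrix of site-specific variances. Whether the correlations in this study were derived from exactly this form is an assumption. The loadings and specific variances below are hypothetical, and two factors are used instead of three for brevity.

```python
import math


def fa_covariance(loadings, specific_vars):
    """Among-site covariance G = L L' + diag(psi) (assumed parameterization)."""
    n = len(loadings)
    g = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
          for j in range(n)] for i in range(n)]
    for i in range(n):
        g[i][i] += specific_vars[i]
    return g


def percent_variance_explained(loadings, specific_vars):
    """Per-site percentage of additive variance captured by the factors."""
    result = []
    for row, psi in zip(loadings, specific_vars):
        common = sum(value * value for value in row)
        result.append(100.0 * common / (common + psi))
    return result


def correlation_matrix(g):
    """Scale a covariance matrix to a correlation matrix."""
    n = len(g)
    return [[g[i][j] / math.sqrt(g[i][i] * g[j][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical loadings (three sites, two factors) and specific variances.
example_loadings = [[0.8, 0.1], [0.6, -0.3], [0.5, 0.5]]
example_psi = [0.0, 0.05, 0.10]

site_vaf = percent_variance_explained(example_loadings, example_psi)
rho = correlation_matrix(fa_covariance(example_loadings, example_psi))
```

A site with zero specific variance, such as the first hypothetical site, has 100% of its additive variance accounted for by the factors, which corresponds to the upper end of the range reported above.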

Fig. 2

Heatmap of \({{\varvec{\rho}}}_{{\varvec{a}}}\) matrix of inferred additive genetic correlations among 48 trial sites from the FA3 model. Breeding zone classes used for training the sPLS-DA models are shown below the diagonal: BZ1 (green perimeters), BZ2 (blue perimeters), BZ3 (red perimeter), BZ4 (gold perimeters), and BZ5 (purple perimeters). The numbers in red indicate “core” constituents of each breeding zone (see the “Methods” section). The hierarchical tree uses average clustering. Diagonal cells with \({{\varvec{\rho}}}_{{\varvec{a}}}=1\) are shown in gray

Supplementary sites were added to four of the five BZs (see process outlined in Table SM6). This resulted in BZ1 with three supplementary sites along with four core sites, BZ2 with seven supplementary sites along with eight core sites, BZ3 with only the nine core sites, BZ4 with three supplementary sites along with seven core sites, and BZ5 with four supplementary sites along with five core sites (Fig. 2).

Characteristics of the major environmental signatures for each breeding zone

A subset of 33 trials with good quality soil data and explained variance in the FA3 analysis was envirotyped using 145 environmental variables (EVs). Sparse PLS-discriminant analysis was used to identify EVs that were required to predict classification of sites into five non-exclusive BZ classes on the basis of rB. The results are described below for each breeding zone (from BZ1 to BZ5). Four BZs were defined with two PLS components each (with one to five EVs contributing to each component), and one BZ was defined with a single PLS component (eight EVs). In each case, the goal was to minimize classification errors, which occur either by classifying a site that was not assigned to a BZ’s training set (i.e., false positive) or by failing to classify a site that was assigned to a BZ’s training set (i.e., false negative).

Site classification to BZ1 was predicted with one false positive and one false negative (ROC-AUC 0.96) using two PLS components (Table 4, Fig. 3A). The first component was composed of two significantly correlated (r = 0.94, p < 0.005) EVs named Y1.5_optH and Y1.5_GDD (Table 5). These EVs describe the number of optimum growing hours and growing degree days, respectively, in the combined establishment phase, first-year dry season, and growth phases of the first 5 years. The second component was composed of two EVs: most significantly SLT100 (PLS coefficient 0.98), which is the silt content at 60–100 cm, and C_N, which is the C:N ratio (PLS coefficient − 0.22; Table 5). Taken together, these EVs describe BZ1 as an envirotype characterized by optimum growing temperatures and siltier soils with greater nitrogen levels.
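To illustrate how the coefficients combine, the sketch below computes a site's score on the second BZ1 component as a weighted sum of EV values. The two coefficients are those quoted above; the standardized EV values, the means and standard deviations, and the function names are hypothetical, and the sketch assumes that EVs are centered and scaled before the coefficients are applied.

```python
# PLS coefficients for the second BZ1 component, as quoted in the text.
BZ1_COMPONENT2 = {"SLT100": 0.98, "C_N": -0.22}


def standardize(value, mean, sd):
    """Center and scale a raw EV value (assumed preprocessing step)."""
    return (value - mean) / sd


def component_score(standardized_evs, coefficients):
    """Weighted sum of standardized EVs for one PLS component."""
    return sum(coef * standardized_evs[name]
               for name, coef in coefficients.items())


# Hypothetical site: silt content one SD above the mean,
# C:N ratio half an SD below the mean.
example_site = {
    "SLT100": standardize(30.0, mean=20.0, sd=10.0),  # 1.0
    "C_N": standardize(17.0, mean=18.0, sd=2.0),      # -0.5
}
example_score = component_score(example_site, BZ1_COMPONENT2)
```

Because the coefficient for C_N is negative, a lower C:N ratio (more nitrogen relative to carbon) raises the score, consistent with the description of BZ1 as having siltier soils with greater nitrogen levels.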

Table 4 Summary of sPLS-DA results classifying sites to each of 5 breeding zones
Fig. 3

sPLS-DA classification plots for A BZ1, B BZ2, C BZ3, and D BZ4. Orange and blue symbols represent sites within and outside each training set, respectively (see Table SM6). Lines represent custom distance metrics (see Table 4)

Table 5 Environmental variables and coefficient values which were found by sPLS-DA to characterize 5 breeding zones

Site classification to BZ2 was predicted with no false assignments (ROC-AUC 0.98) using two PLS components (Table 4, Fig. 3B). The first component was composed of five EVs. Two were soil EVs COL_K (Colwell available K) and C_N (C:N ratio) and three were climate EVs estab_hotH (hours with temp > 35 °C in the establishment phase), Y3_precip_tot (total precipitation in year 3, which is strongly correlated to precipitation in growing periods across the first 3 years), and Y1.3dry_days (total length of dry seasons in the first 3 years; Table 5). The second component was composed of 3 EVs: most significantly two correlated (r = 0.84) EVs Y1.5dry_cumVPD and Y1.5_precip_deficit, which are cumulative vapor pressure deficit and difference between cumulative rainfall and evaporation, respectively, in dry seasons across years 1–5 (Table 5). The third EV contributing to the second component of BZ2 was Y1.3_BIO10; mean temperature of the warmest quarter. Taken together, these EVs describe BZ2 as an envirotype characterized by longer, warmer, and drier summers with better-quality soils (higher K and N) and fewer hot days during establishment. Note that with the classified sites for BZ2 in the lower-right of Fig. 3B, more positive values of the first PLS component and more negative values of the second PLS component will support assignment of an envirotype to BZ2 (relevant for interpreting Table 5).

Site classification to BZ3 was predicted with one false positive and one false negative (ROC-AUC 0.96) using 2 PLS components (Table 4, Fig. 3C). The first component was composed of four EVs. The most significant EV contributing to the first component was 1R_BIN, which is the binary coding differentiating first rotation and second rotation sites (PLS coefficient − 0.95, Table 5). This management EV separates the two clear site groups along the X-axis of Fig. 3C, with second-rotation sites to the right. The three lesser EVs contributing to the first component were the correlated (r =  − 0.81) EVs SLT60 and SND60 (silt content and sand content, respectively, at 30–60 cm) and Y1.3_radn_tot (total radiation in the combined establishment phase, first-year dry season, and growth phases of the first 3 years; Table 5). The second component was composed of four climate EVs: estab_precip_tot (total precipitation during the establishment phase), Y1dry_Tmax_ave (average daily maximum temperature in the first dry season, highly correlated with dry-season maximum temperatures in the first 3 years), Y1dry_days (length of first dry season), and Y1.3_radn_tot (total radiation in the combined establishment phase, first-year dry season, and growth phases of the first 3 years; Table 5). Taken together, these EVs describe BZ3 as an envirotype characterized by predominantly second rotation, somewhat sandier sites, less establishment-period rainfall, and higher temperatures in summer.

Site classification to BZ4 was predicted with one false positive and two false negatives (ROC-AUC 0.96) using 2 PLS components (Table 4, Fig. 3D). The first component was composed of a single EV: Y2.3_precip_cool6mo (precipitation in the coolest 6 months of years 2–3; Table 5). This EV is significantly correlated with other total precipitation EVs (Table SM4), including total precipitation in the first 3 years (r = 0.92). The second component was composed of four climate EVs: estab_hotH (number of hours with temperature > 35 °C during the establishment phase), Y1.3_BIO17 (precipitation of the driest quarter), Y1.3_BIO10 (mean temperature of the warmest quarter), and Y1dry_Tmax_ave (average daily maximum temperature in the first dry season, highly correlated with dry-season maximum temperatures in the first 3 years; Table 5). Taken together, these EVs describe BZ4 as an envirotype characterized by drier winters (and lower overall annual rainfall), cooler at establishment, and somewhat wetter and cooler in summer.

Site classification to BZ5 was predicted with perfect accuracy (ROC-AUC 1.0) using a single PLS component (Table 4). The PLS component was composed of eight EVs, seven of which were related to soils: C_N (C:N ratio), pH_H2O (pH in water), TOT_N (total N), SLT100 (silt content at 60–100 cm), estab_hotH (number of hours with temperature > 35 °C during the establishment phase), SND > 200 (sand content deeper than 200 cm), CLY > 200 (clay content deeper than 200 cm), and S&R_FC > 200 (soil water at field capacity deeper than 200 cm; Table 5). The last three of these EVs are highly inter-related. Taken together, these EVs describe BZ5 as an envirotype characterized by low nitrogen, higher pH, somewhat sandier soils, and higher establishment-period temperatures.

Envirotypes at trial sites

Six progeny trial sites qualified as BZ1 following the sPLS-DA classification (see Table SM7 for details). Four of these were concentrated inland of Albany (Fig. 4A), where the milder temperatures and better-quality soil characteristic of this zone may be expected. One BZ1 site was in the East Albany district and shared BZ classification with zones 2 and 3 (Fig. 4A), and one was inland of Esperance (Fig. 4B). BZ2 was the breeding zone most assigned to progeny trials (15 of 33) and was allocated in all three districts (Albany, Esperance, and GT; Fig. 4). It was the sole breeding zone found in the central part of the GT Region (Fig. 4C). BZs 3, 4, and 5 were assigned to nine trials each. BZ3 was distributed through the drier part of the Albany Region and near the Victoria/South Australia border in the GT Region (Fig. 4A and 4C). Trials in the eastern part of Esperance region were characterized predominantly as BZ5 (Fig. 4B) and trials in the eastern part of the GT Region were characterized as BZ4 (Fig. 4C).

Fig. 4

Maps of progeny trial sites classified into breeding zones using the sPLS-DA classification in A Albany district, WA, B Esperance district, WA, and C Green Triangle region. See Fig. 1 for geographical context

One site in the East Albany district was classified to BZs 1, 2, and 3 concurrently (see intersection in Fig. 5) and 13 trials were classified to two breeding zones concurrently (Fig. 5). Zones 2 and 3 had four sites in common, as did zones 2 and 4 (Fig. 5).

Fig. 5

Venn diagram of trial site classification to 5 breeding zones by the sPLS-DA classification

The envirotype classification system produced within-zone mean rB between 0.76 and 0.84 (Table 6). In contrast, the previous region-based classification system produced within-zone mean rB of 0.62 for WA and 0.67 for GT (Table 6). The distinctions in rB between within- and across-class were also greater for BZs in the envirotype classification system (Table 6).
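The comparison of within-class and across-class rB can be sketched as below. This is an illustrative assumption about how such means could be formed when zone membership is non-exclusive: within-class values are averaged over pairs of sites that share a zone, and across-class values over pairs with one site inside and one outside the zone. The correlation matrix, zone memberships, and function names are hypothetical.

```python
from itertools import combinations


def mean_rb_within(rb, members):
    """Mean rB over all pairs of sites that both belong to the zone."""
    pairs = list(combinations(sorted(members), 2))
    return sum(rb[i][j] for i, j in pairs) / len(pairs)


def mean_rb_across(rb, members):
    """Mean rB between sites inside the zone and sites outside it."""
    outside = [j for j in range(len(rb)) if j not in members]
    values = [rb[i][j] for i in sorted(members) for j in outside]
    return sum(values) / len(values)


# Hypothetical 4-site rB matrix (illustrative only).
example_rb = [
    [1.0, 0.8, 0.9, 0.3],
    [0.8, 1.0, 0.7, 0.4],
    [0.9, 0.7, 1.0, 0.5],
    [0.3, 0.4, 0.5, 1.0],
]

# Non-exclusive membership: site 2 belongs to both hypothetical zones.
zone_a = {0, 1, 2}
zone_b = {2, 3}

within_a = mean_rb_within(example_rb, zone_a)   # mean of 0.8, 0.9, 0.7
across_a = mean_rb_across(example_rb, zone_a)   # mean of 0.3, 0.4, 0.5
within_b = mean_rb_within(example_rb, zone_b)   # single pair, 0.5
```

A larger gap between the within-zone and across-zone means indicates a classification that better separates sites with dissimilar genetic rankings.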

Table 6 Summary of additive genetic correlations (rB) within and between site definitions based on regions and on new envirotypes

Spatial mapping of breeding zones

Mapping of envirotypes showed that BZ1 was commonly found in the high-yielding near-coastal areas west of Albany and less frequently inland and east of Albany (Fig. 6). The occurrence of BZ1 was very uncommon in GT (Fig. 7). BZ2 is the most represented envirotype across both regions. Based on climate between 2002 and 2022, 74% of the estate area in WA and 51% of the area in GT are expected to represent BZ2 (Fig. 7). In WA the areas consistently representing BZ2 are generally in the drier part of the estate (inland), whereas in GT the BZ2 envirotype was widely distributed (Fig. 6).

Fig. 6

Distribution of breeding zones in (left-hand panel) WA and (right-hand panel) GT according to the percentage of incidence in modeled planting years commencing 2002–2017. Note that BZ3 is represented for both first-rotation (1R) and second-rotation (2R) management

Fig. 7

Expected area by breeding zone in WA (red bars) and GT (green bars). “1R”: first rotation, “2R”: second rotation

BZ3 is an envirotype that is relatively rare in WA under recent past climate on 1R sites (expected 11% by area, Fig. 6), but it is more common in WA on 2R sites (expected 42% by area; Fig. 6). In GT, 14% of area is expected to be in the BZ3 envirotype on 1R sites and 62% is expected by area on 2R sites (Fig. 6), making it the most represented envirotype for 2R sites in GT (Fig. 7).

BZ4 is not an expected envirotype in WA, but it is important in GT, especially in the more eastern part of the commercial area (Fig. 6). The total expected area of BZ4 in GT is 38% based on past climate (Fig. 7). BZ5 is relatively rare across WA and GT (up to 20% by expected area; Fig. 7). In WA, it was allocated to areas in the south-west that were also consistently allocated to BZ2 (Fig. 6). In GT, BZ5 was distributed mostly in the western part of the estate, where it overlapped with BZ3 for 2R sites (Fig. 6).

Discussion

This study proposed a detailed environmental classification of breeding zones for E. globulus in two regions of mainland Australia. Previous work with these E. globulus populations has demonstrated the potential of genomic technology to predict genetic values of untested genotypes (Callister et al. 2021). The present results will help to expand our prediction abilities from untested genotypes to untested environments (Crossa et al. 2021). For E. globulus in Australia, the untested environment of greatest interest is the future environment within the existing commercial regions (Pinkard et al. 2015). However, the first hurdle to predicting future climate responses (Fradgley et al. 2023) was to characterize eco-physiologically relevant aspects of the environments in which progeny have already been tested.

Envirotyping of breeding zones helps to understand eco-physiological aspects of adaptation

Challenges to accurate envirotyping for trees compared with agricultural crops include the far deeper soil environment affecting trees and the poorly understood progression of phenological development. Lack of trial-specific EV data with suitable precision has been cited as a limitation to similar studies by Ivković et al. (2015), Gapare et al. (2015), and Dutkowski et al. (2015). The use of direct soil observations wherever possible was an important requirement for accurately identifying the role of soil and landscape EVs in characterizing envirotypes, especially BZ1, BZ2, and BZ5. This highlights the importance of collecting good quality soil data at every genetic trial location, and of carefully positioning trials to avoid crossing boundaries in soil types. The phenological system we applied appears to have been useful, considering the predominance of climate EVs utilizing this system and the generally high levels of BZ classification accuracy produced by the sPLS-DA procedure. Nevertheless, a precise system for defining phenology of E. globulus based on dendrometer and physiological measurements is needed to improve the clarity of climate EVs in future enviromic studies.

Many previous enviromic studies have developed prediction models accounting for GE on continuous variation in EVs (i.e., reaction norms), most notably for annual crops (e.g., Costa-Neto et al. 2022, 2020; Li et al. 2021; Tolhurst et al. 2022). This approach facilitates extrapolation of genetic predictions at unobserved values of EVs and it may offer higher resolution genetic predictions rather than sharply defined boundaries between classes. On the other hand, more advanced treatments of GE in forest trees have classified areas into discrete zones based on EVs (Gapare et al. 2015; Ukrainetz et al. 2018; Yu et al. 2022). It is possible that genetic interactions with environmental attributes are more complex for forest trees than for annual crops, necessitating the inclusion of more EVs than can reasonably be included in a reaction norm model. Another explanation is that annual crop datasets are often much larger than those for forest trees, providing sufficient EV data to model the resolution required for reaction norms.

We took the unusual approach of allowing each site to be classified to multiple zones for training the sPLS-DA model to rB data. Previous GE studies that assigned each site to a single zone based on hierarchical clustering have published rB matrix heatmaps also showing that certain sites were highly correlated to groups outside of their assigned cluster, including for P. radiata (Cullis et al. 2014) and P. contorta (Ukrainetz et al. 2018). The phenomenon of multiple zone membership for a subset of sites may therefore be common to other species and contexts as well as ours, and it makes biological sense that environmental determinants of genetic expression need not be mutually exclusive. For example, if we examine the overlapping site membership between BZ2 and BZ3 we find four sites that meet the requirements of drier summers (BZ2) and lower establishment-period precipitation (BZ3); higher available potassium with lower C:N ratio (BZ2) and second-rotation management (BZ3). For overlapping site membership between BZ2 and BZ4, we find four sites that meet the requirements of drier summers (BZ2) and lower precipitation in the coolest 6 months (BZ4), in other words, drier throughout the year. On the other hand, certain combinations of zones were incompatible, such as BZ5 with BZ1 or BZ2. BZ5 has a positive PLS coefficient for C:N ratio, whereas BZ1 and BZ2 have negative PLS coefficients for C:N (see Table 5). No progeny trial sites were in common between BZ5 and either BZ1 or BZ2 (see Fig. 5), although the far western part of the WA estate did associate strongly with BZ5 and BZ2 due to alignment of other EVs in the classification scheme.

Eucalyptus globulus GE has previously been shown to be influenced by summer precipitation deficit, evaporative demand, and maximum temperature at the subrace level (Costa e Silva et al. 2006), family level (Dutkowski et al. 2015), and in association with genomic markers (Butler et al. 2022). Discrimination along a continuum of summer evaporative demand loosely translates to classification of sites according to BZ2 (drier summers) in our classification system. Dutkowski et al. (2015) also identified minimum month evaporation as an alternative discriminatory EV, and Butler et al. (2022) found isothermality and precipitation to be informative of local adaptation. EVs related to precipitation featured strongly in our zonation, and minimum month evaporation would be related to BIO6 (minimum temperature), which in turn was strongly correlated to optimum growing hours and GDD, both significant to the BZ1 classification. On the other hand, isothermality (BIO3) was not selected by sPLS-DA as a significant EV for classification.

The GE we have observed in these populations could be influenced by interactions with insect herbivores and diseases as well as direct physiological responses to the abiotic environment. The most impactful pests during the first 5 years of these trials are likely to have been Eucalypt weevil (Gonipterus spp.) and autumn gum moth (Mnesampela privata), while the principal disease would have been Teratosphaeria leaf disease (TLD; Teratosphaeria nubilosa and Teratosphaeria cryptica). Higher mean temperature of the warmest quarter (BIO10; characteristic of BZ2) is expected to favor both insect pests, while higher summer rainfall (BIO17; characteristic of BZ4) will favor TLD spore production and growth (Pinkard et al. 2017). It is therefore possible that differential tolerance of these biotic impacts is at least partially responsible for the GE in our E. globulus populations, especially with respect to BZ2 and BZ4.

Technical impacts of BZ mapping for E. globulus breeding

We discovered that BZ3 is more likely to be assigned to a site in second rotation (2R) than in first rotation (1R), which is consistent with Raymond's (2011) finding that prior land use affects GE in P. radiata. It is a conclusion of operational significance, as all the Australian E. globulus improvement programs include a large proportion of 1R trials in breeding value estimation for seed that will be predominantly deployed to 2R sites. Nevertheless, we found that 2R status does not classify a site as solely BZ3. Four 2R sites classified as BZ2 as well as BZ3, one 2R site classified as BZ5 as well as BZ3, and one 2R site classified as BZ1. The 2R management factor could be considered a proxy for EVs relating to initial soil water and nitrogen, both of which are generally reduced in 2R relative to 1R (Mendham et al. 2011).

The range of mean within-envirotype rB from 0.76 to 0.84 reported here (see Table 6) compares favorably with published within-class rB from similar studies: 0.60 to 0.81 by Dutkowski et al. (2015) and 0.67 to 0.83 by Ukrainetz et al. (2018). We are confident that the envirotype classification presented here will produce greater within-zone estimates of genetic gain than the previous region-based classification. Our approach also achieved a reconciliation of data from trials in a district that is no longer part of the commercial estate (Esperance) with the current commercial sites. It is notable that the four eastern-most Esperance sites qualified as BZ5, which is a relatively minor envirotype for ongoing breeding and deployment, so these trials will no longer contribute as significantly to future estimates of breeding value in the EG1 or EG2 programs.

Limitations and future improvements in envirotyping for E. globulus

There are some important limitations to our approach and suggested improvements for implementation subject to future data availability. Envirotype definitions created from the available data excluded explicit influences of pests and diseases, most notably TLD (Andjic et al. 2010), as well as the influence of ground water, especially in the Wattle Range (South Australia). We have possibly overlooked a whole phenological stage corresponding to growth in the warmest 3 months after 2 years that would be triggered by rare summer storms. This study also ignored any effect of age on GE, although all phenotypes used in envirotyping were measured within a relatively narrow range of 3 to 5 years post-establishment, during which age-age genetic correlations are expected to be close to 1.0 (Salas et al. 2014; Stackpole et al. 2010). Although the rB matrix resulting from FA analysis is a logical starting point for defining breeding zones, there is a concern that its elements are estimated with varying precision, depending on the degree of genetic connection between pairs of sites. Future studies could apply variable weights to rB to account for precision as part of the BZ definition procedure. This could be done in the context of a critical examination of the non-exclusive BZ concept, specifically to identify a less subjective basis for adding “supplementary” sites to “core” groups of sites than we have applied. Lastly, we have ignored GE due to differential levels of genetic expression between environments (Li et al. 2017), which is also an important consideration when structuring breeding programs and planning deployment. These avenues of exploration are all open for future review and will be assisted by the acquisition of new EV data and new sites.

Expectations have increased substantially since Barnes et al. (1984) wrote: “It is probably only realistic to expect to detect, explain and use (GE) when a single environmental factor affects an economically important trait in a predictable manner.” The results of our study show conclusively that a multitude of EVs affect GE in E. globulus in a complex, yet predictable manner. These results will be used to immediately reclassify breeding zones for cross-program analyses of these EG1 and EG2 breeding populations, thereby mitigating the loss of realized gains due to deploying families into less suitable environments. The BZ classification system described here is also being used in conjunction with climate change forecasts to predict which envirotypes will be more prominent in the future, to guide selection and breeding strategy.