Field trials and measurements
Ten genetic control-pollinated progeny trials were planted in each of 1996 and 1997 by the Southern Tree Breeding Association, in the “BR” trial series. The trials are listed together with their locations and dates of planting in Table 1, and the map of locations is given in Fig. 1. All trials had replication, incomplete block and row-plot design features, and were represented on a row-column grid for spatial analyses (see below). Assessments of diameter growth at breast height (DBH) at the juvenile stage (i.e. 6–10 years) were used for the analyses.
Table 1 State, region, abbreviation of National Plantation Inventory region (NPI), location coordinates, altitude and planting year for BR1996/1997 trial series
One of the most important issues in undertaking a G × E analysis is the assessment of the degree of genetic connectivity between trials. The concurrence matrix, based on the number of parents in common between trials, is given in Online Resource 1. The number of parent trees represented in each trial ranged from 41 to 177, and the number of parent trees in common between trials ranged from 31 to 165. There are no clear-cut guidelines that determine an acceptable level of connectivity; however, disconnected or poorly connected data sets will not permit reliable estimation of the between-trial covariance.
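The concurrence counts described above can be sketched as follows. This is a minimal illustration, not the procedure used to build Online Resource 1; the trial labels, parent identifiers and the function name `concurrence_matrix` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: trial label -> set of parent identifiers represented in it.
parents_by_trial = {
    "T1": {"P1", "P2", "P3", "P4"},
    "T2": {"P2", "P3", "P5"},
    "T3": {"P1", "P3", "P4", "P6"},
}


def concurrence_matrix(parents_by_trial):
    """Return {(trial_i, trial_j): count}.

    Off-diagonal entries hold the number of parents in common between two
    trials; diagonal entries hold the number of parents in that trial.
    """
    trials = sorted(parents_by_trial)
    counts = {}
    for t in trials:
        counts[(t, t)] = len(parents_by_trial[t])
    for a, b in combinations(trials, 2):
        shared = len(parents_by_trial[a] & parents_by_trial[b])
        counts[(a, b)] = counts[(b, a)] = shared
    return counts


concurrence = concurrence_matrix(parents_by_trial)
```

Small off-diagonal counts relative to the diagonal flag pairs of trials whose genetic covariance will be poorly estimated.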
Environmental variables
Basic climate, geological and soil information for the BR1996/97 trial locations is given in Table 2. Daily climate data for selected locations within Australia were extracted from the SILO Climate Database (http://www.longpaddock.qld.gov.au/silo/index.html). The daily climate data have been constructed from observations at 4,600 locations across Australia for rainfall, maximum and minimum temperatures, evaporation and solar radiation, using spatial interpolation algorithms. A simple monthly aridity index (AIX) was calculated as the ratio of the monthly mean daily pan evaporation rate to the total monthly rainfall. The aridity index is related to water balance, and the monthly minimum AIX value for the most arid quarter was used to rank the sites in terms of aridity. A climate data sequence from planting to DBH assessment was obtained for each trial.
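The aridity index calculation can be illustrated with a short sketch. The monthly ratio follows the definition above; how the "most arid quarter" is chosen is not specified in the text, so the rule used here (the three consecutive months with the largest summed AIX, wrapping from December to January) is an assumption, as are the function names, the treatment of rainless months and the toy data.

```python
def monthly_aridity_index(mean_daily_pan_evap_mm, total_rainfall_mm):
    """AIX for one month: mean daily pan evaporation / total monthly rainfall."""
    if total_rainfall_mm <= 0:
        # Assumption: a rainless month is treated as infinitely arid.
        return float("inf")
    return mean_daily_pan_evap_mm / total_rainfall_mm


def min_aix_most_arid_quarter(aix_by_month):
    """Minimum monthly AIX within the most arid quarter.

    Assumed rule: the most arid quarter is the window of three consecutive
    months (wrapping Dec -> Jan) with the largest summed AIX.
    """
    n = len(aix_by_month)
    windows = [
        [aix_by_month[(start + k) % n] for k in range(3)] for start in range(n)
    ]
    most_arid = max(windows, key=sum)
    return min(most_arid)


# Hypothetical 12-month record for one site (Jan..Dec).
evap = [8.0, 7.5, 6.0, 4.0, 2.5, 2.0, 2.0, 2.5, 3.5, 5.0, 6.5, 7.5]
rain = [20.0, 25.0, 30.0, 50.0, 80.0, 100.0, 100.0, 80.0, 70.0, 50.0, 40.0, 30.0]
aix = [monthly_aridity_index(e, r) for e, r in zip(evap, rain)]
site_aridity = min_aix_most_arid_quarter(aix)
```

Sites can then be ranked by `site_aridity`, with larger values indicating more arid conditions.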
Table 2 Some basic climate, geological, and soil information for locations of BR1996/97 trials: Mean Temperature of Growing Season (Sep-Apr) (TGS), Mean Temperature Driest Month (TDM), Precipitation in Growing Season (Sep-Apr) (PGS), Precipitation Driest Quarter (PDQ), Total Evaporation (TE), Aridity Index (AIX), Parent Rock Code (PRC), Weathering Index (WI), Top soil Bulk Density (BD), Top soil pH
A seamless national coverage of outcrop and surface geology was obtained from Geoscience Australia (http://mapconnect.ga.gov.au/MapConnect/Geology). Soil types were categorised using a technical classification system into 11 categories of parent rock code. This system was developed to group forest sites according to expected volume productivity (Turner et al. 2001). In addition, a high-resolution weathering intensity index for the Australian continent, based on airborne gamma-ray spectrometry and digital terrain analysis, was also obtained (Wilford 2012).
The Australian Soil Resource Information System (ASRIS, www.asris.csiro.au) database was used to obtain information on soils in a consistent format across southern Australia. Information was obtained on soil depth, water storage, permeability, fertility, carbon and erodibility, with most soil information recorded at five depths. Soil profile data with fully characterised sites were also available from forestry organisations and companies.
Statistical analyses
Spatial analyses
Spatial analyses were performed using a two-dimensional separable autoregressive (AR1) model fitted to column-row grid diameter data of each trial using ASReml® (Gilmour et al. 2009). The spatial method partitioned the residual variance into an independent component and a two-dimensional spatially auto-correlated component. An additional random term with one level for each experimental unit was used so that a second (so called “units”) error term was fitted (Dutkowski et al. 2002).
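The residual structure described above (a separable AR1 × AR1 correlated component plus an independent "units" component) can be written out explicitly for a small grid. The sketch below is an illustration in plain Python, not ASReml code; the function names, the plot ordering (rows varying fastest within columns) and the parameter values are assumptions.

```python
def ar1_correlation(n, rho):
    """n x n AR1 correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def spatial_residual_covariance(n_cols, n_rows, rho_col, rho_row,
                                var_spatial, var_units):
    """R = var_spatial * (AR1(rho_col) kron AR1(rho_row)) + var_units * I.

    Plots are assumed to be ordered with rows varying fastest within columns.
    """
    corr = kronecker(ar1_correlation(n_cols, rho_col),
                     ar1_correlation(n_rows, rho_row))
    n = n_cols * n_rows
    return [
        [var_spatial * corr[i][j] + (var_units if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2-column x 3-row trial with illustrative parameter values.
R = spatial_residual_covariance(2, 3, rho_col=0.5, rho_row=0.8,
                                var_spatial=2.0, var_units=1.0)
```

In this toy case the covariance between neighbouring plots in the same column is 2.0 × 0.8, while the diagonal carries both variance components (2.0 + 1.0).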
All terms were fitted in a single model, in which the spatial, extraneous (e.g. assessor) and treatment effects were estimated simultaneously. Diagnostic tools (the variogram and plots of spatial residuals), as recommended by Gilmour et al. (2009), were used to detect extraneous effects. Spatial adjustment to raw data was done for surface sum experimental design (i.e. replicate and plots) and/or spatial variation. Using the adjusted data, each trial was first analysed as univariate (i.e. single-site) in order to estimate the genetic variance components, before proceeding with multi-site analyses.
Statistical model, variance components and genetic parameters
The following reduced (i.e. parental) linear mixed-effects model was used for multi-environment analyses:
$$ y=X\tau +Z{u}_p+Z{u}_f+e $$
where $y$ is a vector of observations formed by stacking the data for each trial $j$, $y = (y_1^T, y_2^T, \ldots, y_j^T)^T$; $\tau$ is a vector of fixed effects (i.e. site mean); $u_p$ is a vector of random additive genetic effects (i.e. female parent); $u_f$ is a vector of random non-additive effects (i.e. full-sib family); and $e$ is a vector of random residual terms. $X$ and $Z$ are known incidence matrices relating the observations in $y$ to the effects in $\tau$, and in $u_p$ and $u_f$, respectively. The random effects in the model were assumed to follow a multivariate normal distribution with means and variances defined by:
$$ {u}_p\sim N\left(\mathbf{0},{\sigma}_p^2A\right),{u}_f\sim N\left(\mathbf{0},{\sigma}_f^2I\right)\mathrm{and}\;e\sim N\left(\mathbf{0},{\sigma}_e^2I\right) $$
where $\mathbf{0}$ is a null vector; $A$ is the numerator relationship matrix, which describes the additive genetic relationships among individual genotypes; $I$ is an identity matrix, with order equal to the number of full-sib families or number of trees; $\sigma_p^2$ is the additive parental variance; $\sigma_f^2$ is the non-additive variance between full-sib families; and $\sigma_e^2$ is the residual variance.
For all models, Restricted Maximum Likelihood (REML) variance and covariance estimates derived using the ASReml program (Gilmour et al. 2009) were constrained to fall within the theoretically possible range; variance component estimates were constrained to be greater than zero. To test the significance of each variance component, two separate analyses were run: a full model that included the term and a reduced model without it. The difference between their 2 × log-likelihoods (i.e. the likelihood ratio statistic) was tested against a chi-squared distribution with one degree of freedom to obtain p values.
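The likelihood ratio test described above reduces to a short calculation once the two log-likelihoods are available. The sketch below is illustrative only: the function name and the log-likelihood values are hypothetical, and it uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def lrt_p_value(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p value against chi-squared with 1 df."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Guard against tiny negative values caused by convergence tolerance.
    stat = max(stat, 0.0)
    # Survival function of chi-squared (1 df): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical REML log-likelihoods for a full and a reduced model.
stat, p_value = lrt_p_value(-1000.0, -1001.920729410347)
```

With these illustrative values the statistic is about 3.84, which corresponds to a p value of about 0.05.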
Individual heritability ($h^2$) and proportion of dominance variance ($d^2$) were calculated following Costa e Silva et al. (2004):
$$ {h}^2=\frac{\sigma_a^2}{\sigma_P^2}=\frac{4\times {\sigma}_p^2}{2\times {\sigma}_p^2+{\sigma}_f^2+{\sigma}_e^2} $$
$$ {d}^2=\frac{\sigma_d^2}{\sigma_P^2}=\frac{4\times {\sigma}_f^2}{2\times {\sigma}_p^2+{\sigma}_f^2+{\sigma}_e^2} $$
where $\sigma_a^2$ is the additive genetic variance component (4 × general combining ability), $\sigma_d^2$ is the non-additive variance component (4 × specific combining ability), $\sigma_P^2$ is the phenotypic variance, and the other parameters are as defined previously. Standard errors of genetic correlations were derived based on a Taylor series approximation using the R pin function (White 2013). Considering regionalisation of breeding in southern Australia, the numerator of heritability becomes $\sigma_a^2 + \sigma_{a\times r}^2$, as opposed to only $\sigma_a^2$ with no regionalisation, where $\sigma_{a\times r}^2$ is the genotype by region variance (Atlin et al. 2000).
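As a numerical check on the $h^2$ and $d^2$ formulas above, the following sketch evaluates them for one set of variance components. The function name and the variance values are hypothetical and chosen only to give round numbers; the regionalised numerator is not included.

```python
def heritability_and_dominance(var_p, var_f, var_e):
    """Return (h2, d2) from parental, full-sib family and residual variances.

    h2 = 4 * var_p / (2 * var_p + var_f + var_e)
    d2 = 4 * var_f / (2 * var_p + var_f + var_e)
    """
    phenotypic = 2.0 * var_p + var_f + var_e
    h2 = 4.0 * var_p / phenotypic
    d2 = 4.0 * var_f / phenotypic
    return h2, d2


# Hypothetical variance components (same units as DBH squared).
h2, d2 = heritability_and_dominance(var_p=1.0, var_f=0.5, var_e=17.5)
```

Here the phenotypic variance is 2 × 1.0 + 0.5 + 17.5 = 20, giving $h^2 = 0.2$ and $d^2 = 0.1$.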
The site-site genetic correlations were confounded with the age-age correlations, but the assumption was that the between-site genetic correlations were not significantly influenced by the age of assessments, because the age-age correlations for a narrow range of ages are usually very high (e.g. Li and Wu 2005).
Extended factor analyses
The availability of closely related trees planted across a wide range of environments provided genetic links and allowed estimation of across-site variance and covariance components. Flexible (i.e. unstructured) covariance can account for both scale and rank interactions, but with many environments, the estimation of an unstructured covariance matrix is not feasible (Cullis et al. 2014). Mixed model analysis with an extended factor analytic (XFA) variance structure for the G × E effects and separate variance for the errors for each trial was used here (Gilmour et al. 2009). The XFAk form is a sparse formulation that requires an extra k levels to be inserted into the mixed model equations for the k factors.
The reduced (i.e. parental) mixed model with an extended factor analytic structure was used to efficiently model the variance structure of the additive (and non-additive) G × E effects (e.g. Beeck et al. 2010):
$$ var\left({\boldsymbol{u}}_p\right)=\boldsymbol{A}\otimes {\boldsymbol{G}}_{tp} $$
$$ var\left({\boldsymbol{u}}_f\right)=\boldsymbol{I}\otimes {\boldsymbol{G}}_{tf} $$
$$ {\boldsymbol{G}}_{ts}={\boldsymbol{\varLambda}}_{ts}{\boldsymbol{\varLambda}}_{ts}^{\boldsymbol{T}}+{\boldsymbol{\psi}}_{ts},s=p,f $$
where $\boldsymbol{A}$ is the relationship matrix with 1 + inbreeding coefficient on the diagonal and coefficient of parentage as off-diagonal elements; $\boldsymbol{G}_{tp}$ is the additive genetic variance-covariance matrix for parental effects across the $t$ trials; $\boldsymbol{G}_{tf}$ is the non-additive genetic variance-covariance matrix for full-sib family effects across the $t$ trials; $\boldsymbol{I}$ is an identity matrix; $\boldsymbol{\varLambda}_{ts}$ is a $t \times k_s$ matrix of trial loadings ($t$ being the number of trials and $k_s$ the number of factors included in the model for effect $s$); and $\boldsymbol{\psi}_{ts} = \mathrm{diag}(\psi_j)$, where $\psi_j$ is the specific variance for the $j$th trial.
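To make the factor analytic structure concrete, the sketch below builds a between-trial covariance matrix from loadings and specific variances, and converts it to the genetic correlation matrix used later for clustering. It is a plain-Python illustration under assumed values, not output from the fitted model; the function names and the one-factor toy loadings are hypothetical.

```python
import math


def xfa_covariance(loadings, specific_variances):
    """G = Lambda Lambda^T + Psi for t trials and k factors.

    loadings: t x k list of lists; specific_variances: length-t list.
    """
    t = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (specific_variances[i] if i == j else 0.0)
            for j in range(t)
        ]
        for i in range(t)
    ]


def covariance_to_correlation(g):
    """Scale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(g[i][i]) for i in range(len(g))]
    return [
        [g[i][j] / (sd[i] * sd[j]) for j in range(len(g))]
        for i in range(len(g))
    ]


# Hypothetical one-factor model (k = 1) for three trials.
loadings = [[3.0], [4.0], [1.0]]
specific_variances = [7.0, 9.0, 3.0]
G = xfa_covariance(loadings, specific_variances)
genetic_correlations = covariance_to_correlation(G)
```

With these toy values the trial variances are 16, 25 and 4, and the genetic correlation between the first two trials is 12 / (4 × 5) = 0.6.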
Heat map and hierarchical clustering
Cullis et al. (2010) recommended tools for exploring G × E interaction based on a factor analytic model. A heat map representation of the estimated genetic correlation matrix was used, with rows and columns ordered appropriately. A useful ordering is obtained by cluster analysis of trials. The cluster analysis used dissimilarity computed as $1-\rho_{ij}$ (where $\rho_{ij}$ is the estimated genetic correlation between trials $i$ and $j$) as a measure of distance between trials, and a hierarchical clustering algorithm (i.e. the complete linkage method with a Euclidean distance measure), as implemented in the hclust function in R (R Development Core Team 2011). The relationship between site clustering and environment was examined by correlating site loadings for XFA1 and XFA2 with climate and soil variables (Costa e Silva et al. 2006).
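The clustering step can be illustrated with a small pure-Python version of complete linkage applied directly to the $1-\rho_{ij}$ dissimilarities. This is a sketch of the technique, not a substitute for R's hclust; the function names, trial labels and correlation values are hypothetical, and using $1-\rho_{ij}$ itself as the distance between trials is an assumption about how the dissimilarity was supplied.

```python
def correlation_to_dissimilarity(corr):
    """Dissimilarity between trials i and j computed as 1 - rho_ij."""
    return [[1.0 - r for r in row] for row in corr]


def complete_linkage(dissimilarity, labels):
    """Agglomerative clustering with complete linkage.

    Returns a list of (sorted member labels, merge height), one per merge.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Complete linkage: distance is the largest pairwise value.
                d = max(dissimilarity[i][j]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters[a] + clusters[b]
        merges.append((sorted(labels[i] for i in members), d))
        clusters[a] = members
        del clusters[b]
    return merges


# Hypothetical genetic correlations between four trials.
labels = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
merges = complete_linkage(correlation_to_dissimilarity(corr), labels)
```

The merge order gives the trial ordering for the heat map: trials that join at low heights are placed next to each other.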
Principal component analysis (PCA) ordination was used to plot the variables of the environmental data matrix in two-dimensional representations. The BiplotGUI package provides a graphical user interface for the construction and manipulation of ordination plots in R (La Grange et al. 2009). The trials are represented as points and climate variables as axes, with coordinates on principal component scales.