Environment characterisation for the interpretation of environmental effect and genotype × environment interaction
Cite this article as: Lacaze, X. & Roumet, P. Theor Appl Genet (2004) 109: 1632. doi:10.1007/s00122-004-1786-6
Increasing attention is being paid to environment characterisation as a means of identifying the environmental factors determining grain protein content (GPC) in durum wheat. New insights in crop physiology and agronomy have led to the development of crop simulation models. Those models can reconstruct plant development for past cropping seasons. One major advantage of these models is that they can also indicate the intensity of limiting factors affecting plants during particular developmental stages. The main environmental factors determining GPC in durum wheat can be investigated by introducing the intensity of limiting factors into genotype × environment (G×E) models. In our case, limiting factors corresponding to water deficit and nitrogen availability were calculated for the development period between booting and heading. These variables were then introduced into a clustering model. This model is an extension of factorial regression applied to discrete environment and genotypic variables. This procedure effectively described the environment main effect: around 30.9% of the sum of squares of the environment main effect was accounted for, using less than 33% of the degrees of freedom. It also partially accounted for G×E interaction. Our methodology, coupling the use of crop simulation and G×E analysis models, is of potential value for improving our understanding of the main development stages and identification of environmental limiting factors for the development of GPC.
Genotypes grown in multi-environment trials may react differently to a range of climatic conditions, soil characteristics or technical practices. These differential responses of genotypes in different environments are known collectively as the genotype × environment (G×E) interaction.
Formal ANOVAs demonstrate the existence of G×E interaction but do not provide sufficient information for analysis of the differences in response of individual genotypes to each environment. Several approaches and models have been developed for analysis and interpretation of the G×E interaction. Yates and Cochran (1938) and Finlay and Wilkinson (1963) were among the first to propose a regression-based procedure for relating genotypic performance to an environmental index. This approach resulted in the classification of genotypes into three classes: varieties well adapted to favourable environments, varieties adapted to unfavourable environments and genotypes that are non-specific or display intermediate levels of adaptation. One major problem with this type of regression analysis is that the proportion of the interaction accounted for by the heterogeneity of regression is generally low. Furthermore, environments may be considered favourable or unfavourable, but these notions provide no useful information for biological interpretation of the G×E interaction.
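The joint regression idea can be illustrated with a minimal sketch, assuming a complete genotype-by-environment table. The environmental index is taken here as the environment mean minus the grand mean, and each genotype is regressed on that index. The function name and the GPC values are hypothetical and are not data from this study.

```python
from statistics import mean

def joint_regression_slopes(table):
    """table[i][j] = performance of genotype i in environment j (complete table)."""
    n_env = len(table[0])
    grand = mean(v for row in table for v in row)
    # Environmental index: environment mean minus grand mean.
    env_index = [mean(row[j] for row in table) - grand for j in range(n_env)]
    sxx = sum(e * e for e in env_index)
    slopes = []
    for row in table:
        g_mean = mean(row)
        # Least-squares slope of the genotype's values on the environmental index.
        sxy = sum(e * (y - g_mean) for e, y in zip(env_index, row))
        slopes.append(sxy / sxx)
    return env_index, slopes

# Hypothetical GPC values (%), 3 genotypes x 4 environments (illustration only).
gpc = [
    [12.0, 13.0, 14.0, 15.0],
    [12.5, 13.0, 13.5, 14.0],
    [11.5, 13.0, 14.5, 16.0],
]
env_index, slopes = joint_regression_slopes(gpc)
```

A slope above 1 would point to a genotype that responds strongly to favourable environments, a slope below 1 to one that is better adapted to unfavourable environments, and a slope near 1 to intermediate adaptation.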
Multiplicative models provide a more advanced approach to the analysis of G×E interaction. First formalized by Gollob (1968) and Mandel (1969), these models make it possible to quantify the specific adaptation of one genotype to one environment. The interaction term is broken down into a sum of products involving genotype and environment parameters. These parameters are estimated by applying a procedure analogous to principal component analysis to the residuals of the additive model, thereby summarising the G×E matrix in two dimensions. The additive main effects and multiplicative interaction [(AMMI), Gauch 1992; Vargas et al. 1999] procedure provides useful information for the analysis of genotypic and environmental stability. As with joint regression, the analysis is performed a posteriori and provides no explanation of the origins of G×E interaction. Interpretation of the results obtained with multiplicative models requires the identification of external information matched to predefined groups of genotypes or environments. Environmental variables can then be added to an AMMI biplot (Reynolds et al. 2002).
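The decomposition of the interaction into multiplicative terms can be sketched as follows, assuming a complete table with one value per genotype and environment. The residuals of the additive model are obtained by double centring, and the first multiplicative term is extracted by power iteration, used here as a dependency-free stand-in for a singular value decomposition. The function names and the data are hypothetical.

```python
def double_centre(table):
    """Residuals of the additive model: y_ij - genotype mean - environment mean + grand mean."""
    n_g, n_e = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_g * n_e)
    g_mean = [sum(row) / n_e for row in table]
    e_mean = [sum(row[j] for row in table) / n_g for j in range(n_e)]
    return [[table[i][j] - g_mean[i] - e_mean[j] + grand for j in range(n_e)]
            for i in range(n_g)]

def first_multiplicative_term(resid, n_iter=500):
    """Power iteration for the leading singular value and score vectors of resid."""
    n_g, n_e = len(resid), len(resid[0])
    v = [1.0 / (j + 1) for j in range(n_e)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        u = [sum(resid[i][j] * v[j] for j in range(n_e)) for i in range(n_g)]
        w = [sum(resid[i][j] * u[i] for i in range(n_g)) for j in range(n_e)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no interaction left to decompose
        v = [x / norm for x in w]
    u = [sum(resid[i][j] * v[j] for j in range(n_e)) for i in range(n_g)]
    sigma = sum(x * x for x in u) ** 0.5
    geno_scores = [x / sigma for x in u] if sigma else u
    return sigma, geno_scores, v

# Hypothetical GPC table (%), 3 genotypes x 3 environments (illustration only).
gpc = [[11.0, 10.0, 12.0],
       [11.0, 12.0, 13.0],
       [11.0, 14.0, 14.0]]
resid = double_centre(gpc)
sigma, geno_scores, env_scores = first_multiplicative_term(resid)
ss_interaction = sum(x * x for row in resid for x in row)
share_mt1 = sigma ** 2 / ss_interaction  # share of interaction SS held by the first term
```

Plotting the genotype and environment scores against each other gives the kind of biplot on which external variables can later be overlaid.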
If information concerning external environmental (or genotypic) variables, such as meteorological data, earliness or time to flowering, is available, these variables may be correlated to or regressed on the environmental scores, or used directly in factorial regression (Denis 1989), biadditive factorial regression (Denis 1991) and clustering models (Denis and Vincourt 1982). Factorial regression models are usually linear models accounting for G×E interaction by differential cultivar sensitivity to explicit external environmental variables. The influence of these external variables on G×E interaction can be tested statistically. Main additive effects and/or interaction terms are regressed on environmental characteristics (Brancourt-Hulmel et al. 2001; Foucteau et al. 2001). Factorial regression approaches can be carried out with software such as the INTERA package (Decoux and Denis 1991). The relevance of the regression variables depends on the percentage of the sum of squares of main effects and interaction terms explained by these variables and the number of degrees of freedom involved in the analysis (Brancourt-Hulmel et al. 1997). Biadditive factorial regression uses linear combinations of environmental factors as regressors on G×E interaction (Brancourt-Hulmel and Lecomte 2003). Clustering analysis is based on the same principles as factorial regression, except that the environmental (or genotypic) covariates are not continuous, but are instead discrete variables, related to classes of intensity of these variables. Although clustering models and factorial regression may potentially provide a biological interpretation of the environment main effect and G×E interaction, the efficiency with which these analyses explain the interaction depends on the nature of the covariates (Desclaux 1996).
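As a minimal sketch of factorial regression with a single environmental covariate, the code below regresses each genotype's interaction residuals on a centred covariate and reports the share of the interaction sum of squares explained. It assumes the residuals are already double centred. The function name, the residuals and the covariate values are hypothetical.

```python
def factorial_regression(resid, covariate):
    """Per-genotype slopes of interaction residuals on one environmental covariate."""
    n_e = len(covariate)
    z_bar = sum(covariate) / n_e
    z = [c - z_bar for c in covariate]            # centred covariate
    szz = sum(x * x for x in z)
    # One sensitivity coefficient per genotype (least-squares slope through the origin).
    slopes = [sum(r * x for r, x in zip(row, z)) / szz for row in resid]
    ss_total = sum(v * v for row in resid for v in row)
    ss_model = szz * sum(s * s for s in slopes)   # SS of the fitted values slope_i * z_j
    return slopes, ss_model / ss_total

# Hypothetical double-centred interaction residuals (3 genotypes x 3 environments).
resid = [[1.0, -1.0, 0.0],
         [0.0, 0.0, 0.0],
         [-1.0, 1.0, 0.0]]
# Hypothetical covariate, e.g. a cumulative water deficit in mm for each environment.
water_deficit = [0.0, 30.0, 60.0]
slopes, share_explained = factorial_regression(resid, water_deficit)
```

In this toy case the covariate uses 2 of the 4 interaction degrees of freedom (one slope per genotype, with the slopes summing to zero) and accounts for a quarter of the interaction sum of squares.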
The covariates can be collected on special genotypes chosen for their known reaction to environmental factors. Data collected on well-known genotypes, ‘the probe genotype approach’, avoid the necessity for the direct measurement of environmental data and generate covariates corresponding directly to the factors limiting plant development (Brancourt-Hulmel et al. 2000; Desclaux 1996).
Recent decades have seen a number of new developments in crop physiology and agronomy, and some integrated approaches to crop simulation modelling have been developed. Such approaches make it possible to reconstruct the plant development cycle and to determine whether a stress occurred at any given development stage. This approach can be extended to a number of past cropping seasons. Nevertheless, this promising approach requires the collection of meteorological data and their integration into crop simulation models. There is currently no example of use of information derived from crop simulation models to analyse G×E interaction.
The aim of this study was to develop a methodology, using information derived from crop simulation and G×E models, to account for variation in the grain protein content (GPC) of durum wheat varieties evaluated in a multi-environment network.
We aim to show how information derived from crop simulation models can be used to interpret both environmental effect and the adaptation of a given genotype to specific environments. We illustrate this method using durum wheat GPC data.
Materials and methods
The data set was obtained from the French ‘Comité Technique et Permanent de la Sélection’ network; it included a total of 111 site-by-year combinations and 48 genotypes evaluated over 8 years. The study sites were located between 43.3°N and 45.4°N and at longitudes of −0.57° to 5.9°. Each genotype was tested in at least two cropping seasons (i.e. a biennial network) from 1992/1993 to 1998/1999. Only two varieties, Ardente and Néodur, were grown in every environment in all 8 years. In each biennial network, 6–12 varieties were tested over 11–21 environments. If a site was present in a given network in two consecutive years, it was considered as two distinct sites.
Each trial consisted of two replicates, with a plot size of approximately 10 m² for each genotype. Each trial was treated with fungicides. The experimental design was either a split plot or a crisscross design. The traits measured in each trial plot were heading date, grain yield, thousand kernel weight, kernel weight per square metre and GPC. In the field, lines were considered headed when 50% of the spikes had emerged from the flag leaf. GPC was determined by Kjeldahl’s method. We used a 1-g sample taken from the 3 kg of grain harvested in bulk for N analysis.
The data collected for each environment included climatic data (daily temperature, rainfall), cultural practices (fertilisation and irrigation) and soil characteristics [texture, depth and useful water reserve (UWR)].
Plant nitrogen nutrition and development stage
The booting-to-heading stage has been reported to be one of the most critical phenological phases for grain development. Nitrogen uptake increases dramatically during this period to reach a maximum accumulation rate (Gate 1995). We therefore focused our attention on this period of the plant cycle.
As only heading date was recorded, a thermal index (expressed in degree days) and a vernalophotothermal index (Gate 1995) were used to estimate the booting stage and to calculate the date and duration of the booting–heading period.
The duration of the booting-to-heading stage was simulated for each environment. Sowing and heading dates were available for all trials, whereas the booting stage was estimated according to the crop phenological schedule described by Gate (1995). As the mean heading date for the variety Néodur was close to the mean for the series of varieties (less than 1 day away from the mean, with a standard deviation of 1.1 days), this variety was used to characterise plant phenology at each site. Therefore, growth requirements for the variety Néodur were used for these simulations.
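The thermal part of this estimation can be sketched as a backward accumulation of degree days from the observed heading date. This is only an illustration: the base temperature of 0 °C, the degree-day requirement and the function names are assumptions, not values taken from Gate (1995), and the vernalophotothermal component is not modelled.

```python
def degree_days(t_min, t_max, base=0.0):
    """Daily thermal time from min and max temperature (base temperature is assumed)."""
    return max(0.0, (t_min + t_max) / 2.0 - base)

def estimate_booting_index(daily_temps, heading_index, requirement):
    """Walk back from heading until `requirement` degree days are accumulated.

    daily_temps: list of (t_min, t_max) tuples, one per day.
    Returns the index of the estimated booting day, or None if the series is too short.
    """
    total = 0.0
    for day in range(heading_index - 1, -1, -1):
        t_min, t_max = daily_temps[day]
        total += degree_days(t_min, t_max)
        if total >= requirement:
            return day
    return None

# Hypothetical weather series: ten identical days at 5-15 degrees C (10 degree days each).
temps = [(5.0, 15.0)] * 10
heading_day = 8
booting_day = estimate_booting_index(temps, heading_day, requirement=30.0)
duration = heading_day - booting_day  # length of the booting-to-heading period in days
```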
Climatic covariates were calculated for the booting-to-heading stage in each environment.
Each location was characterised climatically, using data from the nearest meteorological station (less than 20 km away) for temperature, radiation, rainfall and potential evapotranspiration (PET). The difference between maximum evapotranspiration (MET) and real evapotranspiration (RET) provided a daily estimate of water deficit (WD): WD = MET − RET. In this formula, MET = kc × PET, where kc, the plant development index, was fixed at 1.2 for the booting-to-heading stage. RET was calculated as a function of MET and of the water supply (WS) available to the plant. Water supply was estimated from the UWR and precipitation (P): WS(t) = UWR(t−1) + P, where UWR(t−1) is the UWR at time t−1. Based on UWR values, the amount of available moisture (AM) was calculated and compared with WS. If WS(t) > AM, no WD occurs and RET(t) = MET(t). Conversely, when WS(t) < AM, RET(t) = MET(t) × WS(t)/AM (Gate 1995). In these calculations, according to Soltner (2000), we assumed that AM is equivalent to 60% of the UWR.
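A daily water balance of this kind can be sketched as below. Several points are assumptions made for the illustration rather than details given in the text: the deficit is taken as the positive shortfall MET − RET, RET is scaled by WS/AM when supply falls below AM, AM is computed from the full reserve capacity, and the reserve is updated as supply minus RET, bounded between zero and capacity. All names and input values are hypothetical.

```python
def daily_water_deficit(pet, rain, uwr_capacity, uwr_start, kc=1.2, am_fraction=0.6):
    """Return the list of daily water deficits (mm) over a period.

    pet, rain: daily potential evapotranspiration and precipitation (mm).
    uwr_capacity: useful water reserve of the soil (mm); uwr_start: reserve on day 0.
    """
    am = am_fraction * uwr_capacity        # available moisture threshold (assumed fixed)
    uwr = uwr_start
    deficits = []
    for pet_t, p_t in zip(pet, rain):
        met = kc * pet_t                   # maximum evapotranspiration
        ws = uwr + p_t                     # WS(t) = UWR(t-1) + P
        ret = met if ws >= am else met * ws / am
        deficits.append(met - ret)         # deficit as a positive shortfall (assumed sign)
        # Assumed reserve update: what is left after RET, bounded by 0 and capacity.
        uwr = max(0.0, min(uwr_capacity, ws - ret))
    return deficits

# Hypothetical three-day dry spell on a soil with a 100 mm reserve.
wet_soil = daily_water_deficit([5.0, 5.0, 5.0], [0.0, 0.0, 0.0], 100.0, uwr_start=100.0)
dry_soil = daily_water_deficit([5.0, 5.0, 5.0], [0.0, 0.0, 0.0], 100.0, uwr_start=30.0)
cumulative_wd = sum(dry_soil)
```

Summing the daily values over the booting-to-heading period gives the cumulative deficit on which a site classification can be based.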
Experimental sites were classified according to the intensity of WD occurring during the booting-to-heading stage. Three classes of WD, representing a gradient of water stress intensity, were defined—locations with no WD during the booting-to-heading stage, locations with a cumulative WD of 0–40 mm and experimental sites with a WD greater than 40 mm.
The availability of nitrogen to the plants was also considered. The application of nitrogen during the booting-to-heading stage was described by the variable SOLUB. SOLUB is 0 if no nitrogen was given during this period or 1 if at least one nitrogen application was performed.
A synthetic variable SOLWD was generated from the variables WD and SOLUB to simultaneously take into account water and nitrogen status during the booting-to-heading stage. From the three WD groups and the two SOLUB groups, we generated six SOLWD groups by subdividing the groups generated for WD and SOLUB.
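The construction of the three covariates can be sketched in a few lines. The class boundaries follow the text (no deficit, up to 40 mm, more than 40 mm), whereas the integer labels 0 to 5 for the six SOLWD groups and the function names are arbitrary choices made for this illustration.

```python
def wd_class(cumulative_wd_mm):
    """0 = no water deficit, 1 = deficit up to 40 mm, 2 = deficit above 40 mm."""
    if cumulative_wd_mm <= 0.0:
        return 0
    if cumulative_wd_mm <= 40.0:
        return 1
    return 2

def solub_class(nitrogen_applied):
    """1 if at least one nitrogen application occurred between booting and heading."""
    return 1 if nitrogen_applied else 0

def solwd_class(cumulative_wd_mm, nitrogen_applied):
    """Cross the three WD classes with the two SOLUB classes (labels 0..5 are arbitrary)."""
    return wd_class(cumulative_wd_mm) * 2 + solub_class(nitrogen_applied)

# Hypothetical environments: (cumulative WD in mm, nitrogen applied during the period).
environments = [(0.0, False), (25.0, True), (55.0, False), (55.0, True)]
labels = [solwd_class(wd, n) for wd, n in environments]
```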
We began by using the Néodur and Ardente data set. An ANOVA (GPC = year + region + variety) was performed to compare the magnitude of the factors ‘year’ (eight levels) and ‘region’ (two levels, southeast and southwest). This analysis was carried out using the SAS procedure GLM (SAS Institute 1996).
Secondly, for each biennial data set, we fitted a fixed ANOVA model using INTERA software (Decoux and Denis 1991). All analyses were conducted with two primary effects, genotype, and environment. The residual term included a G×E interaction, because data for a given variety were not repeated at a given location.
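For a complete table with one value per genotype and environment, the additive model and its residual term can be sketched as follows. This is an illustration of the partition of sums of squares only and not the INTERA implementation. The function name and the data are hypothetical.

```python
def additive_anova(table):
    """Two-way additive ANOVA without replication: the residual contains the G x E interaction."""
    n_g, n_e = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_g * n_e)
    g_mean = [sum(row) / n_e for row in table]
    e_mean = [sum(row[j] for row in table) / n_g for j in range(n_e)]
    ss_g = n_e * sum((m - grand) ** 2 for m in g_mean)
    ss_e = n_g * sum((m - grand) ** 2 for m in e_mean)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    # Each entry is (sum of squares, degrees of freedom).
    return {
        "genotype": (ss_g, n_g - 1),
        "environment": (ss_e, n_e - 1),
        "residual": (ss_total - ss_g - ss_e, (n_g - 1) * (n_e - 1)),
    }

# Hypothetical GPC table (%), 3 genotypes x 3 environments (illustration only).
gpc = [[11.0, 10.0, 12.0],
       [11.0, 12.0, 13.0],
       [11.0, 14.0, 14.0]]
anova = additive_anova(gpc)
```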
In the clustering model, s and t refer to the group levels of genotypes and environments, respectively. The indices b and w stand for ‘between’ and ‘within’ groups. Finally, i refers to the ith genotype and j to the jth environment. The additive model is included in this model as $\alpha_i = \alpha_s^{b} + \alpha_i^{w}$ and $\beta_j = \beta_t^{b} + \beta_j^{w}$.
The between (b) and within (w) components are readily comprehensible. The term $\alpha_s^{b}$ represents the variation between groups, its level being constant within a group, whereas $\alpha_i^{w}$ only accounts for variation within groups. The mean of $\alpha_i^{w}$ was zero.
Finally, to recover the full interaction model, we add the term $\theta_{ij} = \theta_{st}^{bb} + \theta_{sj}^{bw} + \theta_{it}^{wb} + \theta_{ij}^{ww}$, where $\theta^{bb}$ is the interaction term between groups of genotypes and groups of environments and $\theta^{ww}$ is the residual variation. The $\theta^{bw}$ and $\theta^{wb}$ terms are less readily comprehensible. They correspond to the interaction between groups of genotypes within one group of environments and vice versa. The efficiency of a model is defined as the ratio of the percentage of the sum of squares of main effects and interaction explained by the model to the percentage of degrees of freedom used. This analysis was performed for either the environment main effect or the residual term including both a pure error term and G×E interaction (software INTERA, Decoux and Denis 1991).
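The efficiency ratio can be illustrated for the environment main effect with a minimal sketch. It assumes that every environment carries the same weight, so that the between-group sum of squares can be computed directly from environment effects. The function name, the environment values and the group labels are hypothetical.

```python
def clustering_efficiency(env_effects, groups):
    """Percentage of environment SS explained by the grouping, percentage of
    degrees of freedom used, and their ratio (the efficiency)."""
    n = len(env_effects)
    overall = sum(env_effects) / n
    ss_total = sum((b - overall) ** 2 for b in env_effects)
    by_group = {}
    for b, g in zip(env_effects, groups):
        by_group.setdefault(g, []).append(b)
    # Between-group SS: group size times squared deviation of the group mean.
    ss_between = sum(len(v) * (sum(v) / len(v) - overall) ** 2
                     for v in by_group.values())
    pct_ss = 100.0 * ss_between / ss_total
    pct_df = 100.0 * (len(by_group) - 1) / (n - 1)
    return pct_ss, pct_df, pct_ss / pct_df

# Hypothetical environment means of GPC (%) and hypothetical WD class of each environment.
env_means = [12.0, 13.0, 14.0, 15.0, 15.0, 15.0]
wd_groups = [2, 2, 1, 1, 0, 0]
pct_ss, pct_df, efficiency = clustering_efficiency(env_means, wd_groups)
```

An efficiency above 1 means that the grouping explains a larger share of the variation than the share of degrees of freedom it consumes.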
Nitrogen was applied to the crop during the booting-to-heading stage (variable SOLUB) in only 22 of the 111 environments. This cultural practice was not observed in the first two biennial networks and was rare in the 1994/1995 network (12% of the environments). In other networks, it occurred at 35–50% of the sites, and the amount of N fertiliser applied was between 30 kg/ha and 60 kg/ha.
ANOVA for grain protein content (GPC) of Ardente and Néodur over 111 site-by-year combinations, tabulated by source of variation (TSS, total sum of squares explained by the factor)
Clustering analysis for GPC of Ardente and Néodur over 111 site-by-year combinations, reported as SS env (%) and SS tot (%) (SOLUB, nitrogen availability; WD, water deficit; SOLWD, combined availability of water and nitrogen)
Biennial network analysis
GPC variation was similar to that reported previously, ranging from 9.3% to 20%, with a mean of 13.5%. Mean GPC for all varieties was highest in 1997, reaching 15.7%, and lowest in 1994 (12.6%).
ANOVA for genotypes evaluated on biennial networks, tabulated by source of variation (SS, sum of squares)
ANOVA for genotypes evaluated on biennial networks, tabulated by source of variation. Results are expressed as a percentage of total variation explained
Effect of environmental factors on GPC for all sites and all genotypes (SD, standard deviation)
Clustering analysis on environment main effect
Clustering analysis for GPC of genotypes evaluated in biennial networks. Results are percentages of the sum of squares of the environment main effect explained. Degrees of freedom are given in parentheses (NS, number of sites)
Clustering analysis on G×E interaction
Clustering analysis for GPC of genotypes evaluated in biennial networks. Results are percentages of the sum of squares of the residual and interaction effects explained; percentages of degrees of freedom absorbed are given in parentheses
For the 1997/1998 network, the first (MT1) and second (MT2) multiplicative terms accounted for 39.8% and 18.9% of the residual variation, respectively. A biplot representation of environment and genotypic main effects against the multiplicative genotypic and environmental second term, MT2, was drawn (Fig. 4a, b) and additional information concerning WD clustering was added. Overall, the multiplicative term, MT2, differentiated sites under stress from those with no water stress (Fig. 4a), whereas the plots of genotypes depended primarily on earliness (Fig. 4b). Comparison of the two biplots suggested that early lines produced higher GPC than later lines under stress.
Crop simulation models use agronomic and meteorological data collected over past cropping seasons to provide source elements for the reanalysis of past networks, without additional measurements. To illustrate this point, a large network (8 years, 111 site-by-year combinations, 48 genotypes) was analysed according to environmental conditions during the booting-to-heading stage, which is a key phase in the development of grain number and nitrogen accumulation. The analysis performed on the two genotypes tested in every environment indicated that defining an environment in terms of limiting factors at this phenological stage was almost as relevant for explaining GPC as the simple but robust analysis of year and region. Based on the analysis of all genotypes, we identified two environmental parameters as most discriminatory for grain protein content: WD and nitrogen availability during the booting-to-heading stage.
However, stress during the booting stage may be correlated with limiting factors occurring during other developmental stages. Thus, stress at booting may be only indicative of other stresses occurring later in the life cycle of the plant. Nevertheless, the GPC was consistently explained by SOLWD (five of the seven biennial networks tested), which points to the importance of this variable. The reason for fluctuations in the percentage variation accounted for by WD (and hence SOLWD) may lie in the strong correlation with the number of trials in which WD occurred. This suggests that water deficit, when it occurs, is the most discriminating environmental factor accounting for variation in GPC. WD probably exerts its effects by reducing the number of grains per square metre. These results confirmed the importance of water and nitrogen supply to development of GPC at the booting stage and are consistent with previous observations (Ottman et al. 2000; Strong 1982; Wuest and Cassman 1992).
These environmental covariates probably also account for G×E interaction. However, we underestimated the relevance of these variables for interpreting G×E, as we had to pool interaction terms and pure residual variation when estimating the sum of squares. Our diagnosis also provided us with a tool for the interpretation of G×E interaction. The multiplicative model gave an evaluation of the specific adaptation of genotypes to environments.
The originality of our work lies in the calculation of covariables corresponding to limiting factors in the plant cycle. These covariables may be introduced into G×E interaction models such as clustering models. By calculating water and nitrogen availability we were able to explain variations of GPC. Thus, our approach has the potential to provide us with an understanding of the environmental bases of phenotypic plasticity and local adaptation. This work focused on a particular stage of development—the booting-to-heading stage. Limiting factors may also influence the development of GPC at other developmental stages. We need to extend our analysis to the entire life cycle of the plant to determine which developmental stages are discriminant for GPC. This could lead to the identification of several developmental stages as being critical for the development of GPC. Further investigations are required to determine the genetic bases of phenotypic plasticity and local adaptation. This should lead to the identification of quantitative trait loci (QTLs) involved in the response to limiting factors. This approach is now possible, by means of factorial regression analysis to account for variation in the additive effects of QTLs on material evaluated in several environments. Our method can also be used to screen evaluation networks for redundant locations. By characterising each site in terms of the frequency of limiting factors occurring during various developmental stages, it should be possible to identify sites at which similar stresses occur.
We would like to thank Philippe Gate for his active participation and the Arvalis Institut du Végétal for providing the climatic data and software necessary for the analysis. We also thank Christelle Crespin for providing the experimental data from the CTPS network.