Integration of Crop Growth Models and Genomic Prediction

Onogi, Akio

doi:10.1007/978-1-0716-2205-6_13


Part of the book series: Methods in Molecular Biology (MIMB, volume 2467)


Abstract

Crop growth models (CGMs) consist of multiple equations that represent physiological processes of plants and simulate crop growth dynamically given environmental inputs. Because parameters of CGMs are often genotype-specific, gene effects can be related to environmental inputs through CGMs. Thus, CGMs are attractive tools for predicting genotype by environment (G×E) interactions. This chapter reviews CGMs, genetic analyses using these models, and the status of studies that integrate genomic prediction with CGMs. Examples of CGM analyses are also provided.




1 Introduction

Because genotype by environment (G×E) interactions are one of the key issues in plant breeding, genomic prediction (GP) to predict G×E interactions is an active area of investigation. One approach to this problem is to use crop growth models (CGMs) to take environmental conditions into account. CGMs simulate plant development based on various environmental (e.g., air temperature and soil water) and management (e.g., sowing dates and fertilizer application) inputs. CGMs are calibrated using data from controlled environments and field trials. Large CGMs often use parameters calibrated for each module/process independently using a variety of datasets. Once calibrated, CGMs can predict phenotypes for new (untested) environments if conditions at these environments are given. When CGMs are calibrated for each genotype, estimates of CGM parameters can differ between genotypes. Differences in parameters are considered as representing the differences in response to environmental stimuli among genotypes. Thus, G×E interactions may be modeled using CGMs calibrated for each genotype.

CGMs are connected with genes via various parameters and state variables (see next section). Genetic analyses, such as quantitative trait loci (QTL) mapping, are integrated into CGM parameters/state variables by treating them as “trait phenotypes.” This idea is transferred to studies integrating GP with CGMs [1, 2]. These studies predict phenotypes of new genotypes for untested environments. CGM parameters for new genotypes are first predicted with GP, then phenotypes for untested environments are predicted by running the CGM with predicted parameters and environmental and management inputs. This integration of CGMs with GP is hereafter referred to as GP-assisted CGM (Fig. 1). Alternatively, CGMs can support GP by predicting secondary (indicator) traits or via inferring growth stages. Such integration is termed CGM-assisted GP (Fig. 1).

The terminology used in this chapter includes “calibration,” which is synonymous with “training,” the term more common in current quantitative genetics. Both terms refer to estimating model parameters from available data (i.e., training data). These terms are strictly distinguished from prediction, which refers to forecasting plant performance. The term “estimate” is used to indicate estimation of CGM or GP model parameters by applying the models to available datasets. The term “simulate” indicates running a CGM using given environmental inputs regardless of whether the inputs are for current or future conditions.

Since the first attempt at GP-assisted CGMs was reported in 2015 [1], efforts to combine CGMs, a fruit of crop science, with GP have continued. In this chapter, CGMs (concept, history, and classes), genetic analyses of CGM parameters, integration of CGMs with GP, and R examples for GP-assisted CGMs are briefly introduced to encourage additional empirical studies.

2 Crop Growth Models

The origin of CGMs is a systematic approach introduced by B. Jensen in the early twentieth century for understanding plant biomass production. Jensen is also famous for recognizing the hormone responsible for phototropism, auxin. His approach was advanced by subsequent studies, including the work of Monsi and Saeki [3], who developed mathematical models to simulate canopy production based on photosynthesis efficiency by introducing the concept of light interception. One of the first CGMs, ELCROS, was developed in 1970 [4]. ELCROS is a dynamic model, able to simulate biomass production based on canopy photosynthesis. Subsequently, various CGMs have been developed and used for research and practical purposes. Bouman et al. [5] provide a historical view of CGMs and the school of de Wit, and Jones et al. [6] also provide historical perspectives. Muller & Martre [7] provide a brief and accessible introduction to CGMs and a summary of their genealogy. CGMs currently in wide use include DSSAT [8] and APSIM [9]. Both CGMs provide a graphical user interface (GUI) and cover a wide range of crops. DSSAT is also available as an R package [10]. CGMs suitable for experimental purposes may be found in the “Quantitative Plant” database [11]. This database is a good starting point for finding an appropriate CGM.

Components of comprehensive CGMs are state variables (X) representing current plant status (e.g., leaf area index, biomass, and developmental stages), rate variables (R) representing rates of change in the state variables, environmental variables (E) representing environmental inputs (e.g., air temperature and radiation), and parameters characterizing functional relationships among the variables, X, R, and E [12]. State variables can be represented as:

$$ X=\int R\mathrm{d}t=\int f\left(X,E\right)\mathrm{d}t $$

reflecting the dynamic nature of CGMs. State variables X then relate to each other, and feedback loops exist among variables. Consequently, comprehensive CGMs comprise many parameters to be determined. These parameters are often determined from experiments under controlled environments.
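In practice, the integral above is usually solved numerically with a fixed (often daily) time step. The following is a minimal sketch in R, assuming a single state variable and a hypothetical rate function `rate_fun`; the function, its parameters (`r_max`, `x_max`, `t_base`), and the input values are illustrative assumptions and do not come from any published CGM.

```r
# Minimal sketch: forward-Euler integration of one state variable X,
# where the daily rate R = f(X, E) depends on the current state and
# on an environmental input E (here, daily mean temperature).
# rate_fun and all parameter values are hypothetical.
rate_fun <- function(x, temp, r_max = 0.1, x_max = 10, t_base = 8) {
  # Growth slows as x approaches x_max and stops at or below t_base
  r_max * x * (1 - x / x_max) * as.numeric(temp > t_base)
}

simulate_state <- function(x0, temp, dt = 1) {
  x <- numeric(length(temp) + 1)
  x[1] <- x0
  for (d in seq_along(temp)) {
    # X(t + dt) = X(t) + R * dt, the discrete form of X = integral of R dt
    x[d + 1] <- x[d] + rate_fun(x[d], temp[d]) * dt
  }
  x
}

temp <- c(5, 12, 15, 18, 20)  # illustrative daily mean temperatures (deg C)
x <- simulate_state(x0 = 1, temp = temp)
```

Because the rate depends on the current state, the state variable feeds back on its own growth, which is the dynamic behavior the equation expresses.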

A more descriptive approach is also used to evaluate dry matter production based on radiation use efficiency [13]. In such simplified models, biomass accumulation, which is modeled as the relationship between photosynthesis and respiration in comprehensive CGMs, is assumed to be proportional to the amount of solar radiation that plants absorbed. Accumulation of dry matter, w, is typically calculated as:

$$ \frac{\mathrm{d}w}{\mathrm{d}t}= SRE\times SR\left[1-r-\left(1-{r}_a\right)\exp \left(- kLAI\right)\right] $$

where SRE is solar radiation use efficiency; SR is solar radiation; r and r_a are the reflectivity of the canopy and soil, respectively; k is the radiation extinction coefficient of the canopy; and LAI is the leaf area index [13]. Grain yield is then calculated by partitioning dry matter, which is known as the concept of harvest index (HI). Such models are hereafter referred to as “simplified CGMs.” Fewer parameters are needed for model application, and thus, simplified CGMs are often used for integration with GP.
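The dry matter equation can be evaluated day by day and summed. The sketch below is a minimal R illustration of that calculation; the default values of `SRE`, `r`, `r_a`, `k`, and `HI`, as well as the input series, are placeholders chosen for illustration and are not calibrated parameters.

```r
# Minimal sketch of the simplified CGM equation:
#   dw/dt = SRE * SR * [1 - r - (1 - r_a) * exp(-k * LAI)]
# All default values are illustrative assumptions.
daily_growth <- function(SR, LAI, SRE = 1.6, r = 0.05, r_a = 0.1, k = 0.6) {
  SRE * SR * (1 - r - (1 - r_a) * exp(-k * LAI))
}

# Accumulate dry matter over days, then partition it to grain
# with a harvest index (HI)
simulate_yield <- function(SR, LAI, HI = 0.5, ...) {
  w <- cumsum(daily_growth(SR, LAI, ...))  # cumulative dry matter
  list(biomass = w, yield = HI * w[length(w)])
}

SR  <- c(15, 18, 20, 22)      # daily solar radiation (MJ/m^2), illustrative
LAI <- c(0.5, 1.0, 2.0, 3.0)  # leaf area index on each day, illustrative
out <- simulate_yield(SR, LAI)
```

In this sketch LAI is supplied as an input; in an actual simplified CGM it would itself be simulated from temperature or development.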

Other CGMs are functional-structural plant models (FSPMs) that simulate the three-dimensional (3D) architecture of plants by integrating physiological processes with plant 3D architecture [14]. CGMs discussed above assess canopy-level processes. In contrast, FSPMs treat individual plant processes [7]. FSPMs will become more important for crop science as phenotyping technologies develop and more precise information on phenotypes becomes available. To date, only a single example of the use of an FSPM for GP has been reported [15]. This study uses the MAppleT model, which simulates the above-ground development of apple trees.

Conventional growth curves, such as logistic [16] or Gompertz [17], are often used to model growth trajectories for both plants and animals. Growth curves usually do not account for environmental conditions, and thus, have no advantage in GP aimed at prediction of G×E interactions. However, because both growth curves and CGMs are mathematical models that treat dynamic (time-dependent) processes, studies using growth curves are also introduced together with CGMs in this chapter. In some cases, lessons on statistical inference on model parameters can be shared, as described later. Further, the extension of growth curves to include environmental conditions is possible, as illustrated by Campbell et al. [18]. This study demonstrates high similarity between growth curves and CGMs. Note that, although QTL mapping using growth curves is categorized as a class of functional mapping, reviewing all classes of functional mapping is beyond the scope of this chapter, and only a class based on growth curves (i.e., parametric models) is mentioned.

3 Gene-Based Models

CGMs consist of multiple parameters, some of which are regarded as genotype-specific and may explain differences in response to environments between genotypes. A popular application of CGMs is, therefore, to define or search for ideotypes [19, 20]. That is, plants with ideal phenotypes for a given environment are searched for in silico by modifying model parameters. The parameter values resulting in such ideal phenotypes are then regarded as breeding targets. Genotype-specific parameters of CGMs have various aliases, including genetic coefficients [21], input traits [22], physiological traits [1], and genotypic parameters [23].

A first attempt to connect CGM parameters with genes is reported in a common bean study [24]. Parameters were linked to known genes responsible for phenology, growth habit, and seed size using linear regression. Subsequently, the amount of phenotypic variation explained by CGMs was assessed with fitted parameter values. The model developed by the authors, GeneGro, was updated [25] by adding several genes responsible for photoperiod sensitivity. Modeling that links parameters of CGMs with genotypes of known genes is often called gene-based modeling. Chapman et al. [26] used such a model for sorghum to simulate phenotypes for traits, such as transpiration efficiency and flowering time, in future breeding environments. Stewart et al. [27] developed a gene-based model for soybean to predict flowering time by incorporating flowering genes (E loci). Similar approaches were implemented to simulate yield responses in soybean [28] and to analyze the effects of Vrn and Ppd loci that affect vernalization and photoperiodism, respectively, on flowering in wheat [29]. A comprehensive summary on the integration of genes/QTLs with CGMs is provided by Wang et al. [30].

In the gene-based models, model parameters selected as genotype-specific are expressed as linear combinations of gene effects. Gene effects are estimated after model parameter estimation using training data, a so-called “independent” or “two-step” approach. The term “independent” reflects model calibration/training that is initially performed independently for each genotype. Subsequent genetic analysis on model parameters is also conducted independently from this parameter estimation (Fig. 2). The opposite “joint” or “single-step” approach uses a process where parameter estimation for all genotypes and genetic analyses on these parameters are performed simultaneously (Fig. 2). This issue is discussed further later in this chapter.
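The two-step logic can be illustrated with a toy example. The sketch below is not a gene-based CGM: it replaces the CGM with a hypothetical one-parameter linear response to an environmental input, and all data are simulated. The gene names (`geneA`, `geneB`), effect sizes, and sample sizes are assumptions made purely for illustration.

```r
# Toy illustration of the "independent" (two-step) approach.
# The "model" is a one-parameter linear response, pheno = b * env,
# standing in for a CGM; all data are simulated.
set.seed(1)
n_geno <- 40
n_env  <- 6
genes <- matrix(rbinom(n_geno * 2, 1, 0.5), n_geno, 2,
                dimnames = list(NULL, c("geneA", "geneB")))
true_b <- 1 + 0.5 * genes[, "geneA"] - 0.3 * genes[, "geneB"]
env <- seq(10, 30, length.out = n_env)  # environmental input
# Phenotypes: rows are genotypes, columns are environments
pheno <- outer(true_b, env) +
  matrix(rnorm(n_geno * n_env, sd = 1), n_geno, n_env)

# Step 1: calibrate the toy model for each genotype independently
b_hat <- apply(pheno, 1, function(y) coef(lm(y ~ 0 + env))[["env"]])

# Step 2: express the estimated parameter as a linear combination
# of gene effects
fit <- lm(b_hat ~ geneA + geneB, data = data.frame(b_hat, genes))
gene_effects <- coef(fit)
```

In a joint approach, the two steps would instead be combined in a single hierarchical model so that the gene effects inform the per-genotype parameter estimates.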

4 CGMs and QTL Mapping

CGMs are also used to discover novel genes. QTL mapping and association analyses on parameters or state variables of CGMs are used for such exploration. Since the first attempt for yield of barley [31], many such studies have been reported (Table 1). Typically, models used are simplified CGMs or models for limited biological processes that have fewer than 10 genotype-specific parameters. QTL mapping on CGM parameters is usually conducted on parameters estimated for each genotype (independent approach). QTL mapping is also often conducted for growth curves (Table 2). This approach can be regarded as a type of functional mapping. Growth curves do not take environmental information into account, but QTL mapping on growth curve parameters has the added benefit of avoiding the multiple testing problems that arise when mapping at individual time points.

Table 1 QTL/GWAS analyses on crop growth model parameters


Table 2 Genetic analyses on growth curve parameters


Because parameters or state variables of CGMs have interpretable roles in physiological processes, QTL mapping on these parameters/variables is expected to provide more detailed information on QTL functions than mapping on final traits. Typical examples can be found in two studies on phenological traits in rice and wheat. In rice [32], a phenology model, developmental rate (DVR) model, was used to map flowering-time QTLs in backcross inbred lines. The DVR model has multiple genotype-specific parameters, including parameters that represent photoperiod and temperature sensitivity. QTL mapping on these two parameters showed partially overlapping QTLs. Known QTLs, including Hd1 and Hd2, were detected for both parameters, and two additional major QTLs, Hd8 and Hd9, were detected only for parameters related to photoperiod and temperature sensitivity, respectively. Interestingly, Hd9 was not detected when using heading dates (i.e., final trait) as the response variable. Hd8 and Hd9 are still not well characterized, though Hd9 is possibly involved in a photoperiod-independent pathway of heading regulation.

For wheat, a phenology model with two genotype-specific parameters was used to map flowering-time QTLs in an association panel. These parameters represented photoperiod sensitivity and vernalization requirement. The latter is related to temperature. Association analyses revealed 12 and 11 QTLs for these parameters, respectively. No photoperiod and vernalization QTLs colocalized, suggesting that the processes governed by these parameters are underpinned by different genetic architecture.

These examples illustrate that QTL mapping on CGM parameters can increase the interpretability of QTL mapping. However, caution is needed. When parameters are correlated, QTLs found for one parameter are not necessarily associated with the process that the parameter is assumed to be involved in. The QTLs may actually be associated with other correlated parameters.

5 CGMs and GP

5.1 Overview

Integrating CGMs with GP is achieved in two ways, GP-assisted CGMs and CGM-assisted GP (Figs. 1 and 2, and Tables 3 and 4). In the former, genotype-specific parameters of CGMs are predicted using GP, then phenotypes of target traits (e.g., yield or flowering time) are predicted using CGMs taking environmental information as inputs. Thus, phenotypes are predicted by the CGM, and GP aids these predictions. This approach is an extension of gene-based models and QTL mapping on CGM parameters. If the definition of GP is simultaneous fitting of genome-wide markers irrespective of effect size, the first attempt of GP-assisted CGMs was reported in 2015 [1]. In this conceptual study based on simulations, four parameters in a maize CGM that takes solar radiation and temperature as environmental inputs are regressed on whole-genome markers. Yields for new plants under new environments were then predicted using the same CGM.

Table 3 Genomic prediction-assisted crop growth models


Table 4 Crop growth model-assisted genomic prediction


Early applications of GP-assisted CGMs to real data were reported by Cooper et al. [33] and Onogi et al. [2]. In the former study, a maize CGM [1] was modified to integrate drought stress and was applied to real maize data. In the latter study, the DVR model was combined with GP to predict rice heading dates. A common feature of these studies is, interestingly, adoption of a “joint” approach; CGM parameters of all genotypes were jointly estimated and marker effects, or additive genotypic effects, on CGM parameters were also estimated jointly (Fig. 2). The above studies [1, 2, 33] realized this approach by developing hierarchical models that incorporate CGMs with GP and applying parameter estimation procedures in Bayesian frameworks. In the maize case [1, 33], approximate Bayesian computation [34] was adopted; in the rice case [2], a hybrid approach of Markov chain Monte Carlo (MCMC) and variational Bayesian inference was developed.

A joint approach, however, is not essential for GP-assisted CGMs. All except one study [35] on GP-assisted CGMs subsequently published adopts an independent approach (Table 3). CGM is applied to each genotype independently, and then estimated CGM parameters are subject to GP independently from CGM fitting (Fig. 2). A major advantage of an independent approach is its ease of implementation. When published CGMs, such as APSIM or DSSAT, are used, CGM implementation is unnecessary. Moreover, an independent approach is applicable to complex CGMs with many parameters. Conversely, a joint approach requires the development of hierarchical models and estimation procedures and is likely to be more difficult to apply as CGMs become more complex.
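The independent approach can be sketched in a few lines of R. The example below is a toy under stated assumptions, not a reproduction of any study in Table 3: the marker matrix and the "calibrated" parameter values are simulated, ridge regression stands in for the GP method, the penalty `lambda` is arbitrary, and `toy_cgm` is a hypothetical one-parameter model used only to show where the CGM enters.

```r
# Toy sketch of an independent GP-assisted CGM workflow (simulated data).
set.seed(2)
n <- 100  # genotypes
p <- 200  # markers
Z <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)
u <- rnorm(p, sd = 0.05)
# theta_hat stands in for a CGM parameter calibrated per genotype
theta_hat <- as.vector(5 + Z %*% u + rnorm(n, sd = 0.2))

train <- 1:80    # genotypes with calibrated parameters
test  <- 81:100  # new genotypes with marker data only

# Step 1 (GP): ridge regression of the parameter on genome-wide markers
ridge_predict <- function(Z_train, y_train, Z_new, lambda = 50) {
  mu <- mean(y_train)
  beta <- solve(crossprod(Z_train) + lambda * diag(ncol(Z_train)),
                crossprod(Z_train, y_train - mu))
  as.vector(mu + Z_new %*% beta)
}
theta_pred <- ridge_predict(Z[train, ], theta_hat[train], Z[test, ])

# Step 2 (CGM): run the model with predicted parameters and the
# environmental inputs of an untested environment
toy_cgm <- function(theta, temp) theta * sum(pmax(temp - 8, 0))
new_env_temp <- c(12, 15, 20, 25)  # illustrative daily temperatures
pheno_pred <- sapply(theta_pred, toy_cgm, temp = new_env_temp)
```

With a published CGM such as APSIM or DSSAT, step 2 would be replaced by a call to that software, while step 1 remains an ordinary GP analysis on the calibrated parameters.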

CGM-assisted GP uses CGMs to assist GP in multiple ways (Table 4). Broadly, three approaches have been proposed. The first is the use of CGMs to infer plant growth stage. Predicted growth stages are then used for inferring environmental covariates that affect G×E interactions [36, 37]. Such environmental covariates are then used to create kernels between environments to assess reaction norms in mixed models [38]. The second approach is the use of CGMs to characterize environments [39]. A comprehensive CGM, APSIM, was first calibrated for a typical variety at reference environments and run for target environments. Nitrogen nutrition indices output by the CGM to indicate crop nutritional balance were then used to characterize these environments. GP was conducted by considering the genotype-by-index interactions. The third approach is the use of CGMs to predict secondary or indicator traits. For example, heading dates for new environments are predicted using a CGM, then predicted heading dates are included as covariates in mixed models of GP to predict yield [40]. Using real data of winter wheat, it was shown that including predicted heading dates increases the prediction accuracy of GP for between-environment prediction. Heading date or flowering time can affect various traits [41] and is generally easier to predict than yield with CGMs. Thus, this approach may be a good alternative to GP-assisted CGMs.

Note that GP-assisted CGMs and CGM-assisted GP will have qualitatively different roles in plant breeding. Although both methods predict phenotypes, to be exact, CGM-assisted GP predicts genetic merits or breeding values. Thus, CGM-assisted GP will be suitable for selecting candidates to increase genetic gain, whereas GP-assisted CGM will be suitable for designing ideal phenotypes under given environmental conditions.

5.2 Predictive Ability of GP-Assisted CGMs

Applications of integrated CGMs and GP have gradually increased, but comparative studies are still insufficient. In particular, GP-assisted CGMs have not been compared with other methods that consider G×E interactions (e.g., mixed models with reaction norms). Thus, the usefulness of GP-assisted CGMs in phenotype prediction is not fully understood. GP-assisted CGMs were compared with ordinary GP methods, such as genomic BLUP (GBLUP) [1, 33, 35] and BayesA [42], which cannot account for G×E interactions, for between-environment predictions. GP-assisted CGMs showed better accuracy than GBLUP/BayesA in simulations as expected [1, 35], but showed accuracy similar to GBLUP with real data [33]. These results probably reflect the small number of environments used (two), too few for calibrating CGMs. In several other studies [2, 15, 43, 44, 45], proposed methods were not compared with other methods in between-environment predictions mainly because authors had other objectives. For example, the primary focus of Onogi et al. [2] was a comparison of joint and independent approaches. Alimi [43] compared single- and multi-trait GP models to predict CGM parameters with an independent approach and showed the superiority of multi-trait models. Rosen et al. [45] reported that CGM performance was better when CGM parameters were predicted through GP than when parameters were predicted using major QTLs.

An interesting comparison study is a biomass prediction for rice using longitudinal data [46]. An independent approach based on a simplified CGM was compared with an ordinary GP method (Lasso) and an approach that replaces the CGM with machine learning (linear regression or random forests) in an independent framework. Machine learning methods use state variables predicted with GP as inputs, then output phenotypes of the target trait (biomass). The two methods (the independent approach based on a CGM and the approach that replaces the CGM with machine learning) showed similar accuracy and much better accuracy than Lasso in between-environment prediction. Chen et al. [47] compared an independent approach based on the DVR model with XGBoost [48] that takes both environmental covariates and marker genotypes as inputs for prediction of heading dates of rice. Results were less promising for CGMs because XGBoost generally provides lower prediction errors.

In summary, use of GP-assisted CGMs remains a conceptual approach that needs further development to realize its potential for phenotype prediction. Issues to be considered to improve predictive ability are: (a) the choice of CGMs; (b) the choice of genotype-specific parameters of CGMs, and (c) the accuracy of CGM parameter estimation.

The predictive ability of CGMs differs depending on the model used and the target traits. Thus, the appropriate choice of CGMs is critical for high prediction accuracy. Predictive ability is also affected by relationships (or similarities) between target and training environments. The remaining two points are reviewed in the following sections.

Importantly, a fundamental assumption for use of CGMs for G×E analyses is that genotype-specific parameters are constant among environments and differences of phenotypes of a genotype (i.e., reaction norms) are brought about by environmental conditions via CGMs. However, this assumption often does not hold [49]. In such cases, mega-environments are needed where CGM parameters are consistent.

5.3 Genotype-Specific Parameters in CGMs

Despite the long history of CGMs, no good means exists for choosing genotype-specific parameters. This issue is, however, critical and can directly affect the predictive performance of GP with CGMs. Model ability to describe phenotypic variations depends on parameters chosen as genotype-specific [50]. Uncertainties in prediction are also affected by these designations [51]. Indeed, as described in Section 6.2, the predictive ability of the DVR model increased when more parameters were chosen as genotype-specific. A popular practice to determine genotype-specific parameters is to adopt knowledge garnered from the published literature. Usually, CGMs are calibrated for multiple major varieties. Thus, model parameters that show variation among varieties can be empirically deduced. Another method is to identify parameters that cause large variations in phenotype via sensitivity analyses [52], even though genetic aspects are lacking in this procedure. Considering that the aim is high prediction accuracy, a practical approach is to select a set of genotype-specific parameters based on predictive ability examined with cross-validation. This technique is time-consuming, but feasible when CGMs are small.

5.4 Parameter Estimation

Parameters for joint GP-assisted CGMs [1, 2, 33, 35] were estimated in Bayesian frameworks, which is reasonable considering the hierarchical structure of joint models. Bayesian methods, such as MCMC, an extension of MCMC called differential evolution adaptive metropolis (DREAM, [53]), and generalized likelihood uncertainty estimation [54], are often used for CGM optimization [47, 51, 55, 56, 57]. Because prior distributions in Bayesian methods are statistically equivalent to penalties, parameter estimation by Bayesian methods could increase the accuracy of predictions. Non-Bayesian and general optimization methods such as Nelder–Mead optimization [58] or particle swarm optimization (PSO, [59]) can also be used for optimization of CGMs and growth curves. To prevent overfitting, adding penalties to objective functions may be useful. These optimizers (Nelder–Mead and PSO) are implemented with user-friendly R scripts, and are good candidates as a first step, particularly when small CGMs are used.
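As an illustration of penalized optimization, the sketch below fits a logistic growth curve with Nelder–Mead via `optim()` from base R. The data are simulated, and the penalty centers (`prior_mean`), scales (`prior_sd`), and weight (`lambda`) are hypothetical choices; the quadratic penalty plays the role that independent Gaussian priors would play in a Bayesian analysis.

```r
# Minimal sketch: penalized least squares for a logistic growth curve,
# optimized with Nelder-Mead (base R optim). Data are simulated.
set.seed(3)
days <- seq(0, 60, by = 10)
# par = c(asymptote, growth rate, inflection day)
logistic <- function(par, t) par[1] / (1 + exp(-par[2] * (t - par[3])))
true_par <- c(100, 0.15, 30)
obs <- logistic(true_par, days) + rnorm(length(days), sd = 3)

# Hypothetical penalty settings, analogous to Gaussian prior means and SDs
prior_mean <- c(90, 0.1, 25)
prior_sd   <- c(30, 0.1, 10)

objective <- function(par, lambda = 1) {
  sse <- sum((obs - logistic(par, days))^2)            # goodness of fit
  penalty <- sum(((par - prior_mean) / prior_sd)^2)    # shrinkage term
  sse + lambda * penalty
}

fit <- optim(par = prior_mean, fn = objective, method = "Nelder-Mead")
fit$par  # penalized estimates of the three curve parameters
```

Setting `lambda = 0` recovers ordinary least squares, which makes it easy to check how strongly the penalty influences the estimates for a given dataset.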

Prior distributions might be useful for avoiding identifiability problems in parameter estimation [56]. CGMs are an accumulation of physiological processes represented as mathematical equations, and parameters are often redundant and interrelated. Thus, parameters can compensate for each other and cause identifiability problems (i.e., different parameter values can result in the same output [49]). Correlated parameters may not be identifiable from data only, and different prior distributions can augment identifiability. However, estimates obtained with strong prior distributions are not the result of statistical learning. Rather, they reflect prior assumptions. Likewise, compensation among parameters raises a concern for the interpretation of QTLs. QTLs may be detected for multiple parameters that are interrelated, and compensation in physiological processes may result in QTLs detected for parameters controlling different processes.

For GP-assisted CGMs, two approaches for parameter estimation are used, joint and independent. The independent approach is easier to implement, but the joint approach has two major advantages: (a) uncertainty in CGM parameter estimation can be considered in GP, whereas in the independent approach, the uncertainty adds noise and leads to lower heritability of the parameters; (b) information from the phenotype of a genotype can be used simultaneously for CGM parameter inference for other individuals, through genome-wide marker effects. These advantages were first discussed in a simulation study using growth curves [60]. Onogi [61] showed that the joint approach could estimate model parameters more accurately than the independent approach in various mathematical models. The difference between the two approaches becomes more prominent as the number of phenotypic values used for estimation decreases, e.g., with a large proportion of missing phenotypes or large sampling intervals in longitudinal data. The joint approach is often applied to growth curve analyses [60, 62, 63, 64], probably because model structures of growth curves are simple and easy to extend.

An interesting approach to increase the accuracy of parameter estimation of CGMs is to optimize the members (environments) of multi-environment trials (METs) used for calibrating the CGMs [65]. Using a set of pre-determined CGM parameter values, the method identifies a set of environments that provides diverse outputs (phenotypes). Using these environments for calibration, the accuracy of CGM parameter estimates can be increased. If this idea is modified such that the set of parameters is optimized whereas the members of METs are fixed, genotype-specific parameters may be more effectively chosen.

6 Examples of CGMs Applications

6.1 Overview of Examples

Popular comprehensive CGMs, such as APSIM and DSSAT, are available upon registration at their respective sites, which also supply rich supporting materials [66, 67]. Both programs are equipped with a GUI and cover a wide range of crops. Thus, connecting these CGMs with GP using an independent approach is not a hard task. On the other hand, applications of GP and QTL mapping are still limited to small CGMs or growth curves, and thus, it is possible to implement in-house CGMs with an arbitrary language and to integrate them with GP, QTL mapping, and genome-wide association studies. An advantage of in-house development is that fitting of CGMs, and subsequent genetic analyses, can be accomplished in a uniform digital environment that researchers/breeders know. Moreover, models can be diagnosed easily and extensions of the models are also feasible. Examples of CGMs implemented with R and Rcpp are presented below. The models are the DVR used in [2], and the maize growth model used in [1].

6.2 DVR Model

The DVR model predicts heading (flowering) time of rice from daily mean temperature (T, °C) and photoperiod (P, h) [32, 68]. The model is:

$$ {DVS}_D=\sum \limits_{d=1}^D{DVR}_d $$

$$ {DVR}_d=\left\{\begin{array}{c}f\left({T}_d\right), if\ {DVS}_d<{DVS}_1\ \mathrm{or}\ {DVS}_d>{DVS}_2\\ {}f\left({T}_d\right)g\left({P}_d\right), if\ {DVS}_1\le {DVS}_d\le {DVS}_2\end{array}\right. $$

$$ f\left({T}_d\right)=\left\{\begin{array}{c}{\left[\left(\frac{T_d-{T}_b}{T_o-{T}_b}\right){\left(\frac{T_c-{T}_d}{T_c-{T}_o}\right)}^{\left(\frac{T_c-{T}_o}{T_o-{T}_b}\right)}\right]}^{\alpha }, if\ {T}_b\le {T}_d\le {T}_c\\ {}0, if\ {T}_d<{T}_b\ \mathrm{or}\ {T}_d>{T}_c\end{array}\right. $$

$$ g\left({P}_d\right)=\left\{\begin{array}{c}{\left[\left(\frac{P_d-{P}_b}{P_o-{P}_b}\right){\left(\frac{P_c-{P}_d}{P_c-{P}_o}\right)}^{\left(\frac{P_c-{P}_o}{P_o-{P}_b}\right)}\right]}^{\beta }, if\ {P}_b\le {P}_d\le {P}_c\\ {}0, if\ {P}_d<{P}_b\ \mathrm{or}\ {P}_d>{P}_c\end{array}\right. $$

$$ {DVS}_1=0.145G+0.005{G}^2 $$

$$ {DVS}_2=0.345G+0.005{G}^2 $$

where DVS_d and DVR_d denote developmental stage and rate at day d; DVS₁ and DVS₂ define the period when the plant is photo-sensitive; α and β are sensitivity coefficients of temperature and photoperiod; G represents the earliness of flowering under optimum temperature and photoperiod conditions. The index d indicates days after emergence. For temperature (T) and photoperiod (P), the subscripts b, o, and c indicate the base (lower limit), optimum, and ceiling (upper limit), respectively. The model outputs D, the days to heading, once DVS_D exceeds G. The model has multiple parameters that can be genotype-specific, such as α, β, G, T_b, T_o, T_c, P_b, P_o, and P_c. In some studies [2, 68], α, β, and G were assumed to be genotype-specific, and the other parameters were determined based on prior knowledge. In another study [61], P_o and T_o were also assumed to be genotype-specific and this assumption resulted in better prediction accuracy. Thus, this assumption is followed here. Boxes 1 and 2 show R and Rcpp script examples of the DVR model that returns days to heading under given environmental conditions (daily mean temperature and photoperiod) and parameters (α, β, G, P_o, and T_o). The Rcpp script is faster than the R script and thus will be preferred.

Box 1 An R Script for the DVR Model (DVRmodel.R)

Temp and Photo are matrices that store daily mean temperature and photoperiod (day length), respectively. Rows of each matrix indicate days, and columns indicate environments. The first day (row) is the emergence day. The model is run for each environment successively and returns days to heading for each environment as a vector, DTH. Arguments G, Alpha, Beta, P_o, and T_o are scalars and indicate G, α, β, P_o, and T_o, respectively. The function returns Md + 1 if heading does not occur (Md is the number of days included in Temp and Photo).
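The following R function is a minimal sketch written from the equations and the description above; it is not the original DVRmodel.R script. The defaults for `T_b`, `T_c`, `P_b`, and `P_c` are placeholder assumptions and should be replaced with values appropriate to the crop and dataset, and the helper `response` is a name introduced here for illustration.

```r
# Response function shared by f(T) and g(P): equals 1 at the optimum
# and 0 outside the [base, ceil] range
response <- function(x, base, opt, ceil, power) {
  if (x < base || x > ceil) return(0)
  v <- ((x - base) / (opt - base)) *
    ((ceil - x) / (ceil - opt))^((ceil - opt) / (opt - base))
  v^power
}

# Sketch of the DVR model. Defaults for T_b, T_c, P_b, and P_c are
# assumed placeholders, not values taken from the chapter.
DVRmodel <- function(Temp, Photo, G, Alpha, Beta, P_o, T_o,
                     T_b = 8, T_c = 42, P_b = 0, P_c = 24) {
  Md <- nrow(Temp)
  DVS1 <- 0.145 * G + 0.005 * G^2  # start of the photo-sensitive period
  DVS2 <- 0.345 * G + 0.005 * G^2  # end of the photo-sensitive period
  DTH <- rep(Md + 1, ncol(Temp))   # Md + 1 means heading did not occur
  for (e in seq_len(ncol(Temp))) {
    DVS <- 0
    for (d in seq_len(Md)) {
      DVR <- response(Temp[d, e], T_b, T_o, T_c, Alpha)
      if (DVS >= DVS1 && DVS <= DVS2) {
        DVR <- DVR * response(Photo[d, e], P_b, P_o, P_c, Beta)
      }
      DVS <- DVS + DVR
      if (DVS > G) {
        DTH[e] <- d
        break
      }
    }
  }
  DTH
}

# Two illustrative environments: constant optimum conditions, and a
# constant temperature below the base temperature
Temp  <- cbind(rep(30, 100), rep(5, 100))
Photo <- matrix(10, 100, 2)
DTH <- DVRmodel(Temp, Photo, G = 50, Alpha = 1, Beta = 1, P_o = 10, T_o = 30)
```

Under constant optimum conditions the developmental rate is 1 per day, so heading is reached on the first day that the accumulated stage exceeds G; in the second environment development never starts and the function returns Md + 1.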

Box 2 An Rcpp Script for the DVR Model (DVRmodel.cpp)

Temp and Photo are matrices that store daily mean temperature and photoperiod (day length), respectively. Rows of each matrix indicate days and columns indicate environments. The first day (row) is the emergence day. The model is run for each environment successively and returns days to heading as a vector, DTH. Arguments G, Alpha, Beta, P_o, and T_o indicate G, α, β, P_o, and T_o, respectively. The function returns Md + 1 if heading does not occur (Md is the number of days included in Temp and Photo).

6.3 Maize Growth Model

The maize growth model [1, 69] uses daily mean temperature (T, °C), solar radiation (SR, MJ/m²), and plant population (Pop, plants/m²) as inputs (Fig. 3). Plant growth is simulated based on thermal units (TU), the accumulated daily mean temperature above a base temperature (8 °C). Emergence occurs at 87 TU. Leaf number (LN) increases exponentially with TU until it reaches the total (maximum) leaf number (TLN). Plant leaf area (PLA) is calculated by integrating the area of each leaf (AR) over LN. TU also promotes senescence, represented as the fraction of senescent leaf area (FAS), and the leaf area index (LAI) is calculated from PLA, FAS, and Pop. Then, SR, solar radiation use efficiency (SRE), and LAI are used to simulate the daily increase in biomass. Silking (female flowering) occurs once 67 TU have accumulated after the end of leaf growth. Grains grow based on a thermal unit for grain (TU_grain), calculated using a base temperature of 0 °C, until TU_grain reaches the value at physiological maturity (MTU). The harvest index (HI) increases linearly from 3 days after silking. TLN, the area of the largest leaf (AM), SRE, and MTU are assumed to be genotype-specific. Muchow et al. [69] set SRE to 1.6 g MJ⁻¹ and reduced it to 1.2 g MJ⁻¹ once TU_grain exceeded 500. Examples of R and Rcpp scripts for this maize growth model are provided in Boxes 3 and 4, respectively.
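The thermal-unit bookkeeping described above can be sketched in plain C++ (no Rcpp dependency) as follows. This is only a minimal illustration and not the script in Boxes 3 and 4: the function names accumulate_thermal_units and emergence_day are hypothetical, and two details are assumptions that the text does not state, namely that days colder than the base temperature contribute zero TU and that emergence is declared on the first day cumulative TU is at least 87.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cumulative thermal units (TU) from daily mean temperatures (deg C).
// Assumption: days colder than the base temperature contribute zero TU.
std::vector<double> accumulate_thermal_units(const std::vector<double>& temp,
                                             double base_temp = 8.0) {
    std::vector<double> tu(temp.size());
    double total = 0.0;
    for (std::size_t d = 0; d < temp.size(); ++d) {
        total += std::max(temp[d] - base_temp, 0.0);
        tu[d] = total;
    }
    return tu;
}

// First day (1-based, counted from sowing) on which cumulative TU reaches
// the threshold. Returns temp.size() + 1 if the threshold is never reached,
// mirroring the "Md + 1" convention used for the DVR model.
// Assumption: the threshold is inclusive (TU >= threshold).
std::size_t emergence_day(const std::vector<double>& temp,
                          double threshold = 87.0, double base_temp = 8.0) {
    const std::vector<double> tu = accumulate_thermal_units(temp, base_temp);
    for (std::size_t d = 0; d < tu.size(); ++d) {
        if (tu[d] >= threshold) return d + 1;
    }
    return temp.size() + 1;
}
```

For a constant daily mean of 18 °C, each day adds 10 TU, so emergence_day would report day 9, when cumulative TU first reaches 90.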

Box 3 An R Script for the Maize Growth Model (MaizeGrowthModel.R)

Temp and SR are matrices that store daily mean temperature and solar radiation, respectively. Rows of each matrix indicate days, and columns indicate environments. The first day (row) is the sowing day. Population is a vector of plant populations (plants/m²), one for each environment. The model is run for each environment in turn and returns grain weight at maturity as a vector, GW.maturity. Arguments TLN, AM, SRE, and MTU are scalar values. The integral of plant leaf area is solved numerically with an arbitrary width (0.5).

Box 4 An Rcpp Script for the Maize Growth Model (MaizeGrowthModel.cpp)

Temp and SR of the function MGM are matrices that store daily mean temperature and solar radiation, respectively. Rows of each matrix indicate days, and columns indicate environments. The first day (row) is the sowing day. Population is a vector of plant populations (plants/m²), one for each environment. The model is run for each environment in turn and returns grain weight at maturity as a vector, GW.maturity. Arguments TLN, AM, SRE, and MTU are scalar values. The integral of plant leaf area is solved numerically with an arbitrary width (0.5).

6.4 Examples of CGM Fitting

The DVR and maize growth models in Boxes 1–4 can be fitted to observed data (days to heading for the DVR model and grain weight for the maize growth model) by wrapping them in R functions (Boxes 5 and 6). The wrapper functions (FitDVR and FitMGM) evaluate model fit using the mean squared error. Vectors of observed data (DTH and GW) are given as the last arguments. The length of these vectors equals the number of environments, and the environments must be in the same order as the columns of Temp, Photo, and SR. The model functions (DVRmodel and MGM) simulate phenotypes for each environment, so calculation time increases linearly with the number of environments; the accuracy of parameter estimation is also expected to improve as more environments are included. The wrapper functions take a vector of parameters as the first argument, as required by optimizers such as optim in the stats package (Nelder–Mead optimization) and psoptim in the pso package (PSO). Here psoptim is used. Ranges of parameters α, β, and G in the DVR model were determined arbitrarily, and ranges of P_o and T_o are defined by P_b, P_c, T_b, and T_c, which are set to 0, 24, 8, and 42, respectively. Ranges for the maize growth model parameters were determined following [1]. Because optimization depends on initial parameter values, 10 sets of randomly assigned initial values are tested and the best result is retained. The number of initial value sets can be modified. After the optimization has been repeated for all individuals (genotypes) in the training set, GP models can be trained using the parameter estimates as phenotypes.
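The fitting strategy described above (a mean-squared-error wrapper whose first argument is the parameter vector, optimized from several random starting points) can be sketched in plain C++ as follows. This is a hedged illustration rather than the scripts in Boxes 5 and 6: all names (Params, Model, mean_squared_error, compass_search, fit_multistart) are hypothetical, and a simple bounded compass search with step halving is swapped in for psoptim purely to keep the example self-contained. It is not particle swarm optimization.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <limits>
#include <random>
#include <vector>

using Params = std::vector<double>;
// A model maps a parameter vector to one simulated phenotype per environment.
using Model = std::function<std::vector<double>(const Params&)>;
using Objective = std::function<double(const Params&)>;

// Wrapper in the spirit of FitDVR/FitMGM: parameters first, observations last.
double mean_squared_error(const Params& par, const Model& model,
                          const std::vector<double>& observed) {
    const std::vector<double> simulated = model(par);
    double sse = 0.0;
    for (std::size_t e = 0; e < observed.size(); ++e) {
        const double diff = observed[e] - simulated[e];
        sse += diff * diff;
    }
    return sse / static_cast<double>(observed.size());
}

// Stand-in local optimizer (compass search with step halving), NOT psoptim.
Params compass_search(const Objective& objective, Params x,
                      const Params& lower, const Params& upper) {
    double best = objective(x);
    double step = 0.25;  // step length as a fraction of each parameter range
    for (int iter = 0; iter < 100000 && step > 1e-8; ++iter) {
        bool improved = false;
        for (std::size_t j = 0; j < x.size(); ++j) {
            for (double sign : {1.0, -1.0}) {
                Params trial = x;
                const double moved = x[j] + sign * step * (upper[j] - lower[j]);
                trial[j] = std::min(std::max(moved, lower[j]), upper[j]);
                const double value = objective(trial);
                if (value < best) {
                    best = value;
                    x = trial;
                    improved = true;
                }
            }
        }
        if (!improved) step *= 0.5;  // shrink only after a pass with no gain
    }
    return x;
}

struct FitResult {
    Params par;
    double mse;
};

// Run the local search from several random starts and keep the best fit.
FitResult fit_multistart(const Model& model, const std::vector<double>& observed,
                         const Params& lower, const Params& upper,
                         int n_starts = 10, unsigned seed = 1) {
    std::mt19937 rng(seed);
    const Objective objective = [&](const Params& p) {
        return mean_squared_error(p, model, observed);
    };
    FitResult best{Params(), std::numeric_limits<double>::infinity()};
    for (int s = 0; s < n_starts; ++s) {
        Params start(lower.size());
        for (std::size_t j = 0; j < start.size(); ++j) {
            std::uniform_real_distribution<double> unif(lower[j], upper[j]);
            start[j] = unif(rng);
        }
        const Params par = compass_search(objective, start, lower, upper);
        const double mse = objective(par);
        if (mse < best.mse) {
            best.par = par;
            best.mse = mse;
        }
    }
    return best;
}
```

Any function that maps a parameter vector to one simulated value per environment can be passed as the model, so a DVR or maize growth implementation could be plugged in with the parameter ranges given in the text. Keeping the best of several starts matters because crop model objectives are typically multimodal; a convex toy model would return the same optimum from every start.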

Box 5 Fitting and Optimization of the DVR Model (RunDVRmodel.R)

DVRmodel.R and DVRmodel.cpp are the script files illustrated in Boxes 1 and 2, respectively. Rcpp and pso must be installed from CRAN (https://cran.r-project.org/) before running these scripts. The first argument of psoptim, rep(NA, 5), only conveys the number of parameters and is included primarily for compatibility with optim (see the manual of the function).

Box 6 Fitting and Optimization of the Maize Growth Model (RunMaizeGrowthModel.R)

MaizeGrowthModel.R and MaizeGrowthModel.cpp are the script files illustrated in Boxes 3 and 4, respectively. Rcpp and pso must be installed from CRAN (https://cran.r-project.org/) before running these scripts. The first argument of psoptim, rep(NA, 4), only conveys the number of parameters and is included primarily for compatibility with optim (see the manual of the function).

6.5 Examples of the Joint Approach

A joint approach has important advantages, but implementation is not easy. Probabilistic programming languages such as Stan [70] and Edward2 [71], which offer automatic parameter estimation for arbitrary user-specified models, can facilitate implementation. If R is the preferred environment, the R package GenomeBasedModel [61], designed for implementing a joint approach, can be used. This package provides automatic parameter estimation based on a variational Bayesian (VB) framework. Posterior distributions of marker effects on CGM parameters are approximated with VB methods, which are fast enough to estimate genome-wide marker effects. In contrast, posterior distributions of the CGM parameters usually do not have closed forms, and thus rapid approximation is often infeasible. These parameters are instead inferred with MCMC methods, using genome-wide markers in their prior distributions. The two estimation steps (CGM parameters with MCMC and marker effects on those parameters with VB) are repeated until convergence (see [61] for detailed algorithms).

To use GenomeBasedModel, CGM functions must meet the requirements of the package. In either type of script (R or Rcpp), a function should take three arguments, Input, Freevec, and Parameter, and return a vector of phenotypic values. Input contains environmental and management conditions, Freevec can be used for any purpose within the function, and Parameter contains all parameter values. By default, the package fits the CGM function to each variety successively. This keeps the function simple but requires repeated function calls to fit the model to all varieties, which increases calculation time. The package therefore provides an alternative that fits the model to all varieties in a single function call. In this case, the CGM function takes inputs and parameters for all varieties as matrices, repeats model fitting over the varieties, and returns phenotypic values for each environment as a matrix. Example functions of the DVR model and the maize growth model for this process are provided in Boxes 7 and 8, respectively. Only Rcpp scripts are supplied because they are more practical than R scripts for analyses that include many varieties, but both R and Rcpp functions can be used. GenomeBasedModel can handle log- and logit-transformed model parameters. In the DVR model, parameters α, β, and G are assumed to be log-transformed for genome-wide marker fitting and are therefore back-transformed with an exponential function inside the model function (Box 7).

Box 7 An Rcpp Script for the DVR Model Designed for GenomeBasedModel (DVRmodel_GBM.cpp)

The first argument Input is a (2 × Ne × Md) × Nl matrix where Ne is the number of environments, Md is the maximum day, and Nl is the number of varieties. The upper half of Input, from the first to the (Ne × Md)th row, is temperature, and the lower half is photoperiod. Both temperature and photoperiod for each environment from emergence to Md are vertically stacked. Each column of Input includes measurements for each variety. Freevec is a vector including Nl, Ne, and Md. Parameter is an Np × Nl matrix of parameters for all varieties. Np is the number of parameters (5 in this example).

Box 8 An Rcpp Script for the Maize Growth Model Designed for GenomeBasedModel (MaizeGrowthModel_GBM.cpp)

The first argument Input is a (2 × Ne × Md) × Nl matrix where Ne is the number of environments, Md is the maximum day, and Nl is the number of varieties. The upper half of Input, from the first to the (Ne × Md)th row, is temperature, and the lower half is solar radiation. Both temperature and solar radiation for each environment from emergence to Md are vertically stacked. Each column of Input includes measurements for each variety. Freevec is a vector including Nl, Ne, Md, and populations at each environment. Parameter is an Np × Nl matrix of parameters for all varieties. Np is the number of parameters (4 in this example).
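The stacked layout of Input described in the captions of Boxes 7 and 8 can be made concrete with a small plain C++ sketch. It is not the code in those boxes: the Matrix struct and the functions temperature_row, second_covariate_row, and build_input are hypothetical. Two assumptions are made: environments are stacked one after another with days running within each environment, and every variety shares the same environmental series (in practice the columns may differ, for example when emergence dates differ among varieties).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal column-major matrix, the storage order used by R matrices.
struct Matrix {
    std::size_t nrow, ncol;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : nrow(r), ncol(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * nrow]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * nrow]; }
};

// Row (0-based) holding temperature for environment env on day `day`.
// Upper half of Input: rows 0 .. Ne * Md - 1.
std::size_t temperature_row(std::size_t env, std::size_t day, std::size_t md) {
    return env * md + day;
}

// Row (0-based) holding the second covariate (photoperiod for the DVR model,
// solar radiation for the maize growth model). Lower half of Input.
std::size_t second_covariate_row(std::size_t env, std::size_t day,
                                 std::size_t ne, std::size_t md) {
    return ne * md + env * md + day;
}

// Build the (2 * Ne * Md) x Nl Input matrix from series indexed [env][day].
// Assumption: all Nl varieties share the same environmental series.
Matrix build_input(const std::vector<std::vector<double>>& temp,
                   const std::vector<std::vector<double>>& second,
                   std::size_t nl) {
    if (temp.empty() || temp.size() != second.size()) {
        throw std::invalid_argument("temp and second need the same Ne >= 1");
    }
    const std::size_t ne = temp.size();
    const std::size_t md = temp[0].size();
    Matrix input(2 * ne * md, nl);
    for (std::size_t l = 0; l < nl; ++l) {
        for (std::size_t e = 0; e < ne; ++e) {
            if (temp[e].size() != md || second[e].size() != md) {
                throw std::invalid_argument("every series must have Md days");
            }
            for (std::size_t d = 0; d < md; ++d) {
                input(temperature_row(e, d, md), l) = temp[e][d];
                input(second_covariate_row(e, d, ne, md), l) = second[e][d];
            }
        }
    }
    return input;
}
```

Inside a model function, Ne and Md would be read from Freevec and the two index helpers would then recover each daily value from the column belonging to a given variety.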

Several additional arguments required to run the GenomeBasedModel are listed below. See the manual of the package for details. Boxes 9 and 10 illustrate scripts to run GenomeBasedModel.

1. Input: a matrix with Nl columns, including environmental and management information, where Nl is the number of genotypes. Each column is used to fit the CGM for each genotype. For the DVR model, Input includes mean temperature and photoperiod; for the maize growth model, it includes mean temperature and solar radiation. See also the captions for Boxes 7 and 8.
2. Freevec: a vector used for any purpose in the model function. In the DVR and maize growth models, Freevec is used to define partitions in Input (e.g., which elements in Input are temperature and photoperiod). See also the captions for Boxes 7 and 8.
3. Y: a (Ne + 1) × Nl matrix, including phenotypic values (e.g., days to heading for the DVR model), where Ne is the number of environments. The first row contains the IDs of genotypes (must be numeric).
4. Missing: a scalar value indicating missing records in Y. NA is not allowed.
5. Np: an integer indicating the number of parameters (five for the DVR model and four for the maize growth model).
6. Geno: a (Nm + 1) × Nl matrix, including marker genotypes, where Nm is the number of markers. The first row contains the IDs of varieties (must be numeric). Missing values in Geno are not allowed.
7. Methodcode: a vector of length Np specifying methods for regression of markers on model parameters. GenomeBasedModel offers multiple choices for regression methods, including Bayesian lasso [72], extended Bayesian lasso [73], Bayesian alphabets from A to C [42, 74], and GBLUP. Codes are assigned to regression methods, and users specify a code for each parameter. Whole-genome regression methods, such as BayesB, BayesC, and extended Bayesian lasso, can be used to detect QTLs [61]. Normal prior distributions that do not depend on genome-wide markers can also be specified.
8. Referencevalues: a vector of length Np including reference (typical) values of model parameters. These values are used to specify prior distributions and to check the model function.

The argument, PassMatrix, indicates that the model is fitted for all varieties within the model function. In Box 9, an additional argument, Transformation, specifies whether model parameters are transformed in marker fitting. “log” and “logit” indicate log and logit transformation, respectively, and “nt” indicates no transformation. The first three parameters (α, β, and G) are log-transformed and thus, “log” is assigned (Box 7).
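The back-transformation that a model function applies to transformed parameters can be sketched in plain C++ as follows. The codes "log", "logit", and "nt" are taken from the text, but the function names back_transform and back_transform_all are hypothetical, and mapping a logit-scale value to the interval (0, 1) with the standard inverse logit is an assumption; whether the package rescales such values further is not stated here.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Map one parameter from the scale used for marker fitting back to its
// natural scale. Codes follow the text: "log", "logit", "nt" (none).
double back_transform(double value, const std::string& code) {
    if (code == "log") return std::exp(value);
    // Assumption: the standard inverse logit, giving a value in (0, 1).
    if (code == "logit") return 1.0 / (1.0 + std::exp(-value));
    if (code == "nt") return value;
    throw std::invalid_argument("unknown transformation code: " + code);
}

// Apply one transformation code per parameter, in the order of the
// parameter vector.
std::vector<double> back_transform_all(const std::vector<double>& values,
                                       const std::vector<std::string>& codes) {
    if (values.size() != codes.size()) {
        throw std::invalid_argument("one code is needed per parameter");
    }
    std::vector<double> out(values.size());
    for (std::size_t p = 0; p < values.size(); ++p) {
        out[p] = back_transform(values[p], codes[p]);
    }
    return out;
}
```

For the DVR model, the code vector would be "log" for α, β, and G and "nt" for the two remaining parameters, so only the first three values would pass through the exponential function.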

Box 9 An R Script for GenomeBasedModel for the DVR Model (RunDVRmodel_GBM.R)

See the text for arguments. MCMC is used for model parameter estimation, and GenomeBasedModel returns MCMC samples for each parameter. Posterior means of parameters can be estimated from MCMC samples. Different regression methods are specified for each model parameter with Methodcode.

Box 10 An R Script to Run GenomeBasedModel for the Maize Growth Model (RunMaizeGrowthModel_GBM.R)

See the text for arguments. MCMC is used for model parameter estimation, and GenomeBasedModel returns MCMC samples for each parameter. Posterior means of parameters can be estimated from MCMC samples. Different regression methods are specified for each model parameter with Methodcode. Returned objects differ depending on the regression method.
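Estimating posterior means from MCMC samples, as mentioned in the captions of Boxes 9 and 10, amounts to averaging the retained samples of each parameter. The plain C++ sketch below illustrates this under an assumed layout in which samples[s][p] is the value of parameter p at iteration s; the function name posterior_means is hypothetical, and the actual structure of the object returned by GenomeBasedModel should be checked in the package manual.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Posterior mean of each parameter from MCMC samples.
// Assumed layout: samples[s][p] = value of parameter p at iteration s.
std::vector<double> posterior_means(
        const std::vector<std::vector<double>>& samples) {
    if (samples.empty()) {
        throw std::invalid_argument("at least one MCMC sample is required");
    }
    const std::size_t np = samples[0].size();
    std::vector<double> mean(np, 0.0);
    for (std::size_t s = 0; s < samples.size(); ++s) {
        if (samples[s].size() != np) {
            throw std::invalid_argument("every sample needs Np values");
        }
        for (std::size_t p = 0; p < np; ++p) {
            mean[p] += samples[s][p];
        }
    }
    for (std::size_t p = 0; p < np; ++p) {
        mean[p] /= static_cast<double>(samples.size());
    }
    return mean;
}
```

If parameters are sampled on a transformed scale, averaging before or after back-transformation gives different summaries, so the scale of the stored samples should be confirmed first.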

7 Concluding Remarks

CGMs are attractive tools for modeling G×E interactions. Many studies have connected CGMs with genetics, but the integration of CGMs with GP, which is a purely prediction-oriented approach, has a relatively short history. Thus, ideas on how to integrate these two methods to predict crop performance have not yet been exhausted. As reviewed here, there are two ways of integration: GP-assisted CGMs and CGM-assisted GP. The former can be conducted with either independent or joint approaches, and various options exist for parameter estimation (e.g., Bayesian or non-Bayesian). For CGM-assisted GP, integration approaches other than those considered to date might be developed, and CGM-assisted GP is likely to accommodate more variations than GP-assisted CGMs. Given global environmental changes, considering environmental information is essential for plant breeding, and in such an era the integration of CGMs and GP can make a substantial contribution. To this end, exploration of new approaches, along with fair and comprehensive comparisons with conventional approaches, is essential. A key issue is collaboration between CGM modelers and quantitative geneticists.

8 Script Availability

All scripts illustrated in the Boxes are available at: https://github.com/Onogi/IntegratingCGMwithGP.

References

Technow F, Messina CD, Totir LR, Cooper M (2015) Integrating crop growth models with whole genome prediction through approximate Bayesian computation. PLoS One 10:e0130855
Onogi A, Watanabe M, Mochizuki T, Hayashi T, Nakagawa H, Hasegawa T, Iwata H (2016) Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates. Theor Appl Genet 129:805–817
Monsi M, Saeki T (1953) Über den lichtfaktor in den pflanzengesellschaften und seine Bedeutung für die stoffproduktion. Jpn J Bot 14:22–52
de Wit CT, Brouwer R, Penning FWT (1970) The simulation of photosynthetic systems. Proceedings of the IBP/PP technical meeting, Trebon, 14-21 September 1969, 47-70
Bouman BAM, van Keulen H, van Laar HH, Rabbinge R (1996) The “school de Wit” crop growth simulation models: a pedigree and historical overview. Agric Syst 52:171–198
Jones JW, Antle JM, Basso B, Boote KJ, Conant RT, Foster I, Godfray HCJ, Herrero M, Howitt RE, Janssen S, Keating BA, Munoz-Carpena R, Porter CH, Rosenzweig C, Wheeler TR (2017) Brief history of agricultural systems modeling. Agric Syst 155:240–254
Muller B, Martre P (2019) Plant and crop simulation models: powerful tools to link physiology, genetics, and phenomics. J Exp Bot 70:2339–2344
Jones JW, Tsuji GY, Hoogenboom G, Hunt LA, Thornton PK, Wilkens PW, Imamura DT, Bowen WT, Singh U (1998) Decision support system for agrotechnology transfer: DSSAT v3. In: Systems approaches for sustainable agricultural development. Springer, Dordrecht, pp 157–177
McCown RL, Hammer GL, Hargreaves JNG, Holzworth DP, Freebairn DM (1996) APSIM: a novel software system for model development, model testing and simulation in agricultural systems research. Agric Syst 50:255–271
Alderman PD (2020) A comprehensive R interface for the DSSAT cropping systems model. Comput Electron Agric 172:105325
Lobet G, Draye X, Périlleux C (2013) An online database for plant image analysis software tools. Plant Methods 9:38. Database is at https://www.quantitative-plant.org/model. Accessed on 30 June 2021
Horie T, Yajima M, Nakagawa H (1992) Yield forecasting. Agric Syst 40:211–236
Monteith JL (1977) Climate and the efficiency of crop production in Britain. Phil Trans R Soc Lond B 281:277–294
Vos J, Evers JB, Buck-Sorlin GH, Andrieu B, Chelle M, de Visser PH (2010) Functional-structural plant modelling: a new versatile tool in crop science. J Exp Bot 61:2101–2115
Migault V, Pallas B, Costes E (2016) Combining genome-wide information with a functional structural plant model to simulate 1-year-old apple tree architecture. Front Plant Sci 7:2065
Verhulst PF (1838) Notice sur la loi que la population suit dans son accroissement. In: Correspondances Mathematiques et Physiques, publiee par A Quetelet, vol 10:113–120
Winsor CP (1932) The Gompertz curve as a growth curve. Proc Natl Acad Sci U S A 18:1–8
Campbell MT, Grondin A, Walia H, Morota G (2020) Leveraging genome-enabled growth models to study shoot growth responses to water deficit in rice. J Exp Bot 71:5669–5679
Kropff MJ, Haverkort AJ, Aggarwal PK, Kooman PL (1995) Using systems approaches to design and evaluate ideotypes for specific environments. In: Systems approaches for sustainable agricultural development. Springer, Dordrecht, pp 417–435
Quilot-Turion B, Ould-Sidi MM, Kadrani A, Hilgert N, Génard M, Lescourret F (2012) Optimization of parameters of the ‘virtual fruit’ model to design peach genotype for sustainable production systems. Eur J Agron 42:34–48
Hoogenboom G, White JW, Acosta-Gallegos J, Gaudiel RG, Myers JR, Silbernagel MJ (1997) Evaluation of a crop simulation model that incorporates gene action. Agron J 89:613–620
Yin X, Struik PC, van Eeuwijk FA, Stam P, Tang J (2005) QTL analysis and QTL-based prediction of flowering phenology in recombinant inbred lines of barley. J Exp Bot 56:967–976
Dingkuhn M, Pasco R, Pasuquin JM, Damo J, Soulié JC, Raboin LM, Dusserre J, Sow A, Manneh B, Shrestha S, Kretzschmar T (2017) Crop-model assisted phenomics and genome-wide association study for climate adaptation of indica rice. 2. Thermal stress and spikelet sterility. J Exp Bot 68:4389–4406
White JW, Hoogenboom G (1996) Simulating effects of genes for physiological traits in a process-oriented crop model. Agron J 88:416–422
Hoogenboom G, White JW (2003) Improving physiological assumptions of simulation models by using gene-based approaches. Agron J 95:82–89
Chapman S, Cooper M, Podlich D, Hammer G (2003) Evaluating plant breeding strategies by simulating gene action and dryland environment effects. Agron J 95:99–113
Stewart DW, Cober ER, Bernard RL (2003) Modeling genetic effects on the photothermal response of soybean phenological development. Agron J 95:65–70
Messina CD, Jones JW, Boote KJ, Vallejos CE (2006) A gene-based model to simulate soybean development and yield responses to environment. Crop Sci 46:456–466
White JW, Herndl M, Hunt LA, Payne TS, Hoogenboom G (2008) Simulation-based analysis of effects of Vrn and Ppd loci on flowering in wheat. Crop Sci 48:678–687
Wang E, Brown HE, Rebetzke GJ, Zhao Z, Zheng B, Chapman SC (2019) Improving process-based crop models to better capture genotype×environment×management interactions. J Exp Bot 70:2389–2401
Yin X, Stam P, Dourleijn CJ, Kropff MJ (1999) AFLP mapping of quantitative trait loci for yield-determining physiological characters in spring barley. Theor Appl Genet 99:244–253
Yin X, Kropff MJ, Horie T, Nakagawa H, Centeno HGS, Zhu D, Goudriaan J (1997) A model for photothermal responses of flowering in rice I. model description and parameterization. Field Crops Res 51:189–200
Cooper M, Technow F, Messina C, Gho C, Totir LR (2016) Use of crop growth models with whole-genome prediction: application to a maize multienvironment trial. Crop Sci 56:2141–2156
Tavaré S, Balding DJ, Griffiths RC, Donnelly P (1997) Inferring coalescence times from DNA sequence data. Genetics 145:505–518
Messina CD, Technow F, Tang T, Totir R, Gho C, Cooper M (2018) Leveraging biological insight and environmental variation to improve phenotypic prediction: integrating crop growth models (CGM) with whole genome prediction (WGP). Eur J Agron 100:151–162
Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127:463–480
de los Campos G, Pérez-Rodríguez P, Bogard M, Gouache D, Crossa J (2020) A data-driven simulation platform to predict cultivars’ performances under uncertain weather conditions. Nat Commun 11:4876
Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M, Burgueño J, de los Campos G (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127:595–607
Ly D, Chenu K, Gauffreteau A, Rincent R, Huet S, Gouache D, Martre P, Bordes J, Charmet G (2017) Nitrogen nutrition index predicted by a crop model improves the genomic prediction of grain number for a bread wheat core collection. Field Crops Res 214:331–340
Robert P, Le Gouis J, BreedWheat Consortium, Rincent R (2020) Combining crop growth modeling with trait-assisted prediction improved the prediction of genotype by environment interactions. Front Plant Sci 11:827
Hori K, Kataoka T, Miura K, Yamaguchi M, Saka N, Nakahara T, Sunohara Y, Ebana K, Yano M (2012) Variation in heading date conceals quantitative trait loci for other traits of importance in breeding selection of rice. Breed Sci 62:223–234
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Alimi NA (2016) Statistical methods for QTL mapping and genomic prediction of multiple traits and environments: case studies in pepper. Thesis submitted in fulfilment of the requirements for the degree of doctor at Wageningen University
Uptmoor R, Pillen K, Matschegewski C (2017) Combining genome-wide prediction and a phenology model to simulate heading date in spring barley. Field Crops Res 202:84–93
Rosen A, Hasan Y, Briggs W, Uptmoor R (2018) Genome-based prediction of time to curd induction in cauliflower. Front Plant Sci 9:78
Toda Y, Wakatsuki H, Aoike T, Kajiya-Kanegae H, Yamasaki M, Yoshioka T, Ebana K, Hayashi T, Nakagawa H, Hasegawa T, Iwata H (2020) Predicting biomass of rice with intermediate traits: modeling method combining crop growth models and genomic prediction models. PLoS One 15:e0233951
Chen TS, Aoike T, Yamasaki M, Kajiya-Kanegae H, Iwata H (2020) Predicting rice heading date using an integrated approach combining a machine learning method and a crop growth model. Front Genet 11:599510
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining 785–794
Lamsal A, Welch SM, White JW, Thorp KR, Bello NM (2018) Estimating parametric phenotypes that determine anthesis date in Zea mays: challenges in combining ecophysiological models with genetics. PLoS One 13:e0195841
Yin X, Kropff MJ, Goudriaan J, Stam P (2000) A model analysis of yield differences among recombinant inbred lines in barley. Agron J 92:114–120
Bannayan M, Kobayashi K, Marashi H, Hoogenboom G (2007) Gene-based modelling for rice: an opportunity to enhance the simulation of rice growth and development? J Theor Biol 249:593–605
Quilot B, Génard M, Lescourret F, Kervella J (2005) Simulating genotypic variation of fruit quality in an advanced peach x Prunus davidiana cross. J Exp Bot 56:3071–3081
ter Braak CJF, Vrugt JA (2008) Differential evolution Markov chain with snooker updater and fewer chains. Stat Comput 18:435–446
Beven K, Binley A (1992) The future of distributed models: model calibration and uncertainty prediction. Hydrol Process 6:279–298
Dumont B, Leemans V, Mansouri M, Bodson B, Destain J-P, Destain M-F (2014) Parameter identification of the STICS crop model, using an accelerated formal MCMC approach. Environ Model Softw 52:121–135
Acharya S, Correll M, Jones JW, Boote KJ, Alderman PD, Hu Z, Vallejos CE (2017) Reliability of genotype-specific parameter estimation for crop models: insights from a Markov chain Monte-Carlo estimation approach. Trans ASABE 60:1699–1712
He J, Dukes MD, Jones JW, Graham WD, Judge J (2009) Applying GLUE for estimating CERES-maize genetic and soil parameters for sweet corn production. Trans ASABE 52:1907–1921
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
Kennedy J, Eberhart R (1995) Particle swarm optimization. Proceedings of the ICNN’95-international conference on neural networks 4, 1942-1948
Varona L, Moreno C, Cortes LAG, Yagüe G, Altarriba J (1999) Two-step versus joint analysis of Von Bertalanffy function. J Anim Breed Genet 116:331–338
Onogi A (2020) Connecting mathematical models to genomes: joint estimation of model parameters and genome-wide marker effects on these parameters. Bioinformatics 36:3169–3176. The package GenomeBasedModel is available at https://github.com/Onogi/GenomeBasedModel
Ma CX, Casella G, Wu R (2002) Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161:1751–1762
Malosetti M, Visser RG, Celis-Gamboa C, van Eeuwijk FA (2006) QTL methodology for response curves on the basis of non-linear mixed models, with an illustration to senescence in potato. Theor Appl Genet 113:288–300
Sillanpää MJ, Pikkuhookana P, Abrahamsson S, Knürr T, Fries A, Lerceteau E, Waldmann P, García-Gil MR (2012) Simultaneous estimation of multiple quantitative trait loci and growth curve parameters through hierarchical Bayesian modeling. Heredity (Edinb) 108:134–146
Rincent R, Kuhn E, Monod H, Oury FX, Rousset M, Allard V, Le Gouis J (2017) Optimization of multi-environment trials for genomic selection based on crop models. Theor Appl Genet 130:1735–1752
DSSAT website https://dssat.net/. Accessed on 30 June 2021
APSIM website https://www.apsim.info/. Accessed on 30 Jan 2021
Nakagawa H, Yamagishi J, Miyamoto N, Motoyama M, Yano M, Nemoto K (2005) Flowering response of rice to photoperiod and temperature: a QTL analysis using a phenological model. Theor Appl Genet 110:778–786
Muchow RC, Sinclair TR, Bennett JM (1990) Temperature and solar radiation effects on potential maize yield across locations. Agron J 82:338–343
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: a probabilistic programming language. J Stat Softw 76:1–32. https://doi.org/10.18637/jss.v076.i01
Tran D, Kucukelbir A, Dieng AB, Rudolph M, Liang D, Blei DM (2016) Edward: a library for probabilistic modeling, inference, and criticism. arXiv preprint arXiv:1610.09787
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103:681–686
Mutshinda CM, Sillanpää MJ (2010) Extended Bayesian lasso for multiple quantitative trait loci mapping and unobserved phenotype prediction. Genetics 186:1067–1075
Habier D, Fernando RL, Kizilkaya K, Garrick DJ (2011) Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12:186
Yin X, Chasalow SD, Dourleijn CJ, Stam P, Kropff MJ (2000) Coupling estimated effects of QTLs for physiological traits to a crop growth model: predicting yield variation among recombinant inbred lines in barley. Heredity (Edinb) 85:539–549
Reymond M, Muller B, Leonardi A, Charcosset A, Tardieu F (2003) Combining quantitative trait loci analysis and an ecophysiological model to analyze the genetic variability of the responses of maize leaf growth to temperature and water deficit. Plant Physiol 131:664–675
Quilot B, Kervella J, Génard M, Lescourret F (2005) Analysing the genetic control of peach fruit quality through an ecophysiological model combined with a QTL approach. J Exp Bot 56:3083–3092
Laperche A, Devienne-Barret F, Maury O, Le Gouis J, Ney B (2006) A simplified conceptual model of carbon/nitrogen functioning for QTL analysis of winter wheat adaptation to nitrogen deficiency. Theor Appl Genet 113:1131–1146
Welcker C, Boussuge B, Bencivenni C, Ribaut JM, Tardieu F (2007) Are source and sink strengths genetically linked in maize plants subjected to water deficit? A QTL study of the responses of leaf growth and of Anthesis-Silking interval to water deficit. J Exp Bot 58:339–349
Letort V, Mahe P, Cournède PH, de Reffye P, Courtois B (2008) Quantitative genetics and functional-structural plant growth models: simulation of quantitative trait loci detection for model parameters and application to potential yield optimization. Ann Bot 101:1243–1254
Uptmoor R, Schrag T, Stützel H, Esch E (2008) Crop model based QTL analysis across environments and QTL based estimation of time to floral induction and flowering in Brassica oleracea. Mol Breeding 21:205–216
Uptmoor R, Osei-Kwarteng M, Gürtler S, Stützel H (2009) Modeling the effects of drought stress on leaf development in a Brassica oleracea doubled haploid population using two-phase linear functions. J Amer Soc Hort Sci 134:543–552
Prudent M, Lecomte A, Bouchet JP, Bertin N, Causse M, Génard M (2011) Combining ecophysiological modelling and quantitative trait locus analysis to identify key elementary processes underlying tomato fruit sugar concentration. J Exp Bot 62:907–919
Welcker C, Sadok W, Dignat G, Renault M, Salvi S, Charcosset A, Tardieu F (2011) A common genetic determinism for sensitivities to soil water deficit and evaporative demand: meta-analysis of quantitative trait loci and introgression lines of maize. Plant Physiol 157:718–729
Uptmoor R, Li J, Schrag T, Stützel H (2012) Prediction of flowering time in Brassica oleracea using a quantitative trait loci-based phenology model. Plant Biol (Stuttg) 14:179–189
Gu J, Yin X, Zhang C, Wang H, Struik PC (2014) Linking ecophysiological modelling with quantitative genetics to support marker-assisted crop design for improved yields of rice (Oryza sativa) under drought stress. Ann Bot 114:499–511
Bogard M, Ravel C, Paux E, Bordes J, Balfourier F, Chapman SC, Le Gouis J, Allard V (2014) Predictions of heading date in bread wheat (Triticum aestivum L.) using QTL-based parameters of an ecophysiological model. J Exp Bot 65:5849–5865
Rebolledo MC, Dingkuhn M, Courtois B, Gibon Y, Clément-Vidal A, Cruz DF, Duitama J, Lorieux M, Luquet D (2015) Phenotypic and genetic dissection of component traits for early vigour in rice using plant growth modelling, sugar content analyses and association mapping. J Exp Bot 66:5555–5566
Constantinescu D, Memmah MM, Vercambre G, Génard M, Baldazzi V, Causse M, Albert E, Brunel B, Valsesia P, Bertin N (2016) Model-assisted estimation of the genetic variability in physiological parameters related to tomato fruit growth under contrasted water conditions. Front Plant Sci 7:1841
Hwang C, Correll MJ, Gezan SA, Zhang L, Bhakta MS, Vallejos CE, Boote KJ, Clavijo-Michelangeli JA, Jones JW (2017) Next generation crop models: a modular approach to model early vegetative and reproductive development of the common bean (Phaseolus vulgaris L). Agric Syst 155:225–239
Gouache D, Bogard M, Pegard M, Thepot S, Garcia C, Hourcade D, Paux E, Oury F, Rousset M, Deswarte J, Le Bris X (2017) Bridging the gap between ideotype and genotype: challenges and prospects for modelling as exemplified by the case of adapting wheat (Triticum aestivum L.) phenology to climate change in France. Field Crops Res 202:108–121
Kadam NN, Jagadish SVK, Struik PC, van der Linden CG, Yin X (2019) Incorporating genome-wide association into eco-physiological simulation to identify markers for improving rice yields. J Exp Bot 70:2575–2586
Bogard M, Biddulph B, Zheng B, Hayden M, Kuchel H, Mullan D, Allard V, Le Gouis J, Chapman SC (2020) Linking genetic maps and simulation to optimize breeding for wheat flowering time in current and future climates. Crop Sci 60:678–699
Wu W, Zhou Y, Li W, Mao D, Chen Q (2002) Mapping of quantitative trait loci based on growth models. Theor Appl Genet 105:1043–1049
Marguerit E, Brendel O, Lebon E, Van Leeuwen C, Ollat N (2012) Rootstock control of scion transpiration and its acclimation to water deficit are controlled by different genes. New Phytol 194:416–429
Li Z, Hallingbäck HR, Abrahamsson S, Fries A, Gull BA, Sillanpää MJ, García-Gil MR (2014) Functional multi-locus QTL mapping of temporal trends in Scots pine wood traits. G3 (Bethesda) 4:2365–2379
Amelong A, Gambín BL, Severini AD, Borrás L (2015) Predicting maize kernel number using QTL information. Field Crops Res 172:119–131
Wei K, Wang J, Sang M, Zhang S, Zhou H, Jiang L, Clavijo Michelangeli JA, Vallejos CE, Wu R (2018) An ecophysiologically based mapping model identifies a major pleiotropic QTL for leaf growth trajectories of Phaseolus vulgaris. Plant J 95:775–784
Baker RL, Leong WF, Welch S, Weinig C (2018) Mapping and predicting non-linear Brassica rapa growth phenotypes based on Bayesian and frequentist complex trait estimation. G3 (Bethesda) 8:1247–1258
Khan MS, Struik PC, van der Putten PEL, Jansen HJ, van Eck HJ, van Eeuwijk FA, Yin X (2019) A model-based approach to analyse genetic variation in potato using standard cultivars and a segregating population. I. Canopy cover dynamics. Field Crops Res 242:107581
Khan MS, Yin X, van der Putten PEL, Jansen HJ, van Eck HJ, van Eeuwijk FA, Struik PC (2019) A model-based approach to analyse genetic variation in potato using standard cultivars and a segregating population. II. Tuber bulking and resource use efficiency. Field Crops Res 242:107582
Yin S, Li P, Xu Y, Liu J, Yang T, Wei J, Xu S, Yu J, Fang H, Xue L, Hao D, Yang Z, Xu C (2020) Genetic and genomic analysis of the seed-filling process in maize based on a logistic model. Heredity (Edinb) 124:122–134

Acknowledgments

I thank Dr. Hiroshi Nakagawa for providing useful information on the history of CGMs. I also thank Dr. Yubin Yang for his advice on GenomeBasedModel.

Author information

Authors and Affiliations

Department of Plant Life Science, Faculty of Agriculture, Ryukoku University, Otsu, Shiga, Japan
Akio Onogi

Editor information

Editors and Affiliations

CIRAD, UMR AGAP Institut, Montpellier, France
Nourollah Ahmadi
UMR AGAP Institut, CIRAD, Montpellier, France
Jérôme Bartholomé

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this protocol

Cite this protocol

Onogi, A. (2022). Integration of Crop Growth Models and Genomic Prediction. In: Ahmadi, N., Bartholomé, J. (eds) Genomic Prediction of Complex Traits. Methods in Molecular Biology, vol 2467. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2205-6_13

DOI: https://doi.org/10.1007/978-1-0716-2205-6_13
Published: 22 April 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2204-9
Online ISBN: 978-1-0716-2205-6
eBook Packages: Springer Protocols
